Three years ago, I set out to build (to some extent) autonomous AI systems. Through increasingly ambitious experiments, from project estimation to customer support to full e-commerce automation, I discovered that the gap between AI's theoretical capabilities and practical reality remains stubbornly wide.
My journey reveals three challenges that current AI systems haven't overcome:
Hidden Human Scaffolding: Even "autonomous" systems require extensive human architecture, prompting, and oversight.
The Autonomy Paradox: The more autonomous we try to make AI systems, the more complex human work we create.
The Cost of Almost-Intelligence: Building systems that appear intelligent demands significant investment in infrastructure, training, and ongoing maintenance, often exceeding the cost of traditional solutions.
What follows is a practical examination of where AI truly adds value and where it creates new work rather than eliminating it.
Simulating professionals: Speed and Comfort
When I first gained access to the early ChatGPT models in 2023, I was astonished by the outputs and by the natural flow of the relationship you could establish with them.
My first real test of AI's capabilities came at an e-commerce consulting agency where I worked as a Project Delivery Manager. The agency's profitability hinged on accurate project estimations: get them wrong, and we'd work at a loss.
When OpenAI launched ChatGPT, the agency owner decided to get a ChatGPT premium subscription and share the credentials with some team members, including me. His rationale: “... you either learn how to use AI or you’ll get left behind.”
So, we started using it.
First, to help craft a quote for an online store. We needed to estimate the effort required to orchestrate various Shopify apps, together with custom code for other bits, and pixel-perfect design.
We wanted ChatGPT to estimate the time-and-materials effort of developing that technical solution, flag the likely challenges, and propose realistic deadlines we could meet.
This kind of worked.
We managed to deliver very detailed and plausible estimations.
But getting these estimations wasn't an effortless job. We had to prompt, verify, validate, and ensure quality, and that took roughly as much time as estimating without it. We tend to imagine our relationship with LLMs (Large Language Models) as effortless. It isn't.
The results were plausible, but not revolutionary. What was interesting, though, was the psychological effect:
These estimations provided a sense of assurance. The ChatGPT-generated predictions gave the team and the owner a sense of relief. If he was hoping to outsource the responsibility, these estimates provided an artificial stamp of approval and, with it, a kind of comfort.
The experiment revealed our first paradox: while the AI didn't improve accuracy or save time, it provided psychological comfort, at a cost.
What we really got from ChatGPT was a fast way to iterate toward what we (and I stress we, the human beings with technical knowledge and experience) judged good enough to obtain what we needed: a new client and a production horizon.
LLM APIs and Agentic AI: the customer support case
My next experiment came at a Colombian crypto fintech where, as Chief Growth Officer, I helped build an AI customer support system. The CEO, convinced that technology could solve their scaling problems, personally architected an agent using n8n and OpenAI APIs.
By the end of the year, the agent was taking care of between 60% and 70% of customer support queries.
That sounds amazing, but it wasn't: we had created an innovative delivery method for FAQs through WhatsApp, nothing more.
The AI agent could handle these queries precisely because most questions were basic and repetitive. Anything complex, like onboarding or technical issues, was escalated to human support.
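If you are curious about the shape of that flow, here is a minimal sketch rebuilt in Python rather than the actual n8n workflow. The model name, the FAQ snippet, and the handoff function are placeholders of mine, not the fintech's real setup:

```python
# Minimal FAQ agent with human escalation (illustrative only).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FAQ_CONTEXT = """Q: How do I buy crypto? A: ...
Q: What are the fees? A: ..."""  # the real system embedded the full FAQ

SYSTEM_PROMPT = (
    "You are a customer support assistant. Answer ONLY from the FAQ below. "
    "If the question is not covered, or involves onboarding or a technical "
    "issue, reply with exactly the word ESCALATE.\n\n" + FAQ_CONTEXT
)

def route_to_human_agent(text: str) -> str:
    # Stub: in production this would open a ticket or ping a human on WhatsApp.
    return "Let me connect you with one of our team members."

def handle_whatsapp_message(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    answer = response.choices[0].message.content.strip()
    if answer == "ESCALATE":
        return route_to_human_agent(text)
    return answer
```

The point of the sketch is how little "agency" there is: the model either parrots the FAQ or hands off to a person.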
But even that FAQ case brought an additional problem: loyalty.
I found out that the agent had recommended that a customer buy products from another company, one of our direct competitors. This happened in our production environment, not in any sandbox.
You can deal with this by improving your prompt, in this case the system prompt, up until you face the context window limitation. Even then, it did not always work.
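To make that concrete, here is an illustrative sketch of how those guardrails pile up in a system prompt, and how the token count, and with it the per-query cost, keeps growing. The rules and the tiktoken encoding are assumptions of mine, not our exact production prompt:

```python
# How guardrails accumulate in a system prompt (illustrative only).
import tiktoken

BASE_PROMPT = "You are the support assistant for <our company>. ..."
GUARDRAILS = [
    "Never mention, compare with, or recommend competitor products.",
    "If asked about other providers, redirect to our own catalogue.",
    "Never speculate about prices or features you have not been given.",
    # ...each incident in production tended to add another rule here
]

system_prompt = BASE_PROMPT + "\n" + "\n".join(GUARDRAILS)

# Every one of these tokens is re-sent (and billed) with every single query.
enc = tiktoken.get_encoding("cl100k_base")
print(f"System prompt is now {len(enc.encode(system_prompt))} tokens.")
```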
The system revealed our second paradox: the more we tried to make the AI autonomous, the more oversight it required. Every expansion of capability brought new risks, from hallucination to disloyalty, that demanded human intervention.
My verdict on this case is that the agent didn't deliver the agentic promise: it didn't exercise any agency; it regurgitated what we taught it, with fancy expressions and an often-off cultural tone.[1]
A renewed e-commerce experiment with Agentic AI
For my final experiment, I partnered with a full-stack developer who owned an online store with 300+ products. Our goal: build an AI system to handle the entire customer buying journey.
We chose this use case because the customer journey of an e-commerce store, while globally standardized, varies for each business. In her case, she runs the store as a side business, with a single person handling every customer request, from catalogue questions to refunds.
Wouldn't it be perfect to have an agent taking care of the vast majority of your side business?
Yes, it would. That is what we thought as well.
We built three agents in orchestration: a triage system for incoming WhatsApp messages and two specialized tools handling different phases of the customer journey.
The agent was ‘smart’ (or very well structured by us) to understand what the customer wanted by asking two to three questions and suggesting up to three products from the entire catalogue. It not only recommended products, but it also delivered photos of the products, and even used e-commerce tactics such as scarcity and urgency to invite customers to buy, depending on the inventory. As a result, it reduced customers’ cognitive load of choosing what to buy and improved conversion rates.
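As a rough sketch of that recommendation step (the product model, the relevance scoring, and the scarcity threshold here are invented for illustration; this is not our production code):

```python
# Top-3 product recommendation with an inventory-based scarcity nudge.
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    photo_url: str
    stock: int
    score: float  # relevance to the customer's answers, computed upstream

def recommend(products: list[Product], limit: int = 3) -> list[str]:
    top = sorted(products, key=lambda p: p.score, reverse=True)[:limit]
    messages = []
    for p in top:
        line = f"{p.name} - see photo: {p.photo_url}"
        if 0 < p.stock <= 3:  # scarcity/urgency tactic; threshold is arbitrary
            line += f" (only {p.stock} left!)"
        messages.append(line)
    return messages
```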
The results looked impressive on the surface: the accuracy in recommending products was good.
We had, in fact, managed to create a fictional sales representative with a blend of architecture, prompting, process understanding, customer experience design, and a human-first approach. An achievement!
Still, we failed to create a fully autonomous agent, one capable of operating without depending on humans.
Full agency over the customer journey wasn't possible, both because of costs and because of the system's limited real capacity to detect human intent.
The most challenging aspect proved to be the triage agent: the system responsible for understanding human intent and routing conversations appropriately. Despite our best efforts, we couldn't achieve perfect accuracy in intent detection. While we could have created three separate specialized agents, we wanted a more elegant solution with a single intelligent triage system.
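Our triage amounted to a classification call along these lines; the route labels and model name are placeholders, and the fallback to a human is exactly the safety net we kept needing:

```python
# Triage: classify an incoming message into one route label (illustrative).
from openai import OpenAI

client = OpenAI()

ROUTES = {"DISCOVERY", "CHECKOUT", "HUMAN"}

TRIAGE_PROMPT = (
    "Classify the customer's WhatsApp message into exactly one label:\n"
    "DISCOVERY - browsing, product questions, recommendations\n"
    "CHECKOUT - payment, shipping, order status, refunds\n"
    "HUMAN - anything else, or unclear intent\n"
    "Reply with the label only."
)

def triage(message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0,        # classification should be as deterministic as possible
        messages=[
            {"role": "system", "content": TRIAGE_PROMPT},
            {"role": "user", "content": message},
        ],
    )
    label = response.choices[0].message.content.strip().upper()
    return label if label in ROUTES else "HUMAN"  # fail safe toward a human
```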
We discovered that fine-tuning OpenAI's models on our specific dataset could dramatically improve accuracy. However, this solution came with significant drawbacks. Fine-tuning requires extensive training data, considerable time investment, and ongoing costs. More importantly, it creates a form of vendor lock-in: once you've invested in training a model on OpenAI's platform, that trained model remains their intellectual property. You're essentially building critical business infrastructure on rented land.
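For reference, the fine-tuning route looks roughly like this with OpenAI's Python SDK. The file name and the base model are placeholders; the real constraints are the ones above: training data, time, and cost, plus the fact that the resulting model runs only on OpenAI's platform:

```python
# Starting a fine-tuning job on OpenAI's platform (illustrative only).
from openai import OpenAI

client = OpenAI()

# One JSON object per line: {"messages": [{"role": ..., "content": ...}, ...]}
training_file = client.files.create(
    file=open("triage_examples.jsonl", "rb"),  # placeholder file name
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # a fine-tunable base model at the time
)
print(job.id)  # the trained model lives on OpenAI's servers, not yours
```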
A preliminary but well-informed inference from this last experience: AI corporations around the globe need businesses, consultants, and capable professionals to believe that adopting AI is a matter of life or death. They appeal to your survival instinct to instill the belief required for mass adoption. They also need you, or your business, to find real use cases while they become computationally cost-efficient. They have succeeded in instilling that belief.
I can tell you it is not a matter of survival; we don't need to be terrified of AI replacing us. Take it from somebody who has been testing AI, to some extent, in real professional environments, working to find use cases in the quest for valuable impact.
Is that it?
After three years of chasing AI's promise, I've reached a sobering conclusion.
What appeared "smart" wasn't true intelligence or agency – it was simply the product of careful human architecture, precise prompting, and operation within a narrow, standardized domain.
Each of my experiments showed that behind every "autonomous" AI system lies a complex web of human effort, often more costly and complicated than the problem it was meant to solve.
In spite of the seductive capabilities of AI, I'm now more convinced that what is most valuable, and what will remain most valuable in the near to mid term, is what's human: friction, ownership, critical thinking, struggle, conflict, creativity, empathy, and emotional management.
Thinking deeply, before acting, about where AI will add real value in our daily lives and professional spaces matters more now than jumping on the hype and building nonsense for problems you could solve with better processes, simple existing tools, or by hiring a human HR manager to help you hire better.
We are the ones supposed to be in charge of designing our future. Why are we letting LLMs and AI dispossess us of our instinct, intuition, cognitive capacity, and sometimes even our creativity?
For now, I have this piece of advice: please don’t think AI is going to make your company, your professional life, or anything in your daily life easier or more profitable just like that.
If you feel this, just contact us; that's your brain and your humanity melting down. No, just kidding, but take it seriously: it requires discernment and a lot of experimentation. Keep your qualities and talents intact, believe in yourself and your own criteria, and use these tools to amplify them.
It's not gonna be easy, though.
[1] In Latin America, we call the adaptation of products to our cultural context tropicalization, and this did not come out of the box. So we prompted our way to it, within the limits of the context window and at increasing cost: the more you expand the system prompt, the more tokens every query sends, because the agent consults the system prompt on every call. For illustration, at a hypothetical $0.50 per million input tokens, a 4,000-token system prompt adds about $0.002 to every single query, small per message but compounding across thousands of conversations.