Chatbot, copilot, or agent?
Chatbots respond.
Copilots help.
Agents do work within boundaries.
An AI agent is a system that does part of a business process on its own: it reads a ticket, an invoice or a record, picks the next step and works in your tools (CRM, inbox, spreadsheets). It operates inside boundaries you set, and hands anything it is not sure about to a human. This page is your test: how to tell a real agent before someone sells you a repackaged chatbot.
In short
- An AI agent runs the process itself: it chooses the next step and uses tools in a loop, until it reaches the goal or hands the case to a person.
- A chatbot answers, a copilot assists, an automation runs on a fixed path. What sets an agent apart is that the model picks the next step, within the boundaries you set.
- An agent pays for itself only where the next step depends on the content of the case; the rest is handled more cheaply by a script or an app.
- An "agent" is not a marketing label but a system that passes seven checks: work, context, tools, boundaries, escalation, measurement, trace. Miss one and it is not an agent yet.
What an agent is made of
Agent =work+context+tools+boundaries+escalation+measurement+trace
If one element is missing, it is not an agent. We build a simpler script, integration, or LLM application instead, and we say so plainly.
Agent-washing
Most "agents" on the market are a repackaged chatbot.
The same word now sticks to a chatbot, a RAG app, a copilot, and a script with a model. So we do not start from the label on a slide; we start from the work and the boundaries, and what you are actually getting you settle with the seven criteria below. And the reverse: even a real agent costs more and carries more risk than a script or an app, so we propose one only where the next step genuinely depends on the content of the case. Everywhere else we point you to a simpler, cheaper form, even when that means a smaller job for us.
The market in numbers: adoption and agent-washing.
8.4%
of Polish companies used AI in 2025 (EU average: 20%, highest: Denmark 42%). A low bar, a real edge for whoever deploys it properly.
Eurostat 2025>40%
of agentic AI projects will be canceled by end of 2027: cost, unclear ROI, and weak risk controls. So pin down scope, cost, and risk control before you start, not after.
Gartner 20256%
how far token prices fell in 2026 (against 39% in the second half of 2025): deflation stalled and buyers keep shifting toward pricier premium models. That is why we meter run cost and cap it, instead of assuming it keeps falling.
YipitData 2026~130
real agent vendors Gartner counted among thousands. The rest is agent-washing: rebranded chatbots and RPA.
Gartner 2025
The market calls five different things an agent.
The market calls five different things an agent.
The same word gets attached to a chatbot, a copilot, a workflow, a team of agents, or a full runtime. Before you trust the word "agent," pin down which of these levels someone means.
- 1
Chatbot or RAG renamed as an agent
Answers questions, does not do work.
Chatbot with RAG, not an agent.
- 2
Copilot in a process
Helps a person; the person does the work.
Copilot, not an agent.
- 3
Operational tool-using agent
Sees context, calls APIs, escalates.
Can be an agent if all seven criteria are present.
- 4
Multi-agent orchestration
Many roles, shared memory, queues.
A team of agents, often overhyped.
- 5
Full agentic runtime
Persistent, long-running, tool permissions, lifecycle.
Operational stack. We do not claim to own a proprietary runtime.
Seven questions that turn a demo into a production system.
This is your detector. Ask anyone selling you an “agent” these seven questions: if they cannot answer them, it is not an agent yet, just an interface to a model. With the answers you know what it does, what it must not do, and when it hands a case to a human.
Scope
Work
A good answer
One process, one trigger, one measurable outcome.
Common mistake
“It handles customers.” Too broad to measure anything.
Context
A good answer
Named data sources, the rules, and the list of exceptions.
Common mistake
Passing the full conversation history on every call. The model drifts and the token bill climbs.
Tools
A good answer
Least privilege, read-only first, dedicated service accounts, an isolated environment.
Common mistake
The agent inherits someone's credentials, sessions, and files.
Control
Boundaries
A good answer
A hard “must not” list enforced in code, not in the prompt. External content is data, never a command.
Common mistake
The rule lives only in the prompt. Under load the agent skips it.
Escalation
A good answer
Explicit triggers, production writes behind approval, handoff with full context.
Common mistake
The agent decides its next move with no gate.
Proof
Measurement
A good answer
Weekly metrics, including Cost Per Query and Escalation Rate.
Common mistake
“It works,” with no numbers.
Trace
A good answer
Every action logged: what, when, on what basis. A central, replayable log.
Common mistake
State kept in model memory. Yesterday's run can't be reconstructed.
This is not a marketing checklist: the same seven points recur in the scan, the contract, and the deployment report, so you can hold the promise against the proof.
EU AI Act mapping
EU AI Act mapping
We map the seven agent criteria onto the EU AI Act requirements that feed operator documentation: trace, transparency, human oversight and monitoring. It organizes the technical documentation, but it is not a legal classification of the system or legal advice.
- Art. 12 · Record-keeping
- Trace
- Art. 13 · Transparency
- Work + Context
- Art. 13 + 26 · Tools and deployer obligations
- Tools
- Art. 14 · Human oversight
- Boundaries + Escalation
- Art. 17 + 72 · Quality and monitoring
- Measurement
Technical input to operator documentation, not legal advice: we do not classify the system as high-risk and do not determine your obligations. Most of these requirements apply to high-risk systems; separately, Art. 50 (telling a user they are talking to AI) binds providers and deployers from 2 August 2026, regardless of risk class. EUR-Lex ↗
These seven questions come from our production deployments, including real-estate lead intelligence. Where a number is confidential, we describe the shape of the work, not the value. See deployments →
Since February 2025, Art. 4 of the EU AI Act requires providers and deployers to ensure sufficient AI literacy among the people who use it. That's why our training is run by engineers. AI training for companies →
Five forms, one word
Agent, chatbot, copilot, RAG, or plain automation?
One word describes five different things. The difference is not the model; it is who runs the process and who performs the step.
Chatbot
What it doesAnswers questions in conversation, turn by turn.
The test that separates it from an agentIt never executes multi-step work on its own between your messages.
Copilot
What it doesSuggests as you work, while a human approves every step.
The test that separates it from an agentThe human drives and executes; the system only assists.
RAG app
What it doesSearches your documents and answers from them, with the source cited.
The test that separates it from an agentThe developer fixes the retrieval path, not the model. Even when it retrieves many times, it follows a fixed, programmed route and never decides on its own when or why to fetch.
Automation
What it doesRuns known steps along rules written in code.
The test that separates it from an agentA developer sets the path, not the model.
AI agent
What it doesRuns the process from event to outcome, within set boundaries.
The test that separates it from an agentThe model chooses the next step and tool, in a loop, until the goal or an escalation.
How it works
Purchase-invoice reconciliation, run as an agent
The same work, broken into the seven checks that let us call a system an agent:
- WorkRuns each supplier invoice from arrival to a posting-ready entry: a three-way match against the order and goods receipt, reconciled line by line, with every discrepancy flagged.
- ContextSees the invoice, the matching order and goods receipt, the supplier's terms and prior invoices, and your posting and approval rules.
- ToolsReads the PDF or structured e-invoice, queries the ERP for the order and receipt, normalises the supplier across aliases, and drafts the journal entry with its account coding.
- BoundariesPosts nothing to the ledger and releases no payment on its own. Above a set amount or below a confidence threshold, it stops at a human gate instead of guessing.
- EscalationMismatches, duplicates and first-time suppliers go to a controller with the proposed entry, the cited evidence and the reasoning already attached.
- MeasurementWe track the share of invoices reconciled without a correction and the time from arrival to posting-ready.
- TraceEvery match, decision and source lands in an immutable log, so the audit trail assembles itself as the work runs.
Running cost
What does running an agent actually cost?
As much as the work you give it. An agent doesn't run one query, it runs a loop: it plans, reaches for tools, checks the result and corrects, so it burns 5 to 30 times more tokens per task than a chatbot. The unit that matters is cost per completed task, not cost per prompt.
Cheaper tokens don't mean a cheaper agent: unit prices fall, but consumption rises faster, so the bill rises. The biggest drain is a gateless loop that retries on and on.
How we keep cost under control
Cache
Repeated context computed once.
Model routing
A cheaper, smaller model for simple steps, the frontier model for the hard ones.
Hard budget
A token-and-turn budget at the escalation gate.
We meter cost from day one (Cost Per Query). The same threshold that makes the agent safe guards the bill.
Safety
What stops an agent when someone tries to hijack it?
Not a filter at the door, but boundaries at the exit. An agent acts with real privilege, and the most common attack is prompt injection: a hidden instruction in an email, a document, or a page it reads starts steering it. It's the #1 risk for agents. No filter catches everything, so we design it so even a hijacked agent can't do anything dangerous.
OWASP 2026Even a hijacked agent can't do anything dangerous
Least privilege, read-only first.
ToolsA "must not" list enforced in code; external content is data, not a command.
BoundariesIrreversible steps wait for a human's approval.
EscalationEvery action logged and replayable.
TraceThese are the same four criteria that separate an agent from a fake. Boundaries aren't an add-on to safety: they are the safety.
Where it runs
In the EU by default (RODO/GDPR), on a major cloud (AWS, Azure, Google Cloud), or in your own cloud account if governance prefers. Code, prompts and data stay with you, no vendor lock-in.
Choosing the form
Agent or plain automation?
The shape of the work points to the form and the entry price. The four most common situations map directly onto our service lines:
Five questions before you decide
Can you write every step and exception on one sheet of paper?
If yes, that's automation. A script or an integration will do it cheaper and more predictably than a model.
Is the input unstructured text: emails, PDFs, notes?
A model can read, classify and extract while the flow stays closed. That's automation with AI, not yet an agent.
Does the next step depend on what's in the case?
That's agent territory: the system leads the case, picks the next step and reaches for tools, inside boundaries you set.
Must a human make the call: law, risk, relationship?
Then we build a copilot or an app: the system prepares, a human approves the effect.
Can you say what the system must never do and when it hands the case over?
If not yet, we fix the process first. Without boundaries and escalation we don't deploy an agent.
A repeatable process with known rules: inbox, documents, reports, syncing systems.
from 15 000 PLN
A tool with an LLM inside: a copilot, RAG, document extraction, an internal panel. A person approves the consequence.
from 25 000 PLN
The system should run the process itself inside set boundaries: pick the next step, use tools, escalate exceptions.
from 25 000 PLN
There is a process, but no obvious place to start: the form and the order still need to be set.
0 PLN
From zero to production
How an agent that actually does the work gets built.
We build in three stages. Each one ends at a gate: a concrete result in hand and your decision whether to go on. No "commit now, see it later."
Stage 01
Audit: the process first, not the model
A free scan: 30 minutes with an engineer on the process that costs you the most. For a complex case we write an Implementation Specification, a process map, architecture, and a fixed quote.
Gate
The takeaway and the specification are yours. Sometimes the best call is not to build an agent, and we will say so plainly.
Stage 02
Pilot: proof on your own data
The agent runs on a slice of real work, inside boundaries, with a human at the gate. We write down one measurable target before any code.
Gate
You decide after the pilot, not before. A result guarantee, not a promise.
Stage 03
Production: a system that runs the work
We deploy the agent in an isolated environment, with least-privilege access, a full trace, and escalation of exceptions.
Yours
The code, prompts, and data are yours. Maintenance and development scale with the process.
Evolution
From automation, through apps, to agents
Each year moved the boundary of what "AI" can do inside a company. The agent is the newest form, not the only one: below are the three years that shaped it.
- 2022. Automation: Fixed rules: it does exactly what it was programmed to, with no exceptions and no context.
- 2023. LLM apps: It understands language and generates, but waits to be asked: one step per prompt.
- 2025. Agents: It plans and acts across steps in a loop, and stops at the human boundary.
After these questions, we do not sell technology. We give a recommendation.
After the scan or the implementation specification you get one of three answers:
Build
When the work, boundaries, and trace are clear enough, the recommendation can lead to a full build or a narrower pilot.
Narrow or clean up
When the process is promising but too broad or underspecified, we first reduce scope or clean up the inputs.
Do not build an agent yet
When plain automation, a script, or a human decision is the better answer, we write that plainly.
Free process scan
Start with a free process scan.
- 30 minutes with the engineer who would build it, not a salesperson.
- A review of the processes that cost you the most time and money.
- A written summary: what to automate, in what order, with cost ranges.
No sales deck and no obligations. If automation doesn't make sense, we'll write that too.
0 PLN
30 minutes · written takeaway within 2 business days
Bring one process. We will check whether it really needs an agent.
- 30 minutes with the engineer who would build it, not a salesperson.
- A review of the processes that cost you the most time and money.
- A written summary: what to automate, in what order, with cost ranges.
No sales deck and no obligations. If automation doesn't make sense, we'll write that too.
Qualification questions
Before you call something an agent, ask these.
This is a quick filter before any build conversation. It does not settle the full architecture, but it prevents selling an agent where a simpler pattern is enough.
Does every process need an AI agent?
No. Sometimes the right answer is a plain script, integration, AI-assisted automation, LLM app, copilot for a human, or no build.When is automation enough instead of an AI agent?
When the steps are fixed, the rules are known, and the result can be checked with a test or simple validation. Then we build a script, integration, or workflow, not an agent.When does an LLM app become an agent?
Only when the system performs named work with tools, chooses the next step based on context, operates within boundaries, escalates exceptions, measures the result, and leaves a trace.How is an AI agent different from a chatbot?
A chatbot answers questions in conversation. An agent does the work: it chooses the next step, uses tools, and carries the case to an outcome within boundaries, handing exceptions to a person.Does an AI agent run without human oversight?
No. An agent operates within the boundaries you set: production impact needs approval, and anything it is unsure of goes to a person. Full autonomy without boundaries is a risk, not a feature.How do you know an agent is working correctly?
From a measure set before the build and from the trace. We define what "works" means (for example, the share of cases handled without a correction), and every action lands in the log, so you can check what the system did and why.