Chatbot, copilot, or agent?

Chatbots respond.
Copilots help.
Agents do work within boundaries.

An AI agent is a system that does part of a business process on its own: it reads a ticket, an invoice or a record, picks the next step and works in your tools (CRM, inbox, spreadsheets). It operates inside boundaries you set, and hands anything it is not sure about to a human. This page is your test: how to tell a real agent from a repackaged chatbot before someone sells you one.

In short

An AI agent runs the process itself: it chooses the next step and uses tools in a loop, until it reaches the goal or hands the case to a person.
A chatbot answers, a copilot assists, an automation runs on a fixed path. What sets an agent apart is that the model picks the next step, within the boundaries you set.
An agent pays for itself only where the next step depends on the content of the case; the rest is handled more cheaply by a script or an app.
An "agent" is not a marketing label but a system that passes seven checks: work, context, tools, boundaries, escalation, measurement, trace. Miss one and it is not an agent yet.

See the seven questions that settle it

What an agent is made of

Agent = work + context + tools + boundaries + escalation + measurement + trace

If one element is missing, it is not an agent. We build a simpler script, integration, or LLM application instead, and we say so plainly.
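The formula above can be read as a plain all-or-nothing check. The sketch below is only an illustration: the class and function names are hypothetical, and deciding whether each flag is true is the real work, done with the seven questions further down this page.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AgentChecklist:
    """One flag per criterion; True means the criterion is demonstrably met."""
    work: bool
    context: bool
    tools: bool
    boundaries: bool
    escalation: bool
    measurement: bool
    trace: bool


def missing_criteria(checklist: AgentChecklist) -> list[str]:
    """Return the names of the criteria that are not met, in declaration order."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def is_agent(checklist: AgentChecklist) -> bool:
    """All seven must hold; a single gap points to a simpler form."""
    return not missing_criteria(checklist)


demo = AgentChecklist(work=True, context=True, tools=True, boundaries=True,
                      escalation=False, measurement=True, trace=True)
print(is_agent(demo), missing_criteria(demo))  # False ['escalation']
```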

Agent-washing

Most "agents" on the market are a repackaged chatbot.

The same word now sticks to a chatbot, a RAG app, a copilot, and a script with a model. So we do not start from the label on a slide; we start from the work and the boundaries, and the seven criteria below settle what you are actually getting. The reverse also holds: even a real agent costs more and carries more risk than a script or an app, so we propose one only where the next step genuinely depends on the content of the case. Everywhere else we point you to a simpler, cheaper form, even when that means a smaller job for us.

The market in numbers: adoption and agent-washing.

8.4% of Polish companies used AI in 2025 (EU average: 20%, highest: Denmark 42%). A low bar, a real edge for whoever deploys it properly. (Eurostat 2025)

>40% of agentic AI projects will be canceled by end of 2027: cost, unclear ROI, and weak risk controls. So pin down scope, cost, and risk control before you start, not after. (Gartner 2025)

6%: how far token prices fell in 2026 (against 39% in the second half of 2025): deflation stalled and buyers keep shifting toward pricier premium models. That is why we meter run cost and cap it, instead of assuming it keeps falling. (YipitData 2026)

~130 real agent vendors Gartner counted among thousands. The rest is agent-washing: rebranded chatbots and RPA. (Gartner 2025)

The market calls five different things an agent.

The same word gets attached to a chatbot, a copilot, a workflow, a team of agents, or a full runtime. Before you trust the word "agent," pin down which of these levels someone means.

1. Chatbot or RAG renamed as an agent
   Answers questions, does not do work.
   Chatbot with RAG, not an agent.

2. Copilot in a process
   Helps a person; the person does the work.
   Copilot, not an agent.

3. Operational tool-using agent
   Sees context, calls APIs, escalates.
   Can be an agent if all seven criteria are present.

4. Multi-agent orchestration
   Many roles, shared memory, queues.
   A team of agents, often overhyped.

5. Full agentic runtime
   Persistent, long-running, tool permissions, lifecycle.
   Operational stack. We do not claim to own a proprietary runtime.

Seven questions that turn a demo into a production system.

This is your detector. Ask anyone selling you an “agent” these seven questions: if they cannot answer them, it is not an agent yet, just an interface to a model. With the answers you know what it does, what it must not do, and when it hands a case to a human.

Scope

What the agent should do and what it may reach.

Work

What specific work does the agent do?

A good answer

One process, one trigger, one measurable outcome.

Common mistake

“It handles customers.” Too broad to measure anything.

Context

What must the agent understand: data, rules, exceptions?

A good answer

Named data sources, the rules, and the list of exceptions.

Common mistake

Passing the full conversation history on every call. The model drifts and the token bill climbs.

Tools

Which systems does it use, and with what permissions?

A good answer

Least privilege, read-only first, dedicated service accounts, an isolated environment.

Common mistake

The agent inherits someone's credentials, sessions, and files.
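One way to picture "least privilege, read-only first" is a tool box that exposes only the tools granted to one service account and refuses write tools unless writes were switched on explicitly. This is a minimal sketch under assumptions: `Tool`, `ToolBox`, and the two example tools are hypothetical names, not part of any specific framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    writes: bool                 # True if the tool changes state anywhere
    run: Callable[[str], str]


class ToolBox:
    """Exposes only the tools granted to one dedicated service account."""

    def __init__(self, tools: list[Tool], allow_writes: bool = False):
        self._tools = {t.name: t for t in tools}
        self._allow_writes = allow_writes   # read-only unless switched on explicitly

    def call(self, name: str, arg: str) -> str:
        tool = self._tools.get(name)
        if tool is None:
            raise PermissionError(f"tool not granted: {name}")
        if tool.writes and not self._allow_writes:
            raise PermissionError(f"write tool blocked in read-only mode: {name}")
        return tool.run(arg)


# Hypothetical tools standing in for real system calls.
read_order = Tool("read_order", writes=False, run=lambda order_id: f"order {order_id}")
post_entry = Tool("post_entry", writes=True, run=lambda entry: f"posted {entry}")
toolbox = ToolBox([read_order, post_entry])   # read-only by default

print(toolbox.call("read_order", "PO-1"))
try:
    toolbox.call("post_entry", "draft-1")
except PermissionError as err:
    print("blocked:", err)
```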

Control

Where the agent stops and who takes the exception.

Boundaries

What must it never do on its own?

A good answer

A hard “must not” list enforced in code, not in the prompt. External content is data, never a command.

Common mistake

The rule lives only in the prompt. Under load the agent skips it.
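"Enforced in code, not in the prompt" can be as small as a check that runs on every action the model proposes, whatever the prompt or an injected document says. The sketch below assumes hypothetical action names and a hypothetical `execute` helper; the point is only where the rule lives.

```python
FORBIDDEN_ACTIONS = frozenset({"post_to_ledger", "release_payment"})  # hypothetical names


class BoundaryViolation(Exception):
    """Raised when a proposed action is forbidden or not explicitly allowed."""


def execute(action: str, handlers: dict) -> str:
    # The check runs in code on every call, whatever the prompt or the model says.
    if action in FORBIDDEN_ACTIONS:
        raise BoundaryViolation(f"'{action}' is on the must-not list")
    if action not in handlers:
        raise BoundaryViolation(f"'{action}' is not an allowed action")
    return handlers[action]()


handlers = {"draft_entry": lambda: "entry drafted"}

# Suppose text inside an email steered the model into proposing a payment.
proposed_by_model = "release_payment"

print(execute("draft_entry", handlers))
try:
    execute(proposed_by_model, handlers)
except BoundaryViolation as err:
    print("blocked:", err)
```

The model can still be fooled into proposing the action; the design choice is that proposing it changes nothing.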

Escalation

When does it hand off to a human?

A good answer

Explicit triggers, production writes behind approval, handoff with full context.

Common mistake

The agent decides its next move with no gate.

Proof

Whether the system is worth keeping and how to replay what it did.

Measurement

How do we know it's worth keeping?

A good answer

Weekly metrics, including Cost Per Query and Escalation Rate.

Common mistake

“It works,” with no numbers.
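As an illustration of "numbers instead of 'it works'", the two named metrics can be computed from a week of recorded runs. This is a sketch under assumptions: the `Run` fields and the exact definitions (averages per case) are ours for the example, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    cost: float       # metered spend for this case, in your billing currency
    escalated: bool   # the case was handed to a human
    corrected: bool   # a human had to fix the result afterwards


def weekly_metrics(runs: list[Run]) -> dict[str, float]:
    """Average cost per case, share escalated, share handled without a correction."""
    if not runs:
        raise ValueError("no runs recorded this week")
    n = len(runs)
    return {
        "cost_per_query": sum(r.cost for r in runs) / n,
        "escalation_rate": sum(r.escalated for r in runs) / n,
        "handled_without_correction": sum(not r.corrected for r in runs) / n,
    }


# Made-up sample week, for illustration only.
week = [
    Run(cost=0.25, escalated=False, corrected=False),
    Run(cost=0.50, escalated=True, corrected=False),
    Run(cost=0.75, escalated=False, corrected=True),
    Run(cost=0.50, escalated=False, corrected=False),
]
print(weekly_metrics(week))
# {'cost_per_query': 0.5, 'escalation_rate': 0.25, 'handled_without_correction': 0.75}
```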

Trace

What does it record, and who can audit it?

A good answer

Every action logged: what, when, on what basis. A central, replayable log.

Common mistake

State kept in model memory. Yesterday's run can't be reconstructed.
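A trace that can be replayed and audited needs each entry to say what happened, when, and on what basis. The sketch below is one possible shape, not a description of a specific product: an in-memory, append-only log where each entry carries the hash of the previous one, so later edits are detectable. The class name and fields are hypothetical; a production log would live in central storage.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class TraceLog:
    """Append-only log; every entry is chained to the previous entry's hash."""

    def __init__(self):
        self._entries = []

    def record(self, action: str, basis: str) -> dict:
        body = {
            "action": action,                                # what
            "when": datetime.now(timezone.utc).isoformat(),  # when
            "basis": basis,                                  # on what basis
            "prev": self._entries[-1]["hash"] if self._entries else "",
        }
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check the chain from the first entry."""
        prev = ""
        for e in self._entries:
            body = {k: e[k] for k in ("action", "when", "basis", "prev")}
            if e["prev"] != prev or e["hash"] != _digest(body):
                return False
            prev = e["hash"]
        return True

    def replay(self) -> list[tuple[str, str, str]]:
        return [(e["when"], e["action"], e["basis"]) for e in self._entries]


log = TraceLog()
log.record("matched invoice to order", "order number and totals agree")
log.record("escalated to controller", "first-time supplier")
print(log.verify(), [action for _, action, _ in log.replay()])
```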

This is not a marketing checklist: the same seven points recur in the scan, the contract, and the deployment report, so you can hold the promise against the proof.

EU AI Act mapping

We map the seven agent criteria onto the EU AI Act requirements that feed operator documentation: trace, transparency, human oversight and monitoring. It organizes the technical documentation, but it is not a legal classification of the system or legal advice.

Art. 12 · Record-keeping: Trace
Art. 13 · Transparency: Work + Context
Art. 13 + 26 · Tools and deployer obligations: Tools
Art. 14 · Human oversight: Boundaries + Escalation
Art. 17 + 72 · Quality and monitoring: Measurement

Technical input to operator documentation, not legal advice: we do not classify the system as high-risk and do not determine your obligations. Most of these requirements apply to high-risk systems; separately, Art. 50 (telling a user they are talking to AI) binds providers and deployers from 2 August 2026, regardless of risk class. EUR-Lex ↗

These seven questions come from our production deployments, including real-estate lead intelligence. Where a number is confidential, we describe the shape of the work, not the value. See deployments →

Since February 2025, Art. 4 of the EU AI Act requires providers and deployers to ensure sufficient AI literacy among the people who operate and use AI systems on their behalf. That's why our training is run by engineers. AI training for companies →

Five forms, one word

Agent, chatbot, copilot, RAG, or plain automation?

One word describes five different things. The difference is not the model; it is who runs the process and who performs the step.

Form: Chatbot
What it does: Answers questions in conversation, turn by turn.
The test that separates it from an agent: It never executes multi-step work on its own between your messages.

Form: Copilot
What it does: Suggests as you work, while a human approves every step.
The test that separates it from an agent: The human drives and executes; the system only assists.

Form: RAG app
What it does: Searches your documents and answers from them, with the source cited.
The test that separates it from an agent: The developer fixes the retrieval path, not the model. Even when it retrieves many times, it follows a fixed, programmed route and never decides on its own when or why to fetch.

Form: Automation
What it does: Runs known steps along rules written in code.
The test that separates it from an agent: A developer sets the path, not the model.

Form: AI agent
What it does: Runs the process from event to outcome, within set boundaries.
The test that separates it from an agent: The model chooses the next step and tool, in a loop, until the goal or an escalation.

The simpler the form, the cheaper and more reliable it is. We propose an agent only when a cheaper form won't do.
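The difference between automation and an agent comes down to who picks the next step. The sketch below contrasts the two under loud assumptions: `choose_next` is a stub standing in for a model call, and the tools are hypothetical placeholders. It shows only the control flow, not a production loop.

```python
from typing import Callable, Optional

Step = Callable[[dict], dict]


def run_automation(case: dict, steps: list[Step]) -> dict:
    """Automation: the developer fixed the path; every case takes the same steps."""
    for step in steps:
        case = step(case)
    return case


def run_agent(case: dict,
              choose_next: Callable[[dict], Optional[str]],
              tools: dict[str, Step],
              max_turns: int = 5) -> dict:
    """Agent: choose_next (a stand-in for the model) picks the tool on each turn."""
    for _ in range(max_turns):
        name = choose_next(case)
        if name is None:                       # goal reached
            return {**case, "status": "done"}
        if name not in tools:                  # unknown or ungranted tool: hand off
            return {**case, "status": "escalated"}
        case = tools[name](case)
    return {**case, "status": "escalated"}     # turn limit hit: hand off


# Hypothetical tools and a stub policy in place of a real model.
tools = {
    "read": lambda c: {**c, "text": "invoice text"},
    "classify": lambda c: {**c, "category": "invoice"},
}


def choose_next(case: dict) -> Optional[str]:
    if "text" not in case:
        return "read"
    if "category" not in case:
        return "classify"
    return None


print(run_agent({"id": 1}, choose_next, tools)["status"])               # done
print(run_agent({"id": 1}, choose_next, tools, max_turns=1)["status"])  # escalated
```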

How it works

Purchase-invoice reconciliation, run as an agent

The same work, broken into the seven checks that let us call a system an agent:

Agent · ready

Work: Runs each supplier invoice from arrival to a posting-ready entry: a three-way match against the order and goods receipt, reconciled line by line, with every discrepancy flagged.
Context: Sees the invoice, the matching order and goods receipt, the supplier's terms and prior invoices, and your posting and approval rules.
Tools: Reads the PDF or structured e-invoice, queries the ERP for the order and receipt, normalises the supplier across aliases, and drafts the journal entry with its account coding.
Boundaries: Posts nothing to the ledger and releases no payment on its own. Above a set amount or below a confidence threshold, it stops at a human gate instead of guessing.
Escalation: Mismatches, duplicates and first-time suppliers go to a controller with the proposed entry, the cited evidence and the reasoning already attached.
Measurement: We track the share of invoices reconciled without a correction and the time from arrival to posting-ready.
Trace: Every match, decision and source lands in an immutable log, so the audit trail assembles itself as the work runs.

The example illustrates the pattern; it is not a specific client's result.
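The boundary and escalation rules in this example can be sketched as explicit triggers. Everything numeric here is an assumption for illustration: the amount limit, the confidence threshold, and the field names are hypothetical, and a clean invoice still only reaches "posting-ready", never the ledger.

```python
from dataclasses import dataclass

AMOUNT_LIMIT = 10_000.00   # hypothetical approval limit, in invoice currency
MIN_CONFIDENCE = 0.90      # hypothetical confidence threshold


@dataclass(frozen=True)
class MatchResult:
    amount: float
    confidence: float
    mismatch: bool = False
    duplicate: bool = False
    first_time_supplier: bool = False


def escalation_reasons(m: MatchResult) -> list[str]:
    """List every trigger that fired, so the controller sees why the case arrived."""
    checks = [
        (m.amount > AMOUNT_LIMIT, "amount above limit"),
        (m.confidence < MIN_CONFIDENCE, "confidence below threshold"),
        (m.mismatch, "three-way match failed"),
        (m.duplicate, "possible duplicate"),
        (m.first_time_supplier, "first-time supplier"),
    ]
    return [reason for fired, reason in checks if fired]


def next_step(m: MatchResult) -> str:
    return "escalate_to_controller" if escalation_reasons(m) else "mark_posting_ready"


clean = MatchResult(amount=1_200.00, confidence=0.97)
risky = MatchResult(amount=25_000.00, confidence=0.80, first_time_supplier=True)
print(next_step(clean))                       # mark_posting_ready
print(next_step(risky), escalation_reasons(risky))
```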

Running cost

What does running an agent actually cost?

As much as the work you give it. An agent doesn't run one query, it runs a loop: it plans, reaches for tools, checks the result and corrects, so it burns 5 to 30 times more tokens per task than a chatbot. The unit that matters is cost per completed task, not cost per prompt.

5–30× more tokens per task than a chatbot (Gartner 2026)

Cheaper tokens don't mean a cheaper agent: unit prices fall, but consumption rises faster, so the bill rises. The biggest drain is a gateless loop that retries on and on.
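To make "cost per completed task, not cost per prompt" concrete, here is a small arithmetic sketch. All inputs are made-up illustration values (token counts, price, completion rate), and the model is deliberately simplified: every call is assumed to use the same number of tokens.

```python
def cost_per_completed_task(tokens_per_call: int,
                            calls_per_task: int,
                            price_per_million_tokens: float,
                            completion_rate: float = 1.0) -> float:
    """Spend on one attempt, divided by the share of attempts that actually finish."""
    if not 0 < completion_rate <= 1:
        raise ValueError("completion_rate must be in (0, 1]")
    cost_per_attempt = tokens_per_call * calls_per_task * price_per_million_tokens / 1_000_000
    return cost_per_attempt / completion_rate


# Hypothetical numbers: one chatbot answer versus a 12-call agent loop.
chatbot = cost_per_completed_task(3_000, 1, 5.0)
agent = cost_per_completed_task(3_000, 12, 5.0, completion_rate=0.8)
print(f"chatbot: {chatbot:.3f}  agent: {agent:.3f}")
```

In this made-up case the loop uses 12 times the tokens, and the unfinished attempts push the cost per completed task higher still.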

How we keep cost under control

Cache: Repeated context computed once.
Model routing: A cheaper, smaller model for simple steps, the frontier model for the hard ones.
Hard budget: A token-and-turn budget at the escalation gate.

We meter cost from day one (Cost Per Query). The same threshold that makes the agent safe guards the bill.
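A hard budget can be sketched as a counter that every turn must pass through, with escalation as the only way out once a cap is crossed. The class name and the caps below are hypothetical; real token counts would come from the model provider's usage data.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a run crosses its token or turn cap."""


@dataclass
class RunBudget:
    max_tokens: int
    max_turns: int
    tokens_used: int = 0
    turns_used: int = 0

    def charge(self, tokens: int) -> None:
        """Record one turn; raise as soon as either cap is crossed."""
        self.tokens_used += tokens
        self.turns_used += 1
        if self.tokens_used > self.max_tokens or self.turns_used > self.max_turns:
            raise BudgetExceeded(
                f"{self.tokens_used} tokens over {self.turns_used} turns"
            )


budget = RunBudget(max_tokens=10_000, max_turns=5)   # hypothetical caps
status = "running"
for tokens in (4_000, 4_000, 4_000):                 # made-up per-turn usage
    try:
        budget.charge(tokens)
    except BudgetExceeded:
        status = "escalated"                         # stop the loop, hand to a human
        break
print(status, budget.turns_used, budget.tokens_used)  # escalated 3 12000
```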

Safety

What stops an agent when someone tries to hijack it?

Not a filter at the door, but boundaries at the exit. An agent acts with real privilege, and the most common attack is prompt injection: a hidden instruction in an email, a document, or a page it reads starts steering it. It's the #1 risk for agents. No filter catches everything, so we design it so even a hijacked agent can't do anything dangerous.

OWASP 2026

Even a hijacked agent can't do anything dangerous

Tools: Least privilege, read-only first.
Boundaries: A "must not" list enforced in code; external content is data, not a command.
Escalation: Irreversible steps wait for a human's approval.
Trace: Every action logged and replayable.

These are four of the seven criteria that separate an agent from a fake. Boundaries aren't an add-on to safety: they are the safety.

Where it runs

In the EU by default (RODO/GDPR), on a major cloud (AWS, Azure, Google Cloud), or in your own cloud account if governance prefers. Code, prompts and data stay with you, no vendor lock-in.

Choosing the form

Agent or plain automation?

The shape of the work points to the form and the entry price. The four most common situations map directly onto our service lines.

Five questions before you decide

Can you write every step and exception on one sheet of paper?
If yes, that's automation. A script or an integration will do it cheaper and more predictably than a model.
Is the input unstructured text: emails, PDFs, notes?
A model can read, classify and extract while the flow stays closed. That's automation with AI, not yet an agent.
Does the next step depend on what's in the case?
That's agent territory: the system leads the case, picks the next step and reaches for tools, inside boundaries you set.
Must a human make the call: law, risk, relationship?
Then we build a copilot or an app: the system prepares, a human approves the effect.
Can you say what the system must never do and when it hands the case over?
If not yet, we fix the process first. Without boundaries and escalation we don't deploy an agent.
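The five questions can be read as a small decision function. This is a sketch, not our scoring method: the flag names are hypothetical, and the order in which the answers take precedence is an assumption made for the example.

```python
def recommend_form(*,
                   fixed_steps: bool,
                   unstructured_input: bool,
                   next_step_depends_on_case: bool,
                   human_must_decide: bool,
                   boundaries_defined: bool) -> str:
    """Map yes/no answers to the five questions onto a form (assumed precedence)."""
    if human_must_decide:
        return "copilot or app"            # the system prepares, a human approves
    if next_step_depends_on_case:
        # Agent territory, but only once boundaries and escalation can be stated.
        return "agent" if boundaries_defined else "fix the process first"
    if unstructured_input:
        return "automation with AI"        # the model reads; the flow stays closed
    if fixed_steps:
        return "automation"
    return "process scan"                  # no clear shape yet


print(recommend_form(fixed_steps=True, unstructured_input=False,
                     next_step_depends_on_case=False, human_must_decide=False,
                     boundaries_defined=True))   # automation
print(recommend_form(fixed_steps=False, unstructured_input=True,
                     next_step_depends_on_case=True, human_must_decide=False,
                     boundaries_defined=False))  # fix the process first
```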

Shape of the work: A repeatable process with known rules: inbox, documents, reports, syncing systems.
The right form: AI Automations
Price: from 15 000 PLN

Shape of the work: A tool with an LLM inside: a copilot, RAG, document extraction, an internal panel. A person approves the consequence.
The right form: AI Apps
Price: from 25 000 PLN

Shape of the work: The system should run the process itself inside set boundaries: pick the next step, use tools, escalate exceptions.
The right form: AI Agents
Price: from 25 000 PLN

Shape of the work: There is a process, but no obvious place to start: the form and the order still need to be set.
The right form: Free process scan
Price: 0 PLN

Net prices. After the scan you get a written takeaway within 2 business days.

From zero to production

How an agent that actually does the work gets built.

We build in three stages. Each one ends at a gate: a concrete result in hand and your decision whether to go on. No "commit now, see it later."

Stage 01 · Audit: the process first, not the model
A free scan: 30 minutes with an engineer on the process that costs you the most. For a complex case we write an Implementation Specification, a process map, architecture, and a fixed quote.
Gate: The takeaway and the specification are yours. Sometimes the best call is not to build an agent, and we will say so plainly.
written takeaway within 2 business days

Stage 02 · Pilot: proof on your own data
The agent runs on a slice of real work, inside boundaries, with a human at the gate. We write down one measurable target before any code.
Gate: You decide after the pilot, not before. A result guarantee, not a promise.
~6–8 weeks · target in writing

Stage 03 · Production: a system that runs the work
We deploy the agent in an isolated environment, with least-privilege access, a full trace, and escalation of exceptions.
Yours: The code, prompts, and data are yours. Maintenance and development scale with the process.
production, measurement and maintenance

Evolution

From automation, through apps, to agents

Each year moved the boundary of what "AI" can do inside a company. The agent is the newest form, not the only one: below are the three years that shaped it.

Dates are market milestones, not our own timeline: ChatGPT (Nov 2022), GPT-4 (2023), the "year of agents" (2025).

After these questions, we do not sell technology. We give a recommendation.

After the scan or the implementation specification you get one of three answers:

Build

When the work, boundaries, and trace are clear enough, the recommendation can lead to a full build or a narrower pilot.

Narrow or clean up

When the process is promising but too broad or underspecified, we first reduce scope or clean up the inputs.

Do not build an agent yet

When plain automation, a script, or a human decision is the better answer, we write that plainly.

Each recommendation comes back with a price and scope, so you can compare it before you commission anything.

Free process scan

Start with a free process scan.

30 minutes with the engineer who would build it, not a salesperson.
A review of the processes that cost you the most time and money.
A written summary: what to automate, in what order, with cost ranges.

No sales deck and no obligations. If automation doesn't make sense, we'll write that too.

0 PLN

30 minutes · written takeaway within 2 business days

Book a free process scan (30 min)
Prefer to write? No-obligation form

Bring one process. We will check whether it really needs an agent.

AI Process Audit: scan and specification

Qualification questions

Before you call something an agent, ask these.

This is a quick filter before any build conversation. It does not settle the full architecture, but it prevents selling an agent where a simpler pattern is enough.

Does every process need an AI agent?
No. Sometimes the right answer is a plain script, integration, AI-assisted automation, LLM app, copilot for a human, or no build.
When is automation enough instead of an AI agent?
When the steps are fixed, the rules are known, and the result can be checked with a test or simple validation. Then we build a script, integration, or workflow, not an agent.
When does an LLM app become an agent?
Only when the system performs named work with tools, chooses the next step based on context, operates within boundaries, escalates exceptions, measures the result, and leaves a trace.
How is an AI agent different from a chatbot?
A chatbot answers questions in conversation. An agent does the work: it chooses the next step, uses tools, and carries the case to an outcome within boundaries, handing exceptions to a person.
Does an AI agent run without human oversight?
No. An agent operates within the boundaries you set: production impact needs approval, and anything it is unsure of goes to a person. Full autonomy without boundaries is a risk, not a feature.
How do you know an agent is working correctly?
From a measure set before the build and from the trace. We define what "works" means (for example, the share of cases handled without a correction), and every action lands in the log, so you can check what the system did and why.
