
Chatbot, copilot, or agent?

Chatbots respond.
Copilots help.
Agents do work within boundaries.

An AI agent is a system that does part of a business process on its own: it reads a ticket, an invoice or a record, picks the next step and works in your tools (CRM, inbox, spreadsheets). It operates inside boundaries you set and hands anything it is not sure about to a human. This page is your test for telling a real agent from a repackaged chatbot before someone sells you one.

In short

  • An AI agent runs the process itself: it chooses the next step and uses tools in a loop, until it reaches the goal or hands the case to a person.
  • A chatbot answers, a copilot assists, an automation runs on a fixed path. What sets an agent apart is that the model picks the next step, within the boundaries you set.
  • An agent pays for itself only where the next step depends on the content of the case; the rest is handled more cheaply by a script or an app.
  • An "agent" is not a marketing label but a system that passes seven checks: work, context, tools, boundaries, escalation, measurement, trace. Miss one and it is not an agent yet.

What an agent is made of

Agent = work + context + tools + boundaries + escalation + measurement + trace

If one element is missing, it is not an agent. We build a simpler script, integration, or LLM application instead, and we say so plainly.
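The formula can be read as a plain all-or-nothing check. The sketch below is only an illustration under that reading: the `AgentChecklist` class and both function names are hypothetical, and deciding whether each flag is really true still takes the seven questions further down.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AgentChecklist:
    """One flag per element of the formula (hypothetical structure)."""
    work: bool
    context: bool
    tools: bool
    boundaries: bool
    escalation: bool
    measurement: bool
    trace: bool


def missing_criteria(check: AgentChecklist) -> list[str]:
    """Names of the elements that are not satisfied, in formula order."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


def is_agent(check: AgentChecklist) -> bool:
    """True only when all seven elements are present."""
    return not missing_criteria(check)


# A document Q&A bot: it has context, and nothing else from the list.
rag_bot = AgentChecklist(
    work=False, context=True, tools=False, boundaries=False,
    escalation=False, measurement=False, trace=False,
)
print(is_agent(rag_bot), missing_criteria(rag_bot))
```

Returning the list of missing elements, not just a yes or no, makes it easier to say which simpler form fits instead.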

Agent-washing

Most "agents" on the market are a repackaged chatbot.

The same word now sticks to a chatbot, a RAG app, a copilot, and a script with a model. So we do not start from the label on a slide; we start from the work and the boundaries, and the seven criteria below settle what you are actually getting. The reverse also holds: even a real agent costs more and carries more risk than a script or an app, so we propose one only where the next step genuinely depends on the content of the case. Everywhere else we point you to a simpler, cheaper form, even when that means a smaller job for us.

The market in numbers: adoption and agent-washing.

  • 8.4%

    of Polish companies used AI in 2025 (EU average: 20%, highest: Denmark 42%). A low bar, a real edge for whoever deploys it properly.

    Eurostat 2025
  • >40%

    of agentic AI projects will be canceled by end of 2027: cost, unclear ROI, and weak risk controls. So pin down scope, cost, and risk control before you start, not after.

    Gartner 2025
  • 6%

    how far token prices fell in 2026 (against 39% in the second half of 2025): deflation stalled and buyers keep shifting toward pricier premium models. That is why we meter run cost and cap it, instead of assuming it keeps falling.

    YipitData 2026
  • ~130

    real agent vendors Gartner counted among thousands. The rest is agent-washing: rebranded chatbots and RPA.

    Gartner 2025

The market calls five different things an agent.

The same word gets attached to a chatbot, a copilot, a workflow, a team of agents, or a full runtime. Before you trust the word "agent," pin down which of these levels someone means.

  1. Chatbot or RAG renamed as an agent

    Answers questions, does not do work.

    Chatbot with RAG, not an agent.

  2. Copilot in a process

    Helps a person; the person does the work.

    Copilot, not an agent.

  3. Operational tool-using agent

    Sees context, calls APIs, escalates.

    Can be an agent if all seven criteria are present.

  4. Multi-agent orchestration

    Many roles, shared memory, queues.

    A team of agents, often overhyped.

  5. Full agentic runtime

    Persistent, long-running, tool permissions, lifecycle.

    Operational stack. We do not claim to own a proprietary runtime.

Seven questions that turn a demo into a production system.

This is your detector. Ask anyone selling you an “agent” these seven questions: if they cannot answer them, it is not an agent yet, just an interface to a model. With the answers you know what it does, what it must not do, and when it hands a case to a human.

Scope

What the agent should do and what it may reach.

Work

What specific work does the agent do?

A good answer

One process, one trigger, one measurable outcome.

Common mistake

“It handles customers.” Too broad to measure anything.

Context

What must the agent understand: data, rules, exceptions?

A good answer

Named data sources, the rules, and the list of exceptions.

Common mistake

Passing the full conversation history on every call. The model drifts and the token bill climbs.

Tools

Which systems does it use, and with what permissions?

A good answer

Least privilege, read-only first, dedicated service accounts, an isolated environment.

Common mistake

The agent inherits someone's credentials, sessions, and files.
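One way to picture least privilege is a tool registry where every tool names the scope it needs and the agent's own service account holds only a few. This is a minimal sketch, not a real integration: the scope strings, tool names and `call_tool` helper are all hypothetical.

```python
from typing import Callable

# Hypothetical scopes granted to the agent's dedicated service account.
# Read-only first: no write scope is granted here.
GRANTED_SCOPES = frozenset({"crm:read", "erp:read"})

# Hypothetical registry: tool name -> (required scope, implementation stub).
TOOLS: dict[str, tuple[str, Callable[[str], str]]] = {
    "crm_lookup": ("crm:read", lambda key: f"record for {key}"),
    "crm_update": ("crm:write", lambda key: f"updated {key}"),
}


def call_tool(name: str, arg: str, granted: frozenset = GRANTED_SCOPES) -> str:
    """Run a tool only if the service account holds the scope it requires."""
    if name not in TOOLS:
        raise PermissionError(f"unknown tool: {name}")
    scope, implementation = TOOLS[name]
    if scope not in granted:
        raise PermissionError(f"{name} needs scope {scope}")
    return implementation(arg)


print(call_tool("crm_lookup", "42"))
```

Because the check sits in the caller and not in the prompt, a write tool stays unreachable until someone grants its scope on purpose.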

Control

Where the agent stops and who takes the exception.

Boundaries

What must it never do on its own?

A good answer

A hard “must not” list enforced in code, not in the prompt. External content is data, never a command.

Common mistake

The rule lives only in the prompt. Under load the agent skips it.
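A "must not" list enforced in code can be as small as a gate that every proposed action passes before anything runs. The sketch below assumes the model's proposal arrives as a structured `Action`; the forbidden names, the amount cap and the `wrap_external` message shape are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str            # what the model proposes to do
    amount: float = 0.0  # monetary value touched, if any


# Hypothetical "must not" list and cap; real values come from the process owner.
FORBIDDEN = frozenset({"release_payment", "delete_record", "post_to_ledger"})
MAX_AMOUNT = 5_000.0


def check_boundaries(action: Action) -> tuple[bool, str]:
    """Decide in code, after the model has proposed, whether the action may run."""
    if action.name in FORBIDDEN:
        return False, f"forbidden action: {action.name}"
    if action.amount > MAX_AMOUNT:
        return False, f"amount {action.amount} exceeds cap {MAX_AMOUNT}"
    return True, "ok"


def wrap_external(text: str) -> dict:
    """Carry external content as a labelled data field, never as an instruction."""
    return {"role": "data", "content": text}


print(check_boundaries(Action("release_payment")))
print(check_boundaries(Action("draft_entry", 120.0)))
```

Labelling external text as data does not stop an injection by itself; the gate does, because it runs regardless of what the text managed to persuade the model to propose.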

Escalation

When does it hand off to a human?

A good answer

Explicit triggers, production writes behind approval, handoff with full context.

Common mistake

The agent decides its next move with no gate.
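Explicit triggers and a handoff with full context might look like the following. It is a sketch under assumptions: the confidence score is taken to be available per case, the 0.8 threshold is made up, and the `Case` and `Handoff` shapes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Case:
    case_id: str
    confidence: float          # assumed to be available per case, 0..1
    writes_to_production: bool
    evidence: list = field(default_factory=list)


@dataclass
class Handoff:
    case_id: str
    reasons: list
    evidence: list


MIN_CONFIDENCE = 0.8  # hypothetical threshold


def escalation_reasons(case: Case) -> list:
    """Every trigger is an explicit rule, so the reason can be shown to the human."""
    reasons = []
    if case.confidence < MIN_CONFIDENCE:
        reasons.append("low confidence")
    if case.writes_to_production:
        reasons.append("production write needs approval")
    return reasons


def route(case: Case) -> Optional[Handoff]:
    """Return a handoff carrying the evidence, or None if the agent may proceed."""
    reasons = escalation_reasons(case)
    if not reasons:
        return None
    return Handoff(case.case_id, reasons, list(case.evidence))


print(route(Case("INV-7", 0.55, True, ["invoice.pdf", "PO-1"])))
```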

Proof

Whether the system is worth keeping and how to replay what it did.

Measurement

How do we know it's worth keeping?

A good answer

Weekly metrics, including Cost Per Query and Escalation Rate.

Common mistake

“It works,” with no numbers.
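Both metrics named above reduce to simple ratios once each run is recorded. The definitions in this sketch are assumptions, since the page does not spell them out: Cost Per Query is taken as total spend divided by the number of runs, and Escalation Rate as the share of runs that ended with a human.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    cost: float      # total model and tool spend for the run, one currency
    escalated: bool  # True if the run ended in a human handoff


def cost_per_query(runs: list) -> float:
    """Assumed definition: total spend divided by the number of runs."""
    if not runs:
        raise ValueError("no runs recorded")
    return sum(r.cost for r in runs) / len(runs)


def escalation_rate(runs: list) -> float:
    """Assumed definition: share of runs handed to a human."""
    if not runs:
        raise ValueError("no runs recorded")
    return sum(1 for r in runs if r.escalated) / len(runs)


# Made-up week of four runs.
week = [Run(0.5, False), Run(1.5, True), Run(1.0, False), Run(1.0, False)]
print(cost_per_query(week), escalation_rate(week))
```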

Trace

What does it record, and who can audit it?

A good answer

Every action logged: what, when, on what basis. A central, replayable log.

Common mistake

State kept in model memory. Yesterday's run can't be reconstructed.
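A replayable log needs little more than append-only entries that record what, when and on what basis. The sketch below adds a hash chain so that later edits become detectable; that chaining is our illustrative choice, not something the criteria require, and the `TraceLog` class is hypothetical.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    """Stable SHA-256 over the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class TraceLog:
    """Append-only log; each entry stores the hash of the one before it."""

    def __init__(self) -> None:
        self._entries = []

    def append(self, action: str, basis: str, timestamp: str) -> dict:
        prev = self._entries[-1]["hash"] if self._entries else ""
        body = {"action": action, "basis": basis, "timestamp": timestamp, "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; False if any entry was altered or reordered."""
        prev = ""
        for e in self._entries:
            body = {k: e[k] for k in ("action", "basis", "timestamp", "prev")}
            if e["prev"] != prev or e["hash"] != _digest(body):
                return False
            prev = e["hash"]
        return True

    def replay(self) -> list:
        """Human-readable reconstruction of the run, in order."""
        return [f'{e["timestamp"]} {e["action"]} ({e["basis"]})' for e in self._entries]


log = TraceLog()
log.append("matched invoice", "PO-1 and GR-1", "2025-01-01T10:00:00Z")
log.append("escalated", "first-time supplier", "2025-01-01T10:00:05Z")
print(log.verify(), log.replay())
```

Keeping the state in a log like this, and not in model memory, is what lets yesterday's run be reconstructed.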

This is not a marketing checklist: the same seven points recur in the scan, the contract, and the deployment report, so you can hold the promise against the proof.

EU AI Act mapping

We map the seven agent criteria onto the EU AI Act requirements that feed operator documentation: trace, transparency, human oversight and monitoring. The mapping organizes the technical documentation, but it is not a legal classification of the system or legal advice.

  • Art. 12 · Record-keeping: Trace
  • Art. 13 · Transparency: Work + Context
  • Art. 13 + 26 · Tools and deployer obligations: Tools
  • Art. 14 · Human oversight: Boundaries + Escalation
  • Art. 17 + 72 · Quality and monitoring: Measurement

Technical input to operator documentation, not legal advice: we do not classify the system as high-risk and do not determine your obligations. Most of these requirements apply to high-risk systems; separately, Art. 50 (telling a user they are talking to AI) binds providers and deployers from 2 August 2026, regardless of risk class. Source: EUR-Lex.

These seven questions come from our production deployments, including real-estate lead intelligence. Where a number is confidential, we describe the shape of the work, not the value. See deployments

Since February 2025, Art. 4 of the EU AI Act requires providers and deployers to ensure sufficient AI literacy among the people who use it. That's why our training is run by engineers. AI training for companies

Five forms, one word

Agent, chatbot, copilot, RAG, or plain automation?

One word describes five different things. The difference is not the model; it is who runs the process and who performs the step.

  • Chatbot

    What it does: Answers questions in conversation, turn by turn.

    The test that separates it from an agent: It never executes multi-step work on its own between your messages.

  • Copilot

    What it does: Suggests as you work, while a human approves every step.

    The test that separates it from an agent: The human drives and executes; the system only assists.

  • RAG app

    What it does: Searches your documents and answers from them, with the source cited.

    The test that separates it from an agent: The developer fixes the retrieval path, not the model. Even when it retrieves many times, it follows a fixed, programmed route and never decides on its own when or why to fetch.

  • Automation

    What it does: Runs known steps along rules written in code.

    The test that separates it from an agent: A developer sets the path, not the model.

  • AI agent

    What it does: Runs the process from event to outcome, within set boundaries.

    The test that makes it an agent: The model chooses the next step and tool, in a loop, until the goal or an escalation.

The simpler the form, the cheaper and more reliable it is. We propose an agent only when a cheaper form won't do.
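The defining test for an agent, a model choosing the next step and tool in a loop until the goal or an escalation, can be sketched in a few lines. Here a plain function stands in for the model, and every name (`run_agent`, `demo_chooser`, the tool names) is hypothetical.

```python
from typing import Callable

# The chooser stands in for the model: given the state so far, it names the
# next tool, or returns "done" or "escalate".
Chooser = Callable[[dict], str]
Tool = Callable[[dict], dict]


def run_agent(state: dict, choose: Chooser, tools: dict, max_turns: int = 10) -> tuple:
    """Loop until the chooser says done or escalate, or the turn budget is spent."""
    for _ in range(max_turns):
        step = choose(state)
        if step in ("done", "escalate"):
            return step, state
        if step not in tools:
            return "escalate", state  # unknown tool: hand over, do not guess
        state = tools[step](state)
    return "escalate", state          # turn budget spent without reaching the goal


def demo_chooser(state: dict) -> str:
    """Toy policy: the next step depends on what the case contains so far."""
    if "ticket" not in state:
        return "read_ticket"
    if state["ticket"] == "unclear":
        return "escalate"
    if "reply" not in state:
        return "draft_reply"
    return "done"


demo_tools = {
    "read_ticket": lambda s: {**s, "ticket": "refund request"},
    "draft_reply": lambda s: {**s, "reply": "draft for: " + s["ticket"]},
}

print(run_agent({}, demo_chooser, demo_tools))
```

In an automation the order `read_ticket`, `draft_reply` would be written by the developer; here the chooser decides it from the state, which is the whole difference.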

How it works

Purchase-invoice reconciliation, run as an agent

The same work, broken into the seven checks that let us call a system an agent:

  1. Work: Runs each supplier invoice from arrival to a posting-ready entry: a three-way match against the order and goods receipt, reconciled line by line, with every discrepancy flagged.
  2. Context: Sees the invoice, the matching order and goods receipt, the supplier's terms and prior invoices, and your posting and approval rules.
  3. Tools: Reads the PDF or structured e-invoice, queries the ERP for the order and receipt, normalises the supplier across aliases, and drafts the journal entry with its account coding.
  4. Boundaries: Posts nothing to the ledger and releases no payment on its own. Above a set amount or below a confidence threshold, it stops at a human gate instead of guessing.
  5. Escalation: Mismatches, duplicates and first-time suppliers go to a controller with the proposed entry, the cited evidence and the reasoning already attached.
  6. Measurement: We track the share of invoices reconciled without a correction and the time from arrival to posting-ready.
  7. Trace: Every match, decision and source lands in an immutable log, so the audit trail assembles itself as the work runs.

The example illustrates the pattern; it is not a specific client's result.
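The three-way match at the heart of this example is deterministic and easy to show. The sketch below compares invoice lines against the order and the goods receipt and lists every discrepancy; the `Line` shape and the matching rules (exact price match, billed quantity not above received) are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    sku: str
    qty: int
    unit_price_cents: int = 0  # integer cents avoid float rounding; unused on receipts


def three_way_match(invoice: list, order: list, receipt: list) -> list:
    """Compare each invoice line with the order and the goods receipt."""
    ordered = {line.sku: line for line in order}
    received = {line.sku: line.qty for line in receipt}
    discrepancies = []
    for line in invoice:
        if line.sku not in ordered:
            discrepancies.append(f"{line.sku}: not on order")
            continue
        if line.unit_price_cents != ordered[line.sku].unit_price_cents:
            discrepancies.append(f"{line.sku}: price differs from order")
        got = received.get(line.sku, 0)
        if line.qty > got:
            discrepancies.append(f"{line.sku}: billed {line.qty}, received {got}")
    return discrepancies


order = [Line("A", 10, 500), Line("B", 5, 1200)]
receipt = [Line("A", 10), Line("B", 3)]
invoice = [Line("A", 10, 500), Line("B", 5, 1250), Line("C", 1, 100)]
for issue in three_way_match(invoice, order, receipt):
    print(issue)
```

An empty result would let the entry be drafted as posting-ready; anything else would go to the controller with the listed discrepancies attached.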

Running cost

What does running an agent actually cost?

As much as the work you give it. An agent doesn't run one query, it runs a loop: it plans, reaches for tools, checks the result and corrects, so it burns 5 to 30 times more tokens per task than a chatbot. The unit that matters is cost per completed task, not cost per prompt.

5–30×

more tokens per task than a chatbot

Gartner 2026

Cheaper tokens don't mean a cheaper agent: unit prices fall, but consumption rises faster, so the bill rises. The biggest drain is a gateless loop that retries on and on.
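To make "cost per completed task" concrete, the arithmetic can be written out. The formula below is an assumption (spend per run divided by the share of runs that finish the task), and every number is invented purely to show the direction of the effect.

```python
def cost_per_completed_task(tokens_per_run: int, price_per_million: float,
                            completion_rate: float) -> float:
    """Spend per run divided by the share of runs that complete the task."""
    if not 0 < completion_rate <= 1:
        raise ValueError("completion_rate must be in (0, 1]")
    cost_per_run = tokens_per_run / 1_000_000 * price_per_million
    return cost_per_run / completion_rate


# Hypothetical figures: a single-prompt chatbot answer.
chatbot = cost_per_completed_task(2_000, price_per_million=10.0, completion_rate=1.0)

# Hypothetical figures: the unit price is 20% lower, but the loop uses 20 times
# the tokens and completes 80% of tasks without a human.
agent = cost_per_completed_task(40_000, price_per_million=8.0, completion_rate=0.8)

print(f"chatbot: {chatbot:.3f}  agent: {agent:.3f}")
```

Even with the cheaper unit price, the agent's figure comes out higher, which is the point of metering per task and not per prompt.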

How we keep cost under control

  1. Cache

    Repeated context computed once.

  2. Model routing

    A cheaper, smaller model for simple steps, the frontier model for the hard ones.

  3. Hard budget

    A token-and-turn budget enforced at the escalation gate: when it runs out, the agent stops and hands the case over.

We meter cost from day one (Cost Per Query). The same threshold that makes the agent safe guards the bill.
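Model routing and a hard budget are both small pieces of code. This is a sketch under assumptions: the model names are placeholders, the set of "simple" step kinds is invented, and the `Budget` class is hypothetical.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a run would go past its token or turn budget."""


@dataclass
class Budget:
    max_tokens: int
    max_turns: int
    tokens_used: int = 0
    turns_used: int = 0

    def charge(self, tokens: int) -> None:
        """Record one turn; refuse it if either limit would be exceeded."""
        if self.turns_used + 1 > self.max_turns:
            raise BudgetExceeded("turn budget spent")
        if self.tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded("token budget spent")
        self.turns_used += 1
        self.tokens_used += tokens


# Hypothetical routing table: which step kinds count as simple.
SIMPLE_STEPS = frozenset({"classify", "extract", "format"})


def pick_model(step_kind: str) -> str:
    """Cheaper model for simple steps, frontier model for the rest (placeholder names)."""
    return "small-model" if step_kind in SIMPLE_STEPS else "frontier-model"


budget = Budget(max_tokens=1_000, max_turns=3)
budget.charge(400)
budget.charge(400)
try:
    budget.charge(400)
except BudgetExceeded as stop:
    print("escalate:", stop)
```

Catching `BudgetExceeded` at the same gate that handles escalation means a runaway retry loop ends in a handoff, not in a larger bill.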

Safety

What stops an agent when someone tries to hijack it?

Not a filter at the door, but boundaries at the exit. An agent acts with real privilege, and the most common attack is prompt injection: a hidden instruction in an email, a document, or a page it reads starts steering it. It's the #1 risk for agents. No filter catches everything, so we design it so even a hijacked agent can't do anything dangerous.

OWASP 2026

Even a hijacked agent can't do anything dangerous

  • Tools: Least privilege, read-only first.
  • Boundaries: A "must not" list enforced in code; external content is data, not a command.
  • Escalation: Irreversible steps wait for a human's approval.
  • Trace: Every action logged and replayable.

These are four of the seven criteria that separate an agent from a fake. Boundaries aren't an add-on to safety: they are the safety.

Where it runs

In the EU by default (RODO/GDPR), on a major cloud (AWS, Azure, Google Cloud), or in your own cloud account if governance prefers. Code, prompts and data stay with you, no vendor lock-in.

Choosing the form

Agent or plain automation?

The shape of the work points to the form and the entry price. The four most common situations map directly onto our service lines:

Five questions before you decide

  1. Can you write every step and exception on one sheet of paper?

    If yes, that's automation. A script or an integration will do it cheaper and more predictably than a model.

  2. Is the input unstructured text: emails, PDFs, notes?

    A model can read, classify and extract while the flow stays closed. That's automation with AI, not yet an agent.

  3. Does the next step depend on what's in the case?

    That's agent territory: the system leads the case, picks the next step and reaches for tools, inside boundaries you set.

  4. Must a human make the call: law, risk, relationship?

    Then we build a copilot or an app: the system prepares, a human approves the effect.

  5. Can you say what the system must never do and when it hands the case over?

    If not yet, we fix the process first. Without boundaries and escalation we don't deploy an agent.
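The five questions can be read as a small decision function. The order in which the answers are checked below is our assumption, since the questions themselves do not state a precedence, and the function name and return labels are hypothetical.

```python
def recommend_form(*, steps_fully_known: bool, unstructured_input: bool,
                   next_step_depends_on_case: bool, human_must_decide: bool,
                   boundaries_defined: bool) -> str:
    """Map answers to the five questions onto a form (assumed precedence)."""
    if human_must_decide:
        return "copilot or app"          # question 4: a human approves the effect
    if next_step_depends_on_case:
        # question 3 points to an agent, but only once question 5 is answered
        return "agent" if boundaries_defined else "fix the process first"
    if unstructured_input:
        return "automation with AI"      # question 2: the model reads, the flow stays closed
    if steps_fully_known:
        return "automation"              # question 1: a script or an integration
    return "process scan"                # no clear starting point yet


print(recommend_form(
    steps_fully_known=False, unstructured_input=True,
    next_step_depends_on_case=True, human_must_decide=False,
    boundaries_defined=False,
))
```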

  1. A repeatable process with known rules: inbox, documents, reports, syncing systems.

    AI Automations

    from 15 000 PLN

  2. A tool with an LLM inside: a copilot, RAG, document extraction, an internal panel. A person approves the consequence.

    AI Apps

    from 25 000 PLN

  3. The system should run the process itself inside set boundaries: pick the next step, use tools, escalate exceptions.

    AI Agents

    from 25 000 PLN

  4. There is a process, but no obvious place to start: the form and the order still need to be set.

    Free process scan

    0 PLN

Net prices. After the scan you get a written takeaway within 2 business days.

From zero to production

How an agent that actually does the work gets built.

We build in three stages. Each one ends at a gate: a concrete result in hand and your decision whether to go on. No "commit now, see it later."

  1. Stage 01

    Audit: the process first, not the model

    A free scan: 30 minutes with an engineer on the process that costs you the most. For a complex case we write an Implementation Specification: a process map, an architecture, and a fixed quote.

    Gate

    The takeaway and the specification are yours. Sometimes the best call is not to build an agent, and we will say so plainly.

    written takeaway within 2 business days

  2. Stage 02

    Pilot: proof on your own data

    The agent runs on a slice of real work, inside boundaries, with a human at the gate. We write down one measurable target before any code.

    Gate

    You decide after the pilot, not before. A result guarantee, not a promise.

    ~6–8 weeks · target in writing

  3. Stage 03

    Production: a system that runs the work

    We deploy the agent in an isolated environment, with least-privilege access, a full trace, and escalation of exceptions.

    Yours

    The code, prompts, and data are yours. Maintenance and development scale with the process.

    production, measurement and maintenance

Evolution

From automation, through apps, to agents

Each year moved the boundary of what "AI" can do inside a company. The agent is the newest form, not the only one: below are the three years that shaped it.

  1. 2022. Automation: Fixed rules: it does exactly what it was programmed to, with no exceptions and no context.
  2. 2023. LLM apps: It understands language and generates, but waits to be asked: one step per prompt.
  3. 2025. Agents: It plans and acts across steps in a loop, and stops at the human boundary.

Dates are market milestones, not our own timeline: ChatGPT (Nov 2022), GPT-4 (2023), the "year of agents" (2025).

After these questions, we do not sell technology. We give a recommendation.

After the scan or the implementation specification you get one of three answers:

Build

When the work, boundaries, and trace are clear enough, the recommendation can lead to a full build or a narrower pilot.

Narrow or clean up

When the process is promising but too broad or underspecified, we first reduce scope or clean up the inputs.

Do not build an agent yet

When plain automation, a script, or a human decision is the better answer, we write that plainly.

Each recommendation comes back with a price and scope, so you can compare it before you commission anything.

Free process scan

Start with a free process scan.

  • 30 minutes with the engineer who would build it, not a salesperson.
  • A review of the processes that cost you the most time and money.
  • A written summary: what to automate, in what order, with cost ranges.

No sales deck and no obligations. If automation doesn't make sense, we'll write that too.

0 PLN

30 minutes · written takeaway within 2 business days

Bring one process. We will check whether it really needs an agent.

Book a free process scan (30 min)


Qualification questions

Before you call something an agent, ask these.

This is a quick filter before any build conversation. It does not settle the full architecture, but it prevents selling an agent where a simpler pattern is enough.

  • Does every process need an AI agent?

  • When is automation enough instead of an AI agent?

  • When does an LLM app become an agent?

  • How is an AI agent different from a chatbot?

  • Does an AI agent run without human oversight?

  • How do you know an agent is working correctly?