/ Przemysław Tarkowski

I built a production system of autonomous AI agents — and I run it every day.

Senior engineer, 11 years shipping real software — a 100M+ download game, solo-built AWS infrastructure. Now aimed at AI agents, with a working system to show, not slides.

TarkOS · open source — coming soon

system map · TarkOS harness → OS → capabilities

  • harness layer: Claude Code harness
  • agent OS: audito (agent OS)
  • capabilities: 01 Decisions (Decision-OS) · 02 Memory / capture · 03 Non-stop autonomy

tools built: 140+
skills: 50+
sessions run: 280+
every change: eval-gated

/ What the system does

Not a chatbot — an operating system for getting real work done autonomously.

  • 01 autonomous orchestration

    Runs on its own

    Give it a goal and it works across sessions while I'm away — inventing the next task, not waiting for prompts.

  • 02 human-in-the-loop · governance

    Decides like me — asks only when it must

    It owns the reversible calls and batches the irreversible ones into a decision tray for my sign-off.

  • 03 eval-driven

    Measures its own work

    Every change is regression-gated and retrieval is scored — the agent grades itself before I ever see it.

  • 04 self-evolving

    Gets better every session

    It captures my corrections into explicit rules, so the same steer is never needed twice.
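A minimal sketch of how a regression gate like the one in item 03 could work, assuming every change is scored against a fixed suite of named eval cases. The names `gate_change`, `GateResult`, and `tolerance` are hypothetical illustrations, not taken from TarkOS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateResult:
    passed: bool
    # case name -> (baseline score, candidate score or None if the case vanished)
    regressions: dict[str, tuple[float, Optional[float]]]


def gate_change(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.0,
) -> GateResult:
    """Reject a change if any baseline case is missing or scores lower."""
    regressions: dict[str, tuple[float, Optional[float]]] = {}
    for case, old_score in baseline.items():
        new_score = candidate.get(case)
        # A case that disappeared counts as a regression, not as a pass.
        if new_score is None or new_score < old_score - tolerance:
            regressions[case] = (old_score, new_score)
    return GateResult(passed=not regressions, regressions=regressions)


result = gate_change({"retrieval": 0.9, "routing": 0.8},
                     {"retrieval": 0.95, "routing": 0.7})
print(result.passed, result.regressions)
```

The gate compares per case rather than on an average, so an improvement in one area cannot hide a drop in another.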

Note: The code lives inside my private work harness (open-sourcing soon). I'm happy to walk you through it live: architecture, decisions, the eval loop.

/ Case study · the system in depth

How I run agents unattended — and still trust the output

The problem

An agent left to work alone breaks the same way every time: it makes decisions it can't justify, does irreversible things without asking, and loses the thread the moment a context window fills. The gap between an impressive demo and a system you'd actually leave running is engineering, not prompting.

  1. Act 1 multi-agent · long-horizon

    Orchestration & continuity

    One orchestrator spawns and supervises worker agents across parallel workstreams, monitoring by work-signals rather than guesswork. When a worker approaches its context ceiling it relay-hops — closing cleanly and handing a successor the full thread (warm resume + lossless history load) — so long-running work survives across sessions, attended in the day or unattended overnight.

  2. Act 2 human-in-the-loop · governance

    Decision-OS — the trust layer

    Every decision is classified: reversible ones the agent makes itself, irreversible ones it batches into a tray for my sign-off (default-deny). An append-only, gap-detecting audit trail records each call with its rationale — tamper-evident, the way a regulated environment needs it. This is what turns 'runs by itself' into 'runs by itself, safely'.
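One common way to get an append-only, gap-detecting, tamper-evident trail is a hash chain with sequence numbers. The sketch below assumes that approach. The source does not say how Decision-OS implements it, and `AuditTrail`, `append`, and `verify` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(seq: int, decision: str, rationale: str, prev_hash: str) -> str:
    payload = json.dumps([seq, decision, rationale, prev_hash], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class AuditTrail:
    entries: list[dict] = field(default_factory=list)

    def append(self, decision: str, rationale: str) -> dict:
        """Add an entry whose hash covers its content and the previous hash."""
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        seq = len(self.entries) + 1
        entry = {
            "seq": seq,
            "decision": decision,
            "rationale": rationale,
            "prev": prev_hash,
            "hash": _digest(seq, decision, rationale, prev_hash),
        }
        self.entries.append(entry)
        return entry

    def verify(self) -> list[str]:
        """Return a list of problems; an empty list means the trail is intact."""
        problems = []
        prev_hash = GENESIS
        for position, e in enumerate(self.entries, start=1):
            if e["seq"] != position:
                problems.append(f"gap or reorder at position {position}: seq={e['seq']}")
            if e["prev"] != prev_hash:
                problems.append(f"broken chain at seq={e['seq']}")
            if e["hash"] != _digest(e["seq"], e["decision"], e["rationale"], e["prev"]):
                problems.append(f"content altered at seq={e['seq']}")
            prev_hash = e["hash"]
        return problems
```

Editing a stored rationale breaks that entry's hash, and deleting an entry breaks both the sequence and the chain, so `verify()` reports either kind of tampering.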

The result — operated, not demoed

sessions operated: 530+
fully-unattended runs: dozens
decisions logged w/ audit trail: 120+
tools built around the loop: 143

/ Where this comes from

Eleven years shipping — now pointed at agents.

The EU research I worked on was literally about autonomous orchestration of cloud/edge resources. Today I orchestrate autonomous agents. Same instinct, new substrate.

11 yrs
shipping production software as a senior engineer
100M+
downloads — Crazy Kick, published with Voodoo (LiveOps at scale)
2019–22
solo-designed & ran AWS multiplayer infrastructure (EC2/ECS/Lambda/DynamoDB)
EU H2020
autonomous orchestration of cloud/edge resources (ACCORDION · CHARITY)

02 / Capabilities

A short clip per feature: what it does and how it works.

Preview: Non-stop manager 01 / 03
clip coming soon

Non-stop manager

The agent works while you're away; decisions wait in a tray, not in the chat.

Preview: Decision queue 02 / 03
clip coming soon

Decision queue

A queue of decisions with consequences and a recommendation. One click instead of meta-work.
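A minimal sketch of default-deny routing into such a queue, assuming each decision arrives with a reversibility label. All class and field names here are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Decision:
    summary: str
    consequences: str
    recommendation: str
    reversibility: Reversibility = Reversibility.UNKNOWN


@dataclass
class DecisionTray:
    pending: list[Decision] = field(default_factory=list)
    executed: list[Decision] = field(default_factory=list)

    def route(self, decision: Decision) -> str:
        # Default-deny: only an explicit REVERSIBLE label is executed
        # without sign-off; IRREVERSIBLE and UNKNOWN both wait in the tray.
        if decision.reversibility is Reversibility.REVERSIBLE:
            self.executed.append(decision)
            return "auto"
        self.pending.append(decision)
        return "tray"

    def sign_off(self, index: int, approved: bool) -> Decision:
        """Resolve one pending decision; rejected ones are simply dropped."""
        decision = self.pending.pop(index)
        if approved:
            self.executed.append(decision)
        return decision


tray = DecisionTray()
tray.route(Decision("rename helper", "none", "do it", Reversibility.REVERSIBLE))
tray.route(Decision("drop old table", "data loss", "wait for backup"))
print(len(tray.executed), len(tray.pending))
```

Treating an unclassified decision the same as an irreversible one keeps a missing label from becoming an unattended action.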

Preview: Capture / memory 03 / 03
clip coming soon

Capture / memory

Knowledge from every session lands in the right file with provenance. It's git-blame for knowledge.
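To make the git-blame analogy concrete, here is a small in-memory sketch, assuming each captured note records the session it came from and when. `KnowledgeStore`, `capture`, and `blame` are hypothetical names, and a real system would presumably write to files rather than a dict.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Note:
    text: str
    session_id: str
    captured_at: datetime


@dataclass
class KnowledgeStore:
    # topic ("file") -> notes in capture order
    files: dict[str, list[Note]] = field(default_factory=dict)

    def capture(
        self,
        topic: str,
        text: str,
        session_id: str,
        captured_at: Optional[datetime] = None,
    ) -> Note:
        """Store a note under its topic together with where it came from."""
        note = Note(text, session_id, captured_at or datetime.now(timezone.utc))
        self.files.setdefault(topic, []).append(note)
        return note

    def blame(self, topic: str) -> list[str]:
        """List each note with its originating session and capture date."""
        return [
            f"{n.session_id} {n.captured_at:%Y-%m-%d} | {n.text}"
            for n in self.files.get(topic, [])
        ]


store = KnowledgeStore()
store.capture("deploy-rules", "Never deploy without a passing eval run", "session-example")
print(store.blame("deploy-rules"))
```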

03 / Skills layer

Agent systems, end to end

Where I'm strong today: the operating layer most agent projects lack.

  • Agent architecture: rules, skills, knowledge, memory
  • Decision routing with human-in-the-loop gates
  • Autonomous lifecycle: overnight runs, watchdogs, recovery
  • Multi-session orchestration on parallel workstreams
  • Verification by outcomes: tests as contracts, regression suites
  • Memory and capture with measured retrieval quality
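"Measured retrieval quality" can mean several metrics. The sketch below assumes two standard ones, recall@k and mean reciprocal rank, computed against hand-labeled relevant documents. The source does not say which metrics the system uses, so treat the choice as an illustration.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        raise ValueError("relevant must not be empty")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    if not runs:
        raise ValueError("runs must not be empty")
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


queries = [
    (["note-a", "note-b", "note-c"], {"note-b", "note-d"}),
    (["note-x"], {"note-x"}),
]
print(recall_at_k(*queries[0], k=2), mean_reciprocal_rank(queries))
```

Scores like these can feed a regression gate directly: a change that lowers retrieval quality on the labeled set is rejected before it ships.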

Market baseline

What I'm building on top of that. Today AI writes the implementation under my architecture; I verify by outcomes.

  • Python: AI-directed today
  • RAG / vector DB: own evals built
  • LangGraph: learning
  • React: soon

05 / Connect

Open to interesting conversations: AI engineering, agent systems, devtools, applied AI.

If you're curious how all of this works together, or you want to talk about agents, devtools, or applied AI, get in touch. Happy to walk you through it.