/ Przemysław Tarkowski

I built a production system of autonomous AI agents — and I run it every day.

Senior engineer, 11 years shipping real software — a 100M+ download game, solo-built AWS infrastructure. Now aimed at AI agents, with a working system to show, not slides.


TarkOS · open source — coming soon

System map · TarkOS layers (harness → OS → capabilities): Claude Code harness → agent OS (audito) → Decisions (Decision-OS), Memory / capture, Non-stop autonomy

tools built: 140+
skills: 50+
sessions run: 280+
every change: eval-gated

/ What the system does

Not a chatbot — an operating system for getting real work done autonomously.

01 autonomous orchestration

Runs on its own

Give it a goal and it works across sessions while I'm away — inventing the next task, not waiting for prompts.
02 human-in-the-loop · governance

Decides like me — asks only when it must

It owns the reversible calls and batches the irreversible ones into a decision tray for my sign-off.
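A minimal sketch of that routing rule, assuming the tray is default-deny as the Decision-OS description further down states: only calls explicitly classified as reversible run immediately, and everything else (irreversible or not yet classified) is queued for sign-off. `Decision`, `Reversibility` and `DecisionTray` are hypothetical names for illustration, not TarkOS code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"
    UNCLASSIFIED = "unclassified"


@dataclass
class Decision:
    action: str
    rationale: str
    reversibility: Reversibility = Reversibility.UNCLASSIFIED


@dataclass
class DecisionTray:
    """Hypothetical tray of decisions awaiting human sign-off."""

    pending: list[Decision] = field(default_factory=list)

    def route(self, decision: Decision) -> bool:
        """Return True if the agent may act now, False if the call was queued.

        Default-deny: only decisions explicitly marked reversible run
        autonomously; irreversible and unclassified ones wait in the tray.
        """
        if decision.reversibility is Reversibility.REVERSIBLE:
            return True
        self.pending.append(decision)
        return False


tray = DecisionTray()
tray.route(Decision("rename a local branch", "tidy-up", Reversibility.REVERSIBLE))
tray.route(Decision("drop a database table", "schema cleanup", Reversibility.IRREVERSIBLE))
tray.route(Decision("email a customer", "follow-up"))  # unclassified, so queued
print([d.action for d in tray.pending])  # ['drop a database table', 'email a customer']
```

Treating "unclassified" the same as "irreversible" is what makes the tray fail safe: a missing label costs a click, not an unrecoverable action.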
03 eval-driven

Measures its own work

Every change is regression-gated and retrieval is scored — the agent grades itself before I ever see it.
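A regression gate can be as small as a per-case comparison against a stored baseline. This sketch assumes each eval case yields one numeric score where higher is better; `regression_gate` and the case names are hypothetical, not the harness's actual interface.

```python
def regression_gate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.0,
) -> tuple[bool, list[str]]:
    """Accept a change only if no eval case scores below its baseline.

    Scores are assumed to be "higher is better". A case missing from the
    candidate run counts as a regression. Returns (passed, regressed_cases).
    """
    regressed = [
        case
        for case, base_score in baseline.items()
        if candidate.get(case, float("-inf")) < base_score - tolerance
    ]
    return (not regressed, regressed)


baseline = {"plan_quality": 0.90, "tool_use": 0.80}
print(regression_gate(baseline, {"plan_quality": 0.92, "tool_use": 0.80}))  # (True, [])
print(regression_gate(baseline, {"plan_quality": 0.85, "tool_use": 0.95}))  # (False, ['plan_quality'])
```

Checking each case rather than an average keeps a gain on one case from hiding a regression on another.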
04 self-evolving

Gets better every session

It captures my corrections into explicit rules, so the same steer is never needed twice.

Note: the code lives inside my private work harness (open-sourcing soon). I'm happy to walk you through it live: architecture, decisions, the eval loop.

/ Case study · the system in depth

How I run agents unattended — and still trust the output

The problem

An agent left to work alone breaks the same way every time: it makes decisions it can't justify, does irreversible things without asking, and loses the thread the moment a context window fills. The gap between an impressive demo and a system you'd actually leave running is engineering, not prompting.

Act 1 multi-agent · long-horizon

Orchestration & continuity

One orchestrator spawns and supervises worker agents across parallel workstreams, monitoring them by work signals rather than guesswork. When a worker approaches its context ceiling it relay-hops: it closes cleanly and hands a successor the full thread (warm resume plus lossless history load), so long-running work survives across sessions, attended during the day or unattended overnight.
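The relay-hop idea, sketched under simple assumptions: each worker tracks its token usage, the hop fires at a fixed fraction of the ceiling, and the successor starts with only a summary in its window while the full prior thread travels alongside as an archive. `Worker`, `Handoff`, `successor_from` and the 0.85 threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    goal: str
    summary: str        # warm resume: compact state the successor starts from
    archive: list[str]  # full prior thread, passed on so nothing is lost


@dataclass
class Worker:
    goal: str
    context_ceiling: int         # tokens this worker's window can hold
    hop_threshold: float = 0.85  # hypothetical: hop before the window is full
    archive: list[str] = field(default_factory=list)
    history: list[str] = field(default_factory=list)
    tokens_used: int = 0

    def record(self, message: str, tokens: int) -> None:
        self.history.append(message)
        self.tokens_used += tokens

    def should_hop(self) -> bool:
        return self.tokens_used >= self.hop_threshold * self.context_ceiling

    def hand_off(self, summary: str) -> Handoff:
        # Close cleanly: everything seen so far goes into the archive.
        return Handoff(self.goal, summary, self.archive + self.history)


def successor_from(handoff: Handoff, context_ceiling: int, summary_tokens: int) -> Worker:
    # Fresh window holding only the summary; the archive is kept for on-demand loading.
    successor = Worker(handoff.goal, context_ceiling, archive=list(handoff.archive))
    successor.record(f"[resume] {handoff.summary}", summary_tokens)
    return successor


w = Worker("migrate the billing service", context_ceiling=1000)
w.record("step 1: inventory endpoints", 500)
w.record("step 2: port handlers", 400)
if w.should_hop():  # 900 >= 850, so the worker hops
    w = successor_from(w.hand_off("steps 1-2 done; next: tests"), 1000, 50)
print(w.history, len(w.archive), w.tokens_used)
```

Hopping below the hard limit leaves room to write the summary itself, which is what lets the handoff stay clean instead of being cut off mid-thought.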
Act 2 human-in-the-loop · governance

Decision-OS — the trust layer

Every decision is classified: reversible ones the agent makes itself, irreversible ones it batches into a tray for my sign-off (default-deny). An append-only, gap-detecting audit trail records each call with its rationale — tamper-evident, the way a regulated environment needs it. This is what turns 'runs by itself' into 'runs by itself, safely'.
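One common way to get an append-only, gap-detecting, tamper-evident trail is a hash chain: each entry carries a sequence number and the hash of its predecessor, so editing or deleting a record breaks verification. This is a sketch of that general technique, not the Decision-OS implementation; `AuditLog` and its entry fields are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # prev_hash of the first entry


def entry_hash(entry: dict) -> str:
    # Hash every field except the hash itself, in canonical JSON form.
    body = {k: entry[k] for k in ("seq", "decision", "rationale", "prev_hash")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Hypothetical append-only decision log backed by a hash chain."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, decision: str, rationale: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"seq": len(self.entries), "decision": decision,
                 "rationale": rationale, "prev_hash": prev}
        entry["hash"] = entry_hash(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> list[str]:
        """Return a list of problems; an empty list means the chain is intact."""
        problems = []
        prev = GENESIS
        for i, e in enumerate(self.entries):
            if e["seq"] != i:
                problems.append(f"gap: expected seq {i}, found {e['seq']}")
            if e["prev_hash"] != prev:
                problems.append(f"broken chain at seq {e['seq']}")
            if e["hash"] != entry_hash(e):
                problems.append(f"tampered entry at seq {e['seq']}")
            prev = e["hash"]
        return problems


log = AuditLog()
for action, why in [("merge PR", "tests green"), ("rotate key", "expiring"), ("archive repo", "unused")]:
    log.append(action, why)
print(log.verify())  # []
log.entries[1]["rationale"] = "edited after the fact"
print(log.verify())  # ['tampered entry at seq 1']
```

A chain alone cannot notice that the newest entries were cut off the end; anchoring the latest hash somewhere outside the log closes that gap.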

The result — operated, not demoed

sessions operated: 530+
fully-unattended runs: dozens
decisions logged w/ audit trail: 120+
tools built around the loop: 143

/ Where this comes from

Eleven years shipping — now pointed at agents.

The EU research I worked on was literally about autonomous orchestration of cloud/edge resources. Today I orchestrate autonomous agents. Same instinct, new substrate.

11 yrs: shipping production software as a senior engineer
100M+: downloads — Crazy Kick, published with Voodoo (LiveOps at scale)
2019–22: solo-designed & ran AWS multiplayer infrastructure (EC2/ECS/Lambda/DynamoDB)
EU H2020: autonomous orchestration of cloud/edge resources (ACCORDION · CHARITY)

02 / Capabilities

A short clip per feature: what it does and how it works.

Non-stop manager (clip coming soon)

The agent works while you're away; decisions wait in a tray, not in the chat.

Decision queue (clip coming soon)

A queue of decisions with consequences and a recommendation. One click instead of meta-work.

Capture / memory (clip coming soon)

Knowledge from every session lands in the right file with provenance. It's git-blame for knowledge.
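A toy model of provenance-tagged capture, assuming knowledge is routed to one file per topic and every entry records which session produced it and why. `KnowledgeStore`, `Fact` and `blame` are hypothetical names, the session IDs are made-up examples, and an in-memory dict stands in for the files.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Fact:
    text: str
    session_id: str   # which session produced it
    source: str       # why it was captured, e.g. "user correction"
    captured_at: str  # UTC timestamp, ISO 8601


class KnowledgeStore:
    """Hypothetical store: one 'file' per topic, every line keeps its provenance."""

    def __init__(self) -> None:
        self.files: dict[str, list[Fact]] = {}

    def capture(self, topic: str, text: str, session_id: str, source: str) -> Fact:
        fact = Fact(text, session_id, source, datetime.now(timezone.utc).isoformat())
        self.files.setdefault(f"{topic}.md", []).append(fact)
        return fact

    def blame(self, topic: str) -> list[tuple[str, str]]:
        # Like git blame: each line of knowledge paired with the session behind it.
        return [(f.text, f.session_id) for f in self.files.get(f"{topic}.md", [])]


store = KnowledgeStore()
store.capture("deploys", "Use blue/green for the API", "session-212", "user correction")
store.capture("deploys", "Run migrations before the swap", "session-240", "incident review")
for text, session in store.blame("deploys"):
    print(f"{session}: {text}")
```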

03 / Skills layer

Agent systems, end to end


Where I'm strong today: the operating layer most agent projects lack.

Agent architecture: rules, skills, knowledge, memory
Decision routing with human-in-the-loop gates
Autonomous lifecycle: overnight runs, watchdogs, recovery
Multi-session orchestration on parallel workstreams
Verification by outcomes: tests as contracts, regression suites
Memory and capture with measured retrieval quality
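Measured retrieval quality usually means standard ranking metrics over a set of labelled queries. This sketch assumes recall@k and mean reciprocal rank (MRR); the page does not say which metrics the system actually reports, and `score_retrieval` is a hypothetical helper.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        raise ValueError("each query needs at least one relevant item")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def score_retrieval(runs: list[tuple[list[str], set[str]]], k: int = 5) -> dict[str, float]:
    """Average recall@k and MRR over labelled queries (retrieved ids, relevant ids)."""
    n = len(runs)
    return {
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }


runs = [
    (["note-7", "note-2", "note-9"], {"note-2"}),  # hit at rank 2
    (["note-4", "note-1", "note-3"], {"note-8"}),  # miss
]
print(score_retrieval(runs, k=2))  # {'recall@2': 0.5, 'mrr': 0.25}
```

Recall@k says whether the right memory was found at all; MRR adds how high it ranked, which matters when only the top few results fit in the agent's context.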

Market baseline


What I'm building on top of that. Today AI writes the implementation under my architecture; I verify by outcomes.

Python: AI-directed today
RAG / vector DB: own evals built
LangGraph: learning
React: soon

05 / Connect

Open to interesting conversations: AI engineering, agent systems, devtools, applied AI.

If you're curious how all of this works together, or you want to talk about agents, devtools, or applied AI, get in touch. Happy to walk you through it.

Email: przemotar@gmail.com (fastest)

LinkedIn: /in/przemysław-tarkowski (profile + recent posts)

GitHub: @Przemotar (public repos, selected)