Docker Deploys Autonomous AI Agent Fleet to Ship Code Faster, Revolutionizing Testing and Bug Fixing
Breaking News: Docker’s Coding Agent Sandboxes Team Unleashes ‘Fleet’ of Seven AI Agents for Autonomous Development
Docker has announced a groundbreaking initiative: a virtual team of seven AI agents—dubbed the Fleet—that autonomously tests products, triages issues, posts release notes, and even fixes bugs. All operations run entirely in CI, marking a major leap toward self-driving software development pipelines.

“The Fleet isn’t just automation; it’s a team of reasoning agents that investigate failures and make decisions in real time,” said a senior engineer at Docker. “This shifts CI from a passive script executor to an active problem solver.”
The initiative builds on Claude Code skills—role-definition files that give each agent a persona, responsibilities, and allowed tools. Unlike traditional scripts that execute step-by-step, a skill file tells an agent, “You are the build engineer; here’s how you reason.” That nuance is critical: when a test fails unexpectedly, a script stops dead, but a Fleet agent investigates the root cause on the fly.
Background: Secure MicroVM Isolation Meets AI-Driven CI
The Coding Agent Sandboxes (sbx) project at Docker provides secure, microVM-based isolation for running AI coding agents such as Claude Code, Gemini, Codex, Docker Agent, and Kiro. Each sandbox gives an agent full autonomy (its own Docker daemon, network, and filesystem) without touching the host system.
Over the past several weeks, the team built the Fleet atop this infrastructure. Comprising seven distinct agent roles—including a /cli-tester, a triage specialist, a release manager, and a bug fixer—the Fleet operates continuously across macOS, Linux, and Windows. Every release now undergoes autonomous testing on all three platforms, including upgrade-path verification and sustained load testing to catch resource leaks.
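As a rough illustration of that coverage, the sketch below enumerates one check per platform. The check labels and the `ReleaseCheck` type are hypothetical names chosen for this example, not Docker's actual job definitions.

```python
from dataclasses import dataclass
from itertools import product

# Platforms named in the article.
PLATFORMS = ("macos", "linux", "windows")

# Hypothetical labels for the kinds of checks the article mentions.
CHECKS = ("cli-exercise", "upgrade-path", "sustained-load")


@dataclass(frozen=True)
class ReleaseCheck:
    platform: str
    check: str

    @property
    def name(self) -> str:
        return f"{self.check}@{self.platform}"


def build_matrix(platforms=PLATFORMS, checks=CHECKS) -> list[ReleaseCheck]:
    """Return one entry for every (platform, check) pair."""
    return [ReleaseCheck(p, c) for p, c in product(platforms, checks)]


if __name__ == "__main__":
    for entry in build_matrix():
        print(entry.name)
```

Three platforms and three checks yield nine entries per release, which is the kind of breadth that is tedious to cover by hand.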
How the Fleet Works: Skills That Think, Not Just Execute
Each Fleet agent is powered by Claude Code skills: markdown files that describe a persona, a set of responsibilities, and permitted tools. The same skill file works identically whether run on a developer’s laptop or in CI.
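To make the idea concrete, a skill can be pictured as three pieces of data rendered into a markdown file. The sketch below is illustrative only: the `Skill` class, the section headings, and the example tool names are assumptions made for this article, not the file layout that Claude Code or Docker actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical in-memory model of a role-definition file."""
    name: str
    persona: str
    responsibilities: list[str] = field(default_factory=list)
    allowed_tools: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the skill as markdown (the layout is an assumption)."""
        lines = [f"# {self.name}", "", self.persona, "", "## Responsibilities"]
        lines += [f"- {item}" for item in self.responsibilities]
        lines += ["", "## Allowed tools"]
        lines += [f"- {tool}" for tool in self.allowed_tools]
        return "\n".join(lines) + "\n"

    def may_use(self, tool: str) -> bool:
        """Check a tool name against the skill's allow-list."""
        return tool in self.allowed_tools


cli_tester = Skill(
    name="cli-tester",
    persona="You are the CLI tester; investigate failures instead of stopping.",
    responsibilities=["Build binaries", "Exercise CLI commands", "Report issues"],
    allowed_tools=["shell", "file-read"],  # example names only
)

if __name__ == "__main__":
    print(cli_tester.to_markdown())
```

The point of the model is that the persona and responsibilities are prose for the agent to reason from, while the tool list is the only part that acts as a hard constraint.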
“We didn’t start by writing a GitHub workflow,” explained the Docker engineer. “We invoked the /cli-tester locally first—watched it build binaries, exercise CLI commands, find issues, and report them. Only after we got the behavior right did we wire it into CI.”
This local-first philosophy eliminates the painful commit-push-wait-read-logs cycle. Iteration on a skill takes seconds in a terminal. CI becomes merely another runtime for the same skill, with the workflow setting up the environment and calling it—no “CI version” or translation layer required.
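One way to picture "CI as another runtime" is a launcher that builds the identical agent invocation in both places and varies only the environment setup. The sketch assumes a placeholder `agent` executable and relies on the `CI` environment variable, which many CI systems (GitHub Actions among them) set to `true`; neither detail is taken from Docker's implementation.

```python
import os

SKILL = "/cli-tester"  # skill name taken from the article


def skill_command(skill: str = SKILL) -> list[str]:
    """Build the agent invocation; `agent run` is a placeholder command."""
    return ["agent", "run", skill]


def runtime_setup(env=None) -> dict:
    """Return the settings that differ between a laptop and CI."""
    env = os.environ if env is None else env
    in_ci = env.get("CI", "").lower() == "true"
    return {
        "runtime": "ci" if in_ci else "local",
        # CI runs unattended; a local run can be watched interactively.
        "interactive": not in_ci,
    }


if __name__ == "__main__":
    setup = runtime_setup()
    print(setup["runtime"], skill_command())
```

Note that `skill_command` takes no notice of where it runs: only the surrounding setup changes, which is what keeps local and CI behavior from drifting apart.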

Local First, CI Second: The Design Principle That Makes the Fleet Practical
The Fleet’s local-first approach ensures that debugging an agent is as fast as running it interactively. The /cli-tester skill that runs nightly on all three platforms is the exact same file invoked from a developer’s terminal. This consistency reduces complexity and accelerates iteration.
- Faster debugging: See the agent think in real time; fix confusion immediately.
- No translation layer: One skill, two runtimes—no diverging behaviors.
- Seamless scaling: Add new agents by writing a skill once, then running it anywhere.
What This Means: The End of Traditional CI Scripts?
Docker’s Fleet signals a shift from static automation to autonomous, reasoning CI agents. Instead of maintaining brittle test scripts that halt at the first unexpected failure, teams can deploy agents that adapt, investigate, and even fix problems without human intervention.
Industry observers note that this could dramatically reduce the toil of release management. “If a fleet of agents can triage a backlog and patch a bug in the same workflow, that’s a massive productivity multiplier,” commented an external AI/CI researcher. “We’re moving toward ‘self-healing’ pipelines.”
For Docker, the immediate benefit is faster shipping with fewer manual checks. The /cli-tester alone catches regressions across three OSes automatically. The triage agent reduces issue backlog without draining developer time. And the release-notes agent ensures daily visibility into what shipped.
Docker plans to open-source the Fleet’s skill structure, inviting other teams to adopt the pattern. If replicated widely, the traditional CI script—brittle, platform-specific, and mindless—could soon be a relic of the past.