
Automating Documentation Testing for Open-Source Projects: A Step-by-Step Guide Using AI Agents

Last updated: 2026-05-02 10:39:08 · Open Source

What You Need

Before you start, gather the following tools and resources:

  • GitHub Copilot CLI – the command-line interface for AI-assisted coding.
  • Dev Containers – to create reproducible, isolated environments.
  • A tutorial or getting-started guide you want to test (e.g., for an open-source project).
  • Docker – to build and run containers.
  • k3d – optional, if your tutorial uses Kubernetes-in-Docker.
  • Basic scripting skills (shell, Python, or similar) to orchestrate the agent.
  • A test environment (local or cloud) where you can run the agent without interfering with production.

Step-by-Step Instructions

Step 1: Identify the Documentation Gaps

Begin by understanding the two main reasons documentation breaks: the curse of knowledge and silent drift.

  • Curse of knowledge – Experts assume implicit context. For example, when you write “wait for the query to bootstrap,” a new user doesn’t know to run drasi list query or drasi wait.
  • Silent drift – Code changes (like renaming a config file) don’t cause any documentation check to fail, so the doc stays outdated until a user complains.

To address both, you need to simulate a naïve, literal, and unforgiving user. This is what your AI agent will do.

Step 2: Set Up a Reproducible Environment with Dev Containers

Use a Dev Container to create an isolated, consistent sandbox for testing. This ensures the agent runs in the same environment every time, matching the conditions a real user would have.

  1. Create a .devcontainer/devcontainer.json file that specifies the base image (e.g., Ubuntu with Docker, k3d, and your project’s dependencies).
  2. Include a script to launch the tutorial environment (e.g., spin up a sample database, start the Docker daemon).
  3. Test that the container starts correctly and can run your project’s CLI commands.

This container is where your AI agent will operate. It eliminates “works on my machine” issues.
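
For reference, a minimal devcontainer.json might look like the sketch below. The base image, the docker-in-docker feature, and the setup script path are assumptions you would adapt to your project’s actual dependencies:

{
  "name": "docs-agent-sandbox",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "features": {
    "ghcr.io/devcontainers/features/docker-in-docker:2": {}
  },
  // Hypothetical script that prepares the tutorial environment (sample database, etc.)
  "postCreateCommand": "./scripts/setup-tutorial-env.sh"
}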

Step 3: Configure the AI Agent Using GitHub Copilot CLI

Install and set up the GitHub Copilot CLI. This tool will act as the brain of your synthetic user.

  1. Install the GitHub Copilot CLI via your package manager or as a GitHub CLI extension.
  2. Authenticate with your GitHub account (requires a GitHub Copilot subscription).
  3. In your Dev Container, configure the CLI to run in a non-interactive mode – you’ll feed it instructions from your tutorial script.
  4. Write a wrapper script that calls Copilot CLI with the exact text from each step of your tutorial. For example:
    copilot explain "execute: drasi init --database postgres"
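
A minimal wrapper might look like the sketch below. It assumes a plain-text file (here called tutorial_steps.txt, an illustrative name) with one documented command per line, and it reuses the copilot explain invocation from the example above; adjust the call to whatever invocation your Copilot CLI version supports.

import subprocess

# Illustrative wrapper: feed each tutorial step to the Copilot CLI verbatim.
# "tutorial_steps.txt" is an assumed file with one documented command per line.
with open("tutorial_steps.txt") as steps:
    for number, step in enumerate(steps, start=1):
        step = step.strip()
        if not step:
            continue
        print(f"Step {number}: {step}")
        subprocess.run(["copilot", "explain", f"execute: {step}"], check=False)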

Step 4: Define the Agent’s Behavior – Naïve, Literal, Unforgiving

Create a script that controls the agent’s actions:

  • Naïve – The agent has no prior knowledge and cannot infer commands that aren’t explicitly stated. If a step says “run the setup script” but never names the script, the agent must fail.
  • Literal – Every command is executed exactly as written. If the tutorial itself contains a typo (say, docker run--force with a missing space), the agent types it verbatim rather than silently correcting it, so the error in the doc surfaces.
  • Unforgiving – After each command, verify the output. If the doc says “You should see ‘Success’,” but the CLI returns nothing, flag an error.

Implement these rules in a test harness (e.g., Python with subprocess). Example:

import subprocess

def execute_step(command, expected_output):
    # Run the documented command exactly as written and verify its output.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    if expected_output and expected_output not in result.stdout:
        raise AssertionError(f"Expected '{expected_output}' but got: {result.stdout}")
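
For example, checking a query’s status (using the drasi list query command and “Running” status mentioned in Step 6) would look like:

execute_step("drasi list query", "Running")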

Step 5: Run the Agent Against Your Tutorial

Run the agent inside the Dev Container, following the tutorial from start to finish.

  1. Execute the harness script. It will read each step sequentially.
  2. After each step, check for failures:
    • If a command fails, the agent records the exact error message and the step number.
    • If output doesn’t match, it logs the discrepancy.
    • If the tutorial is ambiguous (e.g., no command given for “wait for the query to bootstrap”), the agent halts and reports.
  3. Let the agent run multiple times if needed – reproducibility is key.

This process mimics a brand-new developer who has never seen your project before. Any break in the flow is a real documentation bug.
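
Putting Steps 4 and 5 together, a simple driver loop might look like the sketch below. The step list, its format, and the halting behaviour are illustrative only; it reuses execute_step from Step 4.

# Illustrative driver: each entry pairs a documented command with the output
# the tutorial promises. A missing command models an ambiguous instruction.
# Assumes execute_step from Step 4 is defined above.
steps = [
    ("drasi init --database postgres", None),
    ("drasi list query", "Running"),
    (None, "wait for the query to bootstrap"),  # ambiguous: no command given
]

for number, (command, expected) in enumerate(steps, start=1):
    if command is None:
        print(f"Step {number}: ambiguous instruction, halting: {expected}")
        break
    try:
        execute_step(command, expected)
    except AssertionError as error:
        print(f"Step {number} failed: {error}")
        break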


Step 6: Analyze Failures and Fix Documentation

Collect the logs from the agent and group them by:

  • Commands that failed – likely due to silent drift (e.g., Docker version upgrade changed a flag).
  • Missing steps or ambiguous instructions – signs of the curse of knowledge.
  • Output mismatches – the tutorial might have copy-paste errors or outdated screenshots.
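
If the harness writes each failure as a structured record, a few lines of Python are enough to summarize a run; the field names below are assumptions about how you might structure your own logs.

from collections import Counter

# Hypothetical failure records emitted by the harness in Step 5.
failures = [
    {"step": 3, "kind": "command_failed", "detail": "unknown flag --force"},
    {"step": 7, "kind": "output_mismatch", "detail": "expected 'Success'"},
]
print(Counter(record["kind"] for record in failures))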

For each issue, update your documentation:

  • If a command changed, update the command and the expected output in the guide.
  • If a step is unclear, add explicit details (like “run drasi list query and look for the ‘Running’ status”).
  • If dependencies changed (e.g., newer Docker version), add a compatibility note or update the tutorial requirements.

Step 7: Automate Regular Testing with CI/CD

To prevent future silent drift, integrate the agent into your continuous integration pipeline.

  1. Schedule the agent to run daily or on every commit to your documentation repository.
  2. Use GitHub Actions (or similar) to spin up a Dev Container, run the agent, and report failures.
  3. Configure notifications to your team when the tutorial breaks.
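
As a sketch, a scheduled GitHub Actions workflow built on the devcontainers/ci action might look like the following; the schedule, paths, and run command are assumptions to adapt to your repository.

name: docs-agent
on:
  schedule:
    - cron: "0 6 * * *"   # run once a day
  push:
    paths:
      - "docs/**"
jobs:
  test-tutorial:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run the doc-testing agent inside the Dev Container
        uses: devcontainers/ci@v0.3
        with:
          runCmd: python run_agent.py   # hypothetical harness entry point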

This turns documentation testing from a manual chore into an automated monitoring system. You’ll catch issues before users do.

Tips for Success

  • Start small – Test the most critical getting-started guide first. Once you’ve proven the agent works, expand to other tutorials.
  • Record agent runs – Keep a log of each execution. Over time, you’ll build a history of changes that broke the doc, helping you anticipate future deprecations.
  • Pair with human reviewers – The agent finds technical bugs, but can’t judge tone, clarity, or logical flow. Use it to complement, not replace, human review.
  • Version your test environment – Matching the Docker, k3d, and project versions in your agent’s environment is crucial. Pin dependencies in devcontainer.json.
  • Share your results – Open-source communities love hearing how you’ve improved onboarding. Post a blog or issue about your agent to attract contributors.

By treating documentation testing as a simulation problem, you can leverage AI to catch silent drifts and knowledge gaps early. The payoff: a smoother onboarding experience, fewer frustrated users, and a healthier open-source project.