Agent Step

Execute AI CLI tools with full interactive capabilities including file editing, tool use, and multi-turn conversations.

Overview

The agent step runs Claude Code, OpenAI Codex, or Google Gemini CLI tools. Unlike the AI step, agent steps are interactive and can use tools (file editing, searches, commands).

Use when:

Need file editing capabilities
Want multi-turn interactive conversation
Task requires tool use
Task is expected to run longer than 5 minutes (the AI step timeout)

Don't use when:

Just need quick AI response → Use ai step
Need structured data only → Use ai step with schema
Want faster execution → Use ai step (agent steps time out at 30 min vs. 5 min for AI steps)

Configuration

interface AgentStepConfig {
  agent: "claude" | "codex" | "gemini";
  prompt: string;
  workingDir?: string;
  context?: Record<string, unknown>;
  permissionMode?: "default" | "plan" | "acceptEdits" | "bypassPermissions";
  mcpConfig?: string[];
  json?: boolean;
  resume?: string;
}

Timeout: 30 minutes (1,800,000ms)

Parameters

agent (required)

"claude" - Claude Code CLI (most capable, recommended)
"codex" - OpenAI Codex CLI (good for code generation)
"gemini" - Google Gemini CLI (experimental, ~70% stable)

prompt (required)

Instruction for the AI agent
Can be multi-line, detailed
Include context, requirements, constraints

workingDir (optional)

Directory where agent executes
Defaults to event.data.projectPath
Agent can read/write files in this directory

context (optional)

Additional context object passed to agent
Useful for structured data (IDs, configs, etc.)

permissionMode (optional, default: "default")

"default" - Interactive, prompt for file edits
"plan" - Read-only, no file modifications
"acceptEdits" - Auto-approve all file edits
"bypassPermissions" - Skip all permission checks (dangerous!)

json (optional, default: false)

Extract JSON from agent response
Agent must return valid JSON in response
Useful for structured output
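
How the JSON is located inside the response is not described here. As a rough illustration only, assuming the agent may wrap its JSON in a fenced block or in surrounding prose, a hypothetical helper (`extractJson` is not part of agentcmd-workflows) could look like this:

```typescript
// Hypothetical sketch: NOT the library's implementation.
const FENCE = "`".repeat(3); // three backticks, built at runtime

function extractJson<T = unknown>(response: string): T {
  // Prefer the contents of a fenced block (optionally tagged "json").
  const fenced = response.match(
    new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE)
  );
  const candidate = fenced?.[1] ?? response;

  // Otherwise take the span from the first { or [ to the last } or ].
  const start = candidate.search(/[{[]/);
  const end = Math.max(candidate.lastIndexOf("}"), candidate.lastIndexOf("]"));
  if (start === -1 || end < start) {
    throw new Error("No JSON found in agent response");
  }
  return JSON.parse(candidate.slice(start, end + 1)) as T;
}
```

Whatever the real extraction does, asking the agent to return only JSON, with the exact shape spelled out in the prompt, makes a parse failure less likely.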

resume (optional)

Session ID from previous agent step
Continues the conversation
Context carries forward (files read, decisions made)

mcpConfig (optional, Claude only)

Array of MCP server configuration file paths
Enables Model Context Protocol servers for enhanced capabilities
Paths relative to workingDir (e.g., [".mcp.json.github", ".mcp.json.playwright"])
Only supported by Claude agent (ignored for Codex and Gemini)
See MCP example below
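
Because mcpConfig is silently ignored for Codex and Gemini, a workflow may want to notice when it passes MCP files to the wrong agent. A minimal sketch, assuming a hypothetical helper named `checkMcpConfig` that is not part of agentcmd-workflows:

```typescript
type Agent = "claude" | "codex" | "gemini";

// Hypothetical helper: mirrors the documented rule that only Claude uses mcpConfig.
function checkMcpConfig(
  agent: Agent,
  mcpConfig: string[] = []
): { used: string[]; ignored: string[] } {
  return agent === "claude"
    ? { used: [...mcpConfig], ignored: [] }
    : { used: [], ignored: [...mcpConfig] };
}
```

A caller could log `ignored` before the step runs, so a misconfigured workflow is visible rather than silent.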

Basic Usage

Simple Agent Call

await step.agent("implement-feature", {
  agent: "claude",
  prompt: "Implement user authentication with JWT tokens",
  workingDir: projectPath,
});

With Permission Mode

// Read-only analysis
await step.agent("analyze-code", {
  agent: "claude",
  prompt: "Analyze this codebase for security vulnerabilities",
  permissionMode: "plan", // No file edits
});

With Auto-Approve Edits

// Auto-approve all edits (use carefully!)
await step.agent("refactor", {
  agent: "claude",
  prompt: "Refactor auth module to use async/await",
  permissionMode: "acceptEdits",
});

Advanced Usage

Extract JSON Response

interface AnalysisResult {
  complexity: number;
  issues: Array<{ severity: string; description: string }>;
  recommendations: string[];
}

const result = await step.agent<AnalysisResult>("analyze", {
  agent: "claude",
  prompt: `Analyze this codebase and return JSON with this structure:
{
  "complexity": <1-10 score>,
  "issues": [{ "severity": "high|medium|low", "description": "..." }],
  "recommendations": ["...", "..."]
}`,
  json: true,
  permissionMode: "plan",
});

// result.data is typed as AnalysisResult
console.log(`Complexity: ${result.data.complexity}`);
console.log(`Found ${result.data.issues.length} issues`);

Resume Previous Session

const ctx: { planSession?: string } = {};

// Phase 1: Plan
await step.phase("plan", async () => {
  const result = await step.agent("architect", {
    agent: "claude",
    prompt: "Design the authentication system",
    permissionMode: "plan",
  });
  ctx.planSession = result.sessionId; // sessionId is a top-level field of AgentStepResult
});

// Phase 2: Implement (continues planning conversation)
await step.phase("implement", async () => {
  await step.agent("code", {
    agent: "codex",
    prompt: "Implement the auth design from the planning phase",
    resume: ctx.planSession,
  });
});

With Slash Commands

import { buildSlashCommand } from "agentcmd-workflows";

await step.agent("generate-spec", {
  agent: "claude",
  json: true,
  prompt: buildSlashCommand("/cmd:generate-feature-spec", {
    context: "User authentication with OAuth2"
  }),
});

With Additional Context

await step.agent("implement-endpoint", {
  agent: "claude",
  prompt: "Implement the GET /users/:id endpoint",
  context: {
    userId: "user-123",
    apiVersion: "v2",
    authRequired: true,
  },
});

With MCP Servers

Enable Model Context Protocol (MCP) servers to extend agent capabilities with custom tools and resources:

// MCP server configuration example
// .mcp.json.github
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..."
      }
    }
  }
}

// Use MCP servers in workflow
await step.agent("analyze-prs", {
  agent: "claude",
  prompt: "Analyze recent pull requests and create a summary report",
  mcpConfig: [
    ".mcp.json.github",
    ".mcp.json.playwright"
  ],
  workingDir: projectPath,
});

What MCP provides:

Custom tools - GitHub API access, database queries, browser automation
Resources - File system access, API endpoints, data sources
Enhanced prompts - Domain-specific prompt templates

Common MCP servers:

@modelcontextprotocol/server-github - GitHub API integration
@modelcontextprotocol/server-filesystem - Enhanced file operations
@modelcontextprotocol/server-postgres - PostgreSQL database access
@playwright/mcp - Browser automation and testing

Return Value

Agent steps return AgentStepResult:

interface AgentStepResult<T = unknown> {
  data: T;                    // Extracted data (JSON mode) or response text
  sessionId: string;          // Session ID for resumption
  exitCode: number;           // 0 = success, non-zero = error
  tool: "claude" | "codex" | "gemini";
}

Example:

const result = await step.agent("code", {
  agent: "claude",
  prompt: "Implement feature X",
});

console.log(result.sessionId); // "abc123..."
console.log(result.exitCode);  // 0
console.log(result.tool);      // "claude"

Error Handling

Try/Catch

try {
  await step.agent("risky-operation", {
    agent: "claude",
    prompt: "Refactor critical system",
  });
} catch (error) {
  await step.annotation("error", {
    message: `Agent failed: ${error}. Rolling back...`,
  });
  // Rollback logic
}

Timeout Override

await step.agent("long-task", {
  agent: "claude",
  prompt: "Implement entire feature from spec",
}, {
  timeout: 3600000, // 60 minutes (default is 30min)
});
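
Timeouts are given in milliseconds, so 60 minutes is 60 × 60 × 1000 = 3,600,000. A small helper keeps that arithmetic visible; the names `minutes` and `DEFAULT_AGENT_TIMEOUT_MS` are illustrative and not exported by the library:

```typescript
// Hypothetical convenience helper for readable timeout values.
const minutes = (n: number): number => n * 60 * 1000;

// The documented default for agent steps: 30 minutes.
const DEFAULT_AGENT_TIMEOUT_MS = minutes(30);

// Options object shaped like the third argument in the example above.
const longTaskOptions = { timeout: minutes(60) };
```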

Check Exit Code

const result = await step.agent("build-feature", {
  agent: "claude",
  prompt: "Implement feature X",
});

if (result.exitCode !== 0) {
  throw new Error("Agent execution failed");
}
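
When several steps need the same check, it can be centralized. The sketch below redeclares AgentStepResult from the Return Value section so it stands alone; `requireSuccess` is a hypothetical name, not a library function:

```typescript
interface AgentStepResult<T = unknown> {
  data: T;
  sessionId: string;
  exitCode: number;
  tool: "claude" | "codex" | "gemini";
}

// Hypothetical helper: returns the data on success, throws with context otherwise.
function requireSuccess<T>(result: AgentStepResult<T>, stepName: string): T {
  if (result.exitCode !== 0) {
    throw new Error(
      `Agent step "${stepName}" (${result.tool}) exited with code ` +
        `${result.exitCode}; session ${result.sessionId}`
    );
  }
  return result.data;
}
```

Including the session ID in the message makes it easier to resume the failed conversation when debugging.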

Agent Comparison

Feature	Claude	Codex	Gemini
Stability	✓✓✓	✓✓✓	✓✓ (70%)
Planning	✓✓✓ Excellent	✓✓ Good	✓✓ Good
Coding	✓✓✓ Excellent	✓✓✓ Excellent	✓✓ Good
Reasoning	✓✓✓ Best	✓✓ Good	✓✓ Good
Speed	✓✓ Moderate	✓✓✓ Fast	✓✓ Moderate
Cost	$$	$$	$

Recommendation: Start with Claude for planning and complex tasks; use Codex for pure code generation.

Common Patterns

Claude Plans, Codex Implements

const ctx: { planSession?: string } = {};

// Claude analyzes and plans
const plan = await step.agent("plan", {
  agent: "claude",
  prompt: "Design the feature architecture",
  permissionMode: "plan",
});
ctx.planSession = plan.sessionId; // sessionId is a top-level field of AgentStepResult

// Codex implements
await step.agent("implement", {
  agent: "codex",
  prompt: "Implement the design from planning",
  resume: ctx.planSession,
});

Retry Until Tests Pass

let session: string | undefined;

for (let i = 1; i <= 3; i++) {
  const result = await step.agent(`attempt-${i}`, {
    agent: "claude",
    prompt: session
      ? "Fix issues from previous attempt"
      : "Implement feature X",
    resume: session,
  });
  session = result.sessionId; // carry the session into the next attempt

  // Check if tests pass
  const test = await step.cli("test", { command: "pnpm test" });
  if (test.exitCode === 0) break;
}

Multi-Agent Review

// Codex implements
const impl = await step.agent("implement", {
  agent: "codex",
  prompt: "Implement user authentication",
});

// Claude reviews
const review = await step.agent("review", {
  agent: "claude",
  prompt: "Review the implementation for security issues",
  resume: impl.sessionId,
  permissionMode: "plan",
});

// Codex fixes issues
await step.agent("fix", {
  agent: "codex",
  prompt: "Fix the issues found in review",
  resume: review.sessionId,
});

Best Practices

Write Clear Prompts

✅ Good - Specific, actionable:

prompt: `Implement JWT authentication for the /api/auth endpoints.
Requirements:
- Use jsonwebtoken library
- Tokens expire in 24 hours
- Store secret in environment variable
- Add tests for token generation and validation`

❌ Bad - Vague:

prompt: "Add authentication"

Use Appropriate Permission Modes

// Analysis/planning - read-only
permissionMode: "plan"

// Interactive development - get approval
permissionMode: "default"

// Automated workflows - auto-approve (carefully!)
permissionMode: "acceptEdits"

// Never use in production:
permissionMode: "bypassPermissions" // Dangerous!
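
One way to enforce the last rule is a guard that runs before the step is configured. This is a sketch under assumptions: `assertSafePermissionMode` is a hypothetical helper, and the "production" label stands in for however your deployment identifies its environment.

```typescript
type PermissionMode = "default" | "plan" | "acceptEdits" | "bypassPermissions";

// Hypothetical guard, not part of agentcmd-workflows.
function assertSafePermissionMode(
  mode: PermissionMode,
  environment: string // assumed label, e.g. "production" or "development"
): PermissionMode {
  if (mode === "bypassPermissions" && environment === "production") {
    throw new Error('permissionMode "bypassPermissions" is not allowed in production');
  }
  return mode;
}
```

The guard returns the mode unchanged when it is allowed, so it can be used inline where permissionMode is set.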

Save Session IDs for Resumption

interface Context {
  sessions: {
    plan?: string;
    implement?: string;
  };
}

const ctx: Context = { sessions: {} };

const result = await step.agent("plan", { ... });
ctx.sessions.plan = result.sessionId; // Save it!

Match Tools to Tasks

Claude: Planning, architecture, complex reasoning
Codex: Pure code generation, implementation
Gemini: Experimental, cost-sensitive projects
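
The mapping above can be written down as a lookup so the choice of agent lives in one place. The task labels and the `pickAgent` name are illustrative assumptions; only the agent names come from this page.

```typescript
type Agent = "claude" | "codex" | "gemini";
type TaskKind =
  | "planning"
  | "architecture"
  | "complex-reasoning"
  | "code-generation"
  | "implementation"
  | "experimental";

// Encodes the recommendations listed above; adjust to your own experience.
const agentForTask: Record<TaskKind, Agent> = {
  planning: "claude",
  architecture: "claude",
  "complex-reasoning": "claude",
  "code-generation": "codex",
  implementation: "codex",
  experimental: "gemini",
};

function pickAgent(kind: TaskKind): Agent {
  return agentForTask[kind];
}
```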

Next Steps

Recursive Workflows - Advanced workflows with multiple agents
Context Sharing - Share data between workflow steps
Type-Safe Slash Commands - Build reusable prompts with type-safe args
AI Step - Simpler AI step for non-interactive prompts