Dev Workflows as Code
I've reached a point where I've automated the development process for a single feature. Claude Code will take a GitHub issue, implement it, run code reviews locally, raise a PR, respond to the PR feedback, and then tell me when it's mergeable and ready for my review.
I was very happy with the setup, but not fully satisfied. It was reliable but not bulletproof, and there were still some niggles to iron out to make it solid.
To get there, the answer was to use real code. The current approach relies on Claude running the steps in order, a.k.a. --please-work.
I’d extracted some logic to bash scripts to make it partly deterministic and leveraged subagents to decouple the main agent, but there was still room for the main agent to improvise (that’s a negative in this context).
With real code, this problem is solved. And there are many other advantages too. I recommend it.
Current challenges
As I explained in the last post, the current solution looks like this:
/complete-task
│
▼
Main agent runs pipeline directly
│
├── [Bash] Verify gate → FAIL → return to main
├── [Subagent] code-review subagent → FAIL → return findings + next steps
├── [Subagent] Task-check subagent → FAIL → return issues
├── [Bash] submit-pr subagent → captures output
├── Check reviewDecision (CHANGES_REQUESTED = not mergeable)
├── Address ALL CodeRabbit feedback (comments + nitpicks)
├── [Bash] Re-run submit-pr if fixes made
└── Return SUCCESS with PR ready to merge
Most of the steps are bash scripts or subagents, removing some orchestration and decision-making from the main agent. But the main agent still has to orchestrate the workflow (because subagents cannot spawn other subagents… I tried that).
The main agent has just implemented a feature, so its context window is already occupied. It's not fully focused on orchestrating this process and following the precise instructions; it's a bit tipsy, wobbling from side to side.
It sometimes handles responses from each step in different ways, like not responding to CodeRabbit feedback, or re-running the submit-pr step without starting the full pipeline again.
Rather than keep trying to write better prompts and pleading harder with Claude, I decided to use real code.
Real Code
With real code you get determinism. The orchestration logic is mechanical, based on simple rules; you don't need an LLM for that. The only part of the process that needs an LLM is the code review. So I put all of the orchestration in code and used the Claude Code SDK for the code review steps.
What I love about the solution is that it’s largely type safe. The input and output of each step of the workflow is strongly typed. I’m using Zod schemas in TypeScript.
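To give a flavour of what that looks like, here is a step result expressed as a Zod schema. This is illustrative only; it mirrors the StepResult type shown further down rather than my exact schemas.

import { z } from 'zod'

// Illustrative only: a step either succeeds, or fails with a machine-readable
// next action plus whatever details the step wants to surface.
const stepResultSchema = z.discriminatedUnion('type', [
  z.object({ type: z.literal('success') }),
  z.object({
    type: z.literal('failure'),
    nextAction: z.string(),
    details: z.unknown(),
  }),
])

type StepResultShape = z.infer<typeof stepResultSchema>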
Now, the workflow looks like this…
Claude Command
Rather than being given a workflow to orchestrate, Claude is simply told to run an nx command. The /.claude/commands/complete-task command looks like this:
# Complete Task
Run the complete-task pipeline to verify, review, and submit your work.
## Instructions
Run this command with a **10-minute timeout**:
```bash
pnpm nx run dev-workflow:complete-task
```
This is a long-running command that:
- Runs local verification (lint, typecheck, test)
- Executes code review agents
- Submits PR and waits for CI checks
Parse the JSON response and follow the `nextInstructions` field exactly.
It's now a black box. Claude doesn't know about the mechanics; it just waits for a result indicating the task succeeded or failed. If it failed, there will be a list of errors to deal with.
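For reference, that result has a shape roughly like the sketch below. Only the nextInstructions field is confirmed by the command file above; the other field names are assumptions for illustration.

// Hypothetical shape of the pipeline's JSON output. Only nextInstructions is
// confirmed by the command file above; the other fields are assumptions.
type CompleteTaskOutput = {
  status: 'success' | 'failure'
  nextInstructions: string // what Claude should do next, verbatim
  errors?: string[]        // failure details to address, if any
}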
Dev workflow as code
Now the complete-task command is implemented like this:
runWorkflow<CompleteTaskContext>(
  [
    verifyBuild,
    codeReview,
    submitPR,
    fetchPRFeedback,
  ],
  buildCompleteTaskContext,
  (result: WorkflowResult, ctx: CompleteTaskContext) => formatCompleteTaskResult(result, ctx.prUrl),
)
- verifyBuild: runs build, lint, test
- codeReview: spins up multiple Claude Code subagents to do the review
- submitPR: creates the PR on GitHub and waits for checks to complete
- fetchPRFeedback: queries the PR for the status of all checks and all unresolved conversations (both bots like CodeRabbit and humans)
I built a small library to abstract away the mechanics so the workflows themselves are declarative (highly experimental, don't copy this!). And I use existing TypeScript SDKs for interacting with GitHub.
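The context object that threads through the steps is the other half of the typing story. It looks roughly like this; a simplified sketch reconstructed from the fields the snippets in this post reference (reviewDir, prUrl, taskDetails), so don't treat it as the exact type.

// Simplified sketch, reconstructed from the fields the snippets in this post
// reference; the real context type carries more than this.
export type CompleteTaskContext = {
  reviewDir: string // where reviewer reports get written
  prUrl?: string    // populated once the PR has been created or updated
  taskDetails?: {
    title: string
    body: string
  }
}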
Composable workflow steps
Each step in the workflow conforms to the following types. This makes it easy to compose steps and to decouple the declarative steps from the generic infrastructure that runs them.
export function runWorkflow(steps: Step[]): void {
  executeWorkflow(steps).catch(handleWorkflowError)
}

export type Step = (ctx: WorkflowContext) => Promise<StepResult>

export type StepResult =
  | { type: 'success' }
  | {
      type: 'failure'
      nextAction: NextAction
      details: unknown
    }
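For what it's worth, the runner itself is nothing clever: roughly a fail-fast loop like the sketch below. It assumes the Step type above; the WorkflowResult shape is made up for illustration, and the real version is wired slightly differently (it builds the context itself and formats the final result for Claude).

// Rough sketch: run the steps in order and stop at the first failure.
// No LLM is involved in this part at all.
type WorkflowResult =
  | { status: 'success' }
  | { status: 'failure'; failedStep: string; result: StepResult }

async function executeWorkflow(steps: Step[], ctx: WorkflowContext): Promise<WorkflowResult> {
  for (const step of steps) {
    const result = await step(ctx)
    if (result.type === 'failure') {
      // Surface the failing step's nextAction and details to the caller
      return { status: 'failure', failedStep: step.name, result }
    }
  }
  return { status: 'success' }
}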
A step looks like this:
export const verifyBuild: Step<CompleteTaskContext> = async () => {
  const result = await nx.runMany(['lint', 'typecheck', 'test'])
  if (result.failed) {
    return failure({
      type: 'fix_errors',
      details: result.output
    })
  }
  return success()
}
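The success and failure helpers are nothing more than constructors for the StepResult union. Something like this (assumed; my exact versions may differ):

// Assumed helpers: thin constructors for the StepResult union, so steps read cleanly.
const success = (): StepResult => ({ type: 'success' })

const failure = (opts: { type: NextAction; details: unknown }): StepResult => ({
  type: 'failure',
  nextAction: opts.type,
  details: opts.details,
})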
I did look for existing libraries but didn’t find anything that seemed lightweight enough. Let me know if there is something lightweight and robust already available.
Claude Code SDK
We still need AI to perform the code review, and that can’t be done in pure code (you can invoke external services like CodeRabbit via CLI, but here we want to do a local code review with our own agent first).
Instead of the main Claude Code agent spinning up subagents, TypeScript now orchestrates Claude Code agents like this:
import { query } from '@anthropic-ai/claude-agent-sdk'

const result = await query({
  prompt: opts.prompt,
  options: {
    model: opts.model,
    maxTurns: 50,
    outputFormat: {
      type: 'json_schema',
      schema: z.toJSONSchema(opts.outputSchema),
    },
  },
})
You pass in a prompt and your options, and you get a result back. The outputFormat option is extremely useful here: you can pass a JSON schema and the SDK will ensure the response from Claude is valid against it.
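For example, the reviewer agents return their findings against a schema along these lines. This is a cut-down sketch; the field names are illustrative, not my exact agentResponseSchema.

import { z } from 'zod'

// Cut-down, illustrative reviewer output schema; the real one has more fields.
const agentResponseSchema = z.object({
  verdict: z.enum(['pass', 'fail']),
  findings: z.array(
    z.object({
      file: z.string(),
      severity: z.enum(['blocker', 'warning', 'nitpick']),
      description: z.string(),
    }),
  ),
})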
So my code review step looks like this:
const reviewerNames = ['code-review', 'bug-scanner']
const results = await runReviewers(reviewerNames, filesToReview, ctx.reviewDir, ctx.taskDetails)

async function runReviewers(
  names: readonly string[],
  filesToReview: string[],
  reviewDir: string,
  taskDetails?: {
    title: string
    body: string
  },
): Promise<ReviewerResult[]> {
  return Promise.all(
    names.map(async (name) => {
      const agentPath = `.claude/agents/${name}.md`
      const basePrompt = await readAgentPrompt(agentPath)
      const reportPath = `${reviewDir}/${name}.md`

      const promptParts = [basePrompt, '\n\n## Files to Review\n\n', filesToReview.join('\n')]

      if (name === 'task-check' && taskDetails) {
        promptParts.push(
          `\n\n## Task Details\n\nTitle: ${taskDetails.title}\n\nBody:\n${taskDetails.body}`,
        )
      }

      const response = await claude.query({
        prompt: promptParts.join(''),
        model: 'sonnet',
        outputSchema: agentResponseSchema,
        outputPath: reportPath,
      })

      ...
    }),
  )
}
The runReviewers function is based on a convention: it takes a list of reviewer names, and for each one it looks in .claude/agents/ for a prompt file matching the name, which contains that reviewer's instructions.
Then it passes the prompt to the Claude SDK along with the files to review and gets the output back in the expected JSON structure. You do have to add instructions in the prompt telling Claude to return the result in that format, of course.
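The codeReview step then folds those reviewer results back into the standard StepResult shape, roughly like this. A simplified sketch: getChangedFiles, the verdict field and the fix_review_findings action are placeholders, not my exact code.

// Simplified sketch of the codeReview step. getChangedFiles, verdict and
// 'fix_review_findings' are placeholders for illustration.
export const codeReview: Step<CompleteTaskContext> = async (ctx) => {
  const filesToReview = await getChangedFiles()
  const results = await runReviewers(reviewerNames, filesToReview, ctx.reviewDir, ctx.taskDetails)

  const failed = results.filter((r) => r.verdict === 'fail')
  if (failed.length > 0) {
    return failure({ type: 'fix_review_findings', details: failed })
  }
  return success()
}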
Workflow enforcement
One key point I think is worth mentioning is that when you automate your workflow, you don’t want Claude bypassing it and calling commands directly. That’s why hooks in Claude Code are important.
For example, I have these hooks that block Claude from directly running the gh cli tool and remind it of the correct dev workflow commands to run instead:
if [[ "$command" =~ (^|[[:space:]])git[[:space:]]+push($|[[:space:]]) ]]; then
jq -n '{
"hookSpecificOutput": {
"hookEventName": "PreToolUse",
"permissionDecision": "deny",
"permissionDecisionReason": "Blocked: Direct git push bypasses required workflow. Use /complete-task command instead, which runs the complete verification pipeline (lint, test, code review, PR submission) and prevents orphaned changes."
}
}'
exit 0
fi
if [[ "$command" =~ (^|[[:space:]])gh[[:space:]]+pr($|[[:space:]]) ]]; then
jq -n '{
"hookSpecificOutput": {
"hookEventName": "PreToolUse",
"permissionDecision": "deny",
"permissionDecisionReason": "Blocked: Do not use gh pr directly. Use:\n- /complete-task - Create/update PR, run reviews, submit, check CI\n- pnpm nx run dev-workflow:get-pr-feedback - Check PR feedback and status (mergeable?)"
}
}'
exit 0
fi
Very rough
The ideas in this post are the latest evolution of my software development workflow automation. I am learning and evolving the process every week.
I'm not telling you to do this; I'm just sharing the best approach I have found so far. Conceptually, putting the orchestration in real code and shelling out to LLMs only where they are truly needed makes sense. And in practice it seems to be working.
Maybe there are tools that solve this in a better way or maybe I just need to learn how to write better prompts.
But regardless of the implementation, one thing seems inevitable to me: we’re going to see the SDLC being automated in 2026.