Build Your Own Pipeline
Conceptual guide for designing file discovery and plan generation workflows.
This guide distills the key architectural patterns from PlanToCode into a conceptual blueprint. Whether you want to build a similar system or understand why certain design decisions were made, this document covers the foundational patterns you can reuse or adapt.
Pipeline architecture map
Overview of the multi-stage pipeline from task input to plan output.
Key Architectural Patterns
Job Queue Pattern
All LLM-backed operations run as background jobs with status tracking, cancellation support, and retry logic. Jobs are persisted to SQLite so state survives app restarts.
Benefits
- Decouples UI responsiveness from LLM latency
- Enables cancellation mid-stream
- Provides audit trail of all operations
- Supports retry with exponential backoff
Pitfalls to Avoid
- Job status management adds complexity
- Stale jobs need careful handling on restart
- Stream accumulation can consume memory for large responses
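To make the pattern concrete, here is a minimal sketch of a persisted job record with status tracking and cancellation, assuming the better-sqlite3 package; the table layout, column names, and status values are illustrative rather than PlanToCode's actual schema.

```typescript
// Minimal job-queue persistence sketch (assumed schema, better-sqlite3).
import Database from "better-sqlite3";

type JobStatus = "queued" | "running" | "completed" | "failed" | "cancelled";

interface Job {
  id: string;
  kind: string;      // e.g. "file_discovery" or "plan_generation"
  payload: string;   // JSON-encoded prompt inputs
  status: JobStatus;
  attempts: number;
  result?: string;   // accumulated streamed response
}

const db = new Database("jobs.db");
db.exec(`CREATE TABLE IF NOT EXISTS jobs (
  id TEXT PRIMARY KEY, kind TEXT, payload TEXT,
  status TEXT, attempts INTEGER DEFAULT 0, result TEXT)`);

// Enqueue: persist first, then notify listeners, so state survives restarts.
function enqueue(job: Job): void {
  db.prepare(
    "INSERT INTO jobs (id, kind, payload, status, attempts) VALUES (?, ?, ?, 'queued', 0)"
  ).run(job.id, job.kind, job.payload);
}

// Cancellation is just a status transition; processors check it between chunks.
function cancel(id: string): void {
  db.prepare(
    "UPDATE jobs SET status = 'cancelled' WHERE id = ? AND status IN ('queued','running')"
  ).run(id);
}
```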
Workflow Orchestrator Pattern
Multi-stage workflows are coordinated by an orchestrator that schedules stages sequentially, passes intermediate data between them, and handles failures at any stage.
Components
- Definition loader reads workflow JSON specs
- Stage scheduler dispatches stages in order
- Payload builder constructs inputs from prior outputs
- Event emitter publishes progress for UI updates
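A minimal sketch of the scheduler loop is shown below; the StageDefinition shape and event names are assumptions standing in for stages loaded from the JSON specs.

```typescript
// Sequential stage scheduler sketch (assumed shapes, not PlanToCode's actual types).
interface StageDefinition {
  name: string;
  buildPayload: (prior: Record<string, unknown>) => unknown; // payload builder
  run: (payload: unknown) => Promise<unknown>;               // dispatches a job
}

type ProgressEvent = { stage: string; status: "started" | "completed" | "failed" };

async function runWorkflow(
  stages: StageDefinition[],
  emit: (event: ProgressEvent) => void
): Promise<Record<string, unknown>> {
  const outputs: Record<string, unknown> = {};
  for (const stage of stages) {
    emit({ stage: stage.name, status: "started" });
    try {
      // Each stage sees the accumulated outputs of all prior stages.
      outputs[stage.name] = await stage.run(stage.buildPayload(outputs));
      emit({ stage: stage.name, status: "completed" });
    } catch (err) {
      emit({ stage: stage.name, status: "failed" });
      throw err; // a failure at any stage halts the workflow
    }
  }
  return outputs;
}
```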
Repository Pattern
All persistence goes through typed repositories that abstract SQLite operations. This provides a clean API, enables testing, and centralizes database access.
Benefits
- Parameterized, typed queries prevent SQL injection
- Repositories can be mocked for testing
- Centralized query optimization
- Consistent error handling
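A sketch of one such repository, again assuming better-sqlite3; the Session shape and table name are illustrative.

```typescript
// Typed repository sketch over better-sqlite3 (assumed Session shape and table).
import Database from "better-sqlite3";

interface Session {
  id: string;
  taskDescription: string;
  createdAt: string;
}

class SessionRepository {
  constructor(private db: Database.Database) {}

  // Parameterized statements keep user input out of the SQL text itself.
  insert(session: Session): void {
    this.db
      .prepare("INSERT INTO sessions (id, task_description, created_at) VALUES (?, ?, ?)")
      .run(session.id, session.taskDescription, session.createdAt);
  }

  findById(id: string): Session | undefined {
    const row = this.db
      .prepare("SELECT id, task_description, created_at FROM sessions WHERE id = ?")
      .get(id) as { id: string; task_description: string; created_at: string } | undefined;
    if (!row) return undefined;
    return { id: row.id, taskDescription: row.task_description, createdAt: row.created_at };
  }
}
```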
Pipeline Steps
1. Define your task model
Start by defining what constitutes a task in your system. PlanToCode uses sessions with task descriptions, file selections, and model preferences.
Store task metadata in a dedicated table with versioning for history tracking.
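One possible shape for the task model and its versioned history table is sketched below; all field and table names are assumptions, not PlanToCode's actual schema.

```typescript
// Illustrative task model plus a versioned metadata table (assumed names).
interface TaskSession {
  id: string;
  description: string;       // the task in the user's words
  selectedFiles: string[];   // explicit file selections
  model: string;             // preferred model for this session
  version: number;           // bumped on every edit for history tracking
}

const createTables = `
CREATE TABLE IF NOT EXISTS sessions (
  id TEXT PRIMARY KEY,
  description TEXT NOT NULL,
  model TEXT NOT NULL
);
-- Each edit appends a row here instead of overwriting the session.
CREATE TABLE IF NOT EXISTS session_versions (
  session_id TEXT REFERENCES sessions(id),
  version INTEGER NOT NULL,
  description TEXT NOT NULL,
  selected_files TEXT NOT NULL, -- JSON array of paths
  created_at TEXT DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (session_id, version)
);`;
```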
2. Build the job queue
Create a job queue that persists jobs to storage, emits status events, and supports cancellation. Jobs should track prompts, responses, tokens, and cost.
Use a semaphore-based concurrency limiter to control parallel LLM requests.
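A minimal semaphore-based limiter might look like the following sketch; the concurrency limit of three is an arbitrary example.

```typescript
// Minimal semaphore sketch for capping concurrent LLM requests.
class Semaphore {
  private waiters: Array<() => void> = [];
  constructor(private available: number) {}

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next();         // hand the slot directly to a waiter
    else this.available++;
  }
}

const llmSlots = new Semaphore(3); // e.g. at most 3 requests in flight

async function withSlot<T>(work: () => Promise<T>): Promise<T> {
  await llmSlots.acquire();
  try {
    return await work();
  } finally {
    llmSlots.release();
  }
}
```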
3. Implement processors
Each job type needs a processor that builds prompts, calls the LLM, and parses responses. Use streaming for long outputs.
Processors should be stateless and receive all context through job parameters.
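A sketch of a streaming processor under these constraints; `callModelStream` and the parameter shape are hypothetical placeholders for your own client.

```typescript
// Stateless processor sketch: all context arrives via job parameters, and the
// streamed response is accumulated chunk by chunk.
interface PlanJobParams {
  prompt: string;
  model: string;
  signal: AbortSignal; // set when the user cancels the job
}

async function processPlanJob(
  params: PlanJobParams,
  callModelStream: (prompt: string, model: string) => AsyncIterable<string>,
  onChunk: (text: string) => void
): Promise<string> {
  let accumulated = "";
  for await (const chunk of callModelStream(params.prompt, params.model)) {
    if (params.signal.aborted) {
      throw new Error("Job cancelled"); // check cancellation between chunks
    }
    accumulated += chunk;
    onChunk(chunk); // stream progress to the UI via events
  }
  return accumulated; // persisted as the job's result
}
```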
4. Create the workflow orchestrator
For multi-stage workflows, build an orchestrator that schedules stages, manages intermediate data, and handles failures.
Store workflow definitions as JSON for easy modification without code changes.
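An illustrative workflow definition, shown here as a typed object literal but stored as JSON on disk; the stage ids and job kinds are invented for the example.

```typescript
// Assumed shape of a workflow definition spec (not PlanToCode's actual format).
const planWorkflow = {
  name: "generate_plan",
  stages: [
    { id: "discover_files", jobKind: "file_discovery",  inputs: ["task_description"] },
    { id: "rank_files",     jobKind: "file_ranking",    inputs: ["discover_files"] },
    { id: "write_plan",     jobKind: "plan_generation", inputs: ["task_description", "rank_files"] }
  ]
} as const;
```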
5. Add the routing layer
Route LLM requests through a server proxy that normalizes payloads, manages API keys, and tracks usage.
Keep provider credentials on the server; never embed them in desktop clients.
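A minimal proxy route sketch, assuming Express and Node's built-in fetch; the provider URL, header names, and environment variable are placeholders.

```typescript
// Server-side proxy sketch: normalize the payload, attach credentials, track usage.
import express from "express";

const app = express();
app.use(express.json());

app.post("/api/llm", async (req, res) => {
  // Normalize the client payload into the provider's request shape.
  const { model, messages } = req.body;

  const upstream = await fetch("https://api.example-provider.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // The key lives only in the server environment, never in the desktop client.
      Authorization: `Bearer ${process.env.PROVIDER_API_KEY}`,
    },
    body: JSON.stringify({ model, messages }),
  });

  const data = await upstream.json();
  console.log("tokens used:", data.usage); // usage tracking hook
  res.json(data);
});
```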
Architecture Decisions
Should you use a local database or server-side storage?
Use local SQLite for job state and artifacts. This enables offline operation and fast queries. Sync to server only for billing and cross-device state.
Streaming vs non-streaming responses?
Use streaming for plan generation and any output shown progressively. Use non-streaming for short transformations like text improvement.
How to handle LLM provider failures?
Implement automatic retry with exponential backoff. Consider a fallback provider like OpenRouter for resilience.
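A small retry helper sketch; the delay and attempt defaults are illustrative.

```typescript
// Retry with exponential backoff: 500ms, 1000ms, 2000ms, ... between attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError; // all attempts exhausted; caller may fall back to another provider
}
```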
Where should file content be loaded?
Load file content in the processor just before building the prompt. This ensures fresh content and avoids storing large blobs in job records.
What to Customize vs Reuse
Customize
- Prompt templates for your specific use case
- File discovery patterns for your project types
- Output format (XML, JSON, Markdown)
- Model selection per task type
Reuse
- Job queue architecture with status tracking
- Workflow orchestrator pattern
- Repository pattern for persistence
- Streaming response handling
- Provider routing and normalization
Common Pitfalls to Avoid
Embedding API keys in the client
Route all LLM requests through a server proxy that manages credentials securely.
Not persisting job state
Store every job with full prompt and response for audit and recovery.
Blocking UI on LLM calls
Use background jobs with event-driven UI updates for responsive interfaces.
Ignoring token limits
Estimate tokens before sending and chunk large inputs to stay within context windows.
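A rough sketch of estimation and chunking; the four-characters-per-token ratio is a heuristic assumption, not a provider-accurate tokenizer.

```typescript
// Heuristic token estimate and a simple chunker that keeps files whole where possible.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function chunkByTokenBudget(
  files: { path: string; content: string }[],
  budget: number
): { path: string; content: string }[][] {
  const chunks: { path: string; content: string }[][] = [];
  let current: { path: string; content: string }[] = [];
  let used = 0;
  for (const file of files) {
    const cost = estimateTokens(file.content);
    if (used + cost > budget && current.length > 0) {
      chunks.push(current); // start a new chunk once the budget is exceeded
      current = [];
      used = 0;
    }
    current.push(file);
    used += cost;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```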
No cancellation support
Check cancellation flags between streaming chunks and propagate to server.
Artifacts to Persist
- Full prompt sent to the LLM (for debugging and audit)
- Complete response including streaming accumulation
- Token counts from provider response
- Computed cost based on model pricing
- System prompt template identifier for versioning
- Workflow intermediate data for multi-stage flows
Implementation Notes
- Use SQLite with WAL mode for concurrent read/write access
- Implement graceful shutdown that marks running jobs as failed
- Add health checks for external dependencies before job processing
- Log all LLM errors with full context for debugging
- Consider caching file content with short TTL to avoid redundant reads
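A sketch tying a few of these notes together, assuming better-sqlite3: enabling WAL at startup, failing stale jobs left over from a crash, and marking running jobs as failed on shutdown.

```typescript
// Startup/shutdown hygiene sketch (assumed jobs table from the earlier examples).
import Database from "better-sqlite3";

const db = new Database("jobs.db");
db.pragma("journal_mode = WAL"); // readers no longer block the writer

// Stale-job cleanup on startup: anything still "running" did not survive the restart.
db.prepare("UPDATE jobs SET status = 'failed' WHERE status = 'running'").run();

// Graceful shutdown: fail in-flight jobs rather than leaving them dangling.
process.on("SIGTERM", () => {
  db.prepare("UPDATE jobs SET status = 'failed' WHERE status = 'running'").run();
  db.close();
  process.exit(0);
});
```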