Claude Code for Promptfoo (2026)
The Setup
You are using Promptfoo for systematic LLM prompt evaluation: testing prompts against multiple models with assertion-based grading. Promptfoo helps you catch prompt regressions, compare model performance, and document prompt quality. Claude Code can write Promptfoo configurations, but without guidance it tends to create one-off scripts instead of structured evaluation configs.
What Claude Code Gets Wrong By Default
- Tests prompts manually with one-off scripts. Claude writes Node.js scripts that call the API and print results. Promptfoo provides a structured YAML config with test cases, assertions, and comparison views.
- Evaluates against a single model. Claude tests with one provider. Promptfoo runs the same prompts against multiple models simultaneously for side-by-side comparison.
- Skips assertion-based grading. Claude visually inspects output. Promptfoo supports assertions such as `contains`, `icontains`, `javascript`, `llm-rubric`, and `similar` for automated quality checks.
- Does not track prompt versions. Claude overwrites prompts without versioning. Promptfoo evaluations are reproducible: the YAML config serves as version-controlled prompt documentation.
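For orientation, a structured config covering the points above might look like the following minimal sketch. The prompt text, test input, and model IDs are assumptions chosen for illustration; substitute the providers you actually have access to.

```yaml
# promptfooconfig.yaml (illustrative sketch; prompt text and model IDs are assumptions)
description: Summarization prompt check

prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  # Every provider runs against every prompt and test case
  - openai:gpt-4
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      text: "Promptfoo runs prompts against several models and grades each output with assertions."
    assert:
      # Case-insensitive substring check
      - type: icontains
        value: promptfoo
      # JavaScript expression evaluated against the model output
      - type: javascript
        value: output.length < 300
```

Because the whole evaluation lives in one file, committing it to version control records both the prompt and the criteria it was judged against.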
The CLAUDE.md Configuration
# Promptfoo LLM Evaluation
## Eval Framework
- Tool: Promptfoo (prompt testing and evaluation)
- Config: promptfooconfig.yaml at project root
- CLI: npx promptfoo eval, npx promptfoo view
## Promptfoo Rules
- Config in promptfooconfig.yaml (YAML format)
- Providers: list of models to test against
- Prompts: template strings with {{variable}} placeholders
- Tests: array of test cases with vars and assertions
- Assertions: contains, icontains, javascript, llm-rubric, similar
- Run: npx promptfoo eval (executes all tests)
- View: npx promptfoo view (opens comparison UI)
- Cache: results cached by default for reproducibility
## Conventions
- promptfooconfig.yaml committed to version control
- Test cases cover edge cases and expected behaviors
- Use llm-rubric for subjective quality evaluation
- Compare models: claude, gpt-4, llama in providers list
- Variable datasets for comprehensive testing
- Share results with npx promptfoo share
- CI integration: npx promptfoo eval --no-cache in pipeline
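The CI rule in the CLAUDE.md above could translate into a pipeline job along these lines. This is a sketch that assumes GitHub Actions, which the guide does not specify; the workflow file name, trigger paths, Node version, and secret names are all hypothetical and should be adapted to your setup.

```yaml
# .github/workflows/prompt-eval.yml (hypothetical file name; GitHub Actions is an assumption)
name: Prompt evaluation

on:
  pull_request:
    paths:
      - promptfooconfig.yaml

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      # --no-cache forces fresh model calls instead of reusing cached results
      - name: Run Promptfoo evaluation
        run: npx --yes promptfoo@latest eval --no-cache
        env:
          # Secret names are assumptions; supply the keys your providers need
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

Restricting the trigger to changes in the config file keeps API spend down, at the cost of missing regressions caused by model updates alone.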
Workflow Example
You want to evaluate a customer support prompt across models. Prompt Claude Code:
“Create a Promptfoo config that tests a customer support agent prompt across Claude, GPT-4, and Llama. Include 5 test cases covering complaint handling, refund requests, and technical support. Add assertions for tone (polite), accuracy (mentions policy), and response length.”
Claude Code should create promptfooconfig.yaml with three providers, the system prompt as a template, five test cases with vars for different customer messages, and assertions using contains for policy references, javascript for length checks, and llm-rubric for tone evaluation.
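The result might resemble the sketch below. The store scenario, the 30-day policy wording, the customer messages, the length limit, and the model IDs are all assumptions made for illustration, not output captured from Claude Code.

```yaml
# promptfooconfig.yaml (sketch; scenario, policy wording, and model IDs are assumptions)
description: Customer support agent prompt

prompts:
  - |
    You are a customer support agent for an online store.
    Be polite, and cite the 30-day refund policy when it is relevant.
    Customer message: {{message}}

providers:
  - anthropic:messages:claude-3-5-sonnet-20241022
  - openai:gpt-4
  - ollama:chat:llama3

# Assertions applied to every test case
defaultTest:
  assert:
    # Tone is subjective, so it is graded by a model
    - type: llm-rubric
      value: The response is polite and professional in tone
    # Response length check
    - type: javascript
      value: output.length <= 1200

tests:
  - description: Complaint about a late delivery
    vars:
      message: "My order is two weeks late and nobody has replied to my emails."
    assert:
      - type: llm-rubric
        value: Acknowledges the delay and apologizes
  - description: Refund request inside the policy window
    vars:
      message: "I bought headphones 10 days ago and want a refund."
    assert:
      - type: contains
        value: "30-day"
  - description: Refund request outside the policy window
    vars:
      message: "I bought a keyboard 60 days ago and want my money back."
    assert:
      - type: contains
        value: "30-day"
  - description: Technical support for a login failure
    vars:
      message: "I cannot log in even after resetting my password."
    assert:
      - type: llm-rubric
        value: Gives concrete troubleshooting steps
  - description: Hostile complaint (edge case)
    vars:
      message: "This is the worst service I have ever used. Fix my account now."
    assert:
      - type: llm-rubric
        value: Stays calm and does not mirror the hostile tone
```

Putting the tone and length checks in `defaultTest` avoids repeating them in each test case. Note that `llm-rubric` needs a grading model, so the evaluation requires credentials for that grader in addition to the providers under test.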
Common Pitfalls
- Missing variable syntax in prompts. Claude uses `${variable}` JavaScript template literals. Promptfoo uses `{{variable}}` Mustache-style syntax in prompt templates. With the wrong syntax the variables are never substituted, the model receives the literal placeholder, and loosely written assertions can still pass.
- Over-relying on exact match assertions. Claude uses `equals` assertions for LLM output. LLM responses vary between runs. Use `contains`, `icontains`, or `similar` for fuzzy matching, and `llm-rubric` for semantic evaluation.
- Cache confusion during development. Claude modifies prompts but gets old results. Promptfoo caches by default. Use `npx promptfoo eval --no-cache` when iterating on prompts, or clear the cache with `npx promptfoo cache clear`.
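The first two pitfalls can be illustrated with a config fragment like the one below. It is a sketch only: the translation prompt, the reference answer, and the similarity threshold are assumptions, and the providers list is omitted for brevity.

```yaml
# Fragment of promptfooconfig.yaml (illustrative; prompt, values, and threshold are assumptions)
prompts:
  # Wrong: "Translate to French: ${text}" would reach the model with ${text} unreplaced
  - "Translate to French: {{text}}"

tests:
  - vars:
      text: Good morning
    assert:
      # Brittle: fails on any variation in wording or punctuation
      # - type: equals
      #   value: Bonjour
      # Tolerates extra words and differences in case
      - type: icontains
        value: bonjour
      # Embedding similarity against a reference answer
      - type: similar
        value: Bonjour
        threshold: 0.8
```

The `similar` assertion relies on an embedding model, so it needs credentials for an embedding provider in addition to the models under test.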
Common Questions
How do I get started with Claude Code for Promptfoo?
Begin with the setup instructions in this guide. Install the required dependencies, configure your environment, and test with a small project before scaling to your full codebase.
What are the prerequisites?
You need a working development environment with Node.js or Python installed. Familiarity with the command line and basic Git operations is helpful. No advanced AI knowledge is required.
Can I use this with my existing development workflow?
Yes. These techniques integrate with standard development tools and CI/CD pipelines. Start by adding them to a single project and expand once you have verified the benefits.
Where can I find more advanced techniques?
The Claude Code documentation and community forums provide advanced patterns and real-world case studies.