MCPMark Benchmarks (2026)

What It Does

MCPMark is a benchmarking framework for evaluating MCP server performance, including token efficiency per operation. The benchmarks reveal that MCP servers vary dramatically in token cost for equivalent operations – some servers consume 3-8x more tokens than others for the same task. These measurements help developers choose the most token-efficient MCP servers and identify optimization targets in custom servers.

Installation / Setup

# MCPMark is available as an open-source benchmarking tool
npm install -g mcpmark
# Run benchmarks against a specific MCP server
mcpmark run --server "npx @modelcontextprotocol/server-filesystem /tmp" --output results.json
# Compare multiple servers
mcpmark compare results-server-a.json results-server-b.json

Configuration for Cost Optimization

The primary optimization insight from MCPMark is which MCP servers to use and how to configure them for minimal token overhead.

# CLAUDE.md -- MCP selection based on benchmark data
## MCP Server Selection
- Use servers with tool definitions under 1,000 tokens each
- Prefer servers that return structured JSON over verbose text
- Avoid servers with more than 8 tool definitions (overhead exceeds 8,000 tokens)
- For database access: direct SQL via Bash often costs fewer tokens than MCP

Usage Examples

Basic Usage

# Benchmark a filesystem MCP server
mcpmark run \
  --server "npx @modelcontextprotocol/server-filesystem /tmp" \
  --tasks read,write,list \
  --output fs-benchmark.json
# Output includes:
# - Tokens per tool definition
# - Tokens per operation (read, write, list)
# - Response size distribution
# - Error rate and retry costs

Advanced: Cost-Saving Pattern

Use benchmark data to choose between competing MCP servers for the same capability.

# Example benchmark comparison: two database MCP servers
Server A (generic SQL MCP):
  Tool definitions: 6 tools, 7,200 tokens total
  Average query response: 1,800 tokens
  Schema introspection: 3,500 tokens
  Per-session overhead (20 turns): 144,000 tokens
Server B (optimized Postgres MCP):
  Tool definitions: 3 tools, 2,100 tokens total
  Average query response: 600 tokens (structured JSON)
  Schema introspection: 800 tokens
  Per-session overhead (20 turns): 42,000 tokens
  Server B saves 102,000 tokens per session
  At Opus rates: $1.53 saved per session
  Monthly (20 sessions): $30.60 saved
// Choose Server B configuration
{
  "mcpServers": {
    "database": {
      "command": "npx",
      "args": ["-y", "optimized-postgres-mcp"],
      "allowedTools": ["query", "schema", "explain"]
    }
  }
}

Token Usage Measurements

Key findings from MCPMark benchmarks across common MCP server categories:

MCP Server Category Avg Tool Definition Size Avg Response Size Tools Exposed Session Overhead (20 turns)
Filesystem (basic) 800 tokens/tool 500 tokens 8 128,000 tokens
Filesystem (optimized) 400 tokens/tool 200 tokens 4 32,000 tokens
Database (generic) 1,200 tokens/tool 1,800 tokens 6 144,000 tokens
Database (optimized) 700 tokens/tool 600 tokens 3 42,000 tokens
Git/GitHub 1,500 tokens/tool 1,200 tokens 12 360,000 tokens
Git/GitHub (filtered) 1,500 tokens/tool 1,200 tokens 3 90,000 tokens
The token efficiency gap across implementations:
Best case (optimized filesystem, 4 tools):
  Definition overhead: 1,600 tokens/turn
  20-turn session: 32,000 tokens
  Opus cost: $0.48
Worst case (unfiltered GitHub, 12 tools):
  Definition overhead: 18,000 tokens/turn
  20-turn session: 360,000 tokens
  Opus cost: $5.40
  Gap: 11.25x more expensive for MCP overhead alone

Comparison with Alternatives

Evaluation Method Measures Effort Accuracy
MCPMark benchmarks Token count, latency, error rate Low (automated) High (real measurements)
Manual /cost tracking Session-level totals Medium (manual per session) Medium (no per-tool breakdown)
Documentation review Claimed tool count Low Low (claims vs reality)
Custom token logging Per-call token counts High (requires instrumentation) High

Troubleshooting

Benchmark results vary between runs – MCP server response sizes can vary based on data state. Run benchmarks 3 times and average the results. Use consistent test data for reproducible measurements.

MCPMark not detecting all tools – Some MCP servers expose tools dynamically. Use mcpmark run --discover to enumerate all available tools before benchmarking.

Results do not match real-world usage – Benchmarks measure isolated tool calls. Real-world sessions include context re-sending, which multiplies the tool definition overhead by the number of turns. Multiply benchmark definition sizes by expected session length for accurate projections.

Configure it → Build your MCP config with our MCP Config Generator.

Estimate tokens → Calculate your usage with our Token Estimator.

Try it: Estimate your monthly spend with our Cost Calculator.

Common Questions

How do I get started with mcpmark benchmarks?

Begin with the setup instructions in this guide. Install the required dependencies, configure your environment, and test with a small project before scaling to your full codebase.

What are the prerequisites?

You need a working development environment with Node.js or Python installed. Familiarity with the command line and basic Git operations is helpful. No advanced AI knowledge is required.

Can I use this with my existing development workflow?

Yes. These techniques integrate with standard development tools and CI/CD pipelines. Start by adding them to a single project and expand once you have verified the benefits.

Where can I find more advanced techniques?

Explore the related resources below for deeper coverage. The Claude Code documentation and community forums also provide advanced patterns and real-world case studies.