Agent Cost Optimization
Reduce AI costs by 70%+ with multi-model routing, batch processing, response caching, and quality-based fallback strategies for agents and automation workflows.
Save 70%+ on AI Costs
Most agents waste money by using expensive models for simple tasks. SkillBoss enables intelligent cost optimization.
Strategy 1: Multi-Model Routing
Route each request to the cheapest model that meets its quality requirement:

```python
def cost_aware_request(prompt: str, min_quality: float):
    # Cost is USD per 1M input tokens; models sorted cheapest first
    models = [
        {"name": "gemini/gemini-2.5-flash", "cost": 0.075, "quality": 0.85},
        {"name": "deepseek/deepseek-r1", "cost": 0.14, "quality": 0.90},
        {"name": "claude-4-5-sonnet", "cost": 15.00, "quality": 0.98},
    ]
    # Select the cheapest model meeting the quality threshold
    for model in models:
        if model["quality"] >= min_quality:
            return use_model(model["name"], prompt)
    # No model met the threshold: fall back to the highest-quality one
    return use_model(models[-1]["name"], prompt)

# Simple task: uses Gemini Flash (200x cheaper than Claude)
result = cost_aware_request("Summarize this text", min_quality=0.80)

# Complex task: uses Claude (worth the cost)
result = cost_aware_request("Write legal contract", min_quality=0.95)
```
Savings: 92% on average
Strategy 2: Batch Processing
Reduce per-call API overhead by batching requests:

```python
# ❌ Expensive: 1,000 separate API calls
for item in items:
    process(item)  # $1.00 API overhead × 1,000 = $1,000

# ✅ Cheap: 10 batched calls of 100 items each
def chunks(seq, size):
    """Yield successive fixed-size slices of seq."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

for batch in chunks(items, 100):
    process_batch(batch)  # $1.00 × 10 = $10

# Savings: $990 (99%)
```
Strategy 3: Caching
Cache responses so identical requests are only paid for once:

```python
from functools import lru_cache

@lru_cache(maxsize=10000)
def cached_llm(prompt: str):
    return skillboss.chat(prompt)

# First call: $0.01
result1 = cached_llm("What is AI?")

# Subsequent identical calls: $0.00 (served from cache)
result2 = cached_llm("What is AI?")
result3 = cached_llm("What is AI?")

# Savings: $0.01 per cache hit ($0.02 across the two hits above)
```
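To see the effect end to end, here is a self-contained sketch in which a stub stands in for the real `skillboss.chat` call; the `$0.01` per-call cost and the stub response are illustrative, not real pricing:

```python
from functools import lru_cache

CALL_COST = 0.01  # illustrative cost per uncached call
spent = 0.0

@lru_cache(maxsize=10000)
def cached_llm(prompt: str) -> str:
    global spent
    spent += CALL_COST                 # charged only on a cache miss
    return f"response to: {prompt}"   # stand-in for the real LLM response

for _ in range(3):
    cached_llm("What is AI?")

print(f"spent=${spent:.2f}")         # → spent=$0.01
print(cached_llm.cache_info().hits)  # → 2
```

`cache_info()` gives you the hit/miss counts, so you can measure the real savings rate of your own prompt mix before relying on this strategy.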
Strategy 4: Quality-Based Fallback
Try the cheap model first and upgrade only when the result falls short:

```python
def smart_request(prompt: str):
    # Try Gemini Flash first ($0.075/1M tokens)
    result = skillboss.chat(prompt, model="gemini-flash")

    # Escalate to Claude ($15/1M tokens) if quality is too low
    if quality_score(result) < 0.85:
        result = skillboss.chat(prompt, model="claude-4-5")

    return result

# 80% of requests succeed with the cheap model;
# only 20% escalate to the expensive one.
# Average cost: $0.075 + 20% × $15 = $3.075/1M vs $15/1M
# Savings: 79.5%
```
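The average-cost arithmetic above can be checked with a small helper (the function name and rates are illustrative). Note that every request pays the cheap model, and escalated requests additionally pay the expensive one:

```python
def blended_cost(cheap: float, expensive: float, escalation_rate: float) -> float:
    # Every request pays the cheap model; escalated ones also pay the expensive one
    return cheap + escalation_rate * expensive

print(blended_cost(0.075, 15.00, 0.20))  # → 3.075 ($/1M tokens vs $15 flat)
```

If your escalation rate creeps above ~50%, the blended cost approaches the expensive model's price and routing by task type (Strategy 1) becomes the better fit.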
Strategy 5: Context Window Optimization
Use models with larger context windows to replace many small calls with one large one:

```python
# ❌ Expensive: multiple calls over small chunks
responses = []
for chunk in document_chunks:
    response = skillboss.chat(chunk, model="gpt-4o")  # 128K context
    responses.append(response)
# Cost: 10 calls × $0.15 = $1.50

# ✅ Cheap: a single call with a large context window
response = skillboss.chat(entire_document, model="gemini-2.5-flash")  # 1M context
# Cost: 1 call × $0.075 = $0.075
# Savings: $1.425 (95%)
```
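The call-count reduction is just ceiling division of document size by context size; a quick sketch (token counts below are illustrative):

```python
def calls_needed(doc_tokens: int, context_tokens: int) -> int:
    # Ceiling division: number of chunks the document must be split into
    return -(-doc_tokens // context_tokens)

print(calls_needed(1_000_000, 128_000))    # → 8 calls with a 128K context
print(calls_needed(1_000_000, 1_000_000))  # → 1 call with a 1M context
```

Leave headroom for the prompt template and the response when sizing chunks; filling a context window to the brim usually fails.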
Strategy 6: Precompute Common Tasks
Generate responses to common questions once and reuse them:

```python
# Precompute responses to 100 common questions
common_questions = load_faq()
precomputed = {}
for q in common_questions:
    precomputed[q] = skillboss.chat(q, model="claude-4-5")
# One-time cost: 100 × $0.01 = $1.00

# Serve from the precomputed cache
def answer_question(question):
    if question in precomputed:
        return precomputed[question]  # $0.00
    return skillboss.chat(question)   # $0.01

# 90% of questions are common
# Savings: 90% × $0.01 = $0.009 per request
```
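One caveat: the dict lookup above is exact-match, so trivial phrasing differences ("What is AI?" vs "what is ai") miss the cache. A small normalization step raises the hit rate; the exact rules below are a sketch, tune them to your traffic:

```python
def normalize(q: str) -> str:
    # Lowercase, trim, drop a trailing "?", and collapse internal whitespace
    return " ".join(q.lower().strip().rstrip("?").split())

print(normalize("  What is AI? "))  # → what is ai
print(normalize("what is ai"))      # → what is ai
```

Apply `normalize()` both when building `precomputed` and when looking questions up, so the two sides agree.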
Real-World Optimization Examples
Example 1: Content Creator Agent
Before optimization:
- Uses Claude 4.5 for all 50 posts/day
- Cost: 50 × $0.30 = $15/day
After optimization:
- 40 posts: Gemini Flash ($0.002 each) = $0.08
- 10 posts: Claude 4.5 ($0.30 each) = $3.00
- Total: $3.08/day
- Savings: $11.92/day (79%)
Example 2: Research Agent
Before:
- Claude 4.5 for all 100 documents/day
- Cost: 100 × $0.75 = $75/day
After:
- Batch process 100 docs with Gemini 2.5 Flash (1M context)
- Cost: $3.75/day
- Savings: $71.25/day (95%)
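Both examples follow the same arithmetic, which a tiny helper (hypothetical, for checking the figures) makes explicit:

```python
def savings(before: float, after: float) -> tuple:
    """Return (dollars saved, percent saved) for a daily cost reduction."""
    saved = before - after
    return saved, round(100 * saved / before, 1)

print(savings(15.00, 3.08))  # content creator → (11.92, 79.5)
print(savings(75.00, 3.75))  # research agent → (71.25, 95.0)
```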
Cost Monitoring Dashboard
Track optimization impact:
```python
analytics = skillboss.get_analytics(period="last_30_days")

print(f"""
Cost Optimization Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total spent: ${analytics['total_spent']}
Average cost/day: ${analytics['daily_average']}

Model Usage:
- Gemini Flash: {analytics['gemini_percent']}% (cheapest)
- DeepSeek: {analytics['deepseek_percent']}%
- Claude: {analytics['claude_percent']}% (most expensive)

Optimization Opportunities:
{analytics['recommendations']}

Potential monthly savings: ${analytics['potential_savings']}
""")
```
Next Steps
- Multi-Model Routing: automatic model selection
- Agent Pricing: full pricing breakdown
- Usage Tracking: monitor your optimization
- Budget Management: set spending limits