NLP Accuracy Tracker Sprint 3 → 4Gate: F1 ≥ 80%
6 models tracked · Claude labels DeepSeek → DeepSeek fine-tunes our models → weekly predictions improve accuracy
Scores shown as accuracy % · 🟢 ≥80% target met · 🟡 65–79% improving · 🔴 <65% needs work
⟳ Connecting…
Last updated
26 Apr 2026
Claude labels dataGround truth for all tasks
→
DeepSeek V3 trainedTeacher model on Claude labels
→
Our models fine-tunedDeepSeek + Claude labels → v0→v4
→
Weekly predictionsDeepSeek + our models predict
→
DeepSeek improves oursWeekly fine-tuning loop
Sentiment
—
—
Emotion
—
—
Topic
—
—
Intent
—
—
Toxicity
—
—
Model Version History
All 6 models · Accuracy % · ▲▼ = change vs previous version
≥80%
65–79%
<65%
| Model | Date | Sentiment % | Emotion % | Topic % | Intent % | Toxicity % | Avg % | vs v0 | Benchmark | Notes |
|---|
Sprint 4 Gate
Target: all tasks ≥ 80%
Timeline
Newest first
Accuracy Over Time — All Tasks
How each task improved across model versions
Version vs Version — Avg Accuracy
Green = improved · Red = declined
Sentiment Progress
v0 → latest
Emotion Progress
Hardest task · v0 → latest
Topic + Intent Progress
v0 → latest
Toxicity Progress
Target: 95%
Model Accuracy Solar System
Each planet = one model version · Orbit radius = avg accuracy · Size = tasks completed · Hover for details
Hover any planet to see model details · Planets orbit faster as accuracy improves
Pipeline Diagram — Phase 1, 1.5 & 2: Label → Diversify → Train
Click any node to expand · From the Sprint 4 Training Strategy document
Hover any node for details
Pipeline Diagram — Phase 3: Weekly Live Loop (Every Monday)
Three-tier routing system · Animated data flow · Click nodes to drill down
Hover any node for details