NLP Accuracy Tracker Sprint 3 → 4Gate: F1 ≥ 80%

6 models tracked  ·  Claude labels DeepSeek → DeepSeek fine-tunes our models → weekly predictions improve accuracy
Scores shown as accuracy %  ·  🟢 ≥80% target met  ·  🟡 65–79% improving  ·  🔴 <65% needs work

⟳ Connecting…
Last updated
26 Apr 2026
🏷️
Claude labels dataGround truth for all tasks
🤖
DeepSeek V3 trainedTeacher model on Claude labels
🔬
Our models fine-tunedDeepSeek + Claude labels → v0→v4
📊
Weekly predictionsDeepSeek + our models predict
⬆️
DeepSeek improves oursWeekly fine-tuning loop
Sentiment
Emotion
Topic
Intent
Toxicity
Model Version History
All 6 models · Accuracy % · ▲▼ = change vs previous version
≥80% 65–79% <65%
Model Date Sentiment % Emotion % Topic % Intent % Toxicity % Avg % vs v0 Benchmark Notes
Sprint 4 Gate
Target: all tasks ≥ 80%
Timeline
Newest first
Accuracy Over Time — All Tasks
How each task improved across model versions
Version vs Version — Avg Accuracy
Green = improved · Red = declined
Sentiment Progress
v0 → latest
Emotion Progress
Hardest task · v0 → latest
Topic + Intent Progress
v0 → latest
Toxicity Progress
Target: 95%
Model Accuracy Solar System
Each planet = one model version · Orbit radius = avg accuracy · Size = tasks completed · Hover for details
Hover any planet to see model details · Planets orbit faster as accuracy improves
Pipeline Diagram — Phase 1, 1.5 & 2: Label → Diversify → Train
Click any node to expand · From the Sprint 4 Training Strategy document
Hover any node for details
Pipeline Diagram — Phase 3: Weekly Live Loop (Every Monday)
Three-tier routing system · Animated data flow · Click nodes to drill down
Hover any node for details
>