Case Studies

Real Projects, Written Up Properly

A handful of recent annotation projects, written up in the depth our model teams actually want: the labeling schema we used, the workflow that produced the data, the numbers we hit, and the research that shaped each approach.

Clients are anonymized under NDA. The numbers describe what shipped, not what we hoped to ship at kickoff.

Generative Video / Image Quality Assessment

Video Quality, IQA / VQA, Generative Video, Pairwise Preference

Subjective Video Quality Scoring at 98% Agreement for a Generative Video Model Team

A cross-continental rater pool across the USA, UK, India, and Bangladesh scored 16,000 model-generated videos on noise, sharpness, exposure, color, and overall quality. Pairwise preferences across A/B variants, together with free-text reasoning, fed reward modeling. Because the brief was subjective, hitting the 98% agreement target was the hard part.

Volume: 16,000 videos, ≈80,000 A/B pairs, ≈640,000 metric judgments
Duration: 12 weeks
Published: 2025-06


Document AI / Financial Services

PDF Extraction, Document AI, LayoutLM, Tables

Structured Extraction From 50,000 Financial Documents for a Document AI Vendor

How a layered annotation pipeline modeled on LayoutLMv3 and TableFormer raised field-level extraction accuracy from 71% to 94.3% on invoices, contracts, and bank statements.

Volume: 50,000 mixed PDFs
Duration: 8 weeks
Published: 2025-04


Robotics / Imitation Learning

Robotics, Imitation Learning, Action Segmentation, Multi-Camera

Action Trajectory Labeling for a Robotics Lab Training Manipulation Policies

Fine-grained per-frame action segmentation across 220,000 multi-camera frames raised held-out task success from 41% to 73% on a 7-DOF arm. The annotation schema drew from RT-1 and Open X-Embodiment.

Volume: 220,000 multi-camera frames
Duration: 11 weeks
Published: 2025-03


Healthcare / Medical Imaging

Medical Imaging, Histopathology, Segmentation, HIPAA

Whole-Slide Pathology Annotation for a Histopathology AI Vendor

Board-certified pathologists annotated 8,400 whole-slide images for tumor region segmentation and nuclei instance labeling, narrowing the model's hospital-by-hospital performance gap from 18% to 4%.

Volume: 8,400 whole-slide images (≈120 GB)
Duration: 14 weeks
Published: 2025-02


Agentic AI / AI Safety Evaluation

Agentic AI, AI Safety Eval, Decision Quality, Incident Response

Decision-Quality Annotation for an Agentic AI in Security Incident Response

Per-attribute appropriateness and visibility labels across 1,200 scenarios separated principled signal use from organizational pressure for an incident-commander agent. The result was a labeled benchmark the client used to train and evaluate decision behavior at scale.

Volume: 1,200 scenarios, ≈16,800 attribute annotations
Duration: 9 weeks
Published: 2025-05


Robotics / Vision-Language Foundation Models

Robotics, Vision-Language-Action, Video Annotation, Multi-View

Scaling Multi-View Robotic Video Annotation From a Manual Process to a 1,000-Hour Ramp

How a managed annotation pipeline replaced engineer-led labeling for a robotic foundation model team, hitting October readiness for a November training ramp on 1,000 hours of multi-view video with action, object, and spatial language labels.

Volume: 1,000 hours of multi-view robotic video
Duration: 8 weeks (pilot + ramp, October readiness for November training)
Published: 2025-10


Have a project like these?

Share a sample task or project brief. We will recommend the right workflow, expert team, timeline, and pricing model.
