Client Success Story

Image Annotation for AI Agents: Preparing Training Data for Automated Food Order Verification Model

20,000+ Annotated Images Delivered

98% Annotation Accuracy Maintained

Service

  • Image Annotation

Platform

  • CVAT

The client

An AI Company in the Restaurant and Food Delivery Technology Sector

This client provides operational intelligence solutions for prominent restaurant chains (such as McDonald’s and KFC) by leveraging artificial intelligence and computer vision. They build AI agents that support restaurants with menu digitization, automated order verification, personalized recommendations, and similar operations. The company is recognized for driving measurable improvements in quality control and profitability across large-scale food service environments.

PROJECT REQUIREMENTS

Image Annotation Services to Train an AI Agent

The client provides AI agents to restaurant chains. These AI agents review order preparation in real time: they automatically detect missing items, document orders, and verify the accuracy of each delivery based on photos uploaded by restaurant and delivery staff. They also serve as a dispute resolution mechanism when customers raise claims or complaints.

To ensure the solution's reliable real-world performance, the client required our image annotation services.

  • Annotation and Classification: Accurate labeling of food items (burgers, fries, chicken, coffee, wraps, sauces, etc.) and their packaging to support the AI’s food recognition capabilities.
  • Polygon Segmentation: Precise polygon outlines for irregularly shaped food items, so that each item's true boundary is captured and the AI can learn to handle complex, non-standard shapes.
  • High-Volume Annotation Support: Annotation of a large dataset comprising over 20,000 images, ensuring high consistency and accuracy across all items, to create a robust dataset for training the AI agents.
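To make the polygon requirement concrete, here is a minimal sketch of how a single polygon annotation could be represented and sanity-checked. The `PolygonAnnotation` class, its field names, and the sample coordinates are illustrative assumptions; they are not the client's schema or CVAT's export format.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


@dataclass
class PolygonAnnotation:
    """Hypothetical record for one labeled item in one image."""
    label: str
    points: List[Point]  # ordered outline vertices

    def area(self) -> float:
        """Enclosed area in square pixels, computed with the shoelace formula."""
        n = len(self.points)
        total = 0.0
        for i in range(n):
            x1, y1 = self.points[i]
            x2, y2 = self.points[(i + 1) % n]  # wrap around to close the outline
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0

    def is_valid(self) -> bool:
        """A usable polygon needs at least 3 vertices and a non-zero area."""
        return len(self.points) >= 3 and self.area() > 0.0


# Illustrative data only: a 4x3 rectangle and a degenerate (collinear) outline.
fries = PolygonAnnotation("fries", [(0, 0), (4, 0), (4, 3), (0, 3)])
sliver = PolygonAnnotation("sauce", [(0, 0), (2, 2), (4, 4)])
```

A check like `is_valid()` is one simple way to catch accidental clicks or collapsed outlines before they reach review.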

PROJECT CHALLENGES

Inconsistent Image Quality and Regional Variations

While the project's objectives were clear, the path to achieving them presented significant obstacles. The dataset's real-world nature—sourced from live restaurant operations across multiple international chains—introduced layers of complexity that required careful strategy and domain expertise to overcome.

Inconsistent Photography Conditions

The dataset included images from various restaurant chains, captured under a range of lighting environments, from bright kitchen stations to dimly lit delivery bags. Some images were professional marketing photos, while others were quick smartphone captures taken by restaurant staff. This inconsistency impacted the clarity and visibility of the food items, making it challenging to annotate them.

Overlapping Items in Crowded Images

Many images showed crowded scenes, with multiple food items captured in a single photo. Items were frequently only partially visible or obscured by packaging, making it difficult to define clear boundaries for each one. Additionally, many food items were visually similar and required careful contextual differentiation (e.g., distinguishing between different types of burgers or sauce varieties) to ensure accurate annotations.

Regional Menu Variations

The dataset included images from restaurant chains operating across multiple regions, so the annotations had to account for regional differences in terminology, preparation methods, and ingredients. For example, a "McChicken" sandwich may be prepared with different sauces or toppings in different countries. Likewise, the same item might be called "fries" in the U.S. but "chips" in the U.K., or feature different sizes or cuts.

OUR SOLUTION

AI-Assisted, Human-Reviewed Polygon Segmentation

The variability that made this dataset valuable for AI training—multiple restaurant chains, diverse lighting conditions, regional menu differences—also made it exceptionally challenging to annotate. Our solution wasn't simply to apply more resources, but to develop a carefully designed workflow where technology, human expertise, and quality controls worked together to overcome these challenges. Each element of the workflow compensated for the limitations of the others, ensuring high-quality annotations and a reliable dataset for training AI models.

Here’s how we approached the solution:

1. AI-Assisted Polygon Segmentation using CVAT

To manage the large volume of images and ensure precise labeling, we used CVAT (Computer Vision Annotation Tool), an open-source tool well suited to image annotation at scale.

CVAT offers an AI-assisted polygon segmentation feature that we used to annotate irregularly shaped food items accurately. Our team also used CVAT's built-in brightness adjustment features to temporarily enhance visibility during annotation—without altering the original image data—ensuring accurate labeling even in challenging lighting scenarios while preserving data integrity for AI training.
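The idea of brightening an image for the annotator while leaving the stored data untouched can be sketched as follows. This is not CVAT's implementation; the `brightened_preview` function, the grayscale list-of-rows image format, and the 1.5 factor are assumptions chosen to keep the example self-contained.

```python
from typing import List

Image = List[List[int]]  # grayscale pixel rows, values 0-255


def brightened_preview(image: Image, factor: float) -> Image:
    """Return a brightened copy for display; the input image is not modified."""
    return [[min(255, round(px * factor)) for px in row] for row in image]


# Illustrative 2x2 image: dark pixels become easier to see, bright ones clip at 255.
original = [[10, 100], [200, 250]]
preview = brightened_preview(original, 1.5)
```

Because the function builds a new image rather than editing in place, the annotator sees the enhanced `preview` while the `original` pixels, which are what the model trains on, stay unchanged.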

2. Annotator Training by Subject Matter Experts

A specialized team of 10 annotators was assigned to the project and paired with a domain expert to ensure consistent handling of the high-volume dataset (20,000+ images). Each team member underwent comprehensive training on:

  • Restaurant-specific terminology across multiple chains (McDonald's, KFC, etc.)
  • Regional menu variations and packaging differences
  • Contextual differentiation techniques for visually similar items
  • Best practices for annotating crowded or partially obscured food items

3. Standardized Labeling Protocols and Classification Taxonomy

Given the diverse set of food items and packaging products, it was crucial to establish and follow uniform image annotation protocols. We developed a detailed labeling taxonomy for this client that accounted for the diverse food categories, packaging types, and regional variations present in the dataset, ensuring consistency across all annotators. This taxonomy included:

  • Predefined categories for standard food items (burgers, fries, chicken, coffee, wraps, sauces, etc.) and standard packaging
  • Clear guidelines for handling edge cases (obscured items, overlapping elements, poor lighting)
  • Region-specific naming conventions to accommodate international menu differences
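A taxonomy with region-specific naming conventions can be pictured as a lookup from local names to one canonical label. The sketch below is illustrative only: the category names, the alias table, and the `canonical_label` helper are hypothetical and do not reproduce the taxonomy delivered to the client.

```python
from typing import Dict, Tuple

# Hypothetical taxonomy fragment: canonical label -> category.
CATEGORIES: Dict[str, str] = {
    "fries": "food",
    "burger": "food",
    "sauce": "food",
    "paper_bag": "packaging",
}

# Hypothetical regional aliases: (region, local name) -> canonical label.
REGIONAL_ALIASES: Dict[Tuple[str, str], str] = {
    ("UK", "chips"): "fries",
    ("US", "fries"): "fries",
}


def canonical_label(region: str, name: str) -> str:
    """Map a regional item name to its canonical label, or raise KeyError."""
    key = name.strip().lower()
    # Fall back to the name itself when no regional alias is defined.
    label = REGIONAL_ALIASES.get((region.upper(), key), key)
    if label not in CATEGORIES:
        raise KeyError(f"'{name}' is not in the taxonomy for region {region}")
    return label
```

For example, `canonical_label("UK", "Chips")` resolves to `"fries"`, while an unmapped name raises an error so that the annotator escalates it instead of inventing a new label.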

4. Multi-Level Quality Assurance Process

High-volume annotation projects present a fundamental tension: speed often compromises precision, while excessive quality controls can hinder throughput. Our image annotation solution reconciled this by distributing quality assurance across the workflow rather than concentrating it at the end.

  • Peer Review: Cross-checking between annotators to catch initial errors
  • Internal QA: Subject matter experts reviewed batches for consistency and adherence to the standard labeling taxonomy
  • Client Collaboration: Regular feedback loops with the client's team to validate annotations against their AI training requirements and real-world performance expectations

Each image underwent a minimum of 2 review cycles before final approval. Quality checkpoints were conducted every 2,000 images to maintain consistency and catch annotation drift, which is typical in an image annotation project at this scale. Labeled data was submitted to the client only after it met our internal accuracy threshold of 98%.
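The delivery rules described above (at least 2 review cycles, a checkpoint every 2,000 images, and a 98% accuracy threshold) can be expressed as a simple gate. The sketch assumes accuracy is measured as the fraction of checked images judged correct; how accuracy was actually measured is not stated here, and the `BatchReview` class and function names are hypothetical.

```python
from dataclasses import dataclass

ACCURACY_THRESHOLD = 0.98   # internal threshold stated above
MIN_REVIEW_CYCLES = 2       # minimum review cycles per image stated above
CHECKPOINT_INTERVAL = 2000  # images between quality checkpoints stated above


@dataclass
class BatchReview:
    """Hypothetical summary of one reviewed batch."""
    images_checked: int
    images_correct: int     # assumption: an image is either correct or not
    min_review_cycles: int  # fewest review cycles any image in the batch received

    def accuracy(self) -> float:
        return self.images_correct / self.images_checked


def ready_for_delivery(batch: BatchReview) -> bool:
    """A batch ships only if every image was reviewed enough and accuracy holds."""
    return (batch.min_review_cycles >= MIN_REVIEW_CYCLES
            and batch.accuracy() >= ACCURACY_THRESHOLD)


def is_checkpoint(images_completed: int) -> bool:
    """True when the running image count lands on a quality checkpoint."""
    return images_completed > 0 and images_completed % CHECKPOINT_INTERVAL == 0
```

Under these assumptions, a batch with 1,970 of 2,000 images correct (98.5%) and two review cycles would pass, while the same batch with only one review cycle would be held back regardless of its accuracy.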

[Figure: raw images alongside their annotated versions]

Project Outcomes

We completed the high-volume annotation project within the agreed timeline while maintaining quality standards. The client also used our standardized labeling taxonomy as a reusable framework, enabling consistent and efficient annotation across additional datasets.

20,000+ Annotated Images Delivered

Within the client’s expected timeline, and accepted without the need for any revisions or rework.

98% Annotation Accuracy Maintained

Through our multi-tiered quality assurance process for data labeling.

Enabled Multi-Chain AI Deployment

The client could deploy its AI agents across multiple restaurant chains without chain-specific retraining.

They understood the assignment and delivered clean annotations without us having to request any rework, in one go. That's rare.

- Head of Computer Vision

CONTACT US

Need High-Quality Training Data for Your AI Agents/Models?

Whether you're building computer vision systems, NLP applications, or multimodal AI agents—or struggling with inconsistent quality and delays from your current image annotation service provider—we can help.

Share your data annotation challenges with our team and get a specialized labeling solution designed for your industry, your data complexity, and your quality standards. Send a query to discuss this further with our team, or try a free sample to evaluate our image labeling quality firsthand.