The client provides operational intelligence solutions for prominent restaurant chains (such as McDonald’s and KFC) using artificial intelligence and computer vision. They build AI agents that support restaurants with menu digitization, automated order verification, personalized recommendations, and similar operations. They are widely recognized for driving measurable improvements in quality control and profitability across large-scale food service environments.
The client’s AI agents review order preparation in real time. Working from photos uploaded by restaurant and delivery staff, they automatically detect missing items, document orders, and verify the accuracy of each delivery. They also serve as a dispute resolution mechanism, helping respond to customer claims or complaints.
To ensure the solution's reliable real-world performance, the client required our image annotation services.
While the project's objectives were clear, the path to achieving them presented significant obstacles. The dataset's real-world nature—sourced from live restaurant operations across multiple international chains—introduced layers of complexity that required careful strategy and domain expertise to overcome.
The dataset included images from various restaurant chains, captured under a wide range of lighting conditions, from bright kitchen stations to dimly lit delivery bags. Some were professional marketing photos; others were quick smartphone shots taken by restaurant staff. This inconsistency reduced the clarity and visibility of the food items, making them harder to annotate.
Many images showed crowded scenes with several food items in a single photo. Items were frequently partially visible or obscured by packaging, making it difficult to define clear boundaries for each one. In addition, many items looked alike, so accurate annotation required careful contextual differentiation (e.g., distinguishing between burger types or sauce varieties).
The dataset included images from restaurant chains operating across multiple regions, so the annotations had to account for regional differences in terminology, preparation methods, and ingredients. For example, a "McChicken" sandwich may be prepared with different sauces or toppings in different countries. Likewise, the same item might be called "fries" in the U.S. but "chips" in the U.K., or feature different sizes or cuts.
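As a rough illustration of how a labeling taxonomy can absorb regional naming differences, the sketch below maps regional menu terms to a single canonical class and keeps the region as an attribute. The `REGIONAL_ALIASES` table, the `Label` class, and the `normalize` helper are hypothetical; only the fries/chips example comes from this case study.

```python
from dataclasses import dataclass

# Hypothetical taxonomy fragment: regional menu terms -> canonical class.
# Only the fries/chips pair is taken from the case study; extend per region.
REGIONAL_ALIASES: dict[str, dict[str, str]] = {
    "US": {"fries": "french_fries"},
    "UK": {"chips": "french_fries"},
}


@dataclass(frozen=True)
class Label:
    canonical: str   # class the model is trained on
    region: str      # kept as an attribute, not as a separate class
    local_name: str  # term as it appears on the regional menu


def normalize(term: str, region: str) -> Label:
    """Map a regional menu term to its canonical taxonomy class."""
    key = term.strip().lower()
    canonical = REGIONAL_ALIASES.get(region, {}).get(key)
    if canonical is None:
        # Unknown terms are escalated instead of guessed.
        raise ValueError(f"unmapped term {term!r} for region {region!r}")
    return Label(canonical=canonical, region=region, local_name=key)


if __name__ == "__main__":
    print(normalize("Chips", "UK"))
    # Label(canonical='french_fries', region='UK', local_name='chips')
```

Keeping one canonical class per item, with the region stored as an attribute, lets a model learn a single visual concept while annotators still see the local name. Raising an error on an unmapped term means ambiguous cases go to the domain expert rather than being labeled by guesswork.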
The variability that made this dataset valuable for AI training—multiple restaurant chains, diverse lighting conditions, regional menu differences—also made it exceptionally challenging to annotate. Our solution wasn't simply to apply more resources, but to develop a carefully designed workflow where technology, human expertise, and quality controls worked together to overcome these challenges. Each element of the workflow compensated for the limitations of the others, ensuring high-quality annotations and a reliable dataset for training AI models.
Here’s how we approached the solution:
To manage the large volume of images and ensure precise labeling, we used CVAT (Computer Vision Annotation Tool), which is well suited to large-scale image annotation tasks.
CVAT offers an AI-assisted polygon segmentation feature, which we used to annotate irregularly shaped food items accurately. Our team also used CVAT's built-in brightness adjustment to temporarily improve visibility during annotation. Because this adjustment does not alter the original image data, annotators could label accurately in poor lighting while the images used for AI training stayed intact.
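CVAT can export annotations in its "CVAT for images" XML format, where each polygon stores its vertices in a `points` attribute as `x1,y1;x2,y2;...`. Assuming that export format, the minimal sketch below reads the polygons back and flags shapes that are probably accidental: fewer than three vertices, or a tiny area computed with the shoelace formula. The `MIN_AREA` value and all function names are hypothetical and not part of CVAT; a check like this could complement human review, not replace it.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimum polygon area (square pixels) below which a shape is
# treated as a likely mis-click rather than a real food item.
MIN_AREA = 25.0


def parse_points(points: str) -> list[tuple[float, float]]:
    """Convert CVAT's 'x1,y1;x2,y2;...' string into (x, y) tuples."""
    if not points.strip():
        return []
    result = []
    for pair in points.split(";"):
        x, y = pair.split(",")
        result.append((float(x), float(y)))
    return result


def polygon_area(pts: list[tuple[float, float]]) -> float:
    """Shoelace formula: absolute polygon area in square pixels."""
    total = 0.0
    for i, (x1, y1) in enumerate(pts):
        x2, y2 = pts[(i + 1) % len(pts)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def flag_polygons(xml_text: str, min_area: float = MIN_AREA) -> list[tuple[str, str, str]]:
    """Return (image name, label, reason) for polygons that look degenerate."""
    flagged = []
    root = ET.fromstring(xml_text)
    for image in root.findall("image"):
        for poly in image.findall("polygon"):
            pts = parse_points(poly.get("points", ""))
            if len(pts) < 3:
                flagged.append((image.get("name"), poly.get("label"), "fewer than 3 vertices"))
            elif polygon_area(pts) < min_area:
                flagged.append((image.get("name"), poly.get("label"), "area below minimum"))
    return flagged


SAMPLE = """<annotations>
  <image id="0" name="order_001.jpg" width="1280" height="720">
    <polygon label="burger" occluded="0" points="10,10;50,10;50,40;10,40" z_order="0"/>
    <polygon label="sauce" occluded="1" points="5,5;6,5;6,6" z_order="1"/>
  </image>
</annotations>"""

if __name__ == "__main__":
    print(flag_polygons(SAMPLE))  # [('order_001.jpg', 'sauce', 'area below minimum')]
```

In the sample, the 40 x 30 px burger polygon (1,200 px²) passes, while the sauce triangle (0.5 px²) is flagged for a second look.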
A specialized team of 10 annotators was assigned to the project, supported by a domain expert, to ensure consistent handling of the high-volume dataset (20,000+ images). Each team member underwent comprehensive training on:
Given the wide variety of food items and packaging, it was crucial to establish and follow uniform annotation protocols. We developed a detailed labeling taxonomy for this client that covered the diverse food categories, packaging types, and regional variations in the dataset, ensuring consistency across all annotators. This taxonomy included:
High-volume annotation projects present a fundamental tension: speed often compromises precision, while excessive quality controls can hinder throughput. Our image annotation solution reconciled this by distributing quality assurance across the workflow rather than concentrating it at the end.
Each image underwent at least two review cycles before final approval. Quality checkpoints were held every 2,000 images to maintain consistency and catch annotation drift, a common risk in projects of this scale. Labeled data was submitted to the client only after it met an internal accuracy threshold of 98%.
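The checkpoint-and-threshold logic described above can be sketched as a simple delivery gate. This is a minimal illustration, assuming reviewers record for each image how many of its labels were judged correct; the case study does not say how accuracy was measured, so `ImageRecord`, its fields, and the label-level accuracy metric are hypothetical. The constants mirror the stated figures: at least two review cycles, a checkpoint every 2,000 images, and a 98% threshold.

```python
from dataclasses import dataclass
from typing import Iterator

CHECKPOINT_EVERY = 2_000   # images per quality checkpoint
MIN_REVIEW_CYCLES = 2      # every image is reviewed at least twice
ACCURACY_THRESHOLD = 0.98  # internal bar before data goes to the client


@dataclass
class ImageRecord:
    image_id: str
    review_cycles: int   # completed review passes for this image
    correct_labels: int  # labels judged correct in the final review (hypothetical metric)
    total_labels: int


def checkpoints(records: list[ImageRecord]) -> Iterator[list[ImageRecord]]:
    """Split reviewed images into consecutive checkpoint batches."""
    for start in range(0, len(records), CHECKPOINT_EVERY):
        yield records[start:start + CHECKPOINT_EVERY]


def batch_accuracy(batch: list[ImageRecord]) -> float:
    """Label-level accuracy across one checkpoint batch."""
    total = sum(r.total_labels for r in batch)
    correct = sum(r.correct_labels for r in batch)
    return correct / total if total else 0.0  # empty batches never pass


def ready_for_delivery(batch: list[ImageRecord]) -> bool:
    """A batch ships only if every image had enough reviews and accuracy clears the bar."""
    if any(r.review_cycles < MIN_REVIEW_CYCLES for r in batch):
        return False
    return batch_accuracy(batch) >= ACCURACY_THRESHOLD


if __name__ == "__main__":
    records = [ImageRecord(f"img_{i}", 2, 49, 50) for i in range(4_500)]
    for n, batch in enumerate(checkpoints(records), start=1):
        print(n, len(batch), ready_for_delivery(batch))
```

Gating at each checkpoint, rather than only at the end, is one way to catch annotation drift while it affects a single 2,000-image batch instead of the whole dataset.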
Raw image vs. annotated image
We completed the high-volume annotation project within the agreed timeline while maintaining quality standards. The client also used our standardized labeling taxonomy as a reusable framework, enabling consistent and efficient annotation across additional datasets.
Delivered within the client’s expected timeline and accepted without any revisions or rework.
Accuracy ensured through our multi-tiered quality assurance process for data labeling.
Annotations that enabled the client to deploy their AI agents without client-specific retraining.
They understood the assignment and delivered clean annotations without us having to request any rework, in one go. That's rare.
- Head of Computer Vision
Whether you're building computer vision systems, NLP applications, or multimodal AI agents—or struggling with inconsistent quality and delays from your current image annotation service provider—we can help.
Share your data annotation challenges with our team and get a specialized labeling solution designed for your industry, your data complexity, and your quality standards. Send a query to discuss this further with our team, or try a free sample to evaluate our image labeling quality firsthand.