The client — a top educational institute — is highly ranked both nationally and internationally for academic excellence and research contributions. Their research projects span multiple disciplines, including artificial intelligence, automation, materials science, and real-world industrial applications.
Within this broader research landscape, the institution is actively exploring waste management and recycling technologies. One of their research teams is developing AI-driven systems capable of automatically identifying and classifying waste items as they move along conveyor belts, aiming to improve sorting accuracy, reduce manual dependency, and enable more efficient recycling operations at scale.
To train an AI system that could automatically recognize different types of waste on a conveyor belt (like in recycling plants), the client needed video annotation services. They provided a large set of CCTV footage and kept adding to that dataset as the project progressed.
Our video labeling team had to look at each image/frame and:
However, the client was very specific about avoiding bad labels, so they shared mandatory guidelines covering the conditions under which annotation had to be skipped:
The client did not just want “a lot of labeled data.” Their priority was to keep bad labels, which would degrade the AI’s performance, out of the dataset. We therefore had to ensure that only clean, accurate examples were included in the AI training data.
However, our team faced certain hurdles:
Since the images were sourced from regular CCTV footage, the waste items rarely looked like clean textbook examples. They often appeared crumpled, torn, wet, or overlapping with other materials. Several categories also shared similar visual characteristics (e.g., cardboard vs. paper, plastic film vs. foil). This made it difficult to assign the correct label confidently.
Since the conveyor belt moved slowly, many consecutive video frames were almost identical. If we annotated every frame, we would spend time labeling the same visual information repeatedly, which would reduce throughput without improving the training dataset. The challenge was to quickly identify which frames introduced new, useful visibility of waste items and which frames could be skipped without compromising accuracy or speed.
The client’s guidelines required annotators to skip any object that could not be clearly identified. This forced annotators to use their judgment rather than simply follow a mechanical tagging workflow, and different annotators can interpret the same frame differently. For instance, a flattened plastic bottle can resemble a piece of transparent plastic film in low-resolution footage: one annotator might label it as a plastic bottle, while another might skip it because the shape is unclear.
The goal was not just to label data, but to create a repeatable, auditable process that could deliver precise annotations at scale — without compromising on the client’s strict research standards. So, we customized our video annotation services to meet the client’s needs. Our approach combined structured human judgment with intelligent automation, ensuring every annotated frame contributed meaningful information to the AI model. Here’s how we engineered this end-to-end system.
Before automation could be effective, our annotation team needed a shared understanding of how each waste category appeared in real-world CCTV footage. We created a visual reference guide for all 16 waste types using actual client images — showing variations such as torn paper, reflective foil, wet cardboard, or crushed bottles.
This reference set guided our annotation decisions, ensuring that all annotators labeled objects consistently. It was also embedded into the automated video annotation workflows we built.
To prevent the team from wasting time labeling near-identical images, we asked annotators to compare each frame with the one immediately before it. If nothing had changed, they skipped the frame; if something had changed (an item shifting, unfolding, separating, or becoming clearer), they annotated it.
At this scale, however, manually screening frames was highly inefficient, and CVAT has no built-in feature to automatically detect and skip near-duplicate video frames during annotation.
To address this, we developed a custom preprocessing pipeline that ran outside CVAT, but was integrated with its upload workflow. This Python-based system, powered by OpenCV and scikit-image, automatically analyzed video sequences before they entered the annotation stage. It compared consecutive frames for visual similarity and motion to filter out near-duplicate frames. Only the most informative frames — those showing new object angles or improved clarity — were passed for video labeling.
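To give a sense of how this kind of similarity filter works, here is a minimal sketch using OpenCV and scikit-image’s SSIM metric; the threshold value, downscale size, and function name are illustrative assumptions, not our exact production settings.

```python
# Minimal sketch of near-duplicate frame filtering (illustrative values only).
import cv2
from skimage.metrics import structural_similarity as ssim

def extract_informative_frames(video_path, similarity_threshold=0.92):
    """Yield (frame_index, frame) pairs that differ enough from the last kept frame."""
    cap = cv2.VideoCapture(video_path)
    last_kept = None
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Compare downscaled grayscale versions to keep the check fast.
        gray = cv2.cvtColor(cv2.resize(frame, (320, 180)), cv2.COLOR_BGR2GRAY)
        if last_kept is None or ssim(last_kept, gray) < similarity_threshold:
            yield index, frame          # informative frame: pass on for annotation
            last_kept = gray
        index += 1                      # near-duplicates are dropped silently
    cap.release()
```

Frames that passed this filter were the ones packaged into CVAT annotation tasks, so annotators only ever saw footage likely to add new information.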
There was no way in CVAT to natively enforce the client’s detailed “skip” conditions or record why a frame was skipped. So, we customized the interface to enable annotators to follow the rules more consistently.
We added a “Skip Reason” dropdown that required annotators to select why a frame or object was not labeled (for example, less than half visible, too blurry, or unclear category). We also included small on-screen tips reminding them of key video labeling rules. Each skip reason and annotator ID was automatically recorded in the task data, making every decision traceable.
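The dropdown itself can be expressed as a label attribute in CVAT’s label configuration. The snippet below shows the general shape of such a specification as a Python structure; the category and value names are placeholders, not the client’s actual taxonomy.

```python
# Illustrative label specification with a "skip_reason" attribute
# (category and value names are placeholders, not the client's taxonomy).
SKIP_REASONS = [
    "none",
    "less_than_half_visible",
    "too_blurry",
    "unclear_category",
]

label_spec = [
    {
        "name": "plastic_bottle",          # one of the 16 waste categories
        "attributes": [
            {
                "name": "skip_reason",
                "input_type": "select",    # rendered as a dropdown in the UI
                "mutable": True,
                "default_value": "none",
                "values": SKIP_REASONS,
            }
        ],
    },
    # ...the remaining categories follow the same pattern
]
```

Because the attribute value is stored with each annotation, every skipped object carries its reason along with the annotator who made the call.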
With these modifications, we transformed CVAT from a generic data labeling tool into a domain-specific annotation control environment aligned precisely with our client’s research needs.
To ensure uniform interpretation of skip conditions and prevent human bias, we established a two-tier validation process. It helped resolve ambiguous cases where annotators were at risk of making subjective, and therefore inconsistent, decisions based on their own perception:
Every annotation event — including skips, flags, and corrections — was automatically logged within CVAT’s metadata. We extended this logging into a QA documentation report that summarized:
Weekly reports were reviewed jointly with the client’s research team. This collaborative audit process enabled them to refine category definitions, analyze which item types were most frequently skipped, and supply additional footage featuring those items, ensuring the model had more relevant examples to learn from.
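As a rough illustration of how such a summary can be produced, the sketch below counts skip reasons overall and per annotator from an exported event log; the CSV format and column names are assumptions, not CVAT’s exact metadata schema.

```python
# Turn logged skip events into a simple QA summary
# (the CSV columns shown here are assumptions, not CVAT's exact schema).
import csv
from collections import Counter

def summarize_skips(log_csv_path):
    """Count skip reasons overall and per annotator from an exported event log."""
    by_reason, by_annotator = Counter(), Counter()
    with open(log_csv_path, newline="") as f:
        for row in csv.DictReader(f):  # expected columns: annotator_id, skip_reason, frame_id
            if row["skip_reason"] != "none":
                by_reason[row["skip_reason"]] += 1
                by_annotator[row["annotator_id"]] += 1
    return by_reason, by_annotator
```

Counts like these fed directly into the weekly discussions about which categories needed clearer definitions or additional footage.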
Although CVAT provides standard COCO export options, the client’s AI training workflow required additional metadata and a unified dataset structure that the default format couldn’t support. We developed a custom export converter that merged multiple CVAT task outputs, appended client-specific fields (annotator ID, skip reason, timestamp, and frame sequence), and validated schema consistency prior to integration.
This ensured full traceability and a ready-to-train dataset that aligned precisely with the client’s TensorFlow pipeline.
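The merging step at the heart of that converter can be sketched as follows; the file handling, ID re-numbering, and validation shown here are simplified assumptions rather than the full production logic.

```python
# Simplified sketch of merging per-task COCO exports into one dataset
# (file names, validation, and field handling here are assumptions).
import json
from pathlib import Path

REQUIRED_KEYS = ("images", "annotations", "categories")

def merge_coco_exports(task_files, output_path):
    """Merge COCO exports from several CVAT tasks, re-numbering IDs to avoid collisions."""
    merged = {"images": [], "annotations": [], "categories": None}
    next_img_id = next_ann_id = 1
    for task_file in task_files:
        data = json.loads(Path(task_file).read_text())
        for key in REQUIRED_KEYS:                      # basic schema consistency check
            if key not in data:
                raise ValueError(f"{task_file} is missing required COCO key: {key}")
        if merged["categories"] is None:
            merged["categories"] = data["categories"]  # assume identical label sets across tasks
        id_map = {}
        for img in data["images"]:
            id_map[img["id"]] = next_img_id
            img["id"] = next_img_id
            merged["images"].append(img)
            next_img_id += 1
        for ann in data["annotations"]:
            ann["id"] = next_ann_id
            ann["image_id"] = id_map[ann["image_id"]]
            merged["annotations"].append(ann)
            next_ann_id += 1
    Path(output_path).write_text(json.dumps(merged, indent=2))
```

In the actual converter, the client-specific fields (annotator ID, skip reason, timestamp, and frame sequence) were attached to each record before the merged file was written out for the TensorFlow pipeline.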
Where feasible, we leveraged CVAT’s AI-assisted pre-annotation feature. Early batches of annotated data were used to train a lightweight object detector, which then auto-generated preliminary bounding boxes for new frames. This automated pre-annotation approach handled most object detection tasks, while human annotators performed targeted verification and corrections as needed, achieving high accuracy with considerably less manual work.
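The case study does not name the detector, so the sketch below assumes an Ultralytics YOLO model purely for illustration; the weights file and confidence threshold are hypothetical.

```python
# Illustrative pre-annotation pass; the detector choice, weights file, and
# confidence threshold are hypothetical, not taken from the project.
from ultralytics import YOLO

model = YOLO("waste_detector.pt")  # lightweight detector trained on early annotated batches

def preannotate(frame_paths, confidence=0.5):
    """Return preliminary bounding boxes for human verification."""
    proposals = []
    for path in frame_paths:
        result = model.predict(source=path, conf=confidence, verbose=False)[0]
        for box, cls_id, score in zip(result.boxes.xyxy.tolist(),
                                      result.boxes.cls.tolist(),
                                      result.boxes.conf.tolist()):
            proposals.append({
                "frame": path,
                "bbox_xyxy": box,                    # [x1, y1, x2, y2]
                "label": model.names[int(cls_id)],
                "score": score,                      # low-confidence boxes get extra scrutiny
            })
    return proposals
```

Boxes produced this way were treated strictly as suggestions; annotators corrected or discarded them during verification rather than accepting them blindly.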
Raw image
Annotated image
- Through two-tier video labeling QC and a standardized reference guide.
- Through automated frame filtering and AI-assisted pre-annotation.
- As recorded during initial training, compared to their previous training data.
Not every training dataset fits a standard labeling playbook. Your model may require nuanced judgment, conditional rules, evolving edge cases, or annotation logic that shifts as research advances. That’s the point where most vendors break. Our data annotation services don’t.
Our text annotation, image annotation, and video annotation services are designed to adapt to your specific rules, constraints, and quality thresholds. Where tools fall short, we extend or customize. Where automation helps, we use it — always with human verification built in.
If your data labeling needs aren’t “regular,” you’ll notice the difference working with us. See it for yourself - start with a free sample.