A leading Indian engineering and infrastructure consulting company with more than 25 years of industry experience, this client provides digital transformation, project management, design, and advisory services across sectors including highways and bridges, tunnels, hydropower, solar energy, railways, metros, and urban infrastructure. The client also operates at the intersection of infrastructure delivery and public-sector mandates, contributing to projects aligned with national development priorities. It is currently building and maintaining large-scale data pipelines to support the government's growing focus on AI for smart infrastructure assessment and management.
The client was engaged in a government-backed initiative spanning the National Highways Authority of India (NHAI) and a state road construction department. The initiative required systematic annotation of highway survey footage to support road condition assessment and maintenance planning at scale, so the client needed a dedicated image annotation service provider for large-scale labeling of highway data.
Specialized survey vehicles captured highway images across multiple states, which were then shared in bulk with our data annotation team. All annotation activities were carried out on a CVAT platform hosted and managed by the client.
The project scope involved:
Executing an AI training data project backed by public-sector entities leaves no room for approximation. Because the final dataset feeds directly into national and state infrastructure maintenance decisions, our margin for error was close to zero. Meeting that level of precision raised several critical technical and operational challenges:
The client shared large image datasets in batches on no fixed schedule, so the team had to absorb unpredictable volume spikes while sustaining throughput. Each highway corridor in the footage differed in road type, surface condition, and surrounding environment, so a single uniform labeling workflow could not deliver efficiency; each batch required context-specific handling.
Our data annotation team had to follow the IRC82 standard, which defines 71 separate damage and asset categories, including cracking, patching, potholes, deformation, and broken edges. Annotators had to reliably distinguish between categories that look visually similar but carry different engineering significance. For instance, each crack had to be classified as a hairline, alligator, longitudinal, edge, shrinkage, or reflection crack.
The highway images, covering more than 1,000 kilometers of surveyed highway, varied considerably in quality depending on weather, lighting, road surface type, and geographic region. Poorly lit underpasses, sun glare on wet roads, and dusty, unpaved shoulders all reduced visibility and, with it, annotation quality. Our team had to maintain consistent classification logic and labeling precision regardless of image quality.
When the project scope expanded after the successful pilot, we faced the risk of "silent inconsistency": batches from the pilot team and from new members could look identical on the surface while following different labeling logic. For instance, a seasoned annotator applying IRC82 would examine a crack's depth and irregular pattern, identify it as structural, load-induced deformation, and trace it with a precise polygon. A newly onboarded annotator relying on basic visual cues might instead mark it as a superficial crack with a quick bounding box. A surface-level quality review would show that both had completed their tasks, yet this interpretive gap would ultimately compromise the integrity of the entire training dataset.
The project began with a pilot involving four experienced annotators and one quality reviewer. We selected team members with prior exposure to infrastructure-related training data preparation and trained them specifically to meet IRC82 compliance requirements. The pilot phase served two purposes: it provided a controlled environment to validate our annotation workflows on the client's CVAT platform, and it produced the initial reference dataset that later served as the quality baseline for the expanded team. On the strength of the pilot results, the client significantly expanded the project scope, and we scaled the team to 35 full-time annotators and 7 quality reviewers, a total operating team of 42 specialists.
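One common way to catch this kind of silent inconsistency is to have each new annotator label a slice of images already labeled by the pilot team and measure chance-corrected agreement against those labels. The case study does not say which metric, if any, was used; the sketch below swaps in Cohen's kappa as an illustration, and the crack class names in the usage example are hypothetical. A kappa well below the pilot team's own agreement would flag an annotator whose work looks complete but follows different logic.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two annotators on the same items.

    Cohen's kappa is an illustrative metric choice, not one named in the case study.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same class.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's class frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical class for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical crack classes: pilot reference vs. a newly onboarded annotator.
    reference = ["alligator", "alligator", "longitudinal", "hairline", "edge"]
    new_annotator = ["hairline", "alligator", "longitudinal", "hairline", "edge"]
    print(round(cohens_kappa(reference, new_annotator), 3))  # 0.737
```

Raw percent agreement here would be 80%, but kappa discounts the agreement two annotators would reach by chance given how often each uses each class.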
Every annotator completed a mandatory training program on IRC82 guidelines and the 71 damage and asset categories defined under the standard. Training included:
We classified images into three quality tiers: clear, degraded, and unworkable. Clear images followed the standard annotation workflow. In degraded images (those affected by poor lighting, sun glare, low contrast, or dusty road shoulders), genuinely ambiguous cases were flagged and routed to a quality reviewer. Images classified as unworkable, where road surface features could not be reliably identified even on careful examination, were escalated to the client, which then decided whether to re-survey the affected segment or exclude it from the dataset.
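The tier-based routing described above is simple enough to express directly. This is a minimal sketch under stated assumptions: the tier is assigned by a person during triage (the case study does not describe automated quality scoring), degraded images without ambiguous features are assumed to stay in the standard workflow, and the names `Tier`, `Route`, `ImageRecord`, and `ambiguous` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    CLEAR = "clear"
    DEGRADED = "degraded"
    UNWORKABLE = "unworkable"


class Route(Enum):
    STANDARD = "standard annotation workflow"
    REVIEWER = "quality reviewer"
    CLIENT = "client escalation"


@dataclass
class ImageRecord:
    image_id: str
    tier: Tier               # assigned by a person during triage (assumption)
    ambiguous: bool = False  # annotator flag for a genuinely unclear feature


def route(record: ImageRecord) -> Route:
    """Send an image to its next step according to its quality tier."""
    if record.tier is Tier.UNWORKABLE:
        # The client decides whether to re-survey the segment or exclude it.
        return Route.CLIENT
    if record.tier is Tier.DEGRADED and record.ambiguous:
        return Route.REVIEWER
    # Clear images, and degraded images without ambiguous features,
    # are assumed to stay in the standard workflow.
    return Route.STANDARD


if __name__ == "__main__":
    print(route(ImageRecord("img_0042", Tier.DEGRADED, ambiguous=True)).value)
```

Keeping the unworkable check first makes the escalation rule unconditional: no annotator flag can pull an unworkable image back into the standard workflow.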
All annotation work was carried out on the client's CVAT platform, which the client hosted and administered. This kept the client in full control of data access, version history, and export pipelines, which matters when the end recipient is a government body with its own data governance requirements. We processed 350,000+ highway images using bounding boxes and 4-point and multi-point polygons. By assigning 4-point boxes to uniform assets (about 80% of all annotations) and reserving intensive multi-point tracing for asymmetric road defects (about 20%), our team kept the data pipeline efficient, delivering pixel-level precision on irregular defects while holding to the timeline.
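To check that a batch actually follows the intended split between 4-point shapes and multi-point polygons, a tally over the platform's export can help. This sketch assumes the batch is exported from CVAT in the "CVAT for images 1.1" XML format, where `<box>` elements carry corner coordinates and `<polygon>` elements store their vertices in a `points` attribute as `x1,y1;x2,y2;...`. The case study does not state which export format was used, and the label names in `SAMPLE` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def shape_mix(cvat_xml: str) -> Counter:
    """Tally boxes, 4-point polygons and multi-point polygons in a CVAT XML export.

    Assumes the "CVAT for images 1.1" layout: <image> elements containing
    <box> and <polygon> children, with polygon vertices in a `points`
    attribute formatted as "x1,y1;x2,y2;...".
    """
    root = ET.fromstring(cvat_xml)
    counts: Counter = Counter()
    for image in root.iter("image"):
        counts["box"] += len(image.findall("box"))
        for polygon in image.findall("polygon"):
            n_vertices = len(polygon.get("points", "").split(";"))
            counts["polygon_4pt" if n_vertices == 4 else "polygon_multi"] += 1
    return counts


# Hypothetical one-frame export used for illustration only.
SAMPLE = """<annotations>
  <image id="0" name="frame_000001.jpg" width="1920" height="1080">
    <box label="sign_board" occluded="0" xtl="10" ytl="20" xbr="110" ybr="220"/>
    <polygon label="lane_marking" occluded="0" points="0,0;50,0;50,10;0,10"/>
    <polygon label="pothole" occluded="0" points="300,400;320,390;345,402;350,430;322,441"/>
  </image>
</annotations>"""

if __name__ == "__main__":
    mix = shape_mix(SAMPLE)
    total = sum(mix.values())
    for shape, n in sorted(mix.items()):
        print(f"{shape}: {n} ({n / total:.0%})")
```

Run over a full batch export, the printed shares can be compared with the roughly 80/20 split between uniform assets and traced defects.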
The computer vision data annotation process ran through two integrated quality layers. First, each annotator self-reviewed completed batches before submission, flagging uncertain annotations for peer review. Second, quality reviewers (subject-matter experts with backgrounds in civil engineering) conducted a structured sample review of every batch, cross-checking against the IRC82 taxonomy and the reference dataset from the pilot phase. Discrepancies were documented and fed back into team calibration sessions to prevent recurring errors.
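A structured sample review of this kind can be sketched as: draw a reproducible random sample from each batch, have a reviewer judge each sampled annotation, and compare the sample accuracy with a target. The 10% sampling rate is a hypothetical default, the 99% target mirrors the accuracy figure this case study reports, and `sample_review` and `is_correct` are illustrative names; `is_correct` stands in for the reviewer's judgment against the IRC82 taxonomy and the pilot reference dataset.

```python
import random
from typing import Callable, Optional, Sequence


def sample_review(
    annotation_ids: Sequence[str],
    is_correct: Callable[[str], bool],
    sample_rate: float = 0.10,   # hypothetical sampling rate
    target: float = 0.99,        # mirrors the reported 99% accuracy figure
    seed: Optional[int] = None,  # fix the seed to make a review reproducible
) -> tuple[float, bool]:
    """Review a random sample of one batch; return (sample accuracy, meets target)."""
    if not annotation_ids:
        raise ValueError("cannot review an empty batch")
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(len(annotation_ids) * sample_rate))
    sampled = rng.sample(list(annotation_ids), k)
    # `is_correct` stands in for the reviewer's check against the IRC82
    # taxonomy and the pilot reference dataset (hypothetical hook).
    correct = sum(1 for ann in sampled if is_correct(ann))
    accuracy = correct / k
    return accuracy, accuracy >= target


if __name__ == "__main__":
    batch = [f"ann_{i:05d}" for i in range(2_000)]
    wrong = set(batch[::50])  # hypothetical: every 50th annotation is wrong (2%)
    accuracy, passed = sample_review(batch, lambda a: a not in wrong, seed=7)
    print(f"sample accuracy {accuracy:.1%}, passed: {passed}")
```

Sample size matters as much as the threshold: with a 100-item sample, a single miss still passes at exactly 99%, while a second miss fails the batch.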
Following successful performance on the highway annotation project, we were engaged to support the Output and Performance-Based Road Maintenance Contract (OPRMC) initiative. Traditional road contracts pay the contractor for the materials used or the volume of work done; an OPRMC contract instead pays the contractor based on the actual condition and performance of the road. So that the assessment software can automatically determine whether a contractor is meeting its performance KPIs, we annotated more than 300,000 additional highway images (1M+ annotations), marking defects and signs of deterioration.
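How the annotations feed an OPRMC-style KPI check is not described in the case study, so the following is purely a hypothetical sketch: it buckets defect annotations into fixed-length road segments by chainage (distance along the corridor) and flags categories whose count per kilometer exceeds a limit. The `Defect` type, the category names, and the limits are all illustrative assumptions, not actual OPRMC KPIs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    chainage_km: float  # position along the corridor, in km (hypothetical field)
    category: str       # illustrative defect class, e.g. "pothole"


def segments_breaching_kpi(
    defects: list[Defect],
    max_per_km: dict[str, float],
    segment_km: float = 1.0,
) -> dict[int, list[str]]:
    """Return {segment index: categories whose density exceeds max_per_km}.

    The limits are placeholders; real OPRMC KPIs are defined by the contract.
    """
    counts: dict[int, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for d in defects:
        counts[int(d.chainage_km // segment_km)][d.category] += 1
    breaches: dict[int, list[str]] = {}
    for segment, by_category in sorted(counts.items()):
        over = sorted(
            cat for cat, n in by_category.items()
            if cat in max_per_km and n / segment_km > max_per_km[cat]
        )
        if over:
            breaches[segment] = over
    return breaches


if __name__ == "__main__":
    defects = [Defect(0.2, "pothole"), Defect(0.5, "pothole"), Defect(0.9, "pothole"),
               Defect(1.3, "pothole"), Defect(1.4, "edge_break")]
    limits = {"pothole": 2, "edge_break": 0}  # hypothetical limits per km
    print(segments_breaching_kpi(defects, limits))  # {0: ['pothole'], 1: ['edge_break']}
```

Categories without a limit are ignored rather than treated as breaches, so a missing KPI definition never flags a segment on its own.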
3 Million+ Annotations: approximately 2 million covering national highway corridors and approximately 1 million supporting OPRMC road maintenance data requirements.
99% Annotation Accuracy: maintained throughout the engagement, including during and after the workforce scale-up from five to 42 specialists.
650,000+ Images Labeled: managed without any regression in annotation quality or deviation from IRC82 compliance standards for highway image annotation.
Whether you need image labeling services for road infrastructure, geospatial data annotation, satellite image annotation, drone data annotation, or aerial image labeling, SunTec India has the expertise and the domain specialists. Explore related domain-specific image labeling work we have done:
Request a free sample or schedule a consultation to discuss your annotation requirements.