Counterfeiting, web spoofing, fake listings, and product copies have spiked sharply in recent years, hurting the sales and growth of big brands in the global market. Identifying online scammers among lakhs of websites, marketplaces, and web portals is like finding a needle in a haystack.
Our client is a leading revenue recovery company with proprietary AI-based software that crawls the web on behalf of businesses to detect intellectual property (IP) infringements. The platform lets our client support its clientele by identifying copyright violations, product or brand impersonation, product or content piracy, counterfeits, and distribution abuse. With this unique proposition, our client currently assists over a thousand online brands in monitoring fraud and enables them to take evidence-backed legal action to recover their revenue.
SunTec India helped discover and bridge the gaps in the client's AI brand protection platform through multiple types of data support solutions.
Handling the avalanche of data scraped by its AI system for all these brands had become overwhelming, so the client decided to outsource a few data-specific operations.
During our discussions, we identified a few issues that needed immediate attention.
To function effectively, the AI platform needed a continuous stream of high-quality data, which created a need for consistent data cleaning and data hygiene monitoring.
The scraped data (primarily links to suspected counterfeit products) needed regular verification.
The data collected by the AI was sometimes incomplete (where the crawling and extraction methods failed to capture every field), so it needed manual data appending.
In cases where automated scraping failed to produce results, the online scammers had to be detected through manual web research.
We proposed a dedicated team of data researchers who would build domain knowledge over time and steadily perfect the process.
This project ran in phases. We started out with a few basic tasks (mainly data validation) and moved on to data cleaning and web research.
Initially, the client assigned us a single task: asset authentication. We matched images against the client's guidelines and either validated each image, discarded it, or reassigned its category.
We completed the client's database by entering critical information from the URLs collected by its AI system, such as the product title, seller URL, seller name, price, product images, and account identifiers.
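As an illustration of this appending step, here is a minimal sketch that fetches a listing page and fills in whichever fields the crawler missed. The URL, record schema, and CSS selectors are all hypothetical; in practice each marketplace needs its own selectors, which is exactly why this step so often fell back to manual research.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical record as it might arrive from the crawler: the URL is
# present, but fields the AI failed to extract are empty.
record = {
    "url": "https://marketplace.example.com/listing/12345",
    "product_title": None,
    "seller_name": None,
    "price": None,
}

def append_missing_fields(record: dict) -> dict:
    """Fetch the listing page and fill in whichever fields are missing."""
    response = requests.get(record["url"], timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Placeholder selectors; real marketplaces each require their own.
    selectors = {
        "product_title": "h1.product-title",
        "seller_name": "a.seller-link",
        "price": "span.price",
    }
    for field, selector in selectors.items():
        if record.get(field) is None:
            element = soup.select_one(selector)
            if element is not None:
                record[field] = element.get_text(strip=True)
    return record
```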
We set up a web research team that combed through websites, social media platforms, marketplaces, and other channels, following client-defined parameters. The team prepared a consolidated sheet of flagged links and uploaded it to the client's portal.
We followed brand-specific guidelines to compare the client's content against the sources found on the web and created a list of URLs where we detected piracy.
We sorted the data evidence into categories (counterfeit, brand abuse, replica, copyright, and brand impersonation) to simplify the client's task of evidence assimilation.
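A minimal sketch of that sorting step, assuming simple keyword rules; the client's actual brand-specific guidelines are far more detailed:

```python
# Illustrative rules only; real evidence categorization followed
# detailed, brand-specific guidelines supplied by the client.
CATEGORY_RULES = {
    "counterfeit": ["fake", "counterfeit", "knockoff"],
    "replica": ["replica", "1:1", "mirror copy"],
    "copyright": ["pirated", "full download", "free stream"],
    "brand impersonation": ["official store", "authorized dealer"],
}

def categorize(evidence_note: str) -> str:
    """Assign a flagged listing to the first matching evidence category."""
    note = evidence_note.lower()
    for category, keywords in CATEGORY_RULES.items():
        if any(keyword in note for keyword in keywords):
            return category
    return "brand abuse"  # default bucket, routed to manual review

listings = [
    "Seller advertises a 1:1 mirror copy of the handbag",
    "Site claims to be the official store but is unaffiliated",
]
for note in listings:
    print(note, "->", categorize(note))
```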
The client extends its services to over a thousand brands. Initially, we were assigned only a few of them.
By the end of the first month, the number of brands assigned to us had doubled. Our dedicated team reduced client involvement in feedback and quality analysis by nearly 70%, and we improved our accuracy rate from 72% to 95%.
For this particular project, our team showcased admirable resilience. We experimented with approaches, evaluated and reframed processes, and established quality analysis systems while evolving with the client's needs and workload.
When we started the project, we divided our teams by brand: on the manual data detection team, for example, each resource was assigned a few brands to work on. When this approach led to quality issues and higher error rates, we reorganized our teams by process, which simplified management.
The goal of any AI-based tech platform is to get as close to perfect precision as possible. Achieving this across sites is difficult because each website has its own distinct underlying patterns, and those structures change over time, so AI-based solutions must be robust enough to adapt. Automated crawling methods, however well trained, cannot work unsupervised; they need to operate in tandem with a human-in-the-loop approach to deliver the best results.
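A minimal sketch of that human-in-the-loop routing, assuming the crawler attaches a confidence score to each detection; the threshold and field names are hypothetical and would be tuned per brand in a real pipeline:

```python
from dataclasses import dataclass

# Hypothetical cut-off: detections below it go to human researchers.
REVIEW_THRESHOLD = 0.85

@dataclass
class Detection:
    url: str
    confidence: float  # the crawler's own confidence in the match

def route(detections: list[Detection]) -> tuple[list[Detection], list[Detection]]:
    """Split automated detections into auto-accepted hits and items
    queued for manual web research."""
    auto, manual = [], []
    for d in detections:
        (auto if d.confidence >= REVIEW_THRESHOLD else manual).append(d)
    return auto, manual

hits, review_queue = route([
    Detection("https://shop.example.com/item/1", 0.97),
    Detection("https://shop.example.com/item/2", 0.41),
])
print(len(hits), "auto-accepted,", len(review_queue), "sent to researchers")
```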
SunTec understands this man-plus-machine approach very well and continues to effectively serve several AI-based tech platforms for their varied data management needs.