Jul 20, 2023 / 6 min read

How human talent helps dataplor validate international location data


One of the most pressing challenges for companies hoping to capitalize on location data is data quality. A dataplor analysis of open-source Mexican data, for example, found that more than 70% of point-of-interest (POI) records contained inaccuracies, such as an incorrect address or multiple conflicting addresses for a single business. These inaccuracies may sound minor, but at scale, they can lead to poor decisions that cost companies using location data for logistics, site selection, and advertising millions of dollars.

This is why the most rigorous location data companies don’t just use machines to collect data at scale; they also use machines — and people — to verify it. Dataplor, for example, hires local experts to further validate the accuracy of data that has been collected by AI call bots and deduplicated with machine learning.

But what exactly does human validation add to the location data collection and verification process? What is the industry standard, how does human validation exceed it, and in what scenarios is it most useful?

Here’s how human validation helps dataplor provide the most accurate possible POI data.

How most location data companies collect information

Businesses and third-party providers are increasingly relying on geospatial intelligence to power their predictive modeling, expansion plans, and evaluations of market trends and competition. Location data companies provide this place-based intelligence by compiling location information from a wide range of sources, including anonymized and aggregated data gleaned from mobile devices, applications, and POS and ad services. The result is rich datasets available for a wide range of potential use cases.

Across the industry, location data companies employ machine learning and AI to identify and analyze location data. Companies often advertise that they also engage human capital, but the industry standard for doing so isn't always clear.

Relying too heavily on human capital can result in overexposure to human error; in other cases, human validators add wasteful, imprecise, or inefficient complications rather than bolstering existing tech. This can happen when companies neglect an enterprise approach that quickly and effectively integrates human capital with ML and AI processes and emphasizes consistent data management standards.

In other words, human validation can be an essential part of the location data quality enhancement process, but most companies use it minimally or optionally. For example, some allow academics to use their data for free and point out issues and errors. That isn't a proactive approach; it relies on the possibility that academics will find mistakes and correct them. Other companies hire a select group of people to walk a small area and gather on-the-ground data, which is then interpolated to other areas, a highly assumptive and inaccurate approach.

A better approach is to start with scalable, high-coverage, high-accuracy data and then improve it further with directly employed experts who systematically upgrade it.

Why dataplor uses human validators

Technology drives 90% of dataplor’s approach to data collection, and human capital supplies the final 10%. What does this look like, and why does it lead to data that is more accurate and usable at scale? 

In short, human capital plays the essential role of fine-tuning tech-based data collection, ensuring that information is consistent and accurate. Many countries have different standards for information like postal codes, phone numbers, and street names. Translation issues can also introduce inaccuracies: an AI system might mistake a grocery store for a hardware store because of differences in local language or dialect. Dataplor's human validators anticipate these issues in their local areas and use proprietary tools to fix inaccuracies and tag data based on its quality.
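To make the problem concrete, here is a minimal sketch of the kind of country-aware format check a validation pipeline might run before routing records to local experts. The patterns and field names are illustrative assumptions, not dataplor's actual tooling:

```python
import re

# Hypothetical, simplified postal-code patterns for illustration only;
# real national standards have far more edge cases than shown here.
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),                  # 90210 or 90210-1234
    "MX": re.compile(r"\d{5}"),                           # Mexican codes: 5 digits
    "JP": re.compile(r"\d{3}-\d{4}"),                     # Japanese codes: 150-0001
    "GB": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}"), # e.g. SW1A 1AA
}

def flag_for_review(record: dict) -> bool:
    """Return True if a POI record's postal code fails its country's
    format check and should be routed to a local human validator."""
    pattern = POSTAL_PATTERNS.get(record.get("country", ""))
    if pattern is None:
        return True  # unknown country: always send to a human
    code = (record.get("postal_code") or "").strip()
    return not pattern.fullmatch(code)

# A US-style ZIP attached to a Japanese address gets flagged:
record = {"name": "Sakura Market", "country": "JP", "postal_code": "90210"}
print(flag_for_review(record))  # True -> queue for human validation
```

Automated checks like this can only flag that something looks wrong; deciding what the correct local value should be is where the human validator comes in.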

The result combines the power of machine-driven data collection at scale with the knowledge and innovation of local experts who ensure quality.

How human validation improved the accuracy of cafe identification in Japan 

For an example of a human validator increasing data accuracy, take Nel Ferrer, a regional operations manager at dataplor. Nel came to dataplor after working in the tech and finance industries, where he specialized in cross-cultural collaboration among data-oriented teams. He runs a multicultural group of validators focused on POI-related issues. One of their key contributions, Nel says, is “ensuring that AI is correctly tagging information in different places and that this information is 100% correct.” Specifically, much of his team’s current focus is on “how local culture affects the POI address structure,” a task they approach by providing what he calls a “human touch” that makes sure that AI is “perceiving the environment as consistently and correctly as possible.” In this way, Nel’s validators are quite literally AI’s eyes on the ground.

Nel’s role also includes training validators and auditing their ongoing work. Training is one-on-one and walks validators through what to expect and how to handle various scenarios in the field. Nel stays in constant communication with his team to answer questions and solve problems as they arise. He also adds a further level of review to the data collection and enhancement process by checking each team member’s work weekly: for example, he’ll choose five to ten random POI locations and make sure they are correctly tagged and free from duplications.
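The weekly spot check Nel describes is, at its core, random-sample auditing. A minimal sketch of the idea follows; the record fields and checks are hypothetical, since dataplor's internal tools are proprietary:

```python
import random

def weekly_audit_sample(poi_records: list[dict], k: int = 10) -> list[dict]:
    """Draw up to k random POI records for manual review, mirroring the
    five-to-ten-location spot check described above."""
    return random.sample(poi_records, min(k, len(poi_records)))

def audit_record(record: dict, seen_keys: set) -> list[str]:
    """Check a sampled record for the two issues the spot check targets:
    incorrect or missing tagging, and duplication."""
    issues = []
    if not record.get("category"):
        issues.append("missing or untagged category")
    key = (record.get("name"), record.get("address"))
    if key in seen_keys:
        issues.append("possible duplicate")  # same name + address already seen
    seen_keys.add(key)
    return issues

# Example pass over a sampled batch:
seen: set = set()
for poi in weekly_audit_sample([{"name": "Neko Cafe", "address": "1-2-3 Shibuya"}]):
    print(poi["name"], audit_record(poi, seen) or "clean")
```

Sampling randomly rather than reviewing every record keeps the audit cheap while still surfacing systematic tagging problems over successive weeks.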

Asked about a recent instance where his team was able to fix an inaccuracy, Nel recalls an example in Japan, where cafes with the Japanese word for “cat” (“neko”) in their business names were being mislabeled as pet stores.
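The failure mode is easy to reproduce: a naive keyword-based categorizer sees “neko” in a business name and files the cafe under pet stores. A toy sketch of the bug and a validator-informed override (both entirely hypothetical, not dataplor's actual classification logic) might look like this:

```python
# Naive keyword matching: the kind of rule that mislabels "neko" cafes.
# The keyword-to-category mapping is hypothetical, for illustration only.
KEYWORD_CATEGORIES = {"neko": "pet_store", "cafe": "cafe"}

def naive_categorize(name: str) -> str:
    lowered = name.lower()
    for keyword, category in KEYWORD_CATEGORIES.items():
        if keyword in lowered:
            return category  # "neko" matches first, so cat cafes become pet stores
    return "unknown"

def categorize_with_override(name: str) -> str:
    """Apply a validator-supplied rule before falling back to keywords:
    a name containing both "neko" and "cafe" is a cafe, not a pet store."""
    lowered = name.lower()
    if "neko" in lowered and "cafe" in lowered:
        return "cafe"
    return naive_categorize(name)

print(naive_categorize("Neko Cafe Shibuya"))          # pet_store (wrong)
print(categorize_with_override("Neko Cafe Shibuya"))  # cafe (corrected)
```

In a production pipeline, corrections like this would feed back into the automated identification process rather than live as one-off patches, which is the feedback loop described next.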

This combination of advanced training and detailed, human-supplied fixes adds up to accurate, trustworthy, and actionable datasets. Location data can be consistently tagged and cross-checked, and automated identification processes can quickly correct mistakes via validator feedback.

The result is location data with global reach and local distinction. Dataplor’s approach to leveraging the additive benefit of human validators like Nel and his team makes it possible to provide on-point geospatial intelligence at the scale that international businesses need.