Data is at the core of today’s most prolific AI technologies. From automated cars to retail chatbots, each of these innovations requires mammoth amounts of specialized data to train its algorithms. As public demand for smart technologies increases, so too does the race to develop products that are better, cheaper and faster. The dataset generation industry is a behemoth, silently expanding.
A recent report predicts the worldwide dataset generation industry will grow fivefold from today’s valuation by 2030, reaching USD $8,607 million (Research and Markets, 2022). The scope for engagement and profit is immense. Unfortunately, the majority of that profit goes to dataset generation companies and not to their workers. While data workers make on average USD $0.22 (India) and USD $2 (USA) per hour of work, datasets are sold for USD $60-140 per hour of cleaned data!
THE HUMAN COST OF AI.
Let’s lift the curtain on what it takes to make AI technologies and uncover the human labour behind them. At the visible forefront are your product managers, data engineers, scientists, and developers. They develop the product, write the algorithms, and prepare technologies for consumption. But let’s take a step back to writing algorithms. Technologies are taught to perform tasks and respond to stimuli by interacting with curated speech, text, image, and video training datasets. But where do these come from? While some models use freely available user data, many technologies require the purchase of specific datasets from companies and platforms like Amazon Mechanical Turk (AMT), Appen, iMerit and Samasource.
In 2018 the ILO published a report highlighting the exploitative nature of digital work. This was soon followed by an article in The Atlantic exposing the AMT platform for significantly underpaying its workers. The truth is painful: neither dataset generation companies nor most technology companies actively consider the livelihoods of dataset generation workers.
We believe that the data industry has the potential to be one of the driving forces of poverty reduction. With the industry currently valued at over USD $40 billion, there is substantial room for data workers to benefit significantly from these profits.
Our vision is a world where ethical data generation practices are the standard and not the exception. But this ecosystem cannot be created alone; we need the support of technology companies to advocate for the use of ethical data. Data workers need to be put at the heart of the ecosystem and prioritized in every decision. To this end, we have developed the Ethical Data Pledge, in which we call on technology companies, data companies and compassionate individuals to help us create a world where digital work is ethical, sustainable and beneficial to workers!