Data, Labor, and the Global Supply Chain
Popular descriptions of AI present it as a technology that learns autonomously from data. This framing is misleading in a crucial way: behind nearly every high-performing AI system is a large, often invisible workforce of human beings who collected, labeled, filtered, and refined the data that made learning possible. This workforce is global, precarious, and largely unnamed in the press releases and research papers that celebrate AI's achievements. Understanding the global labor supply chain of AI is essential to a complete picture of what AI systems actually are and how they are actually built.
The Work AI Requires
Training a modern AI system requires multiple categories of human labor, each often outsourced to low-cost labor markets.
Data collection involves gathering raw material: photographs, audio recordings, sensor measurements, text documents. This work ranges from web scraping (mostly automated) to targeted collection: paying workers to record their voice in specific accents, photograph specific objects, or walk through specific environments wearing sensor-equipped vests.
Data annotation is the conversion of raw data into labeled training examples. This is among the most labor-intensive steps in AI development. An image dataset for a self-driving car requires every pixel in thousands of images to be labeled: this car, this pedestrian, this traffic cone. A sentiment analysis dataset requires thousands of reviews or social media posts to be read and rated by human annotators. A question-answering dataset requires humans to write both questions and verified answers. Annotation is painstaking, repetitive work done at scale by workers paid per task.
Reinforcement learning from human feedback (RLHF) is a newer but now central category of AI labor. To train AI systems to be helpful and to avoid harmful outputs, human raters must compare pairs of AI outputs and indicate which is better, safer, or more aligned with human values. This work requires judgment, nuance, and repeated exposure to AI-generated text, including, in many cases, highly disturbing content.
Content moderation involves human reviewers who examine user-reported content to determine whether it violates platform policies. While not always classified as AI work, content moderation is deeply intertwined with AI systems that flag content for review, and the humans doing this work are essential to the AI-content ecosystem.
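To make the RLHF comparison work described above concrete, here is a minimal sketch in Python of what rater judgments might look like as data. Everything in it is an assumption for illustration: the `Comparison` record, its field names, the majority-vote rule, and the sample judgments are hypothetical, and real pipelines differ (many feed comparisons directly into a reward model rather than voting). The point is only that every resulting label traces back to several individual human decisions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """One unit of rater work: a person read two AI outputs and picked one."""
    prompt_id: str   # hypothetical identifier for the prompt shown to the rater
    rater_id: str    # hypothetical identifier for the human rater
    preferred: str   # "A" or "B"


def majority_preference(comparisons):
    """Return {prompt_id: "A" | "B" | "tie"} using a simple majority vote."""
    votes = {}
    for c in comparisons:
        votes.setdefault(c.prompt_id, Counter())[c.preferred] += 1
    result = {}
    for prompt_id, counts in votes.items():
        a, b = counts["A"], counts["B"]
        result[prompt_id] = "A" if a > b else "B" if b > a else "tie"
    return result


# Made-up judgments, for illustration only.
judgments = [
    Comparison("p1", "r1", "A"),
    Comparison("p1", "r2", "A"),
    Comparison("p1", "r3", "B"),
    Comparison("p2", "r1", "B"),
    Comparison("p2", "r2", "A"),
]

labels = majority_preference(judgments)
print(labels)  # {'p1': 'A', 'p2': 'tie'}
print(len(judgments), "human judgments produced", len(labels), "labels")
```

Even in this toy case, five separate acts of reading and judging yield only two labels, one of which is unresolved. Scaled to the many comparisons a production system may need, this is the labor the section describes.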
Anthropologist Mary L. Gray and computer scientist Siddharth Suri coined the term 'ghost work' to describe the on-demand, piecemeal human labor that powers AI and platform economies: labor that is systematically hidden from public view by the platforms that rely on it. Ghost workers have no employment security, no benefits, and often no recourse when they are abruptly removed from platforms.
Where This Labor Happens
AI data labor is globally distributed, typically following the geography of wage differentials and English-language capability. Major annotation hubs include:
- Kenya: Nairobi has become a center for AI annotation work, with thousands of workers employed by companies like Sama and Scale AI on behalf of clients including Google, Microsoft, and Meta.
- The Philippines: a major site for content moderation and annotation, leveraging a large English-speaking workforce.
- Venezuela: economic crisis has driven many professionals into microtask platforms like Amazon Mechanical Turk and Remotasks.
- India: a large and growing share of annotation work, particularly for data requiring English or South Asian language competence.
- Pakistan, Bangladesh, and parts of Southeast Asia.
This geography is not incidental. It reflects the same economic logic that offshored manufacturing: wage costs are dramatically lower, the work requires no physical proximity to the AI labs it serves, and workers have limited bargaining power relative to global tech platforms.
A 2023 investigation by TIME magazine found that workers in Nairobi hired through Sama were paid between $1.32 and $2 per hour to read and label extremely disturbing content, including graphic depictions of violence and child abuse, as part of OpenAI's safety training for ChatGPT. Many workers reported lasting psychological harm from the exposure. Few received meaningful mental health support. Sama ended the contract early, in 2022, before the investigation was published, but the conditions it revealed were not unique to that contract.
Labor Conditions and Ethical Questions
The conditions under which AI data labor is performed raise serious ethical questions that the industry has been slow to address.
Wage levels: annotation workers on major platforms earn wages that, while locally competitive in some markets, are far below the market rate for equivalent cognitive labor in the countries where AI companies are headquartered. The value produced, which enables billion-dollar AI systems, is not reflected in the compensation.
Psychological harm: content moderation and RLHF work regularly exposes workers to disturbing material. Studies of content moderators at Facebook's outsourced operations in Kenya found high rates of PTSD and other mental health conditions. Platform operators have been criticized for failing to provide adequate psychological support or hazard pay.
Job security: ghost workers are typically classified as independent contractors, with no employment protections, no benefits, and the ability to be removed from platforms without notice or explanation. This makes organizing for better conditions extremely difficult.
Transparency and credit: AI systems are rarely documented in ways that acknowledge the human labor that built them. Model cards, the technical documents that describe AI systems, almost never describe the annotation workforce, their working conditions, or their compensation.
These are not abstract philosophical problems. They are active areas of labor advocacy, regulatory inquiry, and legal dispute. In Kenya, a group of former Sama workers filed a lawsuit against Meta in 2023 alleging inadequate mental health support and wrongful termination for union organizing.
Which of the following best explains why AI data annotation work is disproportionately located in countries like Kenya, the Philippines, and Venezuela rather than in the US or Germany?
The term 'ghost work' most precisely captures which feature of AI data labor?
Trace the Supply Chain
- Choose one widely used AI system — ChatGPT, Google Gemini, YouTube's recommendation system, TikTok's content moderation, or another AI application you use regularly.
- Research and map its data supply chain:
  1. What types of training data does this system likely use? What public information is available about its training data?
  2. What annotation or RLHF labor would have been required to build it? Find at least one documented source about who did this work.
  3. Where geographically does this labor appear to be concentrated?
  4. What do you know about the working conditions of the people who performed this labor?
  5. Does the company publicly disclose information about its data labor practices? What, if anything, is visible in their model cards, sustainability reports, or news coverage?
- Present your supply chain map as a visual diagram with a 250-word written explanation of what you found and what remains opaque.