In the tiny kingdom of Bhutan, dozens of data scientists are perfecting artificial intelligence models from offices framed by majestic Himalayan peaks. iMerit’s employees aren’t there to train AI on rudimentary tasks like identifying “a brown cat on a windowsill” in an image. Instead, they’re teaching algorithms how to understand the anatomy of the human eye or how to detect changes in geospatial maps.
Backed by three Silicon Valley billionaires, iMerit is part of a growing group of companies building a more sophisticated, monetizable, and reliable version of AI, which is on track to add nearly $20 trillion to the global economy by 2030. As models get smarter, big businesses want to harness their power for increasingly specialized tasks, giving rise to dozens of data services startups dedicated to customizing applications in sectors like finance, healthcare, and defense.
There’s a lot at stake. While AI enthusiasm has spread across Silicon Valley, questions remain about whether the technology will actually be useful enough to pay for itself and ensure that AI model developers can make a profit. Of course, Nvidia Corp. has become the world’s most valuable company by selling AI chips. But the company’s biggest customers, including Microsoft Corp. and Alphabet Inc., are still losing money due to the exorbitant costs of building more advanced AI systems.
Radha Basu, founder and CEO of iMerit, drew parallels with the software coders who built the Internet, mobile phones and other modern tech platforms. “We are the coder equivalent of the AI revolution,” said the gray-haired entrepreneur, who is preparing to raise the next round of funding.
Bringing AI up to expert level in disparate, sensitive and sometimes dangerous industries won’t be easy. The undertaking requires a deep bench of human experts willing to supplement their daily work by training and refining models in technical fields.
In Kenya, a startup is developing technology to scan the bush for signs of predators. In Kazakhstan, medical experts are teaching models to identify the early stages of lung cancer. In India, Korea, Vietnam and elsewhere, linguists earning $65 an hour are helping models become proficient in languages other than English.
At iMerit, which employs 5,000 people in Bhutan, India and New Orleans, Yeshi Wangmo, a 23-year-old from a family of farmers, has spent years mastering a single task: correctly identifying weeds and debris in images of vast fields of corn and cotton. Wearing colorful Bhutanese gho robes and kira wraps, Wangmo and colleagues help companies like Deere & Co. subsidiary Blue River Technology build algorithms that improve accuracy when spraying pesticides and fertilizers, reducing usage by up to 90 percent.
“We’re seeing companies tackle more advanced but increasingly niche problems,” said Evan Lee, founder and CEO of data labeling solutions firm Datasaur Inc., whose clients include Netflix Inc. and the FBI. “Clients might need dentists who grew up in Tanzania or architects from France,” said Lee, whose teams work primarily out of Indonesia.
Data accuracy is a core part of their work. When ChatGPT launched two years ago, critics were quick to point out the platform’s flaws and shortcomings. Since then, many human experts have been hired to do quality control. The work is labor-intensive. Data labelers like Wangmo focus on scans, photos, videos, and text to build AI models. The goal is to improve generative AI systems that are trained on large data sets to analyze or create new content. Perfecting them closes the gap between AI’s potential capabilities and its actual performance in the real world.
Such skills are becoming increasingly important in high-stakes fields such as military intelligence, according to Kathleen Walch, director and general manager of research firm PMI Cognilytica.
Lower-level versions of this work are not new. The data services industry began about two decades ago. At the time, labelers based in places like the Philippines and India were mainly tagging small data sets that powered voice assistants on shopping websites or speech recognition for search engines. Critics worry that AI has created an exploitable underclass, pointing to salaries hovering around a few dollars a day in some pockets of the industry.
But as AI has improved over the years, most simple tasks have now been automated. Demand has shifted to hiring experts and paying them higher salaries and rates, though they are still far below the compensation packages for data scientists in Silicon Valley.
In India, radiologists who train AI models can earn as much as Rs. 1,00,000 ($1,200) for a few hours of work, said Hardik Dave, founder and chief executive of popular data labeling firm Indika AI. The average contractor earns about a third of that per month, he said.
Today, startups selling labeling services are attracting marquee investors. This summer, the biggest player, Scale AI, raised money from Meta Platforms Inc. and Amazon.com Inc. With a valuation of nearly $14 billion, the company has surpassed leading AI model builders like Mistral and Cohere. In 2023, Sequoia’s list of the top 50 AI companies included four labeling startups, up from just one the previous year. One, Labelbox, is backed by Andreessen Horowitz and Kleiner Perkins. Another, Snorkel AI, is funded by Alphabet Inc.’s venture arm at a valuation of $1 billion.
More broadly, the data labeling market, which was worth about $20 billion in 2024, is projected to grow 20 percent annually through 2030, according to Grand View Research, a San Francisco-based market research firm.
The consequences of a mistake can be dire. An incorrectly labeled frame can cost a business millions of dollars, invite lawsuits, or even cause death. Cancer-scanning AI tools and self-driving cars are two especially sensitive areas.
“Less accurate AI can derail,” said Wendy Gonzalez, CEO of Los Gatos-based Sama, whose clients include Ford Motor Co. and Walmart Inc. “Businesses can’t afford that.”
Consider the alliance between Massachusetts General Hospital and Centaur Labs, a data-labeling startup with 50,000 freelancers based in the U.S., Kazakhstan and Vietnam.
In recent years, Boston-based Centaur Labs has improved the products it uses in hospitals, gradually bringing in highly skilled data experts. Some projects address everyday ailments. (The startup is working on a snoring-detection algorithm and an app for sleep apnea.) Others stray into heavier territory, such as developing AI that can more accurately identify lung nodules on CT scans. Last month, the startup announced a capital injection from Accel, Y Combinator and others.
Polina Pilius, a radiologist in Kazakhstan, teams up with a contractor for Centaur Labs.