The unprecedented scale of Aadhaar’s data will make managing it extraordinarily difficult. One of Nadhamuni’s most important tasks is de-duplication, ensuring that each record in the database is matched to one and only one person. That’s crucial to keep scammers from enrolling multiple times under different names to double-dip on their benefits. To guard against that, the agency needs to check all 10 fingers and both irises of each person against those of everyone else. In a few years, when the database contains 600 million people and is taking in 1 million more per day, Nadhamuni says, they’ll need to run about 14 billion matches per second. “That’s enormous,” he says.
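A rough back-of-the-envelope check shows how a figure of that size arises. The sketch below makes a simplifying assumption: each of the day's 1 million new enrollees is compared once against every one of the 600 million existing records, and the work is spread evenly over 24 hours. It does not model how the agency actually counts a single "match".

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def comparisons_per_second(existing_records: int, new_per_day: int) -> float:
    """Pairwise comparisons per second, assuming every new enrollee is checked
    once against every existing record and the work is spread evenly over a day."""
    return existing_records * new_per_day / SECONDS_PER_DAY


rate = comparisons_per_second(600_000_000, 1_000_000)
print(f"{rate:.3g} comparisons per second")  # 6.94e+09
```

That comes to roughly 7 billion comparisons a second, the same order of magnitude as the 14 billion Nadhamuni cites. The remaining factor of about two would depend on details that aren't spelled out here, such as how many biometric checks count as one match.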
Coping with that load takes more than just adding extra servers. Even Nadhamuni isn’t sure how big the ultimate server farm will be. He isn’t even totally sure how to run it yet. “Technology doesn’t scale that elegantly,” he says. “The problems you have at 100 million are different from problems you have at 500 million.” And the Aadhaar team won’t know what those problems are until they show up. As the system grows, different components slow down in different ways. There might be programming flaws that delay each request by an amount too tiny to notice when you’re running a small number of queries, but when you get into the millions, those tiny delays add up to a major issue. When the system was first activated, Nadhamuni says, he and his team were querying their database, built with the ubiquitous software MySQL, about 5,000 times a day and getting answers back in a fraction of a second. But when they jumped to 20,000 queries a day, the lag time rose dramatically. The engineers eventually figured out that they needed to run more copies of MySQL in parallel; software, not hardware, was the bottleneck. “It’s like you’ve got a car with a Hyundai engine, and up to 30 miles per hour it does fine,” Nadhamuni says. “But when you go faster, the nuts and bolts fall off and you go, whoa, I need a Ferrari engine. But for us, it’s not like there are a dozen engines and we can just pick the fastest one. We are building these engines as we go along.”
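A toy calculation makes the two ideas concrete: a tiny fixed delay per query grows in step with query volume, and splitting queries across parallel copies divides each copy's share. The 1 ms delay and the count of four copies are hypothetical numbers chosen for illustration, not figures from Aadhaar. The model is also linear; real contention, where queries pile up waiting for one another, can make lag rise much faster.

```python
# Hypothetical figures for illustration only: no per-query delay or number
# of MySQL copies is given for the real system.
EXTRA_DELAY_S = 0.001  # assumed 1 ms of wasted time per query


def added_work_seconds(queries: int, extra_delay_s: float = EXTRA_DELAY_S) -> float:
    """Total extra processing time that a fixed per-query delay adds up to."""
    return queries * extra_delay_s


def queries_per_copy(queries: int, copies: int) -> float:
    """Queries each copy handles if the load is split evenly across parallel copies."""
    return queries / copies


for daily_queries in (5_000, 20_000, 1_000_000):
    print(f"{daily_queries:>9,} queries/day -> "
          f"{added_work_seconds(daily_queries):,.0f} s of added work")

# Splitting 20,000 daily queries across 4 hypothetical copies brings each copy
# back to the 5,000-a-day load that performed well.
print(queries_per_copy(20_000, 4))
```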
Using both fingerprints and irises, of course, makes the task tremendously more complex. But irises help identify the millions of adult Indians whose finger pads have been worn smooth by years of manual labor, as well as children under 16, whose fingerprints are still developing. Identifying someone by fingerprints alone works only about 95 percent of the time, says R. S. Sharma, the agency’s director general. Using prints plus irises boosts the rate to 99 percent.
That 1 percent error rate sounds pretty good until you consider that in a country of 1.2 billion it means 12 million people could end up with faulty records. And given the fallibility of technicians with little education in a poor country, the number could be even higher. A small MIT study of data entry on electronic forms by Indian health care workers found an error rate of 4.2 percent. In fact, at one point during my visit to Gagenahalli, Nadhamuni shows me the receipt given to a woman after her enrollment; I point out that it lists her as a man. A tad flustered, Nadhamuni assures me that there are procedures for people to get their records corrected. “Perfect solutions don’t exist,” Nilekani says, “but this is a substantial improvement over the way things are now.”
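The 12 million figure is simple multiplication, sketched below assuming a population of about 1.2 billion. The second line is only a what-if. It applies the 4.2 percent rate from the MIT data-entry study, which measured errors on electronic health forms rather than Aadhaar enrollment, to show how quickly the count grows if error rates run higher.

```python
INDIA_POPULATION = 1_200_000_000  # approximate; the base implied by "12 million" at 1 percent


def people_affected(population: int, error_rate: float) -> int:
    """Number of people whose records would be faulty at a given error rate."""
    return round(population * error_rate)


print(f"{people_affected(INDIA_POPULATION, 0.01):,}")   # 12,000,000
print(f"{people_affected(INDIA_POPULATION, 0.042):,}")  # 50,400,000 (what-if only)
```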