Sunday, August 3, 2008

My Index is bigger than your Index

Should I start using the Cuil search engine?

Cuil, the search engine that launched with widespread publicity Monday, claimed in the first paragraph of its introductory press release that it “has indexed 120 billion Web pages, three times more than any other search engine.” But it turns out that neither Cuil nor anyone else has enough information to verify that claim. It’s also not clear that a bigger index is better.

Representatives for Google, Yahoo and Microsoft — which together control 90% of the U.S. search market, according to Nielsen Online — told me they don’t reveal the size of their indexes. So Cuil based its claim on “past knowledge and tests,” Cuil spokesman Vince Sollitto told me. These tests could include counting search results for “searches for the intersection of rare words.” He added, “It is disappointing that others won’t state their index size publicly, as we think it is important that people know how much of the Web is being searched on their behalf.”

For Google, specifically, Cuil had some prior knowledge because its co-founder and president, Anna Patterson, previously worked on Google’s search index. “Anna knows how big it was when she built it a couple of years ago,” Mr. Sollitto said.

But is it still that size? Last week, Google announced that it was processing one trillion unique links online. The Google index doesn’t include all of the pages found at these links. Mr. Sollitto said Cuil’s research has found that the average Web page has almost 20 links, which, dividing one trillion links by that figure, would suggest that there are more than 50 billion pages in Google’s index. Google also removed an undisclosed number of duplicate pages.
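
For anyone who wants to check the back-of-the-envelope math behind that 50 billion figure, here is a minimal sketch. The function name `estimate_pages` is made up for illustration, and the calculation assumes every link is counted once and every page carries the average number of links. It ignores the duplicates Google removed and the pages it chose not to index, so treat the result as a rough guess rather than a measurement.

```python
def estimate_pages(total_links: int, links_per_page: float) -> float:
    """Rough page count implied by a link total and an average links-per-page figure."""
    if links_per_page <= 0:
        raise ValueError("links_per_page must be positive")
    return total_links / links_per_page


# One trillion unique links, the number Google announced.
GOOGLE_LINKS = 1_000_000_000_000

# Cuil's figure is "almost 20" links per page; 20 is used here as a round
# upper bound, so the real estimate would come out slightly higher.
AVG_LINKS_PER_PAGE = 20

estimate = estimate_pages(GOOGLE_LINKS, AVG_LINKS_PER_PAGE)
print(f"{estimate / 1e9:.0f} billion pages")  # prints "50 billion pages"
```

Because the average is a little under 20, dividing by a smaller number pushes the estimate above 50 billion, which is where the "more than" in the claim comes from.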
