New ways to search the web
Apr 24, 1998
Finding the 'right' information among the Web's 320 million pages is becoming increasingly difficult. A new search program called HITS (Hyperlink-Induced Topic Search), developed by Jon Kleinberg at Cornell University in the US, could finally sort out which pages are most valuable to the user.
Search engines such as Hotbot and AltaVista record a list of pages that contain given keywords, but cannot judge how relevant the information on each page is. For example, the word 'quantum' returns 778,930 references through AltaVista, but the search engine does not say how many of them are physics related.
Kleinberg's program analyzes the links between Web pages and sorts the pages into two types: 'authorities', which have useful information about a topic, and 'hubs', which contain directory links to the topic. The best authorities, Kleinberg says, will be those that are pointed to by the best hubs, and the best hubs will be the ones that point to the best authorities. Because each definition depends on the other, the scores are recalculated several times to improve accuracy.
A search on HITS starts by collecting the first 200 pages from a keyword query on AltaVista. The program then looks at all the additional pages linked to this 'root' set and how these pages are connected together. Pages that are pointed to by many Web sites are assigned a high authority 'score', while pages that link to many external sites are given a high hub 'score'. These last two calculations are repeated several times, with each cycle awarding more authority points to sites that are linked from high-scoring hub sites, and more hub points to sites that link to high-scoring authority sites. Ten repetitions, Kleinberg says, are enough to return surprisingly focused lists of authorities and hubs.
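The repeated scoring described above can be illustrated with a short Python sketch. It is not Kleinberg's program: the function names `hits` and `normalize`, the toy link graph and its page names are all invented for illustration, and rescaling the scores after each cycle is an assumption made here so that the numbers stay bounded.

```python
from math import sqrt


def normalize(scores):
    """Rescale scores so their squares sum to 1 (an assumption of this sketch)."""
    norm = sqrt(sum(value * value for value in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: value / norm for page, value in scores.items()}


def hits(links, iterations=10):
    """Return (authority, hub) scores for a link graph.

    links maps each page to the pages it links to. Ten cycles is the
    figure quoted in the article.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    authority = {page: 1.0 for page in pages}
    hub = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        new_authority = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                new_authority[target] += hub[source]

        # Hub score: sum of the authority scores of the pages linked to.
        new_hub = {
            page: sum(new_authority[target] for target in links.get(page, ()))
            for page in pages
        }

        authority = normalize(new_authority)
        hub = normalize(new_hub)

    return authority, hub


# Hypothetical toy graph: two directory pages and one stray page.
example_links = {
    "hub1": ["phys1", "phys2"],
    "hub2": ["phys1", "phys2", "phys3"],
    "lonely": ["phys3"],
}
authority, hub = hits(example_links)
for page in sorted(authority, key=authority.get, reverse=True):
    print(page, round(authority[page], 3), round(hub[page], 3))
```

In this toy graph every 'phys' page has exactly two incoming links, so a simple link count cannot separate them. After a few cycles 'phys1' and 'phys2' pull ahead of 'phys3' because both of the pages that link to them are strong hubs.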
The program is especially useful with vague terms such as 'quantum', as it will sort the results into different communities, such as physics, software companies, equipment and so on. It also helps with topics on which sites hold extreme views, such as 'cold fusion'. This is because the sites in each group will tend to link to similar sites, and not to their opponents.