The readings covered the structure of web search services. Modern search engines combine crawling algorithms, parallelism, and filtering to gather web pages. Indexing algorithms then collect the significant terms on each page, along with additional information such as how often each term appears and where it appears. A query processing algorithm then returns the documents from the index that contain all of the search terms, where possible. Search engines also use a range of strategies to make results more accurate and to speed up queries.
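To make the indexing and query steps concrete, here is a minimal sketch in Python. It is not how any particular search engine works; the function names (`tokenize`, `build_index`, `search`) and the ranking by raw term frequency are assumptions made for illustration. It records, for each term, the pages it appears on and its positions, then answers a query by returning only the pages that contain every query term.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id -> [positions of the term]}.

    `docs` is a hypothetical mapping of document id to page text.
    The length of each position list is the term's frequency in that page.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def search(index, query):
    """Return ids of documents containing ALL query terms.

    Results are ordered by total term frequency (an assumed, very crude
    ranking), with ties broken by document id.
    """
    terms = tokenize(query)
    if not terms:
        return []
    # One set of matching document ids per query term.
    postings = [set(index.get(term, {})) for term in terms]
    matches = set.intersection(*postings)

    def score(doc_id):
        return sum(len(index[term][doc_id]) for term in terms)

    return sorted(matches, key=lambda doc_id: (-score(doc_id), doc_id))
```

For example, with `docs = {1: "deep web search", 2: "web search engines crawl the web"}`, calling `search(build_index(docs), "web search")` returns both ids, with page 2 first because "web" appears on it twice. This sketch only handles the strict case where every term must match; the "where possible" fallback to partial matches is left out.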
Most search engines do not crawl the "deep web," that is, the temporary web pages produced as a result of searches on commercial websites and similar sites. There is many times more information in the deep web than on the surface web, but most search engines do not cover it because it would take too much time to run searches on these sites and then index the results. Some search engines, such as BrightPlanet, attempt to search the deep web in the belief that there are more quality results to be found there than on the surface web.
Friday, November 7, 2008

2 comments:
I thought the article about the deep web was interesting. I just don't understand why there are not more projects researching it if it is known that the information is much better. I would think it would be much more beneficial for everyone.
hey eric,
i did not know about BrightPlanet. it seems like a pretty valuable resource... thanks for sharing