The Deep Web
The Deep Web, also known as the "Invisible Web," refers to the pages on the World Wide Web that are not indexed by conventional search engines such as Google or Yahoo. This tutorial explains the differences between the deep web and the surface web, how to access the deep web, and what kind of information is found in the deep web.
Deep Web vs. Surface Web
When you use a search engine like Google or Yahoo!, the information you get back is sometimes referred to as the "Surface Web" or the "Visible Web." However, there is a lot more information out there: millions of web pages that Google and Yahoo! cannot find. That is the Deep Web.
For example, a Google search will not pick up all information in the Library of Congress web pages. To find those web pages you would have to go to the Library of Congress home page and perform a search there.
Why can't you find those pages with your Google search? Deep web pages cannot be found by search engines like Google because they are within specialized databases; typical search engines just aren't able to access them. The Deep Web is made up of valuable material, like the information within the Library of Congress web pages. In January 2006, Marcus P. Zillman wrote "the Deep Web covers somewhere in the vicinity of 900 billion pages of information located through the world wide web in various files and formats that the current search engines on the Internet either cannot find or have difficulty accessing. The current search engines find about 8 billion pages".
What can you find within the deep web?
Directories are part of the deep web. These can include things like:
- phone books
- "people finders" such as lists of professionals (for example, doctors or lawyers)
- dictionary definitions
- items for sale in a Web store or on Web-based auctions
- digital exhibits
- multimedia and graphical files
- job postings
- available airline flights, hotel rooms, etc.
- stock and bond prices, market averages, etc.
Finding Information in the Deep Web
Most of the deep web is made up of information found in specialized databases. Each of these databases can be searched, much like searching Google, but the results are often delivered to you in web pages that are made just in answer to your search. These pages are not stored anywhere; rather, they are created "on the fly." It is easier and cheaper for these databases to generate the answer page for each search on demand than to store all the possible pages containing all the possible answers to all the possible searches people could make to the database. Search engines cannot find or create these pages.
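The idea of pages created "on the fly" can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the catalog records, the page paths, and the function names `results_page` and `crawl` are hypothetical and do not describe any real database or search engine. It shows a results page being built only when a query arrives, and a simple link-following crawler that never reaches that page because it does not submit search forms.

```python
import html
import re

# Hypothetical records standing in for a specialized database.
CATALOG = [
    {"title": "Map of Louisiana, 1895"},
    {"title": "Civil War photographs"},
    {"title": "Jazz recordings, 1920s"},
]

# The only stored pages on this imaginary site. The search form points
# to /results, but no stored page links to a results page directly.
STATIC_PAGES = {
    "/": '<a href="/search">Search the catalog</a>',
    "/search": '<form action="/results"><input name="q"></form>',
}


def results_page(query):
    """Build an HTML results page on demand; nothing is stored."""
    matches = [r for r in CATALOG if query.lower() in r["title"].lower()]
    items = "".join(f"<li>{html.escape(r['title'])}</li>" for r in matches)
    return f"<h1>Results for {html.escape(query)}</h1><ul>{items}</ul>"


def crawl(pages, start="/"):
    """Follow href links only, as a simple crawler might.

    Forms are never filled in or submitted, so result pages stay unseen.
    """
    seen, queue = set(), [start]
    while queue:
        path = queue.pop()
        if path in seen or path not in pages:
            continue
        seen.add(path)
        queue.extend(re.findall(r'href="([^"]+)"', pages[path]))
    return seen


if __name__ == "__main__":
    print(sorted(crawl(STATIC_PAGES)))  # pages the crawler can reach
    print(results_page("jazz"))         # page that exists only after a search
```

In this sketch the catalog records appear only in pages returned by `results_page`, so a crawler that indexes just the stored pages never sees them. A person who types a query into the search form does.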
The trick is knowing where to go to search for this information in the deep web. A list of places to find deep web information is available on the LSU Libraries page "Internet Searching - Deep Web Search Engines."
For example, go to the Complete Planet website (www.completeplanet.com), a deep web directory. Click on the link that says "Music," and you'll see a list of databases where you can search for deep web information dealing with music. You can narrow your results even further by choosing only one of the subtopics, such as "lyrics." One of the sites listed is "Lyrics Robot" (www.lyricsrobot.com), a lyrics specialty search engine which will give you a more complete results list than a Google search.
Subject-focused search engines are also very useful for finding deep web information in certain fields. For example:
- The Agriculture Network Information Center - "a voluntary alliance and partnership of nearly 50 member institutions and organizations working to offer quick and reliable access to quality agricultural information and sources"
- Artcyclopedia - "the guide to great art online"
- Educator's Reference Desk - "providing high-quality resources and services to the education community"
In summary, the deep web:
- consists of pages that are not part of the surface web returned by conventional search engines. This information is invisible to anyone relying on those search engines alone.
- is also known as the "invisible web"
- is stored in special databases.
- is NOT picked up by typical search engines.
- often includes sites that require registration or have limited access, which keeps search engines from browsing them. For example, you have to log on to databases like Academic Search Premier to search or browse.
- contains information such as directories and breaking news.