Search Engines
29-49-2002
By: Gary McManus
The best thing about the Internet is that there are oodles of pages out there referencing all sorts of information. Unfortunately, the worst thing about the Internet is that there are oodles of pages out there referencing all sorts of information. So how do I retrieve the information I need? Answer: search engines.
In the following series of articles I will endeavor to explain the concept of search engines and their use, both for retrieval and submission of information.
What is a Search Engine?
A search engine is a coordinated set of programs that accepts a user's search request and returns a list of pages relevant to that request. These programs include:
* A spider program that goes to every page on every Web site that wants to be searchable and reads it
* A program that creates an index (catalogue) from all the pages that have been read by the spider program
* A program that receives your search request, compares it to the entries in its search index, and returns resulting references to you
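The three programs listed above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular search engine is built: the PAGES dictionary is a hypothetical stand-in for the Web (a real spider would fetch pages over the network), and the function names spider, build_index and search are invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the Web: URL -> page text.
# A real spider would download these pages over HTTP.
PAGES = {
    "http://example.com/a": "Search engines index web pages",
    "http://example.com/b": "Spiders read web pages and follow links",
    "http://example.com/c": "An index maps words to pages",
}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def spider(pages):
    """Program 1: visit every page and read it."""
    for url, text in pages.items():
        yield url, text


def build_index(crawled):
    """Program 2: build a catalogue mapping each word to the pages containing it."""
    index = defaultdict(set)
    for url, text in crawled:
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Program 3: return the pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


index = build_index(spider(PAGES))
print(search(index, "web pages"))
```

Here the query is answered entirely from the index, without re-reading any page, which is why the indexing step is done ahead of time.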
There are two primary methods of searching: keyword and concept. Keyword searching is the most common; concept searching presents more of a challenge to search engine companies.
Keyword-based systems answer a text query by looking up keywords. Either the web page developer specifies the words to be indexed, or the search engine selects them using a predefined method (e.g. the first 20 lines). Most search engines these days index every word on every page, whereas others index only certain parts of a page, building the index from a subset of the data on the page (e.g. title, headings and subheadings, links, or the first 'x' words in the document).
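The different choices of what to index can be illustrated with a small Python sketch. The page layout used here (a dictionary with 'title', 'keywords' and 'body' fields) and the function name words_to_index are assumptions made for the example, not a description of any real engine.

```python
import re


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def words_to_index(page, mode="full", first_n=20):
    """Pick the words that go into the index for one page.

    page is a hypothetical dict with 'title', 'body' and optional 'keywords'.
    mode selects the indexing method:
      'keywords' - only the words specified by the page developer
      'title'    - only the words in the title
      'first_n'  - only the first first_n words of the body
      'full'     - every word in the title and body
    """
    if mode == "keywords":
        return tokenize(" ".join(page.get("keywords", [])))
    if mode == "title":
        return tokenize(page["title"])
    if mode == "first_n":
        return tokenize(page["body"])[:first_n]
    if mode == "full":
        return tokenize(page["title"] + " " + page["body"])
    raise ValueError("unknown indexing mode: " + mode)


page = {
    "title": "Search Engines",
    "keywords": ["search", "spider"],
    "body": "A spider reads pages and an indexer catalogues the words it finds",
}

print(words_to_index(page, "keywords"))
print(words_to_index(page, "first_n", first_n=3))
```

Indexing a subset keeps the index small, at the cost of missing pages whose relevant words appear outside the indexed part.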
Concept-based systems try to determine what you mean as opposed to what you say. These systems return references to documents that are 'about' the search request, rather than only those containing exactly what you specified. Using this method, words are examined in relation to other words found nearby on the page. These methods use sophisticated linguistic and artificial intelligence theories to perform the search (too complicated to start explaining here). When certain words and phrases occur close together in a document, the system concludes by statistical analysis that the document is 'about' a certain topic.
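The idea of words occurring close together can be shown with a deliberately simple Python sketch. It is far cruder than the linguistic methods real concept-based systems use: it merely counts how often two different topic words appear within a few words of each other. The function names, the window size of 4 and the sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def nearby_pairs(text, window=4):
    """Count pairs of different words that occur within `window` words of each other."""
    tokens = tokenize(text)
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[frozenset((left, right))] += 1
    return pairs


def topic_score(text, topic_terms, window=4):
    """Score a document by how often two topic terms appear near each other."""
    terms = set(topic_terms)
    return sum(
        count
        for pair, count in nearby_pairs(text, window).items()
        if pair <= terms  # both words of the pair are topic terms
    )


# Hypothetical topic vocabulary and documents.
anatomy = {"heart", "blood", "arteries"}
doc_medical = "The heart pumps blood through the arteries"
doc_cards = "She won his heart at the card game"

print(topic_score(doc_medical, anatomy))
print(topic_score(doc_cards, anatomy))
```

Both documents contain the word "heart", so a plain keyword search would return both; only the first has other anatomy words nearby, so only it scores above zero for that topic.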
In the next article I will explain how to maximise the use of search engines, and retrieve pages most appropriate to your search area.
The TSSG recommends using Google (http://www.google.com) for keyword searches. Google also offers a downloadable taskbar for Microsoft's Internet Explorer, which allows you to search at any time without having to go to the Google home page first (see http://taskbar.google.com/). For browsing through broad categories of information, the TSSG currently recommends Yahoo! (either http://www.yahoo.com for the World, or http://www.yahoo.co.uk for the UK and Ireland). Interestingly, Yahoo! actually uses Google for its own keyword searches!