2.6 Search engines
The web contains a huge amount of material, and finding specific information in it is a serious problem; even in the early days of the internet this difficulty threatened to slow the growth of the net. Fortunately a partial solution emerged: the search engine. This is a program which accesses a large database of information about the World Wide Web; the database contains the individual words found in web documents and the locations of the documents containing those words. When the user of a search engine wants to find a document, they type a query: a series of keywords joined by Boolean connectives such as ‘and’ and ‘or’ or, in some cases, a natural language sentence. For example, the query
Java & compiler
would return the addresses of all those web documents which contain both the word ‘Java’ and the word ‘compiler’.
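The word-to-location database described above is commonly called an inverted index. The following is a minimal sketch of the idea in Python, not a description of how any particular search engine is built: the function names, the sample addresses and the sample texts are all invented for illustration, and the query parser handles only the ‘&’ (and) connective.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of addresses whose document contains it."""
    index = defaultdict(set)
    for address, text in documents.items():
        for word in tokenize(text):
            index[word].add(address)
    return index


def search_and(index, query):
    """Return the addresses containing every keyword in an '&'-joined query."""
    keywords = [k.strip().lower() for k in query.split("&") if k.strip()]
    if not keywords:
        return set()
    # Start with the documents for the first keyword, then intersect.
    result = set(index.get(keywords[0], set()))
    for keyword in keywords[1:]:
        result &= index.get(keyword, set())
    return result


# Hypothetical documents, invented for this example.
documents = {
    "http://example.com/a": "A Java compiler translates Java source code.",
    "http://example.com/b": "Java is an island in Indonesia.",
    "http://example.com/c": "A compiler for the C language.",
}

index = build_index(documents)
matches = search_and(index, "Java & compiler")
print(sorted(matches))
```

Because the index stores a set of addresses per word, an ‘and’ query reduces to a set intersection; an ‘or’ query could be handled in the same way with a set union.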
In order to build up a search database the search engine will employ a program known as a spider. This will visit a website and access the web documents stored there, keep track of the addresses of the documents and the words stored in them, and update the search engine's database. Spiders will not visit websites randomly: they will only visit those sites whose developers inform the search engine that they want them linked to the engine's database. A developer will interact with the search engine site by requesting and filling in a form; this form will normally just ask for the address of the document to be indexed and a contact email address. After a few seconds the spider will visit the website and start the indexing process; usually, after a week or two, details of the website are added to the search engine's database. The process described here applies to search engines which carry out automatic indexing. There are also a few general search engines where the indexing is done manually by trained indexers. Two examples of this type of search engine are Yahoo and Ask.
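The submit-then-index cycle described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a real spider: the class `SearchDatabase`, the function `fetch` and the dictionary `FAKE_WEB` are hypothetical names, and the "web" is an in-memory dictionary so that the example runs without a network connection.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the web: address -> document text.
FAKE_WEB = {
    "http://example.org/index.html": "Welcome to our Java tutorials",
    "http://example.org/tools.html": "Download a free Java compiler",
    "http://example.net/home.html": "Shakespearean studies archive",
}


def fetch(address):
    """Pretend to download a document; a real spider would use HTTP."""
    return FAKE_WEB.get(address)


class SearchDatabase:
    def __init__(self):
        self.index = defaultdict(set)  # word -> addresses containing it
        self.pending = []              # submissions waiting for the spider

    def submit(self, address, contact_email):
        """Record a developer's request to have a document indexed."""
        self.pending.append((address, contact_email))

    def run_spider(self):
        """Visit each submitted address and record the words found there."""
        visited = []
        while self.pending:
            address, _contact = self.pending.pop(0)
            text = fetch(address)
            if text is None:
                continue  # document could not be retrieved; skip it
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                self.index[word].add(address)
            visited.append(address)
        return visited


db = SearchDatabase()
db.submit("http://example.org/index.html", "dev@example.org")
db.submit("http://example.org/tools.html", "dev@example.org")
visited = db.run_spider()
print(visited)
print(sorted(db.index["java"]))
```

Only the two submitted addresses are indexed; the third document in `FAKE_WEB` is never visited because no developer asked for it to be added, which mirrors the behaviour described in the text.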
One of the tricks used by companies to make sure that their website is placed first when a search engine retrieves the results of a query is to include certain keywords a large number of times in the web documents in the site. For example, the word ‘Java’ repeated a large number of times is intended to ensure that the site is displayed prominently when any search using this word takes place. This technique is known as spamdexing. Companies go to great lengths to disguise spamdexing, as search engine companies look upon the practice with disdain and will de-list any spamdexed pages. One technique that is used is to have a web page displayed with a graphic that has a coloured background and write the repeated words in the same layer and in the same colour as the background, so that they are invisible to a human reader but are still read by the spider.
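To see why keyword repetition can work, and how it might be spotted, consider the sketch below. It assumes a deliberately naive engine that ranks pages by raw keyword count, which is an assumption made for illustration and not a description of any real engine's ranking. The function names, the sample pages and the 0.5 density threshold are likewise hypothetical.

```python
import re


def tokenize(text):
    """Split text into lowercase words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_count(text, keyword):
    """Number of times the keyword occurs in the text."""
    return tokenize(text).count(keyword.lower())


def rank_by_count(pages, keyword):
    """Order addresses by raw keyword count, highest first (naive ranking)."""
    return sorted(
        pages,
        key=lambda address: term_count(pages[address], keyword),
        reverse=True,
    )


def keyword_density(text, keyword):
    """Fraction of the words in the text that are the keyword."""
    words = tokenize(text)
    return words.count(keyword.lower()) / len(words) if words else 0.0


# Hypothetical pages, invented for this example.
pages = {
    "http://example.com/honest": "This guide explains how the Java compiler works.",
    "http://example.com/stuffed": "Java " * 50 + "buy our products",
}

ranking = rank_by_count(pages, "Java")

# Arbitrary threshold chosen for the example: flag a page when more than
# half of its words are the keyword.
suspicious = [a for a in pages if keyword_density(pages[a], "Java") > 0.5]

print(ranking)
print(suspicious)
```

Under the naive ranking the stuffed page comes first, but the same repetition gives it an abnormally high keyword density, which is one simple signal an engine could use when deciding whether to de-list a page.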
Search engines are big business on the internet. They mainly make money by displaying banner adverts or sponsored links. There is a wide variety of search engines on the internet, ranging from those which catalogue any website to specialised search engines which catalogue only websites covering a single area such as Shakespearean studies or the Linux operating system.