| An Internet search engine is a tool designed to search for information on the Internet. The search results are usually presented as a list and are commonly called hits. The results may consist of web pages, images and other types of files. Some search engines also gather data available in databases or open directories. Unlike Internet directories, which are maintained by human editors, search engines operate algorithmically or through a mixture of algorithmic and human input.
Internet search engines operate by storing data about a huge number of web pages, which they retrieve from the World Wide Web. These pages are retrieved by a web crawler, also called a spider: an automated web browser that follows every link it sees. The content of each page is then analyzed to determine how it should be indexed. Words, for instance, are extracted from titles, headings or special fields called meta tags. Data about web pages is stored in an index for use in later queries. Some search engines, such as Google, store all or part of the source page (called a cache) as well as information about the page, while others, such as AltaVista, store every word of every page they find. The cached page always contains the text that was actually indexed, so it can be very useful when that information can no longer be found anywhere else on the Internet.
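The crawl-and-index cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular search engine is implemented: the `PAGES` dictionary stands in for the Web (a real crawler would fetch pages over HTTP), the `example.test` URLs are made up, and the names `PageParser` and `crawl` are hypothetical. Only words from titles, headings and the keywords meta tag are indexed.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML source (assumption for illustration).
PAGES = {
    "http://example.test/a": (
        '<html><head><title>Search engines</title>'
        '<meta name="keywords" content="crawler, index"></head>'
        '<body><h1>How spiders work</h1>'
        '<a href="http://example.test/b">next</a></body></html>'
    ),
    "http://example.test/b": (
        '<html><head><title>Ranking results</title></head>'
        '<body><h1>Relevance</h1>'
        '<a href="http://example.test/a">back</a></body></html>'
    ),
}


class PageParser(HTMLParser):
    """Collects outgoing links plus words from titles, headings and meta tags."""

    INDEXED_TAGS = {"title", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []
        self._open = []  # stack of currently open indexed tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") == "keywords":
            self._add(attrs.get("content") or "")
        elif tag in self.INDEXED_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Only text inside a title or heading is indexed in this sketch.
        if self._open:
            self._add(data)

    def _add(self, text):
        cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
        self.words.extend(cleaned.split())


def crawl(start_url, pages):
    """Breadth-first crawl that returns an index mapping word -> set of URLs."""
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # link points outside the known pages
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for link in parser.links:  # follow every link not yet visited
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("http://example.test/a", PAGES)
print(sorted(index["crawler"]))
```

Starting from a single page, the crawler reaches the second page through its link, and the resulting index records which pages each extracted word came from. The `seen` set keeps the crawler from looping when pages link back to each other.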
Once a user has typed search terms into the search field, the engine checks its index and returns a listing of the best-matching web pages according to its criteria, usually with a brief summary containing the document's title and sometimes excerpts from the text. Some search engines offer an advanced feature called proximity search, which allows users to specify how far apart the search terms may be.
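One way to support proximity search is to record the position of every word in every document, so that the distance between two terms can be checked at query time. The sketch below assumes that approach; the sample documents and the names `build_positional_index` and `proximity_search` are hypothetical, and distance is measured in words.

```python
import re
from collections import defaultdict

# Hypothetical sample documents (assumption for illustration).
DOCS = {
    "doc1": "A web crawler follows every link it finds on a page",
    "doc2": "The crawler stores each page, and the index maps words to the web",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    """Map each word to {doc_id: [positions where the word occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def proximity_search(index, term_a, term_b, max_distance):
    """Return documents where both terms occur within max_distance words."""
    postings_a = index.get(term_a, {})
    postings_b = index.get(term_b, {})
    hits = []
    # Only documents containing both terms can match.
    for doc_id in postings_a.keys() & postings_b.keys():
        if any(
            abs(pa - pb) <= max_distance
            for pa in postings_a[doc_id]
            for pb in postings_b[doc_id]
        ):
            hits.append(doc_id)
    return sorted(hits)


positional_index = build_positional_index(DOCS)
print(proximity_search(positional_index, "web", "crawler", 2))
```

Both documents contain "web" and "crawler", but only in `doc1` do the two words appear next to each other, so a small `max_distance` narrows the results to that document.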
The usefulness of a search engine depends on the relevance of the results it returns. There may be millions of web pages that include a particular search word or phrase, and some of them will be more relevant than others. Most search engines therefore employ methods to rank the results so that the "best" results are displayed first.
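As a rough illustration of ranking, the sketch below scores each page by the fraction of its words that match the query and lists the highest-scoring pages first. This term-frequency score is only one simple, assumed criterion chosen for the example; it is not the method of any particular search engine, and the sample pages and the `rank` function are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample pages (assumption for illustration).
PAGES_TEXT = {
    "p1": "search engines rank search results",
    "p2": "results of a football match",
    "p3": "cooking recipes for beginners",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(docs, query):
    """Return IDs of matching documents, best match first."""
    terms = tokenize(query)
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        total = sum(counts.values())
        if total == 0:
            continue
        # Score: share of the document's words that are query terms.
        score = sum(counts[t] for t in terms) / total
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties are broken by document ID for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


print(rank(PAGES_TEXT, "search results"))
```

Pages that contain none of the query terms are left out entirely, and a page that mentions the terms more often relative to its length is placed ahead of one that mentions them only in passing.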
The way a search engine ranks web pages varies from one engine to another. The methods also change over time, as Internet usage changes and new techniques emerge. |