SEO Techniques - Tips
You can find millions of Web sites on the Internet, and the number is growing fast. In such a scenario, you need some pretty good strategies to make your site visible to the Web world.
What is a Web Search Engine?
A Web search engine is a search engine designed to search for information on the World Wide Web. The information may consist of web pages, images, and other types of files. Commonly used search engines include Yahoo, Google, MSN, and AltaVista.
How Web Search Engines Work
A search engine operates in the following order:
1. Web crawling
2. Indexing
3. Searching
Web search engines work by storing information about many web pages, which they retrieve from the WWW itself. These pages are retrieved by a Web crawler (sometimes also known as a spider), an automated Web browser which follows every link it sees. The contents of each page are then analyzed to determine how it should be indexed (for example, words are extracted from the titles, headings, or special fields called meta tags). Data about web pages are stored in an index database for use in later queries. Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages, whereas others, such as AltaVista, store every word of every page they find.
When a user enters a query into a search engine (typically by using key words), the engine examines its index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text.
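To make that concrete, here is a minimal sketch in Python of an index database and a keyword query against it. The URLs and page texts are invented examples, not how any real engine stores its data:

# Assume the crawler has already reduced each page to its words.
pages = {
    "http://example.com/a": "seo tips for search engines",
    "http://example.com/b": "how web search engines work",
}

# Build the inverted index: word -> set of pages containing it.
index = {}
for url, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(url)

# Answer a keyword query by intersecting the page sets of each word.
def search(query):
    results = None
    for word in query.lower().split():
        matches = index.get(word, set())
        results = matches if results is None else results & matches
    return results or set()

print(search("search engines"))  # both example pages match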
Crawlers
A web crawler is a program which automatically traverses the web by downloading documents and following links from page to page. Crawlers are mainly used by web search engines to gather data for indexing. Web crawlers are also known as spiders, robots, bots, etc.
How Crawlers/Spiders work
Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled." The spider returns to the site on a regular basis, such as every month or two, to look for changes.
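A toy spider that behaves this way can be written with just the Python standard library. The sketch below is only an illustration: the start URL is a placeholder, and a real crawler would also need politeness delays and the robots.txt checks covered later in this post.

import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# Collects the href of every <a> tag on a page.
class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=10):
    seen, queue = set(), [start_url]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
        except Exception:
            continue  # skip pages that fail to download
        parser = LinkExtractor()
        parser.feed(html)
        # Follow every link found, resolved against the current page.
        queue.extend(urljoin(url, link) for link in parser.links)
    return seen

print(crawl("http://example.com/"))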
Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information.
Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been "spidered" but not yet "indexed." Until it is indexed -- added to the index -- it is not available to those searching with the search engine.
Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant.
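As a rough illustration of this ranking step, the sketch below assumes the index also recorded how often each word occurs on each page, and scores pages by summing those counts across the query words. The counts are made-up examples; real engines use far more signals than raw term frequency.

# word -> {page URL: number of occurrences}
index = {
    "seo":  {"http://example.com/a": 5, "http://example.com/b": 1},
    "tips": {"http://example.com/a": 2},
}

def rank(query):
    scores = {}
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            # Crude relevance: sum of term frequencies across query words.
            scores[url] = scores.get(url, 0) + count
    # Most relevant pages first.
    return sorted(scores, key=scores.get, reverse=True)

print(rank("seo tips"))  # ['http://example.com/a', 'http://example.com/b']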
How to exclude site pages from indexing
Exclusions can be made by the use of a robots.txt file. Based on the specifications in robots.txt, the listed files or directories will stay hidden from indexing.
A Sample robots.txt file
Here is what your robots.txt file should look like:
______________________________________________________________
# Robots.txt file created by http://www.webtoolcentral.com
# For domain: http://192.168.0.213

# Disallow Crawler V 0.2.1 admin@crawler.de
User-agent: Crawler V 0.2.1 admin@crawler.de
Disallow: /

# Disallow Scooter/1.0
User-agent: Scooter/1.0
Disallow: /

# All other robots may spider the whole domain,
# except the directories /cgi-bin/ and /images/
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
______________________________________________________________
Put this file in the root directory of your site, so that crawlers can fetch it as /robots.txt.
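If you want to check how crawlers will read your file, Python's standard urllib.robotparser module applies the same rules a well-behaved robot would. In this sketch the example.com URLs are placeholders for your own domain:

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://example.com/robots.txt")
rp.read()  # fetch and parse the live file

# can_fetch(user_agent, url) reports whether that agent may crawl the URL.
print(rp.can_fetch("*", "http://example.com/cgi-bin/form"))  # False under the sample rules above
print(rp.can_fetch("*", "http://example.com/index.html"))    # True under the sample rules above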