
The Right Way To Make a Robots.txt File!


What is a robots.txt file? The robots.txt file is a simple text file that must be placed in the root directory of your website (http://www.example.com/robots.txt). It tells search engine spiders which pages of your website they may crawl (and thus index) and which pages they should ignore.

You can use a simple text editor to create a robots.txt file. The content of a robots.txt file consists of so-called “records”.
A record contains the instructions for one specific search engine spider. Each record consists of two kinds of fields: a User-agent line and one or more Disallow lines.

Here’s an example:
User-agent: googlebot
Disallow: /cgi-bin/
Disallow: /uploads/
This robots.txt file would allow “googlebot”, the search engine spider of Google, to retrieve every page from your site except for files from the “cgi-bin” and “uploads” directories. All files in those two directories will be ignored by googlebot.
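
If you want to test a record like this yourself, Python’s standard library includes a robots.txt parser. Here is a minimal sketch that parses the example record from above offline (the URLs are placeholders):

from urllib.robotparser import RobotFileParser

# The example record from above, parsed without any network access.
rules = """\
User-agent: googlebot
Disallow: /cgi-bin/
Disallow: /uploads/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Pages outside the disallowed directories may be fetched ...
print(parser.can_fetch("googlebot", "http://www.example.com/index.html"))        # True
# ... while anything under /cgi-bin/ or /uploads/ is blocked.
print(parser.can_fetch("googlebot", "http://www.example.com/cgi-bin/test.pl"))   # False
print(parser.can_fetch("googlebot", "http://www.example.com/uploads/photo.jpg")) # False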

Which new commands is Google testing?
Webmasters have found out that Google seems to be experimenting with a Noindex command for the robots.txt file. It appears to do much the same as the Disallow command, so it’s not clear why Google is testing it.
Other commands that might be tested by Google are Noarchive and Nofollow. However, none of these commands is official yet.
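
For illustration only, a record using the experimental command might look like this (the “drafts” directory is a made-up example, and spiders other than googlebot will most likely ignore the unofficial line):

User-agent: googlebot
Noindex: /drafts/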

How does this affect your rankings on Google?
If you accidentally use the wrong commands, you might tell Google to go away even though you want it to index your pages. For that reason, it is important to check the content of your robots.txt file.

How to check your robots.txt file
Open your web browser and enter http://www.yourdomain.com/robots.txt to view the contents of your robots.txt file.
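
You can also fetch the file from a small script, which is useful for automated checks. A minimal sketch in Python (www.yourdomain.com is a placeholder, just like in the browser example above):

import urllib.request

# Replace the placeholder domain with your own before running this.
url = "http://www.yourdomain.com/robots.txt"
with urllib.request.urlopen(url) as response:
    print(response.read().decode("utf-8"))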

Here are the most important tips for a correct robots.txt file:
There are only two official commands for the robots.txt file: User-agent and Disallow. Do not use more commands than these.
Don’t change the order of the commands. Start with the User-agent line and then add the Disallow commands:
  • User-agent: *
  • Disallow: /cgi-bin/
  • Disallow: /uploads/
Don’t use more than one directory in a Disallow line.
“Disallow: /support /cgi-bin/ /images/” does not work (the sketch below shows what happens).
Use an extra Disallow line for every directory:
  • User-agent: *
  • Disallow: /support
  • Disallow: /cgi-bin/
  • Disallow: /images/
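
A quick sketch with Python’s standard robots.txt parser shows the difference between the two forms (the tested URL is a made-up example):

from urllib.robotparser import RobotFileParser

# Wrong: several directories crammed into a single Disallow line.
broken = ["User-agent: *", "Disallow: /support /cgi-bin/ /images/"]
# Right: one Disallow line per directory.
correct = ["User-agent: *",
           "Disallow: /support",
           "Disallow: /cgi-bin/",
           "Disallow: /images/"]

for name, lines in (("broken", broken), ("correct", correct)):
    parser = RobotFileParser()
    parser.parse(lines)
    blocked = not parser.can_fetch("*", "http://www.example.com/cgi-bin/test.pl")
    print(name, "blocks /cgi-bin/:", blocked)

# Output:
# broken blocks /cgi-bin/: False   (the mangled line matches nothing useful)
# correct blocks /cgi-bin/: True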
Be sure to use the right case. The file and directory names on your server are case sensitive.
If the name of your directory is “Support”, don’t write “support” in the robots.txt file.

You can find user agent names in your log files by checking for requests to robots.txt. Usually, all search engine spiders should be given the same rights. To do that, use User-agent: * in your robots.txt file.
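
Here is a short Python sketch that pulls those names out of an access log (the file name and the combined log format are assumptions; adjust them to your server setup):

import re

LOG_FILE = "access.log"  # assumed Apache/Nginx combined-format log

user_agents = set()
with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "robots.txt" not in line:
            continue
        # In the combined format, the user agent is the last quoted field.
        quoted = re.findall(r'"([^"]*)"', line)
        if quoted:
            user_agents.add(quoted[-1])

for agent in sorted(user_agents):
    print(agent)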

What happens if you don’t have a robots.txt file?
If your website doesn’t have a robots.txt file (you can check this by entering http://www.yourdomain.com/robots.txt in your web browser), then search engines will automatically index everything they can find on your site.

Checking your robots.txt file is important if you want search engines to index your web pages. However, indexing alone is not enough. You must also make sure that search engines find what they’re looking for when they index your pages.