Are you new to SEO and trying to learn how search engines crawl the pages of your website? We are going to explain the purpose of the robots.txt file and also share the standard rules that you might want to use to communicate with search engine robots, like Googlebot.
What is Robots.txt?
The primary purpose of the robots.txt file is to restrict search engine robots' (bots') access to parts of your website. The file is quite literally a simple .txt text file that can be created and opened in almost any notepad, HTML editor, or word processor. Google defines robots.txt files with these words: "A robots.txt file tells search engine crawlers which pages or files the crawler can or can't request from your site. This is used mainly to avoid overloading your site with requests."
To make a start, name your file robots.txt and add it to the root directory of your website. This is quite important, as all of the main reputable search engine spiders will automatically look for this file to take instruction from before crawling your website.
What Directives Should You Write in a Robots.txt File?
Now that you know the basic definition of a robots.txt file, let's walk through some examples.
So here's how the file should look. To start with, on the very first line, add "User-agent: *" (user-agent, colon, star). This first command addresses the instructions that follow to all search bots. Once you have addressed a specific bot, or, as in our case with the asterisk, all search bots, you come onto the Allow and Disallow commands that you can use to specify your restrictions. To simply ban the bots from the entire website directory, including the homepage, you will add the following code: Disallow with a capital D, colon, space, and a forward slash ("Disallow: /"). This first forward slash represents the root of your website.
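Put together, a minimal robots.txt that blocks all bots from the entire site looks like this:

```
User-agent: *
Disallow: /
```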
In most cases, you won't want to restrict your entire website; you will target just specific folders or files. To do this, you specify each restriction on its own line, preceded by the Disallow command. For example, the line "Disallow: /admin/" restricts access to a folder called admin along with everything inside it. If you're looking to restrict individual pages or files, the format is very similar: give the full path to the file, so that you are not restricting the entire folder, just one HTML file within it.
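As a sketch, a robots.txt restricting one folder and one individual file might look like this (the folder and file names here are only placeholders):

```
User-agent: *
Disallow: /admin/
Disallow: /secure/page.html
```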
Also, you should bear in mind that these directives are case-sensitive. You need to ensure that what you specify in your robots.txt file exactly matches the file or folder names on your website. These first examples showed you the basics; next, we come onto slightly more advanced pattern-matching techniques.
They can be handy if you're looking to block files or directories in bulk, without adding line after line of commands to your robots.txt file. These bulk commands are known as pattern matching. The most common one that you might want to use restricts access to all dynamically generated URLs, that is, URLs that contain a question mark. All you need to catch every URL of this type is a forward slash, an asterisk, and then a question-mark symbol: "Disallow: /*?".
You can also use pattern matching to block access to all directories that begin with a certain string, for example, admin. The line "Disallow: /admin*" again uses an asterisk, this time to match all directories that start with admin. Say you have the folders "admin-panel," "admin-files," and "admin-secure" in your root directory: this one line will block access to all three of these folders, as they all start with "admin."
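The two pattern-matching rules described above would be written like this:

```
User-agent: *
Disallow: /*?
Disallow: /admin*
```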
The final pattern-matching command is the ability to identify and restrict access to all files that end with a specific extension. A single short command, "Disallow: /*.php$", will instruct the search engine bots and spiders not to crawl (and ideally not to cache) any pages that end with the .php extension.
So after your first forward slash, use an asterisk followed by a full stop and then the extension. To signify an extension rather than a string appearing anywhere in the URL, you conclude the command with the dollar sign. As a result, the dollar sign will tell the bots that the restricted URL must end with this extension.
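Putting the restriction techniques from this guide together, a complete sample file might look like the one below. We omit the all-blocking "Disallow: /" line, since it would make every other rule redundant, and the folder and file names are illustrative placeholders:

```
User-agent: *
Disallow: /admin/
Disallow: /secure/page.html
Disallow: /*?
Disallow: /admin*
Disallow: /*.php$
```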
If you want to control which pages and images will be crawled and which ones won't even show up, you need to use robots.txt. As a result, we can certainly say that the robots.txt file is crucial for SEO.
It should contain Allow and Disallow commands.
It must be located at the root of your website host.
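If you want to sanity-check your plain path rules before publishing the file, you can use Python's built-in robots.txt parser. Note that urllib.robotparser follows the original robots.txt convention and does not understand the * and $ wildcard extensions, so only prefix rules are tested here; the rules and URLs below are illustrative placeholders:

```python
# Check robots.txt rules with Python's standard-library parser.
from urllib.robotparser import RobotFileParser

# Example rules: block an admin folder and one individual file.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /secure/page.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A URL under the blocked folder is not fetchable; others are.
print(parser.can_fetch("Googlebot", "https://example.com/admin/panel.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))        # True
```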
And this sums up our guide. Using those six different techniques, you should be able to put together a well-optimized robots.txt file that flags content to search engines that you don't wish to be crawled or cached. We must point out that website hackers often check robots.txt files, because the restricted paths can point to security vulnerabilities they might want to throw themselves at. Always be sure to password-protect and test the security of your dynamic pages, particularly if you're advertising their location in a robots.txt file.
Thank you for reading our article; we are glad if we could help you learn more about robots.txt files. If you want to read more SEO-related content, check out our article about Sitemaps. Also, to learn more about robots.txt files, you can check Google's Support page, or this video by The Big Marketer: