
Creating and Using Your Website’s Robots.txt File — Optimizing Server Resource Consumption

A “robots.txt” file is a plain text file located in the root directory of your website that tells search engine crawlers (robots) which pages or directories they may or may not crawl and index. Here are the basic steps to create a robots.txt file:

1. Create a text file using a text editor and save it as “robots.txt”.

2. Edit the file’s content as follows:

User-agent: *
Disallow: /private/
Allow: /public/

In this example, the “*” (asterisk) in the User-agent line means the rule applies to all crawlers. The “Disallow” directive specifies a path you want to block, while the “Allow” directive explicitly permits a path and is typically used to re-open a subdirectory inside a blocked one, as in the short example below. In this case, “/private/” is blocked, and “/public/” remains open to crawling by search engines.
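For instance, to open up only a subdirectory inside an otherwise blocked directory, a minimal sketch could look like this (the directory names are placeholders chosen for illustration):

User-agent: *
Disallow: /private/
Allow: /private/docs/

Here “/private/” is blocked as a whole, but “/private/docs/” is explicitly re-allowed; major crawlers apply the most specific (longest) matching rule, so pages under “/private/docs/” may still be fetched.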

3. Upload the robots.txt file to the root directory of your website so that it is reachable directly under your domain (e.g., “https://ahmetorhan.com/robots.txt”).

4. To verify the changes, you can use Google Search Console or the webmaster tools of other search engines. You can also test the rules programmatically, as in the sketch below.
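The following is a minimal check using Python’s standard urllib.robotparser module; the domain and paths are placeholders, so adapt them to your own site:

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (placeholder URL).
rp = RobotFileParser()
rp.set_url("https://ahmetorhan.com/robots.txt")
rp.read()

# Ask whether a given user agent may fetch a given path.
print(rp.can_fetch("*", "https://ahmetorhan.com/private/page.html"))
print(rp.can_fetch("*", "https://ahmetorhan.com/public/page.html"))

If your deployed file contains the example rules from step 2, the first check should print False and the second True.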


Now, let’s examine the example below:

User-agent: Googlebot
Disallow:

User-agent: AdsBot-Google
Disallow:

User-agent: Googlebot-Image
Disallow:

User-agent: Yandex
Disallow:

User-agent: uptimebot
Disallow:

User-agent: Amazonbot 
Disallow: /do-not-crawl

User-agent: PetalBot
Disallow: /

User-agent: Applebot
Disallow: /not-allowed/

User-agent: GPTBot
Disallow: /

This example robots.txt file contains rules that block or allow directories for different user agents (web crawling robots). Let’s examine it in detail; a short verification sketch follows the walkthrough:

1. `User-agent: Googlebot`

  • This rule is for Googlebot. It targets Google’s web crawling robot.
  • It does not block any directories, meaning all pages can be indexed.

2. `User-agent: AdsBot-Google`

  • This rule is for Google AdsBot. It targets the robot used for Google’s advertising services.
  • It does not block any directories, so all pages can be indexed.

3. `User-agent: Googlebot-Image`

  • This rule is for Googlebot-Image. It targets Google’s image crawling robot.
  • It does not block any directories, allowing all images to be indexed.

4. `User-agent: Yandex`

  • This rule is for Yandex. It targets Yandex’s search engine robot.
  • It does not block any directories, meaning all pages can be indexed.

5. `User-agent: uptimebot`

  • This rule is for uptimebot, a bot used by uptime-monitoring services to check whether a site is reachable.
  • It does not block any directories, allowing all pages to be indexed.

6. `User-agent: Amazonbot`

  • This rule is for Amazonbot. It targets Amazon’s web crawler.
  • It blocks the “/do-not-crawl” path, preventing pages under it from being indexed.

7. `User-agent: PetalBot`

  • This rule is for PetalBot. It targets the crawler of Huawei’s Petal Search engine.
  • It blocks the entire site (“/”), preventing all pages from being indexed.

8. `User-agent: Applebot`

  • This rule is for Applebot. It targets Apple’s search engine robot.
  • It blocks the “/not-allowed/” directory, preventing pages in this directory from being indexed.

9. `User-agent: GPTBot`

  • This rule is for GPTBot. It targets OpenAI’s web crawler, which gathers content for training AI models.
  • It blocks the entire site (“/”), preventing GPTBot from crawling any pages.
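To sanity-check how these per-bot rules resolve, here is a minimal sketch using Python’s standard urllib.robotparser module against a subset of the file above (the URLs are placeholders):

from urllib.robotparser import RobotFileParser

# A subset of the example robots.txt above.
rules = """\
User-agent: Googlebot
Disallow:

User-agent: Amazonbot
Disallow: /do-not-crawl

User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "https://ahmetorhan.com/any-page"))        # True: nothing is blocked
print(rp.can_fetch("Amazonbot", "https://ahmetorhan.com/do-not-crawl/x"))  # False: the path is blocked
print(rp.can_fetch("GPTBot", "https://ahmetorhan.com/any-page"))           # False: the whole site is blocked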

In conclusion, by tuning your “robots.txt” file you can reduce unnecessary crawl traffic, and therefore resource consumption, on your web server and the database servers behind it. You can also ensure that the bots most relevant to you are allowed, so the pages you want discovered are the ones that actually get crawled and indexed.


If you have any questions or details you would like to add, feel free to write to me.
