Check Whether Your Site's URLs Can Be Indexed with the Robots.txt Tester

By default, Google can crawl all the URLs on your site. However, if you don't want Google to index some specific pages, you can use a robots.txt file.

In your robots.txt file, you can request that Google not index certain pages by using a "Disallow" rule:

Disallow: /dont-scan-this-url/
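
On its own, a Disallow line is not a complete rule: robots.txt groups its rules under a User-agent line that says which crawlers they apply to. A minimal file that applies the rule above to all crawlers looks like this:

User-agent: *
Disallow: /dont-scan-this-url/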

In this post, I'll show you how to use Google Search Console to check whether you have successfully blocked Google from indexing a particular URL.

Step #1. Check if a URL is disallowed

To use this tool, you will need to have your site verified in Google Search Console.

  • Go to the Robots.txt Tester page.
  • Choose a verified property from the list. If your site is not listed, click "Add property now"; continue that process and come back to this tutorial when it's done.

The next screen will load the content of your robots.txt file, located at www.yoursite.any/robots.txt. The location of this file is the same whether you use WordPress, Drupal, Joomla, or another platform.
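
If you want to see exactly what crawlers see, you can also fetch the file directly. Here is a minimal sketch in Python 3, using the placeholder domain from this post:

import urllib.request

# robots.txt always lives at the root of the domain,
# no matter which CMS generated the site
with urllib.request.urlopen("https://www.yoursite.any/robots.txt") as response:
    print(response.read().decode("utf-8"))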

Below the file content, type a URL to check whether it is correctly disallowed in robots.txt.

  • Choose the search bot, or keep the default, "Googlebot".
  • Click the "Test" button.

If a Disallow rule matches the URL, the matching line will be highlighted in red, and the "Test" button will switch to "Blocked".

This confirms the tested URL is blocked, so Google won't crawl it.
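
You can run the same check locally with Python's standard library. Below is a sketch that assumes the rule from the start of this post is live at www.yoursite.any/robots.txt. Note that urllib.robotparser implements the original prefix-matching spec, so it is reliable for plain rules like this one but does not understand Google's * wildcard extension:

from urllib.robotparser import RobotFileParser

# Load and parse the live robots.txt file
parser = RobotFileParser("https://www.yoursite.any/robots.txt")
parser.read()

# can_fetch() returns False when a Disallow rule blocks the URL
# for the given user agent
url = "https://www.yoursite.any/dont-scan-this-url/"
print(parser.can_fetch("Googlebot", url))  # False if the rule matches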

How to Disallow URLs with a pattern

It is easy to disallow a single URL. But what about disallowing a whole group of URLs that match a pattern?

Let's clarify with an example. I want to disallow these pages:

  • www.yoursite.any/en/component/content/
  • www.yoursite.any/en/component/weblinks/
  • www.yoursite.any/fr/component/content/
  • www.yoursite.any/fr/component/weblinks/

Certainly, I could just add 4 lines to the robots.txt file, one for each URL. But I can get the same result with a single line that targets all those pages by taking advantage of the pattern they share:

Disallow: /*/component/*

This syntax matches all 4 URLs above. Here, each * is a wildcard that stands for any sequence of characters: the first * covers the language code (en or fr), and the second covers everything after the component name (content/ or weblinks/).
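
To test a wildcard rule offline, urllib.robotparser won't help, since it doesn't support the * extension. Here is a small sketch that implements Google-style wildcard matching by translating the pattern into a regular expression; the robots_pattern_to_regex function is my own helper, not part of any library:

import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Supports Google's two extensions: '*' matches any sequence
    of characters, and a trailing '$' anchors the pattern at the
    end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn the escaped '*'
    # back into the regex wildcard '.*'
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))

rule = robots_pattern_to_regex("/*/component/*")

paths = [
    "/en/component/content/",
    "/en/component/weblinks/",
    "/fr/component/content/",
    "/fr/component/weblinks/",
]

for path in paths:
    # A Disallow pattern matches from the start of the URL path
    status = "Blocked" if rule.match(path) else "Allowed"
    print(path, "->", status)

All 4 paths print "Blocked", which matches what the Robots.txt Tester reports in the next step.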

Confirm the rule works using the process explained in Step #1. In my example, all 4 URLs are successfully disallowed by the same rule:

  • www.yoursite.any/en/component/content/
  • www.yoursite.any/en/component/weblinks/
  • www.yoursite.any/fr/component/content/
  • www.yoursite.any/fr/component/weblinks/


About the author

Valentín creates beautiful designs from amongst the tequila plants of Jalisco, Mexico. You can see Valentín's design work all over this site and you can often find him helping members in support.