How to Prevent Google from Crawling Your Joomla URLs Using robots.txt

One of our students was having trouble removing URLs from Google and received this message:

"Your request has been denied because the webmaster of the site hasn't applied the appropriate robots.txt file or meta tags to block us from indexing or archiving this page. Please work with the webmaster of this site or select an alternate removal option from the webpage removal request tool"

So we created this tutorial for him. It shows how to edit Joomla's robots.txt file to block search engines from crawling certain URLs, or the whole site if desired.

Access robots.txt in Joomla Root

Access your host's file manager, e.g. cPanel, Plesk, etc.

In the root of your Joomla installation, you will find a robots.txt file. Open it for editing.

Default robots.txt

By default, Joomla's robots.txt file should contain the following rules, which keep crawlers out of Joomla's system directories:

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/

Explanation:

  • User-agent: specifies which search engine crawler the rules that follow apply to
  • asterisk (*): a wildcard that, in this case, makes the rules apply to all search engine crawlers
  • Disallow: specifies that we don't want the user-agent to crawl this specific directory
  • pound (#): starts a comment. Crawlers ignore comments; they are there for people to add clarification. In the examples that follow, I add a few comments for clarification.
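
To see how these directives behave, here is a minimal sketch that feeds a shortened copy of the default rules to Python's standard-library robots.txt parser. The domain www.yoursite.com is a placeholder, the helper name can_crawl is made up for this example, and the crawler names are only examples. Real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the default rules shown above.
DEFAULT_RULES = """\
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /tmp/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Because the rules are listed under "User-agent: *", every crawler
# gets the same answer. The crawler names below are only examples.
for agent in ("Googlebot", "Bingbot"):
    admin_ok = can_crawl(
        DEFAULT_RULES, agent, "http://www.yoursite.com/administrator/index.php"
    )
    home_ok = can_crawl(DEFAULT_RULES, agent, "http://www.yoursite.com/index.php")
    print(agent, "administrator:", admin_ok, "home page:", home_ok)
```

Both crawlers are refused the /administrator/ URL and allowed the home page, because the asterisk makes the Disallow lines apply to every user-agent.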

How to Block URLs

Disallow: /pathto/page.html # blocks just this page
Disallow: /pathto/page* # blocks every URL that starts with /pathto/page, e.g. page.html, page.php, etc.
Disallow: /pathto/* # blocks all pages under this directory
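
The patterns above can be tried out with a small matcher. This is a simplified sketch of the matching described in the Robots Exclusion Protocol (RFC 9309), where a rule matches any path that begins with it, "*" stands for any run of characters, and a trailing "$" anchors the end. The function name rule_matches is made up for this example, it ignores percent-encoding and Allow rules, and not every crawler supports wildcards.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Disallow pattern matches the given URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and let each "*" match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only anchors the start, which gives the prefix behaviour.
    return re.match(regex, path) is not None


print(rule_matches("/pathto/page.html", "/pathto/page.html"))  # True
print(rule_matches("/pathto/page.html", "/pathto/page.php"))   # False
print(rule_matches("/pathto/page*", "/pathto/page.php"))       # True
print(rule_matches("/pathto/*", "/pathto/sub/file.html"))      # True
print(rule_matches("/pathto/*", "/other/file.html"))           # False
```

Because rules are matched as prefixes, /pathto/page and /pathto/page* behave the same way in this sketch: both also match a path such as /pathto/pages/list.html.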

For example, if you want to block www.yoursite.com/clients/testimonials/business.html, use:

Disallow: /clients/testimonials/business.html # blocks just this page

Example:

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /clients/testimonials/business.html # blocks just this page

Once you are done, save the robots.txt file.
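
Before relying on the saved file, it can help to check the rules offline. The sketch below uses Python's standard-library parser on a shortened copy of the example, plus a second rule set with "Disallow: /", the conventional way to block the whole site. The helper name is_blocked, the domain www.yoursite.com, and the page other.html are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the rules forbid user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Shortened copy of the example above, with the extra page rule.
PAGE_RULES = """\
User-agent: *
Disallow: /administrator/
Disallow: /clients/testimonials/business.html # blocks just this page
"""

# "Disallow: /" matches every path, so it blocks the whole site.
WHOLE_SITE_RULES = """\
User-agent: *
Disallow: /
"""

site = "http://www.yoursite.com"  # placeholder domain
print(is_blocked(PAGE_RULES, site + "/clients/testimonials/business.html"))  # True
print(is_blocked(PAGE_RULES, site + "/clients/testimonials/other.html"))     # False
print(is_blocked(WHOLE_SITE_RULES, site + "/clients/testimonials/other.html"))  # True
```

Only the listed page is blocked by the first rule set, while the second blocks every URL on the site.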

Comments

 
#1 Manuel 2011-11-07 14:46

Good article. But what happens if I disable SEF in Joomla? yoursite.com/.../business.html would surely get a different, longer URL. Does that mean I should add two versions of the entry?
 
 
#2 Nick 2011-11-07 20:43

Hi Manuel,

Thanks for visiting!

Yes, that's correct.

Kind regards,
Nick
 
 
#3 Jon Openshaw 2012-07-24 19:58

What's the significance of the hash in line 16?
 
 
#4 Ante 2013-03-28 12:25

Hi,

Is there a way to allow only URLs with .html, and redirect those without it?

Ante.
 
 
#5 Suchet 2013-04-07 21:11

Hi, I have a question. I have a Joomla website whose first page has a PageRank of 2, but none of the other pages on the site have any PageRank.

I am linking to the pages through the menu. Someone has told me that the links need to be in articles as HTML links on the page, but that wouldn't look right on a website.

I am puzzled.
 


The License for Our Tutorials

All of our tutorials are published under the Creative Commons Attribution-NonCommercial license. This means:

  • You can re-use these tutorials.
  • You can modify these tutorials.
  • You must link back to our original tutorial.
  • You can't use these tutorials commercially.

Open Source Training is not affiliated with or endorsed by the Joomla, WordPress or Drupal projects.
All product names and trademarks are the property of their respective owners.

Copyright 2013 Open Source Training, LLC. All rights reserved.