Keeping Google out of the WordPress backend

by Steve Gerencser

Over the past few months we’ve noticed that more and more pages from inside the WordPress backend are finding their way into the Google index. This has always been a problem, but as Google seems to index more useless pages, and crackers get more sophisticated at finding vulnerabilities in WordPress modules, it is important to protect your site from both the crackers and Google.

So what is the real harm?

The most obvious and urgent harm comes from exposing your website to potential compromise. If a vulnerability is found in a WordPress plugin, it can take just a few seconds to find a host of websites to attack. Using Google’s inurl: operator, a simple search for inurl:wp-content/plugins returns more than 8 million results for a cracker to start his or her search for likely targets.

A dedicated cracker will compromise your site regardless, but there is no reason to make it easy for them.

Another, less obvious problem is created by Google itself. In just this one simple search we’ve seen more than 8 million web pages that have no reason to be in the index. They serve no useful purpose other than to show how invasive Google can be with its crawler. It also demonstrates a duplicate content issue that needs to be addressed.

The real problem, however, is the harm this can cause to each website it happens to.

It is known that Google may not index all of the pages in a website, for various reasons. Assume you have a website with 100 pages, and Google decides to index 30 pages from your /wp-content/ or /wp-admin/ folders. You have lost the potential for 30% of your pages to be indexed in favor of pages that should never have been in the index at all. I have seen sites with more than 50% of their indexed pages coming from the back end of WordPress.
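To put a number on this for your own site, here is a rough sketch in Python. It assumes you have already collected a list of your indexed URLs by some means (for example, by hand from a site: search); the function names and the example.com URLs are made up for illustration and are not part of any Google or WordPress tool.

```python
from urllib.parse import urlparse

# Directory names treated as "backend" for this sketch.
BACKEND_DIRS = ("wp-admin", "wp-content", "wp-includes")


def is_backend_url(url):
    """Return True if any path segment of the URL is a WordPress backend folder."""
    segments = urlparse(url).path.split("/")
    return any(segment in BACKEND_DIRS for segment in segments)


def backend_share(indexed_urls):
    """Return the fraction (0.0 to 1.0) of indexed URLs that point at the backend."""
    if not indexed_urls:
        return 0.0
    backend_count = sum(1 for url in indexed_urls if is_backend_url(url))
    return backend_count / len(indexed_urls)


if __name__ == "__main__":
    # Hypothetical data: 70 real posts and 30 backend files in the index.
    posts = [f"http://example.com/blog/post-{i}/" for i in range(70)]
    backend = [
        f"http://example.com/blog/wp-content/plugins/p{i}/readme.txt"
        for i in range(30)
    ]
    share = backend_share(posts + backend)
    print(f"{share:.0%} of indexed URLs are backend pages")
```

A high share here suggests the backend folders are worth blocking and cleaning out of the index.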

What can you do about it?

There are two things that you should do to help secure your site from search engines exploring where they don’t belong.

1. Robots.txt: With every WordPress install I do these days, I add this to my robots.txt file.

User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/wp-content/
Disallow: /blog/wp-includes/

Be sure to adjust the paths to match your site’s install folder.
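If you want to check that rules like these block what you expect before uploading them, one option is Python's standard urllib.robotparser module. This is only a sketch: the is_blocked helper and the example.com URLs are invented for illustration, and Python's parser uses simple prefix matching, so it may not mirror Googlebot's behavior in every edge case.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post; adjust the /blog/ prefix for your own install.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/wp-content/
Disallow: /blog/wp-includes/
"""


def is_blocked(url, robots_txt=ROBOTS_TXT, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows this URL for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs: two backend paths and one ordinary post.
    for url in [
        "http://example.com/blog/wp-admin/index.php",
        "http://example.com/blog/wp-content/plugins/some-plugin/readme.txt",
        "http://example.com/blog/2009/08/some-post/",
    ]:
        print(url, "->", "blocked" if is_blocked(url) else "allowed")
```

The two backend URLs should come back as blocked and the post as allowed. If a backend URL shows as allowed, the path prefix in the rules probably does not match your install folder.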

2. Google’s Webmaster Tools: If you find these pages indexed for your site, first install the robots.txt file. Once that is done, log in to your Google Webmaster Tools (GWT) account and remove those pages from the index. Once they are removed, the robots.txt file should keep them from being re-indexed.

Unfortunately, from then on you will see an error message in your GWT account. You can ignore this error.

What does all of this tell us? The biggest thing it tells us is that the Google spiders are not as smart as everyone, including Google, would like us to believe. Indexing these pages serves no purpose, and it shows that the bots can and will go to places where they really should not be, so you must be proactive in protecting your website from them. A person would know that there is no reason to index more than 8 million copies of the exact same pages. An algorithm cannot make that decision.

  2 Responses to “Keeping Google out of the WordPress backend”

  By Found By Design Websites on Aug 13, 2009

    Steve,
    This is a security issue too many don’t even know about. I have seen a suggestion that WordPress developers should allow users to designate the URL structure for the backend as well, maybe by easily renaming key files.
    While it may be a bit more time consuming and will not allow “automatic updates”, I have begun the practice of renaming such files. This requires a massive search and replace, but it does help shore up any WP installation I use!
    Great job bringing this to light!
    Ed

  By Daryll Coffman on Dec 19, 2009

    Thanks for this tip. I just searched Google for my site and found 141 indexed pages… I am nowhere close to that many pages, but when I looked I found wp-includes and a lot of other wp pages indexed. I searched the internet for about an hour before I found somewhere to help me get rid of these.

    Thanks
    Daryll
