Block Unwanted Bots, Spiders, Crawlers etc from Magento Site

If you need an all-purpose way to block certain crawlers, bots, scrapers etc from your Magento site, here’s a simple way to get the job done.

Back up your .htaccess file and add the following lines at the top of the file:

##### Block unwanted Crawler Bots #####

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(Baidu|spider|Yandex|robot|crawl|wget).*$ [NC]
RewriteCond %{HTTP_USER_AGENT} !^.*(Google|MSN|Yahoo|Bing).*$ [NC]
RewriteRule .* - [F,L]

# Optional: block specific hosts (replace host1, host2, host3 with real host names)
RewriteEngine On
RewriteCond %{REMOTE_HOST} ^.*(host1|host2|host3).*$ [NC]
RewriteRule .* - [F,L]

######################################

 

  1. The first condition is satisfied if the user agent string contains any of the text enclosed in the parentheses.
  2. The second condition excludes (allows) any user agent that contains the enclosed text. For example, a user agent of “Bing Robot” would be allowed: it matches “robot” in condition #1, but because “Bing” is also present, condition #2 fails and the request is not blocked. The “!^.*” is the key part (“!” means “not”). Both conditions must be true for the rule to return 403 Forbidden.
  3. If you want to block specific hosts, you can also add the 2nd block of code, replacing host1, host2 and host3 with the host names to block.

Note: the [NC] flag means “no case” (case-insensitive matching), so both “Google” and “google” would be allowed on the site.
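To sanity-check a pattern change before uploading, the two user agent conditions can be mimicked offline. The following is only a rough sketch in Python, not part of the .htaccess file: the function name `is_blocked` is made up for illustration, and it assumes Python's `re` module treats these simple patterns the same way Apache does.

```python
import re

# Patterns copied from the two RewriteCond lines above.
# re.IGNORECASE plays the role of the [NC] flag.
BLOCK_PATTERN = re.compile(r"^.*(Baidu|spider|Yandex|robot|crawl|wget).*$", re.IGNORECASE)
ALLOW_PATTERN = re.compile(r"^.*(Google|MSN|Yahoo|Bing).*$", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Hypothetical helper: True when both conditions hold.

    Condition 1: the user agent matches the block list.
    Condition 2: the user agent does NOT match the allow list (the "!" in .htaccess).
    """
    matches_block_list = bool(BLOCK_PATTERN.search(user_agent))
    matches_allow_list = bool(ALLOW_PATTERN.search(user_agent))
    return matches_block_list and not matches_allow_list


if __name__ == "__main__":
    for agent in ["Baiduspider/2.0", "Bing Robot", "Wget/1.21", "Mozilla/5.0 Chrome/120.0"]:
        print(agent, "->", "blocked" if is_blocked(agent) else "allowed")
```

Keep in mind that the allow list is a plain substring match, so any user agent that merely includes the word “Google” or “Bing” passes, whether or not it really belongs to that search engine.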

Modify the criteria as you see fit, upload the file to your site, and then check your work with any user-agent switcher (e.g. a Chrome extension).
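If you would rather check from a script than from a browser, something along these lines should work. It is a minimal sketch using only Python's standard library; the helper names `build_request` and `status_for_user_agent` and the URL `https://www.example.com/` are placeholders, not anything Magento provides.

```python
import urllib.error
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Hypothetical helper: prepare a GET request that sends the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for_user_agent(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Hypothetical helper: return the HTTP status the site sends back for this user agent."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; a blocked bot should end up here with code 403.
        return error.code


# Example usage (replace the placeholder URL with your own store before running):
# print(status_for_user_agent("https://www.example.com/", "Baiduspider/2.0"))
# print(status_for_user_agent("https://www.example.com/", "Bing Robot"))
```

With the rules above in place, the first call would be expected to come back as 403 and the second as whatever status the page normally returns.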

 
