2 Replies - 7807 Views - Last Post: 27 August 2010 - 02:29 AM

#1 Guest_MadLogger*


Trying to block ALL bots, crawlers, etc with .htaccess

Posted 27 August 2010 - 01:57 AM

Hi,
I am trying to block all bots and crawlers from accessing my site, including good bots like googlebot, etc.
I only want actual people to be able to access my files, because I need to monitor file accesses closely in my raw access logs. Bots keep polluting the logs by requesting robots.txt and other things, so I just want to block them out. Annoying, sneaky things.

I am not skilled with .htaccess at all, so I pieced the following together from several examples found on the net:
Options +FollowSymlinks
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^.*(bot|crawler|spider|mail).*$ [NC]
RewriteRule ^(.*)$ [F,L]



Basically, I wanted to block anything that had the string "bot", "crawler", "spider" or "mail" in its user agent. It didn't seem to work, as Yandex, Baidu and Googlebot still showed up.
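
Note: one likely cause, assuming mod_rewrite is loaded and overrides are allowed for this directory, is that the RewriteRule above has no substitution, so Apache reads [F,L] as the rewrite target rather than as flags. A minimal sketch with a dash as the "no substitution" placeholder (the strings are the ones from the post, not a complete list of bots):

```apache
# Sketch only: assumes mod_rewrite is loaded and AllowOverride permits
# FileInfo and Options for this directory.
Options +FollowSymLinks
RewriteEngine On
# Unanchored, case-insensitive match anywhere in the User-Agent header.
RewriteCond %{HTTP_USER_AGENT} (bot|crawler|spider|mail) [NC]
# "-" means "leave the URL unchanged"; [F] answers 403 Forbidden.
RewriteRule ^ - [F]
```

The pattern needs no leading or trailing .* because the match is not anchored.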

I also tried this :
SetEnvIfNoCase User-Agent ".*(Yandex|bot).*" bad_bot
Order Deny,Allow
Deny from env=bad_bot


This was meant to do the same thing, but it didn't work either.

I would prefer this last method, as it is said to be less resource intensive (blocking instead of rewriting), but I am at a loss as to how to do it.
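
Note: for reference, a sketch of the environment-variable approach in Apache 2.2 syntax, assuming mod_setenvif is loaded and AllowOverride includes FileInfo and Limit. bad_bot is simply the variable name used in the post, and the list of strings is illustrative:

```apache
# Flag any request whose User-Agent contains one of these strings
# (case-insensitive, unanchored regular expression).
SetEnvIfNoCase User-Agent "(bot|crawler|spider|yandex)" bad_bot
# Apache 2.2 access control: allow everyone, then deny flagged requests.
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

If this seems to have no effect, it is worth checking the status code in the log: a denied request is still logged, with status 403 instead of 200. Apache 2.4 replaces Order/Allow/Deny with Require directives, so this sketch applies to 2.2-style configurations.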

Another thing: for some reason, host names began to show up in my logs instead of raw IP addresses. Why is this?
Anyway, I tried to block bots by host name with the following:
deny from .*yandex\.ru.*
deny from .*baidu\.com.*
deny from .*search\.msn\.com.*
deny from .*googlebot\.com.*


But they kept showing up...
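
Note: Deny from takes host names or partial domain names rather than regular expressions, so the entries above would most likely match nothing. A sketch with the same four domains in plain form (Apache 2.2 syntax, assuming AllowOverride includes Limit):

```apache
# Partial domain names: each entry matches the domain and its subdomains.
Order Allow,Deny
Allow from all
Deny from yandex.ru
Deny from baidu.com
Deny from search.msn.com
Deny from googlebot.com
```

Matching by name makes Apache do a reverse DNS lookup, followed by a forward lookup, on the client address. That is also a likely reason for resolved host names appearing in the log, though HostnameLookups being switched on would have the same effect.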


To sum up, I would like to know how to block any user agent that contains "some string", and how to block a host name as I tried above.

Any help on this would be much appreciated.

Thank you

Replies To: Trying to block ALL bots, crawlers, etc with .htaccess

#2 no2pencil

Re: Trying to block ALL bots, crawlers, etc with .htaccess

Posted 27 August 2010 - 01:59 AM

Why not just use robots.txt? That's what it's designed for.

User-agent: *
Disallow: /



#3 Guest_Guest*


Re: Trying to block ALL bots, crawlers, etc with .htaccess

Posted 27 August 2010 - 02:29 AM

no2pencil, on 27 August 2010 - 12:59 AM, said:

Why not just use robots.txt? That's what it's designed for.

User-agent: *
Disallow: /



MadLogger, on 27 August 2010 - 12:57 AM, said:

but bots keep polluting my logs by requesting robots.txt and other things so I just want to block them out.



And there are badly behaved robots like Yandex that do not obey it... nor even take a look at robots.txt.
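
Since the real complaint is log pollution, and a blocked request is still logged (with status 403), blocking alone may not clean up the access log. If the main server or virtual host configuration can be edited (CustomLog is not allowed in .htaccess), one possible sketch is conditional logging. The log path and the "combined" format name are assumptions and should match whatever the existing setup uses:

```apache
# Server or virtual-host config, not .htaccess.
# Flag requests by User-Agent (illustrative list of strings).
SetEnvIfNoCase User-Agent "(bot|crawler|spider)" bad_bot
# Also flag anything that asks for robots.txt.
SetEnvIf Request_URI "^/robots\.txt$" bad_bot
# Write a log entry only when bad_bot is NOT set.
CustomLog logs/access_log combined env=!bad_bot
```

This only hides the requests from that log file; it does not stop the bots from making them.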