11 Replies - 1427 Views - Last Post: 26 September 2017 - 06:55 AM

#1 nesir28   User is offline

  • D.I.C Head

Reputation: 4
  • View blog
  • Posts: 53
  • Joined: 11-August 17

Using ht.access to block backlink checkers?

Posted 22 September 2017 - 07:12 AM

Hi guys, thanks for the help on the PHP/Python issue.
I'm here once again with a question. I've seen some strange activity in my analytics for my site, mainly bots and URL openers hitting it. I suspect somebody is checking my backlinks, which I know would inevitably happen.

Now I haven't come here with nothing; I did some searching on the subject, and the handful of sites that cover it all give the same two answers: an .htaccess block and a robots.txt block. I will also add that many people have warned against using this, as it might cause unforeseen issues with search engines. So I am going to try it out on a dummy site first and see if search engines are indeed affected.

Now I know I should have backlinks that are not easy to obtain, and on the site they are on, they aren't; not in the usual sense, but in the fact that it took me hours to get one or two of those links. Finding the right place was the hard part. I know I can't halt the progress of anybody, and bots change, but I don't want to just give away months of work in a matter of days. I should at least make it a bit of a task for my competitor, right?

The code that I implement in my .htaccess file is as follows:


RewriteEngine on
RewriteCond %{HTTP_HOST} ^Mydomain\.co\.za$
RewriteCond %{REQUEST_URI} !^/[0-9]+\..+\.cpaneldcv$
RewriteCond %{REQUEST_URI} !^/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$
RewriteCond %{REQUEST_URI} !^/\.well-known/acme-challenge/[0-9a-zA-Z_-]+$
RewriteRule ^/?$ "http\:\/\/www\.Mydomain\.co\.za\/" [R=301,L]

RewriteBase /
RewriteCond %{HTTP_USER_AGENT} .*AhrefsBot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*SemrushBot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*MJ12Bot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*RogerBot.*
RewriteRule ^(.*)$ http://www.Dogfood.com/ [L,R=301]
Order Allow,Deny
Allow from all
Deny from 216.123.8.0/8
Deny from ....


<IfModule mod_deflate.c>
  SetOutputFilter DEFLATE
  <IfModule mod_setenvif.c>
    # Netscape 4.x has some problems...
    BrowserMatch ^Mozilla/4 gzip-only-text/html

    # Netscape 4.06-4.08 have some more problems
    BrowserMatch ^Mozilla/4\.0[678] no-gzip

    # MSIE masquerades as Netscape, but it is fine
    # BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

    # NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48
    # the above regex won't work. You can use the following
    # workaround to get the desired effect:
    BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html

    # Don't compress images (note the escaped dot before the extension)
    SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
  </IfModule>
  <IfModule mod_headers.c>
    # Make sure proxies don't deliver the wrong content
    Header append Vary User-Agent env=!dont-vary
  </IfModule>
</IfModule>

However, this brings me to a 503 error on my site when I try to access it. I have played around with it for a bit, with these results:
When I remove the "Deny from ...." line I can access my site (no 503), but I can still run a backlink check against it.
Is that line supposed to have an IP value?
I also tried it with a RewriteEngine on line above the second block, although it seemed to me this wasn't needed.

Or are there some other faults in my code? I've looked at all the suggested snippets, even on Stack Overflow, and it seems correct to me, but I am very new at this and might be overlooking something really stupid.

Thanks for your assistance, guys.


Replies To: Using ht.access to block backlink checkers?

#2 ArtificialSoldier   User is offline

  • D.I.C Lover
  • member icon

Reputation: 2221
  • View blog
  • Posts: 6,733
  • Joined: 15-January 14

Re: Using ht.access to block backlink checkers?

Posted 22 September 2017 - 10:07 AM

I don't understand the issue. Why are you trying to block traffic on your site? If you have something online, presumably you want people to access it; what's the problem with the traffic that you're trying to block?

Quote

When I remove the "Deny from ...." line I can access my site (no 503), but I can still run a backlink check against it.
Is that line supposed to have an IP value?

If you literally have "Deny from ...." in that file, I don't see anything in the documentation suggesting that four periods are valid.

#3 nesir28   User is offline

  • D.I.C Head

Reputation: 4
  • View blog
  • Posts: 53
  • Joined: 11-August 17

Re: Using ht.access to block backlink checkers?

Posted 22 September 2017 - 10:58 AM

View PostArtificialSoldier, on 22 September 2017 - 10:07 AM, said:

I don't understand the issue. Why are you trying to block traffic on your site? If you have something online, presumably you want people to access it; what's the problem with the traffic that you're trying to block?

Quote

When I remove the "Deny from ...." line I can access my site (no 503), but I can still run a backlink check against it.
Is that line supposed to have an IP value?

If you literally have "Deny from ...." in that file, I don't see anything in the documentation suggesting that four periods are valid.


Thanks for your response. Well, yes, I figured that; like I stated, I don't know much about how the .htaccess file itself works. I'm blocking bots from the major crawlers because one of my competitors is starting to get links from sites that I have now. I know you can't block everything, but I don't want to just hand it to them on a silver platter. Maybe that IP range isn't needed? This is the kind of stuff I came here to try to figure out.

Everything I've read so far suggests code along these lines, but like I said, it isn't working: I can still crawl my site with SEMrush, Ahrefs, etc. I'm pretty sure I've coded something wrong in the file.

#4 ArtificialSoldier   User is offline

  • D.I.C Lover
  • member icon

Reputation: 2221
  • View blog
  • Posts: 6,733
  • Joined: 15-January 14

Re: Using ht.access to block backlink checkers?

Posted 22 September 2017 - 11:43 AM

You can trim down some of those rules; I really doubt that you need the bits targeting Netscape 4, for example.

User agent blocking isn't all that reliable, because the user agent string is optional and can be whatever the client wants it to be. It will work for bots which accurately identify themselves, but if some bot wants to tell your site that it's Google, then it can. If you have a list of IP addresses or ranges, then you can add an entry for each one; that should be pretty easy to find examples for. But try to understand what each line is doing, and remove any lines that you don't need.
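For illustration, per-address entries in the 2.2-style syntax the original post already uses might look like this. The addresses here are placeholders from the reserved documentation ranges, not real crawler IPs:

```apache
# Sketch only: deny one address and one CIDR range.
# 203.0.113.5 and 198.51.100.0/24 are documentation placeholders.
Order Allow,Deny
Allow from all
Deny from 203.0.113.5
Deny from 198.51.100.0/24
```

Each `Deny from` line takes a full address, a partial address, or a CIDR range; the "Deny from ...." line in the original file has none of those, which is why Apache chokes on it.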

#5 nesir28   User is offline

  • D.I.C Head

Reputation: 4
  • View blog
  • Posts: 53
  • Joined: 11-August 17

Re: Using ht.access to block backlink checkers?

Posted 22 September 2017 - 11:47 AM

View PostArtificialSoldier, on 22 September 2017 - 11:43 AM, said:

You can trim down some of those rules; I really doubt that you need the bits targeting Netscape 4, for example.

User agent blocking isn't all that reliable, because the user agent string is optional and can be whatever the client wants it to be. It will work for bots which accurately identify themselves, but if some bot wants to tell your site that it's Google, then it can. If you have a list of IP addresses or ranges, then you can add an entry for each one; that should be pretty easy to find examples for. But try to understand what each line is doing, and remove any lines that you don't need.


Thanks. I don't really know what I'm doing; I thought maybe there was an error somewhere in the code that's causing it not to work. I do understand what the lines do: the whole Netscape part came as a default on my server, and the rel canonical redirect I did myself. The second block (blocking a few major bots) is what I'm struggling with, Art. I'm going to go tinker some more and see if I can't get my head around this.

#6 nesir28   User is offline

  • D.I.C Head

Reputation: 4
  • View blog
  • Posts: 53
  • Joined: 11-August 17

Re: Using ht.access to block backlink checkers?

Posted 23 September 2017 - 05:56 AM

OK, Art, so I spent most of yesterday learning the basics of .htaccess: the use of rewrites, what the * and ^ symbols mean when put before user agent strings, and blocking by IP, by IP range, and all that.

Now this has led me to another problem, one which you foresaw. I managed to block Google (as a test), because I knew this is one bot which won't lie; this was to assure myself that my code works and redirects the intended bot, and it did, great. So I continued and tried to block mainly the SEMrush, Ahrefs, and OpenSite user agents. It didn't work, so I looked at my logs and saw that they aren't identifying themselves (as you said they would), so they get past the .htaccess rules.

I then tried the IP ranges, which were absolute hell to track down, and apparently these can be manipulated as well; with no information on who an IP belongs to, I don't know whether I'm blocking something important or not.

To worsen my situation, I also read that these services keep a database once you have been scanned, so in the case of a 403 (which is what I'm trying to give them) they pull up the cache for the intended searcher. It seems I'm wasting my time with this.

One more solution that claims to work: paid plugins (typical) :(
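For what it's worth, the 403 approach described above can be expressed without redirects. This is only a sketch, assuming Apache 2.4 with mod_setenvif; the bot names are just the ones mentioned in this thread, and like everything user-agent-based it only catches crawlers that identify themselves:

```apache
# Flag self-identifying crawlers by User-Agent (names are examples).
SetEnvIfNoCase User-Agent "AhrefsBot|SemrushBot|MJ12bot|rogerbot" bad_bot

<RequireAll>
    # Allow everyone, then subtract the flagged requests (Apache 2.4 syntax).
    Require all granted
    Require not env bad_bot
</RequireAll>
```

Flagged requests get a 403 directly, rather than a 301 to some third-party site.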

#7 nesir28   User is offline

  • D.I.C Head

Reputation: 4
  • View blog
  • Posts: 53
  • Joined: 11-August 17

Re: Using ht.access to block backlink checkers?

Posted 23 September 2017 - 06:02 AM

Interestingly, and I'd like your input on this, I read that although these bots can ignore robots.txt and work around .htaccess, the explicit statements in robots.txt and .htaccess make crawling anyway a privacy violation of sorts. I've read of one guy who contacted SEMrush in particular to have his site listed as not to be crawled by any SEMrush bot and to have his information removed from their database. Whether it's true I don't know, but it kind of sounds like a good approach to me.

#8 ArtificialSoldier   User is offline

  • D.I.C Lover
  • member icon

Reputation: 2221
  • View blog
  • Posts: 6,733
  • Joined: 15-January 14

Re: Using ht.access to block backlink checkers?

Posted 25 September 2017 - 11:49 AM

Like the user agent string, robots.txt is really only followed by the good actors. If someone doesn't want to abide by those rules, they won't. They can change their user agent string and ignore robots.txt, it's up to whoever programs the bot to decide whether they want to abide by the rules you've set. You can try to block by IP, but if they're using a service like Cloudflare VPN then you'll end up blocking the VPN endpoints, which may result in legitimate users being blocked.
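For completeness, the cooperative half of this is a robots.txt entry. The crawler names are just the ones discussed in this thread, and as noted above, nothing enforces it; it only stops bots that choose to honour it:

```
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /
```

The file lives at the site root (e.g. /robots.txt) and `Disallow: /` asks the named crawler to skip the whole site.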

#9 nesir28   User is offline

  • D.I.C Head

Reputation: 4
  • View blog
  • Posts: 53
  • Joined: 11-August 17

Re: Using ht.access to block backlink checkers?

Posted 26 September 2017 - 03:55 AM

View PostArtificialSoldier, on 25 September 2017 - 11:49 AM, said:

Like the user agent string, robots.txt is really only followed by the good actors. If someone doesn't want to abide by those rules, they won't. They can change their user agent string and ignore robots.txt, it's up to whoever programs the bot to decide whether they want to abide by the rules you've set. You can try to block by IP, but if they're using a service like Cloudflare VPN then you'll end up blocking the VPN endpoints, which may result in legitimate users being blocked.


Yeah, well, I realised this when I added Googlebot to my list and tried fetching my website through GWT, and all the redirects and 403s worked. This means the bots aren't identifying themselves as they should; I think it was you who told me that if a bot wants to call itself Googlebot, it can do that too. Yeah, I knew they can ignore robots.txt; I didn't know the same was the case for .htaccess.

The problem is simply identification, through IPs, names, or whatever. There are too many, and this is where I think it gets risky: adding massive numbers of IP ranges and strings to your files is sure to block something you don't want blocked.

Interestingly, I see WordPress has Spyder Spanker and Link Privacy plugins. I see you can create a PHP file to identify the bot once it has crawled, but since I believe they keep a database of any 403s or 301s they find, so as to serve the searcher with something at least, it's a crappy battle.

The last route I see some people taking is to deny all except a few select browsers, or to point links offsite and then redirect them to their "money site".

It seems like a losing battle and a waste of productive time.

This post has been edited by nesir28: 26 September 2017 - 04:02 AM


#10 no2pencil   User is offline

  • Professor Snuggly Pants
  • member icon

Reputation: 6727
  • View blog
  • Posts: 31,155
  • Joined: 10-May 07

Re: Using ht.access to block backlink checkers?

Posted 26 September 2017 - 06:45 AM

I would block bots & crawlers outside of the web service. By the time you're blocking them with .htaccess, your web server has already eaten the bandwidth. Plus, if the bot tries to crawl 1000+ pages, your .htaccess file eats 1000+ attempts to block them from content.

My suggestion is to block them with a firewall & not allow the traffic to reach the web server process at all. If you can block it somewhere upstream, before your OS even sees it, that's even better.

View Postnesir28, on 26 September 2017 - 06:55 AM, said:

The problem is simply identification, through IPs, names, or whatever. There are too many, and this is where I think it gets risky: adding massive numbers of IP ranges and strings to your files is sure to block something you don't want blocked.

Most bots, crawlers, & scrapers don't attempt to go rogue. I have a service that runs via crontab & twice daily grabs blacklisted IPs & imports them into my firewall. I've been doing this for 10+ years, host a number of sites for companies, friends, & one government office, & have had ZERO requests to clear a blocked IP. Just to note, these lists include entire IP ranges. I don't see any value in worrying about that one rare instance, versus allowing thousands of hits on my site from known malicious IPs.

#11 nesir28   User is offline

  • D.I.C Head

Reputation: 4
  • View blog
  • Posts: 53
  • Joined: 11-August 17

Re: Using ht.access to block backlink checkers?

Posted 26 September 2017 - 06:50 AM

View Postno2pencil, on 26 September 2017 - 06:45 AM, said:

I would block bots & crawlers outside of the web service. By the time you're blocking them with .htaccess, your web server has already eaten the bandwidth. Plus, if the bot tries to crawl 1000+ pages, your .htaccess file eats 1000+ attempts to block them from content.

My suggestion is to block them with a firewall & not allow the traffic to reach the web server process at all. If you can block it somewhere upstream, before your OS even sees it, that's even better.

View Postnesir28, on 26 September 2017 - 06:55 AM, said:

The problem is simply identification, through IPs, names, or whatever. There are too many, and this is where I think it gets risky: adding massive numbers of IP ranges and strings to your files is sure to block something you don't want blocked.

Most bots, crawlers, & scrapers don't attempt to go rogue. I have a service that runs via crontab & twice daily grabs blacklisted IPs & imports them into my firewall. I've been doing this for 10+ years, host a number of sites for companies, friends, & one government office, & have had ZERO requests to clear a blocked IP. Just to note, these lists include entire IP ranges.


I assumed, and read wrongly, that .htaccess blocks them at the server level, denying the bandwidth. I've actually not read anything on firewalls, just PHP, .htaccess, the WordPress plugins, redirects, all the stuff I have mentioned. Do you have a link where I can read up more on this? (I will probably Google it too.) Thanks guys, this is proving to be helpful. BTW, do you think your hosting provider could assist with an issue like this?

#12 no2pencil   User is offline

  • Professor Snuggly Pants
  • member icon

Reputation: 6727
  • View blog
  • Posts: 31,155
  • Joined: 10-May 07

Re: Using ht.access to block backlink checkers?

Posted 26 September 2017 - 06:55 AM

View Postnesir28, on 26 September 2017 - 09:50 AM, said:

BTW do you think your hosting provider could assist with an issue like this ?


I am my hosting provider.

The firewall is going to be dependent upon your OS & environment. For example, I have some IP chains blocked by my CDN, & others by the OS. I have both CentOS & FreeBSD hosting environments, & they are unique amongst themselves, as would be Ubuntu or Windows.

View Postnesir28, on 26 September 2017 - 09:53 AM, said:

I assumed, and read wrongly, that .htaccess blocks them at the server level, denying the bandwidth,

As .htaccess is a file, the request must go through DNS, to your IP, to the OS, to the hosting process, which loads the .htaccess to get the instruction to block, allow, or forward. I've never used .htaccess for anything more than WordPress, or the occasional directory-listing setup.
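As a sketch of that point: if you control the main server config, the per-request .htaccess lookup can be avoided entirely by putting the rules in httpd.conf and disabling overrides. The directory path here is a placeholder:

```apache
# httpd.conf sketch: rules are parsed once at startup instead of on
# every request. "/var/www/example" is a placeholder path.
<Directory "/var/www/example">
    # Stop Apache from looking for .htaccess files in this tree.
    AllowOverride None
    # Access/deny rules would live here instead of in .htaccess.
</Directory>
```

With `AllowOverride None`, Apache skips the filesystem walk it otherwise does for .htaccess files on every request, though the traffic still reaches the web server, so a firewall remains the cheaper place to drop it.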

Lastly, as this isn't really web development, I've moved to Web Servers & Hosting.
