High apache2 server load because of "semrush" bot

Post by gilthanaz » Mon Dec 19, 2016 3:35 pm

[Problem]
One of our Apache2 servers has been behaving oddly lately, with long loading times and capacity warnings. The logs showed that a "semrush bot" was crawling all sites with no rate limit, using as much bandwidth as it could get. It appears to be some kind of marketing bot, and trash like that has to go. The bot also often seems to ignore the robots.txt file, so we need a different method.

[Solution]
There are several approaches. Filtering by IP is not very useful, since the bot may connect from IP blocks we're not aware of. Currently we block it by User-Agent in .htaccess, using the snippet from http://badbots.vps.tips/info/semrushbot:

Code:
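# Bad bots filter code
# provided by http://badbots.vps.tips
# Flag any request whose User-Agent contains "SemrushBot" (case-insensitive)
SetEnvIfNoCase User-Agent "SemrushBot" bad_bots
# Apache 2.2 access-control syntax; on Apache 2.4 it needs mod_access_compat
<Limit GET POST HEAD>
   # Allow is evaluated first, then Deny, so flagged requests get a 403
   Order Allow,Deny
   Allow from all
   Deny from env=bad_bots
</Limit>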
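
The snippet above uses the Apache 2.2 access directives. On Apache 2.4 without mod_access_compat, a rough equivalent would probably look like the sketch below. It is untested on our server, and it assumes mod_setenvif and mod_authz_core are loaded and that AllowOverride permits these directives in .htaccess. Only the bad_bots variable name is taken from the original snippet.

```apache
# Sketch for Apache 2.4 (assumption: not what we currently run)
# Flag any request whose User-Agent contains "SemrushBot" (case-insensitive)
SetEnvIfNoCase User-Agent "SemrushBot" bad_bots
<RequireAll>
    # Let everyone in ...
    Require all granted
    # ... except requests that were flagged above
    Require not env bad_bots
</RequireAll>
```

Unlike the <Limit GET POST HEAD> version, this applies to every request method, not just the three listed.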


You'll still see a lot of log entries for the blocked requests (each one is answered with a 403), but the capacity issues should be gone.
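
If those log entries become a nuisance, conditional logging might help. This is a minimal sketch, not something we have deployed. It assumes the directives go into the virtual host or server config (CustomLog is not allowed in .htaccess), that a log format named "combined" is defined, and that the log path follows the Debian-style ${APACHE_LOG_DIR} convention. Adjust both to your setup.

```apache
# Virtual host / server config only, NOT .htaccess
# Flag the bot here as well, so the variable is set when the request is logged
SetEnvIfNoCase User-Agent "SemrushBot" bad_bots
# Write an access-log entry only when bad_bots is NOT set
# (log path and the "combined" format name are assumptions)
CustomLog ${APACHE_LOG_DIR}/access.log combined env=!bad_bots
```

This would replace the existing CustomLog line for that host. It only affects the access log, so the error log may still record the denied requests.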