High apache2 server load because of "semrush" bot

Post by gilthanaz

[Problem]
We've had an Apache2 server behaving oddly lately, with long loading times and capacity warnings. Checking the logs showed that a "semrush bot" (User-Agent "SemrushBot") was crawling all sites at maximum bandwidth with no rate limiting. It appears to be some kind of marketing bot, and trash like that has to go. The bot also often appears to ignore the robots.txt file, so we need a different method.

[Solution]
There are several approaches. Filtering by IP is not very useful, as the bot may connect from IP blocks we're not aware of. Currently we're blocking it by User-Agent in .htaccess, as described here: http://badbots.vps.tips/info/semrushbot

Code:

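# Bad bots filter code
# provided by http://badbots.vps.tips
# Flag any request whose User-Agent contains "SemrushBot" (case-insensitive)
SetEnvIfNoCase User-Agent "SemrushBot" bad_bots
# Deny flagged requests; only the methods listed in <Limit> are affected
# (Order/Allow/Deny is Apache 2.2 syntax; 2.4 needs mod_access_compat)
<Limit GET POST HEAD>
	Order Allow,Deny
	Allow from all
	Deny from env=bad_bots
</Limit>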
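If the server runs Apache 2.4 without mod_access_compat, the Order/Allow/Deny lines will not load. A rough equivalent using the 2.4 authorization directives might look like the sketch below. It is untested on our setup and assumes mod_setenvif and mod_authz_core are loaded and that .htaccess is allowed to override AuthConfig and FileInfo.

```apache
# Sketch for Apache 2.4 (assumption: mod_setenvif and mod_authz_core are loaded)
# Flag any request whose User-Agent contains "SemrushBot" (case-insensitive)
SetEnvIfNoCase User-Agent "SemrushBot" bad_bots
# Grant access to everyone except requests carrying the bad_bots flag
<RequireAll>
	Require all granted
	Require not env bad_bots
</RequireAll>
```

Unlike the <Limit> version, this applies to every request method, not just GET, POST and HEAD.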
You'll still see a lot of log entries for the bot's requests (they are now answered with 403 Forbidden), but the capacity issues should be gone.
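If the remaining log noise is a problem, the flagged requests can probably be kept out of the access log with a conditional CustomLog. This is only a sketch: it has to go into the server or virtual host config (CustomLog is not allowed in .htaccess), and the log path and the "combined" format nickname are assumptions based on a Debian-style apache2 install.

```apache
# Server or vhost config, NOT .htaccess
# Assumption: Debian-style log path and the stock "combined" LogFormat
SetEnvIfNoCase User-Agent "SemrushBot" bad_bots
# Write an access log entry only when bad_bots is NOT set
CustomLog ${APACHE_LOG_DIR}/access.log combined env=!bad_bots
```

This replaces the existing CustomLog line for that host rather than being added next to it, otherwise the requests get logged twice or still show up in the unfiltered log. The error log may still record the denied requests.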