22 April 2011 at 7:46am
I have a bad bot spidering one of my sites at the moment that can't construct URLs from the HTML source correctly and so I'm seeing stuff like this in the Apache logs:
"GET /blogs/%22http://upingtonairportcarrental.com/%22%3Ehere%3C/a%3E.%3Cdiv HTTP/1.1" 404 3291 "-" "Mozilla/5.0 (compatible; Purebot/1.1; +http://www.puritysearch.net/)"
and getting warnings like this logged and emailed to me:
parse_url(/blogs/"http://upingtonairportcarrental.com/">here</a>.<div) Line 541 of Director.php
I have everything from warning level and up emailed to me (I prefer it that way), but I would rather not be emailed this particular message. I've also seen it triggered by another bot, and there is a piece of code out in the wild that checks SSL servers but prefixes a / to the domain name, which results in a similar error message:
parse_url(/https://www.example.com/) Line 541 of Director.php
As I mentioned, I would prefer to keep the reporting level at warnings. Is there a way to suppress just this particular warning from the parse_url call on line 541 of Director.php, so that it isn't logged or emailed? Or could that line be rewritten slightly to stop the warning from cropping up?
Version used: SS 2.4.5
7 May 2011 at 10:18pm
Just as a follow-up: this only affects PHP < 5.3.3. I hacked the Director.php file and added an @ to the parse_url call, i.e. @parse_url(...), which suppresses the warning.
12 May 2011 at 2:43pm
One other thing you may want to consider is blocking the bot or its IP address in .htaccess. Otherwise the bot is still hitting your website and using up memory. Here's a blog article showing how to block by either option:
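As a rough illustration of that suggestion (not taken from the linked article), a minimal .htaccess sketch could look like the following. It assumes mod_rewrite is enabled, which a stock SilverStripe .htaccess already relies on, and that the block is placed above the existing SilverStripe rewrite rules. The "Purebot" string comes from the user agent in the log line quoted earlier in the thread; the IP address is a placeholder, not the bot's real address.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Match the bot by its User-Agent string (case-insensitive) ...
    RewriteCond %{HTTP_USER_AGENT} Purebot [NC,OR]

    # ... or by client address. 203.0.113.42 is a placeholder
    # documentation address; substitute the IP seen in your own logs.
    RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.42$

    # Answer with 403 Forbidden before the request reaches PHP.
    RewriteRule ^ - [F]
</IfModule>
```

Because the request is refused by Apache itself, the malformed URL never reaches Director.php, so the parse_url warning is not raised for that bot. Other clients sending similarly broken URLs would still trigger it on PHP < 5.3.3.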