

parse_url error on Director.php - SS 2.4.5



Chris Hope

Community Member, 18 Posts

22 April 2011 at 7:46am

A badly behaved bot is spidering one of my sites at the moment. It can't construct URLs from the HTML source correctly, so I'm seeing requests like this in the Apache logs:

"GET /blogs/%22http://upingtonairportcarrental.com/%22%3Ehere%3C/a%3E.%3Cdiv HTTP/1.1" 404 3291 "-" "Mozilla/5.0 (compatible; Purebot/1.1; +http://www.puritysearch.net/)"

and getting warnings like this logged and emailed to me:

parse_url(/blogs/"http://upingtonairportcarrental.com/">here</a>.<div) Line 541 of Director.php

I have everything from warning level and up emailed to me (I prefer it that way), but I would rather not be emailed this particular message. I've also seen it from another bot, and there's a piece of code out in the wild that checks SSL servers but prefixes the domain name with a /, which results in a similar error message:

parse_url(/https://www.example.com/) Line 541 of Director.php

As I mentioned, I would prefer to keep the reporting level at warnings. Is there a way to suppress just this parse_url warning from line 541 of Director.php so that it isn't logged? Or could that line be rewritten slightly to prevent the warning from cropping up?

Version used: SS 2.4.5

Chris Hope

Community Member, 18 Posts

7 May 2011 at 10:18pm

Just as a follow-up, this only affects PHP < 5.3.3 (from 5.3.3 onwards, parse_url no longer raises a warning when it fails to parse a URL). I hacked the Director.php file and added an @ to the parse_url call, i.e. @parse_url(...)

Chris_Bryer

Community Member, 35 Posts

12 May 2011 at 2:43pm

Hey Chris,
One other thing you may want to consider is blocking the bot or its IP address in .htaccess. Otherwise the bot is still hitting your website and using up memory. Here's a blog article that shows how to block by either one:

http://blamcast.net/articles/block-bots-hotlinking-ban-ip-htaccess
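
As a rough illustration of that suggestion, here is a minimal .htaccess sketch. It assumes Apache 2.2 with mod_setenvif enabled, and it isn't necessarily what the article above recommends. The bad_bot variable name and the 192.0.2.10 address are placeholders I made up; only the "Purebot" user agent string comes from the log line earlier in the thread.

```apache
# Tag any request whose User-Agent contains "Purebot" (case-insensitive).
# "bad_bot" is an arbitrary environment variable name chosen for this sketch.
SetEnvIfNoCase User-Agent "Purebot" bad_bot

# Apache 2.2-style access control: allow everyone, then deny tagged requests.
Order Allow,Deny
Allow from all
Deny from env=bad_bot

# Optionally block a single IP address as well.
# 192.0.2.10 is a documentation placeholder, not the bot's real address.
Deny from 192.0.2.10
```

Blocked requests get a 403 from Apache before SilverStripe is ever loaded, so they never reach the parse_url call in Director.php. On Apache 2.4 the Order/Allow/Deny directives are deprecated in favour of Require, so the access-control part would need adjusting there.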

-Chris