Setup issue? All CMS-suggested URLs replace spaces with '20'

Double-A-Ron

Community Member, 605 Posts

21 April 2014 at 7:11pm

We've just upgraded to SS 3.1.2 and seem to have a problem with all URLs that the CMS suggests based on a page's title (regardless of page type): they come back with '20' everywhere there is a space in the title. Sure, %20 is a URL-encoded space, but I've not seen this issue on another 3.x site I run, so I'm wondering if it's a setup issue. The problem is reproducible on both a WAMP server and a CentOS Linux machine.

I've managed to trace where SilverStripe does this, and the point of failure, to /framework/model/URLSegmentFilter.php, method filter(), line ~76.

$replacements = $this->getReplacements();

// Unset automated removal of non-ASCII characters, and don't try to transliterate
if($this->getAllowMultibyte() && isset($replacements['/[^A-Za-z0-9\-]+/u'])) {
	unset($replacements['/[^A-Za-z0-9\-]+/u']);
}

Debug::dump($name);
foreach($replacements as $regex => $replace) {
	$name = preg_replace($regex, $replace, $name);
}
Debug::dump($name);

You can see my two debug dumps there, which output the following respectively:

URLSegmentFilter.php:78 -
new%20blog%20entry%20-%20this%20is%20a

URLSegmentFilter.php:83 -
new20blog20entry20-20this20is20a

So either something is funky with the regex that is coming back from getReplacements(), or something else is going on. I'm inclined to be suspicious of allowMultibyte, but I am not sure what effect this will have on the rest of the site.

Any ideas what is going on here?

Cheers

Double-A-Ron

Community Member, 605 Posts

21 April 2014 at 8:16pm

Found it in my .htaccess file. I had an old rule there, from when the site was on 2.4, to strip the trailing slash from all URLs:

RewriteRule ^(.+)/$ /$1 [R=301,L]

Still looking into the impact of this, however. It's a bit strange that this rule would alter what URLSegmentFilter returns, from what I can see.

Double-A-Ron

Community Member, 605 Posts

22 April 2014 at 8:24pm

Edited: 22/04/2014 8:25pm

This is actually a little weird now that I see what the problem is.

Example title: "this is a test"

By default, when SilverStripe attempts to generate a URL segment from that, it calls this URL:
/admin/pages/edit/EditForm/field/URLSegment/suggest/?value=this%20is%20a%20test

Which results in the URLSegment of "this-is-a-test"

But as soon as I introduce the .htaccess segment to strip trailing slashes, Silverstripe attempts to call this URL:
/admin/pages/edit/EditForm/field/URLSegment/suggest?value=this%2520is%2520a%2520test

Which results in the URLSegment of "this20is20a20test"

Easy to see that it is being urlencoded twice in the second URL. %20 is an encoded space. %25 is an encoded '%'. Hence a space is being encoded to %20 first, then the '%' is being encoded again to give me %2520.
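
The double encoding is easy to reproduce outside SilverStripe. Below is a minimal sketch in Python (chosen only for illustration; the variable names are made up for this example) using the standard library's quote and unquote. It only mimics the encoding steps described above; it does not show where in the redirect the second encoding actually happens.

```python
from urllib.parse import quote, unquote

title = "this is a test"

# First pass: the CMS JavaScript encodes the title for the query string.
once = quote(title)    # 'this%20is%20a%20test'

# Second pass: encoding the already-encoded value turns every '%' into '%25'.
twice = quote(once)    # 'this%2520is%2520a%2520test'

# The server decodes the query string only once, so a literal '%20'
# (not a space) is what reaches the URL segment filter.
received = unquote(twice)
print(once, twice, received, sep="\n")
```

After a single decode the value is back to "this%20is%20a%20test", which matches the first debug dump in the original post.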

The workaround is to add a condition to .htaccess that excludes /admin from the rule. But I'm wondering if anyone can think of a logical reason for this?

The .htaccess block again:

RewriteRule ^(.+)/$ /$1 [R=301,L]

HARVS1789UK

Community Member, 31 Posts

5 February 2016 at 1:00am

@Double-A-Ron firstly, thanks very much for your post. I also started seeing this issue on a number of sites recently. I had assumed it would be to do with spaces being URL encoded to %20 and then the preg_replace failing or only stripping the % symbol, so I went off to investigate and found myself looking at the same file(s) as you.

Before wasting too much time error logging things out in core code (as I thought I shouldn't be changing this even if I found the issue) I Googled the issue and found your post. It saved me a lot of time, as I realised that the URL rewrite line you found to be the culprit is something I have recently added to a number of sites (lo and behold, the ones which started exhibiting this issue).

As you have correctly highlighted, because the original request is redirected to the non-trailing-slash version, the %20 in the URL-encoded query string is 'double encoded' to %2520, and hence the preg_replace in URLSegmentFilter fails to replace it with a hyphen (not too sure why or how it is able to remove the %25 part though, haven't bothered to investigate that).
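
On the open question of where the '%' goes: the debug dumps in the first post suggest the filter receives the once-decoded value (with a literal %20), so no whitespace rule can match, and the non-alphanumeric rule quoted from URLSegmentFilter then deletes the '%' and leaves the '20' behind. Here is a rough sketch of that behaviour in Python. The replacement table is a simplified assumption, not SilverStripe's actual defaults; only the `[^A-Za-z0-9\-]+` pattern is taken from the thread, and `filter_segment` is a hypothetical name.

```python
import re

# Simplified, assumed replacement table (applied in order, like the
# foreach loop in the original post). Only the second pattern is
# quoted in the thread; the other two are illustrative.
REPLACEMENTS = [
    (r"\s", "-"),               # whitespace becomes a dash
    (r"[^A-Za-z0-9\-]+", ""),   # anything else non-alphanumeric is removed
    (r"-{2,}", "-"),            # collapse runs of dashes
]

def filter_segment(name):
    """Apply each regex replacement in turn, as filter() does."""
    for pattern, replacement in REPLACEMENTS:
        name = re.sub(pattern, replacement, name)
    return name

# A real space is turned into a dash.
print(filter_segment("this is a test"))        # this-is-a-test

# A literal '%20' contains no whitespace, so only the '%' is stripped.
print(filter_segment("this%20is%20a%20test"))  # this20is20a20test
```

Fed the value from the first debug dump, the same sketch produces "new20blog20entry20-20this20is20a", matching the second dump.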

Your suggested fix of adding a RewriteCond to exclude any URLs which start with /admin/ would indeed work; however, I would like to propose another solution (for anyone else who comes across this issue): use the RewriteRule NE (noescape) flag, which prevents this 'double encoding' from taking place and leaves your query string as is.

Apache Docs - RewriteRule NE Flag

By default, special characters, such as & and ?, for example, will be converted to their hexcode equivalent. Using the [NE] flag prevents that from happening.

This solved the issue for me and continued to let me redirect all other URLs under /admin/ to exclude the trailing slash.
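
For anyone landing here, this is a sketch of how the two fixes discussed in this thread might look in .htaccess. The rule is the one quoted earlier with NE added; the RewriteCond line is an assumed form of the /admin exclusion (it presumes the site lives at the web root), so try either variant on a staging copy first. Use one option or the other, not both.

```apache
# Option A: keep the redirect for every URL, but add NE (noescape) so the
# already-encoded query string is not escaped a second time.
RewriteRule ^(.+)/$ /$1 [R=301,L,NE]

# Option B: leave the flags as they were and skip the rule for CMS URLs.
# The condition is an assumption about how the /admin exclusion would be
# written; uncomment these two lines and remove Option A to use it.
# RewriteCond %{REQUEST_URI} !^/admin
# RewriteRule ^(.+)/$ /$1 [R=301,L]
```

Option A keeps trailing-slash redirects working inside /admin/ as well, which is the behaviour described above; Option B simply leaves CMS requests untouched.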

On a side note, the very existence of trailing slashes on URLs within SS is something that needs addressing and standardising (for the benefit of analytics). There was some initial discussion about this at the 2015 Silverstripe Europe conference so I know the SS team have it on their radar, hopefully things will be more standardised and uniform in SS 4.0!