
General Questions

General questions about getting started with SilverStripe that don't fit in any of the categories above.

Moderators: martimiz, Sean, biapar, Willr, Ingo, swaiba, simon_w

Setup issue? All CMS-suggested URLs replace spaces with '20'



3 Posts   251 Views

Double-A-Ron

21 April 2014 at 7:11pm Community Member, 604 Posts

We've just upgraded to SS 3.1.2 and seem to have a problem with all URLs that the CMS suggests based on a page's title (regardless of page type): they come back with '20' everywhere there is a space in the title. Sure, %20 is a URL-encoded space, but I've not seen this issue on another 3.x site I run, so I'm wondering if it's a setup issue. The problem is reproducible on both a WAMP server and a CentOS Linux machine.

I've managed to trace where SilverStripe does this, and the point of issue, to /framework/model/URLSegmentFilter.php, method filter(), line ~76.

$replacements = $this->getReplacements();

// Unset automated removal of non-ASCII characters, and don't try to transliterate
if($this->getAllowMultibyte() && isset($replacements['/[^A-Za-z0-9\-]+/u'])) {
    unset($replacements['/[^A-Za-z0-9\-]+/u']);
}

Debug::dump($name);
foreach($replacements as $regex => $replace) {
    $name = preg_replace($regex, $replace, $name);
}
Debug::dump($name);

You can see my two debug dumps there, which output the following respectively:

URLSegmentFilter.php:78 -
new%20blog%20entry%20-%20this%20is%20a

URLSegmentFilter.php:83 -
new20blog20entry20-20this20is20a

So either something is funky with the regex that is coming back from getReplacements(), or something else is going on. I'm inclined to be suspicious of allowMultibyte, but I am not sure what effect changing it would have on the rest of the site.
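As a sanity check on the regex theory, the pattern visible in the snippet above can be run in isolation. This is only a sketch: it assumes the replacement paired with that pattern is an empty string (the contents of getReplacements() aren't shown here), and $input and $output are just illustrative names.

```php
<?php
// Input copied from the first Debug::dump() output above.
$input = 'new%20blog%20entry%20-%20this%20is%20a';

// Pattern taken from the snippet; pairing it with '' is an assumption.
$output = preg_replace('/[^A-Za-z0-9\-]+/u', '', $input);

echo $output, PHP_EOL;
```

If that assumption holds, the regex is doing what it was written to do: '%' is not an allowed character, so it gets stripped and the digits are left behind. The question then becomes why $name still contains a literal %20 by the time it reaches filter().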

Any ideas what is going on here?

Cheers

Double-A-Ron

21 April 2014 at 8:16pm Community Member, 604 Posts

Found it in my .htaccess file. I had an old rule there, from when the site was on 2.4, to strip the trailing slash from all URLs:

RewriteRule ^(.+)/$ /$1 [R=301,L]

Still looking into the impact of this, however. From what I can see, it's a bit strange that this rule would alter what URLSegmentFilter returns.

Double-A-Ron

22 April 2014 at 8:24pm (Last edited: 22 April 2014 8:25pm), Community Member, 604 Posts

This is actually a little weird now that I see what the problem is.

Example title: "this is a test"

By default, when SilverStripe attempts to generate a URL from that, it calls this URL:
/admin/pages/edit/EditForm/field/URLSegment/suggest/?value=this%20is%20a%20test

Which results in the URLSegment of "this-is-a-test"

But as soon as I introduce the .htaccess rule to strip trailing slashes, SilverStripe ends up calling this URL instead:
/admin/pages/edit/EditForm/field/URLSegment/suggest?value=this%2520is%2520a%2520test

Which results in the URLSegment of "this20is20a20test"

It's easy to see that the value is being URL-encoded twice in the second URL. %20 is an encoded space, and %25 is an encoded '%'. Hence the space is encoded to %20 first, then the '%' is encoded again to give me %2520.
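The double encoding can be reproduced in a few lines of plain PHP. This is just an illustration of the encoding arithmetic, not of where in the request chain the second encoding happens; the variable names are made up, and rawurldecode() stands in for the single decode PHP applies to query-string values.

```php
<?php
$title = 'this is a test';

// First encoding: what normally ends up in ?value=...
$once = rawurlencode($title);

// Second encoding: the '%' of each %20 becomes %25.
$twice = rawurlencode($once);

// Query-string values are decoded once on the way in, so after a
// double encode the application still sees a literal "%20".
$received = rawurldecode($twice);

echo $once, PHP_EOL;     // this%20is%20a%20test
echo $twice, PHP_EOL;    // this%2520is%2520a%2520test
echo $received, PHP_EOL; // this%20is%20a%20test
```

Since only one layer of encoding is removed, the filter receives "this%20is%20a%20test" rather than "this is a test", which lines up with the "this20is20a20test" result.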

The workaround is to add a condition to .htaccess that excludes /admin from the rule. But I'm wondering if anyone can think of a logical reason for this behaviour?

The .htaccess block again:

RewriteRule ^(.+)/$ /$1 [R=301,L]
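For anyone hitting the same thing, a minimal sketch of the /admin exclusion mentioned above might look like the following. It assumes the CMS is served under /admin and that RewriteEngine On is already set earlier in the file; the RewriteCond line is the only addition to the original rule.

```apache
# Skip the trailing-slash redirect for CMS requests
# (assumes the CMS lives under /admin).
RewriteCond %{REQUEST_URI} !^/admin
RewriteRule ^(.+)/$ /$1 [R=301,L]
```

A RewriteCond only applies to the RewriteRule immediately following it, so the two lines need to stay together.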