Setup issue? All CMS-suggested URLs replace spaces with '20'

Double-A-Ron

Community Member, 604 Posts

21 April 2014 at 7:11pm

We've just upgraded to SS 3.1.2 and seem to have a problem with every URL that the CMS suggests based on a page's title (regardless of page type): each space in the title comes back as '20'. Sure, %20 is a URL-encoded (percent-encoded) space, but I've not seen this issue on another 3.x site I run, so I'm wondering if it's a setup issue. The problem is reproducible on both a WAMP server and a CentOS Linux machine.

I've managed to trace where SilverStripe does this to /framework/model/URLSegmentFilter.php, in the filter() method at around line 76:

$replacements = $this->getReplacements();

// Unset automated removal of non-ASCII characters, and don't try to transliterate
if($this->getAllowMultibyte() && isset($replacements['/[^A-Za-z0-9\-]+/u'])) {
    unset($replacements['/[^A-Za-z0-9\-]+/u']);
}

Debug::dump($name); // my debug output: value before the replacements run
foreach($replacements as $regex => $replace) {
    $name = preg_replace($regex, $replace, $name);
}
Debug::dump($name); // my debug output: value after the replacements run

You can see my two debug dumps there. They output the following, respectively:

URLSegmentFilter.php:78 -
new%20blog%20entry%20-%20this%20is%20a

URLSegmentFilter.php:83 -
new20blog20entry20-20this20is20a

So either something is funky with the regexes coming back from getReplacements(), or something else is going on. I'm inclined to be suspicious of allowMultibyte, but I'm not sure what effect changing it would have on the rest of the site.
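For what it's worth, the dumps show that $name already contains %20 before any replacement runs, so the regexes are working on a string that is still percent-encoded. Below is a hypothetical, much simplified stand-in for filter(), not SilverStripe's actual code: the function name sketchFilter() and the whitespace and dash-collapsing rules are assumptions, and it assumes the /[^A-Za-z0-9\-]+/u rule replaces its matches with an empty string.

```php
<?php
// Hypothetical, simplified stand-in for URLSegmentFilter::filter().
// ASSUMPTIONS: the rule list and the sketchFilter() name are illustrative,
// and /[^A-Za-z0-9\-]+/u is assumed to replace its matches with ''.
function sketchFilter(string $name): string
{
    $replacements = [
        '/\s+/u'             => '-', // whitespace becomes a dash
        '/[^A-Za-z0-9\-]+/u' => '',  // anything else is dropped, including '%'
        '/-{2,}/u'           => '-', // collapse runs of dashes into one
    ];
    foreach ($replacements as $regex => $replace) {
        $name = preg_replace($regex, $replace, $name);
    }
    return $name;
}

// Decoded input: spaces turn into dashes as expected.
echo sketchFilter('new blog entry - this is a'), PHP_EOL;
// prints new-blog-entry-this-is-a

// Still-encoded input: only the '%' is removed, so each %20 leaves "20".
echo sketchFilter('new%20blog%20entry%20-%20this%20is%20a'), PHP_EOL;
// prints new20blog20entry20-20this20is20a
```

If the real rules behave like this, the regexes are doing their job and the real question is why the value reaches filter() still encoded.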

Any ideas what is going on here?

Cheers

Double-A-Ron

Community Member, 604 Posts

21 April 2014 at 8:16pm

Found it: it's in my .htaccess file. I had an old rule there, from when the site ran 2.4, that strips the trailing slash from all URLs:

RewriteRule ^(.+)/$ /$1 [R=301,L]

I'm still looking into the impact of this, however. From what I can see, it's a bit strange that a rewrite rule would alter what URLSegmentFilter returns.

Double-A-Ron

Community Member, 604 Posts

22 April 2014 at 8:24pm

Edited: 22/04/2014 8:25pm

This is actually a little weird now that I see what the problem is.

Example title: "this is a test"

By default, when SilverStripe generates a URL segment from that title, the CMS calls this URL:
/admin/pages/edit/EditForm/field/URLSegment/suggest/?value=this%20is%20a%20test

This results in a URLSegment of "this-is-a-test".

But as soon as I add the .htaccess rule that strips trailing slashes, the trailing slash in .../suggest/ matches the rule, and the request is 301-redirected to this URL:
/admin/pages/edit/EditForm/field/URLSegment/suggest?value=this%2520is%2520a%2520test

This results in a URLSegment of "this20is20a20test".

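It's easy to see that the value is being URL-encoded twice in the second URL: %20 is an encoded space, and %25 is an encoded '%'. So a space is first encoded to %20, and then the '%' is encoded again, giving %2520. The query string is only decoded once, so URLSegmentFilter receives "this%20is%20a%20test" and strips the '%' characters, leaving the '20's behind (which matches the debug dumps in my first post).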
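A small PHP illustration of the double encoding, using only built-in functions (rawurlencode() and parse_str()). It models the strings involved, not the CMS JavaScript or Apache's redirect, and the variable names are illustrative.

```php
<?php
// Illustrates encoding once vs. twice; it does not reproduce the CMS
// JavaScript or Apache, only the strings involved.
$title = 'this is a test';

$once  = rawurlencode($title); // this%20is%20a%20test (what the CMS sends)
$twice = rawurlencode($once);  // this%2520is%2520a%2520test (seen after the redirect)

// parse_str() decodes a query string exactly once, which is roughly what
// PHP does when it populates $_GET for the suggest request.
parse_str('value=' . $once, $good);
parse_str('value=' . $twice, $bad);

echo $good['value'], PHP_EOL; // this is a test
echo $bad['value'], PHP_EOL;  // this%20is%20a%20test (one layer of encoding left)
```

Feed $bad['value'] into a filter that strips '%' and you get this20is20a20test, the broken URLSegment described above.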

The workaround is to add a condition to .htaccess so that /admin URLs are excluded from the rule. But can anyone think of a logical reason for this behaviour?

The .htaccess block again:

RewriteRule ^(.+)/$ /$1 [R=301,L]
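The workaround can be modelled in PHP to show which requests the rule would redirect. This is a hypothetical illustration (redirectTarget() and the 'about-us/' path are made up), not how Apache evaluates rules. It relies on two documented mod_rewrite behaviours: in a .htaccess file in the web root, the RewriteRule pattern is matched against the URL path without its leading slash, and the query string is never part of what the pattern sees. In .htaccess itself the exclusion would usually be a line such as RewriteCond %{REQUEST_URI} !^/admin placed directly above the RewriteRule.

```php
<?php
// Hypothetical model of the trailing-slash rule, for illustration only.
// $path is the URL path as a root .htaccess RewriteRule sees it: no leading
// slash and no query string.
function redirectTarget(string $path, bool $excludeAdmin): ?string
{
    // Stand-in for a RewriteCond that skips the rule for /admin URLs.
    if ($excludeAdmin && strpos($path, 'admin/') === 0) {
        return null;
    }
    // Same pattern as RewriteRule ^(.+)/$ /$1 [R=301,L]
    if (preg_match('#^(.+)/$#', $path, $m) === 1) {
        return '/' . $m[1]; // the 301 target, without the trailing slash
    }
    return null; // pattern does not match, so no redirect
}

$suggest = 'admin/pages/edit/EditForm/field/URLSegment/suggest/';

echo redirectTarget($suggest, false) ?? 'no redirect', PHP_EOL;
// prints /admin/pages/edit/EditForm/field/URLSegment/suggest
echo redirectTarget($suggest, true) ?? 'no redirect', PHP_EOL;
// prints no redirect
echo redirectTarget('about-us/', true) ?? 'no redirect', PHP_EOL;
// prints /about-us
```

Excluding /admin keeps the CMS's suggest request from being redirected (and re-encoded), while front-end URLs still lose their trailing slash.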