Migrating a Site to Silverstripe

What you need to know when migrating your existing site to SilverStripe.

  • Fanta
    Community Member
    7 Posts

    UTF-8 problem when migrating data

    Hello,

    I just created my first SilverStripe website with my own template, and I like SilverStripe!
    Now I'm trying to migrate the content from my old Joomla site. I exported the content to a CSV file and converted it to the structure of the SiteTree table. Then I imported the CSV into my SilverStripe database. All fine ... but now I am in trouble with the charset.

    - CSV-file is UTF-8
    - MySQL database is UTF-8
    - _config.php includes: ContentNegotiator::set_encoding('utf-8');
    - my page.ss includes <meta http-equiv="Content-type" content="text/html; charset=utf-8" >
    - I even checked the database content by SELECT HEX(TITLE) FROM SiteTree

    But the German umlaut characters (e.g. ä = U+00E4 = UTF-8 C3 A4) are still not displayed, neither in the CMS nor in the frontend.
    The Content (HTML field) is OK, but Title and MenuTitle are wrong.

    Then I tried entering an "ä" in the CMS, and I was very astonished that SilverStripe stored it in the database as hex C3 83 C2 A4.
    I don't think that this is UTF-8!

    Why doesn't SilverStripe use UTF-8? I'm very confused.

    Thanks in advance
    Fanta

  • Ingo
    Forum Moderator
    801 Posts

    Re: UTF-8 problem when migrating data

    Hm, we don't usually have problems with this; SilverStripe handles UTF-8 fine. What's the collation of your database fields?

  • Fanta
    Community Member
    7 Posts

    Re: UTF-8 problem when migrating data

    The collation is "utf8_unicode_ci", and I also tried "utf8_general_ci".

  • Fanta
    Community Member
    7 Posts

    Re: UTF-8 problem when migrating data

    I ran some experiments on three installations:
    1. local wampp under Windows XP
    2. Ubuntu on a virtual system
    3. Debian server of my hosting provider
    In every case I created a page in the CMS with the characters ä ö ü Ä Ö Ü in the title.
    But there is no UTF-8 in any database. The umlaut characters are always coded in 4 bytes instead of 2 bytes. Is it UTF-32?
    I cannot believe that SilverStripe uses UTF-8 at all.

    It is no problem if you only create pages in the CMS. But I would like to know what charset SilverStripe really uses. It would make migrations easier.
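
The UTF-32 question can be checked directly. Below is a minimal sketch, assuming Python 3; the helper name `hex_bytes` is made up for illustration, and the observed value is the one reported from SELECT HEX(TITLE) in this thread. It compares the real encodings of "ä" with the four bytes found in the database.

```python
# Sketch: compare the reported database bytes with real encodings of U+00E4.

def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, e.g. 'C3 A4'."""
    return " ".join(f"{b:02X}" for b in data)

char = "\u00e4"  # the a-umlaut character

utf8 = char.encode("utf-8")
utf16 = char.encode("utf-16-be")
utf32 = char.encode("utf-32-be")

# Value reported in the thread from SELECT HEX(TITLE)
observed = bytes.fromhex("C383C2A4")

print("UTF-8:   ", hex_bytes(utf8))
print("UTF-16BE:", hex_bytes(utf16))
print("UTF-32BE:", hex_bytes(utf32))
print("observed:", hex_bytes(observed))

# The observed bytes are valid UTF-8, but they decode to two characters
# instead of one.
decoded = observed.decode("utf-8")
print("observed as UTF-8:", [f"U+{ord(c):04X}" for c in decoded])
```

If this runs as expected, the four bytes match neither UTF-16 nor UTF-32. They are valid UTF-8 for the two characters U+00C3 and U+00A4, which suggests the text was encoded more than once rather than stored in a different charset.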

  • Ingo
    Forum Moderator
    801 Posts

    Re: UTF-8 problem when migrating data

    Uhm, I really can't follow you. I just tried a run-of-the-mill 2.3 installation with umlauts in Title, MenuTitle and Content (all entered through the CMS), and they all render fine in a standard blackcandy theme with the content type delivered as UTF-8.
    Try it yourself on demo.silverstripe.com - if you're quick (next 30 min?), you can see my umlaut test page on there: http://demo.silverstripe.com/umlaut-test/.

    This is how SilverStripe sets the collation for SiteTree->Content: mediumtext character set utf8 collate utf8_general_ci

    Here's the UTF-8 setting which every site gets through the bootstrapper - admittedly it's harder *not* to use UTF-8 than it is to have it working out of the box ;) http://open.silverstripe.com/browser/modules/sapphire/branches/2.3/main.php#L61

  • Fanta
    Community Member
    7 Posts

    Re: UTF-8 problem when migrating data

    Hello Ingo,

    Thank you for your reply. I agree that there is no problem with umlaut characters if you enter them in the CMS. They are displayed correctly in both the CMS and the frontend.

    If you look at your MySQL database, you will see that the characters are not stored as UTF-8 in the SiteTree table.
    E.g. run SELECT HEX(TITLE) FROM SiteTree WHERE ID = ...
    At least on my three installations the umlauts were coded in 4 bytes, e.g. ä = C383C2A4. That isn't UTF-8!
    Could you please run an SQL query on your demo database to check it?

    Best regards,
    Fanta

  • Ingo
    Forum Moderator
    801 Posts

    Re: UTF-8 problem when migrating data

    Hey Fanta, I've roughly traced the execution of saving a dataobject through to the database column, and I can't find anything suspicious that would deviate from our UTF-8 defaults. We need to get better at using multibyte-safe PHP5 functionality though; I've opened a new ticket about this: http://open.silverstripe.com/ticket/3746 - not sure if it's related to your problem.
    Any help in this area is appreciated (both in-depth testing and patches).

    I found PHPWACT quite helpful to figure out UTF-8 in PHP: http://www.phpwact.org/php/i18n/utf-8
    Unfortunately you can't expect PHP to do the "right thing" by default, apparently... the list on this page is fairly frightening.

    Not sure why and how you got UTF-16 or even UTF-32 characters in your DB. Can you give more details on your server setup, system locale, PHP locale, php.ini and MySQL conf?

  • Fanta
    Community Member
    7 Posts

    Re: UTF-8 problem when migrating data

    Hello Ingo,

    Now I understand how the characters are coded:
    e.g. the ä umlaut is Latin-1 (ISO-8859-1) hex E4 -> converted to UTF-8 hex C3 A4
    If you interpret these two bytes as Latin-1 again, you can convert them a second time:
    Latin-1 hex C3 (Ã) -> UTF-8 hex C3 83
    Latin-1 hex A4 (¤) -> UTF-8 hex C2 A4

    These are the 4 bytes (C3 83 C2 A4) I found in my database. SilverStripe performs the Latin-1 -> UTF-8 conversion twice!
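
The two conversion steps described here can be reproduced in a few lines. This is only a sketch, assuming Python 3 and its `latin-1` codec; the function names `double_encode` and `repair` are invented for this example and are not part of SilverStripe or MySQL. It illustrates the byte arithmetic and does not show where in the stack the second conversion happens.

```python
def double_encode(text: str) -> bytes:
    """Simulate UTF-8 bytes being misread as Latin-1 and encoded to UTF-8 again."""
    once = text.encode("utf-8")          # first, correct conversion
    misread = once.decode("latin-1")     # each byte is treated as one character
    return misread.encode("utf-8")       # second, unwanted conversion


def repair(data: bytes) -> str:
    """Undo one round of double encoding.

    Raises UnicodeError if the input does not look double encoded.
    """
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


stored = double_encode("\u00e4")  # the a-umlaut character
print(stored.hex().upper())       # hex of the simulated database value

original = repair(stored)
print(f"U+{ord(original):04X}")   # code point of the repaired character
```

For a migration, the same idea works in both directions: imported data that is correct UTF-8 would need the extra round applied to match rows created in the CMS, or the existing rows would need to be repaired. Note that `repair` deliberately fails on correctly encoded input such as the bytes C3 A4, so it should be tested on a copy of the data first. MySQL's latin1 charset also differs from ISO-8859-1 for some bytes in the 80-9F range, so characters outside the umlauts shown here may need separate handling.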

    Unfortunately I'm not strong enough in PHP programming, but maybe somebody can find the bug from this hint.

    The main problem is that if you change the program, all existing SilverStripe installations will have problems with their content.

    Best regards,
    Fanta
