I recently upgraded a site from 2.3.x to 2.4.3. As part of the process I made a copy of the site, including the MySQL database, and then performed the upgrade, added the additional mods my client required, and made some fixes so that my custom modules work correctly with 2.4. I then copied the database again (via a MySQL dump) to make sure I had the latest version of the content, as my client needed to keep updating the site daily while I worked on the new version.
28 May 2011 at 10:08pm
(Last edited: 28 May 2011 10:09pm)
I've had that happen just recently, after an upgrade from 2.3 to 2.4. In mysite/_config.php there's a line that sets the character set for the db connection to utf8. After I replaced it with latin1, all was well again.
Apparently utf-8 processing in 2.3 wasn't quite consistent:
read this post: http://silverstripe.org/migrating-a-site-to-silverstripe/show/5849
see this ticket: http://open.silverstripe.org/ticket/3746
So the encoding of some characters in a 2.3 database might very well be multibyte but not quite utf8? Some of these multibyte characters might be read by 2.4 as two separate characters, which are then both converted to utf8 when you (re)publish a page? This could result in a non-breaking space being converted into a Â followed by a non-breaking space. On the other hand, some characters are suddenly interpreted correctly if you set the charset to latin1. Things like this confuse me a lot...
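To make that guess concrete, here is a minimal Python sketch of the byte-level effect. It only mimics the suspected mechanism (UTF-8 bytes being read as latin1 and then saved as UTF-8 again); it is not a trace of what SilverStripe or MySQL actually do internally, and the variable names are just for illustration.

```python
# A non-breaking space (U+00A0) stored as UTF-8 is the two bytes C2 A0.
nbsp = "\u00a0"
utf8_bytes = nbsp.encode("utf-8")

# If those two bytes are read as latin1, each byte becomes its own
# character: 0xC2 is "Â" and 0xA0 is a non-breaking space.
misread = utf8_bytes.decode("latin-1")

# Saving the misread text as UTF-8 again makes the stray "Â" permanent:
# the original two bytes have now grown to four.
double_encoded = misread.encode("utf-8")

print(repr(misread))         # the "Â" followed by the non-breaking space
print(double_encoded.hex())  # hex of the four bytes after the second save
```

If this is what happened, it would explain why only pages that were (re)published after the upgrade show the extra character.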
That certainly seems consistent with what I have experienced. I now have over 500 pages, some of which have errors and some of which don't. Although it appears that no more of these characters are being introduced, some were saved into the content in the intervening period. I know I could try to remove them with an SQL query run directly on the database, but I am very loath to do that lest I create more problems for myself!
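For what it's worth, a cautious alternative to a blanket SQL replace might be to check the content outside the database first. The sketch below is only an assumption about how that could look: it works on plain Python strings (for example, text taken from a copy of the dump), and the helper names count_suspects and strip_stray_a_circumflex are hypothetical. It removes a "Â" only when it sits directly in front of a non-breaking space, so an ordinary "Â" elsewhere is left alone.

```python
SUSPECT = "\u00c2\u00a0"  # "Â" immediately followed by a non-breaking space
NBSP = "\u00a0"


def count_suspects(text: str) -> int:
    """Count how many times the suspect pair occurs, without changing anything."""
    return text.count(SUSPECT)


def strip_stray_a_circumflex(text: str) -> str:
    """Drop the "Â" from each suspect pair and keep the non-breaking space."""
    return text.replace(SUSPECT, NBSP)


sample = "Price:\u00c2\u00a0500 and \u00c2ge"
print(count_suspects(sample))                  # number of suspect pairs found
print(repr(strip_stray_a_circumflex(sample)))  # cleaned text, "Âge" untouched
```

Counting first gives an idea of how many pages are affected before anything is rewritten, and trying it on a copy of the data means the live site is never at risk.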
I've tried the MySQLDatabase::set_connection_charset('latin1'); fix, which at first appeared to solve the problem, but I have found it throws up a new error if there is a non-breaking space in the content. TinyMCE inserts these in several cases, mainly into empty table cells. I've included a screen shot of the error.
Any ideas for a way around this without hacking the core to change to  ?