

Desperate, migrating from "WordPress as lightweight CMS" to SilverStripe





vinha

Community Member, 1 Post

9 June 2009 at 12:20am

Edited: 09/06/2009 12:26am

Hi everyone and nice to meet you,
I'm new to SilverStripe and interested in migrating a quite large WordPress site to it.

Here is the current situation:
I work in a travel agency. Our subdomain runs a WordPress installation, and we've built a catalog of hundreds of pages based on WordPress pages (not posts, as blogs usually use). Using plugins, we managed to build:
- a nice hierarchy system (URLs are formatted like http://site/europe/france/destination-page)
- tags on all destination pages (spa, golf, etc.) so that it is possible to search by interest group, not only by hierarchy
- easily editable special offerings boxes for each continent (Africa-related offers shown when browsing any Africa page)
- individual themes for different types of locations (can be selected manually when creating a page)
- multilingual content
- map-based navigation
- lots of other functions

However, we're facing lots of problems with WordPress not scaling for our use and the plugins not working together (for example, the qTranslate multilingual plugin cannot be used together with the SuperCache plugin, which is essential for popular sites). Another massive problem is how WP handles rewrite rules when category/page URLs are combined with lots of images on pages (documented here: http://core.trac.wordpress.org/ticket/8958). When working on our site, we have to manually clean the MySQL tables all the time.

After all this, we're eager to migrate to a more professional CMS. I have narrowed the most interesting ones down to SilverStripe and Concrete5, both of which are appealing and also used commercially. None of the CMSs I've read about has an advanced WordPress import module with the following features:
- proper import of page categories, which, by the way, are available in WordPress only through a plugin
- import of images (we have 3000 of them, I wouldn't be too glad to import them manually ;)

I would be delighted to hear your comments about the situation (:)) and any wise words about what we should do. Both SilverStripe and Concrete5 seem quite good. If there was an advanced import module, that would solve many problems :)

I would also like to note that even though I'm quite good with HTML and CSS, I'm not a "real" coder, meaning PHP, SQL, JavaScript etc. So I cannot create my own migration script.

bummzack

Community Member, 904 Posts

9 June 2009 at 1:16am

Hi and welcome to SilverStripe

I think you'll have a hard time finding a migration tool that will do all the work for you, even more so because you're using special WordPress plugins which would have to be considered in the migration process as well. After all, this doesn't sound like an easy task, since one would have to rebuild all the functionality provided by the WordPress plugins in SilverStripe (or another CMS of choice) too.

If I were you, I'd hire somebody experienced to do the migration from your current system to the new CMS of choice.

Ingo

Forum Moderator, 801 Posts

14 June 2009 at 10:19pm

Depending on how many different content types (and how much different markup) you have on your 3000 pages, it might make sense to use our static importer: http://open.silverstripe.com/browser/modules/staticimporter/trunk. It's still fairly early-stage code, and needs an experienced hand to wield it, but it made our life a lot simpler when migrating big projects :)

robinp

Community Member, 33 Posts

10 August 2009 at 4:55pm

Hi,

Just trying to get my head around the Static Importer.

If you have a div that is something simple like

<div id="content">
the page content
</div>

and you want what is in that div to go into the page content, what does the rule need to be?

Thank you.

Robin

Yakiv

Community Member, 59 Posts

28 August 2009 at 2:14am

Edited: 28/08/2009 2:15am

@robinp - glad to see another person interested in the StaticImporter! I keep downloading and testing the newest SVN release of this module, as it comes out. The most recent was August 12 (2009).

I keep seeing good improvements and I am very impressed with the individuals who are developing this module. I wish I could communicate with them or someone who knows the module very well. No one is adding any information on the docs wiki and no one seems to reply here on the forums, yet I see continued upgrades at least every week or two.

I don't know who to contact to try to get someone to pay attention to the forum posts here, about the StaticImporter. ...I am in love with SilverStripe. I think it is the best CMS I have ever seen and I really want it to continue to be enhanced. This StaticImporter would really help me transition sites I have to SilverStripe, so I wish someone would help! :-)

Please read the other thread, where I have posted lots of comments:

http://silverstripe.org/migrating-a-site-to-silverstripe/show/260266

Let's keep the SilverStripe StaticImporter interest going!!! :-)

JeremyW

Core Development Team, 4 Posts

28 August 2009 at 10:29am

Hi folks,

Here's the deal with Static Importer. It is a bit of a developer's tool, which takes away the heavy lifting involved with importing a big site. We haven't yet documented it, which is a bit unfortunate given that you really do need a bit of documentation to use it! I'll try to remedy that, but for the time being let's see what we see here.

First, download the latest SVN version and extract it to a folder called staticimporter in your site root (next to mysite, sapphire, etc). Then do a /dev/build on your site. Then you can start using the controller by browsing to /StaticImporter (case sensitive). This works because when you type in a URL, one of the things Sapphire tries to do to resolve it is to look for a controller with that name.

So then you should get a page with a three step process.

Step 1) Download a mirror of the site as it appears to a browsing user, using wget.
-- You have to tell StaticImporter where your site lives. This is done by putting the following lines in /mysite/_config.php:

StaticImporter::set_url("www.mysite.com/"); 
StaticImporter::set_allowed_extensions(array('php','html','jpg','pdf'));

-- Depending on your version of wget, this step might not work as expected in the browser. Give it a shot, and if it doesn't happen, check out the documentation for wget. Sometimes it can be as simple as running

wget -r http://www.websitename.com

Anyway, that will run for a long, long time and download every single file on your site, filtered by your allowed_extensions. The filter can save you some time and bandwidth if you've already got all the images, PDFs and such in a local copy. You can do the same filtering if you're using wget on the command line -- read the documentation and have a play.
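
For the command-line route, here is a rough sketch of how an equivalent wget call could be put together in PHP. The helper name build_wget_command and the "mirror" target directory are made up for this example, and this is not how StaticImporter builds its own command. The flags used (--recursive, --no-parent, --accept, --directory-prefix) are standard wget options.

```php
<?php
// Hypothetical helper, not part of StaticImporter: builds a wget command
// that mirrors a site and keeps only files with the given extensions.
function build_wget_command($url, $extensions, $targetDir) {
    $parts = array(
        'wget',
        '--recursive',   // follow links found in downloaded pages
        '--no-parent',   // never climb above the start URL
        '--accept=' . escapeshellarg(implode(',', $extensions)),  // extension filter
        '--directory-prefix=' . escapeshellarg($targetDir),       // where files are saved
        escapeshellarg($url),
    );
    return implode(' ', $parts);
}

$command = build_wget_command(
    'http://www.websitename.com',
    array('php', 'html', 'jpg', 'pdf'),
    'mirror'
);
echo $command, "\n";
```

Printing the command first, rather than running it straight away, makes it easy to check the filter before starting a download that may take hours.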

Step 2) Import the pages you've downloaded into a SilverStripe sitetree.
-- This is the big one. In order to accomplish this, you need to know a bit of XPath (check out http://www.w3schools.com/XPath/default.asp) and a bit of regular expressions (do yourself a favour and spend an hour on these if you don't already know them. They're horribly off-putting the first time you start using them, but like any language, the more you do it, the easier it will get, and it WILL save you hours and hours in the long run and make you feel like a genius once you get them under your belt).

-- You have to create a pattern which tells step 2 what data exists on your current site and what you want it to do. Here's an example:


StaticImporter::set_rules(
    array(
        array(
            'conditions' => array(
                "/(\/news)/"
            ),
            'fields' => array(
                'Title' => '//td[@class="caption"]/b',
                'Created' => '//span[contains(@class, "NewsDate")]',
                'Hierarchy' => array(
                    'xpath' => array(
                        '//table[contains(@class, "breadcrumbTable")]/tr/td/a/@href',
                        '//table[contains(@class, "breadcrumbTable")]/tr/td/span',
                    ),
                    'exclusive' => 1
                ),
                'Content' => array(
                    'xpath' => '//td[contains(@class, "bodytext")]',
                    'includeMatchedTag' => 0
                )
            )
        ),
        array(
            'conditions' => array(
                "/(events|archives|modules)/",
            ),
            'fields' => array(
                'Title' => array(
                    'xpath' => array(
                        '//td[contains(@class, "bodytext")]/h1[1]',
                        '//td[contains(@class, "bodytext")]/span[@class="breadcrumbCurrentPage"]',
                        '//td[contains(@class, "bodytext")]/p[position()=1]',
                    ),
                    'exclusive' => 1
                ),
                'Hierarchy' => array(
                    'xpath' => array(
                        '//table[contains(@class, "breadcrumbTable")]/tr/td/a/@href',
                        '//table[contains(@class, "breadcrumbTable")]/tr/td/span',
                    ),
                    'exclusive' => 1
                ),
                'Content' => array(
                    'xpath' => '//td[contains(@class, "bodytext")]',
                    'includeMatchedTag' => 0
                )
            )
        ),
        array(
            'conditions' => array(
            ),
            'fields' => array(
                'Title' => array(
                    'xpath' => array(
                        '//span[@class="breadcrumbCurrentPage"]',
                        '//td[contains(@class, "bodytext")]/h1[1]/strong/strong/node()',
                        '//td[contains(@class, "bodytext")]/h1[1]/span/node()',
                        '//td[contains(@class, "bodytext")]/h1[1]/span/span/node()',
                        '//td[contains(@class, "bodytext")]//h1[1]/node()',
                        '//td[contains(@class, "bodytext")]//h2[1]//span/node()',
                        '//td[contains(@class, "bodytext")]/font/span/node()',
                        '//td[contains(@class, "bodytext")]/p[position()=1]//*'
                    ),
                    'exclusive' => 1
                ),
                'Hierarchy' => array(
                    'xpath' => array(
                        '//table[contains(@class, "breadcrumbTable")]/tr/td/a/@href',
                        '//table[contains(@class, "breadcrumbTable")]/tr/td/span',
                    ),
                    'exclusive' => 1
                ),
                'Content' => array(
                    'xpath' => '//td[contains(@class, "bodytext")]',
                    'includeMatchedTag' => 0
                )
            ),
            'exclusive' => 1
        ),
    )
);

-- IMPORTANT: The above is NOT something you can just expect to work on your site. You need to customise the patterns so that they point to the parts of your site that you want to save.

Let's go through quickly and see what each part is doing.

We have a set of arrays, each with a "conditions" part and a "fields" part.
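
As a much smaller case, a single rule with no conditions and one field might look roughly like the fragment below. It follows the same array shape as the big example. The XPath assumes the page body sits in a div with id "content" (the situation Robin asked about earlier in the thread), and it is an assumption, based only on the option's name, that 'includeMatchedTag' => 0 drops the wrapping div and keeps just what is inside it.

```php
// Goes in /mysite/_config.php. A sketch, not a tested configuration.
StaticImporter::set_rules(
    array(
        array(
            // No conditions, so this rule applies to every page.
            'conditions' => array(
            ),
            'fields' => array(
                'Content' => array(
                    'xpath' => '//div[@id="content"]',
                    'includeMatchedTag' => 0
                )
            )
        )
    )
);
```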

The conditions are a series of regular expressions which are applied to the URL of each page. So, the first condition checks to see if the URL contains "/news", the second one checks to see if the URL contains "events", "archives", or "modules", and the third one -- being blank -- is the default. Static Importer goes through each of the conditions till it finds one that works.
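
The first-match behaviour can be sketched in a few lines of plain PHP. The function name pick_rule is made up and this only mirrors the description above, not the module's actual code. Whether several conditions inside one rule must all match or just one of them isn't stated, so the sketch assumes any single match is enough.

```php
<?php
// Sketch of first-match rule selection; pick_rule() is a hypothetical name.
function pick_rule($url, $rules) {
    foreach ($rules as $index => $rule) {
        // An empty conditions array acts as the catch-all default.
        if (count($rule['conditions']) === 0) {
            return $index;
        }
        // Assumption: one matching condition is enough to select the rule.
        foreach ($rule['conditions'] as $pattern) {
            if (preg_match($pattern, $url) === 1) {
                return $index;
            }
        }
    }
    return null; // no rule applies
}

// The three condition sets from the example rules.
$rules = array(
    array('conditions' => array("/(\/news)/")),
    array('conditions' => array("/(events|archives|modules)/")),
    array('conditions' => array()),
);

$newsRule    = pick_rule('www.mysite.com/news/2009/launch.html', $rules);
$eventsRule  = pick_rule('www.mysite.com/events/index.html', $rules);
$defaultRule = pick_rule('www.mysite.com/about-us.html', $rules);
```

Because the first hit wins, the order of the rules matters: the blank default has to come last, or it will swallow every page.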

The fields are where it starts getting a bit tricky. Each field you define corresponds to a piece of content that you want to target in the main site.

So, for example, say you want to get the title out of each page. If you're lucky, your site has something like

<h1 class='Title'>My page</h1>

and it's always the same on every page. So, you write a rule that says "Title = Any h1 tag with class 'Title'".

What about if your site's a bit bigger and a bit older, though? You might find that you've got most page titles inside an h1 tag, but some other pages have more than one h1 tag, some pages have the title in an h3, and some pages have the title in something like

<span class='title'><strong>Title</strong></span>

So, you need to look carefully at your pages, and figure out a rule or set of rules which works for everything (or near enough).
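
One way to think about such a set of rules is as a list of XPaths tried in order, where the first one that finds something wins. The sketch below shows that idea with plain DOMXPath. The helper first_match and the sample pages are invented, and it is only an inference that the XPath lists with 'exclusive' => 1 in the example rules behave this way.

```php
<?php
// Illustration only: try several XPath expressions in order and keep the
// first one that yields text. first_match() is a made-up helper.
function first_match($html, $expressions) {
    $dom = new DOMDocument();
    // Old markup is rarely valid, so silence the parser warnings.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    foreach ($expressions as $expression) {
        $nodes = $xpath->query($expression);
        if ($nodes !== false && $nodes->length > 0) {
            $text = trim($nodes->item(0)->textContent);
            if ($text !== '') {
                return $text;
            }
        }
    }
    return null; // nothing matched on this page
}

// Most specific pattern first, loosest fallback last.
$titleRules = array(
    '//h1[@class="Title"]',
    '//span[@class="title"]/strong',
    '//h3[1]',
);

$tidyPage  = '<html><body><h1 class="Title">My page</h1><p>Text</p></body></html>';
$messyPage = '<html><body><span class="title"><strong>Old page</strong></span></body></html>';

$tidyTitle  = first_match($tidyPage, $titleRules);
$messyTitle = first_match($messyPage, $titleRules);
```

Running something like this over a handful of downloaded pages is a cheap way to test XPaths before putting them into set_rules.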

A couple of tips: If you define an XPath that matches more than one node, StaticImporter will match all of those nodes, and concatenate all their content together. So maybe you have a table with a bunch of content, one paragraph per TD element. So you could write an xpath like this: //table/tr[@class = "content"]/td, and end up with all your content in one field. This content then gets dropped into your Content field on the newly created SiteTree elements.
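
To see what "match all of those nodes and concatenate" means in practice, here is a small standalone sketch using DOMXPath rather than StaticImporter itself. The table markup is invented for the example; only the XPath is the one from the tip above.

```php
<?php
// Illustration of concatenating the content of every matching node.
$html = '<html><body><table>'
      . '<tr class="content"><td><p>First paragraph.</p></td></tr>'
      . '<tr class="content"><td><p>Second paragraph.</p></td></tr>'
      . '<tr class="footer"><td><p>Copyright notice.</p></td></tr>'
      . '</table></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$content = '';
foreach ($xpath->query('//table/tr[@class = "content"]/td') as $cell) {
    // Append the inner markup of each matching cell, dropping the td itself.
    foreach ($cell->childNodes as $child) {
        $content .= $dom->saveHTML($child);
    }
}
echo $content, "\n";
```

Both "content" rows end up in one string, while the footer row is left out because its class doesn't match.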

StaticImporter also attempts to create the correct hierarchy based on your site's folders. But with WordPress, you don't always see content in folders. In these situations, you can look for some type of breadcrumbs, which a lot of sites incorporate, and drop these items into a special field called "Hierarchy" -- this then gets processed and a folder structure is created.
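
As an illustration of reading breadcrumbs, the sketch below pulls the link targets out of a breadcrumb trail with DOMXPath. The markup and the "breadcrumbs" class are invented for the example. On a real site you would put an XPath like this into the 'xpath' list of the Hierarchy field, as in the example rules, and let StaticImporter do the processing.

```php
<?php
// Illustration only: recover a page's ancestors from its breadcrumb links.
$html = '<html><body><div class="breadcrumbs">'
      . '<a href="/europe/">Europe</a> &raquo; '
      . '<a href="/europe/france/">France</a> &raquo; '
      . '<span>Destination page</span>'
      . '</div></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$ancestors = array();
foreach ($xpath->query('//div[@class="breadcrumbs"]/a/@href') as $href) {
    // Each href attribute is one level of the hierarchy, top level first.
    $ancestors[] = trim($href->nodeValue, '/');
}
```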

Fields to process

Once you've defined your fields, they'll appear in tickboxes in step 2. You can choose to process just one at a time, to concentrate on the rules you've created. If you don't tick them, they'll be skipped.

Limit Pages

Next you've got a couple of boxes in a section called Limit Pages. This is to target particular slices of the import, because running staticimporter on 1000 pages takes a while, and often you want to just tweak a rule and run it again and again. So, if you type 1 into the Start field and 5 into the Count field, you'll only have to wait for pages 1-5 to be processed. Same goes for if you know there's something funny about the pages around number 350, and you want to see what effect tweaking the rules for those pages has, without waiting for pages 1 to 349, which you know are fine. So you type 350 into the Start field, and maybe you want to see 20 pages in that area, so you type 20 into count.
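
In code terms, Start and Count amount to taking a slice of the file list. A minimal sketch, assuming Start is 1-based (which is what "pages 1-5" suggests); limit_pages and the file names are made up for the example.

```php
<?php
// Sketch of the Start/Count idea using array_slice().
function limit_pages($files, $start, $count) {
    // Assumption: Start is 1-based, so page 1 is the first file in the list.
    return array_slice($files, $start - 1, $count);
}

// Stand-in for the list of downloaded files.
$files = array();
for ($i = 1; $i <= 1000; $i++) {
    $files[] = "page-$i.html";
}

$firstFive = limit_pages($files, 1, 5);    // page-1.html to page-5.html
$around350 = limit_pages($files, 350, 20); // page-350.html to page-369.html
```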

Actions to perform

-- Show a list of files which will be processed: this is a good way of getting your numbers for the Limit Pages section. It doesn't do any processing so it pops up nice and quickly.
-- Preview the content that will be extracted: rather than creating SiteTree objects, maybe you just want to see if your rules are getting any results at all, or make sure they're getting all the content you think they're getting. This option prints out what your rules picked up out of the HTML being processed.
-- Create pages: once you're all set with your rules, this will actually run through and create pages in the database.

Step 3 is a bit more self-explanatory, and will attempt to rewrite the locations of your files and links. Well, a little bit more self-explanatory anyway. It will do a good job of your imported images, though -- in the majority of cases, you can just leave these in the same place and the relative links will still work. Let me know how step 2 goes and we'll come to step 3 in the fullness of time. I'll continue to re-work and add to this post, and eventually create documentation for this whole module.

One thing which can be a bit tricky is getting all the pages into the right place in the hierarchy; the good news is there's a separate tool, SiteReorganiser, which basically takes a list of Legacy URLs (the URL of the original page on your site, which is saved with each newly created page so you can see where everything came from), and maps them to SilverStripe URL Segments to use as parents.

To wrap this one up, though: content importing is a difficult and lengthy procedure, but the good news is that you can get most of your existing content, and finish it up by hand if necessary.

One thing the original poster asked about was the matter of tags on pages. I see two possible solutions -- either you could get some custom SQL written which would spit out a list of permalinks mapped to tags, which you could then import in SilverStripe, since your permalinks map to SilverStripe LegacyURLs; otherwise, you could modify your WordPress template to include all the tags in a nice easy-to-target hidden div, and then import them to a text field on your new SiteTree objects and subclass StaticImporter to take these and store them in SilverStripe data objects. There's a handy function you can overload which post-processes all your pages once StaticImporter has done its job.
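
For the hidden-div option, the extraction side could look something like the sketch below. Everything specific in it is hypothetical: the id "import-tags", the comma-separated format and the sample page are assumptions about how you might change your WordPress template, not anything StaticImporter prescribes.

```php
<?php
// Hypothetical markup: a WordPress template changed to print the page's
// tags into a hidden div. The id "import-tags" is invented for this sketch.
$html = '<html><body><div id="content"><p>Spa hotel in Provence.</p></div>'
      . '<div id="import-tags" style="display:none">spa, golf, family</div>'
      . '</body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$tags = array();
$nodes = $xpath->query('//div[@id="import-tags"]');
if ($nodes->length > 0) {
    // Split the comma-separated text and drop surrounding whitespace.
    foreach (explode(',', $nodes->item(0)->textContent) as $tag) {
        $tag = trim($tag);
        if ($tag !== '') {
            $tags[] = $tag;
        }
    }
}
```

The resulting list is what you would then turn into SilverStripe data objects in the post-processing step.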

So unfortunately for the original poster, there is a bit of tweaking involved to do something like this. But maybe you can find someone to help you out with some of the PHP stuff, or even jump in and have a go yourself.

Best of luck!

Jeremy

robinp

Community Member, 33 Posts

30 August 2009 at 1:51pm

Hi Jeremy,

Thank you for that great, detailed post. That was exactly the sort of thing I was after :-) I was having trouble figuring out from the code how the set_rules arrays worked.

Also, for anyone else thinking about working with Static Importer: I found I had more success on my local Mac OS X machine than on a shared server. I've actually used some of the wget code from the static importer in some code to build offline versions of sites.

Thank you again.

Cheers

Robin

Radon

Community Member, 11 Posts

3 September 2009 at 3:06am

Edited: 03/09/2009 3:09am

It might be worth mentioning that when you need to reorganize the website content and need more control over the HTML, you should preferably subclass StaticImporter and implement the customisePages method. For example, this removes the first paragraph in an article div (in my special case it contains a publish date, which I prefer to store as its own field in the db):

class MyStaticImporter extends StaticImporter {
    function customisePages(&$page) {
        // Parse the imported content. loadXML() needs well-formed XML
        // with a single root element.
        $dom = new DOMDocument();
        $dom->loadXML($page->Content);
        $path = new DOMXPath($dom);
        // The first paragraph of the article div holds the publish date.
        $list = $path->query('//div[@id="article"]/p[position()=1]');
        if($list->length > 0) {
            $node = $list->item(0);
            // Remove the node via its own parent, so this also works when
            // the article div is not the root element.
            $node->parentNode->removeChild($node);
            $page->Content = $dom->saveXML($dom->documentElement);
        }
    }
}
