StaticSiteImporter previews all HTML code


suntrop

Community Member, 131 Posts

30 July 2010 at 7:20pm

Hi there.
I want to turn a static website (about 150 pages) into an SS site. I installed SS to test the StaticSiteImporter (latest trunk) and it runs, but I don't know if it behaves as it should.

When I check "Preview the content that will be extracted" I get many pages listed with the extracted content.
But the content looks like:

<?xml version="1.0" encoding="utf-8"?>
<!-- ra -->
<!DOCTYPE html ....
all HTML code

} catch(err) {}
//]]>
</script><!-- InstanceEnd -->
</body>
</html>

As I understand this feature, it should only show me the extracted content, right?

This is what I defined in mysite/_config.php:

StaticImporter::set_url("www.example.com/");
StaticImporter::set_allowed_extensions(array('php','html','jpg','pdf'));
StaticImporter::set_rules(
      array(
         // Default rules for all other URLs
         'conditions' => array(),
         'fields' => array(
            'Title' => array(
               'xpath' => array(
                  '//h1'
               ),
               'exclusive' => 1
            ),
            'Hierarchy' => array(
               'xpath' => array(
                  '//h2[contains(@class, "location")]/a/@href',
               ),
               'exclusive' => 1
            ),
            'Content' => array(
               'xpath' => '//div[contains(@id, "content")]',
               'includeMatchedTag' => 0
            )
         ),
         'exclusive' => 1
      )
   );

Targeted website: http://bit.ly/b5yrHV

Is the XPath wrong, or is something else stopping the import from extracting only the content?

suntrop

Community Member, 131 Posts

3 August 2010 at 7:59am

After consulting various forums, it seems the XPath (all 50 ;)) isn't the problem.

What could be the problem? Why do I get the whole DOM displayed in the very, very tiny textarea?
Which _config.php do I have to put the StaticImporter::set_rules call in?

I'd appreciate any help I can get with this!

suntrop

Community Member, 131 Posts

12 August 2010 at 6:28am

I made some changes, reverted them, made others, but nothing works.
So I clicked to insert everything into the database, but unfortunately that doesn't work either.
After one page it exits with a "website error" message. The site tree has a new item ImportedFiles>Folder>Filename, but its content is empty. The page name is almost correct.

Bambii7

Community Member, 254 Posts

18 August 2010 at 3:14pm

Hmmmm, I haven't used it before, though I've read about it. For 150 pages I'd sit down for a few hours and copy and paste them all, if only to guarantee the formatting of the pages.
I think you'll need to target the content div or container. From memory there was a method of doing this. Otherwise it has no way of knowing what is content and what is a sidebar or menu element.
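
One way to check whether the XPath expressions really pick out the content div, independent of the importer, is to run them through PHP's built-in DOMDocument and DOMXPath classes. This is only a rough sketch: the sample markup is a made-up stand-in for one page of the static site, and it says nothing about how the StaticSiteImporter evaluates its rules internally.

```php
<?php
// Hypothetical sample markup standing in for one page of the static site.
$html = <<<HTML
<html><body>
  <h1>Page title</h1>
  <h2 class="location"><a href="/products/">Products</a></h2>
  <div id="sidebar">Menu</div>
  <div id="content"><p>Body text</p></div>
</body></html>
HTML;

// Collect parser warnings internally; real-world HTML is rarely clean.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The same expressions as in the set_rules() call from the first post.
$expressions = array(
    'Title'     => '//h1',
    'Hierarchy' => '//h2[contains(@class, "location")]/a/@href',
    'Content'   => '//div[contains(@id, "content")]',
);

// Print how many nodes each expression matches and their text values.
foreach ($expressions as $field => $expression) {
    $nodes = $xpath->query($expression);
    echo $field . ': ' . $nodes->length . " match(es)\n";
    foreach ($nodes as $node) {
        echo '  ' . trim($node->textContent) . "\n";
    }
}
```

To try it against the real site, the sample string could be swapped for the saved HTML of one page. If every expression matches exactly the nodes you expect there, the XPath is probably fine and the problem is more likely in how the rules are configured or applied. Note that contains(@id, "content") also matches any other div whose id merely contains the word "content".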