-
StaticSiteImporter previews all HTML code

30 July 2010 at 7:20pm
Hi there.
I want to turn a static website (about 150 pages) into an SS site. I installed SS to test the StaticSiteImporter (latest trunk) and it runs, but I don't know if it behaves as it should. When I check "Preview the content that will be extracted" I get many pages listed with the extracted content.
But the content looks like:
<?xml version="1.0" encoding="utf-8"?>
<!-- ra -->
<!DOCTYPE html ....
all HTML code
} catch(err) {}
//]]>
</script><!-- InstanceEnd -->
</body>
</html>
As I understand this feature, it should only show me the grabbed content, right?
This is what I defined in mysite/_config.php:
StaticImporter::set_url("www.example.com/");
StaticImporter::set_allowed_extensions(array('php','html','jpg','pdf'));
StaticImporter::set_rules(
    array(
        // Default rules for all other URLs
        'conditions' => array(),
        'fields' => array(
            'Title' => array(
                'xpath' => array(
                    '//h1'
                ),
                'exclusive' => 1
            ),
            'Hierarchy' => array(
                'xpath' => array(
                    '//h2[contains(@class, "location")]/a/@href',
                ),
                'exclusive' => 1
            ),
            'Content' => array(
                'xpath' => '//div[contains(@id, "content")]',
                'includeMatchedTag' => 0
            )
        ),
        'exclusive' => 1
    )
);
Targeted website: http://bit.ly/b5yrHV
Is the XPath wrong or can I import?
-
Re: StaticSiteImporter previews all HTML code

3 August 2010 at 7:59am
After consulting various forums, it seems the XPath (all 50 ;)) isn't the problem.
What else could it be? Why do I get the whole DOM displayed in the very, very tiny textarea?
In which _config.php do I have to put the StaticImporter::set_rules call?
I'd appreciate any help I can get with this!
-
Re: StaticSiteImporter previews all HTML code

12 August 2010 at 6:28am
I made some changes, reverted them, made others, but nothing works.
So I clicked to import everything into the database, but unfortunately that doesn't work either.
After one page it exits with a "website error" message. The site tree has a new item ImportedFiles>Folder>Filename, but its content is empty. The page name is almost correct.
-
Re: StaticSiteImporter previews all HTML code

18 August 2010 at 3:14pm
Hmmmm, I haven't used it before, though I have read about it. For 150 pages I'd sit down for a few hours and copy and paste them all, if only to guarantee the formatting of the pages.
I think you'll need to target the content div or container. From memory there was a method of doing this. Otherwise it has no way of knowing what is content and what is a sidebar or menu element.
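To rule out the XPath itself, it might help to run the same expressions against a page outside the importer. Below is a minimal sketch, assuming PHP's DOM extension is available. The sample markup and all variable names are made up for illustration, and this is not how StaticImporter does its extraction internally. Stripping the outer tag is only my guess at what 'includeMatchedTag' => 0 is meant to do.

```php
<?php
// Hypothetical sample markup standing in for one page of the static site.
$html = <<<HTML
<html><body>
<h1>Page title</h1>
<div id="nav"><a href="/">Home</a></div>
<div id="content"><p>Main text.</p></div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them;
// real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The same expressions as in the rules from the first post.
$titleNodes   = $xpath->query('//h1');
$contentNodes = $xpath->query('//div[contains(@id, "content")]');

// Text of the first matching heading, or an empty string if none matched.
$title = $titleNodes->length > 0
    ? trim($titleNodes->item(0)->textContent)
    : '';

// Serialise only the children of the first matched div,
// so the wrapping <div> tag itself is left out (assumed behaviour, see above).
$content = '';
if ($contentNodes->length > 0) {
    foreach ($contentNodes->item(0)->childNodes as $child) {
        $content .= $doc->saveHTML($child);
    }
}

echo "Title: $title\n";
echo "Content: $content\n";
```

If this prints only the inner markup of the content div for a real page from the site, the expressions are probably fine and the problem lies in how the rules are passed to or picked up by the importer.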


