I had a few issues doing a StaticImport (it's never going to be a simple process), so I thought I would note them down here for posterity in case anyone else runs into them. If anyone else is playing with StaticImporter, it might be a good idea for you to do the same.
Reasons for problems:
- The command-line sections of the PHP file, as configured, cannot run if the directory path has spaces in it (e.g. the default locations for WebMatrix).
- On a Windows system, calling 'echo' on tidy only returns the first line back to $content.
- The Windows command line craps out on long input anyway (i.e. piping in all the HTML in my files).
- Windows directory separators go the *other way*.
- I had to install tidy and wget and add them to the system 'PATH' environment variable to get these to run (fairly straightforward for an admin like me, but you've lost most of the MS audience right there!).
What I changed to get it running:
- Added quotation marks around all file path references.
- Saved scrubbed files back to their original location: I made tidy grab the original files and save its changes back to those files, then pulled the files back in as the filtered content (v. hacky...).
- Added a LegacyURL field to 'has_one' in Page. The requirement to do this should probably be documented somewhere?
- Reversed the direction of the directory separators in a few places.
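For anyone attempting similar fixes, here is a rough sketch of how the workarounds above could look in PHP. It is not the module's code and not the modifications described in this post: the function names (`normalise_path`, `build_tidy_command`, `tidy_via_temp_file`) are made up for illustration, it uses temporary files rather than overwriting the originals, and it assumes tidy is on the PATH.

```php
<?php
// Hypothetical helpers for illustration only; not part of StaticImporter.

/**
 * Convert both kinds of slash to a single separator
 * (the current OS's separator unless another is given).
 */
function normalise_path(string $path, string $separator = DIRECTORY_SEPARATOR): string
{
    return str_replace(['/', '\\'], $separator, $path);
}

/**
 * Build a tidy command that reads from a file rather than from `echo`.
 * escapeshellarg() wraps each path in quotes, so directories with
 * spaces in their names do not split the argument.
 */
function build_tidy_command(string $inputFile, string $outputFile): string
{
    return sprintf(
        'tidy -asxhtml -n -q -o %s %s',
        escapeshellarg($outputFile),
        escapeshellarg($inputFile)
    );
}

/**
 * Clean a chunk of HTML via temporary files, so no long input is ever
 * piped through the command line. Returns null if tidy gave no output.
 */
function tidy_via_temp_file(string $html): ?string
{
    $in  = tempnam(sys_get_temp_dir(), 'imp');
    $out = tempnam(sys_get_temp_dir(), 'imp');
    file_put_contents($in, $html);

    $lines = [];
    exec(build_tidy_command($in, $out), $lines, $exitCode);

    // tidy exit codes: 0 = clean, 1 = warnings, 2 = errors.
    $result = ($exitCode < 2) ? file_get_contents($out) : null;

    unlink($in);
    unlink($out);

    return ($result === false || $result === '') ? null : $result;
}
```

Reading the result back from a file sidesteps both the "first line only" problem and the command-line length limit, because the HTML never travels through the shell at all.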
Still in progress... My code modifications are extremely hacky, so I'm not releasing them - I'm on a tight deadline and just doing one site.
If I can get this working, it will be well worth it to import a fairly complex 500 page site, but I thought people should know that there are problems here... I was going to build this myself so I'm glad Ingo's done most of the hard work for me :)
I will edit this post to keep you updated.
All in all I'm very impressed with the Importer. I kind of wish I could use CSS selectors (à la jQuery) rather than XPath selectors, but you can't have everything :)
Edit 1 - general comments: Having played with this importer my other comment would be that if you're migrating from CMS to CMS, try to do it on a database level. Someone experienced with databases should be able to help you here. It would also make logical sense to try the CMIS connector. I'm importing 500ish flat files, so this is my only option. I would say if you had under 200 files, you wouldn't see any major benefits in using this tool (unless you're trying to get some experience). If you have complex HTML, you will also not see any benefit from the tool - try running `tidy -asxhtml -nmq` on your site and see how it survives...
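To get a feel for how a whole site "survives" before committing to an import, something like the following sketch could help. It is only an illustration under assumptions: the function names (`find_html_files`, `tidy_dry_run`) are hypothetical, tidy must be on the PATH, and it uses tidy's `-e` (errors only) option so that nothing is modified. Note that the `-m` in `-nmq` rewrites files in place, so try that on a copy.

```php
<?php
// Hypothetical helpers for illustration only; not part of StaticImporter.

/**
 * Recursively collect every .htm / .html file under a directory.
 */
function find_html_files(string $dir): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && preg_match('/\.html?$/i', $file->getFilename())) {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

/**
 * Run tidy in report-only mode on each file and return the ones it
 * could not handle, keyed by path, with tidy's messages as the value.
 */
function tidy_dry_run(array $files): array
{
    $failed = [];
    foreach ($files as $file) {
        $lines = [];
        exec('tidy -q -e ' . escapeshellarg($file) . ' 2>&1', $lines, $exitCode);
        // Exit code 2 means tidy hit errors; anything higher usually
        // means the command itself could not be run.
        if ($exitCode >= 2) {
            $failed[$file] = $lines;
        }
    }
    return $failed;
}
```

Calling `tidy_dry_run(find_html_files($siteDir))` with your own `$siteDir` gives a list of the pages tidy chokes on. If that list is a large share of the site, the content import is probably not going to be worth it.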
Edit 2 - My overall recommendation would be to avoid importing the content itself, but to use the XPath selectors to import all the other fields.
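As a rough idea of what pulling "the other fields" out with XPath looks like, here is a small standalone sketch using PHP's built-in DOM extension. It does not use the Importer's own API: `extract_fields`, the field names and the sample HTML are all invented for the example.

```php
<?php
// Hypothetical helper for illustration only; not part of StaticImporter.

/**
 * Run a set of XPath expressions against an HTML string and return the
 * text of the first match for each one (or null if nothing matched).
 */
function extract_fields(string $html, array $selectors): array
{
    $doc = new DOMDocument();

    // Legacy HTML is rarely valid, so keep libxml's warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($doc);
    $fields = [];
    foreach ($selectors as $name => $expression) {
        $nodes = $xpath->query($expression);
        $fields[$name] = ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->textContent)
            : null;
    }
    return $fields;
}

// Made-up sample page and field mapping.
$html = '<html><head><title>About us</title>'
      . '<meta name="description" content="Who we are"></head>'
      . '<body><h1>About</h1></body></html>';

$fields = extract_fields($html, [
    'Title'           => '//title',
    'MetaDescription' => '//meta[@name="description"]/@content',
    'Missing'         => '//div[@id="does-not-exist"]',
]);
```

Short fields like titles and meta descriptions tend to come through this kind of extraction cleanly, which is why they are a safer bet than the page body.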