General Questions

General questions about getting started with SilverStripe that don't fit in any of the categories above.

Moderators: martimiz, Sean, biapar, Willr, Ingo, swaiba, simon_w

Search inside PDF


6 Posts   1954 Views

karlevid

15 January 2010 at 9:15pm (Last edited: 16 January 2010 5:19pm), Community Member, 10 Posts

Is this possible with the SS Search?

banal

16 January 2010 at 2:15am (Last edited: 16 January 2010 5:19pm), Community Member, 901 Posts

I don't think it's possible.
You could build a SilverStripe wrapper around the Zend Lucene engine (http://framework.zend.com/manual/en/zend.search.lucene.html), as described in this blog post: http://www.kapustabrothers.com/2008/01/20/indexing-pdf-documents-with-zend_search_lucene/

This could get tricky though :)
I also think you're in the wrong forum with that question.

MateuszU

18 January 2010 at 4:11pm Community Member, 89 Posts

Also, I think you can fairly easily extract the PDF contents in onAfterWrite using `pdftotext` and chuck the text into the database as Content so the search can pick it up. This works pretty well.
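
For anyone who wants to see the extraction step on its own, here is a minimal sketch in Python rather than SilverStripe PHP. It only shows shelling out to `pdftotext` and capturing the text. The function names `build_pdftotext_command` and `extract_pdf_text` are made up for illustration, and it assumes the `pdftotext` binary (from Poppler or Xpdf) is on the PATH. In SilverStripe you would do the equivalent from onAfterWrite and save the result into Content.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_pdftotext_command(pdf_path: str) -> List[str]:
    """Build the pdftotext argument list (hypothetical helper).

    "-enc UTF-8" asks for UTF-8 output; the trailing "-" tells
    pdftotext to write the text to stdout instead of a .txt file.
    """
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"]


def extract_pdf_text(pdf_path: str, timeout: float = 30.0) -> Optional[str]:
    """Return the text of a PDF, or None if it cannot be extracted.

    None is returned when the file does not exist, when pdftotext is
    not installed, when it times out, or when it exits with an error.
    """
    if not Path(pdf_path).is_file():
        return None
    if shutil.which("pdftotext") is None:
        return None
    try:
        result = subprocess.run(
            build_pdftotext_command(pdf_path),
            capture_output=True,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    # Replace undecodable bytes rather than failing on odd PDFs.
    return result.stdout.decode("utf-8", errors="replace").strip()


if __name__ == "__main__":
    # "example.pdf" is a placeholder path, not a file from this thread.
    text = extract_pdf_text("example.pdf")
    print("no text extracted" if text is None else text[:200])
```

Whatever comes back from `extract_pdf_text` is what you would store in the searchable field. Returning None on failure means a broken or scanned PDF does not stop the save, it just stays unsearchable.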

Ingo

25 January 2010 at 12:50pm Forum Moderator, 801 Posts

For a more robust and performant solution (rather than MySQL fulltext), have a look at the 'sphinx' module:
http://open.silverstripe.org/browser/modules/sphinx/trunk

It's a fairly new module, and requires you to use SilverStripe 2.4 alpha1, but the underlying technology (sphinx search) is quite stable.
This changeset (committed a couple of days ago) explains how to work with PDFs in sphinx:
http://open.silverstripe.org/changeset/97360/modules/sphinx/trunk

BTW, it uses pdftotext as well :)

banal

25 January 2010 at 8:32pm Community Member, 901 Posts

Hey Ingo, that sounds really neat.
The sphinx binaries have to be installed on the server to use this, right?

MateuszU

26 January 2010 at 9:28am Community Member, 89 Posts

Yeah, you need to install that on the server and run it as a daemon. You probably won't be able to run it on shared hosting. But it's worth the effort, very quick :)