Search inside PDF


6 Posts   2001 Views

karlevid

Community Member, 10 Posts

15 January 2010 at 9:15pm

Edited: 16/01/2010 5:19pm

Is it possible to search inside PDF files with the SS Search?

banal

Community Member, 901 Posts

16 January 2010 at 2:15am

Edited: 16/01/2010 5:19pm

I don't think it's possible.
You could build an SS wrapper around the Zend Lucene engine (http://framework.zend.com/manual/en/zend.search.lucene.html), as described in this blog post: http://www.kapustabrothers.com/2008/01/20/indexing-pdf-documents-with-zend_search_lucene/

This could get tricky though :)
I also think you're in the wrong forum with that question.

MateuszU

Community Member, 89 Posts

18 January 2010 at 4:11pm

Also, I think you can relatively easily extract the PDF contents in onAfterWrite using `pdftotext` and chuck the text into the database as Content so the search can pick it up. This works pretty well.
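
For illustration, the extraction step might look roughly like the sketch below. It is written in Python only to keep the example short and self-contained; in SilverStripe the equivalent would live in PHP inside the onAfterWrite hook. The function names (`build_pdftotext_command`, `normalize_text`, `extract_pdf_text`) are made up for this example, and it assumes the `pdftotext` binary is installed and on the PATH.

```python
import re
import subprocess


def build_pdftotext_command(pdf_path):
    # Passing "-" as the output file makes pdftotext write the text to stdout.
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def normalize_text(raw):
    # Collapse whitespace runs (newlines, form feeds between pages) into
    # single spaces so the text is compact enough to store in one field.
    return re.sub(r"\s+", " ", raw).strip()


def extract_pdf_text(pdf_path):
    # Hypothetical helper: run pdftotext and return the cleaned-up text.
    # check=True raises CalledProcessError if pdftotext fails.
    result = subprocess.run(
        build_pdftotext_command(pdf_path),
        capture_output=True,
        check=True,
    )
    return normalize_text(result.stdout.decode("utf-8", errors="replace"))
```

The returned string is what you would then save into the searchable Content field. Scanned, image-only PDFs will come back empty, since `pdftotext` does no OCR.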

Ingo

Forum Moderator, 801 Posts

25 January 2010 at 12:50pm

For a more robust and performant solution (rather than MySQL fulltext), have a look at the 'sphinx' module:
http://open.silverstripe.org/browser/modules/sphinx/trunk

It's a fairly new module, and requires you to use SilverStripe 2.4 alpha1, but the underlying technology (sphinx search) is quite stable.
This changeset (committed a couple of days ago) explains how to work with PDFs in sphinx:
http://open.silverstripe.org/changeset/97360/modules/sphinx/trunk

BTW, it uses pdftotext as well :)

banal

Community Member, 901 Posts

25 January 2010 at 8:32pm

Hey Ingo, that sounds really neat.
The sphinx binaries have to be installed on the server to use this, right?

MateuszU

Community Member, 89 Posts

26 January 2010 at 9:28am

Yeah, you need to install that on the server and run it as a daemon. You probably won't be able to run it on shared hosting. But it's worth the effort, very quick :)