-
Lucene module - search including PDF, Word and Excel

10 November 2010 at 10:46am
Hi all,
I've implemented a new Search module for SilverStripe, hope it helps someone out!
It uses Zend_Search_Lucene, which gives more relevant results than the standard MySQL stuff, and supports indexing Microsoft Word and Excel documents. I've used the StandardAnalyzer component by Kenny Katzgrau to tokenise/stem English words.
It can also index PDFs, using the pdftotext command-line utility, or falling back to the PDF2Text class by Joeri Stegeman if pdftotext is not installed.
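The "use pdftotext if it is installed, otherwise fall back" behaviour described above could look roughly like the sketch below. It is not the module's actual code: `findPdftotext()` and `extractPdfText()` are hypothetical names, the fallback is passed in as a plain callable (for example a wrapper around PDF2Text), and the lookup assumes a Unix-like host with a POSIX shell.

```php
<?php
// Hypothetical helpers, not taken from the module.

/** Return the path to the pdftotext binary, or null if it is not installed. */
function findPdftotext(): ?string
{
    // `command -v` is a POSIX shell builtin, so this assumes a Unix-like server.
    $path = trim((string) shell_exec('command -v pdftotext 2>/dev/null'));
    return $path !== '' ? $path : null;
}

/**
 * Extract the text of a PDF file.
 * $fallback is any callable that takes a filename and returns text.
 */
function extractPdfText(string $file, callable $fallback): string
{
    $bin = findPdftotext();
    if ($bin === null) {
        // pdftotext is not available: use the pure-PHP route.
        return $fallback($file);
    }
    // Passing "-" as the output file makes pdftotext write to stdout.
    $cmd = escapeshellarg($bin) . ' ' . escapeshellarg($file) . ' - 2>/dev/null';
    $text = shell_exec($cmd);
    // Also fall back if the command failed or produced no output.
    if (!is_string($text) || $text === '') {
        return $fallback($file);
    }
    return $text;
}
```

Passing the fallback in as a callable keeps the sketch independent of any particular PDF library.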
http://code.google.com/p/lucene-silverstripe-plugin/
Please leave your comments for improvement.
Cheers!
-
Re: Lucene module - search including PDF, Word and Excel

10 November 2010 at 11:42am
Wow this looks impressive
I look forward to trying it out in my next project.
-
Re: Lucene module - search including PDF, Word and Excel

10 November 2010 at 4:16pm
Make sure you have submitted it to the modules section here on silverstripe.org so that people know it's around!
-
Re: Lucene module - search including PDF, Word and Excel

17 November 2010 at 10:01pm
So this doesn't have any server dependencies like the Sphinx module?
From the looks of it, it's really easy to set up, use and extend.
I think the dev team should also look into this as the current search mechanism is a sodding nightmare!
-
Re: Lucene module - search including PDF, Word and Excel

18 November 2010 at 3:01pm
It doesn't depend on anything - everything you need to start indexing is in the module folder. =]
If you have 'pdftotext' installed on your server, it will take advantage of that for better PDF indexing, though - and, one caveat, if you have Zend in your include_path, it may interfere with the Zend stuff that's included.
-
Re: Lucene module - search including PDF, Word and Excel

18 November 2010 at 9:37pm
this looks really promising.
i agree with Mad_Clog that the search functionality that ships with silverstripe is very restricted; also i didn't find the sphinx module overly useful due to the mentioned server dependencies.
however, the thing that was bugging me most with silverstripe's search method is that it won't look into related fields - from what i've seen this is a feature you've got on your list for future enhancements. would be really happy to hear about it if you can manage to integrate this feature. thumbs up!
-
Re: Lucene module - search including PDF, Word and Excel

19 November 2010 at 3:35am
Integrated this into a site last night, had it up and running within 30 minutes.
It wasn't playing nice with the ZendFramework reference which was already in my include path.
In the set_include_path call (in _config.php) I stripped my original ZF path out of the get_include_path result that is used there.
Thanks for this module and keep it up!
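For anyone hitting the same clash, the workaround described in this post might look something like the following sketch. The directory in `$existingZf` and the helper name `stripIncludePath()` are made up for illustration; the module's own `set_include_path` line in `_config.php` may be structured differently.

```php
<?php
// Hypothetical location of the Zend Framework copy that was already
// on the include path; replace it with the real directory.
$existingZf = '/usr/share/php/ZendFramework/library';

/** Remove one directory from an include_path string. */
function stripIncludePath(string $includePath, string $remove): string
{
    $keep = array();
    foreach (explode(PATH_SEPARATOR, $includePath) as $entry) {
        // Compare without trailing slashes so "/a/zf" and "/a/zf/" match.
        if (rtrim($entry, '/\\') !== rtrim($remove, '/\\')) {
            $keep[] = $entry;
        }
    }
    return implode(PATH_SEPARATOR, $keep);
}

// Drop the old ZF directory so only the bundled Zend classes are found.
set_include_path(stripIncludePath(get_include_path(), $existingZf));
```

Only the named directory is removed; every other entry keeps its original order.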
-
Re: Lucene module - search including PDF, Word and Excel

19 November 2010 at 9:36am
I really should do some work so that it will use your own Zend if you've included it already... paid work is coming first at the moment!
I haven't decided on a way to index relationships just yet. For configuration, I think using dot-notation like other SS relation configs would be a good idea, e.g. for a has-one you could use Image.Title, and for a has-many or many-many you could use Images.Title.
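To make the dot-notation idea concrete, here is one way such a path could be resolved. This is purely illustrative, since nothing has been decided: records are plain PHP arrays standing in for SilverStripe objects, and `isRecordList()` and `resolveRelationPath()` are hypothetical names, not part of the module.

```php
<?php
/** True if the array is a plain list (keys 0..n-1), i.e. a has-many style set. */
function isRecordList(array $a): bool
{
    return $a === array() || array_keys($a) === range(0, count($a) - 1);
}

/**
 * Collect every value that a dot-notation path points at.
 * A has-one is modelled as a nested record, a has-many or
 * many-many as a list of records.
 */
function resolveRelationPath(array $record, string $path): array
{
    $current = array($record);
    foreach (explode('.', $path) as $segment) {
        $next = array();
        foreach ($current as $item) {
            if (!is_array($item) || !array_key_exists($segment, $item)) {
                continue; // nothing to follow on this branch
            }
            $value = $item[$segment];
            if (is_array($value) && isRecordList($value)) {
                // has-many / many-many: fan out over each related record
                foreach ($value as $related) {
                    $next[] = $related;
                }
            } else {
                // has-one record or a plain field value
                $next[] = $value;
            }
        }
        $current = $next;
    }
    return $current;
}

$page = array(
    'Title'  => 'Home',
    'Image'  => array('Title' => 'Logo'),
    'Images' => array(array('Title' => 'A'), array('Title' => 'B')),
);
$hasOneTitles  = resolveRelationPath($page, 'Image.Title');  // array('Logo')
$hasManyTitles = resolveRelationPath($page, 'Images.Title'); // array('A', 'B')
```

The collected values could then be joined into a single text field on the indexed document.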
Adding OCR for indexing text contained in images is also on the cards.