Lucene module - search including PDF, Word and Excel

  • schellmax
    Community Member
    126 Posts

    Re: Lucene module - search including PDF, Word and Excel

    hey, just found indexing relations is now possible. http://code.google.com/p/lucene-silverstripe-plugin/source/detail?r=29
    i'll definitely give this a try in my next silverstripe project.
    thanks for all the work!

  • Darren Inwood
    Community Member
    12 Posts

    Re: Lucene module - search including PDF, Word and Excel

    Cheers for your interest!

    I've just released a new version incorporating some ideas from users. It indexes files a LOT quicker, and can index older Word/Excel documents too. If you're playing with it at the moment, I'd suggest grabbing the 0.3.3 release. =]

  • hippo1
    Community Member
    5 Posts

    Re: Lucene module - search including PDF, Word and Excel

    Darren - does version 0.3.3 only work with newer Microsoft Word/Excel documents (i.e. Office 2007)?
    It seems my install of SilverStripe (2.4.4) returns search results from .docx and .xlsx but not from .doc and .xls.

  • Darren Inwood
    Community Member
    12 Posts

    Re: Lucene module - search including PDF, Word and Excel

    Hi hippo1,

    You actually need the 'zip' extension for PHP loaded for scanning xlsx/docx/pptx documents to work. To check if you have this extension, visit a phpinfo() page and see if there is a section under 'Configuration' for 'zip'.

    If you're on a Debian-based host (most hosting vendors use Debian) the zip extension is installed by default; if you're on WAMP/MAMP you may need to go to 'PHP Extensions' and enable the php_zip option.

    Hope this helps!
    --
    Darren
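
The phpinfo() check described in this post can also be done from code. The following is a minimal sketch, not part of the module: `extension_loaded()` and `class_exists()` are PHP built-ins, while the helper name `lucene_has_extension` is made up for illustration. It also assumes that `ZipArchive`, the class the zip extension provides, is what the docx/xlsx/pptx scanning relies on.

```php
<?php
// Hypothetical helper: returns true when the named PHP extension is loaded.
function lucene_has_extension($name) {
    return extension_loaded($name);
}

// The zip extension provides the ZipArchive class, so check for both.
if (lucene_has_extension('zip') && class_exists('ZipArchive')) {
    echo "zip extension loaded: docx/xlsx/pptx scanning should be possible\n";
} else {
    echo "zip extension missing: enable it (e.g. php_zip on WAMP/MAMP)\n";
}
```

Dropping this into a temporary script and loading it in the browser tests the web server's PHP configuration, which can differ from the command-line one.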

  • hippo1
    Community Member
    5 Posts

    Re: Lucene module - search including PDF, Word and Excel

    Thanks Darren, but as explained I can actually scan the documents you mentioned (.docx, .xlsx); however, it won't scan .doc or .xls documents.
    So it seems to me (being rather new to all of this, I can certainly be wrong) that the Lucene 0.3.3 module only searches the newer Office 2007 formats (.docx and .xlsx) but not the older Office 97 formats (.doc and .xls). I was looking at how hard it would be to modify the code to include .doc and .xls documents, but thought I would ask here first whether they should be supported and whether my SilverStripe installation might be at fault.
    BTW - I do have the PHP Zip module installed (verified by phpinfo), and am running SS 2.4.4 on a Windows server with SQL2008 as the backend.

  • Darren Inwood
    Community Member
    12 Posts

    Re: Lucene module - search including PDF, Word and Excel

    Apologies, I'm at work atm so my mind is elsewhere...!

    You need to install the commandline 'catdoc' utility suite to enable scanning doc/xls/ppt documents. This isn't in the documentation, which is a bit of an oversight!

    If you're on Debian/Ubuntu you can do apt-get install catdoc; if you're on another *nix or Mac OS X you can use your own package management system or will possibly need to compile the utilities from source:
    http://www.wagner.pp.ru/~vitus/software/catdoc/

    If you are using Windows then sorry, but you'll need to do some coding in the ZendSearchLuceneWrapper::index() function. There are some lines like this that won't work on Windows:
    $catdoc = trim(shell_exec('which catdoc'));

    You'll need to replace them with either hardcoded file paths to wherever the catdoc utilities are installed, or set up some sort of config system yourself.

    Hopefully I can find time to get this going on Windows soon so I can release the current SVN trunk, where all this is fixed already!

    Hope that's a better answer =]
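
One way to approach the replacement suggested in this post is a small lookup helper that prefers a configured path and only then asks the operating system. This is a sketch under stated assumptions, not the module's code or the fix in SVN trunk: `find_binary` and the example path `C:\catdoc\catdoc.exe` are hypothetical, and it assumes the Windows `where` command is available on the server.

```php
<?php
// Hypothetical helper: locate an executable without relying on the
// Unix-only `which`. $configuredPath is an assumed, site-specific setting.
function find_binary($name, $configuredPath = null) {
    // 1. An explicitly configured path wins if it points at a real file.
    if ($configuredPath !== null && is_file($configuredPath)) {
        return $configuredPath;
    }
    // 2. Otherwise ask the OS: `where` on Windows, `which` elsewhere.
    $isWindows = strtoupper(substr(PHP_OS, 0, 3)) === 'WIN';
    $cmd = ($isWindows ? 'where ' : 'which ') . escapeshellarg($name);
    $out = shell_exec($cmd . ($isWindows ? ' 2>NUL' : ' 2>/dev/null'));
    if (!is_string($out)) {
        return null; // no output, or shell_exec unavailable
    }
    // `where` may print several matches, one per line; take the first.
    $lines = preg_split('/\r\n|\r|\n/', trim($out));
    $first = trim($lines[0]);
    return ($first !== '' && is_file($first)) ? $first : null;
}

// Example path only; point it at wherever catdoc is actually installed.
$catdoc = find_binary('catdoc', 'C:\\catdoc\\catdoc.exe');
echo $catdoc === null ? "catdoc not found\n" : "catdoc at $catdoc\n";
```

Returning null instead of an empty string lets the caller skip .doc/.xls/.ppt files cleanly when the utilities are not installed.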

  • hippo1
    Community Member
    5 Posts

    Re: Lucene module - search including PDF, Word and Excel

    Many thanks for quick response!
    I might have a crack at getting this to work through some coding (Windows), as suggested, if only to become a little better acquainted with SilverStripe.

    Cheers.

  • nicolant
    Community Member
    6 Posts

    Re: Lucene module - search including PDF, Word and Excel

    First, thanks for a great module.

    I'm testing it on shared hosting, which has a limit of 64 open files. As a result, every time I rebuild the index, or try to index a page after that, I get a "Too many files open" error. Is there a way to reduce the number of open files?
