  • tgfisher
    Community Member
    10 Posts

    Solr search and indexing files

    We're running SilverStripe on Windows and have a lot of PDFs, Word documents, PowerPoint presentations, etc. that we NEED to be searchable via SilverStripe. We're looking at the SilverStripe Solr module, but as far as I can tell from the module's issue page (https://github.com/nyeholt/silverstripe-solr/issues/1), it doesn't support indexing those files yet.

    I know Solr has the ability to handle these files out of the box. What needs to be done on the module side to have it work? I would be interested in helping build this out (as it is a pre-requisite for the search functionality of our site). Any feedback would be greatly appreciated ::cough:: Marcus? ::cough::

  • Marcus
    Administrator
    86 Posts

    Re: Solr search and indexing files

    Howdy - there are a couple of ways of approaching it. One is to convert files to a text representation and send that through to be indexed. The other is to use Solr's native file indexing mechanisms, but a) I haven't researched how well they work and b) I'm not sure whether the PHP library I'm using supports file attachments to documents. Unfortunately I don't have any time at the moment to look into things, and probably won't for a month or two at least! If you're able to investigate, I'm happy to answer questions along the way.

  • tgfisher
    Community Member
    10 Posts

    Re: Solr search and indexing files

    Marcus, thanks for the response.

    Which PHP library are you referring to? I haven't dug into the code yet, but if you give me the name, I'll keep an eye out for it.

    We used Plone as our previous CMS, and its approach was similar to your first suggestion. It used "wv", "pdftotext", and some other utilities to extract text from the files, and then indexed that text. It seemed to work well, but I'm not sure what search platform Plone uses.
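For illustration, the convert-to-text approach could look roughly like the sketch below. It assumes the pdftotext command-line tool is installed and on the PATH; the function names are hypothetical and are not part of the Solr module.

```php
<?php
// Sketch only: assumes the pdftotext command-line tool is installed and on the PATH.
// Both function names are hypothetical helpers, not part of the Solr module.

/**
 * Build the shell command that prints a PDF's text to stdout.
 * "-q" suppresses messages; the trailing "-" sends the text to stdout.
 */
function buildPdfToTextCommand($pdfPath) {
    return 'pdftotext -q ' . escapeshellarg($pdfPath) . ' -';
}

/**
 * Run pdftotext and return the extracted text, or null if nothing came back.
 */
function extractPdfText($pdfPath) {
    $output = shell_exec(buildPdfToTextCommand($pdfPath));
    if ($output === null || $output === false || $output === '') {
        return null;
    }
    return $output;
}
```

The extracted text could then be sent to Solr as an ordinary text field. Word and PowerPoint files would need their own converters.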

    Is Apache Tika the native file indexing mechanism you are talking about? I saw that it is built into Solr specifically for "Rich Document Parsing and Indexing (PDF, Word, HTML, etc)".
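As a rough sketch of what the native route might involve: Solr exposes its Tika integration through the ExtractingRequestHandler, which accepts a raw file in the request body. The code below assumes that handler is enabled at /update/extract in solrconfig.xml and that the schema's unique key field is named "id". The base URL, document ID, and function names are illustrative, and this bypasses the Solr PHP client library entirely.

```php
<?php
// Sketch only. Assumptions: the ExtractingRequestHandler is enabled at
// /update/extract, and the schema's unique key field is called "id".
// Function names are hypothetical.

/**
 * Build the extract URL. "literal.id" sets the document's id field directly,
 * and "commit=true" makes the document searchable straight away.
 */
function buildExtractUrl($solrBaseUrl, $documentId) {
    $params = array('literal.id' => $documentId, 'commit' => 'true');
    return rtrim($solrBaseUrl, '/') . '/update/extract?' . http_build_query($params, '', '&');
}

/**
 * POST the raw file to Solr and return the response body (false on failure).
 * Requires PHP's curl extension.
 */
function postFileToSolr($solrBaseUrl, $documentId, $filePath) {
    $ch = curl_init(buildExtractUrl($solrBaseUrl, $documentId));
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, file_get_contents($filePath));
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/octet-stream'));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return curl_exec($ch);
}
```

For example, buildExtractUrl('http://localhost:8983/solr', 'File_42') yields http://localhost:8983/solr/update/extract?literal.id=File_42&commit=true.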

    It may be better to spend the time working with what Solr already provides than to re-invent the wheel with other utilities. I'll dig in and see what I can do on my own. I'm sure I'll be tapping you for help as I go along.

  • Marcus
    Administrator
    86 Posts

    Re: Solr search and indexing files

    I was actually referring to the Solr PHP client library: http://code.google.com/p/solr-php-client/

  • tgfisher
    Community Member
    10 Posts

    Re: Solr search and indexing files

    Marcus, I got the module up and running on Windows, but I'm having a hard time getting Solr to index (and return results for) additional search fields. Here's what I've got in _config.php for my site:

    SiteTree::add_extension('SiteTree', 'MySiteTree');
    DataObject::add_extension('SiteTree', 'SolrIndexable');
    Object::add_extension('Page_Controller', 'SolrSearchExtension');
    DataObject::add_extension('DataObject', 'MyDataObject');

    And here's what I have for the "MySiteTree" decorator:

    class MySiteTree extends SiteTreeDecorator {
       
       function extraStatics($class = null) {
          return array(
             'searchable_fields' => array(
                'Title',
                'Content',
                'MetaDescription',
                'MetaKeywords',
             )
          );
       }
       ...

    I'm fairly new to SilverStripe, so I feel like I may be missing something obvious. Any ideas on how I can get MetaDescription and MetaKeywords integrated into search?

  • Marcus
    Administrator
    86 Posts

    Re: Solr search and indexing files

    I've done a quick update to the module that should help you out, and have updated the doco at https://github.com/nyeholt/silverstripe-solr/wiki/Usage-overview. Basically, you'll need to add the searchable_fields changes via updateSearchableFields in your extension (or just modify the static variable directly). You'll also need to register the fields as default query fields via SolrSearchService::add_default_query_field('MetaDescription_t'); and SolrSearchService::add_default_query_field('MetaKeywords_ms');

    Be aware of the note about the difference between _t and _ms fields; one is tokenised first, the other isn't, meaning you need to explicitly add * for partial matches in your results.
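To illustrate the partial-match point, a small helper along these lines could append * to each search term before the query is sent. This is a sketch only: the function name is hypothetical, it is not part of the module, and it does not escape Solr's special query characters.

```php
<?php
// Sketch only: hypothetical helper, not part of the Solr module.
// It does not escape Solr's special query characters.

/**
 * Append a trailing "*" to every whitespace-separated term so that a
 * non-tokenised field can match on prefixes. Terms already ending in "*"
 * are left unchanged.
 */
function addTrailingWildcards($query) {
    $terms = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    $out = array();
    foreach ($terms as $term) {
        $out[] = (substr($term, -1) === '*') ? $term : $term . '*';
    }
    return implode(' ', $out);
}
```

With this, a search for "silver str" would be sent as "silver* str*".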

  • BlueO
    Community Member
    52 Posts

    Re: Solr search and indexing files

    Hey, I'm also running into a few issues with this and was hoping to get a pointer or two.

    I've got the module set up and it is indexing my pages just fine, but it won't index a DataObject called 'Entry'.
    I've put in

    DataObject::add_extension('Entry', 'SolrIndexable');

    but no joy,

    ideas?

    cheers

    Bernard

  • Marcus
    Administrator
    86 Posts

    Re: Solr search and indexing files

    Are you getting any errors being spat out at all? I've got a project where I'm indexing standalone non-SiteTree data objects and it works fine.
