
Search - what should I actually expect?

  • Junglefish
    Community Member
    104 Posts

    Search - what should I actually expect?

    Hi

    Am currently evaluating an out-of-the-box (more-or-less) installation of SS, version 2.3.3. The search functionality is returning some rather odd results so I wanted to find out a) what I should expect out of the box and b) what can be achieved by tweaking if necessary.

    (Apologies for the slight terseness in the wording below, but I was trying to be as concise as possible while stating what I'm trying to achieve.)

    Here's what's currently happening:

    PageTypes:
    Page
    Calendar
    CalendarEvent
    ArticleHolder
    ArticlePage
    ...all return results. However, any COMMENTS in any of those pagetypes do not get returned.

    Question: Is this what I should expect?

    Question: How can I get the comments included in the search results?

    Binary documents:
    PDFs
    When the PDF is uploaded, it is not immediately picked up by search.
    When I use a re-director pagetype to link directly to a PDF, the PDF text is found in the search results.
    When I use a hyperlink in the text of a normal page to simply point at a PDF, the text is NOT found in the search results.

    Question: Is this what I should expect?

    DOCs
    Random Word Document a)
    As soon as it was uploaded, it was picked up by search, even before I added it to any content.

    Random Word Document b)
    Is NOT picked up by search, even after using a re-director pagetype to link directly to it.

    This seems very odd. Why would the two Word docs be handled so differently?

    Question: At what point should DOCs and PDFs be included in search results? Just after upload? After they've been pulled up into the site in some way? If so, in what way?

    Question: Is there some process I need to undertake in order to add binary documents to the search index? If not, what possible reasons might there be to explain why one word document is being found while the other one isn't?

    All advice gratefully received...

  • Junglefish
    Community Member
    104 Posts

    Re: Search - what should I actually expect?

    More info...

    It appears I am mistaken on a couple of things. My 'inconsistencies' were caused by me searching for a term that was present in the actual filename. So, here's what I can deduce...

    With binary documents, the search functionality returns a result if it finds the search term in the filename, but not in the document itself. This appears to be true for both PDFs and DOCs.

    My questions are therefore now much simpler:

    a) How do I enable full-text search on PDFs and DOCs?
    b) How do I enable search on Comments?

    Thanks,

  • dalesaurus
    Community Member
    283 Posts

    Re: Search - what should I actually expect?

    I am not very familiar with SS's searching capabilities, but I am certain they are based on MySQL's full-text indexing. This will generally not let you do what you are asking above with binary file types, at least not without some hacking.
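To illustrate why only filename matches come back: a full-text index can only match text that was actually put into it. The Python sketch below is a toy inverted index, not SilverStripe's or MySQL's real implementation; the helper names (`tokenize`, `build_index`, `search`), the field names, and the sample record are all hypothetical. It shows that a word occurring only inside a document is not found unless the document's extracted text is indexed as well.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records, fields):
    """Map each word to the set of record ids whose chosen fields contain it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for field in fields:
            for word in tokenize(record.get(field, "")):
                index[word].add(record_id)
    return index


def search(index, term):
    """Return the sorted ids of records containing the first word of `term`."""
    words = tokenize(term)
    if not words:
        return []
    return sorted(index.get(words[0], set()))


# Hypothetical uploaded file: the name is known to the database,
# the body text only exists if something extracts it from the binary.
records = {
    1: {
        "filename": "holiday-policy.pdf",
        "extracted_text": "Staff may carry over five days of leave.",
    },
}

name_only = build_index(records, ["filename"])
with_text = build_index(records, ["filename", "extracted_text"])

print(search(name_only, "policy"))  # word appears in the filename
print(search(name_only, "leave"))   # word appears only inside the document
print(search(with_text, "leave"))   # found once the extracted text is indexed
```

In other words, searching binaries comes down to getting their text extracted and stored somewhere the index can see it.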

    If you are trying to search rich document types I would recommend using a software package that is designed to do just that instead of a CMS.

    Apache Solr would be a good standalone product that you can integrate into your SS pages via its built-in APIs:
    http://lucene.apache.org/solr/
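As a rough idea of what talking to Solr looks like: it is queried over HTTP through its select handler, using parameters such as `q`, `rows` and `wt`. The Python sketch below only builds the query URL and reads the `response.docs` list out of a decoded JSON reply. The base URL and the helper names are assumptions for illustration, and it presumes the documents have already been indexed (including any text extracted from PDFs), which is a separate job.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, term, rows=10):
    """Build a URL for Solr's select handler, asking for JSON results."""
    params = {"q": term, "rows": rows, "wt": "json"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def extract_docs(response):
    """Pull the list of matching documents out of a decoded JSON response."""
    return response.get("response", {}).get("docs", [])


# Assumed location of a local Solr instance; adjust to your own setup.
url = build_solr_query_url("http://localhost:8983/solr/", "holiday policy")
print(url)
```

A search form in the CMS would fetch that URL, decode the JSON, and render whatever `extract_docs` returns alongside the normal page results.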

    It sounds like you might really be looking for a document management system. KnowledgeTree excels in this area:
    http://www.knowledgetree.com/opensource

    And hey, if you have the cash go for a Google Search Appliance. For ~$3000 it will do everything...EVERYTHING.
    http://www.google.com/gsa

  • Junglefish
    Community Member
    104 Posts

    Re: Search - what should I actually expect?

    @Dalesaurus. Thanks for the info.

    It's funny, when I was looking around at CMSs to evaluate, one of the killer features I thought SS had was the ability to search binary documents. I *swear* I read it in the documentation somewhere. Also, the guy in this thread seems to suggest it's possible http://ssorg.bigbird.silverstripe.com/archive/show/127856#post127856

    But I take your point, and I guess I'm resigned to having to think about integrating a third-party solution.

    Does anyone out there have any experience of integrating Lucene with SS and can offer any pointers or tips?

    (PS. I'm not really doing "document management", but the application I need to build is an intranet where they make lots of core information available in PDF form. Therefore, the ability to search those PDFs is a key requirement.)

  • dalesaurus
    Community Member
    283 Posts

    Re: Search - what should I actually expect?

    A super hacky solution would be to add a form for PDF search and have it exec a grep command from PHP. There are CLI tools that can read PDFs, grep can return a filename, and you can use DataObject::get* to grab the objects by name and pack them into a DataObjectSet for pretty results.

    http://stackoverflow.com/questions/694049/unable-to-search-pdf-files-contents-in-terminal

    It's not ideal, but it's a pretty minimal amount of coding.
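The idea can be sketched as follows. This is Python rather than PHP, purely to show the shape of it, and it assumes the `pdftotext` command-line tool (from Poppler) is installed; the function names and the uploads directory are hypothetical. It extracts the text of each PDF in one directory and returns the names of the files that contain the search term.

```python
import subprocess
from pathlib import Path


def pdf_to_text(path):
    """Extract text with the `pdftotext` CLI (assumed installed); '-' means stdout."""
    result = subprocess.run(
        ["pdftotext", str(path), "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_pdfs_containing(term, directory, extract=pdf_to_text):
    """Return the names of PDFs directly in `directory` whose text contains `term`."""
    needle = term.lower()
    matches = []
    for path in sorted(Path(directory).glob("*.pdf")):
        try:
            text = extract(path)
        except (OSError, subprocess.CalledProcessError):
            continue  # skip unreadable files, or a missing pdftotext binary
        if needle in text.lower():
            matches.append(path.name)
    return matches


if __name__ == "__main__":
    # Hypothetical uploads directory; point this at wherever the PDFs live.
    print(find_pdfs_containing("holiday", "assets/Uploads"))
```

The returned filenames are what you would then hand to DataObject::get* to fetch the matching file records. Note that this re-reads every PDF on every search, so it only suits a small number of files.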

    However, if you are looking for an enterprise intranet that will have non-tech folks working with it, I would still recommend evaluating KnowledgeTree or Foswiki.

  • Junglefish
    Community Member
    104 Posts

    Re: Search - what should I actually expect?

    @Dalesaurus. Thanks for your suggestions.

    Foswiki looks interesting, and might actually be suitable for another project I'm meant to be looking at, but I REALLY want to use SS for this intranet project. So, I need to find a way of searching the binaries within SS and integrating them with the database search results.

    I think I'll open a new thread with this requirement stated clearly. Someone out there must have had to nail this puppy before...

    j/
