SilverStripe Forums » All other Modules » Searching PDF files - google site search

  • Christy
    Community Member
    57 Posts

    Searching PDF files - google site search

    I would appreciate hearing your recommendations/experiences with site searches that include PDF files.

    I have a small site that includes PDFs, and the wish is for these to be included in the search. I am thinking of adding the Google Site Search Module, but the link in the documentation goes to the Google Enterprise Search. Can the custom search in the Google Webmaster tools be used instead? It would be hard to justify the cost of the Google Enterprise Search. However, the Google forums contain many posts about the difficulties of searching PDF files!

    Thanks in advance for any advice.

  • Willr
    Forum Moderator
    5508 Posts

    Re: Searching PDF files - google site search

    If you want an easy solution, the Google site search module will include file contents (https://github.com/dnadesign/silverstripe-googlesitesearch) and was free for up to 1000 searches a month when I released it. An example of it running is on http://tnzi.com/

    The other option would be Solr + Text Extraction (http://addons.silverstripe.org/add-ons/silverstripe/textextraction), which would require non-trivial setup costs, so I'm a fan of Google search for quick use.

  • Christy
    Community Member
    57 Posts

    Re: Searching PDF files - google site search

    Willr, thank you very much for your reply.

    As Google are now charging $100 a year for the enterprise version of their search engine, I have been attempting to set up the free version to find files (pdf, docx, pptx etc). (Unfortunately, it does not appear to supply a cse_key to enable me to use your googlesitesearch module.)

    Google finds no errors in the sitemap and now appears to have indexed the site's pages. It still doesn't find any linked files, though! The links are in a members-only section, but I have seen a forum post asking how to prevent such files from appearing in the search results, so that should not be the problem. Google also reports no blocked URLs.

    Any ideas about what newbie mistake I must be making?

    Thanks.

  • Christy
    Community Member
    57 Posts

    Re: Searching PDF files - google site search

    I believe that I have isolated the problem: Google search is not indexing the pages that contain the links to the documents. It does not find the non-link text on those pages either. In the sitemap, the three pages have a 50% priority. I have increased their importance in the CMS, but this has yet to show in the sitemap. I hope that giving Google time to reindex the site will solve the problems.
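
The sitemap checks described in the last two posts can be done locally as well. The following is a minimal Python sketch, not part of the thread or of any SilverStripe module: it parses sitemap XML and reports each page's priority and whether any document files (pdf, docx, pptx) are listed. The function names, the `SAMPLE` sitemap, and the example.org URLs are all hypothetical, and it assumes the sitemap follows the standard sitemaps.org schema.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# File types mentioned in the thread; extend as needed.
DOCUMENT_EXTENSIONS = (".pdf", ".docx", ".pptx")

# Hypothetical sitemap used only for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/members/minutes/</loc><priority>0.5</priority></url>
  <url><loc>https://example.org/assets/minutes-2014.pdf</loc></url>
</urlset>"""


def read_sitemap(xml_text):
    """Return (url, priority) tuples; priority is None when not given."""
    root = ET.fromstring(xml_text)
    entries = []
    for url_node in root.findall("sm:url", SITEMAP_NS):
        loc = url_node.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        raw = url_node.findtext("sm:priority", default=None, namespaces=SITEMAP_NS)
        priority = float(raw) if raw else None
        entries.append((loc, priority))
    return entries


def document_urls(entries):
    """Return the URLs whose path ends in one of DOCUMENT_EXTENSIONS."""
    return [url for url, _ in entries if url.lower().endswith(DOCUMENT_EXTENSIONS)]


if __name__ == "__main__":
    entries = read_sitemap(SAMPLE)
    for url, priority in entries:
        print(url, "priority:", priority)
    print("Documents listed:", document_urls(entries))
```

In the sitemaps.org protocol, 0.5 is the default priority, so a 50% value may simply mean that no explicit priority was set for those pages. The extension check is deliberately simple and would miss document URLs that carry a query string.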
