
Searching PDF files - google site search


Christy

Community Member, 58 Posts

10 January 2014 at 4:10pm

I would appreciate hearing your recommendations/experiences with site searches that include PDF files.

I have a small site that includes PDFs, and I would like these to be included in the site search. I am thinking of adding the Google Site Search module, but the link in its documentation goes to Google Enterprise Search. Can the custom search in Google Webmaster Tools be used instead? It would be hard to justify the cost of Google Enterprise Search. However, the Google forums contain many posts about the difficulties of searching PDF files!

Thanks in advance for any advice.

Willr

Forum Moderator, 5513 Posts

10 January 2014 at 10:26pm

If you want an easy solution, the Google site search module will include file contents (https://github.com/dnadesign/silverstripe-googlesitesearch) and was free for up to 1000 searches a month when I released it. An example of it running is on http://tnzi.com/

The other option would be Solr + Text Extraction (http://addons.silverstripe.org/add-ons/silverstripe/textextraction), which would require non-trivial setup costs, so I'm a fan of Google search for quick use.

Christy

Community Member, 58 Posts

16 January 2014 at 2:46pm

Willr, thank you very much for your reply.

As Google are now charging $100 a year for the enterprise version of their search engine, I have been attempting to set up the free version to find files (pdf, docx, pptx, etc.). (Unfortunately, it does not appear to supply a cse_key that would let me use your googlesitesearch module.)

Google finds no errors in the sitemap and now appears to have indexed the site's pages. It still doesn't find any linked files, though! The links are in a members-only section, but I have seen a forum post asking how to stop such files from appearing in the search results, so that should not be the problem. Google also reports no blocked URLs.
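The "no blocked URLs" report could also be double-checked locally. Below is a minimal sketch, using Python's standard `urllib.robotparser`, that tests a list of URLs against a robots.txt body. The robots.txt rules, the `example.com` URLs, and the helper name `blocked_urls` are all made up for illustration; the site's real robots.txt and document links would be substituted.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical robots.txt that hides a members-only area from all crawlers.
sample_robots = """\
User-agent: *
Disallow: /members/
"""

# Hypothetical URLs; replace with real page and document links.
sample_urls = [
    "http://example.com/about/",
    "http://example.com/members/minutes.pdf",
]

blocked = blocked_urls(sample_robots, sample_urls)
print(blocked)
```

Note that this only covers robots.txt rules. A page that requires a member login can be unreachable to a crawler even when robots.txt blocks nothing.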

Any ideas about what newbie mistake I must be making?

Thanks.

Christy

Community Member, 58 Posts

17 January 2014 at 5:16pm

I believe that I have isolated the problem to Google search not searching the pages containing the links to documents. It is not finding non-link text on those pages. In the sitemap, the three pages have a 50% priority. I have increased their importance in the CMS but this is yet to show in the sitemap. I hope that giving Google time to reindex the site will solve the problems.
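To see whether a priority change has reached the sitemap, the sitemap XML can be read directly. The following is a rough sketch using Python's standard `xml.etree.ElementTree`; the sample sitemap, its `example.com` URLs, and the function name `sitemap_priorities` are assumptions for illustration only. In the sitemap protocol, an entry with no `<priority>` element defaults to 0.5, which is what a "50%" priority corresponds to.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_priorities(xml_text):
    """Map each <loc> in a sitemap to its priority (0.5 when not given)."""
    root = ET.fromstring(xml_text)
    result = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        priority = url.findtext("sm:priority", default="0.5", namespaces=NS)
        result[loc] = float(priority)
    return result


# Hypothetical sitemap content; paste the real sitemap.xml text here instead.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc><priority>1.0</priority></url>
  <url><loc>http://example.com/members/documents/</loc><priority>0.5</priority></url>
  <url><loc>http://example.com/contact/</loc></url>
</urlset>"""

priorities = sitemap_priorities(sample_sitemap)
for loc, priority in priorities.items():
    print(f"{priority:.1f}  {loc}")
```

Priority is only a hint about the relative importance of pages within one site, so raising it may not by itself change whether a page's text gets indexed.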