

Sphinx 0.1: $SphinxSearchForm / no enabled local indexes to search




59 Posts   23886 Views

Enclave SSC

Community Member, 31 Posts

2 August 2010 at 5:29pm

Thanks Mark, this works. The outputting of additional indexing information also helped me debug another issue.

Now just to get Aram sorted :)

Carbon Crayon

Community Member, 598 Posts

3 August 2010 at 4:53am

Hi guys,

My idx files do seem to be working. Many of the files are empty, but some of them, such as Product.new.spd and PageLive.words, do now contain data.

Mark, I added the code and here is what it output: http://pastie.org/1071225

Once again, thank you both for the continued help!

Aram

Enclave SSC

Community Member, 31 Posts

3 August 2010 at 6:49am

@mark After some testing, all is working well. However, the full file is not being stored in the file content cache. I'm not sure whether you set this limit or whether it's the datatype (mediumtext) that's limiting it.

Now I just need to highlight the searched term in the results and show a partial string around the first occurrence of the keyword. :P

@aram I take it you are using sqlite. The warning shouldn't affect indexing, but try to load the module it requests. You should be able to do so from within your Apache httpd.conf.

The PID not being accessible means something is still not 100% right permission-wise. Is this a VPS with a well-known host or your own VPS? If you cannot get it working, I can always make sure your VPS is configured correctly for you.
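For the highlighting step, here is a minimal sketch of the idea in Python. It is not the module's code; the function name `excerpt` and the `radius` parameter are made up for illustration. It finds the first case-insensitive occurrence of the term, keeps a window of text around it, and wraps the hit in `<strong>`.

```python
import html
import re


def excerpt(text, term, radius=60):
    """Return an HTML snippet around the first occurrence of term.

    Hypothetical helper for illustration only. The window is `radius`
    characters on each side of the match, and the match is wrapped in
    <strong>. All text is HTML-escaped before the markup is added.
    """
    match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if match is None:
        # No hit: fall back to the start of the text.
        return html.escape(text[: 2 * radius])

    start = max(match.start() - radius, 0)
    end = min(match.end() + radius, len(text))

    before = html.escape(text[start:match.start()])
    hit = html.escape(match.group(0))
    after = html.escape(text[match.end():end])

    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return f"{prefix}{before}<strong>{hit}</strong>{after}{suffix}"


print(excerpt("The Sphinx index rotates", "sphinx", radius=4))
# prints: The <strong>Sphinx</strong> ind...
```

This cuts on character positions, so it can split a word at the edge of the window; trimming to the nearest space would be an easy refinement.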

mark_s

Community Member, 78 Posts

3 August 2010 at 9:31am

Hi.

@aram I think the sqlite warnings are a red herring. The permissions failure is probably the issue. You'll notice it also says it hasn't rotated the indexes. When sphinx reindexes, it creates the new indexes in *.new.* files, and only switches searchd over to them once they are constructed, so indexing doesn't result in search engine downtime. Usually these files are very transient. But because it's having problems talking to searchd, it is not able to rotate, so searchd will still be using the old indexes.
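A quick way to see whether rotation is stuck is to look for leftover *.new.* files in the index directory. The snippet below is only a sketch in Python: the function name is made up, and the directory you pass in is whatever path your sphinx conf uses for its indexes.

```python
from pathlib import Path


def find_unrotated_index_files(index_dir):
    """List index files that still carry the '.new.' infix.

    Hypothetical helper. If these files are still present some time
    after indexing has finished, the rotate step has probably not
    happened.
    """
    return sorted(
        p.name for p in Path(index_dir).glob("*.new.*") if p.is_file()
    )
```

Calling it on the idx folder and getting back something like Product.new.spd well after the indexer has exited would match the "not rotated" message in the output.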

To understand the permissions, you need to start with Apache. Apache is usually started as root so it can bind to privileged ports (e.g. port 80), but it can be configured to create sub-processes running as a different user to process requests (under Debian this is usually the www-data user). It's the latter user you need to know about, because in this scenario it creates the important players: the files under temp, including the sphinx conf, and the searchd process. If there is a mismatch there will probably be problems, so double check that everything is owned by the same user. To be sure, I would also ensure there is only one searchd running.
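To check the ownership side of this, something along these lines can help. It is a rough sketch in Python for a Unix-like server; the function name is invented, and the directory to scan is whichever temp folder holds the sphinx conf and indexes on your install.

```python
import os
import pwd


def owners_under(path):
    """Count files and directories under path, grouped by owning user.

    Hypothetical helper. More than one key in the result means the
    tree has mixed ownership, which is the mismatch to look for.
    """
    counts = {}
    for root, _dirs, files in os.walk(path):
        # os.walk visits every directory as `root`, so checking root
        # plus its files covers the whole tree.
        for entry in [root] + [os.path.join(root, f) for f in files]:
            uid = os.lstat(entry).st_uid
            try:
                user = pwd.getpwuid(uid).pw_name
            except KeyError:
                user = str(uid)  # uid without a passwd entry
            counts[user] = counts.get(user, 0) + 1
    return counts
```

If the result shows two users (for example root and www-data), or a single user that is not the one the Apache workers run as, that would fit the PID and rotation errors.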

@enclave: I'd check that the documents are coming back in the search results. It is odd that the file results are not being cached. It implies that either the file contents are not being indexed at all, or it is having an issue storing it in the file; I'm suspicious either way (that may just be my nature :-p)

Mark

Enclave SSC

Community Member, 31 Posts

3 August 2010 at 9:37am

Hi Mark, sorry, you may have misunderstood me. I mean the FileContentCache is being populated and the files are returned in the results. This is awesome, but the files are very large case documents and only the first few pages are extracted and stored. I can see that the stored data ends abruptly at a certain point, so the full file is not being indexed :P

mark_s

Community Member, 78 Posts

3 August 2010 at 9:50am

Ok, that's a better problem :-)

The file contents cache is a mediumtext field, so it should hold up to 16M of text. If this is coming from a PDF, the text should be smaller than the PDF itself, since pdf2text will only return the stream of words, without formatting or images etc. How long is the text that is stored in the field at the moment? Is it truncated because the field is too small, or could there be something else truncating it?
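One way to answer the "too small or something else" question is to compare the stored length against the MySQL text column limits. The sketch below is in Python and assumes a MySQL database; the function name and the `tolerance` parameter are made up. The limits themselves are MySQL's documented maximums: 255 bytes for TINYTEXT, 65,535 for TEXT and 16,777,215 for MEDIUMTEXT.

```python
# MySQL maximum lengths in bytes for the smaller text types.
LIMITS = {
    "TINYTEXT": 2**8 - 1,     # 255
    "TEXT": 2**16 - 1,        # 65,535
    "MEDIUMTEXT": 2**24 - 1,  # 16,777,215
}


def column_limit_hit(stored_text, encoding="utf-8", tolerance=3):
    """Return the text type whose limit the stored value sits at, or None.

    Hypothetical helper. `tolerance` allows for a multi-byte character
    that did not fit being dropped, which leaves the value a few bytes
    short of the exact limit.
    """
    size = len(stored_text.encode(encoding))
    for name, limit in LIMITS.items():
        if limit - tolerance <= size <= limit:
            return name
    return None
```

If the stored value comes back as "TEXT", the column is probably not a mediumtext in the database after all. If it comes back as None, the cut is more likely happening before the text reaches the field.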

Mark

Carbon Crayon

Community Member, 598 Posts

4 August 2010 at 1:30am

Edited: 05/08/2010 8:34pm

Hi Guys,

As far as I can see, everything is owned by root, but Enclave, if you could take a look, that would be awesome.

Thanks guys!

Carbon Crayon

Community Member, 598 Posts

7 August 2010 at 1:56am

Just to update: I have given up on this. It seems there is something going on with my server/Apache setup that prevents Sphinx from working... major bummer, but I don't have enough knowledge to fix it :(