
Sphinx 0.1: $SphinxSearchForm / no enabled local indexes to search



Enclave SSC

28 July 2010 at 7:26am Community Member, 31 Posts

Thanks Mark, I am now understanding this. I am quite new to SS and PHP in general.

Do you have any ideas on how I can easily formulate links in my results to link to my various dataobjects pages, short of storing a url to each entry in the database?

I am also managing to index all Objects added to classestosearch in Sphinx. Specifying file content to be indexed for my dataobject is posing a problem.

I use the code you provided:

static $extensions = array(
      'SphinxSearchable'
   );

   static $sphinx = array(
      "search_fields" => array("Title","Summary"),
      "mode" => "xmlpipe",
      "external_content" => array("file_content" => array("Document", "getDocumentContent"))
   );

   static function getDocumentContent($documentId) {
      $doc = DataObject::get_by_id("Document", $documentId);
      if (!$doc || !$doc->File()) return "";
      return $doc->File()->extractFileAsText();
   }

However, no file contents are indexed. Any ideas what I could be doing wrong? I set $_FILE_TO_URL_MAPPING in my environment as well.

mark_s

28 July 2010 at 1:42pm Community Member, 78 Posts

Hi.

On handling search results:
- for objects in the set that are SiteTree descendants, just use Link directly.
- for non-SiteTree DataObjects you'll probably have to build a controller that understands how to present those objects. A straightforward pattern is to define a new page type, say MyObjectHolder, with a view method that uses the 'ID' URL parameter to invoke a template to view the dataobject. It sounds complicated but isn't very hard. I can provide some guidance on that if you want.

On indexing files:
- have you added this to mysite/_config: DataObject::add_extension('File', 'FileTextExtractable');
- I'm not sure it's entirely clear from the docs, but the structure they describe is: Page has Document objects; the $sphinx static is defined on Document, which is also where the function is located; the Document class has $has_one = array("File" => "File"). I think I'll review that whole section because it isn't immediately clear on reading, and I wrote it ;-)
- this structure is not the only structure possible, but it's a reasonably flexible one. e.g. you could have Page having a file, and in the sphinx static on Page define the external_content to point to a function on Page that gets the file contents.
- if you're still having issues, one thing to check is that the file extraction is working correctly. Look at the FileContentCache column in the File table in the database. This should be populated after indexing has run - content is read and cached here on demand (as a result of indexing).

Mark

Enclave SSC

29 July 2010 at 1:13am (Last edited: 29 July 2010 1:46am), Community Member, 31 Posts

Hi Mark

Thanks firstly for all your excellent support. You have already helped me a ton.

- $Link works 100% for SiteTree descendants.

- I created a controller that allows me to view the dataobjects. This works 100% and I access them as follows: cases/complaints/retrieve_case/100. My question is just how do I link to these in my search results once indexed? How do I know the result is coming from the dataobject in order to change the URL/Link in the template? What would be the easiest way to manage this?

Indexing files:

DataObject::add_extension('File', 'FileTextExtractable'); was added to mysite/_config.php

It is clear from your explanation that I did not understand the usage of that code at all.

My structure is as follows:

My custom dataobject DecidedCase has a document field referencing a file in the File table.

static $has_one = array(
      'Document' => 'File'
   );

I now want to index all the files from DecidedCase that are in the File table. I hope this makes sense :). My FileContentCache column remains NULL. I think I am doing something wrong.

This is my weak attempt at making it work:

static $sphinx = array(
      "search_fields" => array("Title","Summary","Applicant","Respondent"),
      "mode" => "xmlpipe",
      "external_content" => array("file_content" => array("DecidedCase", "getDocumentContent"))
   );

   static function getDocumentContent($documentId) {
      $doc = DataObject::get_by_id("File", $documentId);
      if (!$doc || !$doc->File()) return "";
      return $doc->File()->extractFileAsText();
   }

This is my result on Sphinx/reindex:

using config file '/tmp/silverstripe-cache-var-www-comptrib/sphinx/sphinx.conf'...
indexing index 'DecidedCase'...
ERROR: index 'DecidedCase': xmlpipe: expected '<document>', got 'ERROR [User Error]: Uncaught Exception: Object->__call(): the m'.
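
Reading that error next to Mark's description, one likely cause is that the callback loads a File and then calls File() on it, a method that File does not have. Below is a minimal sketch of the shape Mark describes. It assumes the callback receives the ID of the DecidedCase being indexed, and that the has_one accessor is called Document() after the relation name. FileStub, DecidedCaseStub and getById() are made-up stand-ins so the sketch runs outside SilverStripe; in the real class the lookup would be DataObject::get_by_id("DecidedCase", $caseId).

```php
<?php
// Stand-in for SilverStripe's File with the FileTextExtractable extension
// (hypothetical stub; only the method used in this thread is modelled).
class FileStub {
    private $text;
    public function __construct($text) { $this->text = $text; }
    public function extractFileAsText() { return $this->text; }
}

// Stand-in for the DecidedCase DataObject (hypothetical stub).
class DecidedCaseStub {
    // In-memory replacement for the database: case ID => DecidedCaseStub.
    public static $records = array();

    private $document;

    public function __construct($document = null) { $this->document = $document; }

    // Mirrors the accessor generated by $has_one = array('Document' => 'File').
    public function Document() { return $this->document; }

    // Replaces DataObject::get_by_id("DecidedCase", $id) in this sketch.
    public static function getById($id) {
        return isset(self::$records[$id]) ? self::$records[$id] : null;
    }

    // Assumption: the indexer passes the ID of the DecidedCase being indexed,
    // so the file is reached through the case's Document() relation.
    public static function getDocumentContent($caseId) {
        $case = self::getById($caseId);
        if (!$case || !$case->Document()) return "";
        return $case->Document()->extractFileAsText();
    }
}

DecidedCaseStub::$records[100] = new DecidedCaseStub(new FileStub("text of the ruling"));
echo DecidedCaseStub::getDocumentContent(100), "\n"; // text from the attached file
echo DecidedCaseStub::getDocumentContent(999), "\n"; // unknown case: empty string
```

The point of the sketch is only the direction of the lookup: start from the indexed object, then follow its relation to the file, rather than treating the incoming ID as a File ID.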

aimcom

29 July 2010 at 1:47am Community Member, 8 Posts

I have found another solution for indexing the content of a file which works even without the Sphinx search module. The class FileDownload can be used for PDF files and uses pdftotext to index the content:

class FileDownload extends DataObject {
   // ...
   public static $has_one = array(
      'Parent' => 'Page',
      'File' => 'File'
   );
   
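   public function onBeforeWrite() {
      // Extract the PDF text into a temporary file, then store it on the File record.
      $tmpFile = '/tmp/silverstripe_pdf.txt';
      $pdfPath = BASE_PATH .'/'. $this->File()->Filename;
      // escapeshellarg() keeps spaces or shell characters in file names from breaking the command.
      exec ('/usr/bin/pdftotext -nopgbrk '. escapeshellarg ($pdfPath) .' '. escapeshellarg ($tmpFile));
      $this->File()->Content = utf8_encode (file_get_contents ($tmpFile));
      $this->File()->write();
      // Remove the temporary file without spawning another shell.
      unlink ($tmpFile);
      
      parent::onBeforeWrite();
   }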
   
   // ...
}

Carbon Crayon

29 July 2010 at 5:25am Community Member, 598 Posts

Hi Mark, thanks for your last reply.

So I have made sure all the files in the tmp dir are owned by the root user, which is also running apache. I did a chown -R root sphinx to make sure.

Strangely when I stopped searchd (--stop) and then started it again it came up with these errors:

using config file './sphinx.conf'...
listening on UNIX socket /tmp/silverstripe-cache-home-aabweb-public_html-ukc/sphinx/searchd.sock
WARNING: index 'BaseIdx': key 'path' not found' - NOT SERVING
WARNING: index 'Product': preload: failed to open /tmp/silverstripe-cache-home-aabweb-public_html-ukc/sphinx/idxs/Product.sph: No such file or directory; NOT SERVING
WARNING: index 'ProductDelta': preload: failed to open /tmp/silverstripe-cache-home-aabweb-public_html-ukc/sphinx/idxs/ProductDelta.sph: No such file or directory; NOT SER
.
.
.etc...

So I then went to reindex and I got an error saying permission denied for searchd.pid, so I made that 0777 and it indexed ok, but I still got the same error when searching. I went through it again and got the same errors: first searchd complaining it couldn't find any files, then SS complaining it couldn't open searchd.pid, then the original error....

When you say run the search command that comes with sphinx from the command line, I tried 'search sphinx.conf' from within the temp sphinx dir and it seemed to just hang. I am not too experienced with command line stuff so feeling a little out of my depth at this point but would dearly love to get this working :(

Thanks for your help so far

Aram

Carbon Crayon

29 July 2010 at 5:28am Community Member, 598 Posts

@EnclaveSSC - I think you just need a function Link() on your DataObject which generates the link to the display controller that you have now created, for example:

function Link(){
   $Page = DataObject::get_one('DisplayPage');
   return $Page->Link() . 'show/' . $this->ID;
}

Hope that helps :)

Aram

Enclave SSC

29 July 2010 at 7:46am (Last edited: 29 July 2010 7:47am), Community Member, 31 Posts

@aram I was having the same issues with Sphinx initially. What I then did was the following.

I removed Sphinx and downloaded the latest copy of it, version 1.something. I built the new version as follows:

./configure --prefix=/usr/local --enable-id64 --with-mysql
make install

Once done I made sure all the sphinx files in /usr/local/bin were owned by www-data. This is the apache user and group.

chown -R www-data:www-data searchd/indexer etc etc.

I never run ./searchd. SilverStripe will initiate the process with the correct config as a parameter when needed for search.

Check this with ps axu; you should see:

www-data 22859 0.0 0.5 32624 24120 ? S 21:26 0:00 /usr/local/bin/searchd --config /tmp/silverstripe-cache-var-www-comptrib/sphinx/sphinx.conf

after trying a reindex/diagnose.

I then overwrote ContentControllerSearchExtension and referenced SphinxSearchForm. I also set allowed actions on the relevant controllers. I now get results but had to manually add my custom class to the $classesTosearch in SphinxSearchForm.php

@mark Any help regarding the file indexing would be appreciated as I still cannot get any files indexed on my dataobject.

Enclave SSC

29 July 2010 at 7:56am (Last edited: 29 July 2010 8:47am), Community Member, 31 Posts

@aram Where should that link function be placed? When added to my dataobject decorator I receive a segmentation fault in my apache error log. <- strange.

Adding a basic link function inserting a string into the link works.

To be 100% clear, the reason I need to generate a $Link to link to my dataobject is that I use Page_results.ss to display the results of my SphinxSearchForm. This works fine showing and linking to SiteTree and File results because the $Link is generated correctly. $Link for the dataobject items, however, just references the root of the site.

I am just not sure how I am going to generate links as follows for the search page:

cases/complaints/retrieve_case/102

cases is a page -> complaints is a holder for dataobjects -> retrieve_case is my action and 102 is my ID.

I use:

function Link(){
   $case = DataObject::get_one("DecidedCase");
   return Director::baseURL() . 'cases/' . $case->Type . '/retrieve_case/' . $case->ID;
}

This however always links to a single case ID. I would like the case ID to change based on the link the search result has found.

** I need to somehow get the ID of each search result into this Link function on my dataobject decorator, and then get the dataobject by that ID. How can I get the ID from the page_controller search function in this link function? ** My total lack of PHP and OO programming experience is letting me down.
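
One way to read the problem above: the Link() shown calls DataObject::get_one(), which fetches a single record regardless of which result is being rendered, so every result gets the same URL. Below is a minimal sketch of the alternative, building the link from the record's own ID and Type, much as Aram's example does with $this->ID. The names buildCaseLink and DecidedCaseRecord are made up for illustration, and '/' stands in for Director::baseURL(). As far as I know, inside a SilverStripe decorator the decorated record is reached through $this->owner rather than $this, so treat that part as an assumption to verify.

```php
<?php
// Hypothetical helper, not part of SilverStripe: builds
// <base>cases/<type>/retrieve_case/<id> from one record's own fields.
function buildCaseLink($baseUrl, $type, $id) {
    return rtrim($baseUrl, '/') . '/cases/' . rawurlencode($type)
        . '/retrieve_case/' . (int) $id;
}

// Stand-in for a DecidedCase record so the sketch runs on its own.
class DecidedCaseRecord {
    public $ID;
    public $Type;

    public function __construct($id, $type) {
        $this->ID = $id;
        $this->Type = $type;
    }

    // Uses this record's own ID and Type instead of looking up
    // another record with DataObject::get_one().
    public function Link() {
        // '/' stands in for Director::baseURL().
        return buildCaseLink('/', $this->Type, $this->ID);
    }
}

$case = new DecidedCaseRecord(102, 'complaints');
echo $case->Link(), "\n";
```

With a Link() like this on each result object, the template can keep using $Link for every result, and no ID needs to be passed in from the search function.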