# Solr connector for SilverStripe fulltextsearch module

## Introduction

This module connects the SilverStripe fulltextsearch module to Solr.
It works with Solr in multi-core mode. It needs to be able to update Solr configuration files, and has modes for
doing this by direct file access (when Solr shares a server with SilverStripe) and by WebDAV (when it's on a different server).

See the helpful [Solr Tutorial](http://lucene.apache.org/fulltextsearch/api/doc-files/tutorial.html) for more on cores and querying.

## Requirements

Since Solr is Java-based, it requires Java 1.5 or greater to be installed.
It also requires a servlet container such as Tomcat, Jetty, or Resin.
Jetty is already packaged with the module.

See the official [Solr installation docs](http://wiki.apache.org/solr/SolrInstall)
for more information.

Note that these requirements are for the Solr server environment,
which doesn't have to be the same physical machine as the SilverStripe webhost.
## Installation

Configure Solr in file mode. The 'path' directory has to be writeable
by the user the Solr search server is started with (see below).

    // File: mysite/_config.php:
    <?php
    SearchUpdater::bind_manipulation_capture();
    Solr::configure_server(isset($solr_config) ? $solr_config : array(
        'host' => 'localhost',
        'indexstore' => array(
            'mode' => 'file',
            'path' => BASE_PATH . '/fulltextsearch/thirdparty/fulltextsearch/server/solr'
        )
    ));

Create an index:

    // File: mysite/code/MyIndex.php:
    <?php
    class MyIndex extends SolrIndex {
        function init() {
            $this->addClass('Page');
            $this->addAllFulltextFields();
        }
    }

Start the search server (via CLI, in a separate terminal window or background process):

    cd fulltextsearch/thirdparty/fulltextsearch/server/
    java -jar start.jar

Initialize the configuration (via CLI):

    sake dev/tasks/Solr_configure

## Usage
After configuring Solr, you have the option to add your existing
content to its indices. Run the following command:

    sake dev/tasks/Solr_reindex

This will rebuild all indices. You can narrow down the operation with the following options:

 - `index`: PHP class name of an index
 - `class`: PHP model class to reindex
 - `start`: Offset (applies to matched records)
 - `variantstate`: JSON encoded string with state, e.g. '{"SearchVariantVersioned":"Stage"}'
 - `verbose`: Debug information

Note: The Solr indexes will be stored as binary files inside your SilverStripe project.
You can also copy the Solr directory under `thirdparty/` somewhere else;
just set the `path` value in `mysite/_config.php` to point to the new location,
and run `java -jar start.jar` from the new directory.
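
The reindex options listed above are passed to `sake` as `key=value` arguments. The following is a minimal sketch, assuming the `MyIndex` and `Page` names from the installation example. The `reindex_args` helper is hypothetical and not part of the module; it only assembles the arguments so the command can be reviewed before the task is run.

```shell
# Hypothetical helper: assembles the arguments for a narrowed-down reindex.
# $1 = index class, $2 = model class, $3 = start offset (defaults to 0).
reindex_args() {
    printf 'dev/tasks/Solr_reindex index=%s class=%s start=%s verbose=1\n' "$1" "$2" "${3:-0}"
}

# Print the full command for review; drop the leading "echo" to run it.
echo sake $(reindex_args MyIndex Page)
```

A `variantstate` value contains quotes and braces, so wrap the whole argument in single quotes when adding it to the command line, for example `'variantstate={"SearchVariantVersioned":"Stage"}'`.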
### Spell Checking ("Did you mean...")
Solr has various spell checking strategies (see the ["SpellCheckComponent" docs](http://wiki.apache.org/solr/SpellCheckComponent)), all of which are configured through `solrconfig.xml`.
In the default config which is copied into your index,
spell checking data is collected from all fulltext fields
(everything you added through `SolrIndex->addFulltextField()`).
The values of these fields are collected in a special `_text` field.

    $index = new MyIndex();
    $query = new SearchQuery();
    $query->search('My Term');
    $params = array('spellcheck' => 'true', 'spellcheck.collate' => 'true');
    $results = $index->search($query, -1, -1, $params);
    $results->spellcheck

The built-in `_text` data is better than nothing, but also has some problems:
It's heavily processed, for example by stemming filters which butcher words.
So misspelling "Govnernance" will suggest "govern" rather than "Governance".
This can be fixed by aggregating spell checking data in a separate field.

    <?php
    class MyIndex extends SolrIndex {
        function init() {
            // ...
            $this->addCopyField('SiteTree_Title', 'spellcheckData');
            $this->addCopyField('DMSDocument_Title', 'spellcheckData');
            $this->addCopyField('SiteTree_Content', 'spellcheckData');
            $this->addCopyField('DMSDocument_Content', 'spellcheckData');
        }
        // ...
        function getFieldDefinitions() {
            $xml = parent::getFieldDefinitions();
            $xml .= "\n\n\t\t<!-- Additional custom fields for spell checking -->";
            $xml .= "\n\t\t<field name='spellcheckData' type='textSpell' indexed='true' stored='false' multiValued='true' />";
            return $xml;
        }
    }

Now you need to tell Solr to use the new field for gathering spelling data.
In order to customize the spell checking configuration,
create your own `solrconfig.xml` (see "File-based configuration").
In there, change the following directive:

    <!-- ... -->
    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
        <!-- ... -->
        <str name="field">spellcheckData</str>
    </searchComponent>

Don't forget to copy the new configuration via a call to the `Solr_configure`
task, and reindex your data before using the spell checker.
### Custom Types
Solr supports custom field type definitions which are written to its XML schema.
Many standard ones are already included in the default schema.
As the XML file is generated dynamically, we can add our own types
by overloading the template responsible for it: `types.ss`.

In the following example, we read out type definitions
from a new file `mysite/solr/templates/types.ss` instead:

    <?php
    class MyIndex extends SolrIndex {
        function getTemplatesPath() {
            return Director::baseFolder() . '/mysite/solr/templates/';
        }
    }

## Debugging
### Using the web admin interface
You can visit `http://localhost:8983/solr`, which will show you a list
of links to the admin interfaces of all available indices.
There you can search the contents of the index via the native Solr web interface.
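
The same index can also be queried from the command line over HTTP, which helps when comparing raw Solr results with what SilverStripe returns. This is a rough sketch, assuming the default port shown above, an index class named `MyIndex`, and Solr's standard `select` handler. `solr_select_url` is a hypothetical helper, not part of the module.

```shell
# Hypothetical helper: builds the query URL for one Solr core.
# $1 = core (index class name), $2 = query string in Solr syntax.
solr_select_url() {
    printf 'http://localhost:8983/solr/%s/select?q=%s&wt=json\n' "$1" "$2"
}

# Show the URL for a match-all query against MyIndex.
solr_select_url MyIndex '*:*'

# With the server running, fetch the results (assumes curl is installed):
#   curl -s "$(solr_select_url MyIndex '*:*')"
```

The helper does no URL encoding, so queries containing spaces or other reserved characters need to be encoded before they are passed in.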

It is possible to manually replicate the data automatically sent
to Solr when saving/publishing in SilverStripe,
which is useful when debugging front-end queries;
see `thirdparty/fulltextsearch/server/silverstripe-solr-test.xml`.

    java -Durl=http://localhost:8983/solr/MyIndex/update/ -Dtype=text/xml -jar post.jar silverstripe-solr-test.xml
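
If `post.jar` is not at hand, the same update request can be sent with `curl`. This is a sketch under the assumption that the core's update handler accepts XML over HTTP POST, as in the command above. The `post_xml_cmd` helper is hypothetical and only prints the commands so they can be checked before being run.

```shell
# Assumed update URL for the MyIndex core (same host and port as above).
SOLR_UPDATE_URL="http://localhost:8983/solr/MyIndex/update"

# Hypothetical helper: prints the curl call that posts the XML file named in $1.
post_xml_cmd() {
    printf "curl '%s' -H 'Content-Type: text/xml' --data-binary @%s\n" "$SOLR_UPDATE_URL" "$1"
}

# Command that sends the test document:
post_xml_cmd silverstripe-solr-test.xml

# Command that sends an explicit commit afterwards:
printf "curl '%s' -H 'Content-Type: text/xml' --data-binary '<commit/>'\n" "$SOLR_UPDATE_URL"
```

`post.jar` normally issues a commit itself. With `curl` the commit is a separate request, and posted documents are not searchable until it has been sent.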