diff --git a/_config/config.yml b/_config/config.yml index 11eea88..50ea2b3 100644 --- a/_config/config.yml +++ b/_config/config.yml @@ -4,6 +4,3 @@ Name: fulltextsearchconfig SilverStripe\ORM\DataObject: extensions: - SilverStripe\FullTextSearch\Search\Extensions\SearchUpdater_ObjectHandler -SilverStripe\CMS\Controllers\ContentController: - extensions: - - SilverStripe\FullTextSearch\Solr\Control\ContentControllerExtension diff --git a/bin/fts_quickstart b/bin/fts_quickstart index 376cc77..b179d2d 100644 --- a/bin/fts_quickstart +++ b/bin/fts_quickstart @@ -46,17 +46,16 @@ else exit 1 fi -# Check to see if it has been enabled in _config.php -grep -i "FulltextSearchable::enable(" "$APPDIR/_config.php" 2> /dev/null +# Check to see if it has been configured in _config.php +grep -i "Solr::configure_server(" "$APPDIR/_config.php" 2> /dev/null if [ "$?" != 0 ]; then - echo "Enabling FulltextSearchable in _config.php..." + echo "Configuring Solr in _config.php..." if [ ! -f "$APPDIR/_config.php" ]; then echo " "$APPDIR/_config.php" echo "" >> "$APPDIR/_config.php" fi echo "" >> "$APPDIR/_config.php" echo "# Enable Fulltextsearch" >> "$APPDIR/_config.php" - echo "\\SilverStripe\\ORM\\Search\\FulltextSearchable::enable();" >> "$APPDIR/_config.php" >> "$APPDIR/_config.php" echo "\\SilverStripe\\FullTextSearch\\Solr\\Solr::configure_server([" >> "$APPDIR/_config.php" echo " 'indexstore' => [" >> "$APPDIR/_config.php" echo " 'mode' => 'file'," >> "$APPDIR/_config.php" diff --git a/docs/en/00_index.md b/docs/en/00_index.md index ed9d14b..c81efc8 100644 --- a/docs/en/00_index.md +++ b/docs/en/00_index.md @@ -6,7 +6,6 @@ - Setup - [Requirements](02_setup.md#requirements) - [Installing Solr](02_setup.md#installing-solr) - - [Installing this module](02_setup.md#installing-the-module) - [Solr admin](02_setup.md#solr-admin) - Configuration - [Solr server parameters](03_configuration.md#solr-server-parameters) @@ -20,10 +19,11 @@ - [Facets](04_advanced_configuration.md#facets) - 
[Using multiple indexes](04_advanced_configuration.md#multiple-indexes) - [Synonyms](04_advanced_configuration.md#synonyms) - - [Spellcheck](04_advanced_configuration.md#spell-check) + - [Spellcheck](04_advanced_configuration.md#spell-check-("did-you-mean...")) + - [Highlighting](04_advanced_configuration.md#highlighting) - [Boosting](04_advanced_configuration.md#boosting) - [Indexing related objects](04_advanced_configuration.md#indexing-related-objects) - [Subsites](04_advanced_configuration.md#subsites) - - [Adding new fields](04_advanced_configuration.md#adding-new-fields) + - [Custom field types](04_advanced_configuration.md#custom-field-types) - Troubleshooting - [Gotchas](05_troubleshooting.md#common-gotchas) diff --git a/docs/en/01_getting_started.md b/docs/en/01_getting_started.md index 26e6fe6..eb05674 100644 --- a/docs/en/01_getting_started.md +++ b/docs/en/01_getting_started.md @@ -22,17 +22,17 @@ fulltext searching as an extension of the object model. However, the disconnect design and the object model meant that searching was inefficient. The abstraction would also often break and it was hard to then figure out what was going on. -This module instead provides the ability to define those indexes and queries in PHP. The indexes are defined as a mapping -between the SilverStripe object model and the connector-specific fulltext engine index model. This module then interrogates model metadata -to build the specific index definition. +This module instead provides the ability to define those indexes and queries in PHP. The indexes are defined as a +mapping between the SilverStripe object model and the connector-specific fulltext engine index model. This module then +interrogates model metadata to build the specific index definition. -It also hooks into SilverStripe framework in order to update the indexes when the models change and connectors then convert those index and query definitions -into fulltext engine specific code. 
+It also hooks into SilverStripe framework in order to update the indexes when the models change and connectors then +convert those index and query definitions into fulltext engine specific code. The intent of this module is not to make changing fulltext search engines seamless. Where possible this module provides common interfaces to fulltext engine functionality, abstracting out common behaviour. However, each connector also -offers its own extensions, and there is some behaviour (such as getting the fulltext search engines installed, configured -and running) that each connector deals with itself, in a way best suited to that search engine's design. +offers its own extensions, and there is some behaviour (such as getting the fulltext search engines installed, +configured and running) that each connector deals with itself, in a way best suited to that search engine's design. ## Quick start @@ -48,6 +48,10 @@ This will: - Install Solr 4 - Set up a daemon to run Solr on startup - Start Solr -- Enable `FulltextSearchable` in your `_config.php` (and create one if you don't have one) +- Configure Solr in your `_config.php` (and create one if you don't have one) +- Create a DefaultIndex +- Run a [Solr Configure](03_configuration.md#solr-configure) and a [Solr Reindex](03_configuration.md#solr-reindex) -The simply adding `$SearchForm` to a template and flushing the template cache should add a search text box to your site. +You'll then need to build a search form and results display that suits the functionality of your site. + +// TODO update me when https://github.com/silverstripe/silverstripe-fulltextsearch/pull/216 is merged diff --git a/docs/en/02_setup.md b/docs/en/02_setup.md index 8b16bdf..49222d0 100644 --- a/docs/en/02_setup.md +++ b/docs/en/02_setup.md @@ -1,28 +1,34 @@ # Setup -The fulltextsearch module includes support for connecting to Solr. +The FulltextSearch module includes support for connecting to Solr. -It works with Solr in multi-core mode. 
It needs to be able to update Solr configuration files, and has modes for doing this by direct file access (when Solr shares a server with SilverStripe) and by WebDAV (when it's on a different server). +It works with Solr in multi-core mode. It needs to be able to update Solr configuration files, and has modes for doing +so by direct file access (when Solr shares a server with SilverStripe) and by WebDAV (when it's on a different server). -See the helpful [Solr Tutorial](http://lucene.apache.org/solr/4_5_1/tutorial.html), for more on cores -and querying. +See the helpful [Solr Tutorial](http://lucene.apache.org/solr/4_5_1/tutorial.html), for more on cores and querying. ## Requirements Since Solr is Java based, it requires Java 1.5 or greater installed. When you're installing it yourself, it also requires a servlet container such as Tomcat, Jetty, or Resin. For -development testing there is a standalone version that comes bundled with Jetty (see [Installing Solr](#installing-solr) below). +development testing there is a standalone version that comes bundled with Jetty (see [Installing Solr](#installing-solr) + below). See the official [Solr installation docs](http://wiki.apache.org/solr/SolrInstall) for more information. -Note that these requirements are for the Solr server environment, which doesn't have to be the same physical machine as the SilverStripe webhost. +Note that these requirements are for the Solr server environment, which doesn't have to be the same physical machine as +the SilverStripe webhost. ## Installing Solr ### Local installation -If you'll be running Solr on the same machine as your SilverStripe installation, you can use the [silverstripe/fulltextsearch-localsolr module](https://github.com/silverstripe-archive/silverstripe-fulltextsearch-localsolr). This can also be useful as a development dependency. 
You can bring it in via composer (use `require-dev` if you plan to use install Solr remotely in Production): +If you'll be running Solr on the same machine as your SilverStripe installation, and the +[quick start script](01_getting_started.md#quick-start) doesn't suit your needs, you can use the +[fulltextsearch-localsolr module](https://github.com/silverstripe-archive/silverstripe-fulltextsearch-localsolr). This +can also be useful as a development dependency. You can bring it in via composer (use `require-dev` if you plan to +install Solr remotely in Production): ```bash composer require silverstripe/fulltextsearch-localsolr @@ -35,7 +41,8 @@ cd fulltextsearch-localsolr/server java -jar start.jar ``` -Then configure Solr to use `file` more with the following configuration in your `app/_config.php`, making sure that the `path` directory is writeable by the user that started the server (above): +Then configure the module to use `file` mode with the following configuration in your `app/_config.php`, making sure +that the `path` directory is writeable by the user that started the server (above): ```php use SilverStripe\FullTextSearch\Solr\Solr; @@ -51,10 +58,35 @@ Solr::configure_server([ ### Remote installation +Alternatively, it can be beneficial to keep the Solr service contained on its own infrastructure, for performance and +security reasons. The [Common Web Platform (CWP)](https://www.cwp.govt.nz) uses Solr in this manner. To do so, you should +install the dependencies on the remote server, and then configure the module to use the `webdav` mode like so: +```php +use SilverStripe\FullTextSearch\Solr\Solr; -## Installing the module +Solr::configure_server([ + 'host' => 'remotesolrserver.com', // IP address or hostname + 'indexstore' => [ + 'mode' => 'webdav', + 'path' => BASE_PATH . '/webdav', + ] +]); +``` +Check all the available [configuration options](03_configuration.md#solr-server-parameters) to fine-tune the module to +work with your desired setup. 
+This will mean that all configuration files, and the indexes themselves, are stored remotely. ## Solr admin + +Solr provides an administration interface with a GUI to allow you to get at the finer details of your cores and +configuration. You can access it at example.com://#/ on a local installation +(usually example.com:8983/solr/#/). + +There you can access logging, run raw queries against your stored indexes, and get some basic performance metrics. +Additionally, you can perform more drastic changes, such as dropping and reloading cores. + +For a comprehensive look at the Solr admin interface, read the +[user guide for Solr 4.10](http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf#page=17) diff --git a/docs/en/03_configuration.md b/docs/en/03_configuration.md index 0330b0e..718274f 100644 --- a/docs/en/03_configuration.md +++ b/docs/en/03_configuration.md @@ -79,15 +79,15 @@ class MyIndex extends SolrIndex You can also skip listing all searchable fields, and have the index figure it out automatically via `addAllFulltextFields()`. This will add any database fields that are `instanceof DBString` to the index. Use this with caution, however, as you may inadvertently return sensitive information - it is often safer to declare your fields explicitly. -Once you've added this file, make sure you run a [Solr configure](#dev_tasks) to set up your new index. +Once you've added this file, make sure you run a [Solr configure](#solr-configure) to set up your new index. ## Adding data to an index -Once you have [created your index](./30_creating_an_index.md), you can add data to it in a number of ways. +Once you have [created your index](#creating-an-index), you can add data to it in a number of ways. ### Reindex the site -Running the [Solr reindex task](./33_dev_tasks.md) will crawl your site for classes that match those defined on your index, and add the defined fields to the index for searching. 
This is the most common method used to build the index the first time, or to perform a full rebuild of the index. +Running the [Solr reindex task](#solr-reindex) will crawl your site for classes that match those defined on your index, and add the defined fields to the index for searching. This is the most common method used to build the index the first time, or to perform a full rebuild of the index. ### Publish a page in the CMS @@ -177,7 +177,7 @@ $query = SearchQuery::create() ->addSearchTerm('fire'); ``` -You can also limit this to specific fields by passing an array as the second argument: +You can also limit this to specific fields by passing an array as the second argument, specified in the form of `{table}_{field}`: ```php use SilverStripe\FullTextSearch\Search\Queries\SearchQuery; @@ -226,7 +226,7 @@ $query = SearchQuery::create() ->addSearchTerm('fire') // Only include documents edited in 2011 or earlier ->addFilter(Page::class . '_LastEdited', SearchQuery_Range::create(null, '2011-12-31T23:59:59Z')); -$results = singleton(MyIndex::class)->search($query); +$results = MyIndex::singleton()->search($query); ``` Note: At the moment, the date format is specific to the search implementation. @@ -247,7 +247,7 @@ $query = SearchQuery::create() ->addSearchTerm('fire'); // Needs a value, although it can be false ->addFilter(Page::class . 
'_ShowInMenus', SearchQuery::$present); -$results = singleton(MyIndex::class)->search($query); +$results = MyIndex::singleton()->search($query); ``` ### Querying an index @@ -259,7 +259,7 @@ use SilverStripe\FullTextSearch\Search\Queries\SearchQuery; use My\Namespace\Index\MyIndex; $query = SearchQuery::create()->addSearchTerm('fire'); -$results = singleton(MyIndex::class)->search($query); +$results = MyIndex::singleton()->search($query); ``` The return value of a `search()` call is an object which contains a few properties: @@ -279,7 +279,13 @@ It is often a good idea to run a configure, followed by a reindex, after a code `dev/tasks/Solr_Configure` -This task will upload configuration to the Solr core, reloading it or creating it as necessary. This should be run after every code change to your indexes, or configuration changes. +This task will upload configuration to the Solr core, reloading it or creating it as necessary, and generate the schema. This should be run after every code change to your indexes, or after any configuration changes. This will convert the PHP-based abstraction layer into actual Solr XML. Assuming default configuration and the use of the `DefaultIndex`, it will: + +- create the directory `BASE_PATH/.solr/DefaultIndex/` if it doesn't already exist +- copy configuration files from `vendor/silverstripe/fulltextsearch/conf/extras` to `BASE_PATH/.solr/DefaultIndex/conf/` +- generate a `schema.xml` in `BASE_PATH/.solr/DefaultIndex/conf/` + +This task will overwrite these files every time it is run. ### Solr reindex @@ -289,8 +295,23 @@ This task performs a reindex, which adds all the data specified in the index def If you have the [Queued Jobs module](https://github.com/symbiote/silverstripe-queuedjobs/) installed, then this task will create multiple reindex jobs that are processed asynchronously; unless you are in `dev` mode, in which case the index will be processed immediately (see [processor.yml](/_config/processor.yml)). 
Otherwise, it will run in one process. Often, if you are running it via the web, the request will time out. Usually this means the actually process is still running in the background, but it can be alarming to the user, so bear that in mind. +Internally, records are processed in groups of 200. You can configure this group size using the `Solr_Reindex.recordsPerRequest` config: + +```yaml +SilverStripe\FullTextSearch\Solr\Tasks\Solr_Reindex: + recordsPerRequest: 150 +``` + +The Solr indexes will be stored as binary files inside your SilverStripe project. You can also copy the `thirdparty/` Solr directory somewhere else; just set the `path` value in `mysite/_config.php` to point to the new location. + ## File-based configuration +Many aspects of Solr are configured outside of the `schema.xml` file which SilverStripe generates based on the `SolrIndex` subclass that is defined. For example, stopwords are placed in their own `stopwords.txt` file, and advanced [spellchecking](04_advanced_configuration.md#spell-check-("did-you-mean...")) can be configured in `solrconfig.xml`. + +By default, these files are copied from the `fulltextsearch/conf/extras/` directory over to the new index location. In order to use your own files, copy these files into a location of your choosing (for example `mysite/data/solr/`), and tell Solr to use this folder with the `extraspath` [configuration setting](#solr-server-parameters). Run a [`Solr_Configure`](#solr-configure) to apply these changes. + +You can also define these on an index-by-index basis by defining `SolrIndex->getExtrasPath()`. + ## Handling results In order to render search results, you need to return them from a controller. You can also drive this through a form response through standard SilverStripe forms. In this case we simply assume there's a GET parameter named `q` with a search term present. 
@@ -311,7 +332,7 @@ class PageController extends ContentController { $query = SearchQuery::create()->addSearchTerm($request->getVar('q')); return $this->renderWith([ - 'SearchResult' => singleton(MyIndex::class)->search($query) + 'SearchResult' => MyIndex::singleton()->search($query) ]); } } diff --git a/docs/en/04_advanced_configuration.md b/docs/en/04_advanced_configuration.md index eb74e51..a0cd754 100644 --- a/docs/en/04_advanced_configuration.md +++ b/docs/en/04_advanced_configuration.md @@ -2,6 +2,85 @@ ## Facets +Inside the `SolrIndex->search()` function, the third-party library solr-php-client is used to send data to Solr and parse the response. Additional information can be pulled from this response and added to your results object for use in templates using the `updateSearchResults()` extension hook. + +```php +use My\Namespace\Index\MyIndex; +use SilverStripe\FullTextSearch\Search\Queries\SearchQuery; + +$index = MyIndex::singleton(); +$query = SearchQuery::create() + ->addSearchTerm('My Term'); +$params = [ + 'facet' => 'true', + 'facet.field' => 'SiteTree_ClassName', +]; +$results = $index->search($query, -1, -1, $params); +``` + +By adding facet fields into the query parameters, our response object from Solr now contains some additional information that we can add into the results sent to the page. + +```php +namespace My\Namespace\Extension; + +use SilverStripe\Core\Extension; +use SilverStripe\View\ArrayData; +use SilverStripe\ORM\ArrayList; + +class FacetedResultsExtension extends Extension +{ + /** + * Adds extra information from the solr-php-client response + * into our search results. + * @param ArrayData $results The ArrayData that will be used to generate search + * results pages. + * @param stdClass $response The solr-php-client response object. 
+ */ + public function updateSearchResults($results, $response) + { + if (!isset($response->facet_counts) || !isset($response->facet_counts->facet_fields)) { + return; + } + $facetCounts = ArrayList::create([]); + foreach($response->facet_counts->facet_fields as $name => $facets) { + $facetDetails = ArrayData::create([ + 'Name' => $name, + 'Facets' => ArrayList::create([]), + ]); + + foreach($facets as $facetName => $facetCount) { + $facetDetails->Facets->push(ArrayData::create([ + 'Name' => $facetName, + 'Count' => $facetCount, + ])); + } + $facetCounts->push($facetDetails); + } + $results->setField('FacetCounts', $facetCounts); + } +} +``` + +And then apply the extension to your index via `yaml`: + +```yaml +My\Namespace\Index\MyIndex: + extensions: + - My\Namespace\Extension\FacetedResultsExtension +``` + +We can now access the facet information inside our templates like so: + +```silverstripe +<% if $Results.FacetCounts %> + <% loop $Results.FacetCounts %> + <% loop $Facets %> +

$Name: $Count

+ <% end_loop %> + <% end_loop %> +<% end_if %> +``` + ## Multiple indexes Multiple indexes can be created and searched independently, but if you wish to override an existing @@ -37,16 +116,119 @@ SilverStripe\FullTextSearch\Search\FullTextSearch: ## Synonyms -## Spell check +## Spell check ("Did you mean...") + +Solr has various spell checking strategies (see the ["SpellCheckComponent" docs](http://wiki.apache.org/solr/SpellCheckComponent)), all of which are configured through `solrconfig.xml`. +In the default config which is copied into your index, spell checking data is collected from all fulltext fields +(everything you added through `SolrIndex->addFulltextField()`). The values of these fields are collected in a special `_text` field. + +```php +use My\Namespace\Index\MyIndex; +use SilverStripe\FullTextSearch\Search\Queries\SearchQuery; + +$index = MyIndex::singleton(); +$query = SearchQuery::create() + ->addSearchTerm('My Term'); +$params = [ + 'spellcheck' => 'true', + 'spellcheck.collate' => 'true', +]; +$results = $index->search($query, -1, -1, $params); +$results->spellcheck; +``` + +The built-in `_text` data is better than nothing, but also has some problems: it's heavily processed, for example by +stemming filters which butcher words. So misspelling "Govnernance" will suggest "govern" rather than "Governance". +This can be fixed by aggregating spell checking data in a separate field. + +```php +use SilverStripe\CMS\Model\SiteTree; +use SilverStripe\FullTextSearch\Solr\SolrIndex; + +class MyIndex extends SolrIndex +{ + public function init() + { + $this->addCopyField(SiteTree::class . '_Title', 'spellcheckData'); + $this->addCopyField(SomeModel::class . '_Title', 'spellcheckData'); + $this->addCopyField(SiteTree::class . '_Content', 'spellcheckData'); + $this->addCopyField(SomeModel::class . 
'_Content', 'spellcheckData'); + } + + public function getFieldDefinitions() + { + $xml = parent::getFieldDefinitions(); + + $xml .= "\n\n\t\t"; + $xml .= "\n\t\t"; + + return $xml; + } +} +``` + +Now you need to tell Solr to use our new field for gathering spelling data. In order to customise the spell checking configuration, +create your own `solrconfig.xml` (see [File-based configuration](03_configuration.md#file-based-configuration)). In there, change the following directive: + +```xml + + spellcheckData + +``` + +Copy the new configuration via the [`Solr_Configure` task](03_configuration.md#solr-configure), and reindex your data before using the spell checker. + +## Highlighting + +Solr can highlight the searched terms in context of the matched content, to help users determine the relevancy of results (e.g. in which part of a sentence the term is used). In order to use this feature, the full content of the field to be highlighted needs to be stored in the index, +by declaring it through `addStoredField()`: + +```php +use SilverStripe\FullTextSearch\Solr\SolrIndex; + +class MyIndex extends SolrIndex +{ + public function init() + { + $this->addClass(Page::class); + $this->addAllFulltextFields(); + $this->addStoredField('Content'); + } +} +``` + +To search with highlighting enabled, you need to pass in a custom query parameter. +There are many more parameters available for tweaking results, detailed in the [Solr reference guide](https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf#page=270). + +```php +use My\Namespace\Index\MyIndex; +use SilverStripe\FullTextSearch\Search\Queries\SearchQuery; + +$index = MyIndex::singleton(); +$query = SearchQuery::create() + ->addSearchTerm('My Term'); +$params = [ + 'hl' => 'true', +]; +$results = $index->search($query, -1, -1, $params); +``` + +Each result will automatically contain an `Excerpt` property which you can use in your own results template. 
The searched term is highlighted with an `<em>` tag by default. + +> Note: It is recommended to strip out all HTML tags and convert entities on the indexed content, +to avoid matching HTML attributes, and cluttering highlighted content with unparsed HTML. ## Boosting/Weighting - Results aren't all created equal. Matches in some fields are more important - than others; for example, a page `Title` might be considered more relevant to the user than terms in the `Content` field. + Results aren't all created equal. Matches in some fields are more important than others; for example, a page `Title` might be considered more relevant to the user than terms in the `Content` field. - To account for this, a "weighting" (or "boosting") factor can be applied to each searched field. The default value is `1.0`, anything below that will decrease the relevance, anything above increases it. + To account for this, a "weighting" (or "boosting") factor can be applied to each searched field. The default value is `1.0`; anything below that will decrease the relevance, anything above increases it. You can get more information on relevancy at the [Solr wiki](http://wiki.apache.org/solr/SolrRelevancyFAQ). - To adjust the relative values, pass them in as the third argument to your `addSearchTerm()` call: +You can manage the boosting in two ways: + +### Boosting on query + + To adjust the relative values at the time of querying, pass them in as the third argument to your `addSearchTerm()` call: ```php use My\Namespace\Index\MyIndex; @@ -63,13 +245,98 @@ SilverStripe\FullTextSearch\Search\FullTextSearch: Page::class . '_SecretParagraph' => 0.1, ] ); - $results = singleton(MyIndex::class)->search($query); + $results = MyIndex::singleton()->search($query); ``` This will ensure that `Title` is given higher priority for matches than `Content`, which is well above `SecretParagraph`. + +### Boosting on index + +Boost values for specific fields can also be specified directly on the `SolrIndex` class. 
+ +The following methods can be used to set one or more boosted fields: + +* `addBoostedField()` - adds a field with a specific boosted value (defaults to 2) +* `setFieldBoosting()` - if a field has already been added to an index, the boosting + value can be customised, changed, or reset for a single field. +* `addFulltextField()` - a boost can be set for a field using the `$extraOptions` parameter +with the key `boost` assigned to the desired value: + +```php +use SilverStripe\CMS\Model\SiteTree; +use SilverStripe\FullTextSearch\Solr\SolrIndex; + +class SolrSearchIndex extends SolrIndex +{ + public function init() + { + $this->addClass(SiteTree::class); + + // The following methods would all add the same boost of 1.5 to "Title" + $this->addBoostedField('Title', null, [], 1.5); + + $this->addFulltextField('Title', null, [ + 'boost' => 1.5, + ]); + + $this->addFulltextField('Title'); + $this->setFieldBoosting(SiteTree::class . '_Title', 1.5); + } +} +``` ## Indexing related objects ## Subsites -## Adding new fields +## Custom field types + +Solr supports custom field type definitions which are written to its XML schema. Many standard ones are already included + in the default schema. As the XML file is generated dynamically, we can add our own types by overloading the template + responsible for it: `types.ss`. + +In the following example, we read our type definitions from a new file `mysite/solr/templates/types.ss` instead: + +```php +use SilverStripe\Control\Director; +use SilverStripe\FullTextSearch\Solr\SolrIndex; + +class MyIndex extends SolrIndex +{ + public function getTypes() + { + return $this->renderWith(Director::baseFolder() . '/mysite/solr/templates/types.ss'); + } +} +``` + +It's usually best to start with the existing definitions, and adjust from there. You can both add your own types and adjust the behaviour of existing definitions. 
+ +### Perform filtering on index + +One thing you can achieve with this is moving synonym filtering from query time to index time. To do this, you'd take + +```xml + +``` + +from inside the `` block and move it to the `` block. This can be advantageous as Solr does a better job of processing synonyms at index time; however, it does mean that any change requires a full Reindex, which - depending on the size of your site - could be overkill. See [this article](https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/) for a good breakdown. + +### Searching for words containing numbers + +By default, the module is configured to split words containing numbers into multiple tokens. For example, the word "A1" would be interpreted as "A" "1", and since "a" is a common stopword, the term "A1" will be excluded from search. + +To allow searches on words containing numeric tokens, you'll need to change the behaviour of the `WordDelimiterFilterFactory` with an overloaded template as described above. Each instance of `` needs to include the following attributes and values: + +- add `splitOnNumerics="0"` on all `WordDelimiterFilterFactory` fields +- change `catenateNumbers="1"` to `catenateNumbers="0"` on all `WordDelimiterFilterFactory` fields + +### Searching for macrons and other Unicode characters + +The `ASCIIFoldingFilterFactory` filter converts alphabetic, numeric, and symbolic Unicode characters which are not in the Basic Latin Unicode block (the first 128 ASCII characters) to their ASCII equivalents, where these exist. + +Find the fields in your overloaded `types.ss` that you want to enable this behaviour in (for example inside the `` block), and add the following to both its index analyzer and query analyzer records. 
+ +```xml + +``` diff --git a/docs/en/Solr.md b/docs/en/Solr.md index dd73f0d..da082aa 100644 --- a/docs/en/Solr.md +++ b/docs/en/Solr.md @@ -1,458 +1,3 @@ -# Solr connector for SilverStripe fulltextsearch module - - -All possible parameters incl optional ones with example values: - - - -## Configuration - -### Create an index - -```php -// File: mysite/code/MyIndex.php: -use SilverStripe\FullTextSearch\Solr\SolrIndex; - -class MyIndex extends SolrIndex -{ - public function init() - { - $this->addClass(Page::class); - $this->addAllFulltextFields(); - } -} -``` - -### Create the index schema - -The PHP-based index definition is an abstraction layer for the actual Solr XML configuration. -In order to create or update it, you need to run the `Solr_Configure` task. - -``` -vendor/bin/sake dev/tasks/Solr_Configure -``` - -Based on the sample configuration above, this command will do the following: - -- Create a `/.solr/MyIndex` folder -- Copy configuration files from `vendor/silverstripe/fulltextsearch/conf/extras/` to `/.solr/MyIndex/conf` -- Generate a `schema.xml`, and place it it in `/.solr/MyIndex/conf` - -If you call the task with an existing index folder, -it will overwrite all files from their default locations, -regenerate the `schema.xml`, and ask Solr to reload the configuration. - -You can use the same command for updating an existing schema, -which will automatically apply without requiring a Solr server restart. - -### Reindex - -After configuring Solr, you have the option to add your existing -content to its indices. Run the following command: - -``` -vendor/bin/sake dev/tasks/Solr_Reindex -``` - -This will delete and rebuild all indices. Depending on your data, -this can take anywhere from minutes to hours. -Keep in mind that the normal mode of updating indices is -based on ORM manipulations of the underlying data. -For example, calling `$myPage->write()` will automatically -update the index entry for this record (and all its variants). 
- -This task has the following options: - -- `verbose`: Debug information - -Internally, depending on what job processing backend you have configured (such as queuedjobs) -individual tasks for re-indexing groups of records may either be performed behind the scenes -as crontasks, or via separate processes initiated by the current request. - -Internally groups of records are grouped into sizes of 200. You can configure this -group sizing by using the `Solr_Reindex.recordsPerRequest` config. - -```yaml -SilverStripe\FullTextSearch\Solr\Tasks\Solr_Reindex: - recordsPerRequest: 150 -``` - -Note: The Solr indexes will be stored as binary files inside your SilverStripe project. -You can also copy the `thirdparty/` solr directory somewhere else, -just set the `path` value in `mysite/_config.php` to point to the new location. - -You can also run the reindex task through a web request. -By default, the web request won't receive any feedback while its running. -Depending on your PHP and web server configuration, -the web request itself might time out, but the reindex continues anyway. -This is possible because the actual index operations are run as separate -PHP sub-processes inside the main web request. - -### File-based configuration (solrconfig.xml etc) - -Many aspects of Solr are configured outside of the `schema.xml` file -which SilverStripe generates based on the index PHP file. -For example, stopwords are placed in their own `stopwords.txt` file, -and spell checks are configured in `solrconfig.xml`. - -By default, these files are copied from the `fulltextsearch/conf/extras/` -directory over to the new index location. In order to use your own files, -copy these files into a location of your choosing (for example `mysite/data/solr/`), -and tell Solr to use this folder with the `extraspath` configuration setting. - -```php -// mysite/_config.php -use SilverStripe\Control\Director; -use SilverStripe\FullTextSearch\Solr\Solr; - -Solr::configure_server([ - // ... 
- 'extraspath' => Director::baseFolder() . '/mysite/data/solr/', -]); -``` - -Please run the `Solr_Configure` task for the changes to take effect. - -Note: You can also define those on an index-by-index basis by -implementing `SolrIndex->getExtrasPath()`. - -### Custom Types - -Solr supports custom field type definitions which are written to its XML schema. -Many standard ones are already included in the default schema. -As the XML file is generated dynamically, we can add our own types -by overloading the template responsible for it: `types.ss`. - -In the following example, we read out type definitions -from a new file `mysite/solr/templates/types.ss` instead: - -```php -use SilverStripe\Control\Director; -use SilverStripe\FullTextSearch\Solr\SolrIndex; - -class MyIndex extends SolrIndex -{ - public function getTypes() - { - return $this->renderWith(Director::baseFolder() . '/mysite/solr/templates/types.ss'); - } -} -``` - -#### Searching for words containing numbers - -By default, the fulltextmodule is configured to split words containing numbers into multiple tokens. For example, the word "A1" would be interpreted as "A" "1"; since "a" is a common stopword, the term "A1" will be excluded from search. - -To allow searches on words containing numeric tokens, you'll need to update your overloaded template to change the behaviour of the WordDelimiterFilterFactory. Each instance of `` needs to include the following attributes and values: - -* add splitOnNumerics="0" on all WordDelimiterFilterFactory fields -* change catenateOnNumbers="1" on all WordDelimiterFilterFactory fields - -Update your index to point to your overloaded template using the method described above. - -#### Searching for macrons and other Unicode characters - -The "ASCIIFoldingFilterFactory" filter converts alphabetic, numeric, and symbolic Unicode characters which are not in the Basic Latin Unicode block (the first 127 ASCII characters) to their ASCII equivalents, if one exists. 
- -Find the fields in your overloaded `types.ss` that you want to enable this behaviour in. EG: - -```xml - -``` - -Add the following to both its index analyzer and query analyzer records. - -```xml - -``` - -Update your index to point to your overloaded template using the method described above. - -### Spell Checking ("Did you mean...") - -Solr has various spell checking strategies (see the ["SpellCheckComponent" docs](http://wiki.apache.org/solr/SpellCheckComponent)), all of which are configured through `solrconfig.xml`. -In the default config which is copied into your index, -spell checking data is collected from all fulltext fields -(everything you added through `SolrIndex->addFulltextField()`). -The values of these fields are collected in a special `_text` field. - -```php -use SilverStripe\FullTextSearch\Search\Queries; - -$index = new MyIndex(); -$query = new SearchQuery(); -$query->addSearchTerm('My Term'); -$params = [ - 'spellcheck' => 'true', - 'spellcheck.collate' => 'true', -]; -$results = $index->search($query, -1, -1, $params); -$results->spellcheck; -``` - -The built-in `_text` data is better than nothing, but also has some problems: -Its heavily processed, for example by stemming filters which butcher words. -So misspelling "Govnernance" will suggest "govern" rather than "Governance". -This can be fixed by aggregating spell checking data in a separate - -```php -use SilverStripe\CMS\Model\SiteTree; -use SilverStripe\FullTextSearch\Solr\SolrIndex; - -class MyIndex extends SolrIndex -{ - public function init() - { - // ... - $this->addCopyField(SiteTree::class . '_Title', 'spellcheckData'); - $this->addCopyField(SomeModel::class . '_Title', 'spellcheckData'); - $this->addCopyField(SiteTree::class . '_Content', 'spellcheckData'); - $this->addCopyField(SomeModel::class . '_Content', 'spellcheckData'); - } - - // ... 
- public function getFieldDefinitions() - { - $xml = parent::getFieldDefinitions(); - - $xml .= "\n\n\t\t"; - $xml .= "\n\t\t"; - - return $xml; - } -} -``` - -Now you need to tell solr to use our new field for gathering spelling data. -In order to customize the spell checking configuration, -create your own `solrconfig.xml` (see "File-based configuration"). -In there, change the following directive: - -```xml - - - - spellcheckData - -``` - -Don't forget to copy the new configuration via a call to the `Solr_Configure` -task, and reindex your data before using the spell checker. - -### Limiting search fields - -Solr has a way of specifying which fields to search on. You specify these -fields as a parameter to `SearchQuery`. - -In the following example, we're telling Solr to *only* search the -`Title` and `Content` fields. Note that the fields must be specified in -the search parameters as "composite fields", which means they should be -specified in the form of `{table}_{field}`. - -These fields are defined in the schema.xml file that gets sent to Solr. - -```php -use SilverStripe\CMS\Model\SiteTree; -use SilverStripe\FullTextSearch\Search\Queries\SearchQuery; - -$query = new SearchQuery(); -$query->addClassFilter(Page::class); -$query->addSearchTerm('someterms', [SiteTree::class . '_Title', SiteTree::class . '_Content']); -$result = singleton(SolrSearchIndex::class)->search($query, -1, -1); - -// the request to Solr would be: -// q=(SiteTree_Title:Lorem+OR+SiteTree_Content:Lorem) -``` - -### Configuring boosts - -There are several ways in which you can configure boosting on search fields or terms. - -#### Boosting on search query - -Solr has a way of specifying which fields should be boosted as a parameter to `SearchQuery`. - -This means if you boost a certain field, search query matches on that field will be considered -higher relevance than other fields with matches, and therefore those results will be closer -to the top of the results. 
- -In this example, we enter "Lorem" as the search term, and boost the `Content` field: - -```php -use SilverStripe\CMS\Model\SiteTree; -use SilverStripe\FullTextSearch\Search\Queries\SearchQuery; - -$query = new SearchQuery(); -$query->addClassFilter(Page::class); -$query->addSearchTerm('Lorem', null, [SiteTree::class . '_Content' => 2]); -$result = singleton(SolrSearchIndex::class)->search($query, -1, -1); - -// the request to Solr would be: -// q=SiteTree_Content:Lorem^2 -``` - -More information on [relevancy on the Solr wiki](http://wiki.apache.org/solr/SolrRelevancyFAQ). - -### Boosting on index fields - -Boost values for specific can also be specified directly on the `SolrIndex` class directly. - -The following methods can be used to set one or more boosted fields: - -* `SolrIndex::addBoostedField` Adds a field with a specific boosted value (defaults to 2) -* `SolrIndex::setFieldBoosting` If a field has already been added to an index, the boosting - value can be customised, changed, or reset for a single field. -* `SolrIndex::addFulltextField` A boost can be set for a field using the `$extraOptions` parameter -with the key `boost` assigned to the desired value. - -For example: - -```php -use SilverStripe\CMS\Model\SiteTree; -use SilverStripe\FullTextSearch\Solr\SolrIndex; - -class SolrSearchIndex extends SolrIndex -{ - public function init() - { - $this->addClass(SiteTree::class); - $this->addAllFulltextFields(); - $this->addFilterField('ShowInSearch'); - $this->addBoostedField('Title', null, [], 1.5); - $this->setFieldBoosting(SiteTree::class . '_SearchBoost', 2); - } - -} -``` - -### Custom Types - -Solr supports custom field type definitions which are written to its XML schema. -Many standard ones are already included in the default schema. -As the XML file is generated dynamically, we can add our own types -by overloading the template responsible for it: `types.ss`. 
- -In the following example, we read out type definitions -from a new file `mysite/solr/templates/types.ss` instead: - -```php -use SilverStripe\Control\Director; -use SilverStripe\FullTextSearch\Solr\SolrIndex; - -class MyIndex extends SolrIndex -{ - public function getTemplatesPath() - { - return Director::baseFolder() . '/mysite/solr/templates/'; - } -} -``` - -### Highlighting - -Solr can highlight the searched terms in context of the matched content, -to help users determine the relevancy of results (e.g. in which part of a sentence -the term is used). In order to use this feature, the full content of the -field to be highlighted needs to be stored in the index, -by declaring it through `addStoredField()`. - -```php -use SilverStripe\FullTextSearch\Solr\SolrIndex; - -class MyIndex extends SolrIndex -{ - public function init() - { - $this->addClass(Page::class); - $this->addAllFulltextFields(); - $this->addStoredField('Content'); - } -} -``` - -To search with highlighting enabled, you need to pass in a custom query parameter. -There's a lot more parameters to tweak results on the [Solr Wiki](http://wiki.apache.org/solr/HighlightingParameters). - -```php -use SilverStripe\FullTextSearch\Search\Queries\SearchQuery; - -$index = new MyIndex(); -$query = new SearchQuery(); -$query->addSearchTerm('My Term'); -$results = $index->search($query, -1, -1, ['hl' => 'true']); -``` - -Each result will automatically contain an "Excerpt" property -which you can use in your own results template. -The searched term is highlighted with an `` tag by default. - -Note: It is recommended to strip out all HTML tags and convert entities on the indexed content, -to avoid matching HTML attributes, and cluttering highlighted content with unparsed HTML. - -### Adding additional information into search results - -Inside the SolrIndex::search() function, the third-party library solr-php-client -is used to send data to Solr and parse the response. 
Additional information can -be pulled from this response and added to your results object for use in templates -using the `updateSearchResults()` extension hook. - -```php -use SilverStripe\FullTextSearch\Search\Queries\SearchQuery; - -$index = new MyIndex(); -$query = new SearchQuery(); -$query->addSearchTerm('My Term'); -$results = $index->search($query, -1, -1, [ - 'facet' => 'true', - 'facet.field' => 'SiteTree_ClassName', -]); -``` - -By adding facet fields into the query parameters, our response object from Solr -now contains some additional information that we can add into the results sent -to the page. - -```php -use SilverStripe\Core\Extension; -use SilverStripe\View\ArrayData; -use SilverStripe\ORM\ArrayList; - -class MyResultsExtension extends Extension -{ - /** - * Adds extra information from the solr-php-client repsonse - * into our search results. - * @param ArrayData $results The ArrayData that will be used to generate search - * results pages. - * @param stdClass $response The solr-php-client response object. - */ - public function updateSearchResults($results, $response) - { - if (!isset($response->facet_counts) || !isset($response->facet_counts->facet_fields)) { - return; - } - $facetCounts = ArrayList::create(array()); - foreach($response->facet_counts->facet_fields as $name => $facets) { - $facetDetails = ArrayData::create([ - 'Name' => $name, - 'Facets' => ArrayList::create([]), - ]); - - foreach($facets as $facetName => $facetCount) { - $facetDetails->Facets->push(ArrayData::create([ - 'Name' => $facetName, - 'Count' => $facetCount, - ])); - } - $facetCounts->push($facetDetails); - } - $results->setField('FacetCounts', $facetCounts); - } -} -``` - -We can now access the facet information inside our templates. - ### Adding Analyzers, Tokenizers and Token Filters When a document is indexed, its individual fields are subject to the analyzing and tokenizing filters that can transform and normalize the data in the fields. 
For example — removing blank spaces, removing html code, stemming, removing a particular character and replacing it with another