solr admin url

fix dead links and update quickstart

solr admin expansion

spellcheck docs

custom types

custom field types

highlighting

facets
Andrew Aitken-Fincham 2018-06-04 16:55:47 +01:00 committed by Daniel Hensby
parent fa6a412d72
commit e08731d1f1
8 changed files with 364 additions and 499 deletions

View File

@ -4,6 +4,3 @@ Name: fulltextsearchconfig
SilverStripe\ORM\DataObject:
extensions:
- SilverStripe\FullTextSearch\Search\Extensions\SearchUpdater_ObjectHandler
SilverStripe\CMS\Controllers\ContentController:
extensions:
- SilverStripe\FullTextSearch\Solr\Control\ContentControllerExtension

View File

@ -46,17 +46,16 @@ else
exit 1
fi
# Check to see if it has been enabled in _config.php
grep -i "FulltextSearchable::enable(" "$APPDIR/_config.php" 2> /dev/null
# Check to see if it has been configured in _config.php
grep -i "Solr::configure_server(" "$APPDIR/_config.php" 2> /dev/null
if [ "$?" != 0 ]; then
echo "Enabling FulltextSearchable in _config.php..."
echo "Configuring Solr in _config.php..."
if [ ! -f "$APPDIR/_config.php" ]; then
echo "<?php" > "$APPDIR/_config.php"
echo "" >> "$APPDIR/_config.php"
fi
echo "" >> "$APPDIR/_config.php"
echo "# Enable Fulltextsearch" >> "$APPDIR/_config.php"
echo "\\SilverStripe\\ORM\\Search\\FulltextSearchable::enable();" >> "$APPDIR/_config.php" >> "$APPDIR/_config.php"
echo "\\SilverStripe\\FullTextSearch\\Solr\\Solr::configure_server([" >> "$APPDIR/_config.php"
echo " 'indexstore' => [" >> "$APPDIR/_config.php"
echo " 'mode' => 'file'," >> "$APPDIR/_config.php"

View File

@ -6,7 +6,6 @@
- Setup
- [Requirements](02_setup.md#requirements)
- [Installing Solr](02_setup.md#installing-solr)
- [Installing this module](02_setup.md#installing-the-module)
- [Solr admin](02_setup.md#solr-admin)
- Configuration
- [Solr server parameters](03_configuration.md#solr-server-parameters)
@ -20,10 +19,11 @@
- [Facets](04_advanced_configuration.md#facets)
- [Using multiple indexes](04_advanced_configuration.md#multiple-indexes)
- [Synonyms](04_advanced_configuration.md#synonyms)
- [Spellcheck](04_advanced_configuration.md#spell-check)
- [Spellcheck](04_advanced_configuration.md#spell-check-did-you-mean)
- [Highlighting](04_advanced_configuration.md#highlighting)
- [Boosting](04_advanced_configuration.md#boosting)
- [Indexing related objects](04_advanced_configuration.md#indexing-related-objects)
- [Subsites](04_advanced_configuration.md#subsites)
- [Adding new fields](04_advanced_configuration.md#adding-new-fields)
- [Custom field types](04_advanced_configuration.md#custom-field-types)
- Troubleshooting
- [Gotchas](05_troubleshooting.md#common-gotchas)

View File

@ -22,17 +22,17 @@ fulltext searching as an extension of the object model. However, the disconnect
design and the object model meant that searching was inefficient. The abstraction would also often break and it was
hard to then figure out what was going on.
This module instead provides the ability to define those indexes and queries in PHP. The indexes are defined as a mapping
between the SilverStripe object model and the connector-specific fulltext engine index model. This module then interrogates model metadata
to build the specific index definition.
This module instead provides the ability to define those indexes and queries in PHP. The indexes are defined as a
mapping between the SilverStripe object model and the connector-specific fulltext engine index model. This module then
interrogates model metadata to build the specific index definition.
It also hooks into SilverStripe framework in order to update the indexes when the models change and connectors then convert those index and query definitions
into fulltext engine specific code.
It also hooks into SilverStripe framework in order to update the indexes when the models change and connectors then
convert those index and query definitions into fulltext engine specific code.
The intent of this module is not to make changing fulltext search engines seamless. Where possible this module provides
common interfaces to fulltext engine functionality, abstracting out common behaviour. However, each connector also
offers its own extensions, and there is some behaviour (such as getting the fulltext search engines installed, configured
and running) that each connector deals with itself, in a way best suited to that search engine's design.
offers its own extensions, and there is some behaviour (such as getting the fulltext search engines installed,
configured and running) that each connector deals with itself, in a way best suited to that search engine's design.
## Quick start
@ -48,6 +48,10 @@ This will:
- Install Solr 4
- Set up a daemon to run Solr on startup
- Start Solr
- Enable `FulltextSearchable` in your `_config.php` (and create one if you don't have one)
- Configure Solr in your `_config.php` (and create one if you don't have one)
- Create a DefaultIndex
- Run a [Solr Configure](03_configuration.md#solr-configure) and a [Solr Reindex](03_configuration.md#solr-reindex)
The simply adding `$SearchForm` to a template and flushing the template cache should add a search text box to your site.
You'll then need to build a search form and results display that suits the functionality of your site.
// TODO update me when https://github.com/silverstripe/silverstripe-fulltextsearch/pull/216 is merged

View File

@ -1,28 +1,34 @@
# Setup
The fulltextsearch module includes support for connecting to Solr.
The FulltextSearch module includes support for connecting to Solr.
It works with Solr in multi-core mode. It needs to be able to update Solr configuration files, and has modes for doing this by direct file access (when Solr shares a server with SilverStripe) and by WebDAV (when it's on a different server).
It works with Solr in multi-core mode. It needs to be able to update Solr configuration files, and has modes for doing
so by direct file access (when Solr shares a server with SilverStripe) and by WebDAV (when it's on a different server).
See the helpful [Solr Tutorial](http://lucene.apache.org/solr/4_5_1/tutorial.html), for more on cores
and querying.
See the helpful [Solr Tutorial](http://lucene.apache.org/solr/4_5_1/tutorial.html), for more on cores and querying.
## Requirements
Since Solr is Java based, it requires Java 1.5 or greater installed.
When you're installing it yourself, it also requires a servlet container such as Tomcat, Jetty, or Resin. For
development testing there is a standalone version that comes bundled with Jetty (see [Installing Solr](#installing-solr) below).
development testing there is a standalone version that comes bundled with Jetty (see [Installing Solr](#installing-solr)
below).
See the official [Solr installation docs](http://wiki.apache.org/solr/SolrInstall) for more information.
Note that these requirements are for the Solr server environment, which doesn't have to be the same physical machine as the SilverStripe webhost.
Note that these requirements are for the Solr server environment, which doesn't have to be the same physical machine as
the SilverStripe webhost.
## Installing Solr
### Local installation
If you'll be running Solr on the same machine as your SilverStripe installation, you can use the [silverstripe/fulltextsearch-localsolr module](https://github.com/silverstripe-archive/silverstripe-fulltextsearch-localsolr). This can also be useful as a development dependency. You can bring it in via composer (use `require-dev` if you plan to use install Solr remotely in Production):
If you'll be running Solr on the same machine as your SilverStripe installation, and the
[quick start script](01_getting_started.md#quick-start) doesn't suit your needs, you can use the
[fulltextsearch-localsolr module](https://github.com/silverstripe-archive/silverstripe-fulltextsearch-localsolr). This
can also be useful as a development dependency. You can bring it in via composer (use `require-dev` if you plan to
install Solr remotely in Production):
```bash
composer require silverstripe/fulltextsearch-localsolr
@ -35,7 +41,8 @@ cd fulltextsearch-localsolr/server
java -jar start.jar
```
Then configure Solr to use `file` more with the following configuration in your `app/_config.php`, making sure that the `path` directory is writeable by the user that started the server (above):
Then configure the module to use `file` mode with the following configuration in your `app/_config.php`, making sure
that the `path` directory is writeable by the user that started the server (above):
```php
use SilverStripe\FullTextSearch\Solr\Solr;
@ -51,10 +58,35 @@ Solr::configure_server([
### Remote installation
Alternatively, it can be beneficial to keep the Solr service contained on its own infrastructure, for performance and
security reasons. The [Common Web Platform (CWP)](https://www.cwp.govt.nz) uses Solr in this manner. To do so, you should
install the dependencies on the remote server, and then configure the module to use the `webdav` mode like so:
```php
use SilverStripe\FullTextSearch\Solr\Solr;
## Installing the module
Solr::configure_server([
'host' => 'remotesolrserver.com', // IP address or hostname
'indexstore' => [
'mode' => 'webdav',
'path' => BASE_PATH . '/webdav',
]
]);
```
Check all the available [configuration options](03_configuration.md#solr-server-parameters) to fine-tune the module to
work with your desired setup.
This will mean that all configuration files, and the indexes themselves, are stored remotely.
## Solr admin
Solr provides an administration interface with a GUI to allow you to get at the finer details of your cores and
configuration. You can access it at `example.com:<SOLR_PORT>/<SOLR_PATH>/#/` on a local installation
(usually `example.com:8983/solr/#/`).
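As a minimal sketch of how that address is put together, the helper below composes the admin URL from its parts. The function name and the `SOLR_HOST`, `SOLR_PORT` and `SOLR_PATH` variables are illustrative assumptions, not settings read by this module, and the `http` scheme and `localhost` default are assumed for a local installation.

```bash
#!/bin/sh
# Hypothetical helper: compose the Solr admin URL from host, port and path.
# SOLR_HOST, SOLR_PORT and SOLR_PATH are illustrative names, not module settings.
solr_admin_url() {
    solr_host="${SOLR_HOST:-localhost}"   # assumed default for a local install
    solr_port="${SOLR_PORT:-8983}"        # the usual port mentioned above
    solr_path="${SOLR_PATH:-solr}"        # the usual path mentioned above
    printf 'http://%s:%s/%s/#/\n' "$solr_host" "$solr_port" "$solr_path"
}

solr_admin_url
```

With none of the three variables set, this prints `http://localhost:8983/solr/#/`, which you can paste into a browser.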
There you can access logging, run raw queries against your stored indexes, and get some basic performance metrics.
Additionally, you can perform more drastic changes, such as dropping and reloading cores.
For a comprehensive look at the Solr admin interface, read the
[user guide for Solr 4.10](http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf#page=17).

View File

@ -79,15 +79,15 @@ class MyIndex extends SolrIndex
You can also skip listing all searchable fields, and have the index figure it out automatically via `addAllFulltextFields()`. This will add any database fields that are `instanceof DBString` to the index. Use this with caution, however, as you may inadvertently return sensitive information - it is often safer to declare your fields explicitly.
Once you've added this file, make sure you run a [Solr configure](#dev_tasks) to set up your new index.
Once you've added this file, make sure you run a [Solr configure](#solr-configure) to set up your new index.
## Adding data to an index
Once you have [created your index](./30_creating_an_index.md), you can add data to it in a number of ways.
Once you have [created your index](#creating-an-index), you can add data to it in a number of ways.
### Reindex the site
Running the [Solr reindex task](./33_dev_tasks.md) will crawl your site for classes that match those defined on your index, and add the defined fields to the index for searching. This is the most common method used to build the index the first time, or to perform a full rebuild of the index.
Running the [Solr reindex task](#solr-reindex) will crawl your site for classes that match those defined on your index, and add the defined fields to the index for searching. This is the most common method used to build the index the first time, or to perform a full rebuild of the index.
### Publish a page in the CMS
@ -177,7 +177,7 @@ $query = SearchQuery::create()
->addSearchTerm('fire');
```
You can also limit this to specific fields by passing an array as the second argument:
You can also limit this to specific fields by passing an array as the second argument, specified in the form of `{table}_{field}`:
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
@ -226,7 +226,7 @@ $query = SearchQuery::create()
->addSearchTerm('fire')
// Only include documents edited in 2011 or earlier
->addFilter(Page::class . '_LastEdited', SearchQuery_Range::create(null, '2011-12-31T23:59:59Z'));
$results = singleton(MyIndex::class)->search($query);
$results = MyIndex::singleton()->search($query);
```
Note: At the moment, the date format is specific to the search implementation.
@ -247,7 +247,7 @@ $query = SearchQuery::create()
->addSearchTerm('fire')
// Needs a value, although it can be false
->addFilter(Page::class . '_ShowInMenus', SearchQuery::$present);
$results = singleton(MyIndex::class)->search($query);
$results = MyIndex::singleton()->search($query);
```
### Querying an index
@ -259,7 +259,7 @@ use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use My\Namespace\Index\MyIndex;
$query = SearchQuery::create()->addSearchTerm('fire');
$results = singleton(MyIndex::class)->search($query);
$results = MyIndex::singleton()->search($query);
```
The return value of a `search()` call is an object which contains a few properties:
@ -279,7 +279,13 @@ It is often a good idea to run a configure, followed by a reindex, after a code
`dev/tasks/Solr_Configure`
This task will upload configuration to the Solr core, reloading it or creating it as necessary. This should be run after every code change to your indexes, or configuration changes.
This task will upload configuration to the Solr core, reloading it or creating it as necessary, and generate the schema. This should be run after every code change to your indexes, or after any configuration changes. This will convert the PHP-based abstraction layer into actual Solr XML. Assuming default configuration and the use of the `DefaultIndex`, it will:
- create the directory `BASE_PATH/.solr/DefaultIndex/` if it doesn't already exist
- copy configuration files from `vendor/silverstripe/fulltextsearch/conf/extras` to `BASE_PATH/.solr/DefaultIndex/conf/`
- generate a `schema.xml` in `BASE_PATH/.solr/DefaultIndex/conf/`
This task will overwrite these files every time it is run.
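A configure followed by a reindex can be wrapped in a small shell function, sketched below. It assumes `sake` is available at `vendor/bin/sake` and that you run it from the project root; the `SAKE` variable and the `solr_refresh` name are hypothetical conveniences, not part of the module.

```bash
#!/bin/sh
# Sketch: run a configure followed by a reindex, stopping on the first failure.
# SAKE defaults to vendor/bin/sake (an assumption); override it if sake lives elsewhere.
SAKE="${SAKE:-vendor/bin/sake}"

solr_refresh() {
    # The reindex only runs if the configure task succeeded.
    "$SAKE" dev/tasks/Solr_Configure && "$SAKE" dev/tasks/Solr_Reindex
}
```

Calling `solr_refresh` after a change to your index classes keeps the generated `schema.xml` and the indexed data in step.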
### Solr reindex
@ -289,8 +295,23 @@ This task performs a reindex, which adds all the data specified in the index def
If you have the [Queued Jobs module](https://github.com/symbiote/silverstripe-queuedjobs/) installed, then this task will create multiple reindex jobs that are processed asynchronously, unless you are in `dev` mode, in which case the index will be processed immediately (see [processor.yml](/_config/processor.yml)). Otherwise, it will run in one process. Often, if you are running it via the web, the request will time out. Usually this means the actual process is still running in the background, but it can be alarming to the user, so bear that in mind.
Internally, records are processed in groups of 200. You can configure this group size with the `Solr_Reindex.recordsPerRequest` config:
```yaml
SilverStripe\FullTextSearch\Solr\Tasks\Solr_Reindex:
recordsPerRequest: 150
```
The Solr indexes will be stored as binary files inside your SilverStripe project. You can also copy the `thirdparty/` Solr directory somewhere else; just set the `path` value in `mysite/_config.php` to point to the new location.
## File-based configuration
Many aspects of Solr are configured outside of the `schema.xml` file which SilverStripe generates based on the `SolrIndex` subclass that is defined. For example, stopwords are placed in their own `stopwords.txt` file, and advanced [spellchecking](04_advanced_configuration.md#spell-check-did-you-mean) can be configured in `solrconfig.xml`.
By default, these files are copied from the `fulltextsearch/conf/extras/` directory over to the new index location. In order to use your own files, copy these files into a location of your choosing (for example `mysite/data/solr/`), and tell Solr to use this folder with the `extraspath` [configuration setting](#solr-server-parameters). Run a [`Solr_Configure`](#solr-configure) to apply these changes.
You can also define these on an index-by-index basis by defining `SolrIndex->getExtrasPath()`.
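The copy step described above could be scripted roughly as follows. This is only a sketch: the default paths are the ones mentioned in this documentation (`vendor/silverstripe/fulltextsearch/conf/extras` and `mysite/data/solr`), and the `copy_solr_extras` function name is hypothetical.

```bash
#!/bin/sh
# Sketch: seed a custom extras folder from the module defaults.
# Both default paths come from the docs above; adjust them to your project layout.
copy_solr_extras() {
    src="${1:-vendor/silverstripe/fulltextsearch/conf/extras}"
    dest="${2:-mysite/data/solr}"
    if [ ! -d "$src" ]; then
        echo "Source folder not found: $src" >&2
        return 1
    fi
    # "$src/." copies the folder's contents rather than the folder itself.
    mkdir -p "$dest" && cp -R "$src/." "$dest/"
}
```

Run `copy_solr_extras` from the project root, edit the copied files, then point `extraspath` at the destination folder.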
## Handling results
In order to render search results, you need to return them from a controller. You can also drive this through a form response through standard SilverStripe forms. In this case we simply assume there's a GET parameter named `q` with a search term present.
@ -311,7 +332,7 @@ class PageController extends ContentController
{
$query = SearchQuery::create()->addSearchTerm($request->getVar('q'));
return $this->renderWith([
'SearchResult' => singleton(MyIndex::class)->search($query)
'SearchResult' => MyIndex::singleton()->search($query)
]);
}
}

View File

@ -2,6 +2,85 @@
## Facets
Inside the `SolrIndex->search()` function, the third-party library solr-php-client is used to send data to Solr and parse the response. Additional information can be pulled from this response and added to your results object for use in templates using the `updateSearchResults()` extension hook.
```php
use My\Namespace\Index\MyIndex;
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
$index = MyIndex::singleton();
$query = SearchQuery::create()
->addSearchTerm('My Term');
$params = [
'facet' => 'true',
'facet.field' => 'SiteTree_ClassName',
];
$results = $index->search($query, -1, -1, $params);
```
By adding facet fields into the query parameters, our response object from Solr now contains some additional information that we can add into the results sent to the page.
```php
namespace My\Namespace\Extension;
use SilverStripe\Core\Extension;
use SilverStripe\View\ArrayData;
use SilverStripe\ORM\ArrayList;
class FacetedResultsExtension extends Extension
{
/**
* Adds extra information from the solr-php-client response
* into our search results.
* @param ArrayData $results The ArrayData that will be used to generate search
* results pages.
* @param stdClass $response The solr-php-client response object.
*/
public function updateSearchResults($results, $response)
{
if (!isset($response->facet_counts) || !isset($response->facet_counts->facet_fields)) {
return;
}
$facetCounts = ArrayList::create([]);
foreach($response->facet_counts->facet_fields as $name => $facets) {
$facetDetails = ArrayData::create([
'Name' => $name,
'Facets' => ArrayList::create([]),
]);
foreach($facets as $facetName => $facetCount) {
$facetDetails->Facets->push(ArrayData::create([
'Name' => $facetName,
'Count' => $facetCount,
]));
}
$facetCounts->push($facetDetails);
}
$results->setField('FacetCounts', $facetCounts);
}
}
```
And then apply the extension to your index via `yaml`:
```yaml
My\Namespace\Index\MyIndex:
extensions:
- My\Namespace\Extension\FacetedResultsExtension
```
We can now access the facet information inside our templates like so:
```silverstripe
<% if $Results.FacetCounts %>
<% loop $Results.FacetCounts %>
<% loop $Facets %>
<p>$Name: $Count</p>
<% end_loop %>
<% end_loop %>
<% end_if %>
```
## Multiple indexes
Multiple indexes can be created and searched independently, but if you wish to override an existing
@ -37,16 +116,119 @@ SilverStripe\FullTextSearch\Search\FullTextSearch:
## Synonyms
## Spell check
## Spell check ("Did you mean...")
Solr has various spell checking strategies (see the ["SpellCheckComponent" docs](http://wiki.apache.org/solr/SpellCheckComponent)), all of which are configured through `solrconfig.xml`.
In the default config which is copied into your index, spell checking data is collected from all fulltext fields
(everything you added through `SolrIndex->addFulltextField()`). The values of these fields are collected in a special `_text` field.
```php
use My\Namespace\Index\MyIndex;
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
$index = MyIndex::singleton();
$query = SearchQuery::create()
->addSearchTerm('My Term');
$params = [
'spellcheck' => 'true',
'spellcheck.collate' => 'true',
];
$results = $index->search($query, -1, -1, $params);
$results->spellcheck;
```
The built-in `_text` data is better than nothing, but also has some problems: it's heavily processed, for example by
stemming filters which butcher words. So misspelling "Govnernance" will suggest "govern" rather than "Governance".
This can be fixed by aggregating spell checking data in a separate field.
```php
use SilverStripe\CMS\Model\SiteTree;
use SilverStripe\FullTextSearch\Solr\SolrIndex;
class MyIndex extends SolrIndex
{
public function init()
{
$this->addCopyField(SiteTree::class . '_Title', 'spellcheckData');
$this->addCopyField(SomeModel::class . '_Title', 'spellcheckData');
$this->addCopyField(SiteTree::class . '_Content', 'spellcheckData');
$this->addCopyField(SomeModel::class . '_Content', 'spellcheckData');
}
public function getFieldDefinitions()
{
$xml = parent::getFieldDefinitions();
$xml .= "\n\n\t\t<!-- Additional custom fields for spell checking -->";
$xml .= "\n\t\t<field name='spellcheckData' type='textSpellHtml' indexed='true' stored='false' multiValued='true' />";
return $xml;
}
}
```
Now you need to tell Solr to use our new field for gathering spelling data. In order to customise the spell checking configuration,
create your own `solrconfig.xml` (see [File-based configuration](03_configuration.md#file-based-configuration)). In there, change the following directive:
```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="field">spellcheckData</str>
</searchComponent>
```
Copy the new configuration via the [`Solr_Configure` task](03_configuration.md#solr-configure), and reindex your data before using the spell checker.
## Highlighting
Solr can highlight the searched terms in context of the matched content, to help users determine the relevancy of results (e.g. in which part of a sentence the term is used). In order to use this feature, the full content of the field to be highlighted needs to be stored in the index,
by declaring it through `addStoredField()`:
```php
use SilverStripe\FullTextSearch\Solr\SolrIndex;
class MyIndex extends SolrIndex
{
public function init()
{
$this->addClass(Page::class);
$this->addAllFulltextFields();
$this->addStoredField('Content');
}
}
```
To search with highlighting enabled, you need to pass in a custom query parameter.
There are many more parameters available for tweaking results, detailed in the [Solr reference guide](https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf#page=270).
```php
use My\Namespace\Index\MyIndex;
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
$index = MyIndex::singleton();
$query = SearchQuery::create()
->addSearchTerm('My Term');
$params = [
'hl' => 'true',
];
$results = $index->search($query, -1, -1, $params);
```
Each result will automatically contain an `Excerpt` property which you can use in your own results template. The searched term is highlighted with an `<em>` tag by default.
> Note: It is recommended to strip out all HTML tags and convert entities on the indexed content,
to avoid matching HTML attributes, and cluttering highlighted content with unparsed HTML.
## Boosting/Weighting
Results aren't all created equal. Matches in some fields are more important
than others; for example, a page `Title` might be considered more relevant to the user than terms in the `Content` field.
Results aren't all created equal. Matches in some fields are more important than others; for example, a page `Title` might be considered more relevant to the user than terms in the `Content` field.
To account for this, a "weighting" (or "boosting") factor can be applied to each searched field. The default value is `1.0`, anything below that will decrease the relevance, anything above increases it.
To account for this, a "weighting" (or "boosting") factor can be applied to each searched field. The default value is `1.0`; anything below that will decrease the relevance, and anything above will increase it. You can get more information on relevancy at the [Solr wiki](http://wiki.apache.org/solr/SolrRelevancyFAQ).
To adjust the relative values, pass them in as the third argument to your `addSearchTerm()` call:
You can manage the boosting in two ways:
### Boosting on query
To adjust the relative values at the time of querying, pass them in as the third argument to your `addSearchTerm()` call:
```php
use My\Namespace\Index\MyIndex;
@ -63,13 +245,98 @@ SilverStripe\FullTextSearch\Search\FullTextSearch:
Page::class . '_SecretParagraph' => 0.1,
]
);
$results = singleton(MyIndex::class)->search($query);
$results = MyIndex::singleton()->search($query);
```
This will ensure that `Title` is given higher priority for matches than `Content`, which is well above `SecretParagraph`.
### Boosting on index
Boost values for specific fields can also be specified directly on the `SolrIndex` class.
The following methods can be used to set one or more boosted fields:
* `addBoostedField()` - adds a field with a specific boosted value (defaults to 2)
* `setFieldBoosting()` - if a field has already been added to an index, the boosting
value can be customised, changed, or reset for a single field.
* `addFulltextField()` - a boost can be set for a field using the `$extraOptions` parameter
with the key `boost` assigned to the desired value:
```php
use SilverStripe\CMS\Model\SiteTree;
use SilverStripe\FullTextSearch\Solr\SolrIndex;
class SolrSearchIndex extends SolrIndex
{
public function init()
{
$this->addClass(SiteTree::class);
// The following methods would all add the same boost of 1.5 to "Title"
$this->addBoostedField('Title', null, [], 1.5);
$this->addFulltextField('Title', null, [
'boost' => 1.5,
]);
$this->addFulltextField('Title');
$this->setFieldBoosting(SiteTree::class . '_Title', 1.5);
}
}
```
## Indexing related objects
## Subsites
## Adding new fields
## Custom field types
Solr supports custom field type definitions which are written to its XML schema. Many standard ones are already included
in the default schema. As the XML file is generated dynamically, we can add our own types by overloading the template
responsible for it: `types.ss`.
In the following example, we read our type definitions from a new file `mysite/solr/templates/types.ss` instead:
```php
use SilverStripe\Control\Director;
use SilverStripe\FullTextSearch\Solr\SolrIndex;
class MyIndex extends SolrIndex
{
public function getTypes()
{
return $this->renderWith(Director::baseFolder() . '/mysite/solr/templates/types.ss');
}
}
```
It's usually best to start with the existing definitions, and adjust from there. You can both add your own types and adjust the behaviour of existing definitions.
### Perform filtering on index
An example of something you can achieve with this is to move synonym filtering from being performed at query time to being performed at index time. To do this, you'd take
```xml
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
```
from inside the `<analyzer type="query">` block and move it to the `<analyzer type="index">` block. This can be advantageous as Solr does a better job of processing synonyms at index; however, it does mean that it requires a full Reindex to make a change, which - depending on the size of your site - could be overkill. See [this article](https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/) for a good breakdown.
### Searching for words containing numbers
By default, the module is configured to split words containing numbers into multiple tokens. For example, the word "A1" would be interpreted as "A" "1", and since "a" is a common stopword, the term "A1" will be excluded from search.
To allow searches on words containing numeric tokens, you'll need to change the behaviour of the `WordDelimiterFilterFactory` with an overloaded template as described above. Each instance of `<filter class="solr.WordDelimiterFilterFactory">` needs to include the following attributes and values:
- add `splitOnNumerics="0"` on all `WordDelimiterFilterFactory` fields
- change `catenateNumbers="1"` to `catenateNumbers="0"` on all `WordDelimiterFilterFactory` fields
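To check an overloaded template against the two attribute changes listed above, a rough shell sketch like the following can help. It assumes each `<filter .../>` element sits on a single line of the template, and the `check_word_delimiters` name is hypothetical.

```bash
#!/bin/sh
# Sketch: report WordDelimiterFilterFactory filters missing the numeric settings.
# Assumes each <filter .../> element is written on a single line of the template.
check_word_delimiters() {
    template="$1"
    if [ ! -f "$template" ]; then
        echo "Template not found: $template" >&2
        return 1
    fi
    # Keep WordDelimiterFilterFactory lines lacking either required attribute value.
    offending=$(grep -F 'solr.WordDelimiterFilterFactory' "$template" \
        | awk '!/splitOnNumerics="0"/ || !/catenateNumbers="0"/')
    if [ -n "$offending" ]; then
        echo "Filters still need updating:" >&2
        echo "$offending" >&2
        return 1
    fi
    return 0
}
```

For example, `check_word_delimiters mysite/solr/templates/types.ss` returns a non-zero status and lists the remaining filters until every one has been updated.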
### Searching for macrons and other Unicode characters
The `ASCIIFoldingFilterFactory` filter converts alphabetic, numeric, and symbolic Unicode characters which are not in the Basic Latin Unicode block (the first 127 ASCII characters) to their ASCII equivalents, if one exists.
Find the field types in your overloaded `types.ss` for which you want to enable this behaviour (for example the `<fieldType name="htmltext">` block), and add the following to both the index analyzer and the query analyzer:
```xml
<filter class="solr.ASCIIFoldingFilterFactory"/>
```

View File

@ -1,458 +1,3 @@
# Solr connector for SilverStripe fulltextsearch module
All possible parameters incl optional ones with example values:
## Configuration
### Create an index
```php
// File: mysite/code/MyIndex.php:
use SilverStripe\FullTextSearch\Solr\SolrIndex;
class MyIndex extends SolrIndex
{
public function init()
{
$this->addClass(Page::class);
$this->addAllFulltextFields();
}
}
```
### Create the index schema
The PHP-based index definition is an abstraction layer for the actual Solr XML configuration.
In order to create or update it, you need to run the `Solr_Configure` task.
```
vendor/bin/sake dev/tasks/Solr_Configure
```
Based on the sample configuration above, this command will do the following:
- Create a `<BASE_PATH>/.solr/MyIndex` folder
- Copy configuration files from `vendor/silverstripe/fulltextsearch/conf/extras/` to `<BASE_PATH>/.solr/MyIndex/conf`
- Generate a `schema.xml`, and place it it in `<BASE_PATH>/.solr/MyIndex/conf`
If you call the task with an existing index folder,
it will overwrite all files from their default locations,
regenerate the `schema.xml`, and ask Solr to reload the configuration.
You can use the same command for updating an existing schema,
which will automatically apply without requiring a Solr server restart.
### Reindex
After configuring Solr, you have the option to add your existing
content to its indices. Run the following command:
```
vendor/bin/sake dev/tasks/Solr_Reindex
```
This will delete and rebuild all indices. Depending on your data,
this can take anywhere from minutes to hours.
Keep in mind that the normal mode of updating indices is
based on ORM manipulations of the underlying data.
For example, calling `$myPage->write()` will automatically
update the index entry for this record (and all its variants).
This task has the following options:
- `verbose`: Debug information
Internally, depending on what job processing backend you have configured (such as queuedjobs)
individual tasks for re-indexing groups of records may either be performed behind the scenes
as crontasks, or via separate processes initiated by the current request.
Internally groups of records are grouped into sizes of 200. You can configure this
group sizing by using the `Solr_Reindex.recordsPerRequest` config.
```yaml
SilverStripe\FullTextSearch\Solr\Tasks\Solr_Reindex:
recordsPerRequest: 150
```
Note: The Solr indexes will be stored as binary files inside your SilverStripe project.
You can also copy the `thirdparty/` solr directory somewhere else,
just set the `path` value in `mysite/_config.php` to point to the new location.
You can also run the reindex task through a web request.
By default, the web request won't receive any feedback while its running.
Depending on your PHP and web server configuration,
the web request itself might time out, but the reindex continues anyway.
This is possible because the actual index operations are run as separate
PHP sub-processes inside the main web request.
### File-based configuration (solrconfig.xml etc)
Many aspects of Solr are configured outside of the `schema.xml` file
which SilverStripe generates based on the index PHP file.
For example, stopwords are placed in their own `stopwords.txt` file,
and spell checks are configured in `solrconfig.xml`.
By default, these files are copied from the `fulltextsearch/conf/extras/`
directory over to the new index location. In order to use your own files,
copy these files into a location of your choosing (for example `mysite/data/solr/`),
and tell Solr to use this folder with the `extraspath` configuration setting.
```php
// mysite/_config.php
use SilverStripe\Control\Director;
use SilverStripe\FullTextSearch\Solr\Solr;
Solr::configure_server([
    // ...
    'extraspath' => Director::baseFolder() . '/mysite/data/solr/',
]);
```
Please run the `Solr_Configure` task for the changes to take effect.
Note: You can also define those on an index-by-index basis by
implementing `SolrIndex->getExtrasPath()`.
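As a minimal sketch of the per-index variant, an index class could override `getExtrasPath()` like this. The `MyIndex` class name and the `mysite/data/solr/myindex/` directory are assumptions chosen for illustration.

```php
use SilverStripe\Control\Director;
use SilverStripe\FullTextSearch\Solr\SolrIndex;

class MyIndex extends SolrIndex
{
    public function init()
    {
        // ...
    }

    /**
     * Use a dedicated extras folder for this index only.
     * The directory name is hypothetical.
     */
    public function getExtrasPath()
    {
        return Director::baseFolder() . '/mysite/data/solr/myindex/';
    }
}
```

As with the global setting, run the `Solr_Configure` task afterwards so the files are copied to the index location.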
### Custom Types
Solr supports custom field type definitions which are written to its XML schema.
Many standard ones are already included in the default schema.
As the XML file is generated dynamically, we can add our own types
by overloading the template responsible for it: `types.ss`.
In the following example, we read out type definitions
from a new file `mysite/solr/templates/types.ss` instead:
```php
use SilverStripe\Control\Director;
use SilverStripe\FullTextSearch\Solr\SolrIndex;

class MyIndex extends SolrIndex
{
    public function getTypes()
    {
        return $this->renderWith(Director::baseFolder() . '/mysite/solr/templates/types.ss');
    }
}
```
#### Searching for words containing numbers
By default, the fulltextsearch module is configured to split words containing numbers into multiple tokens. For example, the word "A1" would be interpreted as "A" "1"; since "a" is a common stopword, the term "A1" will be excluded from search.
To allow searches on words containing numeric tokens, you'll need to update your overloaded template to change the behaviour of the `WordDelimiterFilterFactory`. Each instance of `<filter class="solr.WordDelimiterFilterFactory">` needs to include the following attributes and values:
* set `splitOnNumerics="0"` on all `WordDelimiterFilterFactory` filters
* set `catenateNumbers="1"` on all `WordDelimiterFilterFactory` filters
Update your index to point to your overloaded template using the method described above.
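As a sketch, a filter entry in the overloaded `types.ss` would then carry the two attributes as shown below. Solr's attribute for joining numeric parts is named `catenateNumbers`. Only these two attributes are shown; any other attributes already present on the filter in your copy of the template are assumed to stay unchanged.

```xml
<!-- Sketch only: keep the other attributes your template already sets -->
<filter class="solr.WordDelimiterFilterFactory"
        splitOnNumerics="0"
        catenateNumbers="1"
        />
```

Because this changes how text is tokenized at index time, a reindex is needed before searches for terms such as "A1" return results.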
#### Searching for macrons and other Unicode characters
The `ASCIIFoldingFilterFactory` filter converts alphabetic, numeric, and symbolic Unicode characters which are not in the Basic Latin Unicode block (the first 128 code points, which correspond to ASCII) to their ASCII equivalents, if one exists.
Find the field types in your overloaded `types.ss` that you want to enable this behaviour in. For example:
```xml
<fieldType name="htmltext" class="solr.TextField" ... >
```
Add the following filter to both its index analyzer and its query analyzer.
```xml
<filter class="solr.ASCIIFoldingFilterFactory"/>
```
Update your index to point to your overloaded template using the method described above.
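Put together, the field type could look roughly like the sketch below. The comments stand in for whatever char filters, tokenizer and filters your template already defines; those are assumed to stay unchanged, and the exact position of the new filter within each chain is a judgment call for your setup.

```xml
<fieldType name="htmltext" class="solr.TextField" ... >
    <analyzer type="index">
        <!-- existing charFilter, tokenizer and filter entries stay as they are -->
        <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <!-- existing charFilter, tokenizer and filter entries stay as they are -->
        <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
</fieldType>
```

Applying the filter on both sides means that "Māori" in the content and "Maori" in a search query are folded to the same token.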
### Spell Checking ("Did you mean...")
Solr has various spell checking strategies (see the ["SpellCheckComponent" docs](http://wiki.apache.org/solr/SpellCheckComponent)), all of which are configured through `solrconfig.xml`.
In the default config which is copied into your index,
spell checking data is collected from all fulltext fields
(everything you added through `SolrIndex->addFulltextField()`).
The values of these fields are collected in a special `_text` field.
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;

$index = new MyIndex();
$query = new SearchQuery();
$query->addSearchTerm('My Term');
$params = [
    'spellcheck' => 'true',
    'spellcheck.collate' => 'true',
];
$results = $index->search($query, -1, -1, $params);
$results->spellcheck;
```
The built-in `_text` data is better than nothing, but also has some problems:
It's heavily processed, for example by stemming filters which butcher words.
So misspelling "Govnernance" will suggest "govern" rather than "Governance".
This can be fixed by aggregating spell checking data in a separate field.
```php
use SilverStripe\CMS\Model\SiteTree;
use SilverStripe\FullTextSearch\Solr\SolrIndex;

class MyIndex extends SolrIndex
{
    public function init()
    {
        // ...
        $this->addCopyField(SiteTree::class . '_Title', 'spellcheckData');
        $this->addCopyField(SomeModel::class . '_Title', 'spellcheckData');
        $this->addCopyField(SiteTree::class . '_Content', 'spellcheckData');
        $this->addCopyField(SomeModel::class . '_Content', 'spellcheckData');
    }

    // ...

    public function getFieldDefinitions()
    {
        $xml = parent::getFieldDefinitions();
        $xml .= "\n\n\t\t<!-- Additional custom fields for spell checking -->";
        $xml .= "\n\t\t<field name='spellcheckData' type='textSpellHtml' indexed='true' stored='false' multiValued='true' />";
        return $xml;
    }
}
```
Now you need to tell Solr to use our new field for gathering spelling data.
In order to customize the spell checking configuration,
create your own `solrconfig.xml` (see "File-based configuration").
In there, change the following directive:
```xml
<!-- ... -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <!-- ... -->
    <str name="field">spellcheckData</str>
</searchComponent>
```
Don't forget to copy the new configuration via a call to the `Solr_Configure`
task, and reindex your data before using the spell checker.
### Limiting search fields
Solr has a way of specifying which fields to search on. You specify these
fields as a parameter to `SearchQuery`.
In the following example, we're telling Solr to *only* search the
`Title` and `Content` fields. Note that the fields must be specified in
the search parameters as "composite fields", which means they should be
specified in the form of `{table}_{field}`.
These fields are defined in the schema.xml file that gets sent to Solr.
```php
use SilverStripe\CMS\Model\SiteTree;
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
$query = new SearchQuery();
$query->addClassFilter(Page::class);
$query->addSearchTerm('Lorem', [SiteTree::class . '_Title', SiteTree::class . '_Content']);
$result = singleton(SolrSearchIndex::class)->search($query, -1, -1);
// the request to Solr would be:
// q=(SiteTree_Title:Lorem+OR+SiteTree_Content:Lorem)
```
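To make the `{table}_{field}` naming and the resulting `q` parameter more concrete, here is a plain-PHP sketch. Both helper functions are hypothetical and exist only for illustration; the module builds the real query internally and also handles escaping, which this sketch does not.

```php
/**
 * Hypothetical helper: builds a composite field name of the form {table}_{field}.
 */
function compositeField(string $table, string $field): string
{
    return $table . '_' . $field;
}

/**
 * Hypothetical helper: restricts a single term to the given composite fields.
 * No escaping is done here, so it is only suitable for simple terms.
 */
function fieldQuery(string $term, array $compositeFields): string
{
    $clauses = [];
    foreach ($compositeFields as $name) {
        $clauses[] = $name . ':' . $term;
    }
    return '(' . implode('+OR+', $clauses) . ')';
}

$fields = [
    compositeField('SiteTree', 'Title'),
    compositeField('SiteTree', 'Content'),
];

echo 'q=' . fieldQuery('Lorem', $fields), "\n";
// q=(SiteTree_Title:Lorem+OR+SiteTree_Content:Lorem)
```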
### Configuring boosts
There are several ways in which you can configure boosting on search fields or terms.
#### Boosting on search query
Solr has a way of specifying which fields should be boosted as a parameter to `SearchQuery`.
This means if you boost a certain field, search query matches on that field will be considered
higher relevance than other fields with matches, and therefore those results will be closer
to the top of the results.
In this example, we enter "Lorem" as the search term, and boost the `Content` field:
```php
use SilverStripe\CMS\Model\SiteTree;
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
$query = new SearchQuery();
$query->addClassFilter(Page::class);
$query->addSearchTerm('Lorem', null, [SiteTree::class . '_Content' => 2]);
$result = singleton(SolrSearchIndex::class)->search($query, -1, -1);
// the request to Solr would be:
// q=SiteTree_Content:Lorem^2
```
More information on [relevancy on the Solr wiki](http://wiki.apache.org/solr/SolrRelevancyFAQ).
#### Boosting on index fields
Boost values for specific fields can also be specified directly on the `SolrIndex` class.
The following methods can be used to set one or more boosted fields:
* `SolrIndex::addBoostedField` Adds a field with a specific boosted value (defaults to 2)
* `SolrIndex::setFieldBoosting` If a field has already been added to an index, the boosting
value can be customised, changed, or reset for a single field.
* `SolrIndex::addFulltextField` A boost can be set for a field using the `$extraOptions` parameter
with the key `boost` assigned to the desired value.
For example:
```php
use SilverStripe\CMS\Model\SiteTree;
use SilverStripe\FullTextSearch\Solr\SolrIndex;

class SolrSearchIndex extends SolrIndex
{
    public function init()
    {
        $this->addClass(SiteTree::class);
        $this->addAllFulltextFields();
        $this->addFilterField('ShowInSearch');
        $this->addBoostedField('Title', null, [], 1.5);
        $this->setFieldBoosting(SiteTree::class . '_SearchBoost', 2);
    }
}
```
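The third option from the list, a boost passed through `$extraOptions`, could look like the sketch below. It assumes `addFulltextField()` takes its arguments in the same order as the `addBoostedField()` call above (field name, forced type, extra options); the boost value is illustrative.

```php
use SilverStripe\CMS\Model\SiteTree;
use SilverStripe\FullTextSearch\Solr\SolrIndex;

class SolrSearchIndex extends SolrIndex
{
    public function init()
    {
        $this->addClass(SiteTree::class);
        // Assumed argument order: field, forced type (null keeps the default),
        // then $extraOptions, where the `boost` key sets the field's boost.
        $this->addFulltextField('Title', null, ['boost' => 1.5]);
        $this->addFulltextField('Content');
    }
}
```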
### Custom template path
The `types.ss` template can also be overloaded by pointing the index at your own templates directory.
In the following example, type definitions are read
from a new file `mysite/solr/templates/types.ss`:
```php
use SilverStripe\Control\Director;
use SilverStripe\FullTextSearch\Solr\SolrIndex;

class MyIndex extends SolrIndex
{
    public function getTemplatesPath()
    {
        return Director::baseFolder() . '/mysite/solr/templates/';
    }
}
```
### Highlighting
Solr can highlight the searched terms in context of the matched content,
to help users determine the relevancy of results (e.g. in which part of a sentence
the term is used). In order to use this feature, the full content of the
field to be highlighted needs to be stored in the index,
by declaring it through `addStoredField()`.
```php
use SilverStripe\FullTextSearch\Solr\SolrIndex;

class MyIndex extends SolrIndex
{
    public function init()
    {
        $this->addClass(Page::class);
        $this->addAllFulltextFields();
        $this->addStoredField('Content');
    }
}
```
To search with highlighting enabled, you need to pass in a custom query parameter.
Many more parameters for tweaking results are listed on the [Solr Wiki](http://wiki.apache.org/solr/HighlightingParameters).
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
$index = new MyIndex();
$query = new SearchQuery();
$query->addSearchTerm('My Term');
$results = $index->search($query, -1, -1, ['hl' => 'true']);
```
Each result will automatically contain an "Excerpt" property
which you can use in your own results template.
The searched term is highlighted with an `<em>` tag by default.
Note: It is recommended to strip out all HTML tags and convert entities on the indexed content,
to avoid matching HTML attributes, and cluttering highlighted content with unparsed HTML.
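As a sketch of tweaking the highlighting output, further standard Solr highlighting parameters can be passed alongside `hl`. This assumes the module forwards extra query parameters to Solr unchanged, as it does with `hl` above; the values shown are illustrative.

```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;

$index = new MyIndex();
$query = new SearchQuery();
$query->addSearchTerm('My Term');

$results = $index->search($query, -1, -1, [
    'hl' => 'true',
    // Standard Solr highlighting parameters; values are examples only.
    'hl.fragsize' => 200,            // approximate snippet length in characters
    'hl.simple.pre' => '<strong>',   // markup placed before each matched term
    'hl.simple.post' => '</strong>', // markup placed after each matched term
]);
```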
### Adding additional information into search results
Inside the `SolrIndex::search()` method, the third-party library solr-php-client
is used to send data to Solr and parse the response. Additional information can
be pulled from this response and added to your results object for use in templates
using the `updateSearchResults()` extension hook.
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
$index = new MyIndex();
$query = new SearchQuery();
$query->addSearchTerm('My Term');
$results = $index->search($query, -1, -1, [
    'facet' => 'true',
    'facet.field' => 'SiteTree_ClassName',
]);
```
By adding facet fields into the query parameters, our response object from Solr
now contains some additional information that we can add into the results sent
to the page.
```php
use SilverStripe\Core\Extension;
use SilverStripe\ORM\ArrayList;
use SilverStripe\View\ArrayData;

class MyResultsExtension extends Extension
{
    /**
     * Adds extra information from the solr-php-client response
     * into our search results.
     *
     * @param ArrayData $results The ArrayData that will be used to generate search
     *        results pages.
     * @param stdClass $response The solr-php-client response object.
     */
    public function updateSearchResults($results, $response)
    {
        if (!isset($response->facet_counts) || !isset($response->facet_counts->facet_fields)) {
            return;
        }

        $facetCounts = ArrayList::create([]);
        foreach ($response->facet_counts->facet_fields as $name => $facets) {
            $facetDetails = ArrayData::create([
                'Name' => $name,
                'Facets' => ArrayList::create([]),
            ]);

            foreach ($facets as $facetName => $facetCount) {
                $facetDetails->Facets->push(ArrayData::create([
                    'Name' => $facetName,
                    'Count' => $facetCount,
                ]));
            }

            $facetCounts->push($facetDetails);
        }

        $results->setField('FacetCounts', $facetCounts);
    }
}
```
We can now access the facet information inside our templates.
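The hook only runs if the extension is attached to the index class, a step not shown above. A minimal sketch, assuming the `MyIndex` and `MyResultsExtension` classes from the examples in this section and SilverStripe's standard `add_extension()` call:

```php
// mysite/_config.php

// Attach the extension so that updateSearchResults() is called
// whenever MyIndex::search() builds its results.
MyIndex::add_extension(MyResultsExtension::class);
```

The same registration can be done in YAML configuration through the `extensions` key of the index class, which is the more common approach in SilverStripe projects.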
### Adding Analyzers, Tokenizers and Token Filters
When a document is indexed, its individual fields are subject to the analyzing and tokenizing filters that can transform and normalize the data in the fields. For example: removing blank spaces, removing HTML code, stemming, removing a particular character and replacing it with another