split querying into its own file

adding non-SiteTree dataobjects

subsite boosting
Andrew Aitken-Fincham 2018-06-11 17:21:35 +01:00 committed by Daniel Hensby
parent 53eb826681
commit 06c604c9f3
No known key found for this signature in database
GPG Key ID: D8DEBC4C8E7BC8B9
8 changed files with 218 additions and 248 deletions

@ -19,7 +19,9 @@ Adds support for fulltext search engines like Sphinx and Solr to SilverStripe CM
## Documentation
See [the docs](/docs/en/00_index.md), or for the quick version see [the quick start guide](/docs/en/01_getting_started.md#quick-start).
For pure Solr docs, check out [the Solr 4.10.4 guide](https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf).
See [the docs](/docs/en/00_index.md) for configuration and setup, or for the quick version see [the quick start guide](/docs/en/01_getting_started.md#quick-start).
For details of updates, bugfixes, and features, please see the [changelog](CHANGELOG.md).
@ -48,8 +50,4 @@ maybe 'Content->Summary' to allow calling a specific method on the field object
- Allow user logic to cause triggering reindex of documents when field is user generated
* Add sphinx connector
* Add generic APIs for spell correction, file text extraction and snippet generation
* Better docs

@ -11,20 +11,24 @@
- [Solr server parameters](03_configuration.md#solr-server-parameters)
- [Creating an index](03_configuration.md#creating-an-index)
- [Adding data to an index](03_configuration.md#adding-data-to-an-index)
- [Querying an index](03_configuration.md#querying-the-index)
- [Running the dev/tasks](03_configuration.md#dev-tasks)
- [File-based configuration](03_configuration.md#file-based-configuration)
- [Handling results](03_configuration.md#handling-results)
- Querying
- [Building a SearchQuery](04_querying.md#building-a-`searchquery`)
- [Searching value ranges](04_querying.md#searching-value-ranges)
- [Empty or existing values](04_querying.md#empty-or-existing-values)
- [Executing your query](04_querying.md#executing-your-query)
- Advanced configuration
- [Facets](04_advanced_configuration.md#facets)
- [Using multiple indexes](04_advanced_configuration.md#multiple-indexes)
- [Analyzers, tokens and token filters](04_advanced_configuration.md#analyzers,-tokenizers-and-token-filters)
- [Spellcheck](04_advanced_configuration.md#spell-check-("did-you-mean..."))
- [Highlighting](04_advanced_configuration.md#highlighting)
- [Boosting](04_advanced_configuration.md#boosting)
- [Indexing related objects](04_advanced_configuration.md#indexing-related-objects)
- [Subsites](04_advanced_configuration.md#subsites)
- [Custom field types](04_advanced_configuration.md#custom-field-types)
- [Text extraction](04_advanced_configuration.md#text-extraction)
- [Facets](05_advanced_configuration.md#facets)
- [Using multiple indexes](05_advanced_configuration.md#multiple-indexes)
- [Analyzers, tokens and token filters](05_advanced_configuration.md#analyzers,-tokenizers-and-token-filters)
- [Spellcheck](05_advanced_configuration.md#spell-check-("did-you-mean..."))
- [Highlighting](05_advanced_configuration.md#highlighting)
- [Boosting](05_advanced_configuration.md#boosting)
- [Indexing related objects](05_advanced_configuration.md#indexing-related-objects)
- [Subsites](05_advanced_configuration.md#subsites)
- [Custom field types](05_advanced_configuration.md#custom-field-types)
- [Text extraction](05_advanced_configuration.md#text-extraction)
- Troubleshooting
- [Gotchas](05_troubleshooting.md#common-gotchas)
- [Gotchas](06_troubleshooting.md#common-gotchas)

@ -106,7 +106,7 @@ $page = Page::create(['Content' => 'Help me. My house is on fire. This is less t
$page->write();
```
Depending on the size of the index and how much content needs to be processed, it could take a while for your search results to be updated, so your newly-updated page may not be available in your search results immediately.
Depending on the size of the index and how much content needs to be processed, it could take a while for your search results to be updated, so your newly-updated page may not be available in your search results immediately. This approach is typically not recommended.
### Queued jobs
@ -136,138 +136,41 @@ class MyIndex extends SolrIndex
}
```
Alternatively, you can index draft content, but simply exclude it from searches. This can be handy to preview search results on unpublished content, in case a CMS author is logged in. Before constructing your `SearchQuery`, conditionally switch to the "live" stage:
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use SilverStripe\Security\Permission;
use SilverStripe\Versioned\Versioned;
// Users without CMS access only ever search published (live) content
if (!Permission::check('CMS_ACCESS_CMSMain')) {
    Versioned::set_stage(Versioned::LIVE);
}
$query = SearchQuery::create();
// ...
```
Alternatively, you can index draft content, but simply exclude it from searches. This can be handy to preview search results on unpublished content, in case a CMS author is logged in. Before constructing your `SearchQuery`, conditionally switch to the "live" stage.
### Adding DataObjects
If you create a class that extends `DataObject` (and not `Page`) then it won't be automatically added to the search
index. You'll have to make some changes to add it in. The `DataObject` class will require the following minimum code
to render properly in the search results:
* `Link()` needs to return the URL to follow from the search results to actually view the object.
* `Name` (as a DB field) will be used as the result title.
* `Abstract` (as a DB field) will show under the search result title.
* `getShowInSearch()` is required to get the record to show in search, since all results are filtered by `ShowInSearch`.
So with that, you can add your class to your index:
```php
use My\Namespace\Model\SearchableDataObject;
use SilverStripe\FullTextSearch\Solr\SolrIndex;
use Page;
class MySolrSearchIndex extends SolrIndex
{
    public function init()
    {
        $this->addClass(SearchableDataObject::class);
        $this->addClass(Page::class);
        $this->addAllFulltextFields();
    }
}
```
## Querying an index
This is where the magic happens. You will construct the search terms and other parameters required to form a `SearchQuery` object, and pass that into a `SearchIndex` to get results.
### Building a `SearchQuery`
First, you'll need to construct a new `SearchQuery` object:
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
$query = SearchQuery::create();
```
You can then alter the `SearchQuery` with a number of methods:
#### `addSearchTerm()`
The simplest - pass through a string to search your index for.
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
$query = SearchQuery::create()
->addSearchTerm('fire');
```
You can also limit this to specific fields by passing an array as the second argument, specified in the form of `{table}_{field}`:
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use Page;
$query = SearchQuery::create()
->addSearchTerm('on fire', [Page::class . '_Title']);
```
#### `addFuzzySearchTerm()`
Pass through a string to search your index for, with "fuzzier" matching - this means that a term like "fishing" would also likely find results containing "fish" or "fisher". Otherwise behaves the same as `addSearchTerm()`.
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
$query = SearchQuery::create()
->addFuzzySearchTerm('fire');
```
#### `addClassFilter()`
Only query a specific class in the index, optionally including subclasses.
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use My\Namespace\PageType\SpecialPage;
$query = SearchQuery::create()
->addClassFilter(SpecialPage::class, false); // only return results from SpecialPages, not subclasses
```
#### Searching value ranges
Most values can be expressed as ranges, most commonly dates or numbers. To search for a range of values rather than an exact match,
use the `SearchQuery_Range` class. The range can include bounds on both sides, or stay open-ended by simply leaving the argument blank.
It takes arguments in the form of `SearchQuery_Range::create($start, $end)`:
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery_Range;
use My\Namespace\Index\MyIndex;
use Page;
$query = SearchQuery::create()
->addSearchTerm('fire')
// Only include documents edited in 2011 or earlier
->addFilter(Page::class . '_LastEdited', SearchQuery_Range::create(null, '2011-12-31T23:59:59Z'));
$results = MyIndex::singleton()->search($query);
```
Note: At the moment, the date format is specific to the search implementation.
#### Searching for empty or existing values
Since there's a type conversion between the SilverStripe database, object properties
and the search index persistence, it's often not clear which condition is searched for.
Should it equal an empty string, or only match if the field wasn't indexed at all?
The `SearchQuery` API has the concept of a "missing" and "present" field value for this:
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use My\Namespace\Index\MyIndex;
use Page;
$query = SearchQuery::create()
    ->addSearchTerm('fire')
// Needs a value, although it can be false
->addFilter(Page::class . '_ShowInMenus', SearchQuery::$present);
$results = MyIndex::singleton()->search($query);
```
### Querying an index
Once you have your query constructed, you need to run it against your index.
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use My\Namespace\Index\MyIndex;
$query = SearchQuery::create()->addSearchTerm('fire');
$results = MyIndex::singleton()->search($query);
```
The return value of a `search()` call is an object which contains a few properties:
* `Matches`: `ArrayList` of the current "page" of search results.
* `Suggestion`: (optional) Any suggested spelling corrections in the original query notation
* `SuggestionNice`: (optional) Any suggested spelling corrections for display (without query notation)
* `SuggestionQueryString`: (optional) Link to repeat the search with suggested spelling corrections
Once you've created the above classes and run the [solr dev tasks](#solr-dev-tasks) to tell Solr about the new index
you've just created, this will add `SearchableDataObject` and the text fields it has to the index. Now when you search
on the site using `MySolrSearchIndex->search()`, the `SearchableDataObject` results will show alongside normal `Page`
results.
## Solr dev tasks
@ -306,7 +209,7 @@ The Solr indexes will be stored as binary files inside your SilverStripe project
## File-based configuration
Many aspects of Solr are configured outside of the `schema.xml` file which SilverStripe generates based on the `SolrIndex` subclass that is defined. For example, stopwords are placed in their own `stopwords.txt` file, and advanced [spellchecking](04_advanced_configuration.md#spell-check-("did-you-mean...")) can be configured in `solrconfig.xml`.
Many aspects of Solr are configured outside of the `schema.xml` file which SilverStripe generates based on the `SolrIndex` subclass that is defined. For example, stopwords are placed in their own `stopwords.txt` file, and advanced [spellchecking](05_advanced_configuration.md#spell-check-("did-you-mean...")) can be configured in `solrconfig.xml`.
By default, these files are copied from the `fulltextsearch/conf/extras/` directory over to the new index location. In order to use your own files, copy these files into a location of your choosing (for example `mysite/data/solr/`), and tell Solr to use this folder with the `extraspath` [configuration setting](#solr-server-parameters). Run the [`Solr_Configure`](#solr-configure) task to apply these changes.
@ -340,7 +243,7 @@ class PageController extends ContentController
In your template (e.g. `Page_results.ss`) you can access the results and loop through them. They're stored in the `$Matches` property of the search return object.
```ss
```silverstripe
<% if $SearchResult.Matches %>
<h2>Results for &quot;{$Query}&quot;</h2>
<p>Displaying Page $SearchResult.Matches.CurrentPage of $SearchResult.Matches.TotalPages</p>

docs/en/04_querying.md Normal file
@ -0,0 +1,130 @@
# Querying
This is where the magic happens. You will construct the search terms and other parameters required to form a `SearchQuery` object, and pass that into a `SearchIndex` to get results.
## Building a `SearchQuery`
First, you'll need to construct a new `SearchQuery` object:
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
$query = SearchQuery::create();
```
You can then alter the `SearchQuery` with a number of methods:
### `addSearchTerm()`
The simplest - pass through a string to search your index for.
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
$query = SearchQuery::create()
->addSearchTerm('fire');
```
You can also limit this to specific fields by passing an array as the second argument, specified in the form of `{table}_{field}`:
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use Page;
$query = SearchQuery::create()
->addSearchTerm('on fire', [Page::class . '_Title']);
```
### `addFuzzySearchTerm()`
Pass through a string to search your index for, with "fuzzier" matching - this means that a term like "fishing" would also likely find results containing "fish" or "fisher". Otherwise behaves the same as `addSearchTerm()`.
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
$query = SearchQuery::create()
->addFuzzySearchTerm('fire');
```
### `addClassFilter()`
Only query a specific class in the index, optionally including subclasses.
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use My\Namespace\PageType\SpecialPage;
$query = SearchQuery::create()
->addClassFilter(SpecialPage::class, false); // only return results from SpecialPages, not subclasses
```
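To make the effect of the second argument concrete, here is a plain-PHP analogy that runs without SilverStripe. `Page`, `SpecialPage`, `VerySpecialPage` and `filterByClass()` are hypothetical stand-ins, and this is not how the module builds its Solr filter. It only mirrors the choice between "exact class" and "class plus subclasses".

```php
<?php
// Hypothetical stand-ins for a page hierarchy (not the real SilverStripe classes)
class Page {}
class SpecialPage extends Page {}
class VerySpecialPage extends SpecialPage {}

/**
 * In-memory analogy of addClassFilter($class, $includeSubclasses):
 * keep objects of exactly $class, plus its subclasses when requested.
 */
function filterByClass(array $objects, string $class, bool $includeSubclasses): array
{
    return array_values(array_filter($objects, function ($object) use ($class, $includeSubclasses) {
        return $includeSubclasses
            ? $object instanceof $class      // $class itself or any subclass
            : get_class($object) === $class; // exact class only
    }));
}

$objects = [new Page(), new SpecialPage(), new VerySpecialPage()];

echo count(filterByClass($objects, SpecialPage::class, false)), PHP_EOL; // 1
echo count(filterByClass($objects, SpecialPage::class, true)), PHP_EOL;  // 2
```

`Page` is excluded in both cases because it is a parent of `SpecialPage`, not a subclass.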
## Searching value ranges
Most values can be expressed as ranges, most commonly dates or numbers. To search for a range of values rather than an exact match,
use the `SearchQuery_Range` class. The range can include bounds on both sides, or stay open-ended by simply leaving the argument blank.
It takes arguments in the form of `SearchQuery_Range::create($start, $end)`:
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery_Range;
use My\Namespace\Index\MyIndex;
use Page;
$query = SearchQuery::create()
->addSearchTerm('fire')
// Only include documents edited in 2011 or earlier
->addFilter(Page::class . '_LastEdited', SearchQuery_Range::create(null, '2011-12-31T23:59:59Z'));
$results = MyIndex::singleton()->search($query);
```
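As a rough illustration of how an open-ended bound is expressed, the hypothetical `rangeToSolr()` helper below renders a start/end pair in Solr's `[start TO end]` range syntax, turning a `null` bound into the `*` wildcard. It is a sketch for understanding, not the module's implementation, and the field name `Page_LastEdited` is only illustrative.

```php
<?php
/**
 * Hypothetical helper (not part of the module): renders a start/end pair in
 * Solr range syntax. A null bound becomes the open-ended wildcard `*`.
 */
function rangeToSolr(?string $start, ?string $end): string
{
    return sprintf('[%s TO %s]', $start ?? '*', $end ?? '*');
}

// "Edited in 2011 or earlier", matching the range used in the example above
echo 'Page_LastEdited:' . rangeToSolr(null, '2011-12-31T23:59:59Z'), PHP_EOL;
// Page_LastEdited:[* TO 2011-12-31T23:59:59Z]
```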
### How do I use date ranges where dates might not be defined?
The Solr index updater only includes dates with values, so the field might not exist in all your index entries. A simple bounded range query (`<field>:[* TO <date>]`) will fail in this case. In order to query the field, reverse the search conditions and exclude the ranges you don't want:
```php
// Wrong: Filter will ignore all empty field values
$query->addFilter('fieldname', SearchQuery_Range::create('*', 'somedate'));
// Right: Exclude the opposite range
$query->addExclude('fieldname', SearchQuery_Range::create('somedate', '*'));
```
Note: At the moment, the date format is specific to the search implementation.
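The difference between the two approaches can be simulated in plain PHP. The documents, field name and cutoff below are hypothetical, and the closures only approximate Solr's behaviour. A range matches a document only if the field exists. Square-bracket ranges include both endpoints, so a document dated exactly on the cutoff would be kept by the filter but dropped by the exclude.

```php
<?php
// Hypothetical index documents; the last one has no date because the
// updater skips empty dates.
$docs = [
    ['id' => 1, 'date' => '2010-06-01'],
    ['id' => 2, 'date' => '2013-06-01'],
    ['id' => 3],
];
$cutoff = '2011-12-31';

// Filter on [* TO cutoff]: a document must HAVE the field to match the range
$filtered = array_filter($docs, function (array $doc) use ($cutoff) {
    return isset($doc['date']) && $doc['date'] <= $cutoff;
});

// Exclude [cutoff TO *]: only documents that have the field AND fall in the
// range are removed, so documents without a date survive
$excluded = array_filter($docs, function (array $doc) use ($cutoff) {
    return !(isset($doc['date']) && $doc['date'] >= $cutoff);
});

echo implode(',', array_column($filtered, 'id')), PHP_EOL; // 1
echo implode(',', array_column($excluded, 'id')), PHP_EOL; // 1,3
```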
## Empty or existing values
Since there's a type conversion between the SilverStripe database, object properties
and the search index persistence, it's often not clear which condition is searched for.
Should it equal an empty string, or only match if the field wasn't indexed at all?
The `SearchQuery` API has the concept of a "missing" and "present" field value for this:
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use My\Namespace\Index\MyIndex;
use Page;
$query = SearchQuery::create()
    ->addSearchTerm('fire')
// Needs a value, although it can be false
->addFilter(Page::class . '_ShowInMenus', SearchQuery::$present);
$results = MyIndex::singleton()->search($query);
```
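In plain PHP terms, "present" means the field exists in the indexed document regardless of its value, and "missing" means it was never indexed. This in-memory analogy uses hypothetical documents and a hypothetical `isPresent()` helper, not the module's code. It shows why a stored `false` still counts as present:

```php
<?php
/**
 * Hypothetical helper: "present" means the key exists, whatever its value,
 * so an explicit false still counts (unlike a truthiness check).
 */
function isPresent(array $doc, string $field): bool
{
    return array_key_exists($field, $doc);
}

// Document 2 stored an explicit false; document 3 was indexed without the field
$docs = [
    ['id' => 1, 'ShowInMenus' => true],
    ['id' => 2, 'ShowInMenus' => false],
    ['id' => 3],
];

$present = array_filter($docs, function (array $doc) { return isPresent($doc, 'ShowInMenus'); });
$missing = array_filter($docs, function (array $doc) { return !isPresent($doc, 'ShowInMenus'); });

echo implode(',', array_column($present, 'id')), PHP_EOL; // 1,2
echo implode(',', array_column($missing, 'id')), PHP_EOL; // 3
```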
## Executing your query
Once you have your query constructed, you need to run it against your index.
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use My\Namespace\Index\MyIndex;
$query = SearchQuery::create()->addSearchTerm('fire');
$results = MyIndex::singleton()->search($query);
```
The return value of a `search()` call is an object which contains a few properties:
* `Matches`: `ArrayList` of the current "page" of search results.
* `Suggestion`: (optional) Any suggested spelling corrections in the original query notation
* `SuggestionNice`: (optional) Any suggested spelling corrections for display (without query notation)
* `SuggestionQueryString`: (optional) Link to repeat the search with suggested spelling corrections

@ -357,8 +357,31 @@ class SolrSearchIndex extends SolrIndex
## Indexing related objects
To add a related object to your index.
## Subsites
When you are utilising the [subsites module](https://github.com/silverstripe/silverstripe-subsites) you
may want to add [boosting](#boosting) to results from the current subsite. To do so, you'll
need to use [eDisMax](https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html)
and the supporting parameters `bq` and `bf`. You should add the following to your `SolrIndex`
extension:
```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use SilverStripe\Subsites\Model\Subsite;
public function search(SearchQuery $query, $offset = -1, $limit = -1, $params = []) {
$params = array_merge($params, [
'defType' => 'edismax', // turn on eDisMax
'bq' => '_subsite:'.Subsite::currentSubsiteID(), // boost-query on current subsite ID
'bf' => '_subsite^2' // double the score of any document with that subsite ID
]);
return parent::search($query, $offset, $limit, $params);
}
```
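Because `array_merge()` lets later string keys win, the eDisMax settings in the method above override any `defType`, `bq` or `bf` values a caller passes in. The hypothetical `subsiteBoostParams()` helper below isolates that merge so you can check the behaviour without Solr or the subsites module. The subsite ID `3` and the `hl` parameter are arbitrary examples.

```php
<?php
/**
 * Hypothetical helper mirroring the array_merge() call in search():
 * caller params come first, so the boost settings overwrite matching keys
 * while unrelated caller params are kept.
 */
function subsiteBoostParams(int $subsiteId, array $params = []): array
{
    return array_merge($params, [
        'defType' => 'edismax',               // turn on eDisMax
        'bq' => '_subsite:' . $subsiteId,     // boost-query on the given subsite ID
        'bf' => '_subsite^2',                 // boost function, as in the method above
    ]);
}

$params = subsiteBoostParams(3, ['bq' => 'ignored', 'hl' => 'true']);
print_r($params);
```

If you would rather let caller-supplied values take precedence, swap the two arguments to `array_merge()`.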
## Custom field types
Solr supports custom field type definitions which are written to its XML schema. Many standard ones are already included

@ -1,3 +0,0 @@
# Troubleshooting
## Common gotchas

@ -0,0 +1,14 @@
# Troubleshooting
## Common gotchas
* By default, number-letter boundaries are treated as word boundaries. For example, `A1` is two words, `a` and `1`, when Solr parses the search term.
* Special characters and operators are not correctly escaped
* Multi-word synonym issues
* When Solr indexes are reconfigured and reindexed, their content is trashed and rebuilt
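You can explore the first two gotchas with plain PHP. The hypothetical `escapeSolrTerm()` backslash-escapes the characters that Lucene/Solr query syntax treats as operators. `splitOnNumerics()` roughly approximates how `A1` is split at the letter/number boundary. Neither is part of the module, and Solr's actual analysis depends on the tokenizer and filters configured for the field.

```php
<?php
/**
 * Hypothetical helper: backslash-escapes the Lucene/Solr query syntax
 * special characters + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
 */
function escapeSolrTerm(string $term): string
{
    return preg_replace_callback('/[+\-&|!(){}\[\]^"~*?:\\\\\/]/', function (array $m) {
        return '\\' . $m[0]; // prefix each special character with a backslash
    }, $term);
}

/**
 * Rough approximation (not Solr's analyzer): split at letter/number
 * boundaries and lower-case each part.
 */
function splitOnNumerics(string $term): array
{
    $parts = preg_split('/(?<=\p{L})(?=\p{N})|(?<=\p{N})(?=\p{L})/u', $term);
    return array_map('strtolower', $parts);
}

echo escapeSolrTerm('C++ (beta)'), PHP_EOL;         // C\+\+ \(beta\)
echo implode(' ', splitOnNumerics('A1')), PHP_EOL;  // a 1
```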
### CWP-specific
* `solrconfig.xml` customisations fail silently
* Developers aren't able to test raw queries or see output via the
[Solr admin interface](02_setup.md#solr-admin)

@ -1,99 +0,0 @@
## Adding DataObject classes to Solr search
If you create a class that extends `DataObject` (and not `Page`) then it won't be automatically added to the search
index. You'll have to make some changes to add it in.
So, let's take an example of `StaffMember`:
```php
use SilverStripe\Control\Controller;
use SilverStripe\ORM\DataObject;
class StaffMember extends DataObject
{
private static $db = [
'Name' => 'Varchar(255)',
'Abstract' => 'Text',
'PhoneNumber' => 'Varchar(50)',
];
public function Link($action = 'show')
{
return Controller::join_links('my-controller', $action, $this->ID);
}
public function getShowInSearch()
{
return 1;
}
}
```
This `DataObject` class has the minimum code necessary to allow it to be viewed in the site search.
`Link()` returns the URL a user follows from the search results to view the record in more detail.
`Name` will be used as the result title, and `Abstract` the summary of the staff member which will show under the
search result title.
`getShowInSearch` is required to get the record to show in search, since all results are filtered by `ShowInSearch`.
So with that, let's create a new class called `MySolrSearchIndex`:
```php
use SilverStripe\CMS\Model\SiteTree;
use SilverStripe\FullTextSearch\Solr\SolrIndex;
class MySolrSearchIndex extends SolrIndex {
public function init()
{
$this->addClass(SiteTree::class);
$this->addClass(StaffMember::class);
$this->addAllFulltextFields();
$this->addFilterField('ShowInSearch');
}
}
```
This is a copy/paste of the existing configuration but with the addition of `StaffMember`.
Once you've created the above classes and run `flush=1`, access `dev/tasks/Solr_Configure` and `dev/tasks/Solr_Reindex`
to tell Solr about the new index you've just created. This will add `StaffMember` and the text fields it has to the
index. Now when you search on the site using `MySolrSearchIndex->search()`,
the `StaffMember` results will show alongside normal `Page` results.
## Debugging
### Using the web admin interface
You can visit `http://localhost:8983/solr`, which will show you a list of links
to the admin interfaces of all available indices.
There you can search the contents of each index via the native Solr web interface.
When debugging front-end queries, it can be useful to manually replicate the data
that SilverStripe automatically sends to Solr when saving or publishing.
See `thirdparty/fulltextsearch/server/silverstripe-solr-test.xml` for an example you can post with:
```
java -Durl=http://localhost:8983/solr/MyIndex/update/ -Dtype=text/xml -jar post.jar silverstripe-solr-test.xml
```
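For reference, a Solr XML update message wraps each document in `<add><doc>` with one `<field name="...">` element per field. The hypothetical `solrAddXml()` helper below builds such a message from an array, so you can save it to a file and post it with the command above. The field names `id` and `Page_Title` are placeholders. Check `silverstripe-solr-test.xml` and your generated schema for the names your index actually uses.

```php
<?php
/**
 * Hypothetical helper: builds a Solr XML update message for one document.
 * Names and values are XML-escaped so characters like & stay valid.
 */
function solrAddXml(array $fields): string
{
    $xml = "<add>\n  <doc>\n";
    foreach ($fields as $name => $value) {
        $xml .= sprintf(
            "    <field name=\"%s\">%s</field>\n",
            htmlspecialchars((string) $name, ENT_QUOTES | ENT_XML1),
            htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1)
        );
    }
    return $xml . "  </doc>\n</add>\n";
}

// Placeholder field names; save the output to a file and post it with post.jar
echo solrAddXml(['id' => 'Page-1', 'Page_Title' => 'Fire & rescue']);
```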
## FAQ
### How do I use date ranges where dates might not be defined?
The Solr index updater only includes dates with values,
so the field might not exist in all your index entries.
A simple bounded range query (`<field>:[* TO <date>]`) will fail in this case.
In order to query the field, reverse the search conditions and exclude the ranges you don't want:
```php
// Wrong: Filter will ignore all empty field values
$myQuery->addFilter('fieldname', new SearchQuery_Range('*', 'somedate'));
// Better: Exclude the opposite range
$myQuery->addExclude('fieldname', new SearchQuery_Range('somedate', '*'));
```