Configuration

Solr server parameters

Set these values inside your app/_config.php - the defaults are shown below:

use SilverStripe\FullTextSearch\Solr\Solr;

Solr::configure_server([
    'host' => 'localhost', // The host or IP address that Solr is listening on
    'port' => '8983', // The port Solr is listening on
    'path' => '/solr', // The suburl the Solr service is available on
    'version' => '4', // Solr server version - currently only 3 and 4 supported
    'service' => 'Solr4Service', // The class that provides actual communication to the Solr server
    'extraspath' => BASE_PATH .'/vendor/silverstripe/fulltextsearch/conf/solr/4/extras/', // Absolute path to the folder containing templates used for generating the schema and field definitions
    'templates' => BASE_PATH . '/vendor/silverstripe/fulltextsearch/conf/solr/4/templates/', // Absolute path to the configuration default files, e.g. solrconfig.xml
    'indexstore' => [
        'mode' => null, // [REQUIRED] A class name which implements SolrConfigStore, or 'file' or 'webdav'
        'path' => null, // [REQUIRED] The (locally accessible) path to write the index configurations to, OR the suburl on the Solr host that is set up to accept index configurations via WebDAV (e.g. BASE_PATH . '/.solr')
        'remotepath' => null, // The path that the Solr server will read the index configurations from - falls back to the same value as 'path' when using 'file' mode
        'auth' => null, // WebDAV only - a username:password pair string used to authenticate against the WebDAV server (e.g. solr:solr)
        'port' => '8983', // The port for WebDAV, if different from the Solr port
    ]
]);

Note: We recommend placing the indexstore['path'] directory outside of the webroot. If you place it inside the webroot (as shown in the example), make sure its contents are not accessible through the webserver. You can achieve this through server configuration or, in most setups, by marking the folder as hidden with a "dot" prefix.
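As a rough illustration of the two required indexstore values, a minimal 'file' mode setup could look like the sketch below. It assumes Solr runs on the same machine as the website and that every option left out keeps its default. The BASE_PATH . '/.solr' location is just the example path from the comments above, not a recommendation.

```php
use SilverStripe\FullTextSearch\Solr\Solr;

// Sketch only: override just the required indexstore settings
Solr::configure_server([
    'indexstore' => [
        'mode' => 'file', // write the index configuration straight to the local filesystem
        'path' => BASE_PATH . '/.solr', // example location; ideally use a path outside the webroot
    ],
]);
```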

Disabling automatic configuration

If you have this module installed but do not have a Solr server running, you can disable the database manipulation hooks that trigger automatic index updates:

SilverStripe\FullTextSearch\Search\Updaters\SearchUpdater:
  enabled: false

Creating an index

An index can essentially be considered a database that contains all of your searchable content. By default, it will store everything in a field called Content, which is queried to find your search results. To create an index that you can query, you can define it like so:

use Page;
use SilverStripe\FullTextSearch\Solr\SolrIndex;

class MyIndex extends SolrIndex
{
    public function init()
    {
        $this->addClass(Page::class);
        $this->addFulltextField('Title');
    }
}

This will create a new SolrIndex called MyIndex, and it will store the Title field on all Pages for searching. To index more than one class, call addClass() multiple times. Fields that you add don't have to be present on every class in the index; each field only applies to the classes that actually have it.

use Page;
use SilverStripe\Security\Member;
use SilverStripe\FullTextSearch\Solr\SolrIndex;

class MyIndex extends SolrIndex
{
    public function init()
    {
        $this->addClass(Page::class);
        $this->addClass(Member::class);
        $this->addFulltextField('Content'); // only applies to Page class
        $this->addFulltextField('FirstName'); // only applies to Member class
    }
}

You can also skip listing all searchable fields, and have the index figure it out automatically via addAllFulltextFields(). This will add any database fields that are instanceof DBString to the index. Use this with caution, however, as you may inadvertently return sensitive information - it is often safer to declare your fields explicitly.

Once you've added this file, make sure you run a Solr configure to set up your new index.

Adding data to an index

Once you have created your index, you can add data to it in a number of ways.

Reindex the site

Running the Solr reindex task will crawl your site for classes that match those defined on your index, and add the defined fields to the index for searching. This is the most common method used to build the index the first time, or to perform a full rebuild of the index.

Publish a page in the CMS

Every change, addition or removal of an indexed class instance triggers an index update through a "processor" object. The update is handled transparently by inspecting every executed database query and checking which database tables are involved in it.

A reindex event is triggered when you make a change in the CMS, via SearchUpdater::handle_manipulation() or ProxyDBExtension::updateProxy(). These track changes to the database, so any alteration will trigger a reindex. To minimise delays for the user making the change, the index update is deferred until after the actual request returns, using PHP's register_shutdown_function().
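The deferral idea can be sketched in plain PHP. This is not the module's actual implementation, only a simplified illustration of collecting changes during a request and processing them at shutdown. The names $pendingUpdates and queueIndexUpdate() are hypothetical.

```php
<?php
// Illustrative only: a plain-PHP sketch of deferring work until shutdown.
// $pendingUpdates and queueIndexUpdate() are hypothetical names, not module APIs.
$pendingUpdates = [];

function queueIndexUpdate(array &$pending, string $class, int $id): void
{
    // Keying by ID collapses repeated writes to the same record into one update
    $pending[$class][$id] = true;
}

// The callback runs once the script has finished executing
register_shutdown_function(function () use (&$pendingUpdates) {
    foreach ($pendingUpdates as $class => $ids) {
        printf("Reindexing %d %s record(s)\n", count($ids), $class);
    }
});

queueIndexUpdate($pendingUpdates, 'Page', 1);
queueIndexUpdate($pendingUpdates, 'Page', 1); // duplicate write to the same record
queueIndexUpdate($pendingUpdates, 'Page', 2);
echo "Response body rendered\n";
```

In this sketch the response text is printed first, and the queued records are only processed afterwards, which mirrors why the CMS user does not have to wait for the index update.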

Manually

If the situation calls for it, you can add an object to the index directly:

use Page;

$page = Page::create(['Content' => 'Help me. My house is on fire. This is less than optimal.']);
$page->write();

Depending on the size of the index and how much content needs to be processed, it could take a while for your search results to be updated, so your newly-updated page may not be available in your search results immediately. This approach is typically not recommended.

Queued jobs

If the Queued Jobs module is installed, updates are queued up instead of executed in the same request. Queued jobs are usually processed every minute. Large index updates will be batched into multiple queued jobs to ensure a job can run to completion within common constraints, such as memory and execution time limits. You can check the status of jobs in an administrative interface under admin/queuedjobs/.

Excluding draft content

By default, the SearchUpdater class indexes all available "variant states", so in the case of the Versioned extension, both "draft" and "live". For most cases, you'll want to exclude draft content from your search results.

You can either prevent the draft content from being indexed in the first place, by adding the following to your SearchIndex::init() method:

use Page;
use SilverStripe\FullTextSearch\Search\Variants\SearchVariantVersioned;
use SilverStripe\FullTextSearch\Solr\SolrIndex;
use SilverStripe\Versioned\Versioned;

class MyIndex extends SolrIndex
{
    public function init()
    {
        $this->addClass(Page::class);
        $this->addFulltextField('Title');
        $this->excludeVariantState([SearchVariantVersioned::class => Versioned::DRAFT]);
    }
}

Alternatively, you can index draft content but exclude it from searches. This is handy for previewing search results on unpublished content when a CMS author is logged in. Before constructing your SearchQuery, conditionally switch to the "live" stage.
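A minimal sketch of that conditional switch follows. It assumes that "CMS author" means a user holding the CMS_ACCESS_CMSMain permission; adjust the permission check to suit your own project.

```php
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use SilverStripe\Security\Permission;
use SilverStripe\Versioned\Versioned;

// Assumption: only users with access to the CMS pages section may search draft content
if (!Permission::check('CMS_ACCESS_CMSMain')) {
    Versioned::set_stage(Versioned::LIVE);
}

// The query is built after the stage has been set
$query = SearchQuery::create();
```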

Adding DataObjects

If you create a class that extends DataObject (and not Page) then it won't be automatically added to the search index. You'll have to make some changes to add it in. The DataObject class will require the following minimum code to render properly in the search results:

  • Link() needs to return the URL to follow from the search results to actually view the object.
  • Name (as a DB field) will be used as the result title.
  • Abstract (as a DB field) will show under the search result title.
  • getShowInSearch() is required to get the record to show in search, since all results are filtered by ShowInSearch.
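Put together, a DataObject meeting these requirements might look like the sketch below. The namespace is a placeholder, and the table name and the URL built in Link() are hypothetical; a real implementation would point Link() at whichever controller actually renders the object.

```php
// Placeholder namespace: replace with your own (PHP versions before 8.0
// do not accept "Namespace" as a namespace segment)
namespace My\Namespace\Model;

use SilverStripe\Control\Controller;
use SilverStripe\Control\Director;
use SilverStripe\ORM\DataObject;

class SearchableDataObject extends DataObject
{
    private static $table_name = 'SearchableDataObject';

    private static $db = [
        'Name' => 'Varchar(255)', // used as the result title
        'Abstract' => 'Text', // shown under the result title
    ];

    public function Link()
    {
        // Hypothetical route: adjust to the controller that displays this object
        return Controller::join_links(Director::baseURL(), 'searchable-objects', 'view', $this->ID);
    }

    public function getShowInSearch()
    {
        // Always show this record, since results are filtered by ShowInSearch
        return true;
    }
}
```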

So with that, you can add your class to your index:

use My\Namespace\Model\SearchableDataObject;
use Page;
use SilverStripe\FullTextSearch\Solr\SolrIndex;

class MySolrSearchIndex extends SolrIndex
{
    public function init()
    {
        $this->addClass(SearchableDataObject::class);
        $this->addClass(Page::class);
        $this->addAllFulltextFields();
    }
}

Once you've created the above classes and run the solr dev tasks to tell Solr about the new index you've just created, this will add SearchableDataObject and the text fields it has to the index. Now when you search on the site using MySolrSearchIndex->search(), the SearchableDataObject results will show alongside normal Page results.

Solr dev tasks

There are two dev/tasks that are central to the operation of the module - Solr_Configure and Solr_Reindex. You can access these through the web, or via CLI. Running via the web will return "quiet" output by default, but you can increase verbosity by adding ?verbose=1 to the dev/tasks URL; CLI will return verbose output by default.

It is often a good idea to run a configure, followed by a reindex, after a code change - for example, after a deployment.

Solr configure

dev/tasks/Solr_Configure

This task will upload configuration to the Solr core, reloading it or creating it as necessary, and generate the schema. This should be run after every code change to your indexes, or after any configuration changes. This will convert the PHP-based abstraction layer into actual Solr XML. Assuming default configuration and the use of the DefaultIndex, it will:

  • create the directory BASE_PATH/.solr/DefaultIndex/ if it doesn't already exist
  • copy configuration files from vendor/silverstripe/fulltextsearch/conf/extras to BASE_PATH/.solr/DefaultIndex/conf/
  • generate a schema.xml in BASE_PATH/.solr/DefaultIndex/conf/

This task will overwrite these files every time it is run.

Solr reindex

dev/tasks/Solr_Reindex

This task performs a reindex, which adds all the data specified in the index definition into the index store.

If you have the Queued Jobs module installed, this task creates multiple reindex jobs that are processed asynchronously, unless you are in dev mode, in which case the index is processed immediately (see processor.yml). Without Queued Jobs, the reindex runs in a single process. If you run it via the web, the request will often time out. This usually means the actual process is still running in the background, but it can alarm the user, so bear that in mind.

Internally, records are processed in batches of 200. You can change the batch size with the Solr_Reindex.recordsPerRequest config:

SilverStripe\FullTextSearch\Solr\Tasks\Solr_Reindex:
  recordsPerRequest: 150

The Solr indexes will be stored as binary files inside your SilverStripe project. You can also copy the thirdparty/ Solr directory somewhere else; if you do, set the path value in app/_config.php to point to the new location.

File-based configuration

Many aspects of Solr are configured outside of the schema.xml file which SilverStripe generates based on the SolrIndex subclass that is defined. For example, stopwords are placed in their own stopwords.txt file, and advanced spellchecking can be configured in solrconfig.xml.

By default, these files are copied from the fulltextsearch/conf/extras/ directory over to the new index location. To use your own files, copy them into a location of your choosing (for example mysite/data/solr/), and tell Solr to use this folder with the extraspath configuration setting. Run Solr_Configure to apply these changes.

You can also define these on an index-by-index basis by defining SolrIndex->getExtrasPath().
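A per-index override could be sketched as follows. It assumes the customised files live in the mysite/data/solr/ example folder mentioned above, which is only an illustrative location.

```php
use Page;
use SilverStripe\FullTextSearch\Solr\SolrIndex;

class MyIndex extends SolrIndex
{
    public function init()
    {
        $this->addClass(Page::class);
        $this->addFulltextField('Title');
    }

    public function getExtrasPath()
    {
        // Example location only: point this at the folder holding your own
        // stopwords.txt, solrconfig.xml and related files for this index
        return BASE_PATH . '/mysite/data/solr/';
    }
}
```

Run Solr_Configure after changing this path so the files are copied to the index location.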

Handling results

To render search results, you need to return them from a controller. You can also drive this from a form submission using standard SilverStripe forms. In this example, we simply assume there's a GET parameter named q that contains the search term.

use SilverStripe\CMS\Controllers\ContentController;
use SilverStripe\Control\HTTPRequest;
use SilverStripe\FullTextSearch\Search\Queries\SearchQuery;
use My\Namespace\Index\MyIndex;

class PageController extends ContentController
{
    private static $allowed_actions = [
        'search',
    ];

    public function search(HTTPRequest $request)
    {
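        $term = $request->getVar('q');
        $query = SearchQuery::create()->addSearchTerm($term);

        // customise() supplies the template data; renderWith() takes the template names
        return $this->customise([
            'Query' => $term,
            'SearchResult' => MyIndex::singleton()->search($query),
        ])->renderWith(['Page_results', 'Page']);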
    }
}

In your template (e.g. Page_results.ss) you can access the results and loop through them. They're stored in the $Matches property of the search return object.

<% if $SearchResult.Matches %>
    <h2>Results for &quot;{$Query}&quot;</h2>
    <p>Displaying Page $SearchResult.Matches.CurrentPage of $SearchResult.Matches.TotalPages</p>
    <ol>
        <% loop $SearchResult.Matches %>
            <li>
                <h3><a href="$Link">$Title</a></h3>
                <p><% if $Abstract %>$Abstract.XML<% else %>$Content.ContextSummary<% end_if %></p>
            </li>
        <% end_loop %>
    </ol>
<% else %>
    <p>Sorry, your search query did not return any results.</p>
<% end_if %>

Please check the pagination guide in the main SilverStripe documentation to learn how to paginate through search results.