Docs on index triggers, queues and timeouts

Ingo Schommer 2015-05-04 17:40:44 +12:00
parent 79eb663638
commit 6321c5310f
3 changed files with 62 additions and 8 deletions

@ -198,8 +198,23 @@ class Solr_Configure extends BuildTask {
}
}
/**
* The reindex task breaks up the actual reindex activity into small units of work.
* These are "shelled out" to a separate PHP process, which allows the parent task
* to run for longer than the maximum allowed execution time.
*
* If the task is run through a web request rather than CLI,
* it won't show any progress while it's running. A web request will
* also likely time out faster than the full reindex run time,
* either because of PHP or web server execution time limits.
* Because the actual reindex runs in separate PHP sub-processes,
* the web server's PHP module might not terminate the request, and instead let it run to completion
* (this is the case for Apache with mod_php).
*
* See http://php.net/manual/en/function.set-time-limit.php
*/
class Solr_Reindex extends BuildTask {
static $recordsPerRequest = 200;
public function run($request) {
@ -210,9 +225,11 @@ class Solr_Reindex extends BuildTask {
$originalState = SearchVariant::current_state();
if (isset($_GET['start'])) {
// Run the actual reindex if a 'start' parameter is present
$this->runFrom(singleton($_GET['index']), $_GET['class'], $_GET['start'], json_decode($_GET['variantstate'], true));
}
else {
// Prepare and shell out the actual reindex
foreach(array('framework','sapphire') as $dirname) {
$script = sprintf("%s%s$dirname%scli-script.php", BASE_PATH, DIRECTORY_SEPARATOR, DIRECTORY_SEPARATOR);
if(file_exists($script)) {
@ -265,6 +282,7 @@ class Solr_Reindex extends BuildTask {
$cmd .= " verbose=1";
}
// Does not count towards the execution time of this script
$res = $verbose ? passthru($cmd) : `$cmd`;
if($verbose) echo " ".preg_replace('/\r\n|\n/', '$0 ', $res)."\n";

@ -25,16 +25,16 @@ as the SilverStripe webhost.
## Installation (Local)
#### Get the Solr server
### Get the Solr server
composer require silverstripe/fulltextsearch-localsolr 4.5.1.x-dev
#### Start the server (via CLI, in a separate terminal window or background process)
### Start the server (via CLI, in a separate terminal window or background process)
cd fulltextsearch-localsolr/server/
java -jar start.jar
#### Configure the fulltextsearch Solr component to use the local server
### Configure the fulltextsearch Solr component to use the local server
Configure Solr in file mode. The 'path' directory has to be writeable
by the user the Solr search server is started with (see below).
@ -55,7 +55,9 @@ please ensure its contents are not accessible through the webserver.
This can be achieved by server configuration, or (in most configurations)
also by marking the folder as hidden via a "dot" prefix.
#### Create an index
## Configuration
### Create an index
// File: mysite/code/MyIndex.php:
<?php
@ -66,7 +68,10 @@ also by marking the folder as hidden via a "dot" prefix.
}
}
#### Initialize the configuration (via CLI)
### Create the index schema
The PHP-based index definition is an abstraction layer for the actual Solr XML configuration.
In order to create or update it, you need to run the `Solr_Configure` task.
sake dev/tasks/Solr_Configure
@ -76,11 +81,14 @@ Based on the sample configuration above, this command will do the following:
- Copy configuration files from `fulltextsearch/conf/extras/` to `<BASE_PATH>/.solr/MyIndex/conf`
- Generate a `schema.xml`, and place it in `<BASE_PATH>/.solr/MyIndex/conf`
If you call the `Solr_configure` task with an existing index folder,
If you call the task with an existing index folder,
it will overwrite all files from their default locations,
regenerate the `schema.xml`, and ask Solr to reload the configuration.
## Usage
You can use the same command for updating an existing schema,
which will automatically apply without requiring a Solr server restart.
### Reindex
After configuring Solr, you have the option to add your existing
content to its indices. Run the following command:
@ -106,6 +114,13 @@ Note: The Solr indexes will be stored as binary files inside your SilverStripe p
You can also copy the `thirdparty/` solr directory somewhere else,
just set the `path` value in `mysite/_config.php` to point to the new location.
You can also run the reindex task through a web request.
By default, the web request won't receive any feedback while it's running.
Depending on your PHP and web server configuration,
the web request itself might time out, but the reindex continues anyway.
This is possible because the actual index operations are run as separate
PHP sub-processes inside the main web request.
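The batching described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the module's actual code: the helper names `reindexBatchOffsets()` and `reindexBatchCommand()` are hypothetical, and the argument layout of the command is only modelled on the `index`, `class` and `start` parameters that the task reads.

```php
<?php
// Hypothetical helper: split $total records into start offsets,
// one offset per sub-process, $perRequest records each.
function reindexBatchOffsets(int $total, int $perRequest = 200): array
{
    if ($perRequest < 1) {
        throw new InvalidArgumentException('perRequest must be at least 1');
    }
    $offsets = [];
    for ($start = 0; $start < $total; $start += $perRequest) {
        $offsets[] = $start;
    }
    return $offsets;
}

// Hypothetical helper: build the command line for one batch.
// The parent task would run this in a separate PHP process, so the
// time spent there does not count towards the parent's execution time.
function reindexBatchCommand(string $script, string $index, string $class, int $start): string
{
    return sprintf(
        'php %s dev/tasks/Solr_Reindex index=%s class=%s start=%d',
        escapeshellarg($script),
        escapeshellarg($index),
        escapeshellarg($class),
        $start
    );
}
```

With 450 records and the default batch size of 200, this would yield three sub-process commands, starting at offsets 0, 200 and 400.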
### File-based configuration (solrconfig.xml etc)
Many aspects of Solr are configured outside of the `schema.xml` file

@ -67,6 +67,27 @@ Note: There's usually a connector-specific "reindex" task for this.
Note that for most connectors, changes won't be searchable until _after_ the request that triggered the change.
## Automatic Index Updates
Every change, addition or removal of an indexed class instance triggers an index update through a
"processor" object. The update is transparently handled through inspecting every executed database query
and checking which database tables are involved in it.
Index updates are usually executed in the same request which caused the index to become "dirty".
For example, a CMS author might have edited a page, or a user might have left a new comment.
In order to minimise delays to those users, the index update is deferred until after
the actual request returns to the user, through PHP's `register_shutdown_function()` functionality.
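The deferral mechanism can be illustrated with a small sketch. It assumes a hypothetical `DeferredIndexUpdates` collector, which is not the module's actual processor class; only `register_shutdown_function()` is real PHP API here.

```php
<?php
// Hypothetical collector for illustration only.
final class DeferredIndexUpdates
{
    /** @var array<string, array<int, bool>> dirty record IDs, keyed by class */
    private array $dirty = [];
    private bool $registered = false;

    // Remember that a record changed; register the flush only once.
    public function markDirty(string $class, int $id): void
    {
        $this->dirty[$class][$id] = true;
        if (!$this->registered) {
            // PHP calls flush() after the script finishes.
            register_shutdown_function([$this, 'flush']);
            $this->registered = true;
        }
    }

    // Return the pending updates and reset the collector.
    // A real implementation would send them to the search service here.
    public function flush(): array
    {
        $pending = [];
        foreach ($this->dirty as $class => $ids) {
            $pending[$class] = array_keys($ids);
        }
        $this->dirty = [];
        return $pending;
    }
}
```

Keying the dirty list by class and ID means that a record changed several times within one request is only updated once.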
If the [queuedjobs](https://github.com/silverstripe-australia/silverstripe-queuedjobs) module is installed,
updates are queued up instead of executed in the same request. Queue jobs are usually processed every minute.
Large index updates will be batched into multiple queue jobs to ensure a job can run to completion within
common execution constraints (memory and time limits). You can check the status of jobs in
an administrative interface under `admin/queuedjobs/`.
## Manual Index Updates
Manual updates are connector-specific; please check the connector docs for details.
## Searching Specific Fields
By default, the index searches through all indexed fields.