
FullTextSearch module

CI Silverstripe supported module

Adds support for fulltext search engines like Sphinx and Solr to Silverstripe CMS.

Installation

composer require silverstripe/fulltextsearch

Enable indexing of draft content:

You can index draft content with the following YAML configuration:

SilverStripe\FullTextSearch\Search\Services\SearchableService:
  variant_state_draft_excluded: false

However, even with this set to false, draft content is only indexed for a DataObject that is in a published state, not for one in a draft-only or modified state. This is because a draft-only or modified record still fails the new anonymous user canView() check in SearchableService::isSearchable() and is automatically deleted from the index.

If you wish to also index draft content when a DataObject is in a draft-only or a modified state, then you'll need to also configure SearchableService::indexing_canview_exclude_classes. See below for instructions on how to do this.

Disabling the anonymous user canView() pre-index check

You can apply configuration to remove the new pre-index canView() check from your DataObjects if it is not necessary, or if it impedes expected functionality (e.g. for sites where users must authenticate to view any content). This also disables the check for descendants of the specified DataObjects. Before disabling the pre-index check, ensure that your implementation of fulltextsearch correctly performs a canView() check at query time; otherwise private data may leak into search results.

SilverStripe\FullTextSearch\Search\Services\SearchableService:
  indexing_canview_exclude_classes:
    - Some\Org\MyDataObject
    # This will disable the check for all pagetypes:
    - SilverStripe\CMS\Model\SiteTree
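
To also index records that are in a draft-only or modified state, the two settings described above can be combined. The fragment below is a sketch under assumptions: Some\Org\MyDataObject is a placeholder class name, and it presumes a canView() check is performed at query time.

```yaml
SilverStripe\FullTextSearch\Search\Services\SearchableService:
  # Allow draft content to be indexed
  variant_state_draft_excluded: false
  # Skip the anonymous user canView() pre-index check for these classes
  # and their descendants (placeholder class name)
  indexing_canview_exclude_classes:
    - Some\Org\MyDataObject
```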

You can also use the updateIsSearchable extension point on SearchableService to modify the result of the method after the ShowInSearch and canView() checks have run.
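
An extension is attached to SearchableService through the standard Silverstripe extensions configuration. The fragment below is a sketch only: App\Search\MySearchableServiceExtension is a hypothetical class you would write yourself, and it is assumed to define an updateIsSearchable() method whose parameters match the way the extension point is invoked in SearchableService.

```yaml
SilverStripe\FullTextSearch\Search\Services\SearchableService:
  extensions:
    # Hypothetical extension class that defines updateIsSearchable()
    - App\Search\MySearchableServiceExtension
```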

It is highly recommended that you run a solr_reindex on your production site after upgrading from 3.6 or earlier, to purge any old data that should no longer be in the search index.

These additional checks can affect reindex performance, because the permission checks require additional queries. If your site also indexes the content of files, such as PDF or DOCX files, using the text-extraction module, which is fairly time-intensive, then the relative performance impact of the canView() checks won't be as noticeable.

Details on filtering before adding content to the Solr index

  • SearchableService::isIndexable() check in SolrReindexBase. Used when indexing all records during Solr reindex.
  • SearchableService::isIndexable() check in SearchUpdateProcessor. Used when indexing single records during DataObject->write().

Details on filtering when extracting results from the Solr index

  • SearchableService::isViewable() check in SolrIndex. This will often be used in CWP implementations that use the CwpSearchEngine class, as well as most custom implementations that call MySearchIndex->search().
  • SearchableService::isViewable() check in SearchForm. This will be used in Solr implementations where a /SearchForm URL is used to display search results.
  • Some implementations will call SearchableService::isViewable() twice. If this happens, the result of the first call is cached in memory, so there is virtually no performance penalty for calling it a second time.
  • If your implementation is very custom and neither subclasses nor makes use of SolrIndex or SearchForm, then it's recommended that you update your implementation to call SearchableService::isViewable().

Documentation

For pure Solr docs, check out the Solr 4.10.4 guide.

See the docs for configuration and setup, or for the quick version see the quick start guide.

For details of updates, bugfixes, and features, please see the changelog.

TODO

  • Get rid of includeSubclasses - isn't actually used in practice, makes the codebase uglier, and ClassHierarchy can be used at query time for most of the same use cases

  • Fix field referencing in queries. Should be able to do $query->search('Text', 'Content'), not $query->search('Text', SiteTree::class . '_Content') like you have to do now

    • Make sure that when a field exists in multiple classes, searching against bare fields searches all of them

    • Allow searching against specific instances too

  • Make fields restrictable by class in an index - 'SiteTree#Content' to limit fields to a particular class, maybe 'Content->Summary' to allow calling a specific method on the field object to get the text

  • Allow following user relationships (Children.Foo for example)

  • Be clearer about what happens with relationships to stateful objects (e.g. Parent.Foo where Parent is versioned)

  • Improvements to SearchUpdater

    • Make it work properly when in-between objects (the A in A.B.Foo) update

    • Allow user logic to trigger a reindex of documents when a field is user generated

  • Add generic APIs for spell correction, file text extraction and snippet generation