
Text extraction module


Provides a text extraction API for file content that can hook into different extractor engines, depending on which engines are available and on the format of the file being parsed. The output is always a string containing the file's content.

Via the FileTextExtractable extension, this logic can be used to cache the extracted content on a DataObject subclass (usually File).

The module supports text extraction on the following file formats:

  • HTML (built-in)
  • PDF (with XPDF or Solr)
  • Microsoft Word, Excel, PowerPoint (Solr)
  • OpenOffice (Solr)
  • CSV (Solr)
  • RTF (Solr)
  • EPUB (Solr)
  • Many others (Tika)
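
To illustrate how an extractor can be chosen by availability and file format, here is a minimal plain-PHP sketch. It is not the module's API: the `Extractor` interface, the `HtmlExtractor` and `StubPdfExtractor` classes, and the `extractText()` function are hypothetical names invented for this example, and the PDF class only stands in for an engine that may or may not be installed.

```php
<?php

// Hypothetical contract for an extractor engine (not the module's real interface).
interface Extractor
{
    public function isAvailable(): bool;
    public function supports(string $extension): bool;
    public function getContent(string $path): string;
}

// Built-in style extractor: needs no external engine, so it is always available.
final class HtmlExtractor implements Extractor
{
    public function isAvailable(): bool
    {
        return true;
    }

    public function supports(string $extension): bool
    {
        return in_array($extension, ['html', 'htm'], true);
    }

    public function getContent(string $path): string
    {
        // Strip the markup and return only the text.
        return trim(strip_tags((string) file_get_contents($path)));
    }
}

// Stand-in for an engine that depends on external software.
// Availability is passed in here; a real engine would detect it.
final class StubPdfExtractor implements Extractor
{
    private bool $available;

    public function __construct(bool $available)
    {
        $this->available = $available;
    }

    public function isAvailable(): bool
    {
        return $this->available;
    }

    public function supports(string $extension): bool
    {
        return $extension === 'pdf';
    }

    public function getContent(string $path): string
    {
        return 'text from ' . basename($path);
    }
}

// Return the text from the first extractor that is available and supports
// the file's extension, or null when no extractor matches.
function extractText(string $path, array $extractors): ?string
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    foreach ($extractors as $extractor) {
        if ($extractor->isAvailable() && $extractor->supports($extension)) {
            return $extractor->getContent($path);
        }
    }

    return null;
}

// Usage: the PDF engine is "not installed", but HTML still works.
$path = sys_get_temp_dir() . '/textextraction-sketch.html';
file_put_contents($path, '<p>Hello world</p>');
echo extractText($path, [new StubPdfExtractor(false), new HtmlExtractor()]), PHP_EOL;
```

The order of the array acts as a simple priority list in this sketch: the first matching extractor wins, and a file with no available extractor yields `null` rather than an error.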



Installation

composer require silverstripe/textextraction

The module depends on the Guzzle HTTP library, which Composer installs automatically. Alternatively, install Guzzle through PEAR and ensure it's in your include_path.



Bugtracker

Bugs are tracked in the issues section of this repository. Before submitting an issue, please read over existing issues to ensure yours is unique.

If the issue does look like a new bug:

  • Create a new issue
  • Describe the steps required to reproduce your issue, and the expected outcome. Unit tests, screenshots and screencasts can help here.
  • Describe your environment in as much detail as possible: Silverstripe version, browser, PHP version, operating system, and any installed Silverstripe modules.

Please report security issues directly to the maintainers. Please don't file security issues in the bugtracker.

Development and contribution

If you would like to contribute to the module, please raise a pull request and discuss it with the module maintainers.