Ishan Jayamanne 21ed6e0f86 Update isAvailable check to work for identical versions
Tika server reports its version as "Apache Tika 1.7". Unfortunately, PHP's `version_compare` considers version "1.7" to be less than version "1.7.0", so Tika server was incorrectly ruled out unless you used Tika server version 1.8 (where "1.8" > "1.7.0").

Changing the comparison string to just "1.7" means they match exactly, and therefore `version_compare` will return `0` rather than `-1`.
2019-02-13 11:15:54 +07:00
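The comparison behaviour described in this commit can be reproduced directly with PHP's built-in `version_compare`. The `isTikaAvailable()` helper below is hypothetical: it is a minimal sketch of the kind of check the commit describes, not the module's actual `isAvailable` code, and it assumes the version is the trailing dotted number in a banner such as "Apache Tika 1.7".

```php
<?php
// version_compare() treats a missing trailing component as lower than an explicit ".0".
var_dump(version_compare('1.7', '1.7.0')); // int(-1): "1.7" sorts before "1.7.0"
var_dump(version_compare('1.7', '1.7'));   // int(0):  identical strings compare equal
var_dump(version_compare('1.8', '1.7.0')); // int(1):  "1.8" sorts after "1.7.0"

// Hypothetical helper: parse a banner like "Apache Tika 1.7" and require at least 1.7.
function isTikaAvailable(string $banner): bool
{
    // Assumption: the version is the last dotted number in the banner string.
    if (!preg_match('/(\d+(?:\.\d+)*)\s*$/', $banner, $matches)) {
        return false;
    }
    // Comparing against "1.7" (not "1.7.0") lets an exact "1.7" report return 0, i.e. pass.
    return version_compare($matches[1], '1.7') >= 0;
}

var_dump(isTikaAvailable('Apache Tika 1.7')); // bool(true)
var_dump(isTikaAvailable('Apache Tika 1.6')); // bool(false)
```

With the older "1.7.0" threshold, the same "1.7" report would have produced `-1` and failed the `>= 0` check.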
.travis Move to new travis containerised infrastructure 2015-08-25 15:28:20 +01:00
_config API FileTextExtractable::getContent now takes a File instance instead of a path 2018-07-03 15:55:02 +12:00
docs/en FIX Update Guzzle implementations in Tika extractors 2018-07-06 16:11:59 +12:00
src Update isAvailable check to work for identical versions 2019-02-13 11:15:54 +07:00
tests FIX Update Guzzle implementations in Tika extractors 2018-07-06 16:11:59 +12:00
.editorconfig Added standard editor config 2015-11-19 13:27:10 +13:00
.gitattributes Update gitattributes and Scrutinizer configuration 2018-07-03 11:36:04 +12:00
.scrutinizer.yml Update gitattributes and Scrutinizer configuration 2018-07-03 11:36:04 +12:00
.travis.yml Add phpunit/phpcs configuration and update Travis configuration 2018-07-03 11:35:52 +12:00
.upgrade.yml API Update namespaces for FileTextCache and add upgrader mapping 2018-07-03 11:23:27 +12:00
CONTRIBUTING.md Add supported module standard docs 2015-11-07 14:06:23 +13:00
README.md Update readme badges and requirements for SilverStripe 4 2018-07-03 10:47:56 +12:00
code-of-conduct.md Added standard code of conduct 2015-11-21 20:17:44 +13:00
codecov.yml Add phpunit/phpcs configuration and update Travis configuration 2018-07-03 11:35:52 +12:00
composer.json Remove obsolete branch alias 2018-07-09 10:03:13 +12:00
license.md Bump license year 2018-07-03 10:48:02 +12:00
phpcs.xml.dist Update codebase to ensure relative PSR-2 compliance 2018-07-03 11:37:38 +12:00
phpunit.xml.dist Update broken path in phpunit configuration 2018-07-03 16:00:31 +12:00

README.md

Text extraction module


Provides a text extraction API for file content, which can hook into different extractor engines depending on which engines are available and on the format of the parsed file. The output is always a string containing the file's content.

Via the FileTextExtractable extension, this logic can be used to cache the extracted content on a DataObject subclass (usually File).
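The caching idea above can be illustrated with a framework-free sketch. `CachedTextFile` and its extractor callable are hypothetical stand-ins, not the module's API. In the module, the cached text is stored on the DataObject by the FileTextExtractable extension, whereas here it is simply a property, so the sketch only shows the extract-once, reuse-afterwards pattern.

```php
<?php
// Hypothetical stand-in for a File record that caches its extracted text.
final class CachedTextFile
{
    /** @var string Path to the underlying file. */
    private $path;

    /** @var callable Turns a file path into plain text (the "extractor"). */
    private $extractor;

    /** @var string|null Cached text; null until the first extraction. */
    private $contentCache = null;

    /** @var int How many times the extractor actually ran. */
    public $extractions = 0;

    public function __construct(string $path, callable $extractor)
    {
        $this->path = $path;
        $this->extractor = $extractor;
    }

    // Return the cached text if present; otherwise extract once and store the result.
    public function getText(): string
    {
        if ($this->contentCache === null) {
            $this->extractions++;
            $this->contentCache = ($this->extractor)($this->path);
        }
        return $this->contentCache;
    }
}

$file = new CachedTextFile('report.pdf', function (string $path): string {
    return "text of $path"; // stand-in for a real XPDF, Solr or Tika call
});

echo $file->getText(), "\n";   // runs the extractor: prints "text of report.pdf"
echo $file->getText(), "\n";   // served from the cache, no second extraction
echo $file->extractions, "\n"; // prints 1
```

Caching matters because engines such as Solr or Tika are typically reached over HTTP, so repeating an extraction for unchanged content is comparatively expensive.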

The module supports text extraction on the following file formats:

  • HTML (built-in)
  • PDF (with XPDF or Solr)
  • Microsoft Word, Excel, PowerPoint (Solr)
  • OpenOffice (Solr)
  • CSV (Solr)
  • RTF (Solr)
  • EPUB (Solr)
  • Many others (Tika)
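The engine selection described above (choose an extractor based on availability and the file's format) can be sketched as follows. The `Extractor` interface, the `SimpleExtractor` class and the `chooseExtractor()` function are hypothetical and deliberately simplified; the engine names and extensions loosely mirror the format list above, not the module's actual class names or priority rules.

```php
<?php
// Hypothetical contract: each engine reports whether it can run and which formats it handles.
interface Extractor
{
    public function isAvailable(): bool;
    public function supportsExtension(string $extension): bool;
    public function name(): string;
}

// Simplified engine that handles a fixed list of lowercase file extensions.
final class SimpleExtractor implements Extractor
{
    private $name;
    private $available;
    private $extensions;

    public function __construct(string $name, bool $available, array $extensions)
    {
        $this->name = $name;
        $this->available = $available;
        $this->extensions = $extensions;
    }

    public function isAvailable(): bool
    {
        return $this->available;
    }

    public function supportsExtension(string $extension): bool
    {
        return in_array(strtolower($extension), $this->extensions, true);
    }

    public function name(): string
    {
        return $this->name;
    }
}

// Return the first engine, in priority order, that is available and supports the file.
function chooseExtractor(array $extractors, string $filename): ?Extractor
{
    $extension = pathinfo($filename, PATHINFO_EXTENSION);
    foreach ($extractors as $extractor) {
        if ($extractor->isAvailable() && $extractor->supportsExtension($extension)) {
            return $extractor;
        }
    }
    return null; // no engine can handle this file
}

// Example setup: XPDF is marked unavailable, so PDFs fall through to Solr.
$extractors = [
    new SimpleExtractor('HTML (built-in)', true, ['html', 'htm']),
    new SimpleExtractor('XPDF', false, ['pdf']),
    new SimpleExtractor('Solr', true, ['pdf', 'doc', 'docx', 'xls', 'ppt', 'odt', 'csv', 'rtf', 'epub']),
];

echo chooseExtractor($extractors, 'report.PDF')->name(), "\n"; // prints "Solr"
var_dump(chooseExtractor($extractors, 'image.png'));           // NULL
```

Ordering the list by preference keeps the choice deterministic: a cheap local engine wins when it is installed, and a remote service is only used as a fallback.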

Requirements

Installation

composer require silverstripe/textextraction

The module depends on the Guzzle HTTP library, which Composer installs automatically. Alternatively, install Guzzle through PEAR and ensure it is in your include_path.

Documentation

Bugtracker

Bugs are tracked in the issues section of this repository. Before submitting an issue, please read over existing issues to ensure yours is unique.

If the issue does look like a new bug:

  • Create a new issue
  • Describe the steps required to reproduce your issue, and the expected outcome. Unit tests, screenshots and screencasts can help here.
  • Describe your environment in as much detail as possible: SilverStripe version, browser, PHP version, operating system, and any installed SilverStripe modules.

Please report security issues to security@silverstripe.org directly. Please don't file security issues in the bugtracker.

Development and contribution

If you would like to contribute to the module, please raise a pull request and discuss it with the module maintainers.