mirror of
https://github.com/silverstripe/silverstripe-textextraction
synced 2024-10-22 11:06:00 +02:00
Text Extraction Module
Overview
Provides an extraction API for file content, which can hook into different extractor engines based on availability and the parsed file format. The output is always a string: the file content.
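The selection step described above can be pictured with a minimal, self-contained sketch. It is not the module's code: `SketchEngine` and `sketchExtractorFor()` are hypothetical names invented for illustration, and the real module may decide availability and format support differently. The sketch only shows the idea of picking the first engine that is both available and able to handle the file's extension.

```php
<?php
// Hypothetical illustration only; none of these names exist in the module.
class SketchEngine
{
    private $extensions;
    private $available;
    private $reader;

    // $extensions: lowercase file extensions this engine handles
    // $available:  whether the engine's backing tool is usable on this system
    // $reader:     callback that turns a file path into a string
    public function __construct(array $extensions, $available, $reader)
    {
        $this->extensions = $extensions;
        $this->available = (bool) $available;
        $this->reader = $reader;
    }

    public function isAvailable()
    {
        return $this->available;
    }

    public function supports($extension)
    {
        return in_array($extension, $this->extensions, true);
    }

    // The output is always a string, matching the contract described above.
    public function getContent($path)
    {
        return (string) call_user_func($this->reader, $path);
    }
}

// Returns the first available engine that supports the file's extension,
// or null when no engine qualifies.
function sketchExtractorFor($path, array $engines)
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    foreach ($engines as $engine) {
        if ($engine->isAvailable() && $engine->supports($extension)) {
            return $engine;
        }
    }
    return null;
}
```

In this sketch an engine whose backing tool is missing is simply skipped, so a caller has to be prepared for the case where no engine matches.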
Via the FileTextExtractable extension, this logic can be used to cache the extracted content on a DataObject subclass (usually File).
Note: Previously part of the sphinx module.
Requirements
- SilverStripe 3.0
- (optional) XPDF (pdftotext utility)
Configuration
No configuration is required, unless you want to make the content available through your DataObject subclass.
In this case, add the following to mysite/_config.php:
DataObject::add_extension('File', 'FileTextExtractable');
Usage
Manual extraction:
$myFile = '/my/path/myfile.pdf';
$extractor = FileTextExtractor::for_file($myFile);
$content = $extractor->getContent($myFile);
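A more defensive variant of the manual extraction might look like the sketch below. `extractTextOrNull()` is a hypothetical helper, not part of the module. The sketch assumes that `FileTextExtractor::for_file()` returns a falsy value when no extractor handles the file; that behaviour is not confirmed here, so check it against the module source before relying on it.

```php
<?php
// Hypothetical helper; only FileTextExtractor::for_file() and getContent()
// come from the usage shown above.
function extractTextOrNull($path)
{
    // Outside a SilverStripe project the class is not loaded, and a missing
    // file cannot be extracted, so return early in both cases.
    if (!class_exists('FileTextExtractor') || !is_file($path)) {
        return null;
    }

    $extractor = FileTextExtractor::for_file($path);

    // Assumption: a falsy return value means no engine supports this file.
    if (!$extractor) {
        return null;
    }

    return $extractor->getContent($path);
}
```

Returning null rather than an empty string lets the caller tell "nothing could be extracted" apart from "the file contains no text".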
DataObject extraction:
$myFileObj = File::get()->First();
$content = $myFileObj->extractFileAsText();