Updated README

2024-10-22 09:06:00 +00:00 · 2012-08-22 23:22:07 +02:00 · 2012-08-22 23:22:07 +02:00 · 847a4e0694
commit 847a4e0694
parent f3fcf60c0f
1 changed file with 28 additions and 5 deletions
--- a/README.md
+++ b/README.md
@@ -2,14 +2,37 @@

 ## Overview

+Provides an API for extracting the text content of files. It can hook into
+different extractor engines, chosen by their availability and the format of the parsed file.
+The output is always a string containing the file content.

-Previously part of the [sphinx module](https://github.com/silverstripe/silverstripe-sphinx).
-
-## Usage
-
+Via the `FileTextExtractable` extension, this logic can be used to 
+cache the extracted content on a `DataObject` subclass (usually `File`).

+Note: This module was previously part of the [sphinx module](https://github.com/silverstripe/silverstripe-sphinx).

 ## Requirements

 * SilverStripe 3.0
- * (optional) [XPDF](http://www.foolabs.com/xpdf/) (`pdftotext` utility)
+ * (optional) [XPDF](http://www.foolabs.com/xpdf/) (`pdftotext` utility)
+
+## Configuration
+
+No configuration is required unless you want to make
+the extracted content available through a `DataObject` subclass.
+In that case, add the following to `mysite/_config.php`:
+
+	DataObject::add_extension('File', 'FileTextExtractable');
+
+## Usage
+
+Manual extraction:
+
+	$myFile = '/my/path/myfile.pdf';
+	$extractor = FileTextExtractor::for_file($myFile);
+	$content = $extractor ? $extractor->getContent($myFile) : null; // for_file() may find no usable extractor
+
+DataObject extraction:
+
+	$myFileObj = File::get()->First();
+	$content = $myFileObj->extractFileAsText();
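The README describes picking an extractor engine based on its availability and on the file format, and always returning the content as a string. The following is a minimal, self-contained PHP sketch of that selection step. `TextExtractorEngine`, `FakeEngine` and `extractor_for_file()` are hypothetical names for illustration only, not the module's `FileTextExtractor` API, and treating the order of the engine list as a priority is an assumption.

```php
<?php
// Hypothetical sketch: these names are illustrative and are not the
// module's FileTextExtractor API.

interface TextExtractorEngine
{
    /** Whether the engine can run here (e.g. its external tool is installed). */
    public function isAvailable(): bool;

    /** Lower-case file extensions this engine can parse. */
    public function supportedExtensions(): array;

    /** Extracts the file content; the result is always a string. */
    public function getContent(string $path): string;
}

/** Stand-in engine with fixed availability, used instead of real tools. */
final class FakeEngine implements TextExtractorEngine
{
    private $name;
    private $available;
    private $extensions;

    public function __construct(string $name, bool $available, array $extensions)
    {
        $this->name = $name;
        $this->available = $available;
        $this->extensions = $extensions;
    }

    public function isAvailable(): bool
    {
        return $this->available;
    }

    public function supportedExtensions(): array
    {
        return $this->extensions;
    }

    public function getContent(string $path): string
    {
        // A real engine would read and parse the file; this one only
        // reports which engine handled which file.
        return $this->name . ':' . basename($path);
    }
}

/**
 * Returns the first available engine that supports the file's extension,
 * or null when none does. Array order acts as the priority (assumed).
 */
function extractor_for_file(string $path, array $engines): ?TextExtractorEngine
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    foreach ($engines as $engine) {
        if (!$engine->isAvailable()) {
            continue; // skip engines whose tooling is missing
        }
        if (in_array($extension, $engine->supportedExtensions(), true)) {
            return $engine;
        }
    }
    return null;
}

$engines = [
    new FakeEngine('pdftotext', false, ['pdf']), // pretend XPDF is not installed
    new FakeEngine('fallback-pdf', true, ['pdf']),
    new FakeEngine('plain', true, ['txt']),
];

$myFile = '/my/path/myfile.pdf';
$extractor = extractor_for_file($myFile, $engines);
echo $extractor ? $extractor->getContent($myFile) : 'no extractor', PHP_EOL;
// prints "fallback-pdf:myfile.pdf"
```

Because the first PDF engine reports itself as unavailable, the file falls through to the next engine that claims the `pdf` extension. A format with no matching engine yields `null`, so callers should check the result before calling `getContent()`.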