16 Commits

Author            SHA1        Date
    Message

Damian Mooyman    98fd4228f9  2015-05-05 17:22:45 +12:00
    Provide alternative backends for caching of extracted content
    Implement Flushable for clearing the cache

Christopher Pitt  b7488577ad  2015-03-05 13:57:31 +13:00
    Downgraded Guzzle version

Damian Mooyman    1ad9e46727  2015-02-25 17:55:41 +13:00
    API Support tika server

Damian Mooyman    2977f85cb5  2015-02-20 15:12:20 +13:00
    API Implement Tika support
    API Implement support for detection via mime-type as well as file extension
    API Implement FileContent property for safe usage in templates
    API instead of returning the list of extensions / mime types supported, support is determined on a per-file bases
    Marking dev-master as version 2.0 as this contains breaking changes

Ingo Schommer     30223e4f7c  2013-05-07 21:54:51 +02:00
    3.1 compat

Ingo Schommer     b32bc08dc4  2013-05-07 19:27:06 +02:00
    More resilience in SolrCellTextExtractor
    Shouldn't outright fail the request if a file can't be found

Ingo Schommer     b86483abc4  2013-05-07 18:47:56 +02:00
    3.1 compat

Ingo Schommer     f2c8df2348  2013-03-11 00:56:44 +01:00
    BUG Exclude meta info from SolrCell content retrieval
    Was matching </str> greedily, which included too much content

Ingo Schommer     9af389f51b  2013-02-01 15:35:16 +01:00
    NEW SolrCellTextExtractor

Ingo Schommer     14816075b8  2013-02-01 15:34:54 +01:00
    FIX Case insensitive extension matching

Ingo Schommer     788a49bf9f  2012-09-06 13:41:21 +02:00
    BUG Improved HTMLTextExtractor, remove non-content tags

Ingo Schommer     733644d6bb  2012-08-27 11:31:53 +02:00
    Better shell execution feedback from PDF extractor

Ingo Schommer     f3fcf60c0f  2012-08-22 18:25:55 +02:00
    FileTextExtractor->isAvailable()

Ingo Schommer     977c4e49c9  2012-08-22 18:25:12 +02:00
    API Using paths instead of File objects in extractors
    Makes coupling to File objects optional, by choosing
    to use the FileTextExtractable extension.

Ingo Schommer     7de717b0bd  2012-08-22 18:24:38 +02:00
    3.0 compat

Ingo Schommer     ec0921c6d1  2012-08-22 17:52:08 +02:00
    Initial commit