Daniel Hensby
eb25505a8e
Merge pull request #2 from cam-findlay/patch-1
2017-11-23 13:18:44 +00:00
Alexandre Guidet
196007314a
Fixed the version comparison by using version_compare() instead of a plain float comparison
2016-10-19 15:46:30 +13:00
Daniel Hensby
e9e33605b4
FIX PDFTextExtractor no longer smushes words together that break across lines
2016-10-03 23:59:18 +01:00
Jake Bentvelzen
75ffe7b56a
fix(PDFTextExtractor): Added support for Windows, but only if 'binary_location' is defined. Updated documentation to inform the user of this.
2016-05-13 15:07:33 +10:00
Damian Mooyman
f72ba3a978
API Whitelist bin paths for pdftotext
2016-02-25 16:40:25 +13:00
helpfulrobot
8e14595f1a
Converted to PSR-2
2015-11-18 17:07:31 +13:00
Loz Calver
9ea4b79543
FIX: SolrCellTextExtractor always reporting itself as unavailable ( fixes #14 )
2015-06-08 12:42:31 +01:00
Christopher Pitt
fbc31692e7
Using Symfony mime type detection
2015-05-13 21:36:05 +12:00
Ingo Schommer
da6c554acb
Check file existence in for_file()
finfo() will silently fail the whole request (at least on my PHP 5.4 install)
if invoked on a file that doesn't exist, so fail early here.
2015-05-12 16:45:03 +12:00
Damian Mooyman
1ad9e46727
API Support tika server
2015-02-25 17:55:41 +13:00
Damian Mooyman
2977f85cb5
API Implement Tika support
API Implement support for detection via mime-type as well as file extension
API Implement FileContent property for safe usage in templates
API Instead of returning the list of extensions / mime types supported, support is determined on a per-file basis
Marking dev-master as version 2.0 as this contains breaking changes
2015-02-20 15:12:20 +13:00
cam-findlay
a34c443be5
FIX Additional exception handling for Tika errors returned via Guzzle.
Tika server errors via Guzzle can cause the Solr search query to return a 500 error and break search results pages for users. The issue related to uncaught exceptions from Guzzle causing a silent failure when a text file is unreadable or missing (return null never occurs, which breaks the search).
2013-06-07 10:42:38 +12:00
Ingo Schommer
b32bc08dc4
More resilience in SolrCellTextExtractor
Shouldn't outright fail the request if a file can't be found
2013-05-07 19:27:06 +02:00
Ingo Schommer
b86483abc4
3.1 compat
2013-05-07 18:47:56 +02:00
Ingo Schommer
f2c8df2348
BUG Exclude meta info from SolrCell content retrieval
Was matching </str> greedily, which included too much content
2013-03-11 00:56:44 +01:00
Ingo Schommer
9af389f51b
NEW SolrCellTextExtractor
2013-02-01 15:35:16 +01:00
Ingo Schommer
14816075b8
FIX Case insensitive extension matching
2013-02-01 15:34:54 +01:00
Ingo Schommer
788a49bf9f
BUG Improved HTMLTextExtractor, remove non-content tags
2012-09-06 13:41:21 +02:00
Ingo Schommer
733644d6bb
Better shell execution feedback from PDF extractor
2012-08-27 11:31:53 +02:00
Ingo Schommer
f3fcf60c0f
FileTextExtractor->isAvailable()
2012-08-22 18:25:55 +02:00
Ingo Schommer
977c4e49c9
API Using paths instead of File objects in extractors
Makes coupling to File objects optional, by choosing
to use the FileTextExtractable extension.
2012-08-22 18:25:12 +02:00
Ingo Schommer
ec0921c6d1
Initial commit
2012-08-22 17:52:08 +02:00