Commit Graph

178 Commits

Author SHA1 Message Date
Ingo Schommer
c813d234f0 Merge pull request #5 from tractorcow/pulls/tika-support
API Support tika server
2015-02-26 22:50:36 +13:00
Damian Mooyman
1ad9e46727 API Support tika server 2015-02-25 17:55:41 +13:00
Ingo Schommer
23d83b7d01 Merge pull request #4 from tractorcow/pulls/tika-support
API Implement Tika support
2015-02-23 10:52:34 +13:00
Damian Mooyman
2977f85cb5 API Implement Tika support
API Implement support for detection via mime-type as well as file extension
API Implement FileContent property for safe usage in templates
API instead of returning the list of extensions / mime types supported, support is determined on a per-file basis
Marking dev-master as version 2.0 as this contains breaking changes
2015-02-20 15:12:20 +13:00
Sam Minnee
526de4586c FIX: Fixed broken test caused by file being modified. 2014-02-18 10:42:54 +13:00
Sam Minnee
e56bdf5e27 Made readme example less specific 2014-02-18 10:28:02 +13:00
cam-findlay
a34c443be5 FIX additional exception handling for Tika errors returned via Guzzle.
Tika server errors via Guzzle can cause the Solr search query to return a 500 error and break search results pages for users. The issue was related to uncaught exceptions from Guzzle causing a silent failure if a text file is unreadable or missing (return null never occurs, which breaks the search).
2013-06-07 10:42:38 +12:00
Ingo Schommer
a380bb7c8f Don't write file, since it'd rename the file and make it inaccessible for subsequent tests 2013-05-07 22:21:56 +02:00
Ingo Schommer
30223e4f7c 3.1 compat 2013-05-07 21:54:51 +02:00
Ingo Schommer
49316d99ff Travis support 2013-05-07 21:49:32 +02:00
Ingo Schommer
24a055a741 More docs on how to use extraction with Solr 2013-05-07 20:14:01 +02:00
Ingo Schommer
b32bc08dc4 More resilience in SolrCellTextExtractor
Shouldn't outright fail the request if a file can't be found
2013-05-07 19:27:06 +02:00
Ingo Schommer
b86483abc4 3.1 compat 2013-05-07 18:47:56 +02:00
Ingo Schommer
b5c663570a Merge pull request #1 from jnv/patch-1
Fix description in composer.json
2013-04-11 00:58:47 -07:00
Jan Vlnas
55b8bc28c1 Fix description in composer.json 2013-03-13 23:59:40 +01:00
Ingo Schommer
f2c8df2348 BUG Exclude meta info from SolrCell content retrieval
Was matching </str> greedily, which included too much content
2013-03-11 00:56:44 +01:00
Ingo Schommer
9af389f51b NEW SolrCellTextExtractor 2013-02-01 15:35:16 +01:00
Ingo Schommer
14816075b8 FIX Case insensitive extension matching 2013-02-01 15:34:54 +01:00
Ingo Schommer
a6cc647d01 Added composer.json 2013-01-07 14:07:39 +01:00
Ingo Schommer
788a49bf9f BUG Improved HTMLTextExtractor, remove non-content tags 2012-09-06 13:41:21 +02:00
Ingo Schommer
733644d6bb Better shell execution feedback from PDF extractor 2012-08-27 11:31:53 +02:00
Ingo Schommer
478ab65db7 Added License 2012-08-22 23:23:34 +02:00
Ingo Schommer
847a4e0694 Updated README 2012-08-22 23:22:46 +02:00
Ingo Schommer
f3fcf60c0f FileTextExtractor->isAvailable() 2012-08-22 18:25:55 +02:00
Ingo Schommer
977c4e49c9 API Using paths instead of File objects in extractors
Makes coupling to File objects optional, by choosing
to use the FileTextExtractable extension.
2012-08-22 18:25:12 +02:00
Ingo Schommer
7de717b0bd 3.0 compat 2012-08-22 18:24:38 +02:00
Ingo Schommer
98f847c946 Added rudimentary test coverage 2012-08-22 18:23:06 +02:00
Ingo Schommer
ec0921c6d1 Initial commit 2012-08-22 17:52:08 +02:00