190 Commits

Author SHA1 Message Date
Damian Mooyman 1f4083dda4 BUG Fix incorrect cache key generation 2015-05-12 15:23:14 +12:00
Ingo Schommer 8aca06aef2 Truncate FileContentCache by default to avoid SQL query errors
MySQL has a packet limit of 1MB as a default
(http://dev.mysql.com/doc/refman/5.0/en/packet-too-large.html).
This interferes with the UPDATE queries required
to add file content caches. Since the query can't be terminated
correctly, the whole content will be discarded with a query error.

This change allows content to be truncated prior to the UPDATE operation,
and defaults to 500 characters. This leaves some room for multibyte
characters as well as other parts of the SQL query.
2015-05-07 19:14:02 +12:00
Ingo Schommer 72ce8fc0bc Improved Tika error logging 2015-05-07 12:06:59 +12:00
Christopher Pitt adb71a7823 Merge pull request #8 from tractorcow/pulls/lock-dispatcher
Specify known-working version of stable dependency for php 5.3.3
2015-05-06 13:52:51 +12:00
Damian Mooyman 3ffb303a0b Specify known-working version of stable dependency for php 5.3.3 2015-05-06 13:47:17 +12:00
Ingo Schommer 62637c6197 Merge pull request #7 from tractorcow/pulls/2.0/cache-options
Provide alternative backends for caching of extracted content
2015-05-05 18:21:08 +12:00
Damian Mooyman 98fd4228f9 Provide alternative backends for caching of extracted content
Implement Flushable for clearing the cache
2015-05-05 17:22:45 +12:00
Ingo Schommer 98a83a5bca Clarified Tika docs 2015-04-30 11:39:11 +12:00
Ingo Schommer 1224f0939d Improved Tika docs 2015-04-29 11:59:34 +12:00
Damian Mooyman fb70c1dd50 Merge pull request #6 from assertchris/php-5-3-compat
Downgraded Guzzle version
2015-03-05 14:11:11 +13:00
Christopher Pitt b7488577ad Downgraded Guzzle version 2015-03-05 13:57:31 +13:00
Ingo Schommer 4400443163 Small spelling fixes 2015-02-26 23:11:31 +13:00
Ingo Schommer c813d234f0 Merge pull request #5 from tractorcow/pulls/tika-support
API Support tika server
2015-02-26 22:50:36 +13:00
Damian Mooyman 1ad9e46727 API Support tika server 2015-02-25 17:55:41 +13:00
Ingo Schommer 23d83b7d01 Merge pull request #4 from tractorcow/pulls/tika-support
API Implement Tika support
2015-02-23 10:52:34 +13:00
Damian Mooyman 2977f85cb5 API Implement Tika support
API Implement support for detection via mime-type as well as file extension
API Implement FileContent property for safe usage in templates
API Instead of returning the list of extensions / mime types supported, support is determined on a per-file basis
Marking dev-master as version 2.0 as this contains breaking changes
2015-02-20 15:12:20 +13:00
Sam Minnee 526de4586c FIX: Fixed broken test caused by file being modified. 2014-02-18 10:42:54 +13:00
Sam Minnee e56bdf5e27 Made readme example less specific 2014-02-18 10:28:02 +13:00
cam-findlay a34c443be5 FIX additional exception handling for Tika errors returned via Guzzle.
Tika server errors via Guzzle can cause the Solr search query to return a 500 error and break search results pages for users. The issue was related to uncaught exceptions from Guzzle causing a silent failure if a text file is unreadable or missing (return null never occurs, which breaks the search).
2013-06-07 10:42:38 +12:00
Ingo Schommer a380bb7c8f Don't write file, since it'd rename the file and make it inaccessible for subsequent tests 2013-05-07 22:21:56 +02:00
Ingo Schommer 30223e4f7c 3.1 compat 2013-05-07 21:54:51 +02:00
Ingo Schommer 49316d99ff Travis support 2013-05-07 21:49:32 +02:00
Ingo Schommer 24a055a741 More docs on how to use extraction with Solr 2013-05-07 20:14:01 +02:00
Ingo Schommer b32bc08dc4 More resilience in SolrCellTextExtractor
Shouldn't outright fail the request if a file can't be found
2013-05-07 19:27:06 +02:00
Ingo Schommer b86483abc4 3.1 compat 2013-05-07 18:47:56 +02:00
Ingo Schommer b5c663570a Merge pull request #1 from jnv/patch-1
Fix description in composer.json
2013-04-11 00:58:47 -07:00
Jan Vlnas 55b8bc28c1 Fix description in composer.json 2013-03-13 23:59:40 +01:00
Ingo Schommer f2c8df2348 BUG Exclude meta info from SolrCell content retrieval
Was matching </str> greedily, which included too much content
2013-03-11 00:56:44 +01:00
Ingo Schommer 9af389f51b NEW SolrCellTextExtractor 2013-02-01 15:35:16 +01:00
Ingo Schommer 14816075b8 FIX Case insensitive extension matching 2013-02-01 15:34:54 +01:00
Ingo Schommer a6cc647d01 Added composer.json 2013-01-07 14:07:39 +01:00
Ingo Schommer 788a49bf9f BUG Improved HTMLTextExtractor, remove non-content tags 2012-09-06 13:41:21 +02:00
Ingo Schommer 733644d6bb Better shell execution feedback from PDF extractor 2012-08-27 11:31:53 +02:00
Ingo Schommer 478ab65db7 Added License 2012-08-22 23:23:34 +02:00
Ingo Schommer 847a4e0694 Updated README 2012-08-22 23:22:46 +02:00
Ingo Schommer f3fcf60c0f FileTextExtractor->isAvailable() 2012-08-22 18:25:55 +02:00
Ingo Schommer 977c4e49c9 API Using paths instead of File objects in extractors
Makes coupling to File objects optional, by choosing
to use the FileTextExtractable extension.
2012-08-22 18:25:12 +02:00
Ingo Schommer 7de717b0bd 3.0 compat 2012-08-22 18:24:38 +02:00
Ingo Schommer 98f847c946 Added rudimentary test coverage 2012-08-22 18:23:06 +02:00
Ingo Schommer ec0921c6d1 Initial commit 2012-08-22 17:52:08 +02:00