Ingo Schommer
da6c554acb
Check file existence in for_file()
...
finfo() will silently fail the whole request (at least on my PHP 5.4 install)
if invoked on a file that doesn't exist, so fail early here.
2015-05-12 16:45:03 +12:00
Ingo Schommer
15f9647bca
Merge pull request #11 from tractorcow/pulls/invalidate
...
API Only invalidate cache when file is changed
2015-05-12 16:08:45 +12:00
Damian Mooyman
c9d74f83db
API Only invalidate cache when file is changed
2015-05-12 16:01:38 +12:00
Damian Mooyman
6cf09f26c8
Merge pull request #9 from chillu/pulls/tika-logging
...
Improved Tika error logging
2015-05-12 15:27:08 +12:00
Damian Mooyman
6c7ffa2c6f
Merge pull request #10 from chillu/pulls/truncate-db-cache
...
Truncate FileContentCache by default to avoid SQL query errors
2015-05-12 15:25:59 +12:00
Damian Mooyman
1f4083dda4
BUG Fix incorrect cache key generation
2015-05-12 15:23:14 +12:00
Ingo Schommer
8aca06aef2
Truncate FileContentCache by default to avoid SQL query errors
...
MySQL has a packet limit of 1MB as a default
(http://dev.mysql.com/doc/refman/5.0/en/packet-too-large.html).
This interferes with the UPDATE queries required
to add file content caches. Since the query can't be terminated
correctly, the whole content will be discarded with a query error.
This change allows content to be truncated prior to the UPDATE operation,
and defaults to 500 characters. This leaves some room for multibyte
characters as well as other parts of the SQL query.
2015-05-07 19:14:02 +12:00
Ingo Schommer
72ce8fc0bc
Improved Tika error logging
2015-05-07 12:06:59 +12:00
Christopher Pitt
adb71a7823
Merge pull request #8 from tractorcow/pulls/lock-dispatcher
...
Specify known-working version of stable dependency for php 5.3.3
2015-05-06 13:52:51 +12:00
Damian Mooyman
3ffb303a0b
Specify known-working version of stable dependency for php 5.3.3
2015-05-06 13:47:17 +12:00
Ingo Schommer
62637c6197
Merge pull request #7 from tractorcow/pulls/2.0/cache-options
...
Provide alternative backends for caching of extracted content
2015-05-05 18:21:08 +12:00
Damian Mooyman
98fd4228f9
Provide alternative backends for caching of extracted content
...
Implement Flushable for clearing the cache
2015-05-05 17:22:45 +12:00
Ingo Schommer
98a83a5bca
Clarified Tika docs
2015-04-30 11:39:11 +12:00
Ingo Schommer
1224f0939d
Improved Tika docs
2015-04-29 11:59:34 +12:00
Damian Mooyman
fb70c1dd50
Merge pull request #6 from assertchris/php-5-3-compat
...
Downgraded Guzzle version
2015-03-05 14:11:11 +13:00
Christopher Pitt
b7488577ad
Downgraded Guzzle version
2015-03-05 13:57:31 +13:00
Ingo Schommer
4400443163
Small spelling fixes
2015-02-26 23:11:31 +13:00
Ingo Schommer
c813d234f0
Merge pull request #5 from tractorcow/pulls/tika-support
...
API Support tika server
2015-02-26 22:50:36 +13:00
Damian Mooyman
1ad9e46727
API Support tika server
2015-02-25 17:55:41 +13:00
Ingo Schommer
23d83b7d01
Merge pull request #4 from tractorcow/pulls/tika-support
...
API Implement Tika support
2015-02-23 10:52:34 +13:00
Damian Mooyman
2977f85cb5
API Implement Tika support
...
API Implement support for detection via mime-type as well as file extension
API Implement FileContent property for safe usage in templates
API Instead of returning the list of extensions / mime types supported, support is determined on a per-file basis
Marking dev-master as version 2.0 as this contains breaking changes
2015-02-20 15:12:20 +13:00
Sam Minnee
526de4586c
FIX: Fixed broken test caused by file being modified.
2014-02-18 10:42:54 +13:00
Sam Minnee
e56bdf5e27
Made readme example less specific
2014-02-18 10:28:02 +13:00
cam-findlay
a34c443be5
FIX Additional exception handling for Tika errors returned via Guzzle.
...
Tika server errors via Guzzle can cause the Solr search query to return a 500 error, which breaks search results pages for users. The issue related to uncaught exceptions from Guzzle causing a silent failure if a text file is unreadable or missing (return null never occurs, which breaks the search).
2013-06-07 10:42:38 +12:00
Ingo Schommer
a380bb7c8f
Don't write file, since it'd rename the file and make it inaccessible for subsequent tests
2013-05-07 22:21:56 +02:00
Ingo Schommer
30223e4f7c
3.1 compat
2013-05-07 21:54:51 +02:00
Ingo Schommer
49316d99ff
Travis support
2013-05-07 21:49:32 +02:00
Ingo Schommer
24a055a741
More docs on how to use extraction with Solr
2013-05-07 20:14:01 +02:00
Ingo Schommer
b32bc08dc4
More resilience in SolrCellTextExtractor
...
Shouldn't outright fail the request if a file can't be found
2013-05-07 19:27:06 +02:00
Ingo Schommer
b86483abc4
3.1 compat
2013-05-07 18:47:56 +02:00
Ingo Schommer
b5c663570a
Merge pull request #1 from jnv/patch-1
...
Fix description in composer.json
2013-04-11 00:58:47 -07:00
Jan Vlnas
55b8bc28c1
Fix description in composer.json
2013-03-13 23:59:40 +01:00
Ingo Schommer
f2c8df2348
BUG Exclude meta info from SolrCell content retrieval
...
Was matching </str> greedily, which included too much content
2013-03-11 00:56:44 +01:00
Ingo Schommer
9af389f51b
NEW SolrCellTextExtractor
2013-02-01 15:35:16 +01:00
Ingo Schommer
14816075b8
FIX Case insensitive extension matching
2013-02-01 15:34:54 +01:00
Ingo Schommer
a6cc647d01
Added composer.json
2013-01-07 14:07:39 +01:00
Ingo Schommer
788a49bf9f
BUG Improved HTMLTextExtractor, remove non-content tags
2012-09-06 13:41:21 +02:00
Ingo Schommer
733644d6bb
Better shell execution feedback from PDF extractor
2012-08-27 11:31:53 +02:00
Ingo Schommer
478ab65db7
Added License
2012-08-22 23:23:34 +02:00
Ingo Schommer
847a4e0694
Updated README
2012-08-22 23:22:46 +02:00
Ingo Schommer
f3fcf60c0f
FileTextExtractor->isAvailable()
2012-08-22 18:25:55 +02:00
Ingo Schommer
977c4e49c9
API Using paths instead of File objects in extractors
...
Makes coupling to File objects optional, by choosing
to use the FileTextExtractable extension.
2012-08-22 18:25:12 +02:00
Ingo Schommer
7de717b0bd
3.0 compat
2012-08-22 18:24:38 +02:00
Ingo Schommer
98f847c946
Added rudimentary test coverage
2012-08-22 18:23:06 +02:00
Ingo Schommer
ec0921c6d1
Initial commit
2012-08-22 17:52:08 +02:00