silverstripe-textextraction

mirror of https://github.com/silverstripe/silverstripe-textextraction synced 2024-10-22 11:06:00 +02:00

Author	SHA1	Message	Date
Daniel Hensby	eb25505a8e	Merge pull request #2 from cam-findlay/patch-1	2017-11-23 13:18:44 +00:00
Jake Dale Ovenden	eb7a45865b	Allow username and password in requests to Tika server (#35)	2017-11-23 10:24:32 +13:00
Juan van den Anker	0761311170	Don't try to save the object to the cache if it has been disabled	2017-02-22 15:17:32 +13:00
Alexandre Guidet	196007314a	fixed the version comparison using version_compare() instead of plain float	2016-10-19 15:46:30 +13:00
Daniel Hensby	e9e33605b4	FIX PDFTextExtractor no longer smushes words together that break across lines	2016-10-03 23:59:18 +01:00
Jake Bentvelzen	75ffe7b56a	fix(PDFTextExtractor): Added support for Windows, but only if 'binary_location' is defined. Updated documentation to inform the user of this.	2016-05-13 15:07:33 +10:00
Damian Mooyman	f72ba3a978	API Whitelist bin paths for pdftotext	2016-02-25 16:40:25 +13:00
helpfulrobot	8e14595f1a	Converted to PSR-2	2015-11-18 17:07:31 +13:00
Loz Calver	9ea4b79543	FIX: SolrCellTextExtractor always reporting itself as unavailable (fixes #14)	2015-06-08 12:42:31 +01:00
Christopher Pitt	fbc31692e7	Using Symfony mime type detection	2015-05-13 21:36:05 +12:00
Ingo Schommer	da6c554acb	Check file existence in for_file() finfo() will silently fail the whole request (at least on my PHP 5.4 install) if invoked on a file that doesn't exist, so fail early here.	2015-05-12 16:45:03 +12:00
Damian Mooyman	c9d74f83db	API Only invalidate cache when file is changed	2015-05-12 16:01:38 +12:00
Damian Mooyman	6cf09f26c8	Merge pull request #9 from chillu/pulls/tika-logging Improved Tika error logging	2015-05-12 15:27:08 +12:00
Damian Mooyman	6c7ffa2c6f	Merge pull request #10 from chillu/pulls/truncate-db-cache Truncate FileContentCache by default to avoid SQL query errors	2015-05-12 15:25:59 +12:00
Damian Mooyman	1f4083dda4	BUG Fix incorrect cache key generation	2015-05-12 15:23:14 +12:00
Ingo Schommer	8aca06aef2	Truncate FileContentCache by default to avoid SQL query errors MySQL has a packet limit of 1MB as a default (http://dev.mysql.com/doc/refman/5.0/en/packet-too-large.html). This interferes with the UPDATE queries required to add file content caches. Since the query can't be terminated correctly, the whole content will be discarded with a query error. This change allows to truncate content prior to the UPDATE operation, and defaults to 500 characters. This leaves some room for multibyte characters as well as other parts of the SQL query.	2015-05-07 19:14:02 +12:00
Ingo Schommer	72ce8fc0bc	Improved Tika error logging	2015-05-07 12:06:59 +12:00
Damian Mooyman	98fd4228f9	Provide alternative backends for caching of extracted content Implement Flushable for clearing the cache	2015-05-05 17:22:45 +12:00
Christopher Pitt	b7488577ad	Downgraded Guzzle version	2015-03-05 13:57:31 +13:00
Damian Mooyman	1ad9e46727	API Support tika server	2015-02-25 17:55:41 +13:00
Damian Mooyman	2977f85cb5	API Implement Tika support API Implement support for detection via mime-type as well as file extension API Implement FileContent property for safe usage in templates API instead of returning the list of extensions / mime types supported, support is determined on a per-file bases Marking dev-master as version 2.0 as this contains breaking changes	2015-02-20 15:12:20 +13:00
cam-findlay	a34c443be5	FIX additional exception handling for Tika errors return via Guzzle. Tika server errors via Guzzle can cause the Solr search query to return a 500 error and breaks search results pages for users. Issues was relating to uncaught exceptions from Guzzle causing a silent fail if a text file is perhaps unreadable or missing (return null never occurs which breaks the search).	2013-06-07 10:42:38 +12:00
Ingo Schommer	30223e4f7c	3.1 compat	2013-05-07 21:54:51 +02:00
Ingo Schommer	b32bc08dc4	More resilience in SolrCellTextExtractor Shouldn't outright fail the request if a file can't be found	2013-05-07 19:27:06 +02:00
Ingo Schommer	b86483abc4	3.1 compat	2013-05-07 18:47:56 +02:00
Ingo Schommer	f2c8df2348	BUG Exclude meta info from SolrCell content retrieval Was matching </str> greedily, which included too much content	2013-03-11 00:56:44 +01:00
Ingo Schommer	9af389f51b	NEW SolrCellTextExtractor	2013-02-01 15:35:16 +01:00
Ingo Schommer	14816075b8	FIX Case insensitive extension matching	2013-02-01 15:34:54 +01:00
Ingo Schommer	788a49bf9f	BUG Improved HTMLTextExtractor, remove non-content tags	2012-09-06 13:41:21 +02:00
Ingo Schommer	733644d6bb	Better shell execution feedback from PDF extractor	2012-08-27 11:31:53 +02:00
Ingo Schommer	f3fcf60c0f	FileTextExtractor->isAvailable()	2012-08-22 18:25:55 +02:00
Ingo Schommer	977c4e49c9	API Using paths instead of File objects in extractors Makes coupling to File objects optional, by choosing to use the FileTextExtractable extension.	2012-08-22 18:25:12 +02:00
Ingo Schommer	7de717b0bd	3.0 compat	2012-08-22 18:24:38 +02:00
Ingo Schommer	ec0921c6d1	Initial commit	2012-08-22 17:52:08 +02:00

34 Commits