Commit Graph

18 Commits

Author SHA1 Message Date
Steve Boyd b92616eb4e API phpunit 9 support 2021-10-27 18:16:05 +13:00
Robbie Averill 32e2f9f84f FIX Ensure test uses database cache, it asserts assuming it is configured 2019-08-28 10:07:21 +12:00
Robbie Averill 86eba78064 Add tests for isAvailable() 2019-02-13 11:23:28 +07:00
Robbie Averill 231a2091af FIX Update Guzzle implementations in Tika extractors 2018-07-06 16:11:59 +12:00
Robbie Averill 9e8ed243d0 Separate Tika tests, group them for phpunit, further reduce log level, make Extractors injectable 2018-07-03 17:15:18 +12:00
Robbie Averill 6bf932e5f0 FIX unlink call checks that a file exists first, and tests pass a File object 2018-07-03 16:30:05 +12:00
Robbie Averill edb02e9189 API FileTextExtractable::getContent now takes a File instance instead of a path 2018-07-03 15:55:02 +12:00
Robbie Averill fe5148e678 API Add namespaces to tests and update SapphireTest implementation 2018-07-03 11:35:24 +12:00
Russell Michell f341010d7a FIX: First-pass SS4 compatibility.
- Added namespaces, use statements
- Added missing docblocks etc
- Uses SS4's new Cache system
- Uses proper environment vars
- Cannot instantiate 'FileTextCache' (interface) as a service. This can be configured through YML, so default to FileTextCache_Cache
- Modded YML config to make it run.
- Fixes to allow TIKA to actually get file contents.
- Addresses issues raised by @robbieaverill
- Rebased against github.com/silverstripe/silverstripe-textextraction:master
- Replaced `SS_Log` with Monolog.
2017-12-21 10:41:06 +13:00
Damian Mooyman f72ba3a978 API Whitelist bin paths for pdftotext 2016-02-25 16:40:25 +13:00
helpfulrobot 8e14595f1a Converted to PSR-2 2015-11-18 17:07:31 +13:00
Ingo Schommer 8aca06aef2 Truncate FileContentCache by default to avoid SQL query errors
MySQL has a packet limit of 1MB as a default
(http://dev.mysql.com/doc/refman/5.0/en/packet-too-large.html).
This interferes with the UPDATE queries required
to add file content caches. Since the query can't be terminated
correctly, the whole content will be discarded with a query error.

This change allows content to be truncated prior to the UPDATE operation,
and defaults to 500 characters. This leaves some room for multibyte
characters as well as other parts of the SQL query.
2015-05-07 19:14:02 +12:00
Damian Mooyman 1ad9e46727 API Support tika server 2015-02-25 17:55:41 +13:00
Damian Mooyman 2977f85cb5 API Implement Tika support
API Implement support for detection via mime-type as well as file extension
API Implement FileContent property for safe usage in templates
API instead of returning the list of extensions / mime types supported, support is determined on a per-file basis
Marking dev-master as version 2.0 as this contains breaking changes
2015-02-20 15:12:20 +13:00
Sam Minnee 526de4586c FIX: Fixed broken test caused by file being modified. 2014-02-18 10:42:54 +13:00
Ingo Schommer a380bb7c8f Don't write file, since it'd rename the file and make it inaccessible for subsequent tests 2013-05-07 22:21:56 +02:00
Ingo Schommer 788a49bf9f BUG Improved HTMLTextExtractor, remove non-content tags 2012-09-06 13:41:21 +02:00
Ingo Schommer 98f847c946 Added rudimentary test coverage 2012-08-22 18:23:06 +02:00