Commit Graph

6 Commits

Author SHA1 Message Date
Robbie Averill edb02e9189 API FileTextExtractable::getContent now takes a File instance instead of a path 2018-07-03 15:55:02 +12:00
Russell Michell f341010d7a FIX: First-pass SS4 compatibility.
- Added namespaces, use statements
- Added missing docblocks etc
- Uses SS4's new Cache system
- Uses proper environment vars
- Cannot instantiate 'FileTextCache' (interface) as a service. This can be configured through YML, so default to FileTextCache_Cache
- Modded YML config to make it run.
- Fixes to allow TIKA to actually get file contents.
- Addresses issues raised by @robbieaverill
- Rebased against github.com/silverstripe/silverstripe-textextraction:master
- Replaced `SS_Log` with Monolog.
2017-12-21 10:41:06 +13:00
Daniel Hensby aaf9238384 FIX UnexpectedValueException thrown when trying to set SolrCellTextExtraction.base_url in config 2016-10-03 20:19:30 +01:00
Ingo Schommer 8aca06aef2 Truncate FileContentCache by default to avoid SQL query errors
MySQL has a packet limit of 1MB as a default
(http://dev.mysql.com/doc/refman/5.0/en/packet-too-large.html).
This interferes with the UPDATE queries required
to add file content caches. Since the query can't be terminated
correctly, the whole content will be discarded with a query error.

This change allows content to be truncated prior to the UPDATE operation,
and defaults to 500 characters. This leaves some room for multibyte
characters as well as other parts of the SQL query.
2015-05-07 19:14:02 +12:00
Damian Mooyman 98fd4228f9 Provide alternative backends for caching of extracted content
Implement Flushable for clearing the cache
2015-05-05 17:22:45 +12:00
Ingo Schommer 9af389f51b NEW SolrCellTextExtractor 2013-02-01 15:35:16 +01:00