html5lib - php flavour

This is an implementation of the tokenization and tree-building parts
of the HTML5 specification in PHP. Potential uses of this library
can be found in web-scrapers and HTML filters.

Warning: This is a pre-alpha release, and as such, certain parts of
this code are not up-to-snuff (e.g. error reporting and performance).
However, the code is very close to spec and passes 100% of tests
not related to parse errors. Nevertheless, expect to have to update
your code on the next upgrade.


Usage notes:

    <?php
    require_once '/path/to/HTML5/Parser.php';
    $dom = HTML5_Parser::parse('<html><body>...');
    $nodelist = HTML5_Parser::parseFragment('<b>Boo</b><br>');
    $nodelist = HTML5_Parser::parseFragment('<td>Bar</td>', 'table');


Documentation:

HTML5_Parser::parse($text)
    $text    : HTML to parse
    return   : DOMDocument of parsed document

HTML5_Parser::parseFragment($text, $context)
    $text    : HTML to parse
    $context : String name of context element
    return   : DOMDocument of parsed document


Developer notes:

* To set up unit tests, you need to add a small stub file
  test-settings.php that contains
  $simpletest_location = 'path/to/simpletest/';
  This needs to be version 1.1 (or, until that is released, SVN trunk)
  of SimpleTest.

* We don't want to ultimately use PHP's DOM because it is not tolerant
  of certain types of errors that HTML 5 allows (for example, an element
  "foo@bar"). But the current implementation uses it, since it's easy.
  Eventually, this html5lib implementation will get a version of
  SimpleTree; and may possibly start using that by default.

    vim: et sw=4 sts=4
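
Illustration of the DOM limitation mentioned in the developer notes:
the sketch below uses only PHP's built-in DOM extension (not html5lib
itself) to show that an element name such as "foo@bar" is rejected.
The helper name isDomSafeElementName() is hypothetical and exists only
for this example; it is not part of the library.

```php
<?php
// Hypothetical helper (not part of html5lib): reports whether PHP's DOM
// extension is willing to create an element with the given name.
function isDomSafeElementName($name)
{
    $doc = new DOMDocument();
    try {
        $doc->createElement($name);
        return true;
    } catch (DOMException $e) {
        // DOM only accepts valid XML names, so a name like "foo@bar"
        // raises an "Invalid Character Error" DOMException.
        return false;
    }
}

var_dump(isDomSafeElementName('b'));       // bool(true)
var_dump(isDomSafeElementName('foo@bar')); // bool(false)
```

A tree builder that targets PHP's DOM therefore has to rename, escape
or drop such elements, which is one reason an alternative tree
representation is attractive.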