html5lib - php flavour

This is an implementation of the tokenization and tree-building parts
of the HTML5 specification in PHP. Potential uses of this library
include web scrapers and HTML filters.

Warning: This is a pre-alpha release, and as such, certain parts of
this code are not up to snuff (e.g. error reporting and performance).
However, the code is very close to spec and passes 100% of tests
not related to parse errors. Nevertheless, expect to have to update
your code on the next upgrade.


Usage notes:

    <?php
    require_once '/path/to/HTML5/Parser.php';
    // Parse a complete document.
    $dom = HTML5_Parser::parse('<html><body>...');
    // Parse a fragment with no context element.
    $nodelist = HTML5_Parser::parseFragment('<b>Boo</b><br>');
    // Parse a fragment as if it appeared inside a <table> element.
    $nodelist = HTML5_Parser::parseFragment('<td>Bar</td>', 'table');


Documentation:

    HTML5_Parser::parse($text)
        $text    : HTML to parse
        return   : DOMDocument of parsed document

    HTML5_Parser::parseFragment($text, $context)
        $text    : HTML to parse
        $context : String name of context element (optional)
        return   : DOMNodeList of parsed nodes
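
Since parse() is documented above as returning a DOMDocument, its result can
be queried with PHP's standard DOM API. The following is only a sketch under
stated assumptions: collectLinks() is a hypothetical helper that is not part
of this library, and so that the example runs without html5lib installed,
the document is built with PHP's built-in DOMDocument::loadHTML() as a
stand-in for HTML5_Parser::parse().

```php
<?php
// Hypothetical helper (not part of html5lib): collect the href attribute
// of every <a> element found in a DOMDocument.
function collectLinks(DOMDocument $dom) {
    $links = array();
    foreach ($dom->getElementsByTagName('a') as $a) {
        // Skip anchors that carry no href attribute.
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}

// Stand-in for: $dom = HTML5_Parser::parse($html);
$html = '<html><body><a href="/one">1</a><a name="x">no href</a>'
      . '<a href="/two">2</a></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);

$links = collectLinks($dom);
print_r($links);
```

With the library available, replacing the two stand-in lines with a call to
HTML5_Parser::parse($html) should leave the helper unchanged, assuming the
returned document exposes elements under their plain tag names.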


Developer notes:

* To set up unit tests, you need to add a small stub file, test-settings.php,
  that contains $simpletest_location = 'path/to/simpletest/'; This needs to
  be version 1.1 (or, until that is released, SVN trunk) of SimpleTest.

* We don't ultimately want to use PHP's DOM, because it is not tolerant
  of certain types of errors that HTML 5 allows (for example, an element
  named "foo@bar"). The current implementation uses it anyway, since it's
  easy. Eventually, this html5lib implementation will get a version of
  SimpleTree, and may start using that by default.
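
The DOM limitation described in the last developer note can be observed
with PHP's built-in DOM extension alone. This is a minimal sketch that does
not use html5lib: DOMDocument::createElement() throws a DOMException when
the name is not a valid XML name, which is the case for "foo@bar". The
helper name canCreateElement() is hypothetical.

```php
<?php
// Hypothetical helper (not part of html5lib): report whether PHP's DOM
// accepts a given element name.
function canCreateElement($name) {
    $dom = new DOMDocument();
    try {
        $dom->createElement($name);
        return true;
    } catch (DOMException $e) {
        // Thrown for names that are not valid XML names.
        return false;
    }
}

var_dump(canCreateElement('div'));
var_dump(canCreateElement('foo@bar'));
```

A tree builder that has to represent such elements therefore needs either
its own tree structure or some way of rewriting the name before handing it
to the DOM.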

vim: et sw=4 sts=4