This component provides a unified HTML parser and writer. The writer lets you produce readable and correct HTML from code, without using templates. The parser is a wrapper around both DOMDocument and SimpleXML.
The parser and writer also work on fragments of HTML. The parser makes sure that the output is identical to the input. When converting a node to a string, \arc\html returns the full HTML string, including tags. If you don't want that, you can always access the 'nodeValue' property to get the original SimpleXMLElement.
Finally, the parser adds the ability to use basic CSS selectors to find elements in the HTML.
use \arc\html as h;
$htmlString = h::doctype()
    . h::html(
        h::head(
            h::title('Example site')
        ),
        h::body(
            ['class' => 'homepage'],
            h::h1('An example site')
        )
    );
$html = \arc\html::parse($htmlString);
$title = $html->head->title->nodeValue; // SimpleXMLElement 'Example site'
$titleTag = $html->head->title; // <title>Example site</title>
$title = current($html->find('title'));
The find() method always returns an array, which may be empty. By using current() you get the first element found, or false if nothing was found.
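As a minimal sketch of the current() idiom, the snippet below uses plain PHP arrays as stand-ins for find() results, so it runs without arc\html installed. The variable names are illustrative only. Note that PHP's current() returns false for an empty array, so a strict comparison is the safest way to detect "nothing found".

```php
<?php
// Plain arrays standing in for what find() might return (assumption:
// real results would be element objects, not strings).
$matches   = ['<title>Example site</title>'];
$noMatches = [];

$first = current($matches);    // the first element of the list
$none  = current($noMatches);  // false, because the array is empty

// Strict comparison keeps "no match" apart from any falsy value.
if ($none === false) {
    echo "nothing found\n";
}

// If you prefer null for "no match", the null coalescing operator works too.
$firstOrNull = $noMatches[0] ?? null;

echo $first, "\n";
```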
The following CSS selectors are supported:
- tag1 tag2: matches tag2 when it is a descendant of tag1.
- tag1 > tag2: matches tag2 when it is a direct child of tag1.
- tag:first-child: matches tag only if it is the first child.
- tag1 + tag2: matches tag2 only if it is immediately preceded by tag1.
- tag1 ~ tag2: matches tag2 only if it has a previous sibling tag1.
- tag[attr]: matches tag if it has the attribute attr.
- tag[attr="foo"]: matches tag if it has the attribute attr with the value foo in its value list.
- tag#id: matches any tag with id id.
- #id: matches any element with id id.
- tag.class-name: matches any tag with a class class-name.
- .class-name: matches any element with a class class-name.
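To make the meaning of a few of these selectors concrete, here is a sketch that expresses them as XPath queries using PHP's built-in DOMDocument and DOMXPath. This is not how arc\html implements find(); it is only a standalone illustration of which elements each selector is expected to match, on a made-up piece of markup.

```php
<?php
// Hypothetical sample markup, chosen only for this illustration.
$doc = new DOMDocument();
$doc->loadHTML(
    '<ul id="menu">'
    . '<li class="first"><a href="anitem/">An item</a></li>'
    . '<li><a>No link</a></li>'
    . '</ul>'
);
$xpath = new DOMXPath($doc);

// "ul a": every a that is a descendant of ul
$descendants = $xpath->query('//ul//a');

// "ul > li": every li that is a direct child of ul
$children = $xpath->query('//ul/li');

// "li + li": every li immediately preceded by another li
$adjacent = $xpath->query('//li/following-sibling::*[1][self::li]');

// "a[href]": every a that has an href attribute
$withHref = $xpath->query('//a[@href]');

// "#menu": any element with id "menu"
$byId = $xpath->query('//*[@id="menu"]');

// ".first": any element with "first" in its class list
$byClass = $xpath->query(
    '//*[contains(concat(" ", normalize-space(@class), " "), " first ")]'
);

printf(
    "%d %d %d %d %d %d\n",
    $descendants->length,
    $children->length,
    $adjacent->length,
    $withHref->length,
    $byId->length,
    $byClass->length
);
```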
The parsed HTML behaves almost identically to a SimpleXMLElement, with the exceptions noted above. So you can access attributes just as SimpleXMLElement allows:
$class = $html->html->body['class'];
$class = $html->html->body->attributes('version');
You can walk through the node tree:
$title = $html->html->head->title;
Any method or property available in SimpleXMLElement is included in \arc\html parsed data.
In addition to SimpleXMLElement methods, you can also call any method and most properties available in DOMElement.
$class = $html->html->body->getAttribute('class');
$title = current($html->getElementsByTagName('title'));
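For comparison, this is roughly what mixing the two APIs looks like in plain PHP, where you convert explicitly between SimpleXMLElement and DOMElement. The sketch uses only the built-in simplexml_load_string(), dom_import_simplexml() and simplexml_import_dom() functions; the markup is a made-up example.

```php
<?php
// Load a small, well-formed snippet with SimpleXML.
$sx = simplexml_load_string(
    '<body class="homepage"><h1>An example site</h1></body>'
);

// SimpleXML style: attributes via array access, cast to string.
$class = (string) $sx['class'];

// To call a DOM method, the node must first be converted to a DOMElement.
$dom = dom_import_simplexml($sx);
$sameClass = $dom->getAttribute('class');

// And converted back again to continue with SimpleXML syntax.
$back = simplexml_import_dom($dom);
$heading = (string) $back->h1;

echo $class, ' ', $sameClass, ' ', $heading, "\n";
```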
The arc\html parser also accepts partial HTML content. It doesn't require a single root element.
$htmlString = <<< EOF
<li>
<a href="anitem/">An item</a>
</li>
<li>
<a href="anotheritem/">Another item</a>
</li>
EOF;
$html = \arc\html::parse($htmlString);
$links = $html->find('a');
And when you convert the html back to a string, it will still be a partial HTML fragment.
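For contrast, a short sketch of what PHP's built-in DOMDocument does with a fragment when no extra libxml options are passed: the parser adds the implied wrapper elements, so the serialized result is no longer the fragment you started with. The exact output can vary between libxml versions, so the snippet only checks for the added body element.

```php
<?php
// A fragment without a single root document structure.
$fragment = '<li><a href="anitem/">An item</a></li>';

$doc = new DOMDocument();
$doc->loadHTML($fragment);

// Serialize the whole document again.
$out = $doc->saveHTML();

// The fragment itself is still in there...
$containsFragment = strpos($out, $fragment) !== false;

// ...but it is now wrapped in elements that were never in the input.
$hasBodyWrapper = strpos($out, '<body>') !== false;

echo $out;
```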
If you parse a single HTML tag, other than <html>, you must still reference this element to access it:
$htmlString = <<< EOF
<ul>
<li>
<a href="anitem/">An item</a>
</li>
<li>
<a href="anotheritem/">Another item</a>
</li>
</ul>
EOF;
$html = \arc\html::parse($htmlString);
$ul = $html->ul;
Compared with using DOMDocument or SimpleXML directly, arc\html::parse has the following differences:
- When converted to string, it returns the original HTML, without additions you didn't make.
- You can use it with partial HTML fragments.
- No need to remember to call importNode() before appendChild() or insertBefore().
- No need to switch between SimpleXML and DOMDocument just because the one method you need is only available in the other API.
- When returning a list of elements, you always get a simple array, not a magic NodeList.
In addition, arc\html doubles as a simple way to generate valid and indented HTML, with readable and self-validating code.
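The importNode() and NodeList points from the list are easiest to see in plain DOMDocument code. The sketch below uses only PHP's built-in DOM extension with its default strict error checking. It shows the exception you get when appending a node owned by another document, the importNode() call that avoids it, and the conversion needed to turn a DOMNodeList into an ordinary array.

```php
<?php
// Two separate documents: nodes belong to the document that created them.
$source = new DOMDocument();
$source->loadXML('<ul><li>An item</li></ul>');

$target = new DOMDocument();
$target->loadXML('<ul/>');

$li = $source->getElementsByTagName('li')->item(0);

// Appending a foreign node directly throws a DOMException (wrong document).
$failed = false;
try {
    $target->documentElement->appendChild($li);
} catch (DOMException $e) {
    $failed = true;
}

// The node has to be imported (deep copy) into the target document first.
$imported = $target->importNode($li, true);
$target->documentElement->appendChild($imported);

// getElementsByTagName() returns a DOMNodeList, not an array;
// iterator_to_array() converts it when array functions are needed.
$items = iterator_to_array($target->getElementsByTagName('li'));

$result = $target->saveXML($target->documentElement);
echo $result, "\n";
```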