=head1 NAME

DOM::Tiny - Minimalistic HTML/XML DOM parser with CSS selectors

=head1 SYNOPSIS

  use DOM::Tiny;

  # Parse
  my $dom = DOM::Tiny->new('<div><p id="a">Test</p><p id="b">123</p></div>');

  # Find
  say $dom->at('#b')->text;
  say $dom->find('p')->map('text')->join("\n");
  say $dom->find('[id]')->map(attr => 'id')->join("\n");

  # Iterate
  $dom->find('p[id]')->reverse->each(sub { say $_->{id} });

  # Loop
  for my $e ($dom->find('p[id]')->each) {
    say $e->{id}, ':', $e->text;
  }

  # Modify
  $dom->find('div p')->last->append('<p id="c">456</p>');
  $dom->find(':not(p)')->map('strip');

=head1 DESCRIPTION

L<DOM::Tiny> is a minimalistic and relaxed HTML/XML DOM parser with CSS
selector support based on L<Mojo::DOM>. It will even try to interpret broken
HTML and XML, so you should not use it for validation.

=head1 NODES AND ELEMENTS

When we parse an HTML/XML fragment, it gets turned into a tree of nodes.

  <head><title>Hello</title></head>

There are currently eight different kinds of nodes: C<cdata>, C<comment>,
C<doctype>, C<pi>, C<raw>, C<root>, C<tag> and C<text>. Elements are nodes of
the type C<tag>.

While all node types are represented as L<DOM::Tiny> objects, some methods like
L</"attr"> and L</"namespace"> only apply to elements.
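
As a rough illustration of the node kinds described above, the sketch below
parses a small fragment (made up for this example) and lists the type of each
child node of the C<p> element with L</"child_nodes"> and L</"type">. It then
keeps only the C<tag> nodes, since those are the elements.

```perl
use strict;
use warnings;
use DOM::Tiny;

# Hypothetical fragment: a text node, a comment and an element inside <p>
my $dom = DOM::Tiny->new('<p>Hi<!-- note --><b>there</b></p>');

# Collect the type of every child node of the <p> element
my @types = $dom->at('p')->child_nodes->map('type')->each;
print join(', ', @types), "\n";

# Only "tag" nodes are elements, so only they have a tag name
my @tags = map { $_->tag }
  grep { $_->type eq 'tag' } $dom->at('p')->child_nodes->each;
print "elements: @tags\n";
```

Filtering on C<type> before calling element-only methods is one way to walk
mixed content safely.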

=head1 CASE-SENSITIVITY

L<DOM::Tiny> defaults to HTML semantics, which means all tag and attribute
names are lowercased and selectors need to be lowercase as well.

  # HTML semantics
  my $dom = DOM::Tiny->new('<P ID="greeting">Hi!</P>');
  say $dom->at('p[id]')->text;

If XML processing instructions are found, the parser will automatically switch
into XML mode and everything becomes case-sensitive.

  # XML semantics
  my $dom = DOM::Tiny->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');
  say $dom->at('P[ID]')->text;

XML detection can also be disabled with the L</"xml"> method.

  # Force XML semantics
  my $dom = DOM::Tiny->new->xml(1)->parse('<P ID="greeting">Hi!</P>');
  say $dom->at('P[ID]')->text;

  # Force HTML semantics
  my $dom = DOM::Tiny->new->xml(0)->parse('<P ID="greeting">Hi!</P>');
  say $dom->at('p[id]')->text;
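
The two modes can be compared side by side. The following is a minimal sketch,
assuming the same markup as in the examples above; the variable names are
chosen for illustration only. It checks which selectors find the element in
each mode.

```perl
use strict;
use warnings;
use DOM::Tiny;

my $markup = '<P ID="greeting">Hi!</P>';

# HTML semantics (default): names are lowercased while parsing
my $html = DOM::Tiny->new($markup);

# XML semantics: the processing instruction switches the parser to XML mode
my $xml = DOM::Tiny->new(qq{<?xml version="1.0"?>$markup});

# "at" returns undef when nothing matches, so test with defined
my $html_lower = defined $html->at('p[id]') ? 1 : 0;
my $xml_upper  = defined $xml->at('P[ID]')  ? 1 : 0;
my $xml_lower  = defined $xml->at('p[id]')  ? 1 : 0;

print "HTML p[id]: $html_lower\n";
print "XML  P[ID]: $xml_upper\n";
print "XML  p[id]: $xml_lower\n";
```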

=head1 METHODS

L<DOM::Tiny> implements the following methods.

=head2 new

  my $dom = DOM::Tiny->new;
  my $dom = DOM::Tiny->new('<foo bar="baz">I ♥ DOM::Tiny!</foo>');

Construct a new scalar-based L<DOM::Tiny> object and L</"parse"> HTML/XML
fragment if necessary.

=head2 all_text

  my $trimmed   = $dom->all_text;
  my $untrimmed = $dom->all_text(0);

Extract text content from all descendant nodes of this element. Smart
whitespace trimming is enabled by default.

  # Whitespace trimmed
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->all_text;

  # Whitespace preserved
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->all_text(0);

=head2 ancestors

  my $collection = $dom->ancestors;
  my $collection = $dom->ancestors('div ~ p');

Find all ancestor elements of this node matching the CSS selector and return a
L<DOM::Tiny::Collection> object containing these elements as L<DOM::Tiny>
objects. All selectors from L<DOM::Tiny::CSS/"SELECTORS"> are supported.

  # List tag names of ancestor elements
  say $dom->ancestors->map('tag')->join("\n");

=head2 append

  $dom = $dom->append('<p>I ♥ DOM::Tiny!</p>');

Append HTML/XML fragment to this node.

  # "<div><h1>Test</h1><h2>123</h2></div>"
  $dom->parse('<div><h1>Test</h1></div>')
    ->at('h1')->append('<h2>123</h2>')->root;

  # "<p>Test 123</p>"
  $dom->parse('<p>Test</p>')->at('p')
    ->child_nodes->first->append(' 123')->root;

=head2 append_content

  $dom = $dom->append_content('<p>I ♥ DOM::Tiny!</p>');

Append HTML/XML fragment (for C<root> and C<tag> nodes) or raw content to this
node's content.

  # "<div><h1>Test123</h1></div>"
  $dom->parse('<div><h1>Test</h1></div>')
    ->at('h1')->append_content('123')->root;

  # "<!-- Test 123 --><br>"
  $dom->parse('<!-- Test --><br>')
    ->child_nodes->first->append_content('123 ')->root;

  # "<p>Test<i>123</i></p>"
  $dom->parse('<p>Test</p>')->at('p')->append_content('<i>123</i>')->root;

=head2 at

  my $result = $dom->at('div ~ p');

Find first descendant element of this element matching the CSS selector and
return it as a L<DOM::Tiny> object or return C<undef> if none could be found.
All selectors from L<DOM::Tiny::CSS/"SELECTORS"> are supported.

  # Find first element with "svg" namespace definition
  my $namespace = $dom->at('[xmlns\:svg]')->{'xmlns:svg'};
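
Because L</"at"> returns C<undef> when nothing matches, chaining a method
directly onto the result dies for a missing element. A minimal sketch of
guarding against that, using a made-up fragment and ids:

```perl
use strict;
use warnings;
use DOM::Tiny;

# Hypothetical fragment with a single identified paragraph
my $dom = DOM::Tiny->new('<div><p id="a">Test</p></div>');

# No element has this id, so "at" returns undef
my $missing      = $dom->at('#nope');
my $missing_text = defined $missing ? $missing->text : '(not found)';

# This id exists, so "at" returns a DOM::Tiny object
my $found      = $dom->at('#a');
my $found_text = defined $found ? $found->text : '(not found)';

print "$missing_text\n$found_text\n";
```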

=head2 attr

  my $hash = $dom->attr;
  my $foo  = $dom->attr('foo');
  $dom     = $dom->attr({foo => 'bar'});
  $dom     = $dom->attr(foo => 'bar');

This element's attributes.

  # Remove an attribute
  delete $dom->attr->{id};

  # Attribute without value
  $dom->attr(selected => undef);

  # List id attributes
  say $dom->find('*')->map(attr => 'id')->compact->join("\n");
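
The call forms of L</"attr"> can be combined on one element. The sketch below
is illustrative only; the C<input> element and its attribute values are
assumptions made for the example.

```perl
use strict;
use warnings;
use DOM::Tiny;

# Hypothetical form field used only for this illustration
my $dom   = DOM::Tiny->new('<input id="q" name="query">');
my $input = $dom->at('input');

# Set one attribute, then several at once with a hash reference
$input->attr(placeholder => 'Search');
$input->attr({type => 'text', name => 'q'});

# Remove an attribute through the hash reference returned by "attr"
delete $input->attr->{id};

my @names = sort keys %{$input->attr};
print join(', ', @names), "\n";
print $input->attr('name'), "\n";
```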

=head2 child_nodes

  my $collection = $dom->child_nodes;

Return a L<DOM::Tiny::Collection> object containing all child nodes of this
element as L<DOM::Tiny> objects.

  # "<p><b>123</b></p>"
  $dom->parse('<p>Test<b>123</b></p>')->at('p')->child_nodes->first->remove;

  # "<!DOCTYPE html>"
  $dom->parse('<!DOCTYPE html><b>123</b>')->child_nodes->first;

  # " Test "
  $dom->parse('<b>123</b><!-- Test -->')->child_nodes->last->content;

=head2 children

  my $collection = $dom->children;
  my $collection = $dom->children('div ~ p');

Find all child elements of this element matching the CSS selector and return a
L<DOM::Tiny::Collection> object containing these elements as L<DOM::Tiny>
objects. All selectors from L<DOM::Tiny::CSS/"SELECTORS"> are supported.

  # Show tag name of random child element
  say $dom->children->shuffle->first->tag;

=head2 content

  my $str = $dom->content;
  $dom    = $dom->content('<p>I ♥ DOM::Tiny!</p>');

Return this node's content or replace it with HTML/XML fragment (for C<root>
and C<tag> nodes) or raw content.

  # "<b>Test</b>"
  $dom->parse('<div><b>Test</b></div>')->at('div')->content;

  # "<div><h1>123</h1></div>"
  $dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('123')->root;

  # "<p><i>123</i></p>"
  $dom->parse('<p>Test</p>')->at('p')->content('<i>123</i>')->root;

  # "<div><h1></h1></div>"
  $dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('')->root;

  # " Test "
  $dom->parse('<!-- Test --><br>')->child_nodes->first->content;

  # "<div><!-- 123 -->456</div>"
  $dom->parse('<div><!-- Test -->456</div>')
    ->at('div')->child_nodes->first->content(' 123 ')->root;

=head2 descendant_nodes

  my $collection = $dom->descendant_nodes;

Return a L<DOM::Tiny::Collection> object containing all descendant nodes of
this element as L<DOM::Tiny> objects.

  # "<p><b>123</b></p>"
  $dom->parse('<p><!-- Test --><b>123<!-- 456 --></b></p>')
    ->descendant_nodes->grep(sub { $_->type eq 'comment' })
    ->map('remove')->first;

  # "<p><b>test</b>test</p>"
  $dom->parse('<p><b>123</b>456</p>')
    ->at('p')->descendant_nodes->grep(sub { $_->type eq 'text' })
    ->map(content => 'test')->first->root;

=head2 find

  my $collection = $dom->find('div ~ p');

Find all descendant elements of this element matching the CSS selector and
return a L<DOM::Tiny::Collection> object containing these elements as
L<DOM::Tiny> objects. All selectors from L<DOM::Tiny::CSS/"SELECTORS"> are
supported.

  # Find a specific element and extract information
  my $id = $dom->find('div')->[23]{id};

  # Extract information from multiple elements
  my @headers = $dom->find('h1, h2, h3')->map('text')->each;

  # Count all the different tags
  my $hash = $dom->find('*')->reduce(sub { $a->{$b->tag}++; $a }, {});

  # Find elements with a class that contains dots
  my @divs = $dom->find('div.foo\.bar')->each;
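
Unlike L</"at">, L</"find"> always returns a collection, which is simply empty
when nothing matches. A small sketch with a made-up list and class names:

```perl
use strict;
use warnings;
use DOM::Tiny;

# Hypothetical list; two of the three items carry the class "x"
my $dom = DOM::Tiny->new(
  '<ul><li class="x">a</li><li>b</li><li class="x">c</li></ul>');

# Collection of matching elements
my $items = $dom->find('li.x');
my $count = $items->size;
my @texts = $items->map('text')->each;

# A selector without matches yields an empty collection, not undef
my $none = $dom->find('li.y')->size;

print "$count matches: @texts\n";
print "$none matches for li.y\n";
```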

=head2 following

  my $collection = $dom->following;
  my $collection = $dom->following('div ~ p');

Find all sibling elements after this node matching the CSS selector and return
a L<DOM::Tiny::Collection> object containing these elements as L<DOM::Tiny>
objects. All selectors from L<DOM::Tiny::CSS/"SELECTORS"> are supported.

  # List tags of sibling elements after this node
  say $dom->following->map('tag')->join("\n");

=head2 following_nodes

  my $collection = $dom->following_nodes;

Return a L<DOM::Tiny::Collection> object containing all sibling nodes after
this node as L<DOM::Tiny> objects.

  # "C"
  $dom->parse('<p>A</p><!-- B -->C')->at('p')->following_nodes->last->content;

=head2 matches

  my $bool = $dom->matches('div ~ p');

Check if this element matches the CSS selector. All selectors from
L<DOM::Tiny::CSS/"SELECTORS"> are supported.

  # True
  $dom->parse('<p class="a">A</p>')->at('p')->matches('.a');
  $dom->parse('<p class="a">A</p>')->at('p')->matches('p[class]');

  # False
  $dom->parse('<p class="a">A</p>')->at('p')->matches('.b');
  $dom->parse('<p class="a">A</p>')->at('p')->matches('p[id]');

=head2 namespace

  my $namespace = $dom->namespace;

Find this element's namespace or return C<undef> if none could be found.

  # Find namespace for an element with namespace prefix
  my $namespace = $dom->at('svg > svg\:circle')->namespace;

  # Find namespace for an element that may or may not have a namespace prefix
  my $namespace = $dom->at('svg > circle')->namespace;
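
For a concrete case, the sketch below assumes a small SVG fragment with a
default namespace declared on the root element. The unprefixed C<circle>
element picks up the namespace from its ancestor, while an element without any
declaration in scope yields C<undef>.

```perl
use strict;
use warnings;
use DOM::Tiny;

# Hypothetical SVG fragment with a default namespace declaration
my $svg = DOM::Tiny->new(
  '<svg xmlns="http://www.w3.org/2000/svg"><circle r="1"></circle></svg>');

# The namespace is looked up on the element and its ancestors
my $ns = $svg->at('svg > circle')->namespace;

# No namespace declaration anywhere in this fragment
my $none = DOM::Tiny->new('<p>Hi</p>')->at('p')->namespace;

print "circle: $ns\n";
print 'p: ', (defined $none ? $none : 'undef'), "\n";
```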

=head2 next

  my $sibling = $dom->next;

Return L<DOM::Tiny> object for next sibling element or C<undef> if there are no
more siblings.

  # "<h2>123</h2>"
  $dom->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h1')->next;

=head2 next_node

  my $sibling = $dom->next_node;

Return L<DOM::Tiny> object for next sibling node or C<undef> if there are no
more siblings.

  # "456"
  $dom->parse('<p><b>123</b><!-- Test -->456</p>')
    ->at('b')->next_node->next_node;

  # " Test "
  $dom->parse('<p><b>123</b><!-- Test -->456</p>')
    ->at('b')->next_node->content;

=head2 parent

  my $parent = $dom->parent;

Return L<DOM::Tiny> object for parent of this node or C<undef> if this node has
no parent.

=head2 parse

  $dom = $dom->parse('<foo bar="baz">I ♥ DOM::Tiny!</foo>');

Parse HTML/XML fragment with L<DOM::Tiny::HTML>.

  # Parse XML
  my $dom = DOM::Tiny->new->xml(1)->parse($xml);

=head2 preceding

  my $collection = $dom->preceding;
  my $collection = $dom->preceding('div ~ p');

Find all sibling elements before this node matching the CSS selector and return
a L<DOM::Tiny::Collection> object containing these elements as L<DOM::Tiny>
objects. All selectors from L<DOM::Tiny::CSS/"SELECTORS"> are supported.

  # List tags of sibling elements before this node
  say $dom->preceding->map('tag')->join("\n");

=head2 preceding_nodes

  my $collection = $dom->preceding_nodes;

Return a L<DOM::Tiny::Collection> object containing all sibling nodes before
this node as L<DOM::Tiny> objects.

  # "A"
  $dom->parse('A<!-- B --><p>C</p>')->at('p')->preceding_nodes->first->content;

=head2 prepend

  $dom = $dom->prepend('<p>I ♥ DOM::Tiny!</p>');

Prepend HTML/XML fragment to this node.

  # "<div><h1>Test</h1><h2>123</h2></div>"
  $dom->parse('<div><h2>123</h2></div>')
    ->at('h2')->prepend('<h1>Test</h1>')->root;

  # "<p>Test 123</p>"
  $dom->parse('<p>123</p>')
    ->at('p')->child_nodes->first->prepend('Test ')->root;

=head2 prepend_content

  $dom = $dom->prepend_content('<p>I ♥ DOM::Tiny!</p>');

Prepend HTML/XML fragment (for C<root> and C<tag> nodes) or raw content to this
node's content.

  # "<div><h2>Test123</h2></div>"
  $dom->parse('<div><h2>123</h2></div>')
    ->at('h2')->prepend_content('Test')->root;

  # "<!-- Test 123 --><br>"
  $dom->parse('<!-- 123 --><br>')
    ->child_nodes->first->prepend_content(' Test')->root;

  # "<p><i>123</i>Test</p>"
  $dom->parse('<p>Test</p>')->at('p')->prepend_content('<i>123</i>')->root;

=head2 previous

  my $sibling = $dom->previous;

Return L<DOM::Tiny> object for previous sibling element or C<undef> if there
are no more siblings.

  # "<h1>Test</h1>"
  $dom->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h2')->previous;

=head2 previous_node

  my $sibling = $dom->previous_node;

Return L<DOM::Tiny> object for previous sibling node or C<undef> if there are
no more siblings.

  # "123"
  $dom->parse('<p>123<!-- Test --><b>456</b></p>')
    ->at('b')->previous_node->previous_node;

  # " Test "
  $dom->parse('<p>123<!-- Test --><b>456</b></p>')
    ->at('b')->previous_node->content;

=head2 remove

  my $parent = $dom->remove;

Remove this node and return L</"root"> (for C<root> nodes) or L</"parent">.

  # "<div></div>"
  $dom->parse('<div><h1>Test</h1></div>')->at('h1')->remove;

  # "<p><b>456</b></p>"
  $dom->parse('<p>123<b>456</b></p>')
    ->at('p')->child_nodes->first->remove->root;

=head2 replace

  my $parent = $dom->replace('<div>I ♥ DOM::Tiny!</div>');

Replace this node with HTML/XML fragment and return L</"root"> (for C<root>
nodes) or L</"parent">.

  # "<div><h2>123</h2></div>"
  $dom->parse('<div><h1>Test</h1></div>')->at('h1')->replace('<h2>123</h2>');

  # "<p><b>123</b></p>"
  $dom->parse('<p>Test</p>')
    ->at('p')->child_nodes->[0]->replace('<b>123</b>')->root;

=head2 root

  my $root = $dom->root;

Return L<DOM::Tiny> object for C<root> node.

=head2 strip

  my $parent = $dom->strip;

Remove this element while preserving its content and return L</"parent">.

  # "<div>Test</div>"
  $dom->parse('<div><h1>Test</h1></div>')->at('h1')->strip;

=head2 tag

  my $tag = $dom->tag;
  $dom    = $dom->tag('div');

This element's tag name.

  # List tag names of child elements
  say $dom->children->map('tag')->join("\n");

=head2 tap

  $dom = $dom->tap(sub {...});

Alias for L<Mojo::Base/"tap">.

=head2 text

  my $trimmed   = $dom->text;
  my $untrimmed = $dom->text(0);

Extract text content from this element only (not including child elements).
Smart whitespace trimming is enabled by default.

  # Whitespace trimmed
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->text;

  # Whitespace preserved
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->text(0);

=head2 to_string

  my $str = $dom->to_string;

Render this node and its content to HTML/XML.

  # "<b>Test</b>"
  $dom->parse('<div><b>Test</b></div>')->at('div b')->to_string;

=head2 tree

  my $tree = $dom->tree;
  $dom     = $dom->tree(['root']);

Document Object Model. Note that this structure should only be used very
carefully since it is very dynamic.

=head2 type

  my $type = $dom->type;

This node's type, usually C<cdata>, C<comment>, C<doctype>, C<pi>, C<raw>,
C<root>, C<tag> or C<text>.

  # "cdata"
  $dom->parse('<![CDATA[Test]]>')->child_nodes->first->type;

  # "comment"
  $dom->parse('<!-- Test -->')->child_nodes->first->type;

  # "doctype"
  $dom->parse('<!DOCTYPE html>')->child_nodes->first->type;

  # "pi"
  $dom->parse('<?xml version="1.0"?>')->child_nodes->first->type;

  # "raw"
  $dom->parse('<title>Test</title>')->at('title')->child_nodes->first->type;

  # "root"
  $dom->parse('<p>Test</p>')->type;

  # "tag"
  $dom->parse('<p>Test</p>')->at('p')->type;

  # "text"
  $dom->parse('<p>Test</p>')->at('p')->child_nodes->first->type;

=head2 val

  my $value = $dom->val;

Extract value from form element (such as C<button>, C<input>, C<option>,
C<select> and C<textarea>) or return C<undef> if this element has no value. In
the case of C<select> with C<multiple> attribute, find C<option> elements with
C<selected> attribute and return an array reference with all values or C<undef>
if none could be found.

  # "a"
  $dom->parse('<input name="test" value="a">')->at('input')->val;

  # "b"
  $dom->parse('<textarea>b</textarea>')->at('textarea')->val;

  # "c"
  $dom->parse('<option value="c">Test</option>')->at('option')->val;

  # "d"
  $dom->parse('<select><option selected>d</option></select>')
    ->at('select')->val;

  # "e"
  $dom->parse('<select multiple><option selected>e</option></select>')
    ->at('select')->val->[0];
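
To see the different return shapes of L</"val"> together, the sketch below
reads three fields from one made-up form. The field names and values are
assumptions for illustration. Note that a C<select> with C<multiple> yields an
array reference rather than a plain string.

```perl
use strict;
use warnings;
use DOM::Tiny;

# Hypothetical form with a plain input, a textarea and a multi-select
my $dom = DOM::Tiny->new(<<'HTML');
<form>
  <input name="a" value="1">
  <textarea name="b">two</textarea>
  <select name="c" multiple>
    <option selected>x</option>
    <option>y</option>
    <option selected value="z">Z</option>
  </select>
</form>
HTML

# Scalar values for input and textarea
my $input    = $dom->at('input')->val;
my $textarea = $dom->at('textarea')->val;

# Array reference for a select with the "multiple" attribute
my $selected = $dom->at('select')->val;

print "input: $input\n";
print "textarea: $textarea\n";
print "select: @$selected\n";
```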

=head2 wrap

  $dom = $dom->wrap('<div></div>');

Wrap HTML/XML fragment around this node, placing it as the last child of the
first innermost element.

  # "<p>123<b>Test</b></p>"
  $dom->parse('<b>Test</b>')->at('b')->wrap('<p>123</p>')->root;

  # "<div><p><b>Test</b></p>123</div>"
  $dom->parse('<b>Test</b>')->at('b')->wrap('<div><p></p>123</div>')->root;

  # "<p><b>Test</b></p><p>123</p>"
  $dom->parse('<b>Test</b>')->at('b')->wrap('<p></p><p>123</p>')->root;

  # "<p><b>Test</b></p>"
  $dom->parse('<p>Test</p>')->at('p')->child_nodes->first->wrap('<b>')->root;

=head2 wrap_content

  $dom = $dom->wrap_content('<div></div>');

Wrap HTML/XML fragment around this node's content, placing that content as the
last children of the first innermost element.

  # "<p><b>123Test</b></p>"
  $dom->parse('<p>Test</p>')->at('p')->wrap_content('<b>123</b>')->root;

  # "<p><b>Test</b></p><p>123</p>"
  $dom->parse('<b>Test</b>')->wrap_content('<p></p><p>123</p>');

=head2 xml

  my $bool = $dom->xml;
  $dom     = $dom->xml($bool);

Disable HTML semantics in parser and activate case-sensitivity. Defaults to
auto detection based on processing instructions.
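
One visible effect of the L</"xml"> flag is whether names keep their original
case. The following is a minimal sketch, assuming a made-up C<Foo> element; it
parses the same markup in both modes and prints the tag name and the rendered
result so the two can be compared.

```perl
use strict;
use warnings;
use DOM::Tiny;

# Hypothetical mixed-case markup used only for this comparison
my $markup = '<Foo Bar="1"></Foo>';

# Force XML semantics: names keep their case
my $xml     = DOM::Tiny->new->xml(1)->parse($markup);
my $xml_tag = $xml->at('Foo')->tag;

# Force HTML semantics: names are lowercased while parsing
my $html     = DOM::Tiny->new->xml(0)->parse($markup);
my $html_tag = $html->at('foo')->tag;

print "XML tag:  $xml_tag\n";
print "HTML tag: $html_tag\n";

# Rendering also follows the active mode
print 'XML:  ', $xml->to_string,  "\n";
print 'HTML: ', $html->to_string, "\n";
```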

=head1 OPERATORS

L<DOM::Tiny> overloads the following operators.

Array dereferencing is an alias for L</"child_nodes">.

  # "<!-- Test -->"
  $dom->parse('<!-- Test --><b>123</b>')->[0];

Hash dereferencing is an alias for L</"attr">.

  # "test"
  $dom->parse('<div id="test">Test</div>')->at('div')->{id};

Stringification is an alias for L</"to_string">.
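
The overloaded operators can be used together on a single element. A short
sketch with a made-up fragment, showing array dereferencing, hash
dereferencing and stringification side by side:

```perl
use strict;
use warnings;
use DOM::Tiny;

# Hypothetical element with one attribute and two child elements
my $dom = DOM::Tiny->new('<div id="test"><b>1</b><i>2</i></div>');
my $div = $dom->at('div');

# Array dereferencing yields the child nodes
my @children = @$div;

# Hash dereferencing yields the attributes
my $id = $div->{id};

# Stringification renders the node and its content
my $html = "$div";

print scalar(@children), " child nodes\n";
print "id: $id\n";
print "$html\n";
```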

=head1 BUGS

Report any issues on the public bugtracker.

=head1 AUTHOR

Dan Book <dbook@cpan.org>

=head1 COPYRIGHT AND LICENSE

This software is Copyright (c) 2015 by Dan Book.

This is free software, licensed under:

  The Artistic License 2.0 (GPL Compatible)

=head1 SEE ALSO

L<Mojo::DOM>, L<XML::LibXML>, L<XML::Twig>, L<HTML::TreeBuilder>, L<XML::Smart>