Description, Requirements & Features
- An HTML DOM parser written in PHP 5+ that lets you manipulate HTML easily.
- Requires PHP 5+.
- Supports invalid HTML.
- Find tags on an HTML page with selectors just like jQuery.
- Extract contents from HTML in a single line.
How to get HTML elements
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
    echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
    echo $element->href . '<br>';
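The loops above rely on find() returning a list of matches. As a minimal sketch of the two call forms, assuming simple_html_dom.php sits next to the script and using a made-up two-link snippet instead of a live page: with only a selector, find() returns an array; with an index, it returns one element, or null when nothing matches, so the result is worth checking before reading an attribute.

```php
<?php
// Assumption: the library file is in the same directory as this script.
include('simple_html_dom.php');

// Hypothetical in-memory markup, so the example needs no network access.
$html = str_get_html('<p><a href="/one">One</a> <a href="/two">Two</a></p>');

// Selector only: an array of every matching element.
$links = $html->find('a');
echo count($links) . "\n";

// Selector plus index: a single element, or null if there is no such match.
$second  = $html->find('a', 1);
$missing = $html->find('img', 0);

// Guard against null before touching attributes.
$secondHref = ($second !== null) ? $second->href : 'no link';
$missingSrc = ($missing !== null) ? $missing->src : 'no image';

echo $secondHref . "\n";
echo $missingSrc . "\n";
```

The null check matters most when scraping pages you do not control, where an expected tag may simply be absent.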
How to modify HTML elements
// Create DOM from string
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');
$html->find('div', 1)->class = 'bar';
$html->find('div[id=hello]', 0)->innertext = 'foo';
echo $html; // Output: <div id="hello">foo</div><div id="world" class="bar">World</div>
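Echoing the DOM object is one way to get the modified markup back. A small sketch of keeping the result instead of printing it, assuming the library's save() method (which returns the markup as a string and, when given a path, also writes it to that file). The output path here is a hypothetical location in the system temp directory.

```php
<?php
// Assumption: the library file is in the same directory as this script.
include('simple_html_dom.php');

$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');
$html->find('div', 1)->class = 'bar';
$html->find('div[id=hello]', 0)->innertext = 'foo';

// Capture the modified markup as a string rather than echoing it.
$markup = $html->save();

// Hypothetical output file; save() with a path also writes the markup to disk.
$path = sys_get_temp_dir() . '/result.htm';
$html->save($path);

echo $markup . "\n";
```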
Extract Content from HTML
// Dump contents (without tags) from HTML
echo file_get_html('http://www.google.com/')->plaintext;
Scraping Slashdot
// Create DOM from URL
$html = file_get_html('http://slashdot.org/');
// Find all article blocks
$articles = array();
foreach($html->find('div.article') as $article) {
    // Start each article with a fresh array so no values carry over
    $item = array();
    $item['title'] = $article->find('div.title', 0)->plaintext;
    $item['intro'] = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}
print_r($articles);
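The scraper above assumes every article block contains each inner div. If one is missing, find() with an index returns null and reading ->plaintext on it fails. A defensive sketch of the same loop follows. The sample markup and the text_or_empty() helper are hypothetical and not part of the library, and the selectors are the ones used above, which may no longer match the live site.

```php
<?php
// Assumption: the library file is in the same directory as this script.
include('simple_html_dom.php');

// Hypothetical sample markup; the second article has no intro block.
$html = str_get_html(
    '<div class="article"><div class="title">First</div><div class="intro">Intro one</div></div>' .
    '<div class="article"><div class="title">Second</div></div>'
);

// Hypothetical helper: trimmed plain text of the first match under $node,
// or an empty string when the selector matches nothing.
function text_or_empty($node, $selector) {
    $match = $node->find($selector, 0);
    return ($match !== null) ? trim($match->plaintext) : '';
}

$articles = array();
foreach ($html->find('div.article') as $article) {
    $articles[] = array(
        'title' => text_or_empty($article, 'div.title'),
        'intro' => text_or_empty($article, 'div.intro'),
    );
}
print_r($articles);
```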
Sample uses
// Include the library
include('simple_html_dom.php');
// Retrieve the DOM from a given URL
$html = file_get_html('http://davidwalsh.name/');
// Find all "A" tags and print their HREFs
foreach($html->find('a') as $e)
    echo $e->href . '<br>';
// Retrieve all images and print their SRCs
foreach($html->find('img') as $e)
    echo $e->src . '<br>';
// Find all images and print their outer HTML (the complete tag, "<>" included)
foreach($html->find('img') as $e)
    echo $e->outertext . '<br>';
// Find the DIV tag with an id of "myId"
foreach($html->find('div#myId') as $e)
    echo $e->innertext . '<br>';
// Find all SPAN tags that have a class of "myClass"
foreach($html->find('span.myClass') as $e)
    echo $e->outertext . '<br>';
// Find all TD tags with "align=center"
foreach($html->find('td[align=center]') as $e)
    echo $e->innertext . '<br>';
// Extract the plain text of the second matching cell (the index is zero-based)
echo $html->find('td[align="center"]', 1)->plaintext.'<br><hr>';
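When several documents are parsed in one script, the DOM objects can hold on to memory. A minimal sketch of processing pages in a loop and releasing each one, assuming the library's clear() method. The two table snippets are hypothetical stand-ins for pages fetched one after another.

```php
<?php
// Assumption: the library file is in the same directory as this script.
include('simple_html_dom.php');

// Hypothetical inputs standing in for separately fetched pages.
$pages = array(
    '<table><tr><td align="center">A</td></tr></table>',
    '<table><tr><td align="center">B</td></tr></table>',
);

$cells = array();
foreach ($pages as $page) {
    $html = str_get_html($page);

    // Copy the text out as a plain string before the DOM is released.
    $cell = $html->find('td[align=center]', 0);
    if ($cell !== null) {
        $cells[] = $cell->plaintext;
    }

    // Release the parser's internal references before the next document.
    $html->clear();
    unset($html);
}
print_r($cells);
```

Only plain strings are kept in $cells, so nothing refers back to a cleared DOM once the loop moves on.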