htmlSQL class - quite a nice way of parsing HTML
General December 11th, 2007
Today I found a class written in PHP which implements the idea of using SQL syntax to parse HTML documents - htmlSQL. The idea of the author - Jonas David John - is quite simple. If you want to parse an HTML document, let’s say you want all divs with class “row”, the query looks like normal SQL.
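A query for that case would presumably follow the same pattern as the library example further down: the tag name goes in FROM and the attribute is written as $class in WHERE. This is a sketch built from that pattern, so the exact query is an assumption, and it only builds and prints the string:

```php
<?php
// Assumed query, modeled on the 'SELECT * FROM a WHERE $class == "nav_item"'
// example: "div" is the tag to select, $class is the attribute to filter on.
// Single quotes keep PHP from trying to interpolate $class as a variable.
$query = 'SELECT * FROM div WHERE $class == "row"';

print $query . "\n";
```

The string would then be passed to query() on an htmlsql object, as in the full example.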
Looks familiar, doesn’t it? There is also a connect() function, which defines the HTML source, and fetch_array(), which returns the results of the query.
There are a few examples included in the library package, and here is the simplest one:
<?php
/*
** htmlSQL - Example 1
**
** Shows a simple query
*/

include_once("../snoopy.class.php");
include_once("../htmlsql.class.php");

$wsql = new htmlsql();

// connect to a URL
if (!$wsql->connect('url', 'http://codedump.jonasjohn.de/')){
    print 'Error while connecting: ' . $wsql->error;
    exit;
}

/* execute a query:
   This query extracts all links with the classname = nav_item
*/
if (!$wsql->query('SELECT * FROM a WHERE $class == "nav_item"')){
    print "Query error: " . $wsql->error;
    exit;
}

// show results:
foreach($wsql->fetch_array() as $row){
    print_r($row);
    /*
    $row is an array and looks like this:
    Array (
        [href] => /feedback.htm
        [class] => nav_item
        [tagname] => a
        [text] => Feedback
    )
    */
}
?>
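Because each row is an ordinary associative array, the usual PHP array handling applies to the result. Here is a minimal sketch, assuming rows shaped like the one in the comment above. The sample data is hard-coded in place of a live fetch_array() call, and link_map() is a hypothetical helper, not part of htmlSQL:

```php
<?php
// Sample rows in the shape documented in the example's comment;
// in real use they would come from $wsql->fetch_array().
$rows = array(
    array(
        'href'    => '/feedback.htm',
        'class'   => 'nav_item',
        'tagname' => 'a',
        'text'    => 'Feedback',
    ),
);

// Hypothetical helper: build a "link text => href" map from the rows.
function link_map(array $rows)
{
    $links = array();
    foreach ($rows as $row) {
        // Skip rows that lack either key, e.g. anchors without an href.
        if (isset($row['href'], $row['text'])) {
            $links[$row['text']] = $row['href'];
        }
    }
    return $links;
}

$links = link_map($rows);
print_r($links);
```

Keeping the processing in a plain function means it can be tried out on hand-written arrays before pointing the library at a real page.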
I really like this approach - it looks very familiar to me. At least 99% of the time you do exactly the same thing in your apps: fetching some data.
Well, in CakePHP it’s not like this ;), but I really like this lib and I will definitely use it in my projects!


There exists a DBO driver for CakePHP for parsing HTML: http://myeasyscripts.com/loudbaking/htmlsource-a-new-dbo-driver-for-cakephp/