htmlSQL class – quite a nice way of parsing HTML

Today I found htmlSQL, a class written in PHP that implements the idea of using SQL syntax to parse HTML documents. The idea of the author, Jonas David John, is quite simple. If you want to parse an HTML document, let’s say you want all divs with the class “row”, the query looks like normal SQL:

SELECT * FROM div WHERE $class == "row"

Looks familiar, doesn’t it? There is also a connect() function, which defines the HTML source, and fetch_array(), which returns the results of the query.

There are a few examples included in the library package, and here is the simplest one:

<?php

    /*
    ** htmlSQL - Example 1
    **
    ** Shows a simple query
    */

   
    include_once("../snoopy.class.php");
    include_once("../htmlsql.class.php");
   
    $wsql = new htmlsql();
   
    // connect to a URL
    if (!$wsql->connect('url', 'http://codedump.jonasjohn.de/')){
        print 'Error while connecting: ' . $wsql->error;
        exit;
    }
   
    /* execute a query:
       
       This query extracts all links with the classname = nav_item  
    */

    if (!$wsql->query('SELECT * FROM a WHERE $class == "nav_item"')){
        print "Query error: " . $wsql->error;
        exit;
    }

    // show results:
    foreach($wsql->fetch_array() as $row){
   
        print_r($row);
       
        /*
        $row is an array and looks like this:
        Array (
            [href] => /feedback.htm
            [class] => nav_item
            [tagname] => a
            [text] => Feedback
        )
        */

       
    }
   
?>
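
For comparison, here is a rough sketch of what the query in the example above selects, written with PHP's built-in DOM extension instead of htmlSQL. It is not how htmlSQL works internally. The sample markup in $html is made up for illustration, and the href, class, tagname and text keys are only meant to imitate the row shape shown in the example's comment.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<div><a class="nav_item" href="/feedback.htm">Feedback</a>'
      . '<a class="other" href="/about.htm">About</a></div>';

// Collect parser warnings internally instead of printing them,
// since real-world HTML is rarely valid.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$rows = [];

// Roughly equivalent to: SELECT * FROM a WHERE $class == "nav_item"
foreach ($xpath->query('//a[@class="nav_item"]') as $node) {
    $row = [];

    // Copy every attribute of the matched element into the row.
    foreach ($node->attributes as $attr) {
        $row[$attr->name] = $attr->value;
    }

    // Imitate the extra keys shown in the htmlSQL example output.
    $row['tagname'] = $node->nodeName;
    $row['text']    = $node->textContent;

    $rows[] = $row;
}

print_r($rows);
```

The XPath expression here matches the class attribute exactly, so an element with several classes (for example class="nav_item active") would not match this particular expression.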

I really like htmlSQL’s approach. It looks very familiar to me, because fetching some data is exactly what you do at least 99% of the time in your apps.

Well, in CakePHP it is not like this ;), but I really like this lib and I will definitely use it in my projects!
