Htmlys manual for PHP

Load the extension into PHP

There are many ways to load the Htmlys extension into PHP. In this manual, we cover the most common cases.

In most cases, a relative path (such as the bare file name html.so) can be used instead of an absolute one, as long as the extension lies in the extension_dir configured in your php.ini or, if extension_dir is not set, in the default extension_dir compiled into PHP.
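To check which directory a bare file name such as html.so is resolved against, a script can print the relevant settings. This is a minimal sketch using the standard ini_get() function and the PHP_EXTENSION_DIR constant. The command line and the web server may use different configurations, so run it under the SAPI you actually care about.

```php
<?php
// Directory currently used to resolve bare extension file names (e.g. html.so).
$configured = ini_get('extension_dir');

// Default directory compiled into this PHP build; extension_dir falls back to it when not set.
$compiled = PHP_EXTENSION_DIR;

echo 'extension_dir:       ', $configured, PHP_EOL;
echo 'compiled-in default: ', $compiled, PHP_EOL;

// Whether html.so is present in the directory a bare file name would be looked up in.
$candidate = $configured . DIRECTORY_SEPARATOR . 'html.so';
echo 'html.so present:     ', (is_file($candidate) ? 'yes' : 'no'), PHP_EOL;
```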

From the php.ini

If you can modify your php.ini, the easiest way is to add the line:

extension=/path/to/html.so

Then restart your web server, if required, for the change to take effect.

If you cannot modify your php.ini (on a shared host, for example), you can sometimes create your own php.ini at the root of your website and put directives of your choice in it, including the one above (the administrator may have set some restrictions). On some systems, however, this configuration file has a different name; use the phpinfo() function to find it, or ask your hosting provider or system administrator for details.
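phpinfo() reports the file in use under "Loaded Configuration File". The same information is available from a script through the standard php_ini_loaded_file() and php_ini_scanned_files() functions, as in the sketch below. Run it through your web server rather than from the command line, since each SAPI may load a different file. It only shows which file was actually loaded; it cannot tell you which per-directory file name your host honors, so your hosting provider remains the authority on that.

```php
<?php
// Path of the php.ini actually loaded by this SAPI, or false if none was loaded.
$loaded = php_ini_loaded_file();

// Comma-separated list of additional .ini files parsed from the scan directory,
// false if no scan directory is configured, '' if that directory is empty.
$scanned = php_ini_scanned_files();

echo 'SAPI:           ', php_sapi_name(), PHP_EOL;
echo 'Loaded php.ini: ', ($loaded === false ? '(none)' : $loaded), PHP_EOL;
echo 'Scanned files:  ', (($scanned === false || trim($scanned) === '') ? '(none)' : trim($scanned)), PHP_EOL;
```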

From an .htaccess (Apache only)

For security reasons, an extension cannot be loaded directly from an .htaccess file at the time of writing (PHP 5.4.3, Apache 2.4). Since PHP already allows many configuration values to be changed from these files, this may become possible in a future release of PHP.

With Apache, however, an .htaccess file can still point the server to a custom PHP CGI binary, which can in turn load the php.ini of your choice. See "From the CGI environment" for more information.

From the command line

If you are using PHP with the CLI or CGI SAPIs, you can instruct it to load the extension directly on the command line, like this:

php -d extension=/path/to/html.so -f script.php

This is ideal for testing your scripts (or the extension itself), because it requires no change to the configuration of PHP or of the web server, and therefore no restart.
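To confirm that the extension was actually loaded, a small script can check for the class and the two functions it provides (listed under "Use the extension"). A sketch follows; check.php is only a hypothetical file name.

```php
<?php
// check.php (hypothetical name): report whether the Htmlys API is available
// in this PHP process. Run it the same way as your own script, e.g.
//   php -d extension=/path/to/html.so -f check.php
$symbols = array(
    'class HtmlHandler'            => class_exists('HtmlHandler', false),
    'function html_parse_string()' => function_exists('html_parse_string'),
    'function html_parse_file()'   => function_exists('html_parse_file'),
);

foreach ($symbols as $symbol => $present) {
    echo str_pad($symbol, 30), ($present ? 'found' : 'MISSING'), PHP_EOL;
}

// True only if every symbol above was found.
$allPresent = !in_array(false, $symbols, true);
echo 'Htmlys API: ', ($allPresent ? 'available' : 'not available'), PHP_EOL;
```

Without the -d option the script should report every symbol as missing, which is a quick way to tell a loading problem apart from a bug in your own code.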

From the CGI environment

If you are using PHP with the CGI SAPI, note that it automatically loads any php.ini located in the same directory as the PHP CGI binary. If you have sufficient rights on the file system, create a php.ini there and add the directive given earlier to the same effect.

If you do not have the rights to create a file in this location, you can still copy the PHP CGI binary into a directory where you have write permissions (e.g. cgi-bin at the root of your website) and create the required php.ini in that same directory. You would, however, still need to configure your web server to use this copy of the binary instead of the default one; on some Apache configurations, this can be done simply by editing the .htaccess file. Note that if you use this method and put a PHP binary under your web root, you must take special care of the security of your website, as warned by the PHP team.

From PHP code

Another way to load the extension is to do it directly from PHP code, using the dl() function:

<?php
// dl() expects the file name of the extension, resolved against extension_dir,
// rather than a full path
dl('html.so');
// Start using the extension here
?>

This method could be considered the most portable, since a PHP script can decide at run time whether to load the library, and provide fallback behavior if it cannot be loaded. It is, however, discouraged by PHP, and the dl() function is not available in all SAPIs.
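The fallback idea can be sketched as a helper that only calls dl() when the extension is not already present, and reports failure instead of stopping the script. htmlys_available() is a hypothetical name, and the sketch assumes html.so lies in extension_dir; dl() returns false when it cannot load the library, for example when enable_dl is off or the file is not found.

```php
<?php
/*
 * Hypothetical helper: returns true once the Htmlys API is usable, loading the
 * extension with dl() only when it is not already loaded. Assumes html.so lies
 * in extension_dir, since dl() takes a file name rather than a full path.
 */
function htmlys_available()
{
    // Already loaded through php.ini, -d, or an earlier dl() call.
    if (class_exists('HtmlHandler', false)) {
        return true;
    }

    // dl() may be absent from this build or disabled via disable_functions.
    if (!function_exists('dl')) {
        return false;
    }

    // @ silences the warning dl() emits when loading fails (e.g. enable_dl off).
    return @dl('html.so') && class_exists('HtmlHandler', false);
}

if (htmlys_available()) {
    // Start using HtmlHandler, html_parse_string() and html_parse_file() here
} else {
    // Fallback: report the problem, or switch to another HTML parser
    echo 'Htmlys is not available, using the fallback path', PHP_EOL;
}
```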

Other configurations

If none of the above methods suits your system, or if you need details about your specific configuration, please check the PHP website.

Use the extension

Once loaded, the extension provides one abstract class, HtmlHandler, and two functions, html_parse_string() and html_parse_file(). They are declared as follows:

<?php
// Synopsis of the API (declarations only, not runnable code)
abstract class HtmlHandler
{
    public function OnParseError();
    public function OnDoctype(string $name, string $publicId, string $systemId);
    public function OnStartTag(string $name, array $attributes, bool $selfClosing);
    public function OnEndTag(string $name);
    public function OnComment(string $data);
    public function OnChar(string $c);
    public function OnEof();
}

bool html_parse_string(HtmlHandler $handler, string $string);
bool html_parse_file(HtmlHandler $handler, string $path);
?>

First, extend the HtmlHandler class with your own class, overriding the methods for the HTML tokens you are interested in. Then, instantiate your class and call html_parse_string() and/or html_parse_file() with this object as the first argument.

The second argument of html_parse_string() must be a string of HTML code, while the second argument of html_parse_file() must be the path to an HTML file on your file system. Both functions parse the HTML content and call the methods of your HtmlHandler object for the tokens they encounter.

Example

Here is an example script using the extension; feel free to adapt it to your needs:

<?php
/*
 * Htmlys demonstration script.
 * -----------------------------------------------------------------------------
 * Copyright (c) 2009 - 2013 Krizalys (http://www.krizalys.com/)
 *
 * Script to demonstrate the use of the Htmlys binding for PHP. All the methods
 * have an empty body and can be implemented freely by the developer. Methods
 * that are not needed can be removed.
 *
 * Calls to the functions html_parse_string() and html_parse_file() can be
 * adjusted or removed as needed.
 */

class MyHtmlHandler extends HtmlHandler
{
    public function OnParseError()
    {
        // Handle parse error
    }

    public function OnDoctype($name, $publicId, $systemId)
    {
        // Handle DOCTYPE token
    }

    public function OnStartTag($name, $attributes, $selfClosing)
    {
        // Handle start tag token
    }

    public function OnEndTag($name)
    {
        // Handle end tag token
    }

    public function OnComment($data)
    {
        // Handle comment token
    }

    public function OnChar($c)
    {
        // Handle character token
    }

    public function OnEof()
    {
        // Handle EOF token
    }
}

$handler = new MyHtmlHandler();

html_parse_string($handler,
'<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>Test</title>
</head>
<body class="test">
This is a test HTML document
<!-- a comment here -->
</body>
</html>
');

// /path/to/document.html has to exist, so check before parsing it
$path = '/path/to/document.html';
if (is_readable($path)) {
    html_parse_file($handler, $path);
}
?>
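The skeleton above leaves every method empty. As a minimal sketch of filled-in bodies, the handler below turns the token stream into readable lines, which can help to see which callbacks a given document triggers. TokenLogger and htmlys_trace() are hypothetical names. The layout of $attributes is not documented here, so the sketch does not use it, and buffering in OnChar() assumes characters may arrive one at a time. The class is only declared when HtmlHandler exists, so the script prints a message instead of failing when the extension is not loaded.

```php
<?php
/*
 * Sketch of a filled-in handler: records the token stream as readable lines.
 * TokenLogger and htmlys_trace() are hypothetical names. The class is only
 * declared when the Htmlys extension (and thus HtmlHandler) is loaded.
 */
if (class_exists('HtmlHandler', false)) {
    class TokenLogger extends HtmlHandler
    {
        public $lines = array();
        private $text = '';

        // Buffer character tokens so that a run of text becomes one line
        // (assumption: OnChar() may be called once per character).
        public function OnChar($c)
        {
            $this->text .= $c;
        }

        public function OnStartTag($name, $attributes, $selfClosing)
        {
            $this->flushText();
            $this->lines[] = 'start ' . $name . ($selfClosing ? ' (self-closing)' : '');
        }

        public function OnEndTag($name)
        {
            $this->flushText();
            $this->lines[] = 'end ' . $name;
        }

        public function OnComment($data)
        {
            $this->flushText();
            $this->lines[] = 'comment ' . trim($data);
        }

        public function OnEof()
        {
            $this->flushText();
            $this->lines[] = 'eof';
        }

        // Emit buffered text as one line, skipping whitespace-only runs.
        private function flushText()
        {
            if (trim($this->text) !== '') {
                $this->lines[] = 'text ' . trim($this->text);
            }
            $this->text = '';
        }
    }
}

// Returns the recorded lines, or null when the extension is not loaded.
function htmlys_trace($html)
{
    if (!class_exists('TokenLogger', false)) {
        return null;
    }
    $logger = new TokenLogger();
    html_parse_string($logger, $html);
    return $logger->lines;
}

$lines = htmlys_trace('<p class="note">Hello <!-- greeting --><br/></p>');
echo ($lines === null ? 'Htmlys is not loaded' . PHP_EOL : implode(PHP_EOL, $lines) . PHP_EOL);
```

Methods that are not overridden here (OnParseError() and OnDoctype()) are simply left to HtmlHandler, in line with the note in the demonstration script that unneeded methods can be removed.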