/usr/lib/swipl/library/ext/sgml/sgml.pl

 load_html(+Input, -DOM, +Options) is det
Load HTML text from Input and unify the resulting DOM structure with DOM. Options are passed to load_structure/3, after adding the following default options:
dtd(DTD)
    Pass the DTD for HTML as obtained using dtd(html, DTD).
dialect(Dialect)
    Current dialect from the Prolog flag html_dialect.
max_errors(-1)
syntax_errors(quiet)
    Most HTML encountered in the wild contains errors. Even in the context of errors, the resulting DOM term is often a reasonable guess at the intent of the author.
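
The defaults listed above can be spelled out as a direct call to load_structure/3. The following is a minimal sketch, not the library's actual implementation: load_html_explicit/2 and demo/1 are hypothetical names introduced for illustration, and the sketch assumes that the Input argument accepts a string(Text) term, as load_structure/3 does in recent versions.

```prolog
:- use_module(library(sgml)).

% load_html_explicit/2 is a hypothetical helper. It passes the default
% options described above to load_structure/3 explicitly.
load_html_explicit(Input, DOM) :-
    dtd(html, DTD),
    current_prolog_flag(html_dialect, Dialect),
    load_structure(Input, DOM,
                   [ dtd(DTD),
                     dialect(Dialect),
                     max_errors(-1),
                     syntax_errors(quiet)
                   ]).

% demo/1 is also hypothetical. The input is deliberately malformed
% (<b> and <p> are never closed); the parser is still expected to
% produce a DOM term without printing syntax errors.
demo(DOM) :-
    load_html_explicit(string("<p>Hello <b>world"), DOM).
```

Calling demo(DOM) should bind DOM to a list of element/3 terms. The exact nesting depends on the active html_dialect, so inspect the result with print/1 rather than relying on a fixed shape.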

You may also want to use library(http/http_open) to load HTML from HTTP and HTTPS URLs. For example:

:- use_module(library(http/http_open)).
:- use_module(library(sgml)).

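% Open the URL as a stream, parse it, and always close the stream.
load_html_url(URL, DOM) :-
    setup_call_cleanup(
        http_open(URL, In, []),
        load_html(stream(In), DOM, []),
        close(In)).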