The component gb.xml.html provides classes with which you can write a new HTML document or change an existing HTML document.
The component gb.xml.html by Adrien Prokopowicz has only the classes HtmlDocument and XmlElement. The class XmlElement is a reimplementation of the class XmlElement from http://gambaswiki.org/wiki/comp/gb.xml/xmlelement and is not described here (→ chapter 27.0 XML).
The component is based on gb.xml and the class HtmlDocument (gb.xml.html) inherits the class XmlDocument in gb.xml.
You can create an object of the class HtmlDocument. It represents an HTML document:

    Dim hHtmlDocument As HtmlDocument
    hHtmlDocument = New HtmlDocument ( [ FileName As String ] )

The class HtmlDocument has these properties:
Property | Data type | Description |
---|---|---|
All | XmlNode[] | Returns an array with all XML nodes. |
Base | String | Returns or sets the base URL used for all relative URLs contained in the document. If there is no <base> tag in the document, this property returns Null; when it is set, a new <base> element is created. |
Html5 | Boolean | If the property is set to True, an HTML5 document is created and the prologue declares the document type in accordance with HTML5: <!DOCTYPE html>. If the property is set to False, the document type is declared as follows: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
Title | String | Sets the title for the HTML document. |
Lang | String | Sets the language in the lang attribute of the <html> tag. |
Favicon | String | Defines the favicon, the icon shown in the address bar of a web browser, via a file path to the icon image file. |
Head | XmlElement | Returns or sets the <head> element of the document. If there is no <head> element in the document, it is created when this property is read. |
StyleSheets | .HtmlDocumentStyleSheets | The virtual class has three methods, the most important of which is the Add method: Sub Add (Source As String[, Media As String]). The method inserts a CSS style sheet tag into the HTML document. The default value for the optional parameter 'Media' is 'screen'. |
Scripts | .HtmlDocumentScripts | The virtual class has three methods, the most important of which is the Add method: Sub Add (Source As String). The method inserts a script tag into the HTML document. |
Body | XmlElement | Returns or sets the <body> element of the document. If there is no <body> element in the document, it is created when this property is read → https://developer.mozilla.org/de/docs/Web/HTML/Element/body. |
Content | String | Returns the complete content of the HTML document in a string. |
Table 27.4.1.1: Properties of the HtmlDocument class
The class HtmlDocument has these methods:
Method | Return type | Description |
---|---|---|
Open (FileName As String) | ~ | Removes the entire content of this document and loads its new content from the specified file. FileName contains the file path. If the specified file is invalid or cannot be opened, an error is raised. |
GetElementById (Id As String[, Depth As Integer]) | XmlElement | Returns the element with the specified ID as an XML element. The optional parameter Depth determines the search depth. |
GetElementsByTagName (TagName As String[, Mode As Integer, Depth As Integer]) | XmlElement[] | Returns an array of all elements of the HTML document whose tag name matches 'TagName'. The Mode argument defines the comparison method used. It supports gb.Binary, gb.IgnoreCase and gb.Like. For more information, see 'Predefined Constants'. The Depth argument defines where the search stops: with a negative value (default), the search stops only at the end of the tree; 1 checks only the root element, 2 checks only the direct children of the root element, and so on. |
GetElementsByClassName (ClassName As String[, Depth As Integer]) | XmlElement[] | Returns an array of all elements of the HTML document whose class name matches 'ClassName'. The Depth argument defines where the search stops: with a negative value (default), the search stops only at the end of the tree; 1 checks only the root element, 2 checks only the direct children of the root element, and so on. |
CreateElement (TagName As String) | XmlElement | Creates a new element named 'TagName' and returns it as an XML element. |
FromString (Data As String) | ~ | Removes the existing content of the HTML document completely and loads the new content from the specified XML string. If the specified string is invalid, an error is raised. |
ToString ([ Indent As Boolean]) | String | Returns the content of the HTML document as a string that can, for example, be read in again later. If the optional 'Indent' parameter is set to True, the content is returned with appropriate indentation. |
Save (FileName As String[, Indent As Boolean]) | ~ | Saves the content of the HTML document under the file path specified with FileName. If the optional 'Indent' parameter is set to True, the content of the HTML document is formatted with appropriate indentation. |
Table 27.4.2.1: Methods of the class HtmlDocument
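A minimal sketch of how some of the methods from the table can be combined, assuming a Gambas project in which the component gb.xml.html is enabled. The HTML string, the id "msg" and the variable names are made up for illustration; TextContent is the XmlElement property that is also used in the second project.

```gambas
' Gambas module file (sketch; requires the component gb.xml.html)

Public Sub Main()

  Dim hDoc As New HtmlDocument
  Dim hElement As XmlElement
  Dim aParagraphs As XmlElement[]

  ' Replace the document content with the content of a well-formed string
  hDoc.FromString("<html><head><title>Demo</title></head><body><p id=\"msg\">old</p><p>second</p></body></html>")

  ' Look up a single element by its id attribute and change its text
  hElement = hDoc.GetElementById("msg")
  If hElement Then hElement.TextContent = "new"

  ' Collect all <p> elements, ignoring the case of the tag name
  aParagraphs = hDoc.GetElementsByTagName("p", gb.IgnoreCase)
  Print aParagraphs.Count

  ' Convert the DOM tree into an indented string
  Print hDoc.ToString(True)

End
```

The check `If hElement Then` guards against the case that no element with the requested id was found.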
Why should you write an HTML document with the classes of the component gb.xml.html rather than with the methods of the class XmlWriter, which would be the obvious choice? One reason is that the classes of gb.xml.html are specialised in writing HTML documents. In addition, the class HtmlDocument gives you a DOM tree that you can build up and modify selectively.
Imagine the following situation: You write a CMS similar to DokuWiki in Gambas. At the request of a web browser, this CMS runs as a CGI script on the server and, according to the requested URL, generates an HTML document from various resources such as database data, form entries or data from XML files (for example, XML configuration files). Depending on which modules are loaded in your CMS, additional elements are added to the HTML page, such as an automatically generated table of contents or a visitor counter at the bottom of the page. Writing such an HTML page sequentially, that is, with a series of lines of the form "sHTML &= …", is time-consuming and error-prone, because every routine must be called at exactly the right place. How do you react if you want to include a CSS or JavaScript file for the visitor counter? The procedure that assembles the HTML header <head>..</head> would have to know about it in advance! Here it is much better to use a document of the type HtmlDocument. Each resource to be included can then be attached as a new node at the place in the DOM tree where it belongs. It is no problem to add a new style sheet to an HtmlDocument at any time using the method HtmlDocument.StyleSheets.Add(…). Only when the DOM tree is complete is it converted into a (well-formed!) HTML string. The strength of gb.xml.html is its use of the 'Document Object Model' (DOM). You can also search the DOM tree elegantly for information of interest. For example, the call

    HtmlDocument.Root.GetChildrenByFilter("a[href^=ftp://]")

returns all link tags <a>…</a> whose href attribute refers to an FTP server as an array of type XmlElement[], in other words a list of FTP links. Even with a regular expression this would be much more difficult, because you cannot rely, for example, on href being the first attribute listed in a link tag!
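The CMS scenario described above can be sketched as follows. This is only an illustration under stated assumptions: the procedures AddContent and AddVisitorCounter, the file paths and the FTP address are hypothetical, and only methods and properties that appear elsewhere in this chapter are used.

```gambas
' Gambas module file (sketch; requires the component gb.xml.html)

Public Sub Main()

  Dim hDoc As New HtmlDocument
  Dim hLink As XmlElement

  hDoc.Html5 = True
  hDoc.Title = "CMS page"

  ' Each (hypothetical) module attaches its nodes where they belong
  AddContent(hDoc)
  AddVisitorCounter(hDoc)

  ' Search the finished tree: all links whose href starts with ftp://
  For Each hLink In hDoc.Root.GetChildrenByFilter("a[href^=ftp://]")
    Print hLink.GetAttribute("href")
  Next

  ' Only now is the DOM tree converted into an HTML string
  Print hDoc.ToString(True)

End

' Hypothetical module: adds a link to the <body>
Private Sub AddContent(hDoc As HtmlDocument)

  Dim aLinks As XmlElement[]

  hDoc.Body.NewElement("a")
  aLinks = hDoc.GetElementsByTagName("a")
  aLinks[aLinks.Count - 1].SetAttribute("href", "ftp://ftp.example.org/pub/")
  aLinks[aLinks.Count - 1].AppendText("Download area")

End

' Hypothetical module: needs its own CSS and script file in <head>
Private Sub AddVisitorCounter(hDoc As HtmlDocument)

  Dim aSpans As XmlElement[]

  ' <head> can still be extended although <body> is already filled
  hDoc.StyleSheets.Add("css/counter.css")
  hDoc.Scripts.Add("scripts/counter.js")

  hDoc.Body.NewElement("span")
  aSpans = hDoc.GetElementsByTagName("span")
  aSpans[aSpans.Count - 1].SetAttribute("id", "counter")

End
```

The order in which the modules are called does not matter for the <head> section, which is the point of working on the DOM tree instead of concatenating strings.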
The presented projects show you how to use the classes of the component gb.xml.html.
In the first project, a new HTML5 document is written. Its content is displayed in a web browser:
Figure 27.4.3.1.1.1: Content of the HTML5 file test.html in a web browser
The complete source code is listed first and then commented on:

    [1] ' Gambas class file
    [2]
    [3] Public Sub Form_Open()
    [4]   FMain.Resizable = True
    [5]   FMain.Caption = "HTML5-GENERATOR"
    [6]   btnHTMLShow.Enabled = False
    [7] End
    [8]
    [9] Public Sub btnGenerateHTMLDokument_Click()
    [10]   TextArea1.Clear()
    [11]   TextArea1.Insert(WriteHTMLDocumentDOM().ToString(True))
    [12]   SaveHTMLFile()
    [13]   btnHTMLShow.Enabled = True
    [14] End
    [15]
    [16] Public Sub btnHTMLShow_Click()
    [17]   If Exist(Application.Path &/ "files/test.html") Then
    [18]      Shell "firefox " & Application.Path &/ "files/test.html"
    [19]   Endif
    [20]   btnHTMLShow.Enabled = False
    [21] End
    [22]
    [23] Private Sub SaveHTMLFile()
    [24]   WriteHTMLDocumentDOM().Save(Application.Path &/ "files/test.html", True)
    [25] End
    [26]
    [27] Private Function WriteHTMLDocumentDOM() As HtmlDocument
    [28]
    [29]   Dim hHtmlDocument As New HtmlDocument
    [30]   Dim hXMLElement As XmlElement
    [31]   Dim sLongText As String
    [32]
    [33]   sLongText = File.Load("texts/image.description.txt")
    [34]
    [35]   hHtmlDocument.Html5 = True
    [36]   hHtmlDocument.Lang = "de"
    [37]   hHtmlDocument.Title = "Test-HTML-Document"
    [38]   hHtmlDocument.StyleSheets.Add("../css/main.css")
    [39]   hHtmlDocument.Favicon = "../images/favicon.png"
    [40]   hHtmlDocument.Scripts.Add("../scripts/datetime.js")
    [41]
    [42]   hHtmlDocument.Body.NewElement("h1")
    [43]   hXMLElement = hHtmlDocument.GetElementsByTagName("h1", gb.IgnoreCase)[0]
    [44]   hXMLElement.AppendText("Flowers in the Alps")
    [45]
    [46]   hHtmlDocument.Body.NewElement("hr")
    [47]   hXMLElement = hHtmlDocument.GetElementsByTagName("hr", gb.IgnoreCase)[0]
    [48]   hXMLElement.SetAttribute("class", "line")
    [49]
    [50]   hHtmlDocument.Body.NewElement("br")
    [51]
    [52]   hHtmlDocument.Body.NewElement("img")
    [53]   hXMLElement = hHtmlDocument.GetElementsByTagName("img", gb.IgnoreCase)[0]
    [54]   hXMLElement.SetAttribute("src", Application.Path &/ "images/augentrost.jpg")
    [55]   hXMLElement.SetAttribute("width", "255")
    [56]   hXMLElement.SetAttribute("height", "148")
    [57]   hXMLElement.SetAttribute("alt", "euphrasia rostkoviana")
    [58]
    [59]   hHtmlDocument.Body.NewElement("p")
    [60]   hXMLElement = hHtmlDocument.GetElementsByTagName("p", gb.IgnoreCase)[0]
    [61]   hXMLElement.AppendText("Bildbeschreibung:")
    [62]
    [63]   hHtmlDocument.Body.NewElement("p")
    [64]   hXMLElement = hHtmlDocument.GetElementsByTagName("p", gb.IgnoreCase)[1]
    [65]   hXMLElement.AppendText(sLongText)
    [66]
    [67]   hHtmlDocument.Body.AppendText("Stand:")
    [68]   hHtmlDocument.Body.NewElement("span")
    [69]   hXMLElement = hHtmlDocument.GetElementsByTagName("span", gb.IgnoreCase)[0]
    [70]   hXMLElement.SetAttribute("id", "datetime")
    [71]   hXMLElement.SetAttribute("class", "id")
    [72]
    [73]   Return hHtmlDocument
    [74]
    [75] End

Comment:

    [54]   hXMLElement.SetAttribute("src", Application.Path &/ "images/augentrost.jpg")
    [55]   hXMLElement.SetAttribute("width", "255")
    [56]   hXMLElement.SetAttribute("height", "148")
    [57]   hXMLElement.SetAttribute("alt", "euphrasia rostkoviana")

    [54*]  hXMLElement.SetAttribute("src", Application.Path &/ "images/augentrost.jpg\" width=\"255\" height=\"148\" alt=\"euphrasia rostkoviana")

The program generates the following HTML file contents:

    <!DOCTYPE html>
    <html lang="de">
      <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <title>
          Test-HTML-Document
        </title>
        <meta charset="utf-8" />
        <link rel="stylesheet" href="/home/hans/.../css/main.css" type="text/css" media="screen" />
        <link rel="icon" href="/home/hans/.../images/favicon.png" />
        <script src="/home/hans/.../scripts/datetime.js" type="text/javascript">
        </script>
      </head>
      <body>
        <h1>
          Flowers in the Alps
        </h1>
        <hr class="line" />
        <br />
        <img src="/home/hans/.../images/augentrost.jpg" width="255" height="148" alt="euphrasia rostkoviana" />
        <p>
          Bildbeschreibung:
        </p>
        <p>
          Augentrost oder Euphrasia rostkoviana: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Mauris pretium elit massa. Nulla libero est, vestibulum eu lobortis ut, pulvinar eget tortor. Fusce congue laoreet commodo. Aliquam non nisl dolor, at posuere felis. Maecenas ac tortor erat. Duis a erat lectus, sed rutrum velit. Aliquam vehicula luctus ultricies. Nam dapibus elit eget lectus tincidunt pharetra.
        </p>
        Stand:
        <span id="datetime" class="id">
        </span>
      </body>
    </html>

Comment:
If you are using the stable Gambas version 3.10.0 or lower, the following function can help you to create a well-formed HTML document:

    Private Function WorkAround(HtmlDocument As HtmlDocument) As HtmlDocument

      Dim hHtmlDocument As New HtmlDocument
      Dim hXMLElement As XmlElement

      hHtmlDocument = HtmlDocument
      hXMLElement = hHtmlDocument.GetElementsByTagName("meta")[0] ' Get the first <meta>-Tag
      ' You must access the parent element to remove the specified <meta> tag
      hXMLElement.Parent.RemoveChild(hXMLElement)

      Return hHtmlDocument

    End

The first <meta> tag is removed from the DOM tree. Now all you have to do is replace line 11 in the source code (first line below) with the second line:

    TextArea1.Insert(WriteHTMLDocumentDOM().ToString(True))
    TextArea1.Insert(WorkAround(WriteHTMLDocumentDOM()).ToString(True))

The same applies to line 24 (first line below), which is replaced with the second line:

    WriteHTMLDocumentDOM().Save(Application.Path &/ "files/test.html", True)
    WorkAround(WriteHTMLDocumentDOM()).Save(Application.Path &/ "files/test.html", True)

In Project 1, which is available as an archive in the download area for testing, the necessary corrections are applied automatically depending on the Gambas version in use (→ System.FullVersion).
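How the project archive performs this version check is not shown here. One possible way is sketched below; the function name NeedsWorkAround is hypothetical, and the sketch assumes that System.FullVersion returns a string of the form "3.10.0".

```gambas
' Returns True if the running Gambas version is 3.10.x or lower (sketch)
Private Function NeedsWorkAround() As Boolean

  Dim aVersion As String[]
  Dim iMajor, iMinor As Integer

  ' Assumption: System.FullVersion looks like "3.10.0"
  aVersion = Split(System.FullVersion, ".")
  If aVersion.Count < 2 Then Return False

  iMajor = CInt(aVersion[0])
  iMinor = CInt(aVersion[1])

  If iMajor < 3 Then Return True
  If (iMajor = 3) And (iMinor <= 10) Then Return True

  Return False

End
```

With such a helper, the document could be post-processed only when required, for example with `hDoc = WriteHTMLDocumentDOM()` followed by `If NeedsWorkAround() Then hDoc = WorkAround(hDoc)`.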
In the second project, the content of an existing HTML file is heavily modified. The focus is on finding the parts of the file that are to be changed, editing them and adding the changed content to the DOM tree. Changing also includes deleting individual HTML tags and their content (→ chapter 27.2.3 Changes to an XML document). There, too, changes are implemented via the DOM tree.
For the second project, only the source code of the function UpdateHTMLDocument(sPath As String) is given, which is extensively commented internally. Note that the following source code represents only one specific set of changes:

    Private Function UpdateHTMLDocument(sPath As String) As HtmlDocument

      Dim hHtmlDocument As New HtmlDocument
      Dim hXMLElement, hExistChild, hNewChild As XmlElement
      Dim aH1, aStyle As XmlElement[]
      Dim sAttribute, sLongText, sText, sOldText, sNewText As String
      Dim iWidth, iHeight As Integer
      Dim iWH As New Integer[]

      sLongText = File.Load(Application.Path &/ "texts/description.txt")

    ' The content of the original file is written to an HTML document.
    ' Existing content is overwritten.
      hHtmlDocument.Open(sPath)

    ' Special case: An attribute is added to the existing <html> tag using the property .Lang.
      hHtmlDocument.Lang = "DE_de"

    ' Find and save the first <meta> tag in the head area by its name
      hExistChild = hHtmlDocument.Head.GetChildrenByTagName("meta")[0]
    ' Create a second <meta> tag
      hNewChild = New XmlElement("meta")
    ' Insert the second <meta> tag AFTER the first <meta> tag into the document tree
      hHtmlDocument.Head.InsertAfter(hExistChild, hNewChild)
    ' Add 2 attributes to the second <meta> tag
      hXMLElement = hHtmlDocument.Head.GetChildrenByTagName("meta")[1]
      hXMLElement.NewAttribute("name", "author")
      hXMLElement.NewAttribute("content", "Hans Lehmann - Osterburg - 2017")
    ' Back to the head area in the document tree
      hXMLElement = hXMLElement.Parent

    ' Find and save the second <meta> tag in the head area by its name
      hExistChild = hHtmlDocument.Head.GetChildrenByTagName("meta")[1]
    ' Create a third <meta> tag
      hNewChild = New XmlElement("meta")
    ' Insert the third <meta> tag AFTER the second <meta> tag into the document tree
      hHtmlDocument.Head.InsertAfter(hExistChild, hNewChild)
    ' Add 2 attributes to the third <meta> tag
      hXMLElement = hHtmlDocument.Head.GetChildrenByTagName("meta")[2]
      hXMLElement.NewAttribute("name", "description")
      hXMLElement.NewAttribute("content", "Modification and extension of an HTML-File (DOM).")
    ' Back to the head area in the document tree
      hXMLElement = hXMLElement.Parent

    ' Find and save the first <link> tag in the head area by its name
      hXMLElement = hHtmlDocument.GetElementsByTagName("link")[0]
    ' Save the old attribute value
      sAttribute = hXMLElement.GetAttribute("href")
    ' Set the new attribute value
      sAttribute = "../images/favicon.ico"
    ' Assign the new attribute value
      hXMLElement.SetAttribute("href", sAttribute)

    ' Make changes to CSS definitions in <style> tag (head)
    ' Find and save the <style> tag in the head area by its name
      aStyle = hHtmlDocument.Head.GetChildrenByTagName("style")
      If aStyle.Count > 0 Then
       ' Save CSS definitions for the <body> tag
         sText = aStyle[0].TextContent
         sOldText = "font-size: 10px"
         sNewText = "font-size: 14px"
       ' Replace font size
         sText = Replace$(sText, sOldText, sNewText)
         sOldText = "Verdana"
         sNewText = "\"DejaVu Sans Mono\""
       ' Replace font
         sText = Replace$(sText, sOldText, sNewText)
       ' Assign new CSS definitions for <body> tag
         aStyle[0].TextContent = sText
      Endif

    ' Add a (CSS color value) attribute to the <body> tag
      hHtmlDocument.Body.SetAttribute("style", "color:darkblue;")

    ' Find and save the first <h1> tag in the body area by its name.
      aH1 = hHtmlDocument.Body.GetChildrenByTagName("h1")
    ' Replace text in the <h1> tag
      If aH1.Count > 0 Then
         aH1[0].TextContent = Replace$(aH1[0].TextContent, "/var/www", "~/public_html")
      Endif

    ' Find and save the first <img> tag using its name
      hXMLElement = hHtmlDocument.GetElementsByTagName("img")[0]
    ' Find and save the style attribute in the <img> tag by name
      sAttribute = hXMLElement.GetAttribute("style")
    ' Determine and redefine the width and height of the image (normalize)
      For Each sText In Scan(sAttribute, "*:*px;*:*px")
        If IsInteger(sText) Then iWH.Add(CInteger(sText))
      Next
      If iWH[0] < 32 Or iWH[0] > 64 Then
         If (iWH[0] >= iWH[1] And iWH[0] <> 0 And iWH[1] <> 0) Then
            iWidth = 64
            iHeight = 64 * (iWH[1] / iWH[0])
         Endif
      Endif
      If iWH[1] < 32 Or iWH[1] > 64 Then
         If (iWH[1] >= iWH[0] And iWH[0] <> 0 And iWH[1] <> 0) Then
            iHeight = 64
            iWidth = 64 * (iWH[0] / iWH[1])
         Endif
      Endif
    ' Set new attribute value
      sAttribute = "width:" & Str(iWidth) & "px;height:" & Str(iHeight) & "px"
    ' Assign the new attribute value
      hXMLElement.SetAttribute("style", sAttribute)
    ' Back to body area in the document tree
      hXMLElement = hXMLElement.Parent

    ' Create a new <p> tag
      hXMLElement.NewElement("p")
    ' Find and save the first <p> tag by its name
      hXMLElement = hHtmlDocument.GetElementsByTagName("p")[0]
    ' Insert text into the <p> tag
      hXMLElement.AppendText(sLongText)

    ' Create a new <p> tag - Variant
      hHtmlDocument.Body.NewElement("p")
    ' Find and save the second <p> tag by its name
      hXMLElement = hHtmlDocument.GetElementsByTagName("p")[1]
    ' Insert text into the <p> tag
      hXMLElement.AppendText("An ordinary HTML-File basically consists of the following sections:")

    ' Create a new <ol> tag
      hHtmlDocument.Body.NewElement("ol")
    ' Find and save the first <ol> tag by its name
      hXMLElement = hHtmlDocument.GetElementsByTagName("ol")[0]
    ' Add an attribute to the first <ol> tag
      hXMLElement.SetAttribute("style", "text-indent:2em; list-style:upper-roman")
    ' Create a new <li> tag
      hXMLElement.NewElement("li")
    ' Find and save the first <li> tag by its name
      hXMLElement = hHtmlDocument.GetElementsByTagName("li")[0]
    ' Insert list text into the <li> tag
      hXMLElement.AppendText("Documenttyp – Deklaration/Prolog (Specification of the HTML used - Version).")
    ' Back to the 'ol' level in the document tree
      hXMLElement = hXMLElement.Parent

      hXMLElement.NewElement("li")
      hXMLElement = hHtmlDocument.GetElementsByTagName("li")[1]
      hXMLElement.AppendText("Kopf (head) - Header data such as 'title' or 'meta' details.")
      hXMLElement = hXMLElement.Parent

      hXMLElement.NewElement("li")
      hXMLElement = hHtmlDocument.GetElementsByTagName("li")[2]
      hXMLElement.AppendText("Körper (body) – Content as text with headings, references, graphic references ... .")
      hXMLElement = hXMLElement.Parent

    ' Create a new <p> tag
      hHtmlDocument.Body.NewElement("p")
    ' Find and save the third <p> tag by its name
      hXMLElement = hHtmlDocument.GetElementsByTagName("p")[2]
    ' Insert text into the <p> tag
      hXMLElement.AppendText("END -> LAST PARAGRAPH")

      Return hHtmlDocument

    End

The function above converts the content of the original HTML file with inline CSS statements:

    <!DOCTYPE html>
    <html>
      <head>
        <title>
          INDEX.HTML
        </title>
        <meta charset="utf-8" />
        <link rel="icon" href="../images/favicon.png" />
        <style>
          body {background-color: #C3DDFF;font-family: Verdana;font-size: 10px}
          h1 {font-family: Arial;font-size: 32px;color: blue;}
          .line {border:none; border-top:1px solid #0000FF; color:#FFFFFF; background-color: #FFFFFF; height: 1px;}
        </style>
      </head>
      <body>
        <br />
        <img src="../images/html5_logo.png" alt="HTML5" style="width:256px;height:256px">
        <br />
        <h1>
          HTML-File in the web folder /var/www
        </h1>
        <hr class="line"></hr>
      </body>
    </html>

to an HTML5 file with modified content:

    <!DOCTYPE html>
    <html lang="DE_de">
      <head>
        <title>
          INDEX.HTML
        </title>
        <meta charset="utf-8" />
        <meta name="author" content="Hans Lehmann - Osterburg - 2017" />
        <meta name="description" content="Modification and extension of an HTML-File (DOM)." />
        <link rel="icon" href="../images/favicon.ico" />
        <style>
          body {background-color: #C3DDFF;font-family: "DejaVu Sans Mono";font-size: 14px}
          h1 {font-family: Arial;font-size: 32px;color: blue;}
          .line {border: none; border-top: 1px solid #0000FF; color: #FFFFFF; … ; height: 1px;}
        </style>
      </head>
      <body style="color:darkblue;">
        <br />
        <img src="../images/html5_logo.png" alt="HTML5" style="width:64px;height:64px" />
        <br />
        <h1>
          HTML-File in the web folder ~/public_html
        </h1>
        <hr class="line" />
        <p>
          Lorem ipsum dolor sit amet, consectetur adipiscing elit. Mauris pretium elit massa. Nulla libero est, vestibulum eu lobortis ut, pulvinar eget tortor. Fusce congue laoreet commodo. Aliquam non nisl dolor, at posuere felis. Maecenas ac tortor erat. Duis a erat lectus, sed rutrum velit. Aliquam vehicula luctus ultricies. Nam dapibus elit eget lectus tincidunt pharetra.
        </p>
        <p>
          An ordinary HTML-File basically consists of the following sections:
        </p>
        <ol style="text-indent:2em; list-style:upper-roman">
          <li>
            Documenttyp – Deklaration/Prolog (Specification of the HTML used - Version).
          </li>
          <li>
            Kopf (head) - Header data such as 'title' or 'meta' details.
          </li>
          <li>
            Körper (body) – Content as text with headings, references, graphic references ... .
          </li>
        </ol>
        <p>
          END -> LAST PARAGRAPH
        </p>
      </body>
    </html>

Figure 27.4.3.2.1: Content of the modified HTML file in a web browser
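In the second project, each list item is created with the same three steps: NewElement, a lookup via GetElementsByTagName with a document-wide index, and AppendText. As a sketch of one possible simplification, the hypothetical helper AddListItem below bundles these steps and looks up the new element relative to its parent list instead of relative to the whole document. It uses only methods that already appear in this chapter and assumes that the new <li> is the last <li> found below the list.

```gambas
' Hypothetical helper: appends a <li> with the given text to a list element
Private Sub AddListItem(hList As XmlElement, sText As String)

  Dim aItems As XmlElement[]

  ' Create the new <li> as the last child of the list
  hList.NewElement("li")

  ' Search only below the list; the new item is assumed to be the last match
  aItems = hList.GetChildrenByTagName("li")
  If aItems.Count = 0 Then Return

  aItems[aItems.Count - 1].AppendText(sText)

End
```

A call such as `AddListItem(hHtmlDocument.GetElementsByTagName("ol")[0], "First entry")` would then replace one of the repeated blocks, and the index no longer depends on how many <li> tags exist elsewhere in the document.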