In one project, regular expressions were combined with string-manipulation methods to search for and replace strings in a given text. You can either search for various predefined strings that have a special meaning or enter your own patterns:
Figure 19.6.6.1: Searching and replacing strings in a text
It turns out that, for complex texts, the pattern for each class of strings must contain at least one unique feature that distinguishes that class from all others. Otherwise you could not, for example, tell the monetary value 12.44 € apart from the plain decimal number 12.44. For ISBNs, the project currently finds only ISBN-10 numbers formatted with hyphens according to ISO 2108.
The source code is given almost in full, mainly to show how regular expressions and string-manipulation methods interact:
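The point about a unique feature can be sketched as follows. This is a minimal illustration and not part of the project: the sample text and both patterns are assumptions chosen for the example, and the gb.pcre component must be enabled.

```gambas
' Gambas module file
' Sketch only: sample text and patterns are made up for illustration.

Public Sub Main()

  Dim sText As String = "Factor 12.44 and price 12.45 €"
  Dim rNumber As New Regexp
  Dim rMoney As New Regexp

  ' Ambiguous: matches any decimal number, including the digits of a price
  rNumber.Compile("\\d+\\.\\d+")
  ' Unique feature: exactly two decimals followed by a space and the euro sign
  rMoney.Compile("\\d+\\.\\d{2} €")

  rNumber.Exec(sText)
  Print rNumber.Text   ' first decimal number in the text

  rMoney.Exec(sText)
  Print rMoney.Text    ' first monetary value in the text

End
```

The first pattern would also match the digits of 12.45 €, so it cannot separate the two classes. Only the trailing currency sign in the second pattern makes the monetary value distinguishable.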
' Gambas class file

Private $rExpression As Regexp
Private $sPattern As String

Public Sub Form_Open()

  $rExpression = New Regexp
  ...

End

Public Sub btnSearch_Click()

  If Not cbxRegexpression.Text Then
    Message.Error("There is NO search text!")
    cbxRegexpression.SetFocus()
    Return
  Endif

  ' Catch errors in the pattern
  Try $rExpression.Compile(SetPattern(cbxRegexpression.Text))
  If Error Then
    Message.Error("ERROR IN THE REGULAR EXPRESSION!")
    Return
  Endif

  lbxSearch.List = Search(txaText.Text)

End

Public Sub btnReplace_Click()

  If Not cbxRegexpression.Text Then
    Message.Error("There is NO search text!")
    cbxRegexpression.SetFocus()   ' set the focus before leaving, not after Return
    Return
  Endif

  ' Without replacement text, ask before replacing every match with nothing
  If Not txtReplace.Text Then
    If Message.Error("NO replacement text!", "Replace with NOTHING!", "Cancel!") <> 1 Then
      cbxRegexpression.SetFocus()
      Return
    Endif
  Endif

  Try $rExpression.Compile(SetPattern(cbxRegexpression.Text))
  If Error Then
    Message.Error("ERROR IN THE REGULAR EXPRESSION!")
    Return
  Endif

  txaText.Text = TextReplace(txaText.Text, txtReplace.Text)

End

Private Function Search(sText As String) As String[]

  Dim aSearchList As New String[]
  Dim iStart As Integer = 1

  Try $rExpression.Exec(sText)   ' catch errors
  If Error Then
    Message.Error("An error occurred ...!")
    Return aSearchList           ' return an empty list instead of Null
  Endif

  ' Offset is counted from the start of the substring passed to Exec()
  While $rExpression.Offset <> -1
    If Not $rExpression.Text Then Break   ' an empty match would never advance
    aSearchList.Add($rExpression.Text)
    iStart += $rExpression.Offset + Len($rExpression.Text)
    If iStart > Len(sText) Then Break
    $rExpression.Exec(Mid$(sText, iStart))
  Wend

  Return aSearchList

End

Private Function TextReplace(sText As String, sReplace As String) As String

  Dim iStart As Integer = 1

  Try $rExpression.Exec(sText)   ' catch errors
  If Error Then
    Message.Error("An error occurred ...!")
    Return sText                 ' keep the text unchanged on error
  Endif

  While $rExpression.Offset <> -1
    If Not $rExpression.Text Then Break   ' an empty match would never advance
    iStart += $rExpression.Offset
    sText = Mid$(sText, 1, iStart - 1) & sReplace & Mid$(sText, iStart + Len($rExpression.Text))
    iStart += Len(sReplace)
    If iStart > Len(sText) Then Break
    $rExpression.Exec(Mid$(sText, iStart))
  Wend

  Return sText

End

' SetPattern() is only needed so that the patterns can be prefixed with a hint text.
Private Function SetPattern(sInput As String) As String

  Dim iPosition As Integer

  iPosition = InStr(sInput, "--->")
  If iPosition = 0 Then
    $sPattern = sInput
  Else
    $sPattern = Replace(sInput, Left(sInput, iPosition + 3), "")
    $sPattern = Trim($sPattern)
  Endif

  Return $sPattern

End
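What SetPattern() does with such a hint text can be sketched in isolation. The entry below is a hypothetical example of the form "hint ---> pattern"; the actual entries offered by the project are not shown in the source.

```gambas
' Gambas module file
' Sketch only: the entry text is a made-up example of "hint ---> pattern".

Public Sub Main()

  Dim sEntry As String = "Date (ISO 8601) ---> \\d{4}-\\d{2}-\\d{2}"
  Dim iPosition As Integer

  iPosition = InStr(sEntry, "--->")
  ' Everything up to and including the arrow is the hint; the rest is the pattern
  If iPosition > 0 Then sEntry = Trim(Mid$(sEntry, iPosition + 4))

  Print sEntry   ' only the pattern remains

End
```

This variant cuts the string with Mid$() instead of Replace(). It is an alternative and not the project's code; it leaves the pattern untouched even if the hint text happens to occur again inside it.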
A special feature of the project is that it calls the two methods Regexp.Compile() and Regexp.Exec() explicitly.
Syntax: Sub Compile ( Pattern As String [ , CompileOptions As Integer ] )
The Compile() method pre-compiles a regular expression (pattern) for later execution by the Exec() method. This is useful if you want to apply the same pattern repeatedly or to large amounts of text.
Syntax: Sub Exec ( Subject As String [ , ExecOptions As Integer ] )
The Exec() method applies a previously compiled regular expression to a subject. This is particularly useful if you want to check many different texts: you compile the regular expression only once and can then call Exec() repeatedly, which is expected to be faster.
Compile() and Exec() are executed automatically when you create a new Regexp object and pass it a text (subject) and a pattern, as in New Regexp(sSubject, sPattern).
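A minimal sketch of the compile-once, execute-many pattern described above. The date pattern and the sample lines are assumptions made for this illustration, and the gb.pcre component must be enabled.

```gambas
' Gambas module file
' Sketch only: pattern and sample lines are made up for illustration.

Public Sub Main()

  Dim rDate As New Regexp
  Dim sLine As String

  ' Compile the pattern once ...
  rDate.Compile("\\d{4}-\\d{2}-\\d{2}")

  ' ... and run it against several subjects
  For Each sLine In ["Released 2021-03-15", "no date here", "Due 2022-12-01"]
    rDate.Exec(sLine)
    ' Offset is -1 if the subject contains no match
    If rDate.Offset <> -1 Then Print rDate.Text
  Next

End
```

The sketch prints the two dates and skips the line without a match.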
The Regexp class defines the following constants. Only a selection of them can be used as options in the two methods; others, such as NoMatch or NoMemory, mirror PCRE error codes:
Anchored BadMagic BadOption BadUTF8 BadUTF8Offset Callout Caseless DollarEndOnly DotAll Extended Extra MatchLimit MultiLine NoAutoCapture NoMatch NoMemory NoSubstring NoUTF8Check NotBOL NotEOL NotEmpty Null UTF8 Ungreedy UnknownNode
Example: Regexp.Anchored → Constant Anchored As Integer = 16
If this constant is specified as an option, the pattern is "anchored": it can match only at the first matching position in the subject, that is, at its start. The same effect can also be achieved by suitable constructs in the pattern itself, for example a leading ^.
The next source code excerpt comes from a project that converts Gambas source code into HTML source code. Among other things, it searches the Gambas source code for a URL and, if one is found, replaces it with a suitable hyperlink:
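The effect of the option can be sketched as follows, assuming the sample text "abc 123" and a pattern for digits (both chosen only for this illustration; gb.pcre must be enabled):

```gambas
' Gambas module file
' Sketch only: subject and pattern are made up for illustration.

Public Sub Main()

  Dim sText As String = "abc 123"
  Dim rFree As New Regexp
  Dim rAnchored As New Regexp
  Dim rCaret As New Regexp

  ' Without the option, the digits are found inside the subject
  rFree.Compile("\\d+")
  rFree.Exec(sText)
  Print rFree.Offset       ' 4

  ' With Regexp.Anchored, a match is accepted only at the start of the subject
  rAnchored.Compile("\\d+", Regexp.Anchored)
  rAnchored.Exec(sText)
  Print rAnchored.Offset   ' -1, no match

  ' Same effect through the pattern itself
  rCaret.Compile("^\\d+")
  rCaret.Exec(sText)
  Print rCaret.Offset      ' -1, no match

End
```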
Private Function parseLinks(sURL As String) As String

  Dim sRegex As Regexp
  Dim sPattern, sReplace As String

  sPattern = "(https?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.-]*(\\?\\S+)?)?)?)|(@[\\w]*|#[\\w]*)"
  ' Creating the object with subject and pattern compiles and executes in one step
  sRegex = New Regexp(sURL, sPattern)

  ' No match: return the text unchanged
  If sRegex.Offset = -1 Then Return sURL

  sReplace = "<a href=\"" & sRegex.Text & "\">" & sRegex.Text & "</a>"
  Return Replace(sURL, sRegex.Text, sReplace)

End

Public Sub Test_Click()

  Print parseLinks("http://www.gambas-buch.de")

End
This produces the following output:
<a href="http://www.gambas-buch.de">http://www.gambas-buch.de</a>