The PdfPage class is virtual and cannot be instantiated. It represents a single page of a PDF document, and you access a specific page through its page index, as in $hPdfDocument[iIndex]:
```gambas
Dim iIndex As Integer

'-- Dialog.Path holds the file selected in a preceding file dialog
$hPdfDocument = New PdfDocument(Dialog.Path)

If $hPdfDocument.Count > 0 Then
  '-- Page indexes run from 0 to Max (= Count - 1)
  For iIndex = 0 To $hPdfDocument.Max
    Print "Label = "; $hPdfDocument[iIndex].Label
    Print "Height = "; $hPdfDocument[iIndex].H
    Print "Width = "; $hPdfDocument[iIndex].W
    Print "Text = "; $hPdfDocument[iIndex].Text
  Next
Endif
```
The virtual class PdfPage has the following properties:
Property | Data type | Description |
---|---|---|
Height or H | Integer | Returns the page height of a PDF page in pixels. |
Width or W | Integer | Returns the page width of a PDF page in pixels. |
Label | String | Returns the label (page number) of a PDF page. |
Text | String | Returns the text of a PDF page. |
Thumbnail | Image | Returns the thumbnail image of a PDF page. The return value is Null if the PDF document contains no thumbnail image for the page or if the thumbnail uses an unsupported image format. |
Table 23.12.3.1.1 : Properties of the virtual class PdfPage
Please note that the page label and the page index do not necessarily match. Adobe Acrobat ©, for example, lets you renumber the pages so that a PDF document starts with page number 3. A PDF reader such as Xreader then shows the first page as page 3, although its page index is 0. A PDF document therefore stores its own labels for the page numbers, and these labels can be changed at will.
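Because labels and indexes can differ, it can be useful to find a page by the label a reader sees. The following is a minimal sketch, not part of gb.pdf: the names CollectLabels and IndexFromLabel are hypothetical, and, like the other examples in this section, the code assumes a global variable $hPdfDocument that holds a loaded PdfDocument (component gb.pdf enabled in the project).

```gambas
'-- Hypothetical helper: collects the label of every page, ordered by page index.
'-- Assumes $hPdfDocument is a loaded PdfDocument (component gb.pdf).
Public Function CollectLabels() As String[]

  Dim aLabels As New String[]
  Dim iIndex As Integer

  For iIndex = 0 To $hPdfDocument.Max
    aLabels.Add($hPdfDocument[iIndex].Label)
  Next

  Return aLabels

End

'-- Hypothetical helper: returns the page index of the first page whose label
'-- equals sLabel (case-sensitive comparison), or -1 if no page has that label.
Public Function IndexFromLabel(aLabels As String[], sLabel As String) As Integer

  Return aLabels.Find(sLabel)

End
```

For a document whose numbering starts at 3, IndexFromLabel(CollectLabels(), "3") should return 0. Separating the label collection from the lookup keeps the lookup independent of any PDF file.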
The PdfPage class has these three methods:
Method | Return type | Description |
---|---|---|
FindText ( Search As String [ , Options As Integer ] ) | RectF[ ] | Returns an array of rectangles (position and size) enclosing each occurrence of the search string. Note: the y-coordinates are measured from the bottom of the page. ‘Search’ is the search text and ‘Options’ (optional) is a search option or a combination of search options (constants) from the Pdf class. |
GetText ( X As Float, Y As Float, Width As Float, Height As Float ) | String | Returns the text in the specified rectangle. The following applies: X is the x-coordinate of the top left rectangle point, Y is the y-coordinate of the top left rectangle point, Width is the width of the rectangle and Height is the height of the rectangle. |
Render ( [ X As Integer, Y As Integer, Width As Integer, Height As Integer, Rotation As Integer, Resolution As Float ] ) | Image | Renders the page and returns the resulting image. See the notes in section 23.12.3.3. |
Table 23.12.3.2.1 : Methods of the virtual class PdfPage
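FindText measures y-coordinates from the bottom of the page, whereas GetText expects the top-left corner of a rectangle. A minimal sketch for passing a match found by FindText back to GetText could look like this. It rests on an unverified assumption: that FindText's Y is the distance from the bottom edge of the page to the lower edge of the rectangle, and that GetText measures Y downward from the top edge. TopLeftY and PrintMatchedText are hypothetical names; $hPdfDocument is a loaded PdfDocument as in the other examples.

```gambas
'-- Hypothetical helper. ASSUMPTION: fRectY is the distance from the bottom edge
'-- of the page to the lower edge of the rectangle (as returned by FindText);
'-- the result is the distance from the top edge of the page to the upper edge.
Public Function TopLeftY(fRectY As Float, fRectH As Float, fPageH As Float) As Float

  Return fPageH - fRectY - fRectH

End

'-- Hypothetical procedure: prints the text inside every rectangle that
'-- FindText returns for sSearch on the page with index iPage.
Public Sub PrintMatchedText(iPage As Integer, sSearch As String)

  Dim hRectF As RectF
  Dim fPageH As Float

  fPageH = $hPdfDocument[iPage].H

  For Each hRectF In $hPdfDocument[iPage].FindText(sSearch)
    Print $hPdfDocument[iPage].GetText(hRectF.X, TopLeftY(hRectF.Y, hRectF.H, fPageH), hRectF.W, hRectF.H)
  Next

End
```

If the printed text is empty or shifted by one line, the assumption about which edge Y refers to is probably wrong; in that case, fPageH - hRectF.Y is the alternative to try.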
To demonstrate the FindText method ( Search As String [ , Options As Integer ] ), assume that the file searchtext.pdf contains the following (German) text:
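```
G A M B A S - I N F O R M A T I O N E N
-------------------------------------------------
Er programmiert in der Programmiersprache Gambas.
Viele Konstanten gelten nur gambas-intern.
Seit gestern spricht er nur noch gambasisch ... .
Fazit: Gambas ist toll!
```

In English: “He programs in the Gambas programming language. Many constants are only valid inside Gambas. Since yesterday he has been speaking only Gambasish ... . Conclusion: Gambas is great!” The search results below refer to the German text.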
The following procedure searches all pages of a PDF file for a given text. For each page with at least one match, it prints the page label, the number of matches and the coordinates of each text-enclosing rectangle to the IDE console:
```gambas
Public Sub SearchText(sSearchText As String, Optional iSearchOption As Integer = 0)

  Dim i As Integer
  Dim aRectF As RectF[]
  Dim hRectF As RectF
  Dim bFound As Boolean

  For i = 0 To $hPdfDocument.Max
    aRectF = $hPdfDocument[i].FindText(sSearchText, iSearchOption)
    If aRectF.Count > 0 Then
      bFound = True
      '-- Strings in parentheses are marked as translatable
      Print ("Found on page ") & $hPdfDocument[i].Label & (" | Number of places where the word `") & sSearchText & ("` was found = "); aRectF.Count
      Print ("with the coordinates:")
      Print String(40, "-")
      For Each hRectF In aRectF
        '-- Round(x, -1) rounds to one decimal place; the trailing ";" keeps both parts on one line
        Print "x: "; Round(hRectF.X, -1); " y: "; Round(hRectF.Y, -1);
        Print " | w: "; Round(hRectF.W, -1); " h: "; Round(hRectF.H, -1)
      Next
    Endif
  Next

  If Not bFound Then Print ("The word `") & sSearchText & ("` was not found in the PDF file!")

End
```
Calling the procedure with the search text “Gambas” and two combined search options:

```gambas
SearchText("Gambas", Pdf.CaseSensitive Or Pdf.WholeWordsOnly)
```

```
Found on page 1 | Number of places where the word `Gambas` was found = 1
with the coordinates:
----------------------------------------
x: 66.6 y: 736.3 | w: 52.9 h: 15.6
```
Calling the procedure with the search text “Gambas” and the default search option 0:

```gambas
SearchText("Gambas")
```

```
Found on page 1 | Number of places where the word `Gambas` was found = 4
with the coordinates:
----------------------------------------
x: 306.9 y: 784.6 | w: 52.9 h: 15.6
x: 204.1 y: 768.5 | w: 49.8 h: 15.6
x: 226.9 y: 752.4 | w: 49.8 h: 15.6
x: 66.6 y: 736.3 | w: 52.9 h: 15.6
```
Calling the procedure with the search text “Gambass” and a single search option:

```gambas
SearchText("Gambass", Pdf.CaseSensitive)
```

```
The word `Gambass` was not found in the PDF file!
```
With the GetText method ( X As Float, Y As Float, Width As Float, Height As Float ) you can extract either the complete text of each page of a PDF document or only the text inside a specific rectangle of a page:
```gambas
Public Sub GetPlainText()

  Dim i As Integer

  For i = 0 To $hPdfDocument.Max
    Print "::::: PAGE "; i + 1; " :::::::::::::::::::::::::"
    Print "PAGE LABEL = "; $hPdfDocument[i].Label
    If Len($hPdfDocument[i].Text) > 0 Then
      '-- The rectangle (0, 0, W, H) covers the whole page
      Print $hPdfDocument[i].GetText(0, 0, $hPdfDocument[i].W, $hPdfDocument[i].H)
      '-- Variant:
      '-- Print $hPdfDocument[i].Text & gb.NewLine
    Else
      Print "No text exists!"
    Endif
  Next

End
```
As of Gambas 3.19.1, Benoit Minisini has changed the Render(…) method so that, if you pass the width and height arguments but not the resolution argument, the best resolution for display in a DocumentView is calculated automatically. You therefore no longer have to convert between pixels, resolution and absolute size yourself. In a PDF project, the Draw event handler of a DocumentView then reduces to:
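```gambas
Public Sub DocumentView1_Draw(Page As Integer, Width As Integer, Height As Integer)

  Dim hImage As Image

  '-- $PDF_Doc is the loaded PdfDocument. No Resolution argument is passed,
  '-- so (since 3.19.1) the best resolution for Width x Height is computed automatically
  hImage = $PDF_Doc[Page].Render(0, 0, Width, Height)
  Paint.DrawImage(hImage, 0, 0)

End
```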
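With versions before 3.19.1, the resolution has to be computed by hand. The following sketch shows that conversion; FitResolution is a hypothetical helper, not part of gb.pdf, and it assumes that W and H report the page size at 72 dpi (that is, in PDF points, 1 point = 1/72 inch), so that a page rendered at resolution r is W × r / 72 pixels wide. It returns the largest resolution at which the whole page still fits into the view.

```gambas
'-- Hypothetical helper. ASSUMPTION: fPageW/fPageH are the page size at 72 dpi
'-- (PDF points), so rendering at fRes dpi yields fPageW * fRes / 72 pixels.
'-- Returns the largest resolution at which the page fits into iViewW x iViewH.
Public Function FitResolution(fPageW As Float, fPageH As Float, iViewW As Integer, iViewH As Integer) As Float

  '-- Take the scale factor of the limiting side (pixels per point) and convert it to dpi
  Return Min(iViewW / fPageW, iViewH / fPageH) * 72

End
```

For an A4 page (595 × 842 points) and a view of 1190 × 2000 pixels, the width is the limiting side (1190 / 595 = 2), so FitResolution returns 144 dpi; that value would then be passed as the Resolution argument of Render().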