23.12.3 Class PdfPage (gb.poppler)

The PdfPage class is virtual and cannot be created. It represents a single page of a PDF document. You access a specific page through the PdfDocument object via its page index, which starts at 0:

  Dim iIndex As Integer
 
  $hPdfDocument = New PdfDocument(Dialog.Path)
 
  If $hPdfDocument.Count > 0 Then
     For iIndex = 0 To $hPdfDocument.Max
         Print "Label  = "; $hPdfDocument[iIndex].Label
         Print "Height = "; $hPdfDocument[iIndex].H
         Print "Width  = "; $hPdfDocument[iIndex].W
         Print "Text   = "; $hPdfDocument[iIndex].Text
     Next
  Endif

23.12.3.1 Properties

The virtual class PdfPage has the following properties:

| Property    | Data type | Description |
|-------------|-----------|-------------|
| Height or H | Integer   | Returns the page height of a PDF page in pixels. |
| Width or W  | Integer   | Returns the page width of a PDF page in pixels. |
| Label       | String    | Returns the label (page number) of a PDF page. |
| Text        | String    | Returns the text of a PDF page. |
| Thumbnail   | Image     | Returns the thumbnail image of a PDF page. The return value is Null if the PDF document contains no thumbnail image for the page or if the thumbnail is in an unsupported image format. |

Table 23.12.3.1.1 : Properties of the virtual class PdfPage
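
Because the Thumbnail property does not always deliver an image, it can be useful to fall back to a rendered page. The following is only a minimal sketch: the function name GetPreview is a hypothetical helper, and it assumes that Render() may be called without arguments, as the optional parameters in the method signature suggest.

```gambas
' Sketch only: GetPreview is a hypothetical helper, not part of gb.poppler.
Public Function GetPreview(hDocument As PdfDocument, iIndex As Integer) As Image

  Dim hImage As Image

  ' Prefer the thumbnail embedded in the PDF document
  hImage = hDocument[iIndex].Thumbnail

  ' No usable thumbnail available: render the page instead
  ' (assumption: all arguments of Render() are optional)
  If IsNull(hImage) Then hImage = hDocument[iIndex].Render()

  Return hImage

End
```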

23.12.3.1.1 Notes on the Label property

Please note that the page label and the page index do not necessarily match. Programs such as Adobe Acrobat © make it possible to change the page numbers, for example so that a PDF document starts with page number 3. A PDF reader such as XReader then shows 3 as the number of the first page. The label is therefore a page number that is stored in the PDF document and can be changed at will, while the page index always starts at 0.
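
If you need the page index that belongs to a given label, you can search the pages for it. This is a minimal sketch; the function name FindPageIndexByLabel is hypothetical, and it only uses the Max and Label properties described above. It returns the first page whose label matches, or -1 if there is none.

```gambas
' Sketch only: FindPageIndexByLabel is a hypothetical helper.
Public Function FindPageIndexByLabel(hDocument As PdfDocument, sLabel As String) As Integer

  Dim i As Integer

  ' Compare the label of each page with the label you are looking for
  For i = 0 To hDocument.Max
      If hDocument[i].Label = sLabel Then Return i
  Next

  ' No page carries this label
  Return -1

End
```

A call could look like this: iIndex = FindPageIndexByLabel($hPdfDocument, "3").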

23.12.3.2 Methods

The PdfPage class has these three methods:

| Method | Return type | Description |
|--------|-------------|-------------|
| FindText ( Search As String [ , Options As Integer ] ) | RectF[ ] | Returns an array of rectangles (position and size) that enclose the character string you are looking for. Note: The y-coordinates refer to the bottom of the page. ‘Search’ is the search text and ‘Options’ (optional) is a search option or a combination of search options (constants) from the Pdf class. |
| GetText ( X As Float, Y As Float, Width As Float, Height As Float ) | String | Returns the text in the specified rectangle. X and Y are the coordinates of the top left corner of the rectangle, Width is the width of the rectangle and Height is its height. |
| Render ( [ X As Integer, Y As Integer, Width As Integer, Height As Integer, Rotation As Integer, Resolution As Float ] ) | Image | Renders the page and returns the resulting image. Note the information in section 23.12.3.3. |

Table 23.12.3.2.1 : Methods of the virtual class PdfPage

To demonstrate the use of the FindText method ( Search As String [ , Options As Integer ] ), it is assumed that the file searchtext.pdf contains this text, for example:

G A M B A S - I N F O R M A T I O N E N
-------------------------------------------------
Er programmiert in der Programmiersprache Gambas.
Viele Konstanten gelten nur gambas-intern.
Seit gestern spricht er nur noch gambasisch ... .
Fazit: Gambas ist toll!

The following source code can be used to search for a specific text in a PDF file. The page, the number of locations found on the page and the coordinates of the text-enclosing rectangle are displayed in the IDE console:

Public Sub SearchText(sSearchText As String, Optional iSearchOption As Integer)
 
  Dim i As Integer
  Dim aRectF As New RectF[]
  Dim iFound As Boolean
 
  If IsNull(iSearchOption) Then iSearchOption = 0
 
  For i = 0 To $hPdfDocument.Max
      aRectF = $hPdfDocument[i].FindText(sSearchText, iSearchOption)
      If aRectF.Count > 0 Then
         iFound = True
         Print ("Found on page ") &
               $hPdfDocument[i].Label & (" | Number of places where the word `") &
               sSearchText & ("` was found = "); aRectF.Count
         Print ("with the coordinates:")
         Print String(40, "-")
         For Each hRectF As RectF In aRectF
             Print "x: "; Round(hRectF.X, -1); "  y: "; Round(hRectF.Y, -1);
             Print "  |  w: "; Round(hRectF.W, -1); "  h: "; Round(hRectF.H, -1)
         Next
      Endif
  Next
 
  If Not iFound Then Print ("The word `") & sSearchText & ("` was not found in the PDF file!")
 
End

Calling the above procedure with the search text “Gambas” and two combined search options:

SearchText("Gambas", Pdf.CaseSensitive Or Pdf.WholeWordsOnly)
Found on page 1 | Number of places where the word `Gambas` was found = 1
with the coordinates:
----------------------------------------
x: 66,6  y: 736,3  |  w: 52,9  h: 15,6

Calling the procedure with the search text “Gambas” and the default search option:

SearchText("Gambas")
Found on page 1 | Number of places where the word `Gambas` was found = 4
with the coordinates:
----------------------------------------
x: 306,9  y: 784,6  |  w: 52,9  h: 15,6
x: 204,1  y: 768,5  |  w: 49,8  h: 15,6
x: 226,9  y: 752,4  |  w: 49,8  h: 15,6
x: 66,6   y: 736,3  |  w: 52,9  h: 15,6

Calling the procedure with the search text “Gambass” and a search option:

SearchText("Gambass", Pdf.CaseSensitive)
The word `Gambass` was not found in the PDF file!
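
If you only need the number of places found and not the coordinates, the Count property of the returned array is sufficient. This is a minimal sketch; the function name CountMatches is hypothetical, and, like the procedure above, it assumes that 0 stands for the default search option.

```gambas
' Sketch only: CountMatches is a hypothetical helper.
Public Function CountMatches(sSearchText As String, Optional iSearchOption As Integer) As Integer

  Dim i As Integer
  Dim iTotal As Integer

  ' FindText returns one rectangle per place found on the page
  For i = 0 To $hPdfDocument.Max
      iTotal += $hPdfDocument[i].FindText(sSearchText, iSearchOption).Count
  Next

  Return iTotal

End
```

For the single-page sample file, CountMatches("Gambas") should agree with the number of places that SearchText("Gambas") reports.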

With the GetText method ( X As Float, Y As Float, Width As Float, Height As Float ), you can extract either the complete text of each page of a PDF document or only the text in a specific rectangular area of a page. The following procedure extracts the complete text by passing the full page width and height:

Public Sub GetPlainText()
 
  Dim i As Integer
 
  For i = 0 To $hPdfDocument.Max
      Print "::::: PAGE "; i + 1; " :::::::::::::::::::::::::"
      Print "PAGE LABEL = "; $hPdfDocument[i].Label
      If $hPdfDocument[i].Text.Len > 0 Then
         Print $hPdfDocument[i].GetText(0, 0, $hPdfDocument[i].W, $hPdfDocument[i].H)
     '-- Variant:
     '-- Print $hPdfDocument[i].Text & gb.NewLine
      Else
         Print "No text exists!"
      Endif
  Next
 
End
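
To read only part of a page, pass a smaller rectangle to GetText. The following is a minimal sketch; the procedure name PrintUpperHalfText is hypothetical, and the choice of the upper half of the page is only an example. It relies on the description in the method table, according to which X and Y denote the top left corner of the rectangle.

```gambas
' Sketch only: PrintUpperHalfText is a hypothetical procedure.
Public Sub PrintUpperHalfText()

  Dim i As Integer
  Dim fWidth As Float
  Dim fHeight As Float

  For i = 0 To $hPdfDocument.Max
      fWidth = $hPdfDocument[i].W
      fHeight = $hPdfDocument[i].H
      Print "PAGE LABEL = "; $hPdfDocument[i].Label
      ' Rectangle: top left corner (0, 0), full page width, half the page height
      Print $hPdfDocument[i].GetText(0, 0, fWidth, fHeight / 2)
  Next

End
```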

23.12.3.3 Change to the Render() method

As of version 3.19.1, Benoit Minisini has changed the Render(…) method: if you specify the Width and Height arguments but not the Resolution argument, the best resolution for display in a DocumentView is calculated automatically. You no longer have to convert between pixels, resolution and absolute size yourself. The following source code shows how this can be used in a PDF project:

Public Sub DocumentView1_Draw(Page As Integer, Width As Integer, Height As Integer)
 
  Dim hImage As Image
 
  hImage = $PDF_Doc[Page].Render(0, 0, Width, Height)
  Paint.DrawImage(hImage, 0, 0)
 
End
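
The same call can also be used outside of a DocumentView, for example to save every page as an image file of a given width. This is only a sketch under several assumptions: the procedure name ExportPages and the file name pattern are hypothetical, the automatic choice of resolution requires version 3.19.1 or later as described above, and saving an image relies on the Save method of the Image class.

```gambas
' Sketch only: ExportPages is a hypothetical procedure.
Public Sub ExportPages(sDirectory As String, iWidth As Integer)

  Dim i As Integer
  Dim iHeight As Integer
  Dim hImage As Image

  For i = 0 To $PDF_Doc.Max
      ' Derive the height from the page size so that the aspect ratio is kept
      iHeight = CInt(iWidth * $PDF_Doc[i].H / $PDF_Doc[i].W)
      ' No Resolution argument: it is calculated automatically (as of 3.19.1)
      hImage = $PDF_Doc[i].Render(0, 0, iWidth, iHeight)
      ' Hypothetical file name pattern: page-001.png, page-002.png, ...
      hImage.Save(sDirectory &/ "page-" & Format(i + 1, "000") & ".png")
  Next

End
```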