Extracts the text of any page in a PDF file.
This function internally uses the direct access functionality. The entire file is not loaded into memory, so this function can be used on arbitrarily large documents.
Two text extraction algorithms are provided, a standard one and a more accurate but slower one, and each can return the text of the selected page in several output formats.
Password: WideString; Page, Options: Integer): WideString;
InputFileName As String, Password As String, Page As Long,
Options As Long) As String
wchar_t * DPLExtractFilePageText(int InstanceID, wchar_t * InputFileName,
wchar_t * Password, int Page, int Options)
|InputFileName||The path and file name of the file to extract text from.|
|Password||The password to use, if any, when opening the file.|
|Page||The number of the page to extract text from. The first page in the document is page 1.|
Using the standard text extraction algorithm:
0 = Extract text in human readable format
1 = Deprecated
2 = Return a CSV string including font, color, size and position of each piece of text on the page
Using the more accurate but slower text extraction algorithm:
3 = Return a CSV string for each piece of text on the page with the following format:
Font Name, Text Color, Text Size, X1, Y1, X2, Y2, X3, Y3, X4, Y4, Text
The co-ordinates are the four points bounding the text, measured using the units set with the SetMeasurementUnits function and the origin set with the SetOrigin function. Co-ordinate order is anti-clockwise with the bottom left corner first.
4 = Similar to option 3, but individual words are returned, making searching for words easier
5 = Similar to option 3, but character widths are output after each block of text
6 = Similar to option 4, but character widths are output after each line of text
7 = Extract text in human readable format with improved accuracy compared to option 0
8 = Output format similar to option 0, but produced by the more accurate algorithm. Returns unformatted lines.
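
The record layout listed for option 3 (Font Name, Text Color, Text Size, four corner points, Text) can be split into its fields with a small parser. The following is a minimal sketch, not part of the library: the names TextBlock, read_field, parse_text_block and block_bounds are hypothetical, and it assumes that each record arrives as a single line, that string fields may be wrapped in double quotes with embedded quotes doubled, and that an unquoted final Text field runs to the end of the line. Check these assumptions against the actual output of your library version before relying on them.

```c
#include <assert.h>
#include <stdlib.h>

/* One record of option 3 output (hypothetical type, for illustration only). */
typedef struct {
    char font[64];
    char color[32];
    double size;
    double x[4];   /* x[0], y[0] is the first corner; order is anti-clockwise */
    double y[4];
    char text[256];
} TextBlock;

/* Copies one CSV field starting at *cursor into out (capacity cap).
   Quoted fields may contain commas; "" inside quotes is one quote.
   If last is non-zero, an unquoted field runs to the end of the line.
   Sets *cursor to the start of the next field, or NULL at end of input.
   Returns 1 if a field was read, 0 if no input was left. */
static int read_field(const char **cursor, char *out, size_t cap, int last)
{
    const char *p = *cursor;
    size_t n = 0;

    if (p == NULL) return 0;
    while (*p == ' ') p++;                  /* skip spaces after a comma */

    if (*p == '"') {
        p++;
        while (*p != '\0') {
            if (*p == '"') {
                if (p[1] == '"') p++;       /* doubled quote: keep one */
                else { p++; break; }        /* closing quote */
            }
            if (n + 1 < cap) out[n++] = *p;
            p++;
        }
        while (*p != '\0' && *p != ',') p++; /* ignore text after the quote */
    } else {
        while (*p != '\0' && (last || *p != ',')) {
            if (n + 1 < cap) out[n++] = *p;
            p++;
        }
    }
    out[n] = '\0';
    *cursor = (*p == ',') ? p + 1 : NULL;
    return 1;
}

/* Parses one option 3 record. Returns 1 on success, 0 if fields are missing. */
int parse_text_block(const char *line, TextBlock *out)
{
    char num[32];
    const char *cur = line;
    int i;

    assert(line != NULL && out != NULL);
    if (!read_field(&cur, out->font, sizeof out->font, 0)) return 0;
    if (!read_field(&cur, out->color, sizeof out->color, 0)) return 0;
    if (!read_field(&cur, num, sizeof num, 0)) return 0;
    out->size = strtod(num, NULL);
    for (i = 0; i < 4; i++) {
        if (!read_field(&cur, num, sizeof num, 0)) return 0;
        out->x[i] = strtod(num, NULL);
        if (!read_field(&cur, num, sizeof num, 0)) return 0;
        out->y[i] = strtod(num, NULL);
    }
    return read_field(&cur, out->text, sizeof out->text, 1);
}

/* Computes the axis-aligned extent of the four corner points. */
void block_bounds(const TextBlock *b, double *min_x, double *min_y,
                  double *max_x, double *max_y)
{
    int i;
    *min_x = *max_x = b->x[0];
    *min_y = *max_y = b->y[0];
    for (i = 1; i < 4; i++) {
        if (b->x[i] < *min_x) *min_x = b->x[i];
        if (b->x[i] > *max_x) *max_x = b->x[i];
        if (b->y[i] < *min_y) *min_y = b->y[i];
        if (b->y[i] > *max_y) *max_y = b->y[i];
    }
}
```

A caller would split the string returned for option 3 into lines, pass each line to parse_text_block, and then use block_bounds to test whether a block falls inside a region of interest. The extent is expressed in whatever units and origin were set with SetMeasurementUnits and SetOrigin, so the sketch makes no assumption about which edge is the top.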