DAExtractPageText
Extraction, Direct access functionality, Page manipulation
Description
This function extracts text from the selected page using either the standard text extraction algorithm or a slower but more accurate one, and returns the result in one of several formats selected by the Options parameter.
The DASetTextExtractionWordGap, DASetTextExtractionOptions and DASetTextExtractionArea functions can be used to adjust the text extraction process.
Syntax
Delphi
function TDebenuPDFLibrary1811.DAExtractPageText(FileHandle, PageRef,
Options: Integer): WideString;
ActiveX
Function DebenuPDFLibrary1811.PDFLibrary::DAExtractPageText(FileHandle As Long,
PageRef As Long, Options As Long) As String
DLL
wchar_t * DPLDAExtractPageText(int InstanceID, int FileHandle, int PageRef,
int Options)
Parameters
FileHandle | A handle returned by the DAOpenFile, DAOpenFileReadOnly or DAOpenFromStream functions |
PageRef | A page reference returned by the DAFindPage or DANewPage functions |
Options | Using the standard text extraction algorithm:
0 = Extract text in human readable format
1 = Deprecated
2 = Return a CSV string including font, color, size and position of each piece of text on the page
Using the more accurate but slower text extraction algorithm:
3 = Return a CSV string for each piece of text on the page with the following format:
Font Name, Text Color, Text Size, X1, Y1, X2, Y2, X3, Y3, X4, Y4, Text
The co-ordinates are the four points bounding the text, measured using the units set with the SetMeasurementUnits function and the origin set with the SetOrigin function. Co-ordinate order is anti-clockwise with the bottom left corner first.
4 = Similar to option 3, but individual words are returned, making searching for words easier
5 = Similar to option 3 but character widths are output after each block of text
6 = Similar to option 4 but character widths are output after each line of text
7 = Extract text in human readable format with improved accuracy compared to option 0
8 = Similar output format as option 0 but using the more accurate algorithm. Returns unformatted lines. |
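The record layout described for option 3 can be read back into a structure with a small parser. The following is a minimal sketch in C, not part of the library: TextBlock, next_field, parse_text_block and block_bounds are hypothetical names introduced here for illustration. It assumes that the returned string has already been split into single lines without line endings, that fields are separated by commas, and that a field may be wrapped in double quotes with a doubled quote standing for a literal quote. The exact quoting used by the library is not stated above, so check it against real output before relying on this.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* One option 3 record:
   Font Name, Text Color, Text Size, X1, Y1, X2, Y2, X3, Y3, X4, Y4, Text */
typedef struct {
    wchar_t font[128];
    wchar_t color[32];
    double size;
    double x[4];      /* X1..X4, anti-clockwise, bottom left corner first */
    double y[4];      /* Y1..Y4 */
    wchar_t text[512];
} TextBlock;

/* Copy the next field into out (truncated to cap - 1 characters) and advance
   *p past the following comma, or set *p to NULL at the end of the line.
   ASSUMPTION: quoted fields use "" for a literal quote.
   Returns 0 on success, -1 if there is no input left. */
static int next_field(const wchar_t **p, wchar_t *out, size_t cap)
{
    const wchar_t *s = *p;
    size_t n = 0;

    if (s == NULL) return -1;
    while (*s == L' ') s++;                       /* skip leading blanks */
    if (*s == L'"') {
        s++;
        while (*s != L'\0') {
            if (*s == L'"') {
                if (s[1] != L'"') { s++; break; } /* closing quote */
                s++;                              /* doubled quote: keep one */
            }
            if (n + 1 < cap) out[n++] = *s;
            s++;
        }
        while (*s != L'\0' && *s != L',') s++;    /* move to the delimiter */
    } else {
        while (*s != L'\0' && *s != L',') {
            if (n + 1 < cap) out[n++] = *s;
            s++;
        }
    }
    out[n] = L'\0';
    *p = (*s == L',') ? s + 1 : NULL;
    return 0;
}

/* Parse one line of option 3 output. Returns 0 on success, -1 if the line
   has fewer than the twelve expected fields. */
int parse_text_block(const wchar_t *line, TextBlock *tb)
{
    const wchar_t *p = line;
    wchar_t num[64];
    int i;

    assert(line != NULL && tb != NULL);
    if (next_field(&p, tb->font, sizeof tb->font / sizeof tb->font[0]) != 0) return -1;
    if (next_field(&p, tb->color, sizeof tb->color / sizeof tb->color[0]) != 0) return -1;
    if (next_field(&p, num, sizeof num / sizeof num[0]) != 0) return -1;
    tb->size = wcstod(num, NULL);
    for (i = 0; i < 4; i++) {
        if (next_field(&p, num, sizeof num / sizeof num[0]) != 0) return -1;
        tb->x[i] = wcstod(num, NULL);
        if (next_field(&p, num, sizeof num / sizeof num[0]) != 0) return -1;
        tb->y[i] = wcstod(num, NULL);
    }
    if (p == NULL) return -1;                     /* Text field is missing */

    /* Text is the last field: if it is not quoted, take the rest of the
       line so that commas inside the text are preserved. */
    while (*p == L' ') p++;
    if (*p == L'"')
        return next_field(&p, tb->text, sizeof tb->text / sizeof tb->text[0]);
    wcsncpy(tb->text, p, sizeof tb->text / sizeof tb->text[0] - 1);
    tb->text[sizeof tb->text / sizeof tb->text[0] - 1] = L'\0';
    return 0;
}

/* Axis-aligned extent of the four bounding points, in the current
   measurement units and relative to the current origin. */
void block_bounds(const TextBlock *tb, double *min_x, double *min_y,
                  double *max_x, double *max_y)
{
    int i;

    *min_x = *max_x = tb->x[0];
    *min_y = *max_y = tb->y[0];
    for (i = 1; i < 4; i++) {
        if (tb->x[i] < *min_x) *min_x = tb->x[i];
        if (tb->x[i] > *max_x) *max_x = tb->x[i];
        if (tb->y[i] < *min_y) *min_y = tb->y[i];
        if (tb->y[i] > *max_y) *max_y = tb->y[i];
    }
}
```

A caller would pass each line of the string returned by DPLDAExtractPageText with Options set to 3 to parse_text_block, then use block_bounds to test whether a block falls inside a region of interest. Option 4 is described as similar to option 3 with individual words returned, so the same sketch may apply there, although that is not confirmed by the description above. Options 5 and 6 add character widths, which this sketch does not handle.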