Debenu Quick PDF Library includes a range of functions for extracting text from PDF files, but most of them extract text from an entire page. The functions with "area" in the name let you specify a rectangular area of the page from which to extract text. The key function for the regular in-memory functions is SetTextExtractionArea, and for the direct access (DA) functions it is DASetTextExtractionArea.

Sample code demonstrating the use of the regular and DA functions for extracting text from a portion of the page is shown below:

SetTextExtractionArea with GetPageText

DPL.LoadFromFile(@"Sample.pdf", "");
DPL.SetOrigin(1); // Sets 0,0 coordinate position to top left of page, default is bottom left
DPL.SetTextExtractionArea(35, 35, 229, 30); // Left, Top, Width, Height
string ExtractedContent = DPL.GetPageText(8);
Console.WriteLine(ExtractedContent);

DASetTextExtractionArea with ExtractFilePageText

SetOrigin cannot be used with DASetTextExtractionArea, so the 0,0 coordinate stays at its default position at the bottom left of the page. This means the Top parameter must be measured up from the bottom of the page rather than down from the top. The page height in this example is 792 points, so subtracting the Top value of 35 used in the example above from 792 gives 757 points.

DPL.DASetTextExtractionArea(35, 757, 229, 30); // Left, Top, Width, Height
ExtractedContent = DPL.ExtractFilePageText(@"Sample.pdf", "", 1, 8);
Console.WriteLine(ExtractedContent);
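
The coordinate adjustment described above is a single subtraction, and it can be sketched as a small helper. This is only an illustration: the class ExtractionAreaMath and the method ToBottomLeftTop are hypothetical names and are not part of Debenu Quick PDF Library, and the page height of 792 points is taken from this example rather than read from the file.

```csharp
using System;

public static class ExtractionAreaMath
{
    // Hypothetical helper, not part of Debenu Quick PDF Library.
    // Converts a Top value measured down from the top edge of the page
    // into one measured up from the bottom edge (the default origin).
    public static double ToBottomLeftTop(double pageHeight, double topFromTopEdge)
    {
        return pageHeight - topFromTopEdge;
    }

    public static void Main()
    {
        double pageHeight = 792; // page height in points, assumed from the example above
        double top = ToBottomLeftTop(pageHeight, 35);
        Console.WriteLine(top); // prints 757
    }
}
```

The result is the value that would be passed as the Top parameter to DASetTextExtractionArea. The Left, Width and Height values are unchanged.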

DASetTextExtractionArea with DAExtractPageText

int fileHandle = DPL.DAOpenFile(@"C:\Users\Rowan\Dropbox (Debenu)\DQPL ReleaseTester\TestFiles\Text\Adobe PDF Library.pdf", "");
int pageRef = DPL.DAFindPage(fileHandle, 1); // Get a reference to page 1
DPL.DASetTextExtractionArea(35, 757, 229, 30); // Left, Top, Width, Height
ExtractedContent = DPL.DAExtractPageText(fileHandle, pageRef, 8);
Console.WriteLine(ExtractedContent);
DPL.DACloseFile(fileHandle); // Release the file handle when finished

With these functions, Debenu Quick PDF Library gives you precise control over which text is extracted from a page.