Extract text from PDFs as a text block list

Debenu Quick PDF Library provides an extensive API for programmatically extracting text from PDF files. The options include plain text output and a formatted CSV string with details about the font, size and style of the text. The API now includes additional text extraction functions for extracting […]

Memory Management and the DLL and LIB editions of Debenu Quick PDF Library

The creation and release of memory buffers are handled automatically in most scenarios by the DLL and LIB editions of Debenu Quick PDF Library. There is an internal buffer where all string results are stored. The AnsiStringResultLength function can be called to get the length of this buffer. This function returns the length of the […]

Incremental Updates in PDF files

Incremental updates provide a method for updating a PDF file without completely rewriting it. According to the PDF specification (1.7), incremental updates work like this: The contents of a PDF file can be updated incrementally without rewriting the entire file. Changes are appended to the end of the file, leaving its original contents intact. This […]
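To make the append-only idea concrete, here is a minimal Python sketch. It assumes that each saved revision ends with its own `%%EOF` marker, so counting markers gives a rough estimate of how many revisions a file holds. The byte strings are synthetic stand-ins rather than valid PDF files, and `count_eof_markers` is a hypothetical helper, not a Debenu Quick PDF Library function.

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count %%EOF markers; each saved revision normally ends with one."""
    return pdf_bytes.count(b"%%EOF")


# Synthetic stand-ins for illustration only (not valid PDF files).
original = b"%PDF-1.7\n...original objects...\nstartxref\n0\n%%EOF\n"
update = b"...changed objects...\nstartxref\n0\n%%EOF\n"

# An incremental update appends to the end and never touches earlier bytes.
updated = original + update

print(updated.startswith(original))   # the original content is intact
print(count_eof_markers(original), count_eof_markers(updated))
```

Treat the count as a heuristic only: it can over-count, for example when the marker happens to appear inside stream data.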

Extract paths from a PDF

Debenu Quick PDF Library does not currently support the extraction of path information. However, the GetContentStreamToString function will extract the content stream, which contains all of the drawing commands. You would need to parse the content stream to extract the paths and also process transformations, including rotation and scaling. Here are the contents of […]
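As a rough illustration of that parsing step, the Python sketch below scans a content stream string for the standard path construction operators (`m`, `l`, `c`, `v`, `y`, `re`, `h`) and collects their operands. It makes several assumptions: the input is a hand-written sample rather than real output from GetContentStreamToString, tokens are separated by whitespace, and strings, arrays, inline images and the `cm` transformation operator are not handled. `extract_path_segments` is a hypothetical helper name.

```python
# Number of numeric operands each path construction operator expects.
PATH_OPERATORS = {"m": 2, "l": 2, "c": 6, "v": 4, "y": 4, "re": 4, "h": 0}


def extract_path_segments(content: str):
    """Return (operator, operands) pairs for path construction operators."""
    segments = []
    operands = []
    for token in content.split():
        try:
            operands.append(float(token))
            continue
        except ValueError:
            pass  # not a number, so treat the token as an operator
        count = PATH_OPERATORS.get(token)
        if count is not None and len(operands) >= count:
            # Take the trailing operands that belong to this operator.
            args = operands[len(operands) - count:] if count else []
            segments.append((token, args))
        operands = []  # every operator consumes the pending operands
    return segments


sample = "1 0 0 1 50 50 cm 10 20 m 110 20 l 110 120 l h S 0 0 200 100 re f"
for operator, args in extract_path_segments(sample):
    print(operator, args)
```

The coordinates returned here are in the untransformed space of the stream. A fuller parser would also track `cm` matrices and the `q`/`Q` graphics state stack to apply rotation and scaling.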

Controlling the precision of numeric values in PDF files

Debenu Quick PDF Library includes a function called SetPrecision which allows you to control the precision of numeric values in PDF files. In a PDF, all numeric values are stored as strings, so a smaller precision means a number takes up fewer characters in the file. If a PDF has a lot of […]
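The effect of precision on size can be sketched in a few lines of Python. This is not how SetPrecision is implemented; `format_number` is a hypothetical helper, and trimming trailing zeros is an assumption made for the illustration.

```python
def format_number(value: float, precision: int) -> str:
    """Format a number with at most `precision` decimals, trimming zeros."""
    text = f"{value:.{precision}f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text


coords = [123.456789, 0.5, 72.0]
for precision in (6, 2):
    line = " ".join(format_number(v, precision) for v in coords)
    # Fewer decimals means fewer characters written to the file.
    print(precision, len(line), line)
```

Across thousands of coordinates in a content stream, the saved characters add up, at the cost of slightly less exact positioning.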

Automatic Generation of Table of Contents from PDF Bookmarks

Using Debenu PDF Aerialist it is simple to create a Table of Contents or TOC from existing bookmarks in your PDF document. The steps are as follows: Open your PDF in Adobe Acrobat (make sure you have Debenu PDF Aerialist installed first). Go to Plug-Ins > Debenu PDF Aerialist > Table of Contents. The Build […]

Automatically generate bookmarks in PDF files using text masks

Debenu PDF Aerialist provides a range of options for automatically generating bookmarks in PDF files based on font name, font size, font color, left indentation, text masks and keyword lists. These options let you generate bookmarks from the text in your documents in several different ways, because not all PDF files […]

Using Text Masking in Debenu PDF Aerialist

Text masks are supported when generating bookmarks or tables of contents in Debenu PDF Aerialist. The text masking property has a number of special characters that let you create bookmarks selectively. They are described in detail below with simple examples provided. Note: The user should be aware that the Bookmarks feature processes text by lines […]

Converting Pixels and Inches to PostScript Points

Pages in a PDF use points (1/72 of an inch) as the default measurement units, with the origin of the coordinate system at the bottom left corner of the page. So there is a constant 72 involved in calculations when converting to or from points in PDF. This is the ratio of the “points” measurement units […]
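The arithmetic can be written out as a short Python sketch. The function names are hypothetical, and the pixel conversions assume you know the resolution (`dpi`, pixels per inch) of the image or device, since pixels have no fixed physical size on their own.

```python
POINTS_PER_INCH = 72.0


def inches_to_points(inches: float) -> float:
    return inches * POINTS_PER_INCH


def points_to_inches(points: float) -> float:
    return points / POINTS_PER_INCH


def pixels_to_points(pixels: float, dpi: float) -> float:
    # pixels / dpi gives inches; multiply by 72 to get points.
    return pixels / dpi * POINTS_PER_INCH


def points_to_pixels(points: float, dpi: float) -> float:
    return points / POINTS_PER_INCH * dpi


# US Letter (8.5 x 11 inches) expressed in points.
print(inches_to_points(8.5), inches_to_points(11))  # 612.0 792.0

# A 300 pixel wide image at 96 dpi.
print(pixels_to_points(300, 96))  # 225.0
```

Because the origin is at the bottom left of the page, a y value measured down from the top edge also has to be subtracted from the page height after conversion.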

Can I compress my PDF?

A PDF can’t be compressed the way an image can. “PDF” is more of a container for various elements. Inside a PDF, things such as image data, font data, content data and so on can be compressed, but each has to be compressed separately. If you want to reduce the file size […]
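To show what compressing one element separately looks like, the Python sketch below deflates a single content stream with `zlib`, the same algorithm used by the PDF FlateDecode filter. The stream bytes are a made-up sample and `flate_compress` is a hypothetical helper. The sketch does not build a PDF object or call any Debenu Quick PDF Library function.

```python
import zlib


def flate_compress(data: bytes) -> bytes:
    """Compress one stream's bytes with zlib/deflate at maximum level."""
    return zlib.compress(data, 9)


# Made-up, repetitive content stream used only for illustration.
content_stream = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n" * 200

compressed = flate_compress(content_stream)

# The round trip must give back exactly the original bytes.
assert zlib.decompress(compressed) == content_stream
print(len(content_stream), len(compressed))
```

Image data that is already stored in a compressed form, such as JPEG, generally gains little from a second pass like this, which is one reason each element is treated on its own terms.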