Foxit pdf extract text

2/18/2023

This article refers to a deprecated product.

Extract text from a defined rectangular area on a page

Foxit Quick PDF Library includes a range of functionality for extracting text from PDF files, but most of it extracts text from an entire page. The extract functions that include "area" in their name let you specify a rectangular area from which to extract text. The key function for this is SetTextExtractionArea when using the regular memory functions, and DASetTextExtractionArea when using the direct access (DA) functions. Sample code demonstrating the use of the regular and DA functions for extracting text from a portion of the page is shown below:

SetTextExtractionArea with GetPageText

    … "");
    DPL.SetOrigin(1); // Sets 0,0 coordinate position to top left of page, default is bottom left
    DPL.SetTextExtractionArea(35, 35, 229, 30); // Left, Top, Width, Height
    string ExtractedContent = DPL.GetPageText(8);

DASetTextExtractionArea with ExtractFilePageText

SetOrigin cannot be used with DASetTextExtractionArea, so the 0,0 coordinates are at the bottom left of the page by default. This means the top parameter has to be adjusted so that the top is measured from the bottom up rather than from the top down. The page height is 792 points, so it is just a matter of subtracting the 35 used in the example above from 792, which gives 757 points.

    DPL.DASetTextExtractionArea(35, 757, 229, 30); // Left, Top, Width, Height

DASetTextExtractionArea with DAExtractPageText

    int fileHandle = … (Debenu)\DQPL ReleaseTester\TestFiles\Text\Adobe PDF Library.pdf", "");
    int pageRef = DPL.DAFindPage(fileHandle, 1);
    DPL.DASetTextExtractionArea(35, 757, 229, 30); // Left, Top, Width, Height
    ExtractedContent = DPL.DAExtractPageText(fileHandle, pageRef, 8);

Foxit Quick PDF Library gives you precise control over which text is extracted from the document. The API now includes additional text extraction functions for extracting text as text blocks, which can be easier to manage and parse. The text block functions let you retrieve the text block as well as information about the text bounds, font, color and size. The full range of text extraction functions can be found in the online reference.

A Java example that extracts the text inside a rectangle with getTextInArea:

    import java.awt.Graphics2D;
    import java.awt.Rectangle;

    PDFDocument pdfDoc = new PDFDocument("C:\\doc.pdf", null);

    // Get the first page
    var page = pdfDoc.getPage(0);

    // Define your position rectangle for the text to be identified on the page
    // These coordinates are in 72 dpi.
    Rectangle rectangle = new Rectangle(100, 150, 250, 10);

    // Get text contained within a rectangle in a PDF page
    TextSelection selection = page.getTextInArea(rectangle);
    if (selection != null) {
        System.out.println("Text found in the defined rectangle " + selection.getText());
    }

    // This will draw a red rectangle around the search area on the PDF page
    // This is for debugging purposes only so you can see the rectangular area
    // where the text is being extracted from
    Graphics2D g2d = page.…
    g2d.draw(rectangle);

    // Save the PDF with the red rectangle
    pdfDoc.saveDocument("C:\\doc_with_red_rectangle.pdf");

The returned object, TextSelection, is an interface that describes the text found. The interface provides methods to get a Shape object and a quadrilateral that encloses the text selection on the page, as well as a method to retrieve the selected text as a string.
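The coordinate adjustment described for DASetTextExtractionArea (a top edge of 35 points measured from the top of a 792-point page becomes 757 points measured from the bottom) can be sketched as a small helper. This is only an illustration of the arithmetic: the class `AreaCoordinates` and the method `topFromBottomLeft` are hypothetical names, not part of Foxit Quick PDF Library or any other PDF API.

```java
public class AreaCoordinates {

    /**
     * Converts a top edge measured downward from the top of the page into
     * the same edge measured upward from the bottom of the page.
     * Hypothetical helper, shown only to illustrate the subtraction.
     */
    public static double topFromBottomLeft(double pageHeight, double topFromTopLeft) {
        return pageHeight - topFromTopLeft;
    }

    public static void main(String[] args) {
        double pageHeight = 792; // page height in points, as in the article
        double top = topFromBottomLeft(pageHeight, 35);
        System.out.println(top); // prints 757.0
    }
}
```

The left, width and height values are unaffected by the change of origin, so only the top parameter needs converting before it is passed to a bottom-left based call.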