Extract Highlighted Text & Remove All Text from PDF Documents using .NET

Aspose.PDF for .NET 18.6 introduces new features related to text manipulation and PDF/UA validation. This release supports extracting highlighted text from PDF documents and removing all text from a PDF document.
By: Aspose
 
LANE COVE, Australia - July 17, 2018 - PRLog -- What's New in this Release?

The Aspose team is very excited to announce the release of Aspose.PDF for .NET 18.6. This release introduces new features related to text manipulation and PDF/UA validation, along with fixes for bugs reported in earlier versions of the API.

Extracting highlighted text from PDF documents has long been an essential requirement. Earlier, it was possible to extract text from PDF documents based on specific regular expressions or by specifying a string to search for, and the TextFragmentAbsorber and TextAbsorber classes of the API have been used often and efficiently for this purpose. To address the requirement of extracting highlighted text, the team investigated the feature and introduced the TextMarkupAnnotation.GetMarkedText() and TextMarkupAnnotation.GetMarkedTextFragments() methods. Users can extract highlighted text from a PDF document by filtering the page annotations for TextMarkupAnnotation objects and calling these methods. An example demonstrating the feature has also been added to the API documentation.

When removing text from PDF documents with earlier versions of the API, users had to set the found text to an empty string. This carried a performance overhead, because each such change triggers a number of checks and text-position adjustment operations, and several performance issues were observed as a result. These checks and adjustments cannot be reduced, as they are essential in text editing scenarios. Moreover, users cannot determine how many text fragments will be removed and adjusted when they are processed in a loop. Aspose.PDF for .NET 18.6 therefore introduces the Aspose.Pdf.Operators.TextShowOperator class (used as new Aspose.Pdf.Operators.TextShowOperator()) for removing all text from PDF pages. We recommend this approach for removing all text from a PDF document, as it takes far less time than replacing text fragments one by one.
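The two new capabilities might be combined roughly as follows. This is a minimal C# sketch, assuming a project that references the Aspose.PDF for .NET package (version 18.6 or later); the class PdfTextSamples and its two methods are hypothetical names chosen for illustration, and the calls follow the pattern of the documentation articles listed in this release note. PrintHighlightedText walks every page's annotations and reads the marked text of each TextMarkupAnnotation; RemoveAllText seeds an OperatorSelector with a TextShowOperator to collect each page's text-showing operators and deletes them in a single call.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

// Hypothetical helper class: the names are illustrative, not part of the Aspose API.
public static class PdfTextSamples
{
    // Prints the text covered by every text markup annotation
    // (highlight, underline, strikeout, squiggly) in the document.
    public static void PrintHighlightedText(Document doc)
    {
        foreach (Page page in doc.Pages)
        {
            foreach (Annotation annotation in page.Annotations)
            {
                // Only text markup annotations carry "marked" text.
                if (annotation is TextMarkupAnnotation markup)
                {
                    // The whole marked text as a single string...
                    Console.WriteLine(markup.GetMarkedText());

                    // ...or as individual fragments (with position and font info).
                    foreach (TextFragment fragment in markup.GetMarkedTextFragments())
                    {
                        Console.WriteLine($"  fragment: {fragment.Text}");
                    }
                }
            }
        }
    }

    // Deletes every text-showing operator from each page's content stream.
    public static void RemoveAllText(Document doc)
    {
        foreach (Page page in doc.Pages)
        {
            // Collect all text-showing operators on this page.
            var selector = new OperatorSelector(new Aspose.Pdf.Operators.TextShowOperator());
            page.Contents.Accept(selector);

            // Remove them in one call instead of editing fragments one by one.
            page.Contents.Delete(selector.Selected);
        }
    }
}
```

A caller might load a document with new Document("input.pdf"), call either helper, and then save the result with doc.Save("output.pdf"); the file names are placeholders. Because the sketch scans only each page's own content stream, text drawn inside form XObjects is presumably left untouched.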
In the latest release of Aspose.PDF for .NET, all descendants of Aspose.Pdf.Operator have been moved into the Aspose.Pdf.Operators namespace. Thus 'new Aspose.Pdf.Operators.GSave()' should be used instead of 'new Aspose.Pdf.Operator.GSave()'. When upgrading to the latest version of the API, users will need to update existing code that uses the previous Aspose.Pdf.Operator.* names. The team has also worked on accessibility, introducing new features as part of its work on Section 508 compliance (WCAG): a PDF/UA validation feature and Tagged PDF support have been added. The list of important new and improved features is given below:

·         Add feature "Extract Highlighted Text from HighlightTextMarkUpAnnotations" to the TextFragmentAbsorber class

·         Add support for OTF fonts when embedding in PDF

·         Text Extraction - Spaces are improperly embedded inside words

·         TableAbsorber throws an exception when accessing any row other than the first row of the first table, or any table other than the first

·         PDF to Image - Some contents are overlapping

·         PDF to JPEG - Incorrect output

·         TableAbsorber: incorrect table count in PDF

·         Text is overlapped when saving particular document as image or HTML

·         PDF to HTML - Object reference not set to an instance of an object

·         Conversion HTML to PDF produces incorrect output

·         PDF to PDFA - Comments are broken in resultant document

·         Flattening Fields is not flattening the Print button inside PDF

·         The output is too big after conversion to PDFA_1B format

·         After conversion PDF-to-PDFA the output contains corrupted diagram

·         The document loaded from an HTML file looks different from the original

·         PDF to PDF/A-1b - the output PDF does not pass compliance test

·         PDF to JPG - Blue gradient is darker in the JPG compared to the PPT slide PDF

·         PDF to JPG - Objects fading to transparent

·         PDF to JPG - transparent turns to white

·         PDF to JPG - Objects fading to transparent causes image differences

·         PDF to JPG - Objects fading to transparent causes image differences

·         Yellow background is not the same after converting PDF to PDF/A

·         JPEG output loses the fade effect on the source document

Other recent bug fixes are also included in this release.

Newly added documentation pages and articles

- Extract Highlighted Text from PDF Document: https://docs.aspose.com/display/pdfnet/Extract+Text+from+PDF#ExtractTextfromPDF-ExtractHighlightedTextfromPDFDocument

- Remove All Text from PDF Document: https://docs.aspose.com/display/pdfnet/Replace+Text+in+a+PDF+Document#ReplaceTextinaPDFDocument-RemoveAllTextfromPDFDocument

Overview: Aspose.PDF for .NET

Aspose.Pdf is a .NET PDF component for creating and manipulating PDF documents without using Adobe Acrobat. Create PDF documents via the API, XML templates and XSL-FO files. It supports form field creation, PDF compression options, table creation and manipulation, graph objects, extensive hyperlink functionality, extended security controls, custom font handling, adding or removing bookmarks, TOC, attachments and annotations, importing or exporting PDF form data, and much more. It also converts HTML, XSL-FO and MS Word documents to PDF.

More about Aspose.PDF for .NET

- Homepage of Aspose.PDF for .NET: http://www.aspose.com/products/pdf/net

- Download Aspose.PDF for .NET at: http://www.aspose.com/downloads/pdf/net

- Read online documentation of Aspose.PDF for .NET at: http://www.aspose.com/docs/display/pdfnet/Home

Contact Information

Aspose Pty Ltd, Suite 163,

79 Longueville Road

Lane Cove, NSW, 2066

Australia

http://www.aspose.com/

sales@aspose.com


Phone: 888.277.6734

Fax: 866.810.9465

Source:Aspose
Tags:Extract Highlighted PDF Text, Remove all Text PDF, .NET PDF APIs
Industry:Software
Location:Lane Cove - New South Wales - Australia