Thank you for your interest in our data extraction add-on, pdf2Data. We hope you will enjoy using our product and share your experiences with us and the iText community. This guide walks you through the installation process, from downloading iText 7 pdf2Data to adding the dependency to your Java build tool.

If you require any extra help, please have a look at our FAQs or the community discussion on Stack Overflow. If you are interested in getting support from our in-house developers and/or a license key for commercial iText products, you will need to acquire a commercial license.

Before you install

  • Make sure you have purchased a commercial license for iText 7 Core and pdf2Data if you use them for commercial purposes. All closed-source downloads we offer fall under our commercial license model.
  • Check the compatibility matrix to ensure the version you specify when adding the add-on's dependency matches the version of iText 7 Core you have a license for.

  • Install iText 7 Core. You can find the installation guide here.
  • Important remark: the iText 7 Core installation guide uses Maven as the build tool for Java.


The fastest way to start with pdf2Data is to create a data field template in the online editor. Upload your template PDF file in the first step of the process, use the data field editor to mark the entities to be recognized, download the template, and use it in your automated environment to extract data from a series of PDF files.

Refer to the videos section on the pdf2Data demo site to quickly familiarize yourself with the user interface.

License key

If you want to use pdf2Data in your environment, you need to have a license key. The license key is an XML file which you have to load into the license key library before using any API.

If you are using other iText add-ons as well, your license keys might be stored in multiple files, especially if you purchased the add-ons separately. In this case you can load several licenses into the license key library one by one, or by passing an array of the license keys to the license key library.
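
The one-by-one approach can be sketched as a small helper. This is a minimal illustration, not part of the iText API: the class LicenseFiles and the method loadOneByOne are hypothetical names, and the loader is passed in as a parameter so the sketch runs without the license key library on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of iText: loads several license files one by one.
final class LicenseFiles {

    // Feeds each path to the loader in order and returns the paths that were loaded.
    static List<String> loadOneByOne(List<String> licensePaths, Consumer<String> loader) {
        List<String> loaded = new ArrayList<>();
        for (String path : licensePaths) {
            loader.accept(path); // an exception thrown by the loader stops the loop
            loaded.add(path);
        }
        return loaded;
    }

    public static void main(String[] args) {
        // Stand-in loader that only prints; a real project would pass its license call here.
        loadOneByOne(Arrays.asList(args), path -> System.out.println("would load " + path));
    }
}
```

In a real project the loader would presumably be the license call used elsewhere in this guide, for example LicenseKey::loadLicenseFile, assuming that method accepts a single path and throws no checked exceptions.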

To get a free trial license, please fill out this form. For pricing information, please use the request a quote form or contact us directly.

Using pdf2Data in code

The preferred way to set up pdf2Data in Java is to use a build system like Maven or Gradle and download the pdf2Data artifacts from the iText Artifactory located at https://repo.itextsupport.com/pdf2data/.

The groupId is com.duallab.pdf2data, and the artifactId is pdf2data.

In Maven, the configuration would look similar to the example below:

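<!-- Goes inside the <repositories> element of your pom.xml -->
<repository>
	<id>pdf2Data</id>
	<name>pdf2Data Maven Repository</name>
	<url>https://repo.itextsupport.com/pdf2data</url>
</repository>


<!-- Goes inside the <dependencies> element; set the version to the pdf2Data release you are licensed for -->
<dependency>
	<groupId>com.duallab.pdf2data</groupId>
	<artifactId>pdf2data</artifactId>
	<version>$release-pdf2Data-variable</version>
</dependency>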

Example of how pdf2Data can be used in code:

// Make sure to load license file before invoking any code
LicenseKey.loadLicenseFile(pathToLicenseFile);

// Parse template into an object that will be used later on
Template template = Pdf2DataExtractor.parseTemplateFromPDF(pathToPdfTemplate);

// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);

// Feed file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.recognize(pathToFileToParse);

// Save result to XML or explore the ParsingResult object to fetch information programmatically
result.saveToXML(pathToOutXmlFile);


Installation instructions for the data fields editor web application (aka pdf2Data template editor)

If you want to use the editor in your environment, follow these installation instructions:

Prerequisites

  • Apache Tomcat 7 (≥ 7.0.77) or 8
  • Java 8

Installation steps

  1. Download the war file of the version you are interested in from the iText Artifactory
  2. Create a properties file with the following contents:
    # Set temporary directory for resources
    dir.temp=your_folder_for_resources
    
    # Path to iText license file, e.g. licensekey=/home/user/license.xml
    licensekey=path_to_license_file.xml
  3. Create an environment variable PDF2DATA_PROPERTIES and set it to the path of the file from the previous step
  4. Deploy the application on the installed Tomcat server. In most cases it is sufficient to copy the war file into the webapps subdirectory in the Tomcat directory
  5. Start the Tomcat server, if it was not running before, and you are ready to go
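
A small check run before deployment can catch a missing entry in the properties file. The sketch below is an illustration under stated assumptions, not part of pdf2Data: the class EditorConfigCheck is a hypothetical name, and it assumes the file can be read with java.util.Properties. It only uses the two keys and the PDF2DATA_PROPERTIES variable named in the steps above.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-deployment check; not shipped with pdf2Data.
final class EditorConfigCheck {

    // Returns a list of problems; an empty list means both entries are present
    // and the license file exists.
    static List<String> findProblems(Properties props) {
        List<String> problems = new ArrayList<>();
        String tempDir = trimmed(props, "dir.temp");
        String license = trimmed(props, "licensekey");
        if (tempDir.isEmpty()) {
            problems.add("missing property: dir.temp");
        }
        if (license.isEmpty()) {
            problems.add("missing property: licensekey");
        } else if (!Files.isRegularFile(Paths.get(license))) {
            problems.add("license file not found: " + license);
        }
        return problems;
    }

    private static String trimmed(Properties props, String key) {
        String value = props.getProperty(key);
        return value == null ? "" : value.trim();
    }

    // Assumption: the properties file is plain text readable by java.util.Properties.
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        String location = System.getenv("PDF2DATA_PROPERTIES");
        if (location == null || location.isEmpty()) {
            System.err.println("PDF2DATA_PROPERTIES is not set");
            return;
        }
        List<String> problems = findProblems(load(Paths.get(location)));
        if (problems.isEmpty()) {
            System.out.println("Configuration looks complete");
        } else {
            problems.forEach(System.err::println);
        }
    }
}
```

Run it in the same environment that starts Tomcat, so that it sees the same PDF2DATA_PROPERTIES value as the web application.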

Command Line Interface

It is possible to use pdf2Data from the command line as long as you have Java 7 or 8 installed.

You can download the CLI application from the iText Artifactory.

The steps are similar to the ones you would typically do in code. The output format for data extraction is XML.

Creating a template entity from a template PDF

java -jar cli.jar preprocess -t template.pdf -x template.xml -l license.xml

File recognition

java -jar cli.jar parse -t template.xml -s file_for_parsing.pdf -p recognized.pdf -x recognized.xml
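
To run the parse command over a series of files, it can be wrapped in a small launcher. The following is a sketch under stated assumptions: the class Pdf2DataBatch and the output naming scheme (name.recognized.pdf and name.recognized.xml) are hypothetical, and treating a non-zero exit code as a failure is an assumption this guide does not confirm. The flags are the ones from the command above.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical batch launcher around the pdf2Data CLI.
final class Pdf2DataBatch {

    // Builds the argument list for one "parse" run, mirroring the documented command.
    static List<String> parseCommand(String cliJar, String templateXml, String sourcePdf) {
        // Derive output names from the input name (naming scheme is illustrative only).
        String base = sourcePdf.toLowerCase(Locale.ROOT).endsWith(".pdf")
                ? sourcePdf.substring(0, sourcePdf.length() - 4)
                : sourcePdf;
        return Arrays.asList(
                "java", "-jar", cliJar, "parse",
                "-t", templateXml,
                "-s", sourcePdf,
                "-p", base + ".recognized.pdf",
                "-x", base + ".recognized.xml");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 3) {
            System.err.println("usage: Pdf2DataBatch <cli.jar> <template.xml> <file.pdf>...");
            return;
        }
        for (int i = 2; i < args.length; i++) {
            Process process = new ProcessBuilder(parseCommand(args[0], args[1], args[i]))
                    .inheritIO() // show the CLI's own output in this console
                    .start();
            // Assumption: a non-zero exit code means the run failed.
            if (process.waitFor() != 0) {
                System.err.println("parse failed for " + args[i]);
            }
        }
    }
}
```

Each file is processed in a separate JVM, which is simple but slow for large batches. For many files, the in-code API shown earlier, which reuses one extractor for multiple files, is likely the better fit.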

Help information

java -jar cli.jar help preprocess
java -jar cli.jar help parse