Dynamically Generating Document Samples — Part Two | by Raymond Camden


We’ve already looked at ways to generate document samples using Adobe Acrobat Services. But how can we do even better?

In our previous post, we discussed how to use Adobe Acrobat Services to generate PDF “samples” of documents.

For example, an online bookseller could use our APIs to automate creating a 5-page preview of a book, with content prepended to explain that the file is only a sample.

While that process worked well (and be sure to read that post!), we can take it a step further and create even better samples. How?

In our previous example, we used a static PDF with generic text to ‘prepend’ to our samples. That kept the demo simple, and the text could have been embellished. However, the prepended page didn’t actually reflect the content of the document being sampled in any way.

We can greatly enhance our workflow by using dynamic content for our prepended document instead. This will be done using the Document Generation API.

The Document Generation API lets you use Microsoft Word to create a “template”.

The template is just a regular Word document with tokens mixed in with the regular content. The Document Generation API then lets you pass that Word doc to our service, along with your custom data, and the two are merged to create a new PDF. (You can also generate a Word document instead.)

If you haven’t had a chance to play with this API yet, check out our excellent quick starts for an introduction.

For now, though, just know that we are going to use Word to create a template for our sample creation process. Let’s take a look at that document now.

Screenshot from MS Word, showing text with some words surrounded by tokens representing our variables.

In the screenshot above, you can see a mix of “regular” Word text along with some words surrounded by curly braces ({{ }}) in the text.

In the first paragraph, the words title, author, and pageCount are all surrounded by curly braces and represent tokens.

With the Document Generation API, we can send this Word document along with data for those values, and get a PDF generated with ‘real’ values replacing the tokens.
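To make the idea of tokens concrete, here is a tiny local stand-in for what a merge does with simple `{{name}}` tokens. It is purely illustrative. The real Document Generation API works on the Word document itself and supports much more than plain substitution. The function `fillTokens` and the sample data are hypothetical and not part of any Adobe SDK.

```javascript
// Illustration only: a naive {{token}} replacer, NOT the Document Generation API.
// It shows how keys in the data object line up with tokens in the template text.
function fillTokens(template, data) {
  // Match {{ name }} with optional inner whitespace; unknown tokens are left as-is.
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : match
  );
}

// Hypothetical template line and data, using the same token names as the screenshot.
const line = "{{title}} by {{author}} ({{pageCount}} pages)";
const data = { title: "Example Book", author: "Jane Doe", pageCount: 120 };

console.log(fillTokens(line, data)); // Example Book by Jane Doe (120 pages)
```

The main point is that the keys of the data object must match the token names in the template exactly. That includes case, so `pageCount` is not the same as `pagecount`.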

Now our sample will actually address the content it’s sampling, giving details about the document’s title, author, and number of pages.

As an aside, this is one of the simplest things you can do with the Document Generation API. Along with simple token replacements, you can have dynamic lists, tables, conditional content, and even dynamic images.

None of that is required for this particular demo, but keep in mind we’re barely scratching the surface here.

Your next question may be: how do we get the title, author, and number of pages of a PDF? We can use the PDF Properties API, which, as you can probably guess, returns the properties of a PDF document.

Here’s an example of the output returned from a PDF:

Note that there is an optional argument, pageLevel, that returns even more information about page content. Also note that the SDKs return properties in a slightly different format: it’s essentially the same data, just with different labels.
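The function below is a rough sketch of how a PDF Properties request could be built over REST, including the optional `pageLevel` flag. It rests on a few assumptions:

- The base URL `https://pdf-services.adobe.io` and the `/operation/pdfproperties` path.
- A JSON body carrying `assetID` and `pageLevel`.
- The `Authorization`/`X-API-Key` header pair.

Check the current PDF Services API reference before relying on these names. `buildPropertiesRequest` is a hypothetical helper.

```javascript
// Assumed base URL for the Adobe PDF Services REST API; verify against the docs.
const PDF_SERVICES_BASE = "https://pdf-services.adobe.io";

// Hypothetical helper: builds (but does not send) a PDF Properties job request.
// pageLevel is optional; when true, the service is asked for per-page details too.
function buildPropertiesRequest(assetID, token, clientId, pageLevel = false) {
  return {
    url: `${PDF_SERVICES_BASE}/operation/pdfproperties`,
    options: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${token}`,
        "X-API-Key": clientId,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ assetID, pageLevel }),
    },
  };
}
```

Keeping request construction separate from sending makes it easy to inspect what will go over the wire. The result can be passed straight to `fetch(url, options)`.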

Looking at the sample above, we can see our three values (Author, Title, page_count) are there, so let’s look at how we can bring this all together.

Let’s now turn to the updated workflow that will process our PDFs. If you read the previous post (and I highly encourage you to do so), you’ll remember there were two scripts.

The first one simply figured out what PDFs needed to be processed. It did this by looking at an input directory and seeing which files did not exist in a corresponding output folder.

In this version, the code in this file has changed in one small way. Instead of specifying a PDF to prepend to our samples, we now point to the Word doc:
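A minimal sketch of that first script's logic might look like the following. It assumes each output sample keeps the same file name as its input PDF. The name `planJobs` and the template path are hypothetical, not the ones used in the repo. File names are passed in as arrays so the logic can be tested on its own; in a real script they would likely come from reading the input and output folders (for example with `fs.readdirSync`).

```javascript
// Hypothetical path: v2 points at a Word template instead of a static PDF.
const PREPEND_TEMPLATE = "./sample_template.docx";

// Return one job per input PDF that does not yet have a matching output file.
// Assumption: outputs are named exactly like their inputs.
function planJobs(inputFiles, outputFiles, templatePath) {
  const done = new Set(outputFiles);
  return inputFiles
    .filter((name) => name.toLowerCase().endsWith(".pdf") && !done.has(name))
    .map((name) => ({ input: name, template: templatePath }));
}

console.log(planJobs(["a.pdf", "b.pdf", "notes.txt"], ["a.pdf"], PREPEND_TEMPLATE));
```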

As that’s the only real change, let’s get right into the new PDF processing script. We’ll focus on the changes from the previous version.

First, let’s look at the main entry point, makeSample, which is called by the previous script:

From the top, we begin by getting our access token. We use the credentials generated by Adobe’s developer console and exchange them for the token.
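Below is a minimal sketch of that token exchange in Node.js; it needs Node 18 or later for the global `fetch`. It assumes the REST token endpoint `https://pdf-services.adobe.io/token`, which accepts `client_id` and `client_secret` as form fields and returns JSON containing `access_token`. The helper names are hypothetical. Confirm the endpoint and response shape against the current PDF Services documentation.

```javascript
// Assumed PDF Services REST token endpoint; verify before use.
const TOKEN_URL = "https://pdf-services.adobe.io/token";

// Build the form-encoded request that trades credentials for an access token.
function buildTokenRequest(clientId, clientSecret) {
  return {
    url: TOKEN_URL,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: new URLSearchParams({ client_id: clientId, client_secret: clientSecret }).toString(),
    },
  };
}

// Send the request and pull out the token (requires global fetch, e.g. Node 18+).
async function getAccessToken(clientId, clientSecret) {
  const { url, options } = buildTokenRequest(clientId, clientSecret);
  const resp = await fetch(url, options);
  if (!resp.ok) throw new Error(`Token request failed: ${resp.status}`);
  const json = await resp.json();
  return json.access_token;
}
```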

Next, we upload the PDF that we’ll be creating a sample from and store it in the asset variable.

Now we need to generate our dynamic PDF that will be prepended. First, the Word doc is uploaded and the result is stored as prependAsset.
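Both uploads (the source PDF and the Word template) follow the same pattern. Here is a rough sketch, assuming this two-step REST flow:

1. POST the file's media type to `/assets` to receive an `uploadUri` and `assetID`.
2. PUT the raw bytes to that URI.

The function names are hypothetical, and the caller is expected to supply the file bytes (for example, from `fs.promises.readFile`).

```javascript
// Assumed base URL for the Adobe PDF Services REST API.
const PDF_SERVICES_BASE = "https://pdf-services.adobe.io";

// Media types for the two kinds of files this demo uploads.
const MEDIA_TYPES = {
  pdf: "application/pdf",
  docx: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
};

// Map a file name to its media type, rejecting anything we don't expect.
function mediaTypeFor(fileName) {
  const ext = fileName.slice(fileName.lastIndexOf(".") + 1).toLowerCase();
  const type = MEDIA_TYPES[ext];
  if (!type) throw new Error(`Unsupported file type: ${fileName}`);
  return type;
}

// Assumed two-step upload; returns the asset ID (requires global fetch, Node 18+).
async function uploadAsset(bytes, fileName, token, clientId) {
  const mediaType = mediaTypeFor(fileName);
  const resp = await fetch(`${PDF_SERVICES_BASE}/assets`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${token}`,
      "X-API-Key": clientId,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ mediaType }),
  });
  if (!resp.ok) throw new Error(`Asset request failed: ${resp.status}`);
  const { uploadUri, assetID } = await resp.json();

  // The upload URI is pre-signed, so only the content type is sent here.
  const put = await fetch(uploadUri, {
    method: "PUT",
    headers: { "Content-Type": mediaType },
    body: bytes,
  });
  if (!put.ok) throw new Error(`Upload failed: ${put.status}`);
  return assetID;
}
```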

We need data for the tokens. To get it, we use a new function, makePropertiesJob, which wraps a REST API call to the PDF Properties endpoint.

Here’s that call:
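As a stand-in for that call, here is a hedged sketch of creating a job and waiting for it to finish. It assumes:

- Job creation answers with HTTP 201 and a `location` header.
- Polling that URL returns JSON whose `status` is `"in progress"`, `"done"`, or `"failed"`.

`submitJob`, `pollJob`, and this version of `makePropertiesJob` are hypothetical helpers, not the repo's code. `fetchImpl` is injectable so the polling logic can be exercised without a network.

```javascript
// Assumed base URL for the Adobe PDF Services REST API.
const PDF_SERVICES_BASE = "https://pdf-services.adobe.io";

// Shared headers for authenticated PDF Services calls.
function authHeaders(token, clientId) {
  return { "Authorization": `Bearer ${token}`, "X-API-Key": clientId };
}

// Create a job; assumes a 201 response whose `location` header is the status URL.
async function submitJob(url, body, token, clientId, fetchImpl = fetch) {
  const resp = await fetchImpl(url, {
    method: "POST",
    headers: { ...authHeaders(token, clientId), "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (resp.status !== 201) throw new Error(`Job creation failed: ${resp.status}`);
  return resp.headers.get("location");
}

// Poll the status URL until the job reports "done", throwing if it reports "failed".
async function pollJob(location, token, clientId, { fetchImpl = fetch, delayMs = 2000 } = {}) {
  for (;;) {
    const resp = await fetchImpl(location, { headers: authHeaders(token, clientId) });
    const json = await resp.json();
    if (json.status === "done") return json;
    if (json.status === "failed") throw new Error(`Job failed: ${JSON.stringify(json.error)}`);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}

// Hypothetical makePropertiesJob: create a PDF Properties job and wait for its result.
async function makePropertiesJob(assetID, token, clientId, fetchImpl = fetch) {
  const location = await submitJob(
    `${PDF_SERVICES_BASE}/operation/pdfproperties`, { assetID }, token, clientId, fetchImpl
  );
  return pollJob(location, token, clientId, { fetchImpl });
}
```

The same `submitJob`/`pollJob` pair can be reused for every other operation in the workflow; only the endpoint and request body change.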

When this job is done, we then need to copy out the values we care about: author, title, and page count.

Note that in our testing, author and title weren’t always available, hence the use of the Elvis operator to handle those cases. The end result is a new, simpler object named documentInfo, that contains the information we want to use in our generated PDF.
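Here is a sketch of building documentInfo. The response paths used below are assumptions you should adjust to match your own PDF Properties output:

- `document.info.Title`
- `document.info.Author`
- `document.page_count`

JavaScript has no Elvis operator as such. The closest equivalent is `||`, which falls back whenever the left side is missing or empty. The output keys mirror the template tokens `title`, `author`, and `pageCount`. `toDocumentInfo` and the fallback strings are hypothetical.

```javascript
// Hypothetical mapping from a PDF Properties result to the template's data.
// The input shape is an assumption; adjust the paths to match your response.
function toDocumentInfo(properties) {
  const doc = properties?.document ?? {};
  return {
    // `||` acts like the Elvis operator: use the fallback when the value is
    // missing or an empty string, since author and title are not always set.
    title: doc.info?.Title || "Untitled",
    author: doc.info?.Author || "Unknown author",
    // A page count of 0 is not expected, so `??` only guards a missing value.
    pageCount: doc.page_count ?? 0,
  };
}
```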

You can see this passed to our next new function, makeDocumentGenerationJob. Not surprisingly, this looks very similar to the last function, just with slightly different attributes and a new endpoint:
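To show how close the two calls are, here is a sketch of the Document Generation request. It reuses the same headers and assumes:

- The `/operation/documentgeneration` endpoint.
- A body with `assetID` (the uploaded Word template), `outputFormat` (`"pdf"` or `"docx"`), and `jsonDataForMerge` (the token values).

Treat these field names as assumptions to verify against the API reference. `buildDocumentGenerationRequest` is a hypothetical helper.

```javascript
// Assumed base URL for the Adobe PDF Services REST API.
const PDF_SERVICES_BASE = "https://pdf-services.adobe.io";

// Hypothetical helper: same headers as before, different endpoint and body.
function buildDocumentGenerationRequest(templateAssetID, documentInfo, token, clientId, outputFormat = "pdf") {
  return {
    url: `${PDF_SERVICES_BASE}/operation/documentgeneration`,
    options: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${token}`,
        "X-API-Key": clientId,
        "Content-Type": "application/json",
      },
      // jsonDataForMerge holds the values that replace {{title}}, {{author}}, etc.
      body: JSON.stringify({ assetID: templateAssetID, outputFormat, jsonDataForMerge: documentInfo }),
    },
  };
}
```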

And that’s it! The rest of the main function simply uses the Combine PDF functionality that we used in the last post.

Remember, this API not only lets you combine multiple PDFs but also lets you specify a range of pages to combine. We use this to combine our ‘prepend’ PDF and a slice of the main PDF.
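The body sent to Combine PDF (assumed endpoint `/operation/combinepdf`) can be sketched as follows. The generated "prepend" PDF comes first, followed by a slice of the original. It assumes each entry takes an `assetID` plus optional `pageRanges` of 1-based, inclusive `{ start, end }` pairs, and that omitting `pageRanges` includes every page. The 5-page default matches the preview length mentioned earlier. `buildCombineBody` is hypothetical.

```javascript
// Hypothetical helper: generated intro first, then the first few source pages.
function buildCombineBody(prependAssetID, sourceAssetID, sourcePageCount, samplePages = 5) {
  if (sourcePageCount < 1) throw new Error("Source PDF has no pages");
  // Don't ask for more pages than the source actually has.
  const end = Math.min(samplePages, sourcePageCount);
  return {
    assets: [
      { assetID: prependAssetID }, // no pageRanges: include the whole intro
      { assetID: sourceAssetID, pageRanges: [{ start: 1, end }] },
    ],
  };
}
```

Clamping `end` to the page count avoids asking for pages that don't exist when a document is shorter than the preview length. The page count is the same value already retrieved from the PDF Properties API.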

Here’s an example of what that first page would look like:

PDF version of previous screenshot, with tokens replaced with real values.

As an aside, in our use of Document Generation, all of the data came from the result of the PDF Properties API.

In case it wasn’t obvious, we can use any data. So for example, I could have included a token for the URL showing where someone might purchase the full book.

We would do that by adding the token to the Word document, and in our code, adding another key and value to the data passed to the Document Generation API.
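The code side of that change is small. Here is a sketch in which the key `purchaseUrl` and the URL are hypothetical, and the Word template would need a matching `{{purchaseUrl}}` token.

```javascript
// Extra merge data is just another key; the template needs a {{purchaseUrl}} token.
function withPurchaseLink(documentInfo, purchaseUrl) {
  return { ...documentInfo, purchaseUrl };
}

const mergeData = withPurchaseLink(
  { title: "Example Book", author: "Jane Doe", pageCount: 120 },
  "https://example.com/books/example-book" // hypothetical store URL
);
console.log(mergeData.purchaseUrl); // https://example.com/books/example-book
```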

If you want the complete source of this version, check out the repo at https://github.com/cfjedimaster/document-services-demos/tree/main/article_support/book_demo. This version uses makeSamplesv2.js and pdfProcessorv2.js. Be sure to sign up for your own credentials and let us know what you think!


