PDF Clown 0.1.2 — Multimedia and lots of good stuff

LATEST NEWS: On February 10, 2013, PDF Clown 0.1.2 was released!

This release cycle revolves around these topics:

  1. Multimedia
  2. Text line alignment
  3. File references (file specifications, file identifiers, PDF stream object externalization)
  4. Advanced cloning
  5. Article threads

1. Multimedia


For a long time I gave low priority to multimedia features (chapter 9 of PDF Reference 1.7), but recently I received some requests about them on the project’s forum… so yes, video embedding through Screen annotations is now ready!

Screen annotations as implemented by PDF Clown feature a couple of nice JavaScript-based enhancements: video preview at an arbitrary time position (the video is automatically loaded when the page is opened, ready to be played from a given time frame) and user control (YouTube-like play/pause by mouse click on the player; this may seem obvious, but anyone who has worked with these annotations knows how painful it is, requiring awkward workarounds like dedicated play/pause buttons…). Furthermore, a fallback FileAttachment annotation is placed alongside each Screen annotation for graceful degradation in case the PDF viewer has no multimedia capabilities.

package org.pdfclown.samples.cli;

import java.awt.geom.Rectangle2D;

import org.pdfclown.documents.Document;
import org.pdfclown.documents.Page;
import org.pdfclown.documents.interaction.annotations.Screen;
import org.pdfclown.files.File;

/**
  This sample demonstrates how to insert screen annotations to display media clips inside
  a PDF document.

  @author Stefano Chizzolini (http://www.stefanochizzolini.it)
  @since 0.1.2
  @version 0.1.2, 09/14/12
*/
public class VideoEmbeddingSample
  extends Sample
{
  @Override
  public void run(
    )
  {
    // 1. Instantiate the PDF file!
    File file = new File();
    Document document = file.getDocument();

    // 2. Insert a new page!
    Page page = new Page(document);
    document.getPages().add(page);

    // 3. Insert a video into the page!
    new Screen(
      page,
      new Rectangle2D.Double(10, 10, 320, 180),
      "JOBI 4 - Sunflower",
      getResourcePath("video" + java.io.File.separator + "JOBI_4_Sunflower.mpg"),
      "video/mpeg"
      );

    // 4. Serialize the PDF file!
    serialize(file, "Video embedding", "inserting screen annotations to display media clips inside a PDF document");
  }
}

PS: The video clip embedded by the sample above is the official video of “Sunflower” by the Milanese jazz-pop band JOBI 4, with lead singer Federica Caiozzo (aka Thony). Check it out, they are really lovely: http://www.youtube.com/watch?v=yc6_Fj31Jbo

2. Text line alignment

Building on a much-appreciated code contribution by Manuel Guilbault, text line alignment now supports all the standard modes commonly available in typesetting environments (Top, Middle, Bottom, Super and Sub, the latter two in both absolute and relative variants), as well as image inlining.
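
To illustrate what these modes mean geometrically, here is a minimal sketch computing where an inline item (say, an inlined image) sits relative to the line baseline. It is a simplified model of our own, not PDF Clown’s API: the LineAlignmentSketch class and its baselineOffset(...) method are hypothetical, and reading relative Super/Sub shifts as a fraction of the line height is an assumption.

```java
class LineAlignmentSketch {
  /** Hypothetical alignment modes mirroring the ones listed above. */
  enum LineAlignment { TOP, MIDDLE, BOTTOM, SUPER, SUB }

  /**
   * Vertical offset (points, positive = upwards) of an inline item's baseline
   * relative to the line baseline.
   *
   * @param lineAscent  line extent above its baseline
   * @param lineDescent line extent below its baseline (positive number)
   * @param itemHeight  total height of the inline item
   * @param itemAscent  item extent above its own baseline (an image usually has itemAscent == itemHeight)
   * @param shift       SUPER/SUB shift: points if absolute, fraction of line height if relative
   * @param relative    whether shift is relative (assumption: relative to the line height)
   */
  static double baselineOffset(LineAlignment mode, double lineAscent, double lineDescent,
      double itemHeight, double itemAscent, double shift, boolean relative) {
    double itemDescent = itemHeight - itemAscent;
    double amount = relative ? shift * (lineAscent + lineDescent) : shift;
    switch (mode) {
      case TOP:    // Item top meets line top.
        return lineAscent - itemAscent;
      case MIDDLE: // Item center meets line center.
        return (lineAscent - lineDescent) / 2 - (itemAscent - itemDescent) / 2;
      case BOTTOM: // Item bottom meets line bottom.
        return itemDescent - lineDescent;
      case SUPER:  // Raised above the baseline.
        return amount;
      case SUB:    // Lowered below the baseline.
        return -amount;
      default:
        throw new IllegalArgumentException("Unknown mode: " + mode);
    }
  }

  public static void main(String[] args) {
    // A line with 10pt ascent and 3pt descent hosting a 4pt-tall inlined image.
    for (LineAlignment mode : LineAlignment.values()) {
      System.out.println(mode + " -> " + baselineOffset(mode, 10, 3, 4, 4, 0.5, true));
    }
  }
}
```

Top, Middle and Bottom depend only on the line and item metrics, whereas Super and Sub are pure shifts; that is why only the latter two come in absolute and relative flavors.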

3. File references (file specifications, file identifiers, PDF stream object externalization)

Spurred by an engaging user request, file specification management (now modelled in the org.pdfclown.documents.files package instead of the old org.pdfclown.documents.fileSpec) has been thoroughly revised to smoothly support importing/exporting PDF stream data from/to external files.

In practice, this means that, instead of being embedded directly into a PDF file, stream data can reside in an external (local or remote) file and be linked from within the PDF file through a file specification object (org.pdfclown.documents.files.FileSpecification). Thus, common resources such as images can be shared among multiple documents (useful, for example, in a server scenario where documents are assembled on the fly).
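
For a sense of what ends up in the file, here is a minimal sketch of how an externalized stream object could look at the PDF syntax level, assuming the simplest case of a plain string file specification. Per the PDF Reference, when a stream dictionary carries an F entry, the stream data lives in the referenced file and the bytes between stream and endstream are ignored. The class and method names (ExternalStreamSketch, externalStreamObject, pdfLiteral) are hypothetical; PDF Clown produces the real objects for you through PdfStream.setDataFile(...).

```java
class ExternalStreamSketch {
  /** Escapes backslashes and parentheses so the text is a valid PDF literal string. */
  static String pdfLiteral(String text) {
    return "(" + text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")";
  }

  /** Renders an indirect stream object whose data lives in an external file (hypothetical helper). */
  static String externalStreamObject(int objectNumber, String externalFileName) {
    return objectNumber + " 0 obj\n"
        // /F points to the external file; no data bytes are stored inline, hence /Length 0.
        + "<< /Length 0 /F " + pdfLiteral(externalFileName) + " >>\n"
        // With /F present, readers ignore the (here empty) bytes between the keywords.
        + "stream\n"
        + "\n"
        + "endstream\n"
        + "endobj\n";
  }

  public static void main(String[] args) {
    System.out.print(externalStreamObject(12, "StreamExternalizationSample-external0"));
  }
}
```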

There is, however, a caveat to consider before adopting externalized streams: as they raise security concerns, their actual support by PDF viewers is very restricted (e.g., see the so-called “privileged locations” in Adobe Acrobat’s Enhanced Security preferences) or even non-existent (e.g., Evince).

Here is a code sample demonstrating how external references are applied to PDF stream objects:

  1. PDF stream data is exported and linked back (step 1.2, the stream.setDataFile(FileSpecification.get(...), true) call);
  2. linked files are imported back into their respective PDF stream objects (step 2.2, the stream.setDataFile(null, true) call).

package org.pdfclown.samples.cli;

import org.pdfclown.documents.Document;
import org.pdfclown.documents.files.FileSpecification;
import org.pdfclown.files.File;
import org.pdfclown.files.SerializationModeEnum;
import org.pdfclown.objects.PdfDataObject;
import org.pdfclown.objects.PdfIndirectObject;
import org.pdfclown.objects.PdfStream;

/**
  This sample demonstrates how to move stream data outside PDF files and keep external
  references to them; it demonstrates also the inverse process (reimporting stream data
  from external files).
  Note that, due to security concerns, external streams are a discouraged feature which
  is often unsupported on third-party viewers and disabled by default on recent Adobe
  Acrobat versions; in the latter case, in order to bypass restrictions and allow access
  to external streams, users have to enable Enhanced Security from the Preferences dialog,
  specifying privileged locations.

  @author Stefano Chizzolini (http://www.stefanochizzolini.it)
  @since 0.1.2
  @version 0.1.2, 09/24/12
*/
public class StreamExternalizationSample
  extends Sample
{
  @Override
  public void run(
    )
  {
    // 1. Externalizing the streams...
    String externalizedFilePath;
    {
      // 1.1. Opening the PDF file...
      File file;
      {
        String filePath = promptPdfFileChoice("Please select a PDF file");
        try
        {file = new File(filePath);}
        catch(Exception e)
        {throw new RuntimeException(filePath + " file access error.",e);}
      }
      Document document = file.getDocument();
      /*
        NOTE: As we are going to export streams using paths relative to the output path,
        it's necessary to ensure they are properly resolved (otherwise they will be
        written relative to the current user directory).
      */
      file.setPath(getOutputPath());

      // 1.2. Iterating through the indirect objects to externalize streams...
      int filenameIndex = 0;
      for(PdfIndirectObject indirectObject : file.getIndirectObjects())
      {
        PdfDataObject dataObject = indirectObject.getDataObject();
        if(dataObject instanceof PdfStream)
        {
          PdfStream stream = (PdfStream)dataObject;
          if(stream.getDataFile() == null) // Internal stream to externalize.
          {
            stream.setDataFile(
              FileSpecification.get(
                document,
                getClass().getSimpleName() + "-external" + filenameIndex++
                ),
              true // Forces the stream data to be transferred to the external location.
              );
          }
        }
      }

      // 1.3. Serialize the PDF file!
      externalizedFilePath = serialize(file, SerializationModeEnum.Standard);
    }

    // 2. Reimporting the externalized streams...
    {
      // 2.1. Opening the PDF file...
      File file;
      try
      {file = new File(externalizedFilePath);}
      catch(Exception e)
      {throw new RuntimeException(externalizedFilePath + " file access error.",e);}

      // 2.2. Iterating through the indirect objects to internalize streams...
      for(PdfIndirectObject indirectObject : file.getIndirectObjects())
      {
        PdfDataObject dataObject = indirectObject.getDataObject();
        if(dataObject instanceof PdfStream)
        {
          PdfStream stream = (PdfStream)dataObject;
          if(stream.getDataFile() != null) // External stream to internalize.
          {
            stream.setDataFile(
              null,
              true // Forces the stream data to be transferred to the internal location.
              );
          }
        }
      }

      // 2.3. Serialize the PDF file!
      String externalizedFileName = new java.io.File(externalizedFilePath).getName();
      String internalizedFilePath = externalizedFileName.substring(0, externalizedFileName.indexOf(".pdf")) + "-reimported.pdf";
      serialize(file, internalizedFilePath, SerializationModeEnum.Standard);
    }
  }
}

Working on file specifications also involved supporting file identifiers (PDF Reference 1.7, § 10.3, modelled by the org.pdfclown.files.FileIdentifier class), which ensure referential integrity during document interchange. Their generation and update are now part of the document life cycle automatically managed by PDF Clown.
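
As background, the PDF Reference stores the file identifier in the trailer’s ID entry as an array of two byte strings: the first is permanent, set when the file is created; the second changes whenever the file is updated, and both are equal in a newly written file. It recommends computing them with MD5 over values such as the current time, the file location and size, and the document information entries. The sketch below mimics that recipe; the class, method names and seed values are hypothetical, and this is not PDF Clown’s FileIdentifier code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FileIdentifierSketch {
  /** MD5 digest of the given seed values (e.g. time, file location, size, Info entries). */
  static byte[] md5(String... seeds) {
    try {
      MessageDigest digest = MessageDigest.getInstance("MD5");
      for (String seed : seeds) {
        digest.update(seed.getBytes(StandardCharsets.UTF_8));
      }
      return digest.digest();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is required on every Java platform", e);
    }
  }

  /** Uppercase hexadecimal rendering, as used in PDF hex strings. */
  static String hex(byte[] bytes) {
    StringBuilder builder = new StringBuilder();
    for (byte b : bytes) {
      builder.append(String.format("%02X", b));
    }
    return builder.toString();
  }

  /** Renders the trailer entry: /ID [<permanent> <changing>]. */
  static String idEntry(byte[] permanent, byte[] changing) {
    return "/ID [<" + hex(permanent) + "> <" + hex(changing) + ">]";
  }

  public static void main(String[] args) {
    // Hypothetical seeds for a freshly created file: both identifiers are equal.
    byte[] permanent = md5("2013-02-10T10:00:00Z", "/tmp/sample.pdf", "12345");
    System.out.println(idEntry(permanent, permanent));

    // When the file is updated, only the second identifier is regenerated.
    byte[] changing = md5("2013-02-11T09:30:00Z", "/tmp/sample.pdf", "13210");
    System.out.println(idEntry(permanent, changing));
  }
}
```

Keeping the first identifier stable is what lets a referring document recognize the target file even after it has been modified.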

4. Advanced cloning

Since its inception, PDF Clown has supported a cloning mechanism capable of elegantly copying any structure/content of a PDF file without specialized code or torture-chamber algorithms (those exotic, lengthy, exhaustingly cumbersome monster methods you may sometimes see when peering through the source of some well-known library…). Its implementation wasn’t complete, though: it couldn’t deal with circular references (which ruled out annotations and some other structures), and there was no way to customize its filters on the fly to select just a subset of the object graph to clone (so in practice it amounted to an identity transformation).

The good news is that the 0.1.2 implementation overcomes these limitations by leveraging the generic object visitor (org.pdfclown.objects.Visitor) through the Cloner class (org.pdfclown.objects.Cloner), which hosts a customizable collection of filters used to apply arbitrary transformations to the structures being cloned.
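
The two ingredients mentioned here (tolerance of circular references and pluggable filters) can be illustrated on a toy object graph. This is a hypothetical sketch, not the Cloner source: Node and GraphCloner are invented for the example. The key trick is registering each copy in an identity map before descending into its children, so a circular reference (like an annotation pointing back to its page) resolves to the copy under construction instead of recursing forever.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class GraphCloner {
  /** Hypothetical object-graph node standing in for a PDF object. */
  static final class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) {
      this.name = name;
    }
  }

  /** Filter selecting which referenced nodes to clone; rejected ones are left out of the copy. */
  private final Predicate<Node> filter;
  /** Original-to-clone map (by identity), so shared and circular references are cloned once. */
  private final Map<Node, Node> clones = new IdentityHashMap<>();

  GraphCloner(Predicate<Node> filter) {
    this.filter = filter;
  }

  Node clone(Node source) {
    Node existing = clones.get(source);
    if (existing != null) {
      return existing; // Already cloned, or still being cloned higher up the call stack.
    }
    Node copy = new Node(source.name);
    clones.put(source, copy); // Register BEFORE descending, which breaks cycles.
    for (Node child : source.children) {
      if (filter.test(child)) {
        copy.children.add(clone(child));
      }
    }
    return copy;
  }

  public static void main(String[] args) {
    // A page referencing an annotation that points back to the page (a cycle), plus a thumbnail.
    Node page = new Node("page");
    Node annotation = new Node("annotation");
    page.children.add(annotation);
    page.children.add(new Node("thumbnail"));
    annotation.children.add(page);

    // Clone everything except thumbnails.
    Node copy = new GraphCloner(node -> !node.name.equals("thumbnail")).clone(page);
    System.out.println(copy.children.size() + " child(ren); cycle preserved: "
        + (copy.children.get(0).children.get(0) == copy));
  }
}
```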

Let’s see an example. We want to copy a page into another PDF document (by the way, there’s a utility, org.pdfclown.tools.PageManager, dedicated to exactly this task, but here we want to dig into its inner workings…):

import org.pdfclown.documents.Document;
import org.pdfclown.documents.Page;
import org.pdfclown.files.File;

...

String sourceFilePath = "myFilePath";
File sourceFile = null;
try
{sourceFile = new File(sourceFilePath);}
catch(Exception e)
{throw new RuntimeException(sourceFilePath + " file access error.",e);}
Page sourcePage = sourceFile.getDocument().getPages().get(0);

File targetFile = new File();
Document targetDocument = targetFile.getDocument();
// Clone the source page (along with the objects it references) into the target document.
Page importPage = (Page)sourcePage.clone(targetDocument);
targetDocument.getPages().add(importPage);

That’s all: just one line (the sourcePage.clone(targetDocument) call) and our page is copied into the target document! Here is the implementation of the PdfObjectWrapper.clone(Document) method inherited by the Page class:

public Object clone(
  Document context
  )
{
  PdfObjectWrapper clone;
  try
  {clone = (PdfObjectWrapper)super.clone();}
  catch(CloneNotSupportedException e)
  {throw new RuntimeException(e);}
  clone.setBaseObject((PdfDirectObject)getBaseObject().clone(context.getFile()));
  return clone;
}

The magic is done by PdfObject.clone(File), invoked on the base object inside the setBaseObject(...) call, which clones the base PDF object (a PdfDictionary in this case) wrapped inside the high-level Page representation:

public PdfObject clone(
  File context
  )
{return accept(context.getCloner(), null);}

public PdfObject accept(
  IVisitor visitor,
  Object data
  )
{return visitor.visit(this, data);}

As mentioned above, Cloner is nothing but a specialized Visitor. For further details, it’s time you checked out PDF Clown’s source code from its SVN repository. Enjoy!
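
To see what “a specialized Visitor” means in practice, here is a toy version of the accept/visit double dispatch shown above: each object’s accept(...) calls the visitor overload matching its own type, and a cloner is simply a visitor whose visit methods return copies instead of originals. PdfObj, PdfInt, PdfArr, Visitor and Cloner here are hypothetical stand-ins, far simpler than PDF Clown’s real classes (no filters, no cycle handling).

```java
import java.util.ArrayList;
import java.util.List;

class VisitorSketch {
  /** Hypothetical minimal object model. */
  interface PdfObj {
    PdfObj accept(Visitor visitor, Object data);
  }

  static final class PdfInt implements PdfObj {
    final int value;

    PdfInt(int value) {
      this.value = value;
    }

    public PdfObj accept(Visitor visitor, Object data) {
      return visitor.visit(this, data); // Dispatches to visit(PdfInt, Object).
    }
  }

  static final class PdfArr implements PdfObj {
    final List<PdfObj> items = new ArrayList<>();

    public PdfObj accept(Visitor visitor, Object data) {
      return visitor.visit(this, data); // Dispatches to visit(PdfArr, Object).
    }
  }

  /** Base visitor: walks the structure and returns each object unchanged. */
  static class Visitor {
    PdfObj visit(PdfInt obj, Object data) {
      return obj;
    }

    PdfObj visit(PdfArr obj, Object data) {
      for (PdfObj item : obj.items) {
        item.accept(this, data);
      }
      return obj;
    }
  }

  /** A cloner only overrides the visit methods to build copies along the walk. */
  static class Cloner extends Visitor {
    @Override
    PdfObj visit(PdfInt obj, Object data) {
      return new PdfInt(obj.value);
    }

    @Override
    PdfObj visit(PdfArr obj, Object data) {
      PdfArr copy = new PdfArr();
      for (PdfObj item : obj.items) {
        copy.items.add(item.accept(this, data));
      }
      return copy;
    }
  }

  public static void main(String[] args) {
    PdfArr original = new PdfArr();
    original.items.add(new PdfInt(1));
    original.items.add(new PdfInt(2));
    PdfArr copy = (PdfArr) original.accept(new Cloner(), null);
    System.out.println("Distinct copy with " + copy.items.size() + " items: " + (copy != original));
  }
}
```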

Thanks to Andreas Pinter for helping solve the circular reference puzzle… ;-)

5. Article threads

The implementation of article threads offers, as usual, a smooth yet rich interface (see ComplexTypesettingSample for a live demonstration):

import org.pdfclown.documents.Document;
import org.pdfclown.documents.Page;
import org.pdfclown.documents.contents.composition.BlockComposer;
import org.pdfclown.documents.contents.composition.PrimitiveComposer;
import org.pdfclown.documents.interaction.navigation.page.Article;
import org.pdfclown.documents.interaction.navigation.page.ArticleElement;
import org.pdfclown.documents.interaction.navigation.page.ArticleElements;
import org.pdfclown.documents.interchange.metadata.Information;
import org.pdfclown.files.File;

...

File file = new File();
Document document = file.getDocument();

// Create the article thread!
Article article = new Article(document);
{
  Information articleInfo = article.getInformation();
  articleInfo.setTitle("The Free Software Definition");
  articleInfo.setAuthor("Free Software Foundation, Inc.");
}
// Get the article beads collection to populate!
ArticleElements articleElements = article.getElements();

Page page = new Page(document);
document.getPages().add(page);
PrimitiveComposer composer = new PrimitiveComposer(page);
BlockComposer blockComposer = new BlockComposer(composer);

... // adding contents through BlockComposer...

// Add the bead to the article thread!
articleElements.add(new ArticleElement(page, blockComposer.getBoundBox()));

...
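
Under the hood, the PDF Reference models an article thread as a thread dictionary pointing to its first bead, and the beads form a circular doubly linked list: each bead’s N entry points to the next bead and its V entry to the previous one, with the last bead wrapping around to the first. The following hypothetical sketch (ArticleThreadSketch and Bead are invented names, not PDF Clown’s ArticleElements implementation) shows that linking logic in isolation.

```java
class ArticleThreadSketch {
  /** Hypothetical stand-in for a bead dictionary (/P page, /R rectangle, /N next, /V previous). */
  static final class Bead {
    final int pageIndex;
    final double[] rect; // llx, lly, urx, ury
    Bead next;
    Bead previous;

    Bead(int pageIndex, double[] rect) {
      this.pageIndex = pageIndex;
      this.rect = rect;
    }
  }

  /** First bead of the thread (the thread dictionary's /F entry). */
  Bead first;

  /** Appends a bead, keeping the list circular in both directions. */
  Bead add(int pageIndex, double[] rect) {
    Bead bead = new Bead(pageIndex, rect);
    if (first == null) {
      bead.next = bead; // A lone bead points to itself both ways.
      bead.previous = bead;
      first = bead;
    } else {
      Bead last = first.previous;
      last.next = bead;
      bead.previous = last;
      bead.next = first; // The new last bead wraps around to the first...
      first.previous = bead; // ...and the first bead's /V points back to it.
    }
    return bead;
  }

  int size() {
    if (first == null) {
      return 0;
    }
    int count = 1;
    for (Bead bead = first.next; bead != first; bead = bead.next) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    ArticleThreadSketch thread = new ArticleThreadSketch();
    thread.add(0, new double[] {36, 400, 300, 760}); // First column on page 1.
    thread.add(0, new double[] {310, 400, 576, 760}); // Second column on page 1.
    thread.add(1, new double[] {36, 36, 576, 760}); // Continuation on page 2.
    System.out.println("Beads in thread: " + thread.size());
  }
}
```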

2 thoughts on “PDF Clown 0.1.2 — Multimedia and lots of good stuff”

    1. During the last months the library has been thoroughly enhanced, while the DOM Inspector is still under development — at the moment I cannot give you a realistic estimate about its release.
