PDF Clown's Blog

Developing a free/libre open source PDF library

Waiting for PDF Clown 0.2.0 release

NOTE: As version 0.2.0 is currently under development, the new features described below will appear in the trunk (HEAD revision) of PDF Clown’s SVN repository before the distribution release.
NOTE: If you are interested in the current developments of PDF Clown, you may follow PDF Clown on Twitter for the latest news and live comments!
NOTE: The version 0.1.3 release has been frozen due to the ContentScanner refactoring (see section “Content stream manipulation”, below): once it’s completed, DOM Inspector’s development will resume.

1. Content stream manipulation

Since the very inception of the ContentScanner class, I have been delighted by its underlying concept, as it proved to be a versatile processor for handling content stream object trees along with their graphics state: you could use it directly to read existing content streams, modify them and create new ones in a convenient object-oriented fashion, or plug it into specialized tools (e.g. PrimitiveComposer, TextExtractor, Renderer, etc.) for more advanced applications. But up to the 0.1.x version series it suffered from a significant drawback: it lacked separation of concerns from its object model, that is, the algorithmic responsibility for carrying out tasks was delegated to the respective content stream operations. This may work well when there’s just a single task (“read/write the content stream”), but when further tasks are required (e.g. rendering the content stream into a graphics context) it rapidly becomes unmanageable.

Therefore I proceeded with a massive refactoring which was informed by two main concurrent requirements: algorithmic separation between process and structure (accomplished through the classic Visitor pattern) and preservation of the distinctive cursor-based behavior of ContentScanner (solved through dedicated impedance-matching logic).

All the non-core functionalities which were bloating the original ContentScanner (like rendering and content wrappers) have been extracted into specialized processors (respectively: ContentRenderer and ContentModeller), resulting in the following classes:

  • ContentVisitor: abstract content stream processor supporting all the common graphics state transformations;
  • ContentScanner: (read/write) multi-purpose cursor-based processor;
  • ContentModeller: (read-only) modelling processor (generates declarative forms (GraphicsElement hierarchy) of the corresponding content objects);
  • ContentRenderer: (read-only) rendering processor (generates raster representations of the content stream).
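The separation between process and structure described above can be illustrated with a minimal, self-contained Visitor sketch. Every name in it (ContentNode, ShowTextOp, CompositeBlock, NodeVisitor and the two visitors) is hypothetical and chosen for illustration only; it is not PDF Clown's actual class hierarchy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified content objects (not PDF Clown's actual classes).
interface ContentNode {
  <R> R accept(NodeVisitor<R> visitor);
}

final class ShowTextOp implements ContentNode {
  final String text;
  ShowTextOp(String text) { this.text = text; }
  @Override public <R> R accept(NodeVisitor<R> visitor) { return visitor.visitShowText(this); }
}

final class CompositeBlock implements ContentNode {
  final List<ContentNode> children = new ArrayList<>();
  CompositeBlock add(ContentNode child) { children.add(child); return this; }
  @Override public <R> R accept(NodeVisitor<R> visitor) { return visitor.visitBlock(this); }
}

// Each task lives in its own visitor; the nodes carry no task logic.
interface NodeVisitor<R> {
  R visitShowText(ShowTextOp op);
  R visitBlock(CompositeBlock block);
}

// Task 1: concatenate the text shown by the tree.
final class TextCollector implements NodeVisitor<String> {
  @Override public String visitShowText(ShowTextOp op) { return op.text; }
  @Override public String visitBlock(CompositeBlock block) {
    StringBuilder sb = new StringBuilder();
    for (ContentNode child : block.children) { sb.append(child.accept(this)); }
    return sb.toString();
  }
}

// Task 2: count the nodes, blocks included.
final class NodeCounter implements NodeVisitor<Integer> {
  @Override public Integer visitShowText(ShowTextOp op) { return 1; }
  @Override public Integer visitBlock(CompositeBlock block) {
    int count = 1; // the block itself
    for (ContentNode child : block.children) { count += child.accept(this); }
    return count;
  }
}

class VisitorSketch {
  static CompositeBlock sample() {
    return new CompositeBlock()
      .add(new ShowTextOp("Hello, "))
      .add(new CompositeBlock().add(new ShowTextOp("world")));
  }

  public static void main(String[] args) {
    CompositeBlock root = sample();
    System.out.println(root.accept(new TextCollector())); // Hello, world
    System.out.println(root.accept(new NodeCounter()));   // 4
  }
}
```

Adding a further task (say, rendering) would mean writing one more visitor, leaving the node classes untouched.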

ContentScanner refactored

1.1. ContentScanner

ContentScanner’s new implementation focuses exclusively on its core purpose, that is to enable users to manipulate content streams through their low-level, procedural, stacked model (operations and composite objects along with their graphics state).

1.2. ContentModeller

ContentModeller works as a parser which maps the low-level content stream model to its high-level, declarative, flat representation through a dedicated model rooted in the GraphicsElement abstract class (which corresponds to the GraphicsObjectWrapper hierarchy of ContentScanner’s old implementation). This simplified-yet-equivalent representation can be modified and saved back into the content stream.
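To give an idea of what mapping a stacked, procedural model onto a flat, declarative one involves, here is a toy sketch. It assumes a made-up operation list in which "q" and "Q" save and restore the state (as the PDF operators of the same name do), while "size N" and "show T" are invented shorthands; FlattenSketch and TextElement are hypothetical names, unrelated to the actual ContentModeller or GraphicsElement classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class FlattenSketch {
  // Hypothetical flat element: text plus the font size in effect when it was shown.
  static final class TextElement {
    final String text;
    final double fontSize;
    TextElement(String text, double fontSize) { this.text = text; this.fontSize = fontSize; }
  }

  // Walks the operations once, tracking the state stack, and emits
  // self-describing elements that no longer depend on their position.
  static List<TextElement> flatten(List<String> operations) {
    Deque<Double> savedSizes = new ArrayDeque<>();
    double fontSize = 12; // arbitrary default for this sketch
    List<TextElement> elements = new ArrayList<>();
    for (String operation : operations) {
      if (operation.equals("q")) {
        savedSizes.push(fontSize);   // save state
      } else if (operation.equals("Q")) {
        fontSize = savedSizes.pop(); // restore state
      } else if (operation.startsWith("size ")) {
        fontSize = Double.parseDouble(operation.substring(5));
      } else if (operation.startsWith("show ")) {
        elements.add(new TextElement(operation.substring(5), fontSize));
      }
    }
    return elements;
  }

  public static void main(String[] args) {
    List<String> operations = Arrays.asList("show A", "q", "size 20", "show B", "Q", "show C");
    for (TextElement element : flatten(operations)) {
      System.out.println(element.text + " @ " + element.fontSize);
    }
  }
}
```

Each resulting element carries its resolved state, so it can be inspected or edited without replaying the operations that precede it.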

1.3. ContentRenderer

ContentRenderer works on content rasterization (that is page imaging and printing). Its reimplementation spurred enhancements in text rendering, image object rasterization and color space management (more on that soon — stay tuned!).

Written by stechio

September 12, 2014 at 3:35 pm

Waiting for PDF Clown 0.1.3 release

NOTE: As version 0.1.3 is currently under development, the new features described below are available (except PDF Clown DOM Inspector, which is still offline) through the trunk (HEAD revision) of PDF Clown’s SVN repository.

1. PDF Clown DOM Inspector

Since its earliest versions, PDF Clown has shipped with a simple Swing-based proof of concept for viewing PDF file structures. Now that little fledgling is going to become a comprehensive tool for the visual editing of the structure of PDF files: PDF Clown DOM Inspector. It was initially planned to be part of version 0.1.2 as a dedicated project within the PDF Clown distribution, but it wasn’t ready when the release deadline approached.

This tool conforms to the PDF model as defined by PDF Clown (see the diagram above), which adheres to the official PDF Reference 1.7/ISO 32000-1. This implies that a PDF file is represented through several concurrent views which work at different abstraction levels: Document view (document layer), File view (file/object layer, hierarchical) and XRef view (file/object layer, flat).

1.1. Document view

Document view (see the left pane in the above screenshot) shows the high-level structure of a PDF file; when you select a node, its data is shown in the right pane through several views. In this case, selecting a page node shows its content stream structure (Contents view, see below) and its rendering (Render view [¹], see above). Note that the page model represented by both Contents view and Render view corresponds to the content (sub)layer described in the diagram above.

Here is just one of the possible functionalities: when you hover the mouse pointer over a show-text-operation node, a tooltip pops up revealing the actual text encoded inside it (in this example, inspecting a Russian-language document):

There’s so much potential for custom features that I’m considering making it pluggable, so that users can extend it with additional modules as they wish.

1.2. File view

File view shows the low-level representation of the same entities you found in the above-mentioned Document view, expressed as primitive objects like dictionaries (PdfDictionary), arrays (PdfArray), streams (PdfStream) and so on.

1.3. XRef view

XRef view lists the entries of the cross-reference index (either table or stream, but that’s a technical detail you can happily ignore as it’s transparently handled by the library).

It’s really interesting to note that all the views (Document, File, XRef) are always kept synchronized: when you select a node in one of these views, its corresponding entities in each of the others are automatically selected, allowing you to switch seamlessly from one view to another.
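One common way to keep several views synchronized is a shared selection model that notifies every registered view. The sketch below assumes that approach; SelectionModel, InspectorView and the use of an object number as the selection key are all hypothetical and say nothing about how DOM Inspector is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared selection model; none of these names come from DOM Inspector.
final class SelectionModel {
  interface Listener { void selectionChanged(int objectNumber); }

  private final List<Listener> listeners = new ArrayList<>();
  private int selected = -1;

  void addListener(Listener listener) { listeners.add(listener); }

  void select(int objectNumber) {
    if (objectNumber == selected) { return; } // avoids notification loops between views
    selected = objectNumber;
    for (Listener listener : listeners) { listener.selectionChanged(objectNumber); }
  }
}

// A view that mirrors the shared selection and can also originate one.
final class InspectorView implements SelectionModel.Listener {
  final String name;
  int shownObject = -1;
  private final SelectionModel model;

  InspectorView(String name, SelectionModel model) {
    this.name = name;
    this.model = model;
    model.addListener(this);
  }

  // Simulates the user selecting a node inside this view.
  void userClicks(int objectNumber) { model.select(objectNumber); }

  @Override public void selectionChanged(int objectNumber) { shownObject = objectNumber; }
}

class SyncedViewsSketch {
  public static void main(String[] args) {
    SelectionModel model = new SelectionModel();
    InspectorView documentView = new InspectorView("Document", model);
    InspectorView fileView = new InspectorView("File", model);
    InspectorView xrefView = new InspectorView("XRef", model);

    fileView.userClicks(12); // a selection in one view...
    // ...is reflected by all of them.
    System.out.println(documentView.shownObject + " " + fileView.shownObject + " " + xrefView.shownObject);
  }
}
```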

[¹] Rendering is still partial as it’s under development (pre-alpha stage).

Written by stechio

February 11, 2013 at 1:12 am

Posted in Development

PDF Clown 0.1.2 has been released!

This release enhances several base structures, providing fully automated object change tracking and object cloning (which makes it possible, for example, to copy page annotations and AcroForm fields). It adds support for video embedding, article threads, page labels and several other functionalities.

This release may be downloaded from:
https://sourceforge.net/projects/clown/files/PDFClown-devel/0.1.2%20Beta/

Written by stechio

February 11, 2013 at 12:59 am

Posted in Release

What about screencasts on PDF Clown use?

I’m considering making screencasts on the use of the library.

Topics are still to be defined: what would you like to see in action?

Unleash your curiosity and let me know!

PS: I use open-source IDEs only, so don’t expect me to tweak around with proprietary tools like MS Visual Studio… ;-)

Written by stechio

January 20, 2012 at 7:18 pm

Posted in Help

Waiting for PDF Clown 0.1.2 release

[NOTE: this post was updated on February 10, 2013]

Latest news: on February 10, 2013 PDF Clown 0.1.2 has been released!

1. Multimedia


For a long time I gave low priority to multimedia features (chapter 9 of PDF Reference 1.7), but recently I received some requests for them on the project’s forum… so yes, video embedding through Screen annotations is now ready!

Screen annotations as implemented by PDF Clown feature a couple of nice JavaScript-based enhancements: video preview at an arbitrary position (the video is automatically loaded on page opening, ready to be played starting at a given time frame) and user control (YouTube-like play/pause behavior by mouse click on the player; this may seem obvious, but anyone who has worked with these annotations knows how painful it is, requiring awkward workarounds like dedicated play/pause buttons…). Furthermore, a useful fall-back FileAttachment annotation is placed alongside its Screen annotation for graceful degradation in case the PDF viewer has no multimedia capabilities.

package org.pdfclown.samples.cli;

import java.awt.geom.Rectangle2D;

import org.pdfclown.documents.Document;
import org.pdfclown.documents.Page;
import org.pdfclown.documents.interaction.annotations.Screen;
import org.pdfclown.files.File;

/**
  This sample demonstrates how to insert screen annotations to display media clips inside
  a PDF document.

  @author Stefano Chizzolini (http://www.stefanochizzolini.it)
  @since 0.1.2
  @version 0.1.2, 09/14/12
*/
public class VideoEmbeddingSample
  extends Sample
{
  @Override
  public void run(
    )
  {
    // 1. Instantiate the PDF file!
    File file = new File();
    Document document = file.getDocument();

    // 2. Insert a new page!
    Page page = new Page(document);
    document.getPages().add(page);

    // 3. Insert a video into the page!
    new Screen(
      page,
      new Rectangle2D.Double(10, 10, 320, 180),
      "JOBI 4 - Sunflower",
      getResourcePath("video" + java.io.File.separator + "JOBI_4_Sunflower.mpg"),
      "video/mpeg"
      );

    // 4. Serialize the PDF file!
    serialize(file, "Video embedding", "inserting screen annotations to display media clips inside a PDF document");
  }
}

PS: The video clip depicted above represents the official “Sunflower” by Milanese jazz-pop band JOBI 4, lead singer Federica Caiozzo (aka Thony). Check it out, they are really lovely: http://www.youtube.com/watch?v=yc6_Fj31Jbo

2. Text line alignment

Enhancing an appreciated code contribution by Manuel Guilbault, text line alignment now supports all the standard modes commonly available in typesetting environments (Top, Middle, Bottom, Super (absolute/relative) and Sub (absolute/relative)) and image inlining.

3. File references (file specifications, file identifiers, PDF stream object externalization)

Spurred by an engaging user request, file specification management (now modelled in the org.pdfclown.documents.files namespace instead of the old org.pdfclown.documents.fileSpec) has been thoroughly revised to smoothly support the import/export of PDF stream objects from/to external files.

This practically means that, instead of embedding stream data directly into a PDF file, such data can reside in an external (local or remote) file and be linked from within the PDF file through a file specification object (org.pdfclown.documents.files.FileSpecification). Thus common resources such as images can be shared among multiple documents (useful for example in a server scenario where documents may be assembled on-the-fly).

Anyway, there’s a caveat to consider before approaching externalized streams: as they are prone to security issues, their actual support by PDF viewers is very restricted (e.g., see so-called “privileged locations” in Adobe Acrobat’s Enhanced Security preferences) or even non-existent (e.g., see Evince).

Here is a code sample demonstrating how external references are applied to PDF stream objects:

  1. PDF stream data is exported and linked back (see the stream.setDataFile(...) call in step 1.2);
  2. linked files are imported back into their respective PDF stream objects (see the stream.setDataFile(...) call in step 2.2).

package org.pdfclown.samples.cli;

import org.pdfclown.documents.Document;
import org.pdfclown.documents.files.FileSpecification;
import org.pdfclown.files.File;
import org.pdfclown.files.SerializationModeEnum;
import org.pdfclown.objects.PdfDataObject;
import org.pdfclown.objects.PdfIndirectObject;
import org.pdfclown.objects.PdfStream;

/**
  This sample demonstrates how to move stream data outside PDF files and keep external
  references to them; it demonstrates also the inverse process (reimporting stream data
  from external files).
  Note that, due to security concerns, external streams are a discouraged feature which
  is often unsupported on third-party viewers and disabled by default on recent Adobe
  Acrobat versions; in the latter case, in order to bypass restrictions and allow access
  to external streams, users have to enable Enhanced Security from the Preferences dialog,
  specifying privileged locations.

  @author Stefano Chizzolini (http://www.stefanochizzolini.it)
  @since 0.1.2
  @version 0.1.2, 09/24/12
*/
public class StreamExternalizationSample
  extends Sample
{
  @Override
  public void run(
    )
  {
    // 1. Externalizing the streams...
    String externalizedFilePath;
    {
      // 1.1. Opening the PDF file...
      File file;
      {
        String filePath = promptPdfFileChoice("Please select a PDF file");
        try
        {file = new File(filePath);}
        catch(Exception e)
        {throw new RuntimeException(filePath + " file access error.",e);}
      }
      Document document = file.getDocument();
      /*
        NOTE: As we are going to export streams using paths relative to the output path,
        it's necessary to ensure they are properly resolved (otherwise they will be
        written relative to the current user directory).
      */
      file.setPath(getOutputPath());

      // 1.2. Iterating through the indirect objects to externalize streams...
      int filenameIndex = 0;
      for(PdfIndirectObject indirectObject : file.getIndirectObjects())
      {
        PdfDataObject dataObject = indirectObject.getDataObject();
        if(dataObject instanceof PdfStream)
        {
          PdfStream stream = (PdfStream)dataObject;
          if(stream.getDataFile() == null) // Internal stream to externalize.
          {
            stream.setDataFile(
              FileSpecification.get(
                document,
                getClass().getSimpleName() + "-external" + filenameIndex++
                ),
              true // Forces the stream data to be transferred to the external location.
              );
          }
        }
      }

      // 1.3. Serialize the PDF file!
      externalizedFilePath = serialize(file, SerializationModeEnum.Standard);
    }

    // 2. Reimporting the externalized streams...
    {
      // 2.1. Opening the PDF file...
      File file;
      try
      {file = new File(externalizedFilePath);}
      catch(Exception e)
      {throw new RuntimeException(externalizedFilePath + " file access error.",e);}

      // 2.2. Iterating through the indirect objects to internalize streams...
      for(PdfIndirectObject indirectObject : file.getIndirectObjects())
      {
        PdfDataObject dataObject = indirectObject.getDataObject();
        if(dataObject instanceof PdfStream)
        {
          PdfStream stream = (PdfStream)dataObject;
          if(stream.getDataFile() != null) // External stream to internalize.
          {
            stream.setDataFile(
              null,
              true // Forces the stream data to be transferred to the internal location.
              );
          }
        }
      }

      // 2.3. Serialize the PDF file!
      String externalizedFileName = new java.io.File(externalizedFilePath).getName();
      String internalizedFilePath = externalizedFileName.substring(0, externalizedFileName.indexOf(".pdf")) + "-reimported.pdf";
      serialize(file, internalizedFilePath, SerializationModeEnum.Standard);
    }
  }
}

Working on file specifications also involved support for file identifiers (PDF 1.7, § 10.3; modelled by the org.pdfclown.files.FileIdentifier class), which enforce referential integrity on document interchange. Their generation and update are now part of the document life cycle automatically managed by PDF Clown.
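As a rough illustration of what a file identifier is: the PDF specification defines it as a pair of byte strings, a permanent one assigned at creation and a changing one recomputed on each update, and suggests deriving them from a message digest such as MD5 over data like the current time, the file location and the file size. The sketch below follows that description under simplifying assumptions (it ignores the document information dictionary values the specification also mentions); FileIdSketch is a hypothetical name and does not reflect how org.pdfclown.files.FileIdentifier works internally.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; PDF Clown's FileIdentifier class may work differently.
final class FileIdSketch {
  final byte[] baseId;    // stays fixed across updates
  final byte[] versionId; // recomputed whenever the file is updated

  private FileIdSketch(byte[] baseId, byte[] versionId) {
    this.baseId = baseId;
    this.versionId = versionId;
  }

  // On creation both parts are the same digest.
  static FileIdSketch create(long timeMillis, String location, long size) {
    byte[] id = digest(timeMillis, location, size);
    return new FileIdSketch(id, id.clone());
  }

  // On update only the second part changes.
  FileIdSketch update(long timeMillis, String location, long size) {
    return new FileIdSketch(baseId, digest(timeMillis, location, size));
  }

  private static byte[] digest(long timeMillis, String location, long size) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      md5.update(Long.toString(timeMillis).getBytes(StandardCharsets.UTF_8));
      md5.update(location.getBytes(StandardCharsets.UTF_8));
      md5.update(Long.toString(size).getBytes(StandardCharsets.UTF_8));
      return md5.digest();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 digest unavailable", e);
    }
  }

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) { sb.append(String.format("%02X", b)); }
    return sb.toString();
  }

  public static void main(String[] args) {
    FileIdSketch created = create(1000L, "/tmp/sample.pdf", 2048L);
    FileIdSketch updated = created.update(2000L, "/tmp/sample.pdf", 4096L);
    // Prints the pair in the form used by the trailer's ID entry.
    System.out.println("/ID [<" + hex(updated.baseId) + "> <" + hex(updated.versionId) + ">]");
  }
}
```

Comparing the first part tells whether two files descend from the same original; comparing the second tells whether they are the same revision.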

4. Advanced cloning

Since its inception, PDF Clown has supported a cloning mechanism capable of elegantly copying any structure/content of a PDF file without specialized code or torture-chamber algorithms (those exotic, lengthy, exhaustingly cumbersome monster methods you may sometimes see when peering through the source of some well-known library…). Its implementation wasn’t complete, though: it couldn’t deal with circular references (which precluded annotations and some other structures) and there was no way to customize its filters on-the-fly in order to select just a subset to clone (which practically resulted in an identity transformation).

The good news is that the 0.1.2 implementation overcomes such limitations by leveraging the generic object visitor (org.pdfclown.objects.Visitor) through the Cloner class (org.pdfclown.objects.Cloner), which hosts a customizable collection of filters used to apply arbitrary transformations on cloning structures.
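The two limitations mentioned above (circular references and filtering) can be illustrated with a generic graph cloner. This is only a sketch of the general technique, assuming an identity map to remember already-cloned nodes and a list of veto filters; GraphNode, EntryFilter and GraphCloner are hypothetical names, not the API of org.pdfclown.objects.Cloner.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a PDF dictionary; entries may refer back to ancestors.
final class GraphNode {
  final Map<String, Object> entries = new LinkedHashMap<>();
}

// Hypothetical filter: decides whether an entry takes part in the clone.
interface EntryFilter {
  boolean accept(GraphNode parent, String key, Object value);
}

final class GraphCloner {
  private final List<EntryFilter> filters = new ArrayList<>();
  // Maps each source node to its clone, so a node reached twice
  // (or through a cycle) is cloned exactly once.
  private final Map<GraphNode, GraphNode> clones = new IdentityHashMap<>();

  GraphCloner addFilter(EntryFilter filter) { filters.add(filter); return this; }

  GraphNode clone(GraphNode source) {
    GraphNode existing = clones.get(source);
    if (existing != null) { return existing; }

    GraphNode copy = new GraphNode();
    clones.put(source, copy); // registered before descending: this breaks cycles
    for (Map.Entry<String, Object> entry : source.entries.entrySet()) {
      Object value = entry.getValue();
      if (!accepted(source, entry.getKey(), value)) { continue; }
      copy.entries.put(entry.getKey(),
        value instanceof GraphNode ? clone((GraphNode) value) : value);
    }
    return copy;
  }

  private boolean accepted(GraphNode parent, String key, Object value) {
    for (EntryFilter filter : filters) {
      if (!filter.accept(parent, key, value)) { return false; }
    }
    return true;
  }
}

class ClonerSketch {
  public static void main(String[] args) {
    GraphNode page = new GraphNode();
    GraphNode annotation = new GraphNode();
    page.entries.put("Annot", annotation);
    page.entries.put("Thumb", "thumbnail data");
    annotation.entries.put("P", page); // back-reference: page -> annotation -> page

    GraphNode copy = new GraphCloner()
      .addFilter((parent, key, value) -> !key.equals("Thumb")) // leave thumbnails out
      .clone(page);

    GraphNode copiedAnnotation = (GraphNode) copy.entries.get("Annot");
    System.out.println(copiedAnnotation.entries.get("P") == copy); // true
    System.out.println(copy.entries.containsKey("Thumb"));         // false
  }
}
```

The cloned annotation points back to the cloned page rather than to the original, which is what makes copying cyclic structures safe.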

Let’s see an example. We want to copy a page into another PDF document (by the way: there’s a utility, org.pdfclown.tools.PageManager, which is purposely devoted to this activity, but here we want to dig deeply into its inner workings…):

import org.pdfclown.documents.Document;
import org.pdfclown.documents.Page;
import org.pdfclown.files.File;

...

String sourceFilePath = "myFilePath";
File sourceFile = null;
try
{sourceFile = new File(sourceFilePath);}
catch(Exception e)
{throw new RuntimeException(sourceFilePath + " file access error.",e);}
Page sourcePage = sourceFile.getDocument().getPages().get(0);

File targetFile = new File();
Document targetDocument = targetFile.getDocument();
Page importPage = sourcePage.clone(targetDocument);
targetDocument.getPages().add(importPage);

That’s all: just one line (the sourcePage.clone(targetDocument) call) and our page is copied into the target document! Here is the implementation of the PdfObjectWrapper.clone(Document) method inherited by the Page class:

public Object clone(
  Document context
  )
{
  PdfObjectWrapper clone;
  try
  {clone = (PdfObjectWrapper)super.clone();}
  catch(CloneNotSupportedException e)
  {throw new RuntimeException(e);}
  clone.setBaseObject((PdfDirectObject)getBaseObject().clone(context.getFile()));
  return clone;
}

The magic is done by PdfObject.clone(File), invoked on the base object inside the setBaseObject(...) call, which clones the base PDF object (a PdfDictionary in this case) wrapped inside the high-level Page representation:

public PdfObject clone(
  File context
  )
{return accept(context.getCloner(), null);}

public PdfObject accept(
  IVisitor visitor,
  Object data
  )
{return visitor.visit(this, data);}

As mentioned above, Cloner is nothing but a specialized Visitor — for further details, it’s time you check out PDF Clown’s source code from its SVN repo, enjoy!

Thanks to Andreas Pinter for his contribution to solve the circular reference puzzle… ;-)

5. Article threads

The implementation of article threads offers, as usual, a smooth yet rich interface (see ComplexTypesettingSample for a live demonstration):

import org.pdfclown.documents.Document;
import org.pdfclown.documents.Page;
import org.pdfclown.documents.contents.composition.BlockComposer;
import org.pdfclown.documents.contents.composition.PrimitiveComposer;
import org.pdfclown.documents.interaction.navigation.page.Article;
import org.pdfclown.documents.interaction.navigation.page.ArticleElement;
import org.pdfclown.documents.interaction.navigation.page.ArticleElements;
import org.pdfclown.documents.interchange.metadata.Information;
import org.pdfclown.files.File;

...

File file = new File();
Document document = file.getDocument();

// Create the article thread!
Article article = new Article(document);
{
  Information articleInfo = article.getInformation();
  articleInfo.setTitle("The Free Software Definition");
  articleInfo.setAuthor("Free Software Foundation, Inc.");
}
// Get the article beads collection to populate!
ArticleElements articleElements = article.getElements();

Page page = new Page(document);
document.getPages().add(page);
PrimitiveComposer composer = new PrimitiveComposer(page);
BlockComposer blockComposer = new BlockComposer(composer);

... // adding contents through BlockComposer...

// Add the bead to the article thread!
articleElements.add(new ArticleElement(page, blockComposer.getBoundBox()));

...

Written by stechio

December 9, 2011 at 6:06 pm

PDF Clown 0.1.1 has been released!

Latest news: PDF Clown 0.1.1 has been superseded by PDF Clown 0.1.2

This release adds support for optional/layered contents, text highlighting, metadata streams (XMP) and Type1/CFF font files, along with enhancements to the primitive object model and to AcroForm field filling. Lots of minor improvements have been applied too.

Last but not least: ICSharpCode.SharpZipLib.dll dependency has been removed from .NET implementation.

This release may be downloaded from:
https://sourceforge.net/projects/clown/files/PDFClown-devel/0.1.1%20Beta/

enjoy!

Written by stechio

November 14, 2011 at 7:19 pm

Posted in Release

Waiting for PDF Clown 0.1.1 release

[NOTE: this post was updated on November 14, 2011]

Latest news: on November 14, 2011 PDF Clown 0.1.1 has been released!

The next release is going to introduce exciting new features (text highlighting, optional/layered contents, Type1/CFF font support, etc.) along with improvements and consolidations of existing ones (enhanced text extraction, enhanced content rendering, enhanced AcroForm creation and filling, etc.). This post will be kept updated according to development progress, so please stay tuned! ;-)
These are some of the things I have been working on so far:

  • primitive object model enhancements
  • text highlighting
  • metadata streams (XMP)
  • optional/layered contents
  • AcroForm fields filling

1. Primitive object model enhancements

PDF primitive object model (see org.pdfclown.objects namespace) has undergone a substantial revision in order to simplify its use (transparent update), extend its functionality (bidirectional traversal), enforce its consistency (simple object immutability) and consolidate its code base (parser classes refactoring).

Bidirectional traversal has been accomplished by the introduction of explicit references to ascendants: composite objects (PdfDictionary, PdfArray, PdfStream) are now aware of their parent container, so walking through the ascending path to the root PdfIndirectObject (and File) is absolutely trivial! This functionality has loads of engaging potential applications, such as fine-grained object cloning based on structure context (as in the case of AcroForm annotations residing on a given page).

Ascendant-aware objects are intelligent enough to automatically detect and notify changes to their parent container, making incremental updates transparent to the user.
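The idea of ascendant-aware objects that propagate change notifications upwards can be sketched as follows. This is a simplified illustration under assumed names (TrackedComposite, markSaved, isUpdated are hypothetical); the actual PdfDictionary, PdfArray and PdfIndirectObject classes are richer than this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a composite object that knows its parent container.
class TrackedComposite {
  private TrackedComposite parent;
  private final List<Object> items = new ArrayList<>();
  private boolean updated;

  void add(Object item) {
    if (item instanceof TrackedComposite) { ((TrackedComposite) item).parent = this; }
    items.add(item);
    notifyUpdate();
  }

  TrackedComposite getParent() { return parent; }
  boolean isUpdated() { return updated; }

  // Clears the change flag of this object only (e.g. after it has been written out).
  void markSaved() { updated = false; }

  // Walks the ascending path up to the outermost container.
  TrackedComposite getRoot() {
    TrackedComposite current = this;
    while (current.parent != null) { current = current.parent; }
    return current;
  }

  // Marks this object and every ancestor as changed.
  private void notifyUpdate() {
    for (TrackedComposite current = this; current != null; current = current.parent) {
      current.updated = true;
    }
  }
}

class AscendantSketch {
  public static void main(String[] args) {
    TrackedComposite indirectObject = new TrackedComposite(); // plays the root container
    TrackedComposite dictionary = new TrackedComposite();
    TrackedComposite array = new TrackedComposite();
    indirectObject.add(dictionary);
    dictionary.add(array);
    indirectObject.markSaved();
    dictionary.markSaved();
    array.markSaved();

    array.add("new item"); // a change deep inside the structure

    System.out.println(array.getRoot() == indirectObject); // true
    System.out.println(indirectObject.isUpdated());        // true
  }
}
```

With the root flagged automatically, a writer only needs to look at root containers to decide what belongs in an incremental update.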

Simple objects have been made immutable to avoid risks of unintended changes and promote their efficient reuse.
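Why immutability promotes reuse can be shown with a tiny example: an object that can never change may safely be shared by any number of containers, so a factory can hand out one instance per value. The class below is a hypothetical sketch (ImmutableName and its cache are illustrative assumptions, not PDF Clown's PdfName implementation).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable name object; the class name and cache are illustrative only.
final class ImmutableName {
  private static final Map<String, ImmutableName> CACHE = new ConcurrentHashMap<>();

  private final String value; // never changes after construction

  private ImmutableName(String value) { this.value = value; }

  // Factory method: identical values share a single instance.
  static ImmutableName get(String value) {
    return CACHE.computeIfAbsent(value, ImmutableName::new);
  }

  String getValue() { return value; }

  @Override public String toString() { return "/" + value; }
}

class ImmutableSketch {
  public static void main(String[] args) {
    ImmutableName first = ImmutableName.get("Type");
    ImmutableName second = ImmutableName.get("Type");
    System.out.println(first == second); // true: the same instance serves every dictionary
    System.out.println(first);           // /Type
  }
}
```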

As expected (you may have noticed some TODO task comments about this within the project’s code base), object parsing of PostScript-related formats (PDF file, PDF content stream and CMaps) has been organized under the same class hierarchy to improve its consistency and maintainability.

2. Text highlighting

Text highlighting was a much-requested feature. It took me less than one hour of enjoyable coding to write a prototype which could populate a PDF file with highlight annotations matching an arbitrary text pattern, as you can see in the following figure representing a page of Alice in Wonderland resulting from the search of “rabbit” occurrences:

This text highlighting sample leverages both the text extraction (textExtractor.extract(page), step 2.1) and annotation (new TextMarkup(...), step 2.3) functionalities of PDF Clown, as you can see in its source code:

package org.pdfclown.samples.cli;

import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.pdfclown.documents.Page;
import org.pdfclown.documents.contents.ITextString;
import org.pdfclown.documents.contents.TextChar;
import org.pdfclown.documents.interaction.annotations.TextMarkup;
import org.pdfclown.documents.interaction.annotations.TextMarkup.MarkupTypeEnum;
import org.pdfclown.files.File;
import org.pdfclown.tools.TextExtractor;
import org.pdfclown.util.math.Interval;
import org.pdfclown.util.math.geom.Quad;

/**
  This sample demonstrates how to highlight text matching arbitrary patterns.
  Highlighting is defined through text markup annotations.

  @author Stefano Chizzolini (http://www.stefanochizzolini.it)
  @since 0.1.1
  @version 0.1.1
*/
public class TextHighlightSample
  extends Sample
{
  @Override
  public boolean run(
    )
  {
    String filePath = promptPdfFileChoice("Please select a PDF file");

    // 1. Open the PDF file!
    File file;
    try
    {file = new File(filePath);}
    catch(Exception e)
    {throw new RuntimeException(filePath + " file access error.",e);}

    // Define the text pattern to look for!
    String textRegEx = promptChoice("Please enter the pattern to look for: ");
    Pattern pattern = Pattern.compile(textRegEx, Pattern.CASE_INSENSITIVE);

    // 2. Iterating through the document pages...
    TextExtractor textExtractor = new TextExtractor(true, true);
    for(final Page page : file.getDocument().getPages())
    {
      System.out.println("\nScanning page " + (page.getIndex()+1) + "...\n");

      // 2.1. Extract the page text!
      Map<Rectangle2D,List<ITextString>> textStrings = textExtractor.extract(page);

      // 2.2. Find the text pattern matches!
      final Matcher matcher = pattern.matcher(TextExtractor.toString(textStrings));

      // 2.3. Highlight the text pattern matches!
      textExtractor.filter(
        textStrings,
        new TextExtractor.IIntervalFilter()
        {
          @Override
          public boolean hasNext()
          {return matcher.find();}

          @Override
          public Interval next()
          {return new Interval(matcher.start(), matcher.end());}

          @Override
          public void process(
            Interval interval,
            ITextString match
            )
          {
            // Defining the highlight box of the text pattern match...
            List<Quad> highlightQuads = new ArrayList<Quad>();
            {
              /*
                NOTE: A text pattern match may be split across multiple contiguous lines,
                so we have to define a distinct highlight box for each text chunk.
              */
              Rectangle2D textBox = null;
              for(TextChar textChar : match.getTextChars())
              {
                Rectangle2D textCharBox = textChar.getBox();
                if(textBox == null)
                {textBox = (Rectangle2D)textCharBox.clone();}
                else
                {
                  if(textCharBox.getY() > textBox.getMaxY())
                  {
                    highlightQuads.add(Quad.get(textBox));
                    textBox = (Rectangle2D)textCharBox.clone();
                  }
                  else
                  {textBox.add(textCharBox);}
                }
              }
              highlightQuads.add(Quad.get(textBox));
            }
            // Highlight the text pattern match!
            new TextMarkup(page, MarkupTypeEnum.Highlight, highlightQuads);
          }

          @Override
          public void remove()
          {throw new UnsupportedOperationException();}
        }
        );
    }

    // 3. Highlighted file serialization.
    serialize(file, false);

    return true;
  }
}
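The line-splitting logic inside process(...) above can be tried in isolation with plain java.awt.geom rectangles. The sketch below restates that grouping step as a standalone method (LineBoxSketch and groupByLine are hypothetical names); it assumes, as the sample does, that character boxes arrive in reading order with the y coordinate growing downwards, and it additionally guards against an empty input.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

final class LineBoxSketch {
  // Merges character boxes (in reading order) into one box per text line.
  static List<Rectangle2D> groupByLine(List<Rectangle2D> charBoxes) {
    List<Rectangle2D> lineBoxes = new ArrayList<>();
    Rectangle2D lineBox = null;
    for (Rectangle2D charBox : charBoxes) {
      if (lineBox == null) {
        lineBox = (Rectangle2D) charBox.clone();
      } else if (charBox.getY() > lineBox.getMaxY()) {
        // The character starts below the current line: close it and open a new one.
        lineBoxes.add(lineBox);
        lineBox = (Rectangle2D) charBox.clone();
      } else {
        lineBox.add(charBox); // grows the box to the union of both rectangles
      }
    }
    if (lineBox != null) { lineBoxes.add(lineBox); }
    return lineBoxes;
  }

  public static void main(String[] args) {
    List<Rectangle2D> chars = new ArrayList<>();
    chars.add(new Rectangle2D.Double(90, 10, 5, 12)); // end of the first line
    chars.add(new Rectangle2D.Double(95, 10, 5, 12));
    chars.add(new Rectangle2D.Double(10, 30, 5, 12)); // wrapped onto the second line
    for (Rectangle2D box : groupByLine(chars)) { System.out.println(box); }
  }
}
```

A match wrapped across two lines thus yields two boxes, each of which would become one highlight quad.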

This is another example matching words which contain “co” (regular expression “\w*co\w*”):

Here you can appreciate the dehyphenation functionality applied to another search (words beginning with “devel” — regular expression “\bdevel\w*”):
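For readers who want to try the two patterns quoted above outside PDF Clown, this small sketch runs them against a plain string with java.util.regex, collecting the same kind of start/end intervals the sample's filter produces. PatternSketch and findIntervals are hypothetical names, the sample sentence is made up, and the dehyphenation performed by the text extractor is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternSketch {
  // Returns {start, end} pairs (end exclusive) for every match of regex in text.
  static List<int[]> findIntervals(String regex, String text) {
    Matcher matcher = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
    List<int[]> intervals = new ArrayList<>();
    while (matcher.find()) {
      intervals.add(new int[] {matcher.start(), matcher.end()});
    }
    return intervals;
  }

  public static void main(String[] args) {
    String text = "Developers welcome code contributions.";

    // Words containing "co": welcome, code, contributions
    for (int[] interval : findIntervals("\\w*co\\w*", text)) {
      System.out.println(text.substring(interval[0], interval[1]));
    }

    // Words beginning with "devel": Developers
    for (int[] interval : findIntervals("\\bdevel\\w*", text)) {
      System.out.println(text.substring(interval[0], interval[1]));
    }
  }
}
```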

3. Metadata streams (XMP)

XMP metadata streams are now available for reading and writing on any dictionary or stream entity within a PDF document (see PdfObjectWrapper.get/setMetadata()).

4. Optional/Layered contents

Smoothing out some PDF spec awkwardness while implementing the content layer (aka optional content) functionality proved to be an interesting challenge. The result was nothing but satisfaction: a clean, intuitive and rich programming interface which automates lots of annoying housekeeping tasks and lets you access even the whole raw structures in case of special needs! 8-)

The figure above represents a document generated by the following code sample; for the sake of comparison, I took an iText example and translated it to PDF Clown, adding some niceties like the cooperation between the PrimitiveComposer (whose lower-level role is graphics composition through primitive operations like showing text lines and drawing shapes) and the BlockComposer (whose higher-level role is to arrange text within page areas managing alignments, paragraph spacing and indentation, hyphenation, and so on).

package org.pdfclown.samples.cli;

import java.awt.Dimension;
import java.awt.Point;
import java.awt.Rectangle;

import org.pdfclown.documents.Document;
import org.pdfclown.documents.Document.PageModeEnum;
import org.pdfclown.documents.Page;
import org.pdfclown.documents.contents.composition.AlignmentXEnum;
import org.pdfclown.documents.contents.composition.AlignmentYEnum;
import org.pdfclown.documents.contents.composition.BlockComposer;
import org.pdfclown.documents.contents.composition.PrimitiveComposer;
import org.pdfclown.documents.contents.fonts.StandardType1Font;
import org.pdfclown.documents.contents.layers.Layer;
import org.pdfclown.documents.contents.layers.Layer.ViewStateEnum;
import org.pdfclown.documents.contents.layers.LayerDefinition;
import org.pdfclown.documents.contents.layers.LayerGroup;
import org.pdfclown.documents.contents.layers.Layers;
import org.pdfclown.files.File;

/**
  This sample demonstrates how to define layers to control content visibility.

  @author Stefano Chizzolini (http://www.stefanochizzolini.it)
  @since 0.1.1
  @version 0.1.1
*/
public class LayerCreationSample
  extends Sample
{
  @Override
  public boolean run(
    )
  {
    // 1. PDF file instantiation.
    File file = new File();
    Document document = file.getDocument();

    // 2. Content creation.
    populate(document);

    // 3. Serialize the PDF file!
    serialize(file, false, "Layer", "inserting layers");

    return true;
  }

  private void populate(
    Document document
    )
  {
    // Initialize a new page!
    Page page = new Page(document);
    document.getPages().add(page);

    // Initialize the primitive composer (within the new page context)!
    PrimitiveComposer composer = new PrimitiveComposer(page);
    composer.setFont(new StandardType1Font(document, StandardType1Font.FamilyEnum.Helvetica, true, false), 12);

    // Initialize the block composer (wrapping the primitive one)!
    BlockComposer blockComposer = new BlockComposer(composer);

    // Initialize the document layer configuration!
    LayerDefinition layerDefinition = new LayerDefinition(document); // Creates the document layer configuration.
    document.setLayer(layerDefinition); // Activates the document layer configuration.
    document.setPageMode(PageModeEnum.Layers); // Shows the layers tab on document opening.

    // Get the root layers collection!
    Layers rootLayers = layerDefinition.getLayers();

    // 1. Nested layers.
    {
      Layer nestedLayer = new Layer(document, "Nested layer");
      rootLayers.add(nestedLayer);
      Layers nestedSubLayers = nestedLayer.getLayers();

      Layer nestedLayer1 = new Layer(document, "Nested layer 1");
      nestedSubLayers.add(nestedLayer1);

      Layer nestedLayer2 = new Layer(document, "Nested layer 2");
      nestedSubLayers.add(nestedLayer2);
      nestedLayer2.setLocked(true);

      // NOTE: Text in this section is shown using PrimitiveComposer.
      composer.beginLayer(nestedLayer);
      composer.showText(nestedLayer.getTitle(), new Point(50, 50));
      composer.end();

      composer.beginLayer(nestedLayer1);
      composer.showText(nestedLayer1.getTitle(), new Point(50, 75));
      composer.end();

      composer.beginLayer(nestedLayer2);
      composer.showText(nestedLayer2.getTitle(), new Point(50, 100));
      composer.end();
    }

    // 2. Simple group (labeled group of non-nested, inclusive-state layers).
    {
      Layers simpleGroup = new Layers(document, "Simple group");
      rootLayers.add(simpleGroup);

      Layer layer1 = new Layer(document, "Grouped layer 1");
      simpleGroup.add(layer1);

      Layer layer2 = new Layer(document, "Grouped layer 2");
      simpleGroup.add(layer2);

      // NOTE: Text in this section is shown using BlockComposer along with PrimitiveComposer
      // to demonstrate their flexible cooperation.
      blockComposer.begin(new Rectangle(50, 125, 200, 50), AlignmentXEnum.Left, AlignmentYEnum.Middle);

      composer.beginLayer(layer1);
      blockComposer.showText(layer1.getTitle());
      composer.end();

      blockComposer.showBreak(new Dimension(0, 15));

      composer.beginLayer(layer2);
      blockComposer.showText(layer2.getTitle());
      composer.end();

      blockComposer.end();
    }

    // 3. Radio group (labeled group of non-nested, exclusive-state layers).
    {
      Layers radioGroup = new Layers(document, "Radio group");
      rootLayers.add(radioGroup);

      Layer radio1 = new Layer(document, "Radiogrouped layer 1");
      radioGroup.add(radio1);
      radio1.setViewState(ViewStateEnum.On);

      Layer radio2 = new Layer(document, "Radiogrouped layer 2");
      radioGroup.add(radio2);
      radio2.setViewState(ViewStateEnum.Off);

      Layer radio3 = new Layer(document, "Radiogrouped layer 3");
      radioGroup.add(radio3);
      radio3.setViewState(ViewStateEnum.Off);

      // Register this option group in the layer configuration!
      LayerGroup options = new LayerGroup(document);
      options.add(radio1);
      options.add(radio2);
      options.add(radio3);
      layerDefinition.getOptionGroups().add(options);

      // NOTE: Text in this section is shown using BlockComposer along with PrimitiveComposer
      // to demonstrate their flexible cooperation.
      blockComposer.begin(new Rectangle(50, 185, 200, 75), AlignmentXEnum.Left, AlignmentYEnum.Middle);

      composer.beginLayer(radio1);
      blockComposer.showText(radio1.getTitle());
      composer.end();

      blockComposer.showBreak(new Dimension(0, 15));

      composer.beginLayer(radio2);
      blockComposer.showText(radio2.getTitle());
      composer.end();

      blockComposer.showBreak(new Dimension(0, 15));

      composer.beginLayer(radio3);
      blockComposer.showText(radio3.getTitle());
      composer.end();

      blockComposer.end();
    }
    composer.flush();
  }
}

Some comments on the code:

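  • document layer configuration initialization [lines 68-69; new LayerDefinition(document) and document.setLayer(layerDefinition)]: this must be done before anything else;
  • layer creation [line 77; new Layer(document, "Nested layer")] and insertion [line 78; rootLayers.add(nestedLayer)] into the hierarchical structure;
  • sublayer insertion [line 82; nestedSubLayers.add(nestedLayer1)];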
  • content layering [lines 89, 91; composer.beginLayer(nestedLayer) and the matching composer.end()]: content is enclosed within a layer section, making its visibility dependent on the layer state. There’s a subtle discrepancy in the PDF spec when it comes to nested layers: one may assume that nesting implies a hierarchical dependency between layer states, but that’s NOT the case (if you hide a layer, its descendants are still visible!). To work around this counterintuitive behaviour, many software toolkits wrap contents within multiple nested layer blocks; for example, if you want to wrap the text “nested layer 1” into a layer (resource name /Pr2) which is a sublayer of another one (resource name /Pr1), the content stream will contain this cumbersome syntax:

    4 0 obj
    << /Length 205 >>
    stream
    [...]
    /OC /Pr1 BDC
    /OC /Pr2 BDC

    q
    BT
    1 0 0 1 100 800 Tm
    /F1 12 Tf
    (nested layer 1)Tj
    ET
    Q
    EMC
    EMC

    [...]
    endstream
    endobj

    This beast is repeated as many times as there are distinct content chunks to include within the same layer, and it gets even worse as the number of nesting levels increases: just awful! 8-O Instead of this, PDF Clown defines a default hierarchical membership for each layer, which can be used as a single, terse wrapping block (resource name /Pr2):

    4 0 obj
    << /Length 185 >>
    stream
    [...]
    /OC /Pr2 BDC
    q
    BT
    1 0 0 1 100 800 Tm
    /F1 12 Tf
    (nested layer 1)Tj
    ET
    Q
    EMC
    [...]
    endstream
    endobj
     
    6 0 obj
    << /Type /Pages /Count 1 /Resources << /Font 7 0 R /Properties 15 0 R >> /Kids [5 0 R ] >>
    endobj
     
    15 0 obj
    << /Pr2 16 0 R >>
    endobj
     
    16 0 obj
    << /Type /OCMD /OCGs [12 0 R 11 0 R ] /P /AllOn >> % Membership containing the references to the layers belonging to the hierarchical path of nested layer 1.
    endobj

    This way the code is concise and more maintainable: if you want to rearrange the hierarchical structure of the layers, you don’t have to walk through the content stream hunting for layer block occurrences to correct. Just go to the membership associated with the layer and update its hierarchical path! :-)
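  • simple layer group creation and insertion [lines 104-105; new Layers(document, "Simple group") followed by rootLayers.add(simpleGroup)];
  • option group definition [lines 148-152; the LayerGroup is filled with the three radio layers and added to layerDefinition.getOptionGroups()].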
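
To make the hierarchical membership idea more tangible, here is a minimal, self-contained sketch of how such a membership could be derived from a layer's ancestry and why the /AllOn policy hides a sublayer whenever one of its ancestors is hidden. Mind that this is NOT PDF Clown's actual implementation: LayerNode, MembershipSketch and their methods are hypothetical names invented for illustration, and the mapping of "12 0 R" and "11 0 R" to the sublayer and its parent is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only: none of these types belong to PDF Clown. */
class MembershipSketch {
  /** Hypothetical stand-in for a layer (optional content group). */
  static final class LayerNode {
    final String title;
    final String reference; // Indirect reference of the layer object (assumed values).
    final LayerNode parent; // null for a root layer.

    LayerNode(String title, String reference, LayerNode parent) {
      this.title = title;
      this.reference = reference;
      this.parent = parent;
    }

    /** Returns this layer followed by its ancestors, up to the root. */
    List<LayerNode> hierarchicalPath() {
      List<LayerNode> path = new ArrayList<>();
      for (LayerNode node = this; node != null; node = node.parent) {
        path.add(node);
      }
      return path;
    }
  }

  /** Serializes a membership dictionary listing every layer on the hierarchical path. */
  static String membershipDictionary(LayerNode layer) {
    StringBuilder ocgs = new StringBuilder();
    for (LayerNode node : layer.hierarchicalPath()) {
      ocgs.append(node.reference).append(' ');
    }
    return "<< /Type /OCMD /OCGs [" + ocgs + "] /P /AllOn >>";
  }

  /** Evaluates the /AllOn policy: content shows only if no layer on the path is hidden. */
  static boolean isVisible(LayerNode layer, Set<LayerNode> hiddenLayers) {
    for (LayerNode node : layer.hierarchicalPath()) {
      if (hiddenLayers.contains(node)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    LayerNode nestedLayer = new LayerNode("Nested layer", "11 0 R", null);
    LayerNode nestedLayer1 = new LayerNode("Nested layer 1", "12 0 R", nestedLayer);

    // Prints: << /Type /OCMD /OCGs [12 0 R 11 0 R ] /P /AllOn >>
    System.out.println(membershipDictionary(nestedLayer1));

    // Hiding the parent hides the sublayer content too; prints: false
    Set<LayerNode> hidden = new HashSet<>();
    hidden.add(nestedLayer);
    System.out.println(isVisible(nestedLayer1, hidden));
  }
}
```

The point of the design is that the ancestry is resolved once, inside the membership object, so the content stream only ever needs a single marked-content block per chunk.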

5. AcroForm fields filling

Text fields have been enhanced to support automatic appearance update on value change.
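
Just to illustrate what "automatic appearance update" means in principle, here is a minimal sketch in which assigning a new value immediately regenerates the field's appearance content stream. It is an assumption-laden toy, not PDF Clown's code: TextFieldSketch and its methods are hypothetical, and the font resource name (/Helv), font size and text offset are arbitrary placeholders.

```java
/** Illustrative sketch only: a hypothetical text field, not PDF Clown's TextField class. */
class TextFieldSketch {
  private String value = "";
  private String appearance = buildAppearance("");

  /** Sets the field value and rebuilds the appearance so the two never get out of sync. */
  void setValue(String newValue) {
    value = newValue;
    appearance = buildAppearance(newValue);
  }

  String getValue() {
    return value;
  }

  String getAppearance() {
    return appearance;
  }

  /** Escapes the characters that are special inside a PDF literal string. */
  private static String escape(String text) {
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
  }

  /** Builds a bare-bones variable-text appearance stream (font and offsets are assumed). */
  private static String buildAppearance(String text) {
    return "/Tx BMC\n"
      + "q\n"
      + "BT\n"
      + "/Helv 12 Tf\n"
      + "2 4 Td\n"
      + "(" + escape(text) + ") Tj\n"
      + "ET\n"
      + "Q\n"
      + "EMC\n";
  }

  public static void main(String[] args) {
    TextFieldSketch field = new TextFieldSketch();
    field.setValue("Hello (world)");
    System.out.print(field.getAppearance());
  }
}
```

Without this kind of coupling, a viewer that relies on the stored appearance would keep showing the old text even though the field value has changed.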

Written by stechio

April 12, 2011 at 5:53 pm

Posted in Development
