PDF Clown's Blog

Developing a free/libre open source PDF library

PDF Clown 0.1.0 has been released!

with 20 comments

Latest news: PDF Clown 0.1.0 has been superseded by PDF Clown 0.1.1

This release introduces support for PDF files based on cross-reference streams (introduced in the PDF 1.5 spec), along with page rendering and printing: a specialized tool provides a convenient way to convert PDF pages into images (rasterization). Many minor improvements have been applied too.

Last but not least: the project’s base namespace has changed to org.pdfclown.

This release may be downloaded from:
https://sourceforge.net/projects/clown/files/PDFClown-devel/0.1.0%20Alpha/

enjoy!

Written by stechio

March 4, 2011 at 12:50 am

Posted in Release


20 Responses


  1. [...] stream, object stream, PDF, printing, rasterization « PDF Clown 0.0.8: Q&A PDF Clown 0.1.0 has been released! [...]

  2. Hello,

    Good work so far. I would like to see rendering of pages with text. I am a developer and would be willing to provide some input on this. I’ve not had a thorough look through the code yet as I’m new to PDF Clown, and I’m not familiar with rasterizing fonts either. Would this be a great deal of work? Roughly how long do you think it would be before this feature is in PDF Clown?

    Thanks.

    Glen

    March 4, 2011 at 11:51 am

    • Hi Glen,

      text rendering of modern fonts is primarily a matter of outline drawing and filling through Bézier curves; there are also some cases where bitmap glyphs are still in use, such as some CJK fonts. Text rasterization will be part of my next developments; I can estimate some weeks of work (that is about 3-4 months during my release cycle) to reach a decent representation.
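
      To give a rough idea of what outline drawing and filling means, here is a minimal Java sketch that uses plain Java2D rather than PDF Clown's own classes. The OutlineFillSketch class, its method names and the D-shaped outline are made up for illustration; a real glyph outline would be read from the font program.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.Path2D;
import java.awt.image.BufferedImage;

/** Hypothetical helper (not part of PDF Clown): fills a glyph-like outline. */
final class OutlineFillSketch {

    /** Builds a closed, D-shaped outline from one line and one cubic Bézier segment. */
    static Path2D buildOutline() {
        Path2D.Double path = new Path2D.Double(Path2D.WIND_NON_ZERO);
        path.moveTo(20, 10);
        path.lineTo(20, 90);                    // straight stem
        path.curveTo(80, 90, 80, 10, 20, 10);   // curved bowl back to the start
        path.closePath();
        return path;
    }

    /** Fills the outline in black on a white canvas of the given size. */
    static BufferedImage rasterize(Path2D outline, int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                               RenderingHints.VALUE_ANTIALIAS_ON);
            g.setColor(Color.BLACK);
            g.fill(outline);
        } finally {
            g.dispose();
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage image = rasterize(buildOutline(), 100, 100);
        System.out.println("inside filled: " + ((image.getRGB(40, 50) & 0xFFFFFF) == 0));
    }
}
```

      The hard part is not the filling itself but getting the right outlines, metrics and transformations out of each font format.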

      Stefano

      stechio

      March 4, 2011 at 11:33 pm

      • I’ve had a good look through the source today and managed to rasterize the text to the page using just one extra line of code. Obviously this wasn’t anywhere near the overall standard of the library and I’ll be refining the code next week to get it closer to the PDF specification. I really like the coding you’ve done so far. It’s a great project. Good work.

        Glen

        March 5, 2011 at 12:51 am

      • Well done, crafty fellow! :-)
        You called GDI/AWT native text rendering methods, didn’t you? That’s good for an approximate rendition, but you know…

        stechio

        March 5, 2011 at 11:26 am

  3. Yes, what I did is very rough. Currently I just need a small thumbnail of PDF documents and it looks OK. I am still working on doing it properly, though.

    Glen

    March 7, 2011 at 9:37 am

    • It would be really nice if you could post your text rasterization code here to share with other users; after all, the spirit of this project is about cooperation. ;-)

      Thank you!

      stechio

      March 7, 2011 at 8:11 pm

      • All I did was replace
        textScanner.ScanChar(textChar,charBox);
        in the ShowText class with a DrawString method applied to the ContentScanner’s RenderContext.

        I was really lazy with it and just picked one font and one colour to use for all text. The characters were rendered upside-down too, so they needed to be flipped.
        It was something like this:
        state.Scanner.RenderContext.DrawString(textChar.ToString(), new System.Drawing.Font("Arial", 4), System.Drawing.Brushes.Black, charBox);

        I’m planning on making this a lot better as it’s pretty useless as it is.
        I was looking at rendering the images today but didn’t get very far with it…
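
        On the upside-down characters: my guess is that they come from the y-axis flip between PDF user space (origin bottom-left, y pointing up) and the device space of GDI/AWT (origin top-left, y pointing down). A minimal sketch of the idea in Java with java.awt.geom follows; the TextFlipSketch class and its methods are hypothetical names, not PDF Clown API.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

/** Hypothetical helper (not part of PDF Clown): y-axis handling for text. */
final class TextFlipSketch {

    /** Maps PDF user space (y up, origin bottom-left) to device space (y down, origin top-left). */
    static AffineTransform pageToDevice(double pageHeight) {
        return new AffineTransform(1, 0, 0, -1, 0, pageHeight);
    }

    /**
     * Transform to set on the graphics context before drawing a string at (0, 0),
     * so that text anchored at page position (x, y) comes out upright.
     */
    static AffineTransform uprightTextAt(AffineTransform pageToDevice, double x, double y) {
        AffineTransform t = new AffineTransform(pageToDevice);
        t.translate(x, y); // move the origin to the text position (page coordinates)
        t.scale(1, -1);    // undo the page flip locally, for the glyphs only
        return t;
    }

    public static void main(String[] args) {
        AffineTransform t = uprightTextAt(pageToDevice(800), 100, 700);
        System.out.println(t.transform(new Point2D.Double(0, 0), null));
    }
}
```

        With Graphics2D this would be applied via setTransform before drawString and restored afterwards.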

        Glen

        March 7, 2011 at 8:29 pm

  4. I’m also looking at adding images to the rendered pages.
    Looking at the code I’m guessing this would be added to the Scan method of the ContentObject class.

    Does PDF Clown only support jpg images?

    Glen

    March 7, 2011 at 3:34 pm

  5. I am still trying to render images to the page but am having difficulty.

    I have implemented the Scan method of the PaintXObject class and have successfully retrieved the image xobject but when I try to render it to the RenderContext the page is blank.

    I think this is because of the clipping on the RenderContext and I’m guessing I need to do something with the matrices. Any idea what I’m doing wrong?

    Glen

    March 8, 2011 at 3:14 pm

    • I have tried manually setting the Clip property of the RenderContext to one that would not clip the image and I am drawing the image to the RenderContext using a Graphics.DrawImage method. I still can’t work out why the image does not appear on the output.

      I’m fairly confident that the image is OK as I’ve tried saving it to a file and it is the image I’m trying to render.

      I have also tried using a DrawString to draw some text to the RenderContext in the Scan method of the PaintXObject, like I did in ShowText, and this isn’t appearing on the output either.

      Glen

      March 8, 2011 at 3:54 pm

      • I have now got the image onto the page, but it doesn’t look much like the image, apart from the main colour. Maybe I need to DCT decode…?
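
        For reference, a minimal Java sketch of that decoding step, assuming the image XObject's stream uses the DCTDecode filter (in which case its bytes are JPEG data, not raw samples). The DctDecodeSketch class is a hypothetical helper, not PDF Clown API, and sampleJpeg only stands in for bytes already extracted from the XObject. The stock ImageIO reader may not handle every JPEG variant (CMYK data, for instance).

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper (not part of PDF Clown): decodes DCT-encoded image data. */
final class DctDecodeSketch {

    /** Decodes JPEG bytes, assumed to be the raw body of a DCTDecode stream. */
    static BufferedImage decode(byte[] jpegBytes) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(jpegBytes));
        if (image == null) {
            throw new IOException("No registered ImageIO reader could decode the stream");
        }
        return image;
    }

    /** Stand-in for real stream data: encodes a blank image as JPEG. */
    static byte[] sampleJpeg(int width, int height) throws IOException {
        BufferedImage source = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(source, "jpg", out)) {
            throw new IOException("No JPEG writer available");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = decode(sampleJpeg(8, 6));
        System.out.println(image.getWidth() + " x " + image.getHeight());
    }
}
```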

        At least I am now making some progress.

        Glen

        March 8, 2011 at 4:47 pm

  6. Does PDF Clown offer any facility for extracting tags from PDF files?

    puneeth

    March 9, 2011 at 5:08 am

    • Do you mean tags as described by the Tagged PDF spec [PDF:1.7:10.7]?
      Those structures currently aren’t managed at a high level by PDF Clown (though you can access them at a low level as primitive data structures); anyway, marked contents within content streams are available for parsing through ContentScanner: see the MarkedContent and MarkedContentPoint classes.

      The project’s Status page analytically describes the level of implementation reached by PDF Clown.

      Thank you
      Stefano

      stechio

      March 9, 2011 at 5:26 pm

      • I have one more issue regarding pdf extraction: how can we identify the table (border line)? Is there any control to access that?

        puneeth

        March 10, 2011 at 3:44 am

      • As I stated in my previous reply, it’s a matter of heuristics — there’s no golden rule, just well-balanced analyses. Establishing such strategy is up to you, as it’s a non-trivial judgement which I haven’t done till now.

        stechio

        March 13, 2011 at 6:14 pm

  7. Nice work.
    I want to print a PDF, but cannot find an example. Is there already some Java code for this?

    Thx
    Gamba

    Gamba

    March 9, 2011 at 8:41 am

    • There is a sample in the latest version (0.1.0), but your printed output will not be the same as the PDF document, as PDF Clown does not render text or images, only lines and shapes.

      Glen

      March 9, 2011 at 4:46 pm

    • Hello

      you’re a wee bit lazy, Gamba ;-) didn’t you read the documentation that comes along with the downloadable distribution?
      The User Guide (see userGuide.pdf file) features an Appendix (§ A. Samples) which is a complete directory keyed by topic to find the sample relevant to your use.
      Furthermore, if you walk through the pdfclown.samples.cli project, you can immediately spot a sample file called “PrintingSample.java”… so damn easy! ;-)

      Anyway, please note that printing functionality is currently pre-alpha (as stated in the above-mentioned documentation; see ISSUES), so it’s not expected to produce complete output (at the moment, for example, there’s no support for text rendering). It will be expanded in the next releases.

      Thank you
      Stefano

      stechio

      March 9, 2011 at 4:56 pm

