[Scribus] pdftk

Thu Feb 9 09:47:30 CET 2006

On Fri, Feb 03, 2006 at 01:38:13PM +0100, PLinnell wrote:
> On Friday 03 February 2006 06:40, Bart Alberti wrote:
> > The author of pdftk has a book, PDF Hacks, published by O'Reilly,
> > which is mostly a collection of recipes for dealing with PDF files.
> > However, there is a scheme for using a plug-in to vim or gvim which
> > uncompresses the PDF and puts it in the text editor. Since vim is
> > scriptable (I don't personally know how to script it), this may be a
> > clue to the long-sought PDF import and/or editing.
> > Bart Alberti
> > _______________________________________________
> > Scribus mailing list
> > Scribus at nashi.altmuehlnet.de
> > http://nashi.altmuehlnet.de/mailman/listinfo/scribus
> 
> PDF editing is vastly more complex than editing the raw PDF source in 
> vi. One of the major stumbling blocks is the non-linear nature of 
> PDF.
> 
> It will take a powerful parser to be able to edit PDF natively.

I'm beginning to believe that a traditional "parser" is the best way to
absolutely muck up PDF editing and import. You really need an
implementation of the PDF document and file structure that follows the
non-linear PDF model - reads the xref table and knows how to find
indirect objects, etc. Alongside that you need a (probably reasonably
simple) parser that can read PDF objects and traverse nested arrays and
dictionaries. You also need decoders for the PDF stream filter
algorithms. A final component would be a PDF content stream parser, but
it might well be very simplistic and dumb, depending on your needs.
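To make the "reasonably simple" object parser concrete, here is a minimal Python sketch. It assumes only a small subset of the object syntax (no hex strings, no escapes or nested parentheses in literal strings, no #xx name escapes, no comments). It tokenizes the input, recursively walks nested arrays and dictionaries, and treats `<int> <int> R` as an indirect reference to be resolved later through the xref table. The names `tokenize`, `parse_object` and `Ref` are hypothetical and not taken from any existing library.

```python
import re
from typing import NamedTuple

# Assumption: a deliberately small subset of PDF object syntax. A real reader
# would also need hex strings, string escapes, #xx name escapes and comments.
TOKEN_RE = re.compile(
    rb"\s*(?:"
    rb"(<<|>>|\[|\])"                # 1: dictionary/array delimiters
    rb"|(/[^\s/\[\]<>()]*)"          # 2: names such as /Type
    rb"|\(([^()\\]*)\)"              # 3: simple literal strings
    rb"|([+-]?\d*\.\d+|[+-]?\d+)"    # 4: real and integer numbers
    rb"|(true|false|null|R)"         # 5: keywords
    rb")"
)


class Ref(NamedTuple):
    """An indirect reference such as `3 0 R` (object number, generation)."""
    num: int
    gen: int


def tokenize(data: bytes) -> list:
    tokens, pos = [], 0
    while pos < len(data):
        m = TOKEN_RE.match(data, pos)
        if m is None:
            if data[pos:].strip() == b"":
                break  # only trailing whitespace is left
            raise ValueError(f"unsupported syntax at offset {pos}")
        pos = m.end()
        kind, text = m.lastindex, m.group(m.lastindex)  # exactly one group matches
        if kind == 1:
            tokens.append(("delim", text))
        elif kind == 2:
            tokens.append(("name", text[1:].decode("latin-1")))
        elif kind == 3:
            tokens.append(("string", text))
        elif kind == 4:
            s = text.decode("ascii")
            tokens.append(("number", float(s) if "." in s else int(s)))
        else:
            tokens.append(("kw", text.decode("ascii")))
    return tokens


def parse_object(tokens: list, i: int = 0):
    """Parse one object starting at tokens[i]; return (object, next_index)."""
    kind, val = tokens[i]
    if (kind, val) == ("delim", b"["):
        items, i = [], i + 1
        while tokens[i] != ("delim", b"]"):
            item, i = parse_object(tokens, i)  # arrays may nest arbitrarily
            items.append(item)
        return items, i + 1
    if (kind, val) == ("delim", b"<<"):
        d, i = {}, i + 1
        while tokens[i] != ("delim", b">>"):
            if tokens[i][0] != "name":
                raise ValueError("dictionary keys must be names")
            key = tokens[i][1]
            value, i = parse_object(tokens, i + 1)  # values may be any object
            d[key] = value
        return d, i + 1
    if kind == "number" and isinstance(val, int):
        # `<int> <int> R` is one indirect reference, not two numbers.
        if (i + 2 < len(tokens) and tokens[i + 1][0] == "number"
                and isinstance(tokens[i + 1][1], int)
                and tokens[i + 2] == ("kw", "R")):
            return Ref(val, tokens[i + 1][1]), i + 3
    if kind == "kw":
        if val == "R":
            raise ValueError("stray R keyword")
        return {"true": True, "false": False, "null": None}[val], i + 1
    return val, i + 1  # numbers, names and strings


page, _ = parse_object(tokenize(
    b"<< /Type /Page /Parent 3 0 R /MediaBox [0 0 612 792] /Rotate 0.0 >>"))
print(page)
# -> {'Type': 'Page', 'Parent': Ref(num=3, gen=0), 'MediaBox': [0, 0, 612, 792], 'Rotate': 0.0}
```

The parser never resolves `Ref` values itself; that is left to the file-structure layer, which is what keeps this piece simple.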

By recognising that PDF is a series of related data formats in a
container, you should make your life a LOT easier. I'm working on this
for PDF output at the moment, and I think that in time I can adapt my
design for PDF input, processing and editing as well.
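The "container" view can be illustrated with another hypothetical Python sketch: find `startxref`, read a classic cross-reference table, jump straight to an indirect object by its byte offset, then run the stream through its filter (only FlateDecode here, via zlib). It assumes an uncompressed xref table rather than an xref stream, a single revision with no incremental updates, and a direct `/Length`. The function names are illustrative only, and `build_sample_pdf` merely assembles a one-object file to exercise the reader.

```python
import re
import zlib


def find_startxref(data: bytes) -> int:
    """Return the byte offset recorded after the last `startxref` keyword."""
    idx = data.rfind(b"startxref")
    return int(data[idx + len(b"startxref"):].split()[0])


def read_xref_table(data: bytes, offset: int) -> dict:
    """Map object number -> byte offset for in-use entries of a classic xref table."""
    lines = data[offset:].splitlines()
    if lines[0].strip() != b"xref":
        raise ValueError("not a classic xref table (maybe an xref stream)")
    table, i = {}, 1
    while not lines[i].startswith(b"trailer"):
        start, count = map(int, lines[i].split())  # subsection header
        for k in range(count):
            off, _gen, kind = lines[i + 1 + k].split()
            if kind == b"n":  # 'f' entries are free objects
                table[start + k] = int(off)
        i += 1 + count
    return table


def read_stream(data: bytes, table: dict, num: int) -> bytes:
    """Locate object `num` via the xref table and decode its stream data."""
    start = table[num]
    kw = data.index(b"stream", start)
    body = kw + len(b"stream")
    body += 2 if data[body:body + 2] == b"\r\n" else 1  # EOL after `stream`
    # Assumption: regex instead of a real dictionary parse; direct /Length only.
    header = data[start:kw]
    m = re.search(rb"/Length\s+(\d+)(?!\d|\s+\d+\s+R)", header)
    if m is None:
        raise ValueError("indirect or missing /Length is not handled here")
    raw = data[body:body + int(m.group(1))]
    return zlib.decompress(raw) if b"/FlateDecode" in header else raw


def build_sample_pdf(content: bytes) -> bytes:
    """Assemble a tiny one-object file purely to exercise the functions above."""
    body = zlib.compress(content)
    head = b"%PDF-1.4\n"
    obj = (b"1 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(body)
           + body + b"\nendstream\nendobj\n")
    xref_offset = len(head) + len(obj)
    tail = (b"xref\n0 2\n"
            b"0000000000 65535 f \n"
            + b"%010d 00000 n \n" % len(head)
            + b"trailer\n<< /Size 2 >>\nstartxref\n%d\n%%%%EOF\n" % xref_offset)
    return head + obj + tail


pdf = build_sample_pdf(b"BT (Hello) Tj ET")
xref = read_xref_table(pdf, find_startxref(pdf))
print(read_stream(pdf, xref, 1))  # the decoded content stream
```

Each layer (file structure, object syntax, stream filters, content stream) stays small because none of them tries to understand the others' formats.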

I shudder at the thought of trying to write a comprehensive grammar for
PDF as a whole using a traditional parser generator. Nasty, inefficient,
error-prone, and generally not a nice prospect, I suspect.

-- 
Craig Ringer