[Scribus] Kerning problems with pdftotext

Wed May 23 21:45:07 CEST 2007

We are using pdftotext to strip the text out of PDFs in preparation
for search indexing and more. This works well except with our own PDFs
(produced in Scribus), which come out badly broken up; we suspect
kerning is the cause. The generated text is fragmented into
meaningless chunks. It remains in sequential order and some words are
fine, but for the most part the output is not usable.
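
For anyone who wants to reproduce this, here is a minimal sketch of how
the extraction step could be scripted and the fragmentation flagged. It
is not our actual setup: the helper names and the short-token heuristic
are hypothetical. It assumes pdftotext is on the PATH and relies only on
its "-layout" option and on "-" as the output file (write to stdout).
The output encoding depends on the pdftotext build, so the decode step
is a guess.

```python
import subprocess


def build_pdftotext_cmd(pdf_path, layout=False):
    """Build the pdftotext command line; '-' sends the text to stdout."""
    cmd = ["pdftotext"]
    if layout:
        # -layout asks pdftotext to keep the physical page layout.
        cmd.append("-layout")
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, layout=False):
    """Run pdftotext (assumed to be installed) and return its output."""
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, layout),
        capture_output=True,
        check=True,
    )
    # Assumption: UTF-8 output; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")


def short_token_ratio(text, max_len=2):
    """Hypothetical heuristic: share of whitespace-separated tokens that
    are max_len characters or shorter. A high ratio suggests words were
    split into fragments."""
    tokens = text.split()
    if not tokens:
        return 0.0
    short = sum(1 for t in tokens if len(t) <= max_len)
    return short / len(tokens)
```

Comparing short_token_ratio(extract_text(path)) for a Scribus PDF
against a PDF that extracts cleanly would give a rough measure of how
badly the text is broken up; any threshold would need tuning.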

We are using (the great) Bitstream Vera, which looks good both on
screen and in print. However, we get the same effect when we convert
our text to Arial.

1. Has anybody experienced this? Is this a pdftotext thing?
2. Are there alternative PDF-to-text parsers that anyone would recommend?

Lucien
Oxford Information Labs
