[scribus] No spaces (code 32) in PDF files produced by Scribus
Gregory Pittman
gpittman at iglou.com
Sat Dec 13 14:36:38 UTC 2014
On 12/13/2014 08:47 AM, Eric Dodémont wrote:
> I am converting PDF files to fixed-layout ePub files, which mainly means
> converting PDF to HTML.
>
> I noticed something strange.
>
>
> When I convert PDF files produced by Scribus, the HTML displays very well,
> but when I copy and paste selected text, there are no spaces in it!
>
> E.g.:
>
>
> - On the screen you see: “The red car was behind the house.”
>
> - The copy/paste gives: “Theredcarwasbehindthehouse.”
>
> I found a PDF file produced by InDesign, and with that file the problem is
> not there.
>
> After inspecting the fonts embedded in the PDF files with Acrobat, I noticed:
>
>
> - InDesign: fonts contain the <SPACE> (code 0x20 in hexadecimal, 32 in decimal).
>
> - Scribus: fonts do not contain the <SPACE> (code 0x20 in hexadecimal, 32 in
> decimal).
>
> I am using PDFTron to convert PDF files to ePub files. When I use the
> pdf2htmlEX tool (available for Linux and Windows), the problem is not there.
>
> It seems that PDFTron will only insert a space in the text when the code 32
> is in the text.
>
> Why are there no spaces with code 32 in the PDF produced by Scribus?
>
> I know there are many different space characters: U+0020 SPACE, U+00A0
> NO-BREAK SPACE, U+2000 EN QUAD (1 en = 1/2 em), U+2001 EM QUAD (1 em), etc.
>
It may be a bit more complex than you think.
The first question is, where are you copying from? I presume you mean
highlighting text, then doing Ctrl+C or some equivalent. You might be
better off trying to use the PDF viewer to extract the text.
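As background on why results can differ between tools: text extractors do not
necessarily rely on a space glyph being present. A common approach is to infer
word breaks from the horizontal gaps between positioned glyphs. The following
is only a minimal sketch of that idea, not how any particular tool does it. The
glyph records (character, x position, advance width), the function name, and
the threshold value are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical glyph record: (character, x of left edge, advance width),
# all in the same units. A real extractor would read these from the PDF.
Glyph = Tuple[str, float, float]


def join_with_inferred_spaces(glyphs: List[Glyph], threshold: float = 1.0) -> str:
    """Rebuild a line of text, adding a space wherever the gap between two
    consecutive glyphs exceeds `threshold` (an assumed, tunable value)."""
    out = []
    prev_end = None
    for ch, x, width in glyphs:
        if prev_end is not None and x - prev_end > threshold:
            out.append(" ")
        out.append(ch)
        prev_end = x + width
    return "".join(out)


# "The red" laid out with no space glyph, only a horizontal jump after "e".
line = [("T", 0.0, 6.0), ("h", 6.0, 5.0), ("e", 11.0, 5.0),
        ("r", 19.0, 4.0), ("e", 23.0, 5.0), ("d", 28.0, 5.0)]
print(join_with_inferred_spaces(line))
```

A tool that only emits a space when code 32 appears in the text stream would
return the same glyphs run together, which matches the symptom described.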
Next, what encoding system are you using? Scribus uses UTF-8, but some
other piece of software might use something else.
I just tried this in Fedora, by opening a Scribus-generated PDF in Adobe
Reader, highlighting text, copying then pasting to a text editor (in
this case Emacs), and saw all the spaces as I'd expect.
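To check which space characters actually end up in the pasted text, rather
than judging by eye, something like the following could be used. It is a small
sketch using only Python's standard library. The function name and the sample
string are made up for illustration.

```python
import unicodedata
from collections import Counter


def space_report(text: str) -> Counter:
    """Count each space-like character in `text`, keyed by 'U+XXXX NAME'."""
    counts = Counter()
    for ch in text:
        # "Zs" is Unicode's space-separator category
        # (U+0020 SPACE, U+00A0 NO-BREAK SPACE, U+2000 EN QUAD, ...).
        if unicodedata.category(ch) == "Zs":
            counts[f"U+{ord(ch):04X} {unicodedata.name(ch)}"] += 1
    return counts


# Sample text with one no-break space mixed in among ordinary spaces.
pasted = "The red\u00a0car was behind the house."
for label, n in space_report(pasted).items():
    print(n, label)
```

An empty report for text that looks spaced on screen would suggest the spaces
are being lost during extraction, not swapped for another space character.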
Greg