[scribus] windows to linux charset issue for PDF annotation

Fri May 29 17:33:28 UTC 2020

On 5/29/20 1:16 PM, Gregory Pittman wrote:
> On 5/29/20 12:00 PM, JLuc wrote:
>> Hello
>> It's about a Scribus-created PDF, but this is not strictly related to Scribus:
>> A friend on Windows has proofread and annotated a Scribus PDF document.
>> When opening it on Ubuntu, some of the notes are readable,
>> but others are scrambled, and some seem to be cut off in the middle of the text.
>>
>> It could be related to notes containing accented or special characters such as " or «,
>> because, as far as I can tell, none of the readable notes contains such characters.
>> I've tried with Evince and Okular.
>>
>> Do you have any advice on how to read these notes correctly on Linux?
>> (Or on how to fix this in the annotation tool on Windows?)
> 
> Hi JLuc,
> 
> Here is an issue I noticed just yesterday, which might relate to your problem. I used a script called ExtractText.py, which writes the text content of a document to a plain text file. I have never seen problems with it before. When I ran it yesterday and tried to import the text into a new document, the line endings were wrong: viewing the file with less showed Ctrl-M (carriage return) characters instead of LF (line feed).
> 
> There are two ways to fix this text file. I used KWrite, which interpreted the carriage returns correctly; when I saved the file, they were all converted to LFs, and the text imported into Scribus properly.
> The other option is to use dos2unix on the command line:
> 
> dos2unix -n old.txt new.txt
> 
> I wrote this script, ExtractText.py, and it has never done this before, so something must have changed in Scribus so that it no longer uses UTF-8 consistently.
> 
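For anyone without dos2unix at hand, here is a minimal Python sketch of the same kind of fix, assuming the file is UTF-8 (or a single-byte encoding) so that CR and LF are plain single bytes. It first reports how many CRLF, lone CR and lone LF endings the file contains and whether it decodes as UTF-8, then writes a copy with Unix LF endings. Turning lone CRs into LF as well is an extra assumption of this sketch, in case the file uses bare carriage returns; the script name `fix_newlines.py` and all function names are hypothetical. Byte-level replacement like this is not safe for UTF-16 files.

```python
import sys


def describe_line_endings(data: bytes) -> dict:
    """Count CRLF, lone CR and lone LF line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "cr": data.count(b"\r") - crlf,  # CRs not followed by LF
        "lf": data.count(b"\n") - crlf,  # LFs not preceded by CR
    }


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_unix_newlines(data: bytes) -> bytes:
    """Turn CRLF, then any remaining lone CR, into LF; other bytes are untouched."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def main(src: str, dst: str) -> None:
    # Read raw bytes so no decoding step can alter the text.
    with open(src, "rb") as f:
        data = f.read()
    print(describe_line_endings(data), "utf-8:", is_utf8(data))
    with open(dst, "wb") as f:
        f.write(to_unix_newlines(data))


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Usage mirrors `dos2unix -n`: `python fix_newlines.py old.txt new.txt`. If the report shows the file is not valid UTF-8, fixing line endings alone will not recover the text.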

Interestingly, I tried using File > Export > Save Text, and this also messes up the saved text, but in a way that dos2unix cannot fix.

Greg