[Scribus] New wiki page about frameslist.py
Gregory Pittman
gpittman
Sat Oct 27 18:26:11 CEST 2007
After improving the script a bit, I have created a wiki page:
http://wiki.scribus.net/index.php/Extracting_All_Text_from_a_Document
which includes the script, frameslist.py, for doing this. The script now
recognizes text and image frames by frame type rather than by name. I've
also dealt with the duplication problem in linked frames by testing for
this -- see sample output and note at the bottom of the page.
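
For readers who don't have the wiki page open, the general idea might be
sketched roughly as below. This is not the frameslist.py from the wiki. It
assumes the Scripter functions pageCount, gotoPage, getAllObjects,
getObjectType and getAllText; list_frames is a hypothetical helper that
takes the scribus module as a parameter so it can be tried outside Scribus;
and the duplicate test (skip a text frame whose text has already been seen)
is a guess, not necessarily the test the script uses.

```python
def list_frames(api):
    """Collect (page, name, kind, text) for text and image frames.

    `api` is assumed to be the `scribus` module, or any stand-in object
    offering the same functions. This helper is illustrative only.
    """
    results = []
    seen_texts = set()
    for page in range(1, api.pageCount() + 1):
        api.gotoPage(page)
        for name in api.getAllObjects():
            # Decide by frame type, not by the frame's name.
            kind = api.getObjectType(name)
            if kind == "TextFrame":
                text = api.getAllText(name)
                # Assumed duplicate test: linked frames can report the
                # same text, so skip non-empty text already collected.
                if text and text in seen_texts:
                    continue
                seen_texts.add(text)
                results.append((page, name, kind, text))
            elif kind == "ImageFrame":
                results.append((page, name, kind, ""))
    return results
```

Inside Scribus this would be called as list_frames(scribus) after an
"import scribus"; two unlinked frames that happen to hold identical text
would also be collapsed by this simple test.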
Something I discovered as I tested it out is that Scribus's files have
once again become not well-formed by XML standards, with the inclusion
of '^M' (Ctrl-M) for hard carriage returns. It doesn't interfere with
this script, but will pose a problem for XML parsing, and interestingly,
when I used 'cat' to show the text file's contents in the console, lines
with Ctrl-M did not display. kedit shows the Ctrl-M as carriage
returns, emacs displays '^M'. This was run in 1.3.3.10svn and also in
1.3.4 (don't have 1.3.5svn on this computer).
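
For anyone who wants to check their own files, a small sketch along these
lines should locate the Ctrl-M characters. The function name
find_carriage_returns is made up for illustration; it only uses the
Python standard library and reads the file in binary mode so the carriage
returns are not translated away. The 'cat' behaviour is probably just the
terminal obeying the carriage return and overwriting the line it had
already printed.

```python
def find_carriage_returns(path):
    """Return (line_number, column) pairs, 1-based, for every CR byte."""
    hits = []
    # Binary mode: text mode may silently convert or drop "\r".
    with open(path, "rb") as handle:
        data = handle.read()
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        column = line.find(b"\r")
        while column != -1:
            hits.append((line_number, column + 1))
            column = line.find(b"\r", column + 1)
    return hits
```

An empty list means the file contains no carriage returns at all.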
Greg