[Scribus] importing HTML

bulk at katskave.com bulk
Thu Oct 30 07:01:47 CET 2003


-- Carol Kankelborg said:
> A simple tagging scheme a la HTML would suffice for this I think.

-- ephemeron said: 
> HTML import should be sufficient for most people.  HTML is by far
> the most common formatted text format.  I even receive it in my
> mailbox ;-)

I was looking at what a Scribus doc looks like in a plain text editor 
and it looks kinda like HTML, so I'm guessing it's XML.  Is there a URL 
to the DTD for a Scribus doc?  Or is it somewhere in the source tarball 
(I didn't see it, but I could have missed it - I'm not used to looking 
through so many files)?  Cuz I *think* that if I have that, I could 
write a script that turns a *very simple* HTML doc into a Scribus 
doc with one text frame and some simple formatting - basically only 
italics, bold, underline, superscript, and subscript.  I would have 
the script strip out and ignore any other tags - kinda like the 
way a web browser ignores any tags it doesn't know.
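
Here is a rough sketch of just the tag-filtering part, in Python rather 
than Perl. It only shows the "keep five tags, ignore the rest" idea: it 
turns HTML into a list of (text, styles) runs and does NOT write 
anything in the Scribus file format. The names KEPT_TAGS, 
SimpleFormatParser and html_to_runs are made up for illustration, and 
mapping <i>, <b>, <u>, <sup> and <sub> onto the five styles is an 
assumption about what "very simple HTML" means here.

```python
from html.parser import HTMLParser

# Tags the script keeps (assumed mapping to italics, bold, underline,
# superscript, subscript). Every other tag is dropped, but its text stays.
KEPT_TAGS = {"i", "b", "u", "sup", "sub"}


class SimpleFormatParser(HTMLParser):
    """Collect (text, styles) runs, ignoring tags outside KEPT_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.active = []  # currently open kept tags, in opening order
        self.runs = []    # list of (text, frozenset of tag names)

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this.
        if tag in KEPT_TAGS:
            self.active.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag; ignore stray ends.
        if tag in self.active:
            idx = len(self.active) - 1 - self.active[::-1].index(tag)
            del self.active[idx]

    def handle_data(self, data):
        if data:
            self.runs.append((data, frozenset(self.active)))


def html_to_runs(html):
    """Return the formatted runs found in an HTML string."""
    parser = SimpleFormatParser()
    parser.feed(html)
    parser.close()
    return parser.runs
```

For example, html_to_runs("plain <b>bold</b>") gives one run with no 
styles and one run tagged "b". Writing those runs out as a Scribus text 
frame would be a separate step that depends on the real file format.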

I don't want to make any promises, but unless I'd be stepping on 
someone's toes - or someone knows that this is a really stupid idea - I'd 
like to try.  I would write it originally in Perl, but I'm sure if I 
can get it to work it won't be that hard to learn enough Python to 
convert it.  :)  And maybe once it was a Python script I could learn how 
to turn it into a plug-in that imports an HTML file into a new text 
frame in the current document?  But that's kinda far off right now.

Back to the point, I tested it and I can copy and paste a text frame 
from one Scribus doc to another and keep the formatting intact.  So if 
I can get this to work, it might be the start of a way to import with 
some very basic formatting?

So uh, I guess this is a very long way of asking whether a Scribus doc 
really is an XML file and, if so, where the DTD is?
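
One way to check the "is it really XML" half without a DTD: a file can 
be well-formed XML even if no DTD exists for it, so an XML parser either 
accepts it or it doesn't. A minimal sketch using Python's standard 
library follows. The function name describe_xml is made up, and the 
element names in the example call are placeholders, not real Scribus 
element names.

```python
import xml.etree.ElementTree as ET


def describe_xml(text):
    """Return (root tag, sorted child tags) if text is well-formed XML.

    Returns None if the parser rejects it. This checks well-formedness
    only; it does not validate against any DTD.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None
    return root.tag, sorted({child.tag for child in root})


# Placeholder markup, only to show the shape of the result.
print(describe_xml("<doc><page/><page/><frame>hi</frame></doc>"))
```

To try it on a saved document, read the file's bytes (for example with 
open(path, "rb").read()) and pass them to describe_xml. A None result 
would mean the file is not plain well-formed XML.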

Thanks for reading all of this.

-- mary



