[scribus] Scribus sla to epub (export) q. (calibre does not work)
Richard Foley
rich.inud at naktiv.net
Thu Jun 5 13:29:57 UTC 2014
Hi Eric,
Many thanks for the helpful info which looks quite promising.
I'm trying to get this to work on SuSe 13.1 and am struggling with the
libraries you require. I managed to get poppler installed and am now doing
battle with fontforge, which I have downloaded from your fork, and compiled:
http://fontforge.sourceforge.net/index.html#source
fontforge_full-20120731-b.tar
and now get:
$> fontforge --version
Copyright (c) 2000-2012 by George Williams.
Executable based on sources from 14:57 GMT 31-Jul-2012.
Library based on sources from 14:57 GMT 31-Jul-2012.
fontforge 20120731
libfontforge 20120731
but from the pdf2htmlEX/ directory:
$> cmake .
-- checking for module 'libfontforge>=2.0.0'
-- package 'libfontforge>=2.0.0' not found
CMake Error at /usr/share/cmake/Modules/FindPkgConfig.cmake:279 (message):
A required package was not found
Call Stack (most recent call first):
/usr/share/cmake/Modules/FindPkgConfig.cmake:333 (_pkg_check_modules_internal)
CMakeLists.txt:75 (pkg_check_modules)
-- Configuring incomplete, errors occurred!
So, not being a regular c-compiler-nerd, I'm a bit stuck. Any ideas welcome...
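
One thing that may be worth checking: the error comes from FindPkgConfig.cmake, so cmake is asking pkg-config for a module named libfontforge, not running the fontforge binary. The sketch below only asks pkg-config the same question; the helper name pkg_config_version is made up for illustration. If it reports nothing, one possible cause (an assumption, not a confirmed diagnosis) is that the libfontforge.pc file was installed in a directory that pkg-config does not search by default, in which case adding that directory to the PKG_CONFIG_PATH environment variable may help. Another is that the .pc file reports a version below 2.0.0.

```python
import shutil
import subprocess
from typing import Optional


def pkg_config_version(module: str) -> Optional[str]:
    """Return the version pkg-config reports for `module`, or None.

    None means pkg-config is not installed, or it cannot find a
    matching .pc file on its search path.
    """
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(
        ["pkg-config", "--modversion", module],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = pkg_config_version("libfontforge")
    if version is None:
        print("pkg-config cannot see libfontforge")
    else:
        print("pkg-config sees libfontforge", version)
```
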
--
Ciao
Richard Foley
Supporting Naked Activities
http://www.naktiv.net
On Thu, Jun 05, 2014 at 11:30:18AM +0200, Eric Dodémont wrote:
> I have studied the PDF to ePub fixed layout conversion these last weeks and
> wrote down my findings in a little ebook (20 pages):
>
> A Practical Guide to Convert a PDF File to an ePub Version 3 Fixed Layout
> File: With Free Open Source Tools.
> https://play.google.com/store/books/details?id=1pytAwAAQBAJ
>
> This is the beginning of the book (the rest is mainly technical material on
> converting from PDF to HTML, then from HTML to ePub):
>
> Chapter 1: Fixed Layout
>
> Different file formats exist for fixed layout ebooks. Below is a list of the
> main ones:
>
> - PDF (Portable Document Format) [.pdf]
> - DjVu (Déjà Vu) [.djvu]
> - ePub (electronic Publication) [.epub]
> - Apple iBooks (similar to ePub) [.ibooks]
> - Amazon Kindle (similar to ePub) [.kf8]
>
> In this book, we will focus mainly on the conversion of a PDF file to a
> fixed layout ePub file. This has been possible since version 3 of the ePub
> format, which now includes a fixed layout mode in addition to the
> traditional flowing text mode.
>
> This type of conversion can be very useful because page layout programs
> (e.g. Scribus) always export the final result as a PDF (optimized
> for paper or online publication).
>
> The "ePub 3.0 Fixed Layout (FXL) Format Specifications" published by the
> International Digital Publishing Forum (IDPF) can be found here:
>
> http://www.idpf.org/epub/fxl
>
> A "Field Guide to Fixed Layout for E-Books" published by the Book Industry
> Study Group (BISG) is available for free here:
>
> http://www.bisg.org/publications/field-guide-fixed-layout-e-books
>
> The ePub version 3 format uses all the modern Web technologies like HTML5,
> CSS3, JS, SVG, XML, XHTML, WOFF, etc.
>
> Important remarks:
>
> 1) This book is only about fixed layout ePub. Fixed layout can be used if
> the book has a sophisticated layout with lots of images. Such fixed layout
> books are made with desktop publishing (DTP) programs like Scribus, Adobe
> InDesign, Quark XPress, or Microsoft Publisher. For books with only text or
> with few images, a flowing text ePub is more suitable and easier to produce.
>
> 2) Most PDF to ePub converters do not work for sophisticated layouts
> because they convert a fixed layout PDF into a flowing text ePub, which
> most of the time gives an ugly and unusable result unless the file is
> heavily adapted. They just extract the text and the images from the PDF,
> and put them sequentially into a flowing text ePub with all the layout gone.
>
> 3) Most ePub viewers do not (yet) support fixed layout. If you
> try to display a fixed layout ePub with such a viewer, the result will be
> ugly and unusable. Two good ePub viewers supporting fixed layout are
> Google Play Books (for tablets running Google Android or Apple iOS
> (iPad)) and Readium (a Google Chrome browser extension, for laptops or
> desktops running Microsoft Windows, Apple OS X (Mac), or GNU Linux).
> Most of the time, small screens are not suitable for fixed
> layout books. Such books should be read on tablets, not on smartphones.
>
> * Conversion Methods
>
> There are three main methods to convert a PDF file to an ePub fixed layout
> file:
>
> 1) Method 1: Bitmap image only + Hidden text
>
> Each ePub page is a bitmap image (PNG8, possibly PNG24 or JPEG) of an exact
> replica of the PDF page. This bitmap image is the result of the rendering
> of the text (using vector fonts), bitmap images, and vector images. To
> maintain accessibility (select text, copy/paste text, search text, text to
> speech, etc.), an invisible text layer is added on top of the image. This
> is also how a PDF file is converted to a DjVu file. Some PDF files
> are also made like that, mainly when they are the result of scanning paper
> books (the text layer is produced by OCR).
>
> 2) Method 2: Image + Text
>
> Probably the best method, though more sophisticated than the first one, is to
> place on each ePub page either a bitmap image (JPEG, possibly PNG) that
> combines all the bitmap and vector images of the PDF page, or a combined
> bitmap and vector image (SVG). The text is not converted to a bitmap image
> or inserted in the SVG file, but added to the ePub page using XHTML5 and
> CSS3. The CSS uses: a) absolute positioning to put the text at exactly the
> same place as in the PDF page; b) styles and fonts to make the text look
> exactly the same as in the PDF page. These last two steps are challenging,
> because HTML5 cannot always do what the PDF format can; many free and
> commercial tools exist, but most of the time they cannot do this correctly
> when it comes to fixed layout.
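
To make the image + text idea concrete, here is a minimal sketch of what one such page could look like. It is not what pdf2htmlEX or any other tool actually emits: the function name fxl_page, the tuple layout of the text runs, and the file name in the usage example are assumptions for illustration only. The page is a background image with each text run placed by CSS absolute positioning, plus the viewport meta element that fixed layout XHTML pages use to declare their size.

```python
from html import escape


def fxl_page(width, height, background, runs):
    """Build one fixed layout XHTML page as a string.

    background: path of the image holding all non-text content (hypothetical).
    runs: iterable of (x, y, font_size, text) tuples, in CSS pixels.
    """
    spans = "\n".join(
        f'  <span style="position:absolute; left:{x}px; top:{y}px; '
        f'font-size:{size}px; white-space:pre;">{escape(text)}</span>'
        for (x, y, size, text) in runs
    )
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta charset="utf-8"/>
  <title>Page</title>
  <meta name="viewport" content="width={width}, height={height}"/>
</head>
<body style="margin:0; position:relative; width:{width}px; height:{height}px;">
  <img src="{escape(background)}" alt=""
       style="position:absolute; left:0; top:0; width:{width}px; height:{height}px;"/>
{spans}
</body>
</html>
"""


if __name__ == "__main__":
    print(fxl_page(600, 800, "bg1.jpg", [(50, 100, 12, "Chapter 1: Fixed Layout")]))
```

Matching the fonts (point b above) is not covered here; a real page would also need @font-face rules pointing at the fonts extracted from the PDF.
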
>
> 3) Method 3: SVG only
>
> The bitmap images, the vector images, and the text are embedded in SVG
> files (one SVG per page). The text should be rendered as true text (with
> fonts), not just outlines of the glyphs (vector images). Also called: SVG
> in the spine (no XHTML).
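
For comparison, a sketch of the SVG-only variant: one SVG file per page, with the image referenced by an image element and the text kept as real text elements rather than glyph outlines. The function name svg_page and the layout of the text runs are again assumptions made for this example, not the output of any particular converter.

```python
from html import escape


def svg_page(width, height, image_href, runs):
    """Build one SVG page as a string.

    runs: iterable of (x, y, font_family, font_size, text) tuples.
    Text stays selectable because it is emitted as <text>, not as paths.
    """
    texts = "\n".join(
        f'  <text x="{x}" y="{y}" font-family="{escape(family)}" '
        f'font-size="{size}">{escape(text)}</text>'
        for (x, y, family, size, text) in runs
    )
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     viewBox="0 0 {width} {height}">
  <image xlink:href="{escape(image_href)}" x="0" y="0"
         width="{width}" height="{height}"/>
{texts}
</svg>
"""


if __name__ == "__main__":
    print(svg_page(600, 800, "bg1.jpg", [(50, 100, "serif", 12, "Fixed Layout")]))
```
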
>
> In the rest of this book, I will focus only on the second method
> (image + text).
>
> * Conversion Tools
>
> There are free open source and commercial tools to convert PDF to
> ePub3-fxl, but some have drawbacks. For example, one of these tools gives a
> very good visual result, but the text accessibility has a problem: no
> spaces are present. The tool puts words at the correct positions, but does
> not care about the spaces between the words. When you copy/paste a phrase,
> all the spaces are gone. Or, if you search for a word, the word is not found
> (unless the word is between parentheses, for example). In effect, each
> phrase is one very long word.
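
A small illustration of the missing-spaces problem, assuming the reader's search matches whole words (that is an assumption; a plain substring search would behave differently). The sample words are invented. Without spaces a word is only found when punctuation such as parentheses happens to separate it from its neighbours.

```python
import re


def whole_word_found(word, text):
    """True if `word` occurs in `text` delimited by word boundaries."""
    return re.search(rf"\b{re.escape(word)}\b", text) is not None


words = ["The", "quick", "(brown)", "fox"]
no_spaces = "".join(words)     # what the text layer contains without spaces
with_spaces = " ".join(words)  # what it should contain

if __name__ == "__main__":
    print(no_spaces)                              # Thequick(brown)fox
    print(whole_word_found("quick", no_spaces))   # False
    print(whole_word_found("brown", no_spaces))   # True, thanks to the parentheses
    print(whole_word_found("quick", with_spaces)) # True
```
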
>
> The tool and the method I will describe below are free, and give a very good
> result for both the visual aspect and the text accessibility. The tool I
> will use is pdf2htmlEX, developed by Lu Wang (pseudonym: coolwanglu), a
> Chinese PhD student at the Department of Computer Science and Engineering
> of the Hong Kong University of Science and Technology. You can find it here:
>
> http://coolwanglu.github.io/pdf2htmlEX
>
> This tool, as its name tells us, converts the PDF pages to HTML
> pages, and does not produce an ePub file. To get an ePub3-fxl file, I will
> show how to use the output of pdf2htmlEX to create the ePub3-fxl
> file. This mainly means: a) removing the HTML viewer that pdf2htmlEX produces
> and integrates in the result; b) creating all the files required by the ePub
> format and wrapping the result into a single file.
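
As a rough idea of what step b) involves, here is a sketch of the container side of an ePub 3 fixed layout file. It is not the procedure from the book: the function names, the OEBPS/ folder, and the file names are choices made for this example, and the result has not been run through a validator. It covers only XHTML pages; images, fonts, and CSS would also need manifest entries. What it does show: the mimetype entry goes first and uncompressed, META-INF/container.xml points at the package file, and the package metadata declares rendition:layout as pre-paginated.

```python
import io
import zipfile
from html import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_opf(identifier, title, modified, names):
    """Package document listing the pages in reading order."""
    items = "\n".join(
        f'    <item id="p{i}" href="{escape(n)}" media-type="application/xhtml+xml"/>'
        for i, n in enumerate(names, 1))
    refs = "\n".join(f'    <itemref idref="p{i}"/>' for i in range(1, len(names) + 1))
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="pub-id"
         prefix="rendition: http://www.idpf.org/vocab/rendition/#">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">{escape(identifier)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
    <meta property="rendition:layout">pre-paginated</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
{items}
  </manifest>
  <spine>
{refs}
  </spine>
</package>
"""


def build_nav(names):
    """Minimal navigation document with one entry per page."""
    entries = "\n".join(
        f'      <li><a href="{escape(n)}">Page {i}</a></li>'
        for i, n in enumerate(names, 1))
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body>
  <nav epub:type="toc">
    <ol>
{entries}
    </ol>
  </nav>
</body>
</html>
"""


def build_epub(identifier, title, modified, pages):
    """Return the ePub as bytes. pages: list of (file name, XHTML string)."""
    names = [name for name, _ in pages]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must come first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", build_opf(identifier, title, modified, names))
        zf.writestr("OEBPS/nav.xhtml", build_nav(names))
        for name, xhtml in pages:
            zf.writestr("OEBPS/" + name, xhtml)
    return buf.getvalue()
```

The bytes returned by build_epub can be written to a file ending in .epub; the modified argument is a timestamp such as 2014-06-05T00:00:00Z.
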
>
> Best regards,
>
> Eric Dodémont
>
>
> On 5 June 2014 11:16, Peter Nermander <peter at nermander.se> wrote:
>
> > > It doesn't fix my problem, but it helps understand why it's sufficiently
> > > complex that the tool is not there, yet. The original point still stands
> > > though, and this makes it clearer, (at least to me), why Scribus is the
> > > right place to export the PDF, which Scribus knows how to write. Therefore
> > > it would also know how to export the epub correctly as well. I think.
> > >
> > >
> > No, it's still not that easy. Seems I have to take an example.
> >
> > Imagine that on each page you have 3 pictures with a caption. The caption
> > is next to the picture (not above or below). The picture and caption are
> > separate frames.
> >
> > Now, the pictures alternate between being at the left side (with the
> > caption to the right) and at the right side (with the caption to the
> > left). When you export to epub you surely want all the captions to go
> > either above or below each picture (same for all pictures). But could you
> > describe the algorithm Scribus should use to decide in what order it shall
> > export the pictures and the captions?
> >
> > Going from top left to bottom right will not work well. Note also that
> > going from top left to bottom right can be done sideways first (most
> > relevant for this case) or down first (more relevant for a regular 2 column
> > layout).
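
Peter's example can be tried out with a few lines of code. The sketch below is only an illustration with invented coordinates and names (it has nothing to do with Scribus internals): three pictures alternating left and right, each with its caption beside it, sorted in the two "top left to bottom right" orders he mentions. Neither order puts every caption consistently after its own picture.

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    name: str
    x: float  # left edge of the frame
    y: float  # top edge of the frame


def across_then_down(frames: List[Frame]) -> List[str]:
    """Read each row left to right, then move down."""
    return [f.name for f in sorted(frames, key=lambda f: (f.y, f.x))]


def down_then_across(frames: List[Frame]) -> List[str]:
    """Read each column top to bottom, then move right."""
    return [f.name for f in sorted(frames, key=lambda f: (f.x, f.y))]


frames = [
    Frame("picture 1", 0, 0), Frame("caption 1", 100, 0),
    Frame("caption 2", 0, 100), Frame("picture 2", 100, 100),
    Frame("picture 3", 0, 200), Frame("caption 3", 100, 200),
]

if __name__ == "__main__":
    print(across_then_down(frames))
    # ['picture 1', 'caption 1', 'caption 2', 'picture 2', 'picture 3', 'caption 3']
    print(down_then_across(frames))
    # ['picture 1', 'caption 2', 'picture 3', 'caption 1', 'picture 2', 'caption 3']
```

In the first order caption 2 comes before picture 2; in the second, captions end up next to the wrong pictures. A correct export would need to know which caption belongs to which picture, and position alone does not say that.
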
> >
> > /Peter
> > ___
> > Scribus Mailing List: scribus at lists.scribus.net
> > Edit your options or unsubscribe:
> > http://lists.scribus.net/mailman/listinfo/scribus
> > See also:
> > http://wiki.scribus.net
> > http://forums.scribus.net
> >