[scribus] As if you’re not already sick of indexing

John Jason Jordan johnxj at comcast.net
Thu Nov 18 00:49:43 CET 2010


First, let me say that I agree fully with everything that John Culleton
has written about what constitutes a good index. For me, the secret to
success has always been to try to place myself in the shoes of the
reader who is trying to find something. Even if I am the author of the
book it is hard to think of all the terms that a reader will use to
look something up. Yet I must do so, lest the reader become frustrated.
And therefore, occasionally I need a term in the index that doesn’t
even appear in the book. 

My first efforts at indexing were for books that I had written myself
and laid out in an old version of PageMaker – too old to have an
indexing feature. Much as John does with tyro.tcl, I created the index
with a paper copy of the book next to me, leafing through page by page.
Except that instead of a separate program like tyro.tcl I just used
Word. I even remember having figured out that for sub-items I needed to
do Shift-Enter to create a new line, so that when I alphabetized the
index on the “paragraph” Word would keep the sub-items with the head
item.

Much later I created a series of books using Adobe InDesign CS. By the
time of the CS version InDesign had a built-in indexing utility.
Although the user interface needed some work, I found it delightful to
be able to page through the document with InDesign and add index
entries electronically. 

Now I am using Scribus, and sooner or later I am going to have to index
a book. I could always go back to the word processor approach, or I
could use John’s tyro.tcl utility, but I think I have a different
approach that I like better. Mind you, tyro.tcl is a wonderful tool,
but it still requires working with a paper copy of the book. I’d like
to be able to select a term on a page on screen and add it to the index.

While reading all the recent comments about indexing I was struck by
one fact that John mentioned: if you open a PDF of your book in Adobe
Reader and export it as text, Reader will put a page marker at the end
of each page of the PDF. I was unaware of this. In fact, I just
tried it with each of the eight PDF viewers that I have installed, and
Adobe Reader is the only one that will put page markers in the text.
Okular and PDFEdit will export as text, but do not put page markers in
the file. GSView will also export as text, but when I opened the file I
discovered that the encoding was so messed up that the text was
unreadable. The rest of them – Cabaret, Foxit, Evince, jPDF-Tweak –
couldn’t even export as text, although perhaps you could select the
text and copy and paste.
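
A minimal Python sketch of how those page markers could be used from a
script. It assumes the marker is a form feed character ("\f"), which is
an assumption and not something confirmed above; split_pages and
pages_containing are hypothetical names chosen for illustration.

```python
import re

FORM_FEED = "\f"  # assumed page marker; adjust if the export uses something else


def split_pages(text, marker=FORM_FEED):
    """Split exported text into a list of page strings (page 1 is index 0)."""
    pages = text.split(marker)
    # A marker at the end of the last page leaves an empty trailing chunk.
    if pages and not pages[-1].strip():
        pages.pop()
    return pages


def pages_containing(pages, term):
    """Return 1-based page numbers on which term occurs as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [n for n, page in enumerate(pages, start=1) if pattern.search(page)]


# Made-up three-page sample standing in for an exported book.
sample = "The flier is ready.\fNothing here.\fPrint the flier twice.\f"
pages = split_pages(sample)
print(len(pages))
print(pages_containing(pages, "flier"))
```

With the made-up sample this prints 3 and then [1, 3]. A real export
would be read from the text file instead of the sample string.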

For the above experiments I used tyro.pdf because it was handy and
small, yet its 11 pages were enough to see whether things worked.

After exporting the PDF to text from Adobe Reader, I diverged from
John’s workflow. He opens the text file in Gvim and
replaces the page markers with code so TeX will understand the page
break. I wanted to avoid TeX. I also wanted a GUI that was as good as I
could get. And it turns out I have a GUI text editor right here at my
fingertips – OOo Writer. Not only that, but it has a fairly usable
index utility. All that remained was to see if Writer would recognize
the page breaks in the text file exported from Reader. And sure enough,
it does recognize the page breaks. The only difficulty was that
sometimes a page in the PDF became longer than one page in Writer. When
I opened the text file that I created by exporting tyro.pdf to text
from Adobe Reader, the Writer document became 14 pages. Pages 1, 3 and
6 slopped over and Writer created an extra page. I solved this problem
by changing the page size in Writer from the US letter default (8.5 x
11 inches) to “User size” where I specified 8.5 by 16 inches.

Now I can create an index all on the screen without needing a paper
copy.* When all the entries have been marked I can generate the index.
Then I can copy and paste into Scribus, or save as a Writer file and
import into Scribus (which saves styles). I haven’t fully tested the
Writer indexing utility but it appears to be able to index one word
under multiple terms in the index. For example, the word “flier” can be
indexed as “flier” and also as “brochure.” It can also create “See” and
“See also” entries.
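
To make the idea of one word under several headings concrete, here is a
small Python sketch. It is not how Writer works internally; build_index,
the page numbers, and the "leaflet" cross-reference are all hypothetical
and only illustrate the structure of such an index.

```python
from collections import defaultdict


def build_index(entries, see=None):
    """entries: iterable of (heading, page); see: dict of heading -> target."""
    pages_by_heading = defaultdict(set)
    for heading, page in entries:
        pages_by_heading[heading].add(page)

    lines = []
    headings = set(pages_by_heading) | set(see or {})
    for heading in sorted(headings, key=str.lower):
        if heading in pages_by_heading:
            nums = ", ".join(str(p) for p in sorted(pages_by_heading[heading]))
            lines.append(f"{heading}, {nums}")
        else:
            # Heading with no pages of its own: emit a cross-reference.
            lines.append(f"{heading}. See {see[heading]}")
    return lines


# The word "flier" on (made-up) pages 3 and 7 is indexed under two headings.
entries = [("flier", 3), ("flier", 7), ("brochure", 3), ("brochure", 7)]
see = {"leaflet": "flier"}  # hypothetical "See" entry

for line in build_index(entries, see):
    print(line)
```

This prints "brochure, 3, 7", "flier, 3, 7" and "leaflet. See flier", one
per line. Page numbers are kept in a set so that marking the same word
twice on one page does not repeat the number.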

*There is one flaw in my method. What if there is a graphic on a page
in the PDF, and you want an index entry that points to that graphic?
Exporting as text from the PDF will strip the graphic. Ditto for any
text in the Scribus document that has been converted to outlines. You
can still create an index entry on the page; you just have to apply it
to a word on the page. I figured I’d just type in “graphic” on the page
and apply the entry to that word. The problem is knowing that there is a
graphic or outlined text on the page. Using a paper copy would solve the
problem. But I can do it by paging through the PDF in Reader and making
note of the location of any graphics that I want to reference in my
index. Outlined text would be harder to spot, but I hardly ever do that.

Until Scribus gets its own built-in indexing tool I have a workaround
that works for me.


