[scribus] Hyphenation
Andreas Vox
avox at arcor.de
Thu Oct 27 20:19:07 UTC 2011
John Jason Jordan <johnxj at ...> writes:
...
> I agree with Gregory that hyphenation is not perfect in many programs.
> And Gregory has an excellent point about short syllable breaks (e.g.
> "re-ceived") being sometimes ambiguous, leading the reader to have to
> pause or re-read a line to connect the first and last parts of the
> hyphenated word. I'm not sure how a layout program can fix this,
> however.
...
>
> I don't know how Scribus does its hyphenation. And if it does use an
> algorithm, switching to dictionary-based hyphenation just for English
> may be impractical. Nevertheless, I wanted to point out that
> hyphenation is more problematic than just deciding whether to base it
> on the entire paragraph or one line at a time.
Scribus uses the same algorithm as TeX and OO.o. But that ingenious
method is really also a dictionary approach: to create the hyphenation rules,
a large corpus of text is fed into the generation program. The program then
tries to condense that information into a ruleset, which contains rules like
"if you see the pattern 'xyz', assume a good break position at 1 and a bad
break position at 2, unless it also matches 'pxyzq', in which case the best
break position is 4, unless..."
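A minimal Python sketch of how such a ruleset is applied, assuming Liang-style patterns as used by TeX. Each digit in a pattern sits in a gap between letters. When patterns overlap, the highest digit at each gap wins. An odd winner means a break is allowed there, and an even one forbids it. The function names and the tiny pattern list are illustrative only. They are not Scribus code or a real pattern file.

```python
def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a TeX-style pattern like 'hy3ph' into its letters and the
    values of the gaps (one before each letter, plus one after the last)."""
    letters = "".join(c for c in pattern if not c.isdigit())
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)  # value for the gap before letters[pos]
        else:
            pos += 1
    return letters, values


# Toy pattern set, for illustration only; a real language file is far larger.
PATTERNS = [parse_pattern(p) for p in
            ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io"]]


def hyphenate(word: str, patterns=PATTERNS, left_min: int = 2, right_min: int = 2) -> list[str]:
    padded = f".{word.lower()}."       # '.' marks the word boundaries
    points = [0] * (len(padded) + 1)   # points[j]: value of the gap before padded[j]
    for letters, values in patterns:
        start = padded.find(letters)
        while start != -1:             # apply the pattern at every place it matches
            for k, v in enumerate(values):
                points[start + k] = max(points[start + k], v)  # highest value wins
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for m in range(left_min, len(word) - right_min + 1):
        if points[m + 1] % 2 == 1:     # odd = break allowed, even = break forbidden
            pieces.append(word[last:m])
            last = m
    pieces.append(word[last:])
    return pieces


print(hyphenate("hyphenation"))  # ['hy', 'phen', 'ation']
```

The `left_min` and `right_min` parameters play the role of TeX's `\lefthyphenmin` and `\righthyphenmin`. A left minimum of 2 still lets a break like "re-ceived" through.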
The result is a file with hundreds of short patterns which mark good and
bad break positions (prioritized 1-5, iirc). Then the whole corpus is tested
with this algorithm, and the remaining words which aren't hyphenated correctly
(usually just a dozen or so) are put into an exception list.
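The exception list can be sketched as a lookup consulted before the patterns. This sketch assumes that exceptions are written with explicit hyphens, and that an entry without hyphens means "never break this word". The `fallback` parameter stands in for a pattern-based hyphenator. All names here are hypothetical.

```python
from typing import Callable


def parse_exceptions(entries: list[str]) -> dict[str, list[str]]:
    """Map each plain word to its fragments, from entries written with
    explicit hyphens ('ta-ble'); an entry without hyphens is never broken."""
    return {e.replace("-", "").lower(): e.lower().split("-") for e in entries}


def hyphenate_with_exceptions(word: str,
                              exceptions: dict[str, list[str]],
                              fallback: Callable[[str], list[str]]) -> list[str]:
    fragments = exceptions.get(word.lower())
    if fragments is None:
        return fallback(word)  # not an exception: use the pattern-based hyphenator
    pieces, pos = [], 0
    for frag in fragments:
        pieces.append(word[pos:pos + len(frag)])  # re-slice to keep the capitalization
        pos += len(frag)
    return pieces


exc = parse_exceptions(["ta-ble", "present"])
print(hyphenate_with_exceptions("Table", exc, lambda w: [w]))    # ['Ta', 'ble']
print(hyphenate_with_exceptions("present", exc, lambda w: [w]))  # ['present']
```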
I don't know of any program that handles problems like "re-ceive" properly.
With a paragraph layouter it should be possible to add extra penalties
for such breaks, so that the layouter would automatically try to avoid them.
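Here is a rough sketch of the penalty idea. It assumes a much simplified total-fit line breaker: fixed-width characters, squared leftover space as the cost of a line, and a flat cost for every hyphenated line. On top of that, a break leaving `short_len` letters or fewer on either side of the hyphen pays an extra `short_penalty`. This is not Knuth-Plass and not what Scribus does. `break_paragraph`, `hyphen_penalty`, `short_penalty` and `short_len` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Piece:
    text: str   # one hyphenation fragment of a word
    kind: str   # what follows it: "space", "hyphen" (break point inside a word) or "end"
    left: int   # letters of the word up to and including this fragment
    right: int  # letters of the word after this fragment


def build_pieces(words: list[list[str]]) -> list[Piece]:
    """Each word is given as its fragments, e.g. ["re", "ceived"]."""
    pieces = []
    for w, frags in enumerate(words):
        total, seen = sum(len(f) for f in frags), 0
        for k, frag in enumerate(frags):
            seen += len(frag)
            if k < len(frags) - 1:
                kind = "hyphen"
            elif w < len(words) - 1:
                kind = "space"
            else:
                kind = "end"
            pieces.append(Piece(frag, kind, seen, total - seen))
    return pieces


def line_text(pieces: list[Piece], i: int, j: int) -> str:
    """Render pieces i..j as one line, adding a hyphen if the line ends inside a word."""
    out = []
    for k in range(i, j + 1):
        out.append(pieces[k].text)
        if k < j and pieces[k].kind == "space":
            out.append(" ")
    if pieces[j].kind == "hyphen":
        out.append("-")
    return "".join(out)


def break_paragraph(words, width, hyphen_penalty=10, short_penalty=0, short_len=2):
    pieces = build_pieces(words)
    n = len(pieces)
    best = [math.inf] * (n + 1)  # best[j]: cheapest way to set pieces[0:j]
    prev = [0] * (n + 1)         # where the last line of that solution starts
    best[0] = 0
    for j in range(n):           # candidate line ending after pieces[j]
        for i in range(j + 1):   # candidate line starting at pieces[i]
            if best[i] == math.inf:
                continue
            text = line_text(pieces, i, j)
            if len(text) > width:
                continue
            # the last line may be short for free; other lines pay for leftover space
            cost = 0 if pieces[j].kind == "end" else (width - len(text)) ** 2
            if pieces[j].kind == "hyphen":
                cost += hyphen_penalty
                if min(pieces[j].left, pieces[j].right) <= short_len:
                    cost += short_penalty  # extra charge for breaks like "re-ceived"
            if best[i] + cost < best[j + 1]:
                best[j + 1], prev[j + 1] = best[i] + cost, i
    if best[n] == math.inf:
        raise ValueError("no layout fits the given width")
    lines, j = [], n
    while j > 0:
        lines.append(line_text(pieces, prev[j], j - 1))
        j = prev[j]
    return lines[::-1]


words = [["we"], ["re", "ceived"], ["it"]]
print(break_paragraph(words, 8))                    # ['we re-', 'ceived', 'it']
print(break_paragraph(words, 8, short_penalty=50))  # ['we', 'received', 'it']
```

With a width of 8, the default settings hyphenate the toy paragraph at "re-". With `short_penalty=50`, the same paragraph is set without that break. The breaker accepts a loose first line rather than the ambiguous short fragment.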
/Andreas