When filling, GNU troff hyphenates words as needed at
user-specified and automatically determined hyphenation points. The
machine-driven determination of hyphenation points in words requires
algorithms and data, and is susceptible to conventions and preferences.
Before tackling such automatic hyphenation, let us consider how
hyphenation points can be set explicitly.
Explicitly hyphenated words such as “mother-in-law” are eligible for
breaking after each of their hyphens. Relatively few words in a
language offer such obvious break points, however, and automatic
detection of syllabic (or phonetic) boundaries for hyphenation is not
perfect,(86) particularly for
unusual words found in technical literature. We can instruct GNU
troff how to hyphenate specific words if the need arises.
The hw request defines each argument word (comprising ordinary, special, or indexed characters) as a hyphenation exception word such that each occurrence of a hyphen-minus ‘-’ in word indicates a hyphenation point. For example, the request
.hw in-sa-lub-rious alpha
marks potential hyphenation points in “insalubrious”, and prevents “alpha” from being hyphenated at all.
Besides the space character, any character whose hyphenation code is
zero can be used to separate the arguments (see the hcode request
below).
Hyphenation points specified with hw are not subject to the
within-word placement restrictions imposed by the hy request (see
below).
Hyphenation exception words are associated with the hyphenation language
(see the
hla
request below);
invoking the
hw
request in the absence of a hyphenation language is an error.
Each hyphenation language maintains an independent set
of hyphenation exception words.
The formatter ignores the request if it lacks arguments.(87)
Obtain a report of hyphenation exception words
on the standard error stream
with the phw request.
See Debugging.
These are known as
hyphenation exception words
in the expectation that most users
will avail themselves of automatic hyphenation;
these exceptions override any rules
that would normally apply to a word
matching a hyphenation exception word defined with
hw.
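The following input fragment is a minimal sketch of the hw and phw requests used together. It assumes English (‘en’) as the hyphenation language. Because a localization file normally selects a language at startup, the hla request may be unnecessary in practice.

```roff
.\" Exception words belong to a hyphenation language, so one must
.\" be selected first; "en" is an assumption for this sketch.
.hla en
.\" Permit breaks only at the marked points of "insalubrious" and
.\" forbid hyphenation of "alpha" entirely.
.hw in-sa-lub-rious alpha
.\" Report the exception words of the current hyphenation language
.\" on the standard error stream.
.phw
```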
Situations also arise when only a specific occurrence of a word needs its hyphenation altered or suppressed, or when a URL or similar specialized text needs to be breakable in sensible places without hyphenation.
To tell GNU troff how to hyphenate words as they occur in input,
use the \% escape sequence; it is the default hyphenation
character. Each instance within a word indicates to GNU troff
that the word may be hyphenated at that point, while prefixing a word
with this escape sequence prevents it from being otherwise hyphenated.
This mechanism affects only that occurrence of the word; to change the
hyphenation of a word for the remainder of input processing, use the
hw request.
GNU troff regards the escape sequences \X and \Y as
starting a word; that is, the \% escape sequence in, say,
‘\X'...'\%foobar’ or ‘\Y'...'\%foobar’ no longer
prevents hyphenation of ‘foobar’ but inserts a hyphenation point
just prior to it; most likely this isn’t what you want.
See Postprocessor Access.
\: inserts a non-printing break point; that is, a word can break
there, but the soft hyphen glyph (see below) is not written to the
output if it does. The remainder of the word is subject to hyphenation
as normal.
You can combine \: and \% to control breaking of a file
name or URL, or to permit hyphenation only after certain explicit
hyphens within a word.
The \%Lethbridge-Stewart-\:\%Sackville-Baggins divorce was, in retrospect, inevitable once the contents of \%/var/log/\:\%httpd/\:\%access_log on the family web server came to light, revealing visitors from Hogwarts.
The hc request changes the hyphenation character to char. This character then
works as the \% escape sequence normally does, and thus no longer
appears in the output.(88) Without an
argument, hc resets the hyphenation character to \% (the
default). The hyphenation character is associated with the environment
(see Environments).
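As a sketch, the caret could serve as the hyphenation character where typing \% repeatedly is inconvenient. The choice of ‘^’ and the sample words are arbitrary assumptions.

```roff
.\" Use the caret as the hyphenation character in this environment.
.hc ^
.\" "data^base" may break at the caret; the leading caret keeps
.\" "Hogwarts" from being hyphenated. No caret is output.
The data^base logged visitors from ^Hogwarts.
.\" Without an argument, hc restores the default, \%.
.hc
```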
The shc request sets the soft hyphen character, inserted when a word is hyphenated
automatically or at a hyphenation character, to the ordinary or special
character c.(89) If the argument is omitted, the soft
hyphen character is set to the default, \[hy]. If no glyph for
c exists in the font in use at a potential hyphenation point, then
the line is not broken there. Neither character definitions (specified
with the char and similar requests) nor translations (specified
with the tr request) are applied to c.
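GNU troff sets the soft hyphen character with the shc request. The sketch below substitutes an en dash for \[hy]; this choice is purely illustrative. As noted above, the font in use must have a glyph for the character, or the line is not broken at that hyphenation point.

```roff
.\" Illustrative only: mark automatic and \%-induced hyphenation
.\" with an en dash instead of the default hyphen.
.shc \[en]
.\" ... text ...
.\" Without an argument, shc restores the default, \[hy].
.shc
```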
Several requests influence automatic hyphenation. Because conventions
vary, a variety of hyphenation modes is available to the hy
request; these determine whether hyphenation will apply to a
word prior to breaking a line at the end of a page (more or less; see
below for details), and at which positions within that word
automatically determined hyphenation points are permissible. The places
within a word that are eligible for hyphenation are determined by
language-specific data and lettercase relationships. Furthermore,
hyphenation of a word might be suppressed due to a limit on
consecutive hyphenated lines (hlm), a minimum line length
threshold (hym), or because the line can instead be adjusted with
additional inter-word space (hys).
The hy request sets the automatic hyphenation mode to mode, an integer encoding
conditions for hyphenation; if omitted, the configured hyphenation mode
default (see below) is implied. The hyphenation mode is available in
the read-only register ‘.hy’; it is associated with the environment
(see Environments). The hyphenation mode default depends on the
localization file loaded when GNU troff starts up; see the
hpf request below. If no localization file is loaded, the
default is ‘1’.
Typesetting practice generally does not avail itself of every
opportunity for hyphenation, but the details differ by language and site
mandates. The hyphenation modes of AT&T troff were
implemented with English-language publishing practices of the 1970s in
mind, not a scrupulous enumeration of conceivable parameters. GNU
troff extends those modes such that finer-grained control is
possible, favoring compatibility with older implementations over a more
intuitive arrangement. The means of hyphenation mode control is a set
of numbers that can be added up to encode the behavior
sought.(90)
The entries in the following table
are termed
values;
the sum of the desired values is the
mode.
| value | effect |
|---|---|
| 0 | disables hyphenation. |
| 1 | enables hyphenation except after the first and before the last character of a word. |
| 2 | disables hyphenation of the last word on a page or column,(91) even for explicitly hyphenated words. |
| 4 | disables hyphenation before the last two characters of a word. |
| 8 | disables hyphenation after the first two characters of a word. |
| 16 | enables hyphenation before the last character of a word. |
| 32 | enables hyphenation after the first character of a word. |

Values 2 through 32 “imply” 1; that is, they enable hyphenation under the same conditions as ‘.hy 1’, and then apply or lift restrictions relative to that basis.
Apart from value 2, restrictions imposed by the hyphenation mode
are not respected for words whose hyphenations have been
specified with the hyphenation character (‘\%’ by default) or the
hw request.
Nonzero values in the previous table are additive. For example,
mode 12 causes GNU troff to hyphenate neither the last two
nor the first two characters of a word. Some values cannot be used
together because they contradict; for instance, values 4 and 16,
and values 8 and 32. As noted, it is superfluous to add 1 to any
non-zero even mode.
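The following fragment sketches the arithmetic. Each mode is the sum of values from the table above, and the read-only register ‘.hy’ reports the result.

```roff
.\" 12 = 4 + 8: hyphenate neither after the first two nor before
.\" the last two characters of a word.
.hy 12
.tm hyphenation mode: \n[.hy]
.\" 6 = 2 + 4: as mode 4, and also leave the last word on a page
.\" or column unhyphenated.
.hy 6
```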
The automatic placement of hyphens in words is determined by pattern files, which are derived from TeX and available for several languages. These files are named hyphen.xx (for the patterns) and hyphenex.xx (for a list of exceptions in languages that require them) where xx is an ISO 639 language code; see the table below.
The minimum number of characters at the beginning of a word that must precede the first hyphenation point is determined by the patterns themselves; it can’t be reduced without introducing additional, invalid hyphenation points. (Unfortunately, this information is not part of a pattern file; you have to know it in advance.) The same is true of the minimum number of characters at the end of a word that must follow the last hyphenation point. For example, you can supply the following input to ‘echo $(nroff)’.
.ll 1
.hy 48
splitting
You will get
s- plit- t- in- g
instead of the correct ‘split- ting’. English patterns as distributed
with GNU troff need two characters at the beginning and three
characters at the end; this means that value 4 of hy is
mandatory. Value 8 is possible as an additional restriction, but
values 16 and 32 should be avoided, as should mode 1.
Modes 4 and 6 are typical.
A table of left and right minimum character counts for hyphenation as
needed by the patterns distributed with
GNU
troff follows.(92)
| language | pattern name | left min | right min |
|---|---|---|---|
| Czech | cs | 2 | 2 |
| English | en | 2 | 3 |
| French | fr | 2 | 3 |
| German traditional | det | 2 | 2 |
| German reformed | den | 2 | 2 |
| Italian | it | 2 | 2 |
| Russian | ru | 2 | 2 |
| Spanish | es | 2 | 2 |
| Swedish | sv | 1 | 2 |
Hyphenation exceptions within pattern files
(that is,
the words within a
TeX
\hyphenation
group)
obey hyphenation restrictions imposed by hy.
The nh request disables automatic hyphenation; that is, it sets the hyphenation mode to 0
(see above). The hyphenation mode of the last call to hy is not
remembered, but invoking hy without an argument restores the
hyphenation mode default; groff’s localization macro files do so
for the languages listed above.
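For instance, automatic hyphenation can be suspended for a stretch of text and then restored. This sketch assumes the configured hyphenation mode default is what the document wants afterward.

```roff
.\" Turn off automatic hyphenation (mode 0).
.nh
.\" ... text that should not be hyphenated automatically ...
.\" Restore the hyphenation mode default.
.hy
```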
Set hyphenation mode default to
mode,
configuring the value the automatic hyphenation mode takes if
hy
is invoked without an argument.
The hyphenation mode default is available in the read-only register
‘.hydefault’;
it is associated with the environment.(93)
The hpf request reads hyphenation patterns from pattern-file, which is sought
in the same way that macro files are with the mso request or the
-m mac command-line option to groff. The
pattern-file should have the same format as (simple) TeX
pattern files. More specifically, the following scanning rules are
implemented.
Digraphs like \$ are not supported.
^^xx (where each x is 0–9 or a–f) and
^^c (character c in the code point range 0–127
decimal) are recognized; other uses of ^ cause an error.
hpf checks for the expression \patterns{…}
(possibly with whitespace before or after the braces). Everything
between the braces is taken as hyphenation patterns. Consequently,
{ and } are not allowed in patterns.
\hyphenation{…} gives a list of hyphenation
exceptions.
\endinput is recognized also.
If \patterns is missing, the whole
file is treated as a list of hyphenation patterns (except that the
% character is recognized as the start of a comment).
The hpfa request appends a file of patterns to the current list.
GNU
troff ties the set of hyphenation patterns
to the hyphenation language code
selected by the
hla
request
(see below).
The
hpf
request is usually invoked
by a localization file loaded by the
troffrc
file.(94)
A second call to hpf (for the same language) replaces the
hyphenation patterns with the new ones. Invoking hpf or
hpfa causes an error if there is no hyphenation language. If no
hpf request is specified (either in the document, in a file
loaded at startup, or in a macro package), GNU troff won’t
automatically hyphenate at all.
Caution:
The
hpf
and
hpfa
requests interpret the remainder of the input line as the file name
argument,
including any spaces,
up to a newline or comment escape sequence.
Suffixing the file name with a comment,
even an empty one,
prevents unwanted space from creeping into it during source document
maintenance.(95)
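The sketch below loads English patterns and exceptions by hand, roughly as a localization file might. The file names follow the hyphen.xx and hyphenex.xx convention described above. Each file name argument ends with an empty comment, as the caution recommends.

```roff
.\" hpf and hpfa require a hyphenation language.
.hla en
.\" Replace any English patterns with those in hyphen.en.
.hpf hyphen.en\"
.\" Append the English exception list.
.hpfa hyphenex.en\"
```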
For automatic hyphenation to work,
the formatter must know which letters are equivalent.
For example,
the letter ‘E’ behaves like ‘e’;
only the latter typically appears in hyphenation pattern files.
GNU
troff expects characters
that participate in automatic hyphenation
to be assigned
hyphenation codes
that define these equivalence classes.
At startup,
GNU
troff assigns hyphenation codes to the letters ‘a’–‘z’,
applies the same codes to ‘A’–‘Z’
in one-to-one correspondence,
and assigns a code of zero to all other characters.
The
hcode
request enables application of hyphenation codes
to characters outside the Unicode basic Latin set;
without doing so,
words containing such letters
won’t hyphenate properly
even if the corresponding hyphenation patterns contain them.
Localization files for the input character set and language
configure hyphenation codes;
see
groff_tmac(5).
The hcode request sets the hyphenation code of ordinary or special character dst1 to
that of src1, and so on. dst1 must be an ordinary character
(other than a numeral) or a special character, and src1 must be an
ordinary character (other than a numeral) or a special character to
which a hyphenation code has already been applied. Assigning the code
of an ordinary character to itself effectively creates a unique
hyphenation code (which can then be copied to others). hcode
ignores spaces between arguments. If any argument is invalid,
hcode reports an error and stops reading them.
For example, the following hcode requests are necessary to assign
hyphenation codes to the letters ‘ÄäÖöÜüß’, needed for German.
.hcode ä ä Ä ä
.hcode ö ö Ö ö
.hcode ü ü Ü ü
.hcode ß ß
Without these assignments, GNU troff treats the German word
‘Kindergärten’ (the plural form of ‘kindergarten’) as two words
‘kinderg’ and ‘rten’ because the hyphenation code of the
umlaut a is zero by default, just like a space. There is a German
hyphenation pattern that covers ‘kinder’, so GNU troff
finds the hyphenation ‘kin-der’. The other two hyphenation points
(‘kin-der-gär-ten’) are missed.
To remove a character’s hyphenation code, copy the code of a character with a hyphenation code value of zero to it. For example, ‘.hcode ß $’ removes the hyphenation code from ‘ß’ (unless ‘$’ has already been assigned a different one).
The pchar request may be helpful to troubleshoot hyphenation code
assignments. See Debugging.
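To check the German assignments shown above, one might request a report on the affected characters. Like that example, this sketch assumes that ‘ä’ and ‘Ä’ arrive as ordinary characters in the input.

```roff
.\" Report on each character (see Debugging); compare the output
.\" for the two letters after the hcode requests above.
.pchar ä
.pchar Ä
```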
Caution: The hpfcode request will be withdrawn in a future
groff release. Use hcode instead.
The hpfcode request defines mapping values for character codes in
pattern files. It is an older mechanism no longer used by GNU
troff’s own macro files. hpf or hpfa apply the
mapping after reading the patterns but before replacing or appending to
the active list of patterns. Its arguments are pairs of character
codes—integers from 0 to 255. The request maps character
code a to code b, code c to
code d, and so on. Character codes that would otherwise be
invalid in GNU troff can be used.
The hla request sets the hyphenation language to lang, or clears it if there is no
argument. Hyphenation exceptions specified with the hw request
and hyphenation patterns and exceptions specified with the hpf
and hpfa requests are associated with the hyphenation language.
The hla request is usually invoked by a localization file, which
is in turn loaded by the troffrc or troffrc-end file; see the
hpf request above.
The hyphenation language is available in the read-only string-valued register ‘.hla’; it is associated with the environment (see Environments).
If no hyphenation language is set or no patterns are loaded,
GNU
troff does not perform automatic hyphenation.
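Because patterns and exceptions are tied to the hyphenation language, a document can switch languages and back without reloading anything for the original one. In this sketch, the string name ‘saved-hla’ is hypothetical, and the French pattern file is assumed to be available as hyphen.fr.

```roff
.\" Save the current language; .hla is a string-valued register.
.ds saved-hla \n[.hla]\"
.\" Switch to French and load its patterns; the set already
.\" loaded for the previous language is left alone.
.hla fr
.hpf hyphen.fr\"
.\" ... French text ...
.\" Return to the saved language, whose patterns are still loaded.
.hla \*[saved-hla]
```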
The hlm request sets the maximum number of consecutive hyphenated lines to n. If
n is negative, there is no maximum. If omitted, n
is -1. This value is associated with the environment
(see Environments). Only lines output from a given environment
count toward the maximum associated with that environment. Hyphens
resulting from \% are counted; explicit hyphens are not.
The .hlm read-only register stores this maximum. The count of
immediately preceding consecutive hyphenated lines is available in the
read-only register .hlc.
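A minimal sketch follows. The limit of two lines is an arbitrary example value.

```roff
.\" Allow at most two consecutive hyphenated lines in this environment.
.hlm 2
.\" Report the limit and the current run of hyphenated lines.
.tm limit \n[.hlm], current run \n[.hlc]
```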
The hym request sets the (right) hyphenation margin to length. If the adjustment mode is not ‘b’ or ‘n’, the line is not hyphenated if it is shorter than length. Without an argument, the hyphenation margin is reset to its default value, 0. The default scaling unit is ‘m’. The hyphenation margin is associated with the environment (see Environments).
A negative argument resets the hyphenation margin to zero.(96)
The hyphenation margin is available in the .hym read-only
register.
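The following is a sketch for ragged-right text. The 2m margin is an arbitrary example value, and the unit is written out even though ‘m’ is the default.

```roff
.\" The margin applies only outside adjustment modes b and n.
.ad l
.\" Set the hyphenation margin to two ems.
.hym 2m
.tm hyphenation margin: \n[.hym]
```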
The hys request suppresses hyphenation of the line in adjustment modes ‘b’ or ‘n’ if that adjustment can be achieved by adding no more than hyphenation-space extra space to each inter-word space. Without an argument, the hyphenation space adjustment threshold is set to its default value, 0. The default scaling unit is ‘m’. The hyphenation space adjustment threshold is associated with the environment (see Environments).
A negative argument resets the hyphenation space adjustment threshold to zero.(97)
The hyphenation space adjustment threshold is available in the
.hys read-only register.
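The following is a sketch for fully adjusted text. The half-em threshold is an arbitrary example value.

```roff
.\" The threshold applies in adjustment modes b and n.
.ad b
.\" Skip hyphenation when widening each inter-word space by at most
.\" half an em would suffice to adjust the line.
.hys 0.5m
.tm hyphenation space threshold: \n[.hys]
```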