
Formatting Information — An introduction to typesetting with LATEX

Chapter 1: Writing documents

Section 1.9: Dimensions, hyphenation, justification, and breaking

LATEX’s internal measurement system is extremely accurate. The underlying TEX engine conducts all its business in units smaller than the wavelength of visible light, so if you ask for 15mm space, that’s what you’ll get — within the limitations of your screen or printer, of course. While modern high-resolution displays use pixels smaller than you can easily see, many older screens cannot show dimensions of less than 1⁄96″ without resorting to magnification or scaling; and on printers, even at 600dpi, fine oblique lines or curves can still sometimes be seen to stagger the dots.

At the same time, many dimensions in LATEX’s preprogrammed formatting are specially set up to be flexible: so much space, plus or minus certain limits to allow the system to make its own adjustments to accommodate variations like overlong lines, unevenly-sized images, and non-uniform spacing around headings.
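As a minimal sketch of such a flexible dimension, the example below sets the space between paragraphs to a natural size with permitted stretch and shrink. The values 6pt, 2pt, and 1pt are illustrative assumptions, not settings taken from any particular class.

```latex
\documentclass{article}
% A flexible ("rubber") length: 6pt natural size,
% which may stretch by up to 2pt or shrink by up to 1pt
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\begin{document}
First paragraph.

Second paragraph, separated from the first by the flexible space set above.
\end{document}
```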

TEX uses a very sophisticated justification algorithm to achieve a smooth, even texture to normal paragraph text by justifying a whole paragraph at a time, quite unlike the line-by-line approach used in most wordprocessors and DTP systems.

Occasionally, however, you will need to hand-correct an unusual word-break or line-break, and there are facilities for doing this on individual occasions as well as automating it for use throughout a document.

1.9.1 Specifying size units

Most people in the printing and publishing businesses in English-speaking cultures habitually use the traditional printers’ points, picas and ems as well as cm and mm when dealing with clients. Many older English-language speakers (and most North Americans) still use inches. In continental European and related cultures, Didot points and Ciceros (Didot picas) are also used professionally, but cm and mm are standard everywhere else: inches are only used now when communicating with North American cultures.

You can specify lengths in LATEX in any of these units, plus some others (see Table 1.3).
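The following sketch shows the same command, \hspace, given a length in several different units. The particular values are arbitrary and chosen only for illustration.

```latex
\documentclass{article}
\begin{document}
A\hspace{15mm}B  % millimetres

A\hspace{1.5cm}B % centimetres (the same distance as the line above)

A\hspace{0.5in}B % inches

A\hspace{36pt}B  % printers' points

A\hspace{3pc}B   % picas (3pc = 36pt, so the same as the line above)

A\hspace{2em}B   % ems of the current type size
\end{document}
```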

Figure 1.5: An M of type of different faces boxed at 1em


The red line is the common baseline. Surrounding letters in grey are for illustration of the actual extent of the height and depth of one em of the current type size.

The em can cause beginners some puzzlement because it’s based on the ‘point size’ of the type, which is itself misleading. The point size refers to the depth of the metal body on which foundry type was cast in the days of metal typesetting, not the printed height of the letters themselves (see Figure 1.4). Thus the letter-size of 10pt type in one typeface can be radically different from 10pt type in another (look at Figure 1.5, where the widths are given for 10pt type). An em is the height of the type-body in a specific size, so 1em of 10pt type is 10pt and 1em of 24pt type is 24pt. A special name is given to the 12pt em, a ‘pica’ em, and a pica has become a fixed measure in its own right. An old name for a 1em space is a ‘quad’, and LATEX has a command \quad for leaving exactly that much horizontal space.
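Because the em is relative to the current type size, the same command leaves a different amount of space at different sizes. A small sketch, assuming the default sizes of the article class:

```latex
\documentclass{article}
\begin{document}
left\quad right            % a 1em gap at the default type size

{\Large left\quad right}   % same command, wider gap: the em follows the type size

left\hspace{1em}right      % the same width as the first line, written as a length
\end{document}
```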

Table 1.3: Units in LATEX

Unit  Size

Printers’ fixed measures
pt    Anglo-American standard points (72.27 to the inch)
pc    pica ems (12pt)
bp    Adobe’s ‘big’ points (72 to the inch)
sp    TEX’s ‘scaled’ points (65,536 to the pt)
dd    Didot (European standard) points (67.54 to the inch)
cc    Ciceros (European pica ems, 12dd)

Printers’ relative measures
em    ems of the current point size (historically the width of a letter ‘M’ but see below)
ex    x-height of the current font (height of letter ‘x’)

Other measures
cm    centimeters (2.54 to the inch)
mm    millimeters (25.4 to the inch)
in    inches

To highlight the differences between typefaces at the same size, Figure 1.5 shows five capital Ms in different faces, each surrounded by a box exactly 1em wide at that size, and showing the actual width of each M when set in 10pt type. Because of the different ways in which typefaces are designed, none of them is exactly 10pt wide.

If you are working with other DTP users, watch out for those who think that Adobe points (bp) are the only ones. The difference between an Adobe big-point and the standard point is only 0.27pt per inch, but in 10″ of text (a full page of A4) that’s 2.7pt, which is nearly 1mm, enough to be clearly visible if you’re trying to align one sample with another.
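The arithmetic behind that figure can be written out as follows. This is only a worked check using the conversion factors from Table 1.3, and the display assumes the amsmath package for the aligned environment and \text.

```latex
\[
\begin{aligned}
10\,\mathrm{in} \times 72.27\,\mathrm{pt/in} &= 722.7\,\mathrm{pt} \\
10\,\mathrm{in} \times 72\,\mathrm{bp/in} &= 720\,\mathrm{bp} \\
722.7 - 720 &= 2.7 \quad\text{(the discrepancy in points over 10 inches)} \\
2.7\,\mathrm{pt} \times \frac{25.4\,\mathrm{mm/in}}{72.27\,\mathrm{pt/in}} &\approx 0.95\,\mathrm{mm}
\end{aligned}
\]
```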

1.9.2 Hyphenation

LATEX hyphenates automatically according to the language you use (see § 1.9.6). To specify different breakpoints for an individual word, you can insert soft-hyphens (discretionary hyphens), done with the \- command (backslash-hyphen) wherever you need them, for example:

When in Mexico, we visited Popo\-ca\-tépetl by 
helicopter.
        

If the word needs to be hyphenated, the best-fit of the points will be used, and the rest ignored.

To specify hyphenation points for all occurrences of a word in the document, use the \hyphenation command in your Preamble (see the penultimate sidebar ‘The Preamble’) with one or more words as patterns in its argument, separated by spaces. This will even let you break ‘helico-pter’ correctly. In this command you use normal hyphens in the pattern, not soft-hyphens.

\hyphenation{helico-pter Popo-ca-tépetl vol-ca-no}
        

If you have frequent hyphenation problems with long, unusual, or technical words, ask an expert about changing the value of \spaceskip, which controls the flexibility of the space between words. This is not something you would normally want to do without advice, as it can change the appearance of your document quite significantly.

If you are using a lot of unbreakable text (see the next section and also § 4.7.1) it may also cause justification problems. One possible solution to this is shown in § 7.3.

1.9.3 Breakable and unbreakable text

Unbreakable text is the opposite of discretionary hyphenation. To force LATEX to treat a word as unbreakable, use the \mbox command:

\mbox{pneumonoultramicroscopicsilicovolcanoconiosis}
	

This may have undesirable results, however, if you subsequently change margins or the size of the text: pneumonoultramicroscopicsilicovolcanoconiosis, although if you’re reading this in a browser, you probably won’t see the effect properly: look at the PDF.

Another option, for recurring words, is to use the \hyphenation command as shown in § 1.9.2, but give the words with no hyphens at all, which stops them having any break-points.
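A minimal sketch of this use of \hyphenation follows. The two words are hypothetical examples of names you might want kept whole; they are not taken from the text above.

```latex
\documentclass{article}
% Words listed with no hyphens have no permitted break-points,
% so they are never hyphenated anywhere in the document
\hyphenation{JavaScript PostScript}
\begin{document}
Some scripting is done in JavaScript and the output is PostScript.
\end{document}
```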

To tie two words together with an unbreakable space (hard space), use a tilde (~) instead of the space (see the list in § 1.6.1). This will print as a normal space but LATEX will never break the line at that point.
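For instance, a tie is commonly placed between a label and its number so that the number never starts a new line on its own. The sentence below is an invented example:

```latex
\documentclass{article}
\begin{document}
% Each ~ prints as a normal space but forbids a line-break at that point
See Figure~1.5 and Table~1.3, or turn to page~42 for the details.
\end{document}
```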

A normal space between words is always a candidate for a place to break the text into lines, and the word-spacing gets evened-out between all the remaining words...with one exception: a full point (period) after a lowercase letter is treated in LATEX as the end of a sentence, and it automatically puts a little more space before the next word. You do not (and should not) type any extra space yourself.

However, after abbreviations in mid-sentence like ‘Prof.’, it’s not the end of a sentence, so we need a way to tell LATEX that this should be a normal space. The command for doing this is \␣ (backslash-space — I have made the space visible here so you can see it, but it’s just a normal space). This prevents LATEX from adding the extra sentence-space and it also means it becomes a normal breakpoint (otherwise you would use the tilde as described above).

For example, it would look wrong to break the name Prof. D.E. Knuth at a line-end. It’s a good idea to make this standard typing practice for things like people’s initials followed by their surname, as Prof.\␣D.E.~Knuth.
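Put together in a small document, this might look like the sketch below (the second sentence is an invented example). The backslash is followed by an ordinary space character.

```latex
\documentclass{article}
\begin{document}
% "\ " after the abbreviation gives a normal, breakable word-space;
% "~" ties the initials to the surname so they cannot be separated
This algorithm is due to Prof.\ D.E.~Knuth.

% The same applies to other mid-sentence abbreviations ending in a full point
The appendix (ed.\ note: still in draft) is incomplete.
\end{document}
```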

1.9.4 Dashes

The hyphen (-) is only used for hyphenated compound words like editor-in-chief. LATEX inserts its own hyphens when it needs to break a word at the right-hand margin.

Dashes are different: they’re longer and they are used in different places. Check the last sidebar ‘If you don’t have accented letters on your keyboard’ for how to find these characters in your computer’s character-map.

Long dash

The long dash — what printers call an ‘em rule’ like this — is used to separate a short phrase from the surrounding text in a similar way to parentheses. If you’re using XƎLATEX, you can just type the long dash on your keyboard.

  • If you can’t find the character, type three hyphens together, like---this: LATEX will recognise this combination and replace it with a real em rule.

  • If you want space either side, bind the first hyphen to the preceding word with a tilde like~---␣this and use a normal space after the third hyphen (shown as a visible space here, but it’s just a normal space). This avoids the line being broken before the dash.

The difference between spaced and unspaced rules is purely æsthetic. Never use a single hyphen for this purpose.

Short dash

The short dash is used between digits like page ranges (35–47). Printers call this an ‘en-rule’ and if you’re not using XƎLATEX you can get it by typing two hyphens together, as in 35--47. Never use a single hyphen for this purpose either.

Minus sign

If you want a minus sign, use math mode (see § 1.10) where you type a normal hyphen between math delimiters like \(x=y-z\). Don’t use the hyphen for a minus sign outside math mode.

There are other dashes for special purposes in the Unicode repertoire, but they are out of scope for this document.
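The cases described in this section can be gathered into one short sketch. The sentences are invented for illustration, and the hyphen sequences are typed as shown so that they work whichever engine you use.

```latex
\documentclass{article}
\begin{document}
The editor-in-chief approved it.  % single hyphen: compound words only

See pages 35--47.                 % two hyphens: en rule for ranges

The long dash---like this---sets off a phrase.  % three hyphens: unspaced em rule

The long dash~--- like this~--- can also be spaced.  % tie before, normal space after

The result is \(x=y-z\).          % minus sign: a hyphen inside math mode
\end{document}
```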

1.9.5 Justification

The default mode for typesetting in LATEX is justified (two parallel margins, with word-spacing adjusted automatically for the best optical fit). In justifying, LATEX will never add space between letters, only between words. The soul package can be used if you need letter-spacing (‘tracking’), but this is best left to the expert.
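If you do want to try letter-spacing, a minimal sketch using the \so command from the soul package is shown here. It assumes plain unaccented text, and the amount of spacing is left at the package default.

```latex
\documentclass{article}
\usepackage{soul}
\begin{document}
% \so letter-spaces its argument
Normal text followed by \so{letterspaced text} for comparison.
\end{document}
```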

There are two commands \raggedright and \raggedleft which typeset with only one margin aligned. Ragged-right has the text ranged (aligned) on the left, and ragged-left has it aligned on the right. They can be used inside a group (curly-braces, for example: see the sidebar ‘Grouping’) to confine their action to a part of your text, or put in the Preamble if you want the whole document done that way. This paragraph is set ragged-right.
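A minimal sketch of the grouped form follows. Note that the paragraph has to end inside the group (here with an explicit \par), because the ragged setting is applied when the paragraph ends; the sample sentences are invented.

```latex
\documentclass{article}
\begin{document}
{\raggedright
This paragraph is set ragged-right because the command is inside the
group, and the paragraph ends before the closing brace.
\par}

This paragraph is outside the group, so it is justified as usual.
\end{document}
```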

These modes also exist as environments called raggedright and raggedleft which are more convenient when applying this formatting to a whole paragraph or more, like this one, set ragged-left.

\begin{raggedleft}
These modes also exist as environments
called raggedright and raggedleft which are more 
convenient when applying this formatting to a 
whole paragraph or more, like this one.
\par % end the paragraph inside the environment so the setting takes effect
\end{raggedleft}
        

Ragged setting turns off hyphenation. There is a package ragged2e providing the command \RaggedRight (note the capitalisation) which retains hyphenation in ragged setting, useful when you have a lot of long words. There’s a \RaggedLeft, too.
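A short sketch of the ragged2e form, with an invented sample sentence containing a long word:

```latex
\documentclass{article}
\usepackage{ragged2e}
\begin{document}
{\RaggedRight
In ragged-right setting with hyphenation retained, a long word such as
pneumonoultramicroscopicsilicovolcanoconiosis can still be broken at the
end of a line instead of leaving a very short line before it.
\par}
\end{document}
```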

To centre text, which is in effect both ragged-right and ragged-left at the same time, use the \centering command inside a group, or use the center environment.

Be careful when centering headings or other display-size material, and add manual linebreaks where needed (\\) to make the breaks at sensible pauses in the meaning (Flynn, 2012). Never rely on the automated line-breaking of editors in these cases.
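As an illustration, the sketch below centres a two-line heading with a manual break at a natural pause, first with the center environment and then with \centering in a group. The heading text and the \Large size are invented for the example.

```latex
\documentclass{article}
\begin{document}
\begin{center}
\Large An Introduction to Typesetting\\  % manual break at a pause in the meaning
with Dimensions and Line-Breaks
\end{center}

{\centering\Large An Introduction to Typesetting\\
with Dimensions and Line-Breaks\par}     % \par ends the paragraph inside the group
\end{document}
```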

1.9.6 Languages

LATEX can typeset in the native manner for several dozen languages. This affects hyphenation, word-spacing, indentation, and the automatic names of the parts of documents displayed in headings (but not the commands used to produce them).

Most distributions of LATEX come with US English and one or more other languages installed by default, but it is easy to use the babel package and specify any of the supported languages or variants, for example:

\usepackage[german,frenchb,english]{babel}
...
As one writer has noted, \selectlanguage{german}``Das 
berühmte Voltaire-Zitat, \emph{\foreignlanguage{frenchb}
{il est bon de tuer de temps en temps un amiral pour 
encourager les autres}}, ist ein Beispiel sarkastischer 
Ironie.''
        

Make sure that the base language of the document comes last in the list. The list of supported languages is in the package documentation.

Changing the language with babel is a cultural shift: it changes the hyphenation patterns, word-spacing, the way in which indentation is used, and the names of the structural units and identifiers like ‘Abstract’, ‘Chapter’, and ‘Index’, etc. For example, using French as the default, chapters will start with ‘Chapitre’.
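A minimal sketch of that effect, assuming the report class (which has chapters) and the option name french, which current versions of babel accept in place of the older frenchb used elsewhere in this section:

```latex
\documentclass{report}
\usepackage[french]{babel}  % French is the only, and therefore the base, language
\begin{document}
\chapter{Introduction}      % the heading is labelled "Chapitre 1"
Le texte du chapitre commence ici.
\end{document}
```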

The \selectlanguage command lets you tell LATEX when to switch to the language specified in the argument. If you have only a small fragment in another language (a word or two, maybe a sentence, but less than a paragraph), you can use the command \foreignlanguage to change the language just for that text. The first argument gives the language; the second contains the word or phrase.

The babel package uses the hyphenation patterns provided with your version of LATEX (see the start of your document log files for a list). For other languages you need to set the hyphenation separately (outside the scope of this book).