
Formatting Information — An introduction to typesetting with LATEX

Chapter 4: Lists, tables, figures

Section 4.2: Tables

Tabular typesetting is the most complex and time-consuming of all textual features to get right. This holds true whether you are using cold or hot metal type, typing in plaintext form, using a typewriter or a wordprocessor, using LATEX, using HTML or XML, using a DTP system, or some other text-handling package.

Printers charge extra when you ask them to typeset tables, and they do so for good reason: Each table tends to have its own peculiarities, so it’s necessary to give some thought to each one, and to fiddle with the alternative approaches until finding something that looks good and communicates well.

(Knuth, 1986, ch.22)

Fortunately, LATEX provides a table model with a mixture of defaults and configurability to let it produce very high quality tables with a minimum of effort. There are two things you need to know before you start: one is the terminology (see the sidebar ‘Terminology for tables and figures’ below) and the other is what ‘floats’ are (see § 4.2.1 below).

The grid alignment of information in rows and columns in a Table is called a ‘tabulation’ or ‘tabular matter’, done with a tabular environment, and we’ll deal with that in § 4.2.3 below.

4.2.1 Floats

Tables and Figures (and several other features of documents like sidebars) are what printers and publishers refer to as ‘floats’. This means they are not part of the normal stream of your text, but separate freestanding entities, positioned in a part of the page to themselves (top, middle, bottom, left, right, or wherever the layout designer has specified). They always have a caption describing them and they are always numbered so they can be referred to from elsewhere in the text.

LATEX automatically floats Tables and Figures, depending on how much space is left on the page at the point that they are processed. If there is not enough room on the current page, the float is by default moved to the top of the next page.

The positioning can be changed by moving the table or figure environment to an earlier or later point in the file, or by using the optional argument to the table or figure environment. This can be any mix of the letters h (‘here’), t (‘top’), b (‘bottom’), p (‘page by itself’) to recommend where the Table or Figure should go (order is not significant: LATEX will pick the best fit). To make your recommendation stronger, precede the first letter with an exclamation mark (!).

In this example you can see a Table requested to go here(!) or if not, at the top of the page; and a Figure requested to go at the bottom of the page or if necessary, on the next full page by itself.

\begin{table}[!ht]
...
\end{table}

\begin{figure}[bp]
 ...
\end{figure}

Authors sometimes want many figures or tables occurring in rapid succession, which is poor writing, as it’s not just unfair to the reader, but raises the problem of how they are going to fit on the page and still leave room for text. In extreme cases, LATEX will give up trying, and stack them all up until the end of the chapter or section for you to decide manually where to put them.

The skill is to space your tables and figures out within your text so that they intrude neither on the thread of your argument or discussion, nor on the visual balance of the typeset pages. But this is a skill few authors have, and it’s one point at which professional typographic advice or manual intervention may be needed.

If you are unable to arrange things easily, as a last resort you can use the float package and the option letter capital H (‘Here, dammit!’). Be aware that figures or tables using this package option are no longer floats so the onus is on you to ensure that the numbering sequence is not disrupted.

Remember that if there really is not enough space ‘here’ on the page, then it really won’t fit, and you will HAVE TO move things manually.

The float package also lets you create new classes of floating object (sidebars, examples, exercises, etc).
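As a minimal sketch of both uses of the float package (the float name sidebar, its list-file extension lsb, and the captions are made up for illustration, not taken from this document's own source):

```latex
\documentclass{article}
\usepackage{float}
% Define a new class of float called "sidebar" (illustrative name):
% default placement tbp, list-file extension .lsb, numbered within sections
\floatstyle{boxed}
\newfloat{sidebar}{tbp}{lsb}[section]
\floatname{sidebar}{Sidebar}
\begin{document}
\section{Results}
Some text before the table.

% Capital H: typeset exactly here; this Table no longer floats
\begin{table}[H]
  \caption{A table pinned in place}
  \label{pinned}
  \centering
  \begin{tabular}{lr}
    Item & Amount\\
    Rent & 1,500\\
  \end{tabular}
\end{table}

\begin{sidebar}
  Text of the sidebar, which floats like a Table or Figure.
  \caption{An example sidebar}
\end{sidebar}
\end{document}
```

If the pinned Table does not fit in the space remaining, it will still not fit with H: you then have to move it by hand, as described above.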

Please now read this section a second time. Getting the hang of floats can take a while if you’ve never come across the idea before. Most writers strongly recommend writing the document in its entirety first, and not worrying about where the floats end up until the text is complete and not likely to change any more. Then start moving any floats that are misplaced.

4.2.2 Normal tables

To create a LATEX Table, use the table environment containing a \caption command followed by a \label command (the label is what you will use to refer to the table: we haven’t done this yet, but see § 5.3.1 below if you’re curious).

\begin{table}
  \caption{Project expenditure to year-end 2022}
  \label{ye2022exp}
  ...
\end{table}

Numbering is automatic, but the \label command MUST follow the \caption command, not the other way round. The numbering automatically includes the chapter or section number in document classes where this is appropriate. The \caption command has an optional argument to provide a short caption if the full caption would be too long for the List of Tables:

\caption[Project 2022]{Project expenditure to 
  year-end 2022 showing utility costs as a 
  separate item}

The caption is centered if it’s shorter than one line; otherwise it is set full out. The caption and ccaption packages offer extensive customisation of the caption, including location and font.

Table 4.3 below is an example that we can use to show how tables work (if you’re reading this on the web, ignore the shading: that’s a part of the webpage style, not the LATEX table).

Table 4.3: Project expenditure to year-end 2022

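    Item                                          € Amount
    ------------------------------------------------------
a)  Salaries (2 part-time research assistants)      28,000
    Conference fees and travel expenses             14,228
    Computer equipment (5 workstations)             17,493
    Software licenses                                3,562
b)  Rent, light, heat, etc                           1,500
    ------------------------------------------------------
    Total                                           64,783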

The Institute also contributes to items at (b).

4.2.3 Simple tabular matter

To typeset the grid within a table (or elsewhere), you use the tabular environment. There are four ways to enter the data:

By hand

you can enter the tabular matter (cell data) by typing it in, which is perhaps the most common method, especially for small quantities of data;

In a grid tool

many LATEX editors come with a pop-up grid tool like a miniature spreadsheet, which makes creating tabular matter easier, at the cost of some loss of fine control (see Figure 4.1 below).

With a package

if the quantity of data is very large and is already in a spreadsheet or database, or if it is data which will change frequently before you have finished your document, you can use the datatool package (formerly known as csvtools) to read the data from a Comma-Separated Values (CSV) spreadsheet import/export file (see § 4.2.4 below). If the data changes, you just re-export it and re-run LATEX.

For large numbers of tables in big documents (eg theses) this is by far the most accurate and time-saving method.

As an image

it is also possible to include a ‘table’ which has actually been captured as an image from elsewhere, such as a screenshot from a spreadsheet (so it’s not really a table, just a picture of one). We will see how to include images in § 4.3 below on Figures, where they are more common.

Figure 4.1: Table being edited in LYX’s tabular editor


Figure 4.1 and Table 4.3 above both show the table which we’ll use as an example. It’s got a number, a caption, three columns with headings and some ruled lines, and a comment afterwards.

The tabular environment

This takes one compulsory argument which specifies how many columns there will be, and what type they are. You give one letter for each column using one of l, c, and r for a left-aligned, centered, or right-aligned column. The number of letters MUST be the same as the number of columns you are putting in the table.

\begin{tabular}{clr}
...
\end{tabular}

In the example in Table 4.3 above, the tabular setting has three columns, the first one centered, the second left-aligned, and the third one right-aligned, so it is specified as {clr}. The dcolumn package provides a d column type for decimal alignment, and there are others we shall come across later.
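As a rough sketch of decimal alignment: the dcolumn package's own column type is a capital D taking three arguments (the separator in the file, the separator in the output, and the number of decimal places), and this example assumes the common convention of defining a lower-case d as shorthand for it. The items and amounts are invented for illustration.

```latex
\documentclass{article}
\usepackage{dcolumn}
% Shorthand: d{n} aligns on the decimal point, allowing n decimal places
\newcolumntype{d}[1]{D{.}{.}{#1}}
\begin{document}
\begin{tabular}{ld{2}}
% The heading is text, not a number, so override the d column for this cell
Item     & \multicolumn{1}{c}{Amount}\\\hline
Widget   & 12.5\\
Gadget   & 1234.75\\
Sprocket & 0.05\\
\end{tabular}
\end{document}
```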

Note

Each cell of these types (c, l, r, d) can hold only one line of data. If you need multi-line cells (like miniature paragraphs), see § 4.2.4 below.

Cell and row division

You can then type in each row, making sure each cell’s data in the row is separated with an & character, and each row ends with a double backslash (\\).

a)&Salaries (2 part-time research assistants)&28,000\\

You don’t need to add any extra spaces or do any manual formatting, although you can if you want: LATEX just uses the column specifications to know how to format it.

If a cell has nothing to go in it, you just don’t type anything, but the ampersand must still be there:

&Total&64,783\\

Column headings

These are often set in bold type, as in the example (see ‘Cell formatting’ below).

&\textbf{Item}&\textbf{\EUR\ Amount}\\[6pt]\hline

In this case there is also some extra space (6pt, see ‘Row spacing’ below) to make it look nicer, and a horizontal line across the table (see ‘Table rules’ below).

The data for a row may be longer than the width of the screen window in your editor, but it can take up as many lines on the screen as needed; the end of the row is always signalled by the double backslash, so LATEX knows when it’s time for the next row.

Cell formatting

Font changes can be done within a cell (bold, italic, etc; we’ll come on to these later, see § 6.2.4 below) and these changes are limited to the cell in which they occur: they do not ‘bleed’ across cells (in the example, the column headings have each been made bold separately).

Row spacing

Additional vertical white-space below a row (but above a rule) can be specified by giving a dimension in [square brackets] immediately after the double backslash which ends the row (3pt in the case of the last row before the totals in the example). A negative value will decrease the spacing below that row.

If the line below a horizontal rule looks too close, it can be optically spaced by adding a strut at the start of the next line (that is, after the \hline). A ‘strut’ is a hidden vertical rule a little bit higher than the row-height; it is invisible because its width is zero, as in the example code. Use the \rule command for this, with a width of 0pt and a height of 1.2em, just a fraction higher than the text, which will force the rows apart by 0.2em.

\begin{table}
\caption{Project expenditure to year-end 2022}
\label{ye2022exp}
\centering\smallskip
\begin{tabular}{clr}
  &\textbf{Item}&\textbf{\EUR\ Amount}\\\hline\rule{0pt}{1.2em}
a)&Salaries (2 part-time research assistants)&28,000\\
  &Conference fees and travel expenses&14,228\\
  &Computer equipment (5 workstations)&17,493\\
  &Software&3,562\\
b)&Rent, light, heat, etc.&1,500\\[3pt]\cline{2-3} 
\rule{0pt}{1.2em}&Total&64,783\\
\end{tabular}
\par\medskip\footnotesize
The Institute also contributes to (a) and (b).
\end{table}

Table rules

A line across the whole table is done with the \hline command after the double-backslash which ends a row.

For a line which only covers some of the columns, use the \cline command (in the same place), with the column range to be ruled in curly braces (in the example, {2-3} rules columns 2 and 3). If only one column needs a rule, it must still be given as a range (eg {3-3}).

Vertical rules (between columns) can be specified in the column specifications with the vertical bar character (|) before, after, or between the l, c, r letters. This character creates rules which extend the whole height of a table: it is not necessary to repeat them every row.
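The three kinds of rule can be seen together in this small sketch, which uses two rows borrowed from the example and a total recalculated for those two rows only:

```latex
\documentclass{article}
\begin{document}
% Vertical bars in the column specification rule the full height of the table
\begin{tabular}{|c|l|r|}
\hline
  & Item              & Amount\\ \hline
a)& Salaries          & 28,000\\
b)& Rent, light, heat &  1,500\\ \cline{3-3}
% \cline{3-3} above rules only the third column
  & Total             & 29,500\\ \hline
\end{tabular}
\end{document}
```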

I have indented the code example just to make the elements of the table clearer to read: this is for editorial convenience, and has no effect on the formatted result (see Table 4.3 above). If you copy and paste this into your example document, you will need to add the marvosym package to your Preamble, which will let you use the official CEC-conformant Euro symbol command \EUR (as distinct from the € glyph supplied by the text font).

4.2.4 More complex tabular formatting

LATEX’s original tabular environment was designed for classical numerical grids, where each cell contains a single value. If you need a cell to contain multiline text, like a miniature paragraph, you can use the column specification letter p (paragraph) followed by a width in curly braces instead of an l, c, or r. So p{3.5cm} would mean a column 3.5cm wide, where each cell can contain paragraph-style text, for example:

\begin{tabular}{cp{3.5cm}r}

These p column specifications are not multi-row (row-spanned) entries: they are single cells which can contain multiple lines of typesetting: the distinction is extremely important. These paragraphic cells are typeset justified (two parallel margins) and the baseline of the top line of text is aligned with the baseline of neighbouring cells in the row.
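A complete tabulation using that column specification might look like the following sketch (the wording of the first item has been lengthened here purely so that it wraps within the 3.5cm cell):

```latex
\documentclass{article}
\begin{document}
\begin{tabular}{cp{3.5cm}r}
  &\textbf{Item}&\textbf{Amount}\\\hline
% The long text is still ONE cell; LaTeX breaks it into lines itself
a)&Salaries for two part-time research assistants
   employed for the duration of the project&28,000\\
b)&Rent, light, heat, etc&1,500\\
\end{tabular}
\end{document}
```

The amount 28,000 aligns with the first line of the wrapped text, because a p cell takes its baseline from its top line.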

The array package provides some important enhancements which overcome the limitations of the p cells:

Vertical alignment

In addition to the p, whose vertical alignment baseline is the top line of text, the array package provides the m and b letters. These work the same way as p (followed by a width in curly braces), but their vertical alignment baseline is the middle or bottom of the cell respectively.

Prefixes and suffixes

With the array package, any column specification letter can be preceded by >{} with some LATEX commands in the curly braces. These commands are applied to every cell in that column, so to make a p column typeset ragged-right you would say, for example, >{\raggedright}p{3.5cm} (or \raggedleft, or \centering).

Note that if you do this, the last column specification MUST include a prefix or suffix containing the \arraybackslash command, to revert the meaning of the double-backslash, which gets redefined by horizontal formatting commands like \raggedright, otherwise you will get errors when the end-of-row double-backslash is not recognised.

There is a suffix format as well: you can follow a column letter with <{} with code in the curly braces (often used to turn off math mode started in a prefix).
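The following sketch puts prefixes, a suffix, and \arraybackslash together; the column widths and cell contents are invented for illustration:

```latex
\documentclass{article}
\usepackage{array}
\begin{document}
\begin{tabular}{%
    >{\bfseries}l%                       every cell in column 1 is bold
    >{\raggedright}p{3.5cm}%             ragged-right paragraph cells
    >{$}c<{$}%                           prefix starts math mode, suffix ends it
    >{\raggedleft\arraybackslash}p{2cm}} % last column restores the \\
Term & Description & \multicolumn{1}{c}{Symbol} & Note\\\hline
Area & The surface enclosed by a circle of radius $r$
     & \pi r^2 & in square units\\
Circumference & The distance around the same circle
     & 2\pi r & in linear units\\
\end{tabular}
\end{document}
```

The heading of the third column is wrapped in \multicolumn{1}{c}{...} so that the word ‘Symbol’ is not itself set in math mode.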

The colortbl package lets you colour rows, columns, and cells; and the dcolumn package mentioned above provides decimal-aligned column specifications for scientific or financial tabulations. Multi-column (column-spanning) is built into LATEX tables with the \multicolumn command; but for multi-row (row-spanning) cells you need to add the multirow package. Multi-page and rotated (landscape format) tables can be done with the longtable, rotating, and lscape packages (the last of these provides the landscape environment).
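As a small sketch of row-spanning with the multirow package (the category names ‘Staff’ and ‘Equipment’ are invented groupings of the example data):

```latex
\documentclass{article}
\usepackage{multirow}
\begin{document}
\begin{tabular}{llr}
Category & Item & Amount\\\hline
% The first cell spans two rows; * means use the natural width of the text
\multirow{2}{*}{Staff}     & Salaries  & 28,000\\
                           & Travel    & 14,228\\\hline
\multirow{2}{*}{Equipment} & Computers & 17,493\\
                           & Software  &  3,562\\
\end{tabular}
\end{document}
```

The cell under each spanning entry is left empty, but its ampersand must still be present.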

The LATEX table model is very different from the HTML auto-adjusting model used in web pages; it’s closer to the Continuous Acquisition and Life-cycle Support (CALS) table model used in technical documentation formats like DocBook. However, auto-adjusting column widths are possible with the tabularx and tabulary packages, offering different approaches to dynamic table formatting.
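A minimal sketch of the tabularx approach, assuming you want the tabulation to fill the full text width: the X column stretches to take up whatever width is left over after the other columns have been set.

```latex
\documentclass{article}
\usepackage{tabularx}
\begin{document}
\noindent
% First argument is the total width; X marks the column that absorbs the slack
\begin{tabularx}{\textwidth}{cXr}
  &\textbf{Item}&\textbf{Amount}\\\hline
a)&Salaries (2 part-time research assistants)&28,000\\
b)&Rent, light, heat, etc&1,500\\
\end{tabularx}
\end{document}
```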

You do not need to format the tabular data in your editor: LATEX formatting will typeset the table using the column specifications you provided. You can therefore arrange the layout of the data in your file for your own convenience: you can give the cell values all on one line, or split over many lines: it makes no difference so long as the cells are separated with the & and the rows are ended with the double-backslash.

As mentioned earlier, some editors have a grid-like array editor for entering tabular data. Takaaki Ota provides an excellent tables-mode for Emacs which uses a spreadsheet-like interface and can generate LATEX table source code (see Figure 4.2 below).

Figure 4.2: Tables mode for Emacs


4.2.5 More on tabular spacing

Extra space, called a ‘shoulder’, is automatically added on both sides of all columns by default. The initial value is 6pt, so you get that amount left and right of the tabulation; because it is added left and right of every cell, the space between columns is therefore 12pt by default. This can be adjusted by changing the value of the \tabcolsep dimension before you begin the tabular environment.

\setlength{\tabcolsep}{3pt}

The shoulder can be omitted in specific locations by adding the code @{} in the appropriate place[s] in the column specification argument. For example to omit it at the left-hand and right-hand sides of a tabular setting, put it at the start and end of the column specifications (putting it between two column specifications will remove all space between those columns).

\begin{tabular}{@{}clr@{}}

You can also use @{} to insert different spacing between columns (or at the right-hand and left-hand sides) by enclosing a spacing value; for example, @{\hspace{2cm}} could be used to force a 2cm space between two columns.

To change the row-spacing in a tabular setting, you can redefine the \arraystretch command (using \renewcommand because it’s defined as a command, not a length). The value of \arraystretch is actually a multiplier, preset to 1, so \renewcommand{\arraystretch}{1.5} would set the baselines of your tabular setting one and a half times further apart than normal.
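Both settings can be kept local to one tabulation by enclosing them in a group, as in this sketch. With the article class default of 10pt type on a 12pt baseline, an \arraystretch of 1.5 gives rows 1.5 × 12pt = 18pt apart.

```latex
\documentclass{article}
\begin{document}
% The outer braces keep both changes local to this one tabulation
{\setlength{\tabcolsep}{3pt}%        shoulder: 3pt each side, 6pt between columns
 \renewcommand{\arraystretch}{1.5}%  baselines 1.5 x 12pt = 18pt apart
 \begin{tabular}{@{}lr@{}}
 Item     & Amount\\
 Salaries & 28,000\\
 Rent     &  1,500\\
 \end{tabular}}
\end{document}
```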

Exercise 4.3 — Calculate vertical spacing in a tabular environment

Assume that you are making a table in the default size of 10pt type on a 12pt baseline. You want a 14pt baseline, so what value would you set \arraystretch to?

It is conventional to centre the tabular setting within a Table, using the center environment (note US spelling) or the \centering command (as in the example) — the default is flush left — but this is an æsthetic decision. Your journal or publisher may insist instead that all tabular material is set flush left or flush right (not the individual columns; the whole tabular setting inside the table).

If there is no data for a cell, just don’t type anything — but you still need the & separating it from the next column’s data. The astute reader will already have deduced that for a table of n columns, there must always be n-1 ampersands in each row. The exception to this is when the \multicolumn command is used to create cells which span multiple columns, when the ampersands of the spanned columns are omitted. The \multicolumn command takes three arguments: the number of columns to span; the format for the resulting wide column; and the contents. So to span a centred heading across three columns you would write \multicolumn{3}{c}{The new heading}.

The \multicolumn command can also be used to replace a single column if you need to vary some prefixing or suffixing or alignment specified in the column specification. For example if you have a right-aligned column (eg numbers) but you want one of the cells to be some text centered, you could write \multicolumn{1}{c}{no data}. In this case, of course, you keep all the ampersands, because you are not actually spanning columns.
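Both uses of \multicolumn appear in this sketch (the 2021 column and its figures are invented for illustration):

```latex
\documentclass{article}
\begin{document}
\begin{tabular}{lrr}
% One cell spanning all three columns: no ampersands in this row
\multicolumn{3}{c}{The new heading}\\\hline
Item     & 2021  & 2022\\
Rent     & 1,400 & 1,500\\
% A single right-aligned cell overridden to be centred: ampersands kept
Software & \multicolumn{1}{c}{no data} & 3,562\\
\end{tabular}
\end{document}
```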

4.2.6 Techniques for alignment

As mentioned earlier, it’s perfectly possible to use the tabular environment to typeset any grid of material — it doesn’t have to be inside a formal table. There are also other ways to align material without using a tabular format.

4.2.6.1 Using tabular alignment outside a table

By default, LATEX typesets tabular environments inline to the surrounding text. That is, the tabular environment acts like a single character within the paragraph. This also means if you want an alignment displayed by itself, not as part of a formal table, you can put it between paragraphs (with a blank line or \par before and after) so it gets typeset separately; or put it inside a positioning environment like center, flushright, or flushleft.

One side-effect of this is that small and intricately constructed micro-tabulations can be used to good effect when creating special effects like logos, as they get treated like a character and can be typeset anywhere.

Tabular setting can be used wherever you need to align material side by side, such as in designing letterheads, where you may want your company logo and address on one side and some other information on the other side to line up with each other. One common way to implement ‘spring margins’ like this is to create two columns of whatever fraction of the page width you need (but adding to 1, of course), and removing the extra space that would otherwise be added automatically between columns and at the edges:

\begin{tabular}{
	  @{}
	  >{\raggedright}p{.75\textwidth}
	  @{}
	  >{\raggedleft\arraybackslash}p{.25\textwidth}
	  @{}}
left-hand material
&
right-hand material\\
\end{tabular}

As mentioned earlier, the @{} suppresses the inter-column gap (or the shoulder left or right) so that the total width available will be the full text width of the page.

Exercise 4.4 — Create a tabulation

Create one of the following in your document:

  1. a formal Table with a caption showing the number of people in your class broken down by age and sex;

  2. an informal tabulation showing the price for three products;

  3. the logo consisting of the words ‘Y E A R’ and ‘2 0 0 0’ set in bold, one above the other, centred in a ruled box (hint: § 4.6.2 below)

4.2.6.2 Alignment in general

Within the two-dimensional plane of conventional typesetting, there are two sets of axes to which the elements of the document should align: horizontal and vertical.

  • The vertical axes are the left and right edges of the paper, the left and right margins of the text area, indentation, any internal temporary left and right margins (as for lists, block quotation, displayed mathematics, the left and right edges of illustrations, etc), and any internal column boundaries of a tabular environment.

  • The horizontal axes are the top and bottom edges of the paper, the top and bottom margins of the text area, the space for running headers and footers, the top and bottom edges of all ‘pool’ items (see the start of this chapter), the baseline of the text, and any internal row boundaries of a tabular environment.

Warning

If someone says they want something ‘aligned’, you need to ask ‘aligned to what, exactly’? It’s not always obvious, and in unusual cases it’s not always easy to find out how to calculate or access an axis without careful study of the internal programming of a class or package.

By default, LATEX starts each line up against the left-hand margin: if indentation is used, then the first lines of paragraphs will be indented, except for the first paragraph after a heading.

  • Depending on the language you select in the babel or polyglossia packages, the first lines of first paragraphs after a heading may also be indented (for example in French typesetting).

  • In right-to-left languages, the alignment is reversed, and lines start up against the right-hand margin, and (see below) end against the left-hand margin.

The typeset line extends to the right-hand margin, and the process of justification ensures that all line-ends, apart from the last in a paragraph, align with this margin. The exception is when a raggedright or raggedleft or centering alignment has been specified.

Alignment to the four paper edges is extremely rare except in magazines and specialist formats like corporate reports or white papers, where images may be positioned to the edge[s] of the paper, and are said to ‘bleed’ off the sheet. It is of course possible in LATEX but it is well outside the scope of this introductory text for beginners.

4.2.6.3 Alignment within pool items

While typesetting a paragraph, LATEX has no way to become aware of whereabouts a particular word or letter is being placed, for two reasons:

  1. The justification of the paragraph does not start until after the whole paragraph has been typeset; only then does TEX start testing for line-end breakpoints, assigning them penalties, and inserting the variable spacing between words. This process is synchronous with the typesetting of the paragraph, and the next paragraph will not be started until justification of the one just ended is complete.

  2. The positioning of the paragraph vertically on the page does not start until well after at least a whole page’s worth of material has been typeset and justified, and the ‘galley’ of accumulated material comes close to filling up. At this point, TEX pauses typesetting of the next page (which it has already started), finds the optimum place to break the page, sends the completed page to the output, resets the accumulator to the remaining material, and then resumes typesetting. This process of page-building is therefore asynchronous with the process of typesetting, and the point at which access to already-typeset material ceases to be possible is not predictable in meaningful terms.

This means that doing things with stuff that has already been dealt with really isn’t possible, and requests for it have to be respectfully declined. Anything you need to do with an item, whether it’s a letter or a word or a paragraph, like applying a font change or putting it in a box, for example, needs to be done in situ, before it disappears down TEX’s throat.

While there are packages for dealing with completed paragraphs, such as reledpar for typesetting synchronised parallel-page (eg dual-language) editions, access to the inside of the paragraphs is not possible at this stage. It is, however, possible to typeset material into a box, and then do things with it, including emptying it all back out again, in a limited manner. This makes it possible to see how much space a particular item is going to occupy, and then decide whether or not to treat it in a certain way. Standard LATEX does this when deciding if a table or figure caption is narrow enough to fit centered on one line, or if it needs to set it full-out.
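The measure-then-decide technique can be sketched as follows. This is not the code that LATEX itself uses for captions, only an illustration of the same idea; the names \trialbox and \fitcaption are invented for the example.

```latex
\documentclass{article}
\newsavebox{\trialbox}% hypothetical name for the scratch box
% \fitcaption{text}: centre the text if it fits on one line, else set it full out
\newcommand{\fitcaption}[1]{%
  \sbox{\trialbox}{#1}%          typeset into the box; nothing is printed yet
  \ifdim\wd\trialbox>\linewidth
    \par\noindent #1\par%        too wide: set it as a normal paragraph
  \else
    \par\noindent\makebox[\linewidth][c]{\usebox{\trialbox}}\par% fits: centre it
  \fi}
\begin{document}
\fitcaption{A short caption}
\fitcaption{A much longer caption which runs on well past the end of a single
  line of text at this width and so has to be set as an ordinary paragraph
  instead of being centred}
\end{document}
```

The \sbox command typesets its argument without printing it, and \wd gives the width of the result, which is then compared with the current line width.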

Packages which provide their own alignment options, such as enumitem for finer control of lists, usually specify in the documentation how to manipulate the shape and appearance of their environments. A substantial amount of this is about how to align one atomic value, such as a heading or title, with another one, such as the word which comes after it. In the case of the lettrine package for dropped initial capitals, it’s about how to adjust the capital (up, down, right, left) with respect to the indented rectangle into which it is to fit. In the case of the colortbl package for coloured rows and columns and cells in tables, it includes details of how to get the coloured block microadjusted.

4.2.6.4 Alignment to margins

The geometry package has extensive features for specifying the paper size, page size, margin sizes (left and right, if you are typesetting for double-sided work), marginal gaps, the head and foot settings for headers, footers, footnotes, and the gaps between them.
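As a sketch of what such a specification looks like for double-sided work (all the dimensions here are arbitrary values chosen for illustration, not recommendations):

```latex
\documentclass[twoside]{article}
\usepackage[
  a4paper,
  inner=30mm,        % binding-side margin (left on odd pages, right on even)
  outer=20mm,        % fore-edge margin
  top=25mm,
  bottom=30mm,
  headheight=14pt,   % space reserved for the running header
  headsep=8mm,       % gap between running header and text
  footskip=12mm,     % bottom of text to baseline of footer
  marginparsep=4mm,  % gap between text and marginal notes
  marginparwidth=15mm
]{geometry}
\begin{document}
Text set within the margins specified above.
\end{document}
```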

The description of line-alignment in the preceding section holds true for all text typeset inside further environments, for example in an abstract or a quotation, and within all lists, as well as the p/m/b column formats within a tabular setting. So long as you remain aware of the possible effect of unscoped formatting commands on lower-level nested environments, you can nest one environment inside another to an unspecified depth, and the rules of alignment will continue to be applied as much as possible. However, as with HTML and CSS, it is possible to overuse or abuse nesting, as it makes the code obscure.

Because the nesting of environments implies encapsulation, access to the alignment points (eg margins) of an outer environment is often not possible inside a deeper-nested environment. The TEX language model allows for the inheritance of settings defined at a higher level, but where these values are implemented as part of the code creating both the current environment and a higher one (eg lists inside lists), they will occupy the same space, and only the local value will be accessible. In such cases, any values needed would have to be saved in a variable accessible to the lower-level environment. In 30+ years of using LATEX I have only ever needed to do this once.
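A rough sketch of saving an outer value for use at a lower level, assuming the value you need is the line width of an outer list (the length name \outerwidth is invented for the example):

```latex
\documentclass{article}
\newlength{\outerwidth}% hypothetical name for the saved value
\begin{document}
\begin{itemize}
\item Outer item.
  % Save the outer list's line width before entering the inner list
  \setlength{\outerwidth}{\linewidth}%
  \begin{itemize}
  \item Inner item: \verb|\linewidth| is now narrower than in the outer list,
    but the saved value is still available.\par
    % A rule as long as the OUTER line width (it overruns the inner margin)
    \noindent\rule{\outerwidth}{0.4pt}
  \end{itemize}
\end{itemize}
\end{document}
```

Because the inner list is nested inside the group where \outerwidth was set, the saved value is inherited there, while \linewidth itself has been overwritten by the inner list.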

4.2.6.5 Grids

Outside the tabular environment, LATEX does not use a grid system. Its origins in mathematics mean that because displayed equations can occupy non-integer numbers of ‘lines’ (compared with text, which always occupies a whole number of lines), it was judged better for quality to allow flexible space between headings and paragraphs. Over the depth of a whole page, this minute amount of flexibility usually absorbs the fractional part of a line-height due to overspill in formulas (part of which ‘rubberisation’ led Leslie Lamport to choose LATEX as the name for his set of macros).

There is a grid package available which enables grid setting in double-column documents, but overall there is no easy way to ‘snap’ pool elements to arbitrarily distanced gridlines. The flexibility of \parskip and the dozen or more other ‘skips’ (flexible lengths) in the LATEX source (latex.ltx) could be removed, and display mathematics set in boxes of an integer number of \baselineskips, and special environments could be written to anchor themselves to a specific corner, but in general, the model of flexibility has proved itself over nearly 40 years, and requirements for grid models should be transferred to the NTS in the care of TUG and the LATEX development specialists.

  1. You can use the tabular environment anywhere you need stuff aligned in rows and columns, not just in a table. 

  2. Note that this use of the double backslash to signal the end of a row is subtly different from the use we saw in § 1.10.5 above to terminate a normal text line prematurely. Here it marks the end of a table row. 

  3. The term ‘spring margins’ comes from the DOS wordprocessor PC-Write and seems to be due to its author, the late Bob Wallace. I am not aware of any other mainstream system at the time that implemented them. 

  4. This is known as Anglo-American usage (and applies to those countries and their [legacy] colonies, when using the English language).