
Formatting Information — An introduction to typesetting with LATEX

In this chapter…

As we saw right at the start, LATEX uses plaintext files, so they can be read and written by any standard application that can open text files. This helps preserve your information over time: the plaintext format cannot be made obsolete or hijacked by any manufacturer or sectoral interest, and it will always be readable on any computer, from your smartphone through all desktops and servers right up to the biggest supercomputers. LATEX itself is available for many handhelds, from old PDAs (see Figure 8.1) to Android devices (see Figure 8.2).

Figure 8.2: LATEX editing and processing on the Samsung Galaxy Note 4

However, LATEX is intended as the last stage of the editorial process: formatting for print or display. If you have a requirement to re-use the text in some other environment — a database perhaps, or on the Web or a CD-ROM or DVD, or in Braille or voice output — then it should probably be edited, stored, and maintained in something neutral like the Extensible Markup Language (XML), and only converted to LATEX when a typeset copy is needed.

Although LATEX has many structured-document features in common with SGML and XML, it can still only be processed by the LATEX, PDFLATEX, and XƎLATEX programs. Because its macro features make it almost infinitely redefinable, processing it requires a program which can unravel arbitrarily complex macros, and LATEX and its siblings are the only programs which can do that effectively. Like other typesetters and formatters (Quark XPress, Adobe InDesign and PageMaker, FrameMaker, Microsoft Publisher, 3B2, etc), LATEX is largely a one-way street leading to typeset printing or display formatting.
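As a small illustration of why redefinable macros defeat simple converters, consider the sketch below. The macro name \product and the sample text are invented for this example and do not come from any package. The same markup produces different output at two points in one document, so a converter has to track every definition and redefinition in order, which in effect means reimplementing the macro processor.

```latex
\documentclass{article}
% \product is a hypothetical macro, invented for this illustration
\newcommand{\product}[1]{\textsf{#1}}
\begin{document}
We recommend \product{Widget} for this task.

% The same macro is redefined part-way through the document,
% so identical markup below now means something different
\renewcommand{\product}[1]{\emph{#1} (discontinued)}
We no longer recommend \product{Widget}.
\end{document}
```

A converter that only matches command names, without expanding them in sequence, has no way of knowing which of the two meanings applies to each use.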

Converting LATEX to some other format therefore means you will unavoidably lose some formatting, as LATEX has features that other systems simply don’t possess, so they cannot be translated, although there are several ways to minimise this loss or compensate for it. Similarly, converting other formats into LATEX often means editing back the information that the other formats omit because they only store appearances, not structure.
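The difference between appearance and structure can be sketched as follows. This is an illustrative example only: the heading text is invented, and the first form is merely typical of what a converter can recover from a visually formatted source, not the output of any particular tool.

```latex
\documentclass{article}
\begin{document}
% Appearance only: large bold text that merely looks like a heading.
% It is not numbered and cannot appear in a table of contents.
\noindent{\Large\bfseries Results}\par\medskip
Some text.

% Structure: a real section, which LaTeX can number,
% cross-reference, and list in the table of contents.
\section{Results}
Some text.
\end{document}
```

Turning the first form into the second is the kind of editing back that usually has to be done by hand after conversion.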

Most converters are one-way: that is, they convert either into LATEX or out of LATEX. There are several excellent systems for converting LATEX directly to HyperText Markup Language (HTML), so you can at least publish it on the web, as we shall see in § 8.2.

However, there is one system that does both, and handles a huge range of formats: Pandoc (http://pandoc.org/). This is a large library of Haskell routines for handling conversions, with a command-line front end. Supported formats include Word, OpenOffice/LibreOffice, DocBook, InDesign, Markdown, and MediaWiki. Before trying the systems described in § 8.1 and § 8.2, see if Pandoc will handle your files. The exception is probably converting from XML to LATEX, for which a robust XSLT2 script is really the only reliable solution.

1. Strictly speaking it isn’t output at this stage: XML processors build a ‘tree’ (a hierarchy) of elements in memory, and they only get ‘serialised’ at the end of processing, into a stream of characters written to a file.