The revolution will be typeset

10 Jan 2013

As the computing world shifts from desktops and laptops to tablet-style devices, one of the most widely used tools in physics – LaTeX – is struggling to follow. Software developer Duncan Steele explains how this typesetting program is now starting to catch up

The tablet revolution began on 27 January 2010, when Apple’s then chief executive, Steve Jobs, stood in a packed lecture hall in San Francisco and unveiled “a third category of device”. With the arrival of the iPad tablet, technology finally caught up with science fiction – keyboards, mice, printers and disk drives had all been replaced with a simplified touchscreen interface and a wireless network connection. Apple’s device quickly gained competitors, as tablets’ portability and ease of use made them an instant favourite among people who use their computers mainly for e-mailing, browsing the web, watching films and playing games. It took a little longer to adapt more serious desktop tools such as word processors and spreadsheets for use with a simplified touchscreen interface, but before long, almost all common desktop applications had tablet siblings.

There was, however, one major exception: LaTeX. Despite impassioned pleas from the academic world, the typesetting program used by tens of thousands of mathematicians, physicists and other scientists seemed to have been left out of this technological revolution. That began to change in September 2012, when two native LaTeX “apps” finally made it to the iPad – one of which, Texpad, was the result of a year-long development project for me and my business partner, Jawad Deo. However, the new apps still lack many of the packages and tools that users have come to expect from LaTeX, so although LaTeX has now joined the tablet revolution, it continues to lag behind. Why?

Stuck in the sandbox

LaTeX is an offshoot of a typesetting program called TeX, which was introduced by the computer scientist Donald E Knuth in 1978. In Knuth’s words, it was “intended for the creation of beautiful books – and especially books that contain a lot of mathematics”. LaTeX was born when a set of extension packages for TeX was released in the early 1980s. TeX and LaTeX differ from word processors such as Microsoft Word in that they replace cumbersome equation editors and layout options with a simple “typesetting language” (figure 1). For example, to make boldface text in LaTeX, you just place the text within the braces of a “\textbf{ }” command. Many common mathematical symbols are particularly easy to create: inserting an integration sign requires the command “\int”, and the command for a subscript “s” is simply “_s”. Newcomers to LaTeX face a steep learning curve, but once they have become accustomed to the program’s high-quality typography, they rarely switch back.
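As a small illustration of the commands just mentioned, here is a minimal sketch of a complete LaTeX document. The sentence and the equation are invented purely for the example; only the boldface, integral and subscript commands come from the text above.

```latex
\documentclass{article}
\begin{document}

% Boldface: put the text inside the braces of \textbf
This word is \textbf{bold}, and the rest is not.

% \int gives the integration sign and _s gives a subscript "s";
% both are used inside a displayed equation. The symbols are
% illustrative only and carry no physical meaning here.
\[
  E_s = \int_0^T P_s(t)\, dt
\]

\end{document}
```

Running any LaTeX engine on this file should give one line of text with a bold word, followed by a centred equation.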

In the years since its creation, TeX migrated painlessly from mainframe to desktop to laptop, so it is not obvious why it is taking so long to progress to tablets. Nothing about the TeX typesetting language is incompatible with a touchscreen interface. Instead, the obstacles lie below the tablets’ sleek metal-and-glass exteriors, in the operating systems powering the tablet revolution.

A traditional desktop operating system has a single file system that is visible to the user and accessible by all installed software. This architecture makes it possible for a malicious or malfunctioning program to “run loose” and attack other files, but it also allows a single task to be distributed across multiple applications and tools. LaTeX typesetting is a great example of this distributive process. For example, to produce the first draft of this article, I typed it in the desktop version of our Texpad LaTeX editor, which passed the data first to LaTeX, then to a program called dvips that converts files from one format (DVI) to another (PostScript), then to the ps2pdf PostScript to PDF converter, and finally back to Texpad for display (figure 2).
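The chain described above can be sketched from the document's side. The file name article.tex is hypothetical, and the commands in the comments are the standard command-line forms of the three tools, shown here as one plausible way the chain might be run by hand rather than as what Texpad does internally.

```latex
% article.tex (hypothetical file name)
%
% One possible manual run of the three-step chain:
%   latex  article.tex    produces article.dvi
%   dvips  article.dvi    produces article.ps
%   ps2pdf article.ps     produces article.pdf
%
% Each step is a separate program reading the file the previous
% one wrote, which is why a shared file system matters.
\documentclass{article}
\begin{document}
A first draft, typeset by three co-operating programs.
\end{document}
```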

Tablet operating systems such as those found on the iPad and Android-based tablets, however, force all software to be self-contained: every application is restricted to a subset of the file system and hardware. This subset is called a “sandbox” and it acts like a virtual cage within which an application’s files are both contained and protected from other applications. Just as cages ensure that zoos don’t consist of a large pile of animal carcasses and one well-fed lion, sandboxes create a safe environment for the user where any single malicious or malfunctioning application is limited to damaging its own files.

This is great for security, but it inhibits software such as LaTeX, which relies on a co-operative style of software architecture. If a tablet LaTeX editor needs to communicate with a tablet LaTeX typesetter, then it must include that typesetter within its sandbox, along with all the tools and packages the typesetter requires. iOS, the iPad’s operating system and as such the most widely used tablet operating system, goes even further and requires all software to be packaged as a single program. In almost every case this makes sense, but it is a headache when you consider the four different programs in the simple typesetting example described above.

Consequently, for the iOS version of Texpad we combined the editor, LaTeX, a bibliography program called BibTeX and a basic set of LaTeX packages, all in a single program. Contrast this with a desktop installation, where these components would all be distributed, updated and operated as separate entities.

A complex ecosystem

The problem with packaging LaTeX in this monolithic manner is that after nearly 35 years, the TeX ecosystem consists of a mind-boggling number of packages, and almost every day we get an e-mail from a user asking us to add some esoteric package or tool. The youngtab package is a perfect example: it allows you to draw collections of boxes or cells called Young tableaux in your LaTeX document, and although it is indispensable for some mathematicians and theoretical physicists, it is useless for the majority of users. For now, we are happy to keep expanding what we offer, but we rarely get the same request twice, and if we added every package our customers want, our app would outgrow a tablet’s limited storage space. For example, the current edition of the most widely used distribution of LaTeX, TeX Live 2012, consumes 4 GB of space on my hard disk. Although tolerable on a laptop, that equates to almost a third of my iPad’s usable space – an unacceptable size for an application. The bloated nature of LaTeX also slows down the typesetting process. While a web browser can lay out a large document almost instantaneously, typesetting even the simplest document on my laptop takes 2 s, and my PhD thesis took well over a minute. This isn’t a problem on a fast desktop computer, but on a low-power, battery-conscious tablet, it can be.

The biggest problem for developers of a tablet-friendly LaTeX is not the number of LaTeX tools and packages, or even the sandboxed tablet operating systems, but the fractured nature of TeX itself. To illustrate this, suppose an experimental physicist is writing a paper, and she wants to include both a diagram and a photograph of her experiment. How should she do this? One common choice of package for drawing diagrams in LaTeX is PSTricks. However, PSTricks only works with the LaTeX/dvips/ps2pdf chain of typesetting tools described above, and her photo is in JPEG format, which will not typeset with that chain. JPEGs will typeset with a different version of LaTeX, known as pdfLaTeX – but PSTricks, of course, won’t.
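The physicist's dilemma can be made concrete with a short sketch. The file name experiment.jpg and the drawing itself are hypothetical, and the behaviour described in the comments assumes a plain setup with no helper packages that bridge the two chains.

```latex
\documentclass{article}
\usepackage{pstricks}  % diagrams drawn with PostScript commands
\usepackage{graphicx}  % provides \includegraphics
\begin{document}

% The diagram: rendered by the latex/dvips/ps2pdf chain,
% but not by plain pdfLaTeX, which has no PostScript interpreter.
\begin{pspicture}(0,0)(4,2)
  \psline{->}(0,1)(4,1)
  \pscircle(2,1){0.5}
\end{pspicture}

% The photograph (hypothetical file): accepted by pdfLaTeX,
% but rejected by the latex/dvips chain, which expects EPS figures.
\includegraphics[width=4cm]{experiment.jpg}

\end{document}
```

Whichever chain is chosen, one of the two figures is expected to fail in this form.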

To get around this apparent impasse, our experimental physicist could re-draw her experiment using a different package that is compatible with pdfLaTeX, or she could use a package to convert her photo to a format acceptable to the original typesetting chain. I don’t want to bore you with all her other options, but on the version of LaTeX currently running on my laptop, I count a choice of six typesetting chains, two diagram packages, and innumerable tools to knit all these choices and their incompatible file formats together.
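One of the workarounds mentioned above, re-drawing the diagram with a package that pdfLaTeX understands, might look like the sketch below. The article does not name the replacement package; TikZ is used here only as an assumption, as one widely available example, and experiment.jpg is again a hypothetical file.

```latex
% Compile with pdflatex
\documentclass{article}
\usepackage{tikz}      % assumed choice of pdfLaTeX-compatible drawing package
\usepackage{graphicx}
\begin{document}

% The same simple diagram, redrawn in TikZ
\begin{tikzpicture}
  \draw[->] (0,1) -- (4,1);
  \draw (2,1) circle (0.5);
\end{tikzpicture}

% The JPEG photograph now typesets in the same run
\includegraphics[width=4cm]{experiment.jpg}

\end{document}
```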

The tangled web

Why, you may ask, is TeX structured in such a tortuous fashion? After all, it was written by Knuth, a man who is held in the same esteem among computer scientists as Richard Feynman is among physicists. When Knuth began TeX in the 1970s he started from scratch. There was no font system suitable for TeX, and no suitable file format for the final typeset product either. Knuth’s solution was to create a new font system, Metafont, and his friend David Fuchs created a new document format, DVI. All of this was written in WEB, a programming system Knuth also created. The resulting system was so far ahead of its time that it took 20 years for a general-use font format (OpenType) to emerge with support for all the typographical tricks supported by TeX and Metafont, such as kerning and ligatures (figure 3), and to bring these techniques into wide usage.

Few people have the creativity, programming ability and endurance required to construct a system as capable as TeX from nothing, and Knuth deserves the renown it has brought him. He certainly has my gratitude. Nevertheless, over time, Knuth’s groundbreaking technologies aged and disappeared. DVI gave way to PostScript in the 1980s and PDF in the 1990s; Metafont did not survive the arrival of PostScript either; and as for the WEB programming system – well, as far as I know, Metafont, TeX and the occasional TeX-related tool, such as the bibliography-generating BibTeX, are the only pieces of software written in it. Even the LaTeX logo (below right) has aged. Initially, it was a demonstration of TeX’s powerful support for superscripts, subscripts and kerning. Now, it is just a hassle to have to type the extra capital letters every time.

In the face of all these changes, Knuth decided to preserve TeX’s core in its original state, and he only consented to alterations that fix bugs in this core code. So although DVI, Metafont and WEB usage have declined, TeX continues to produce files in the defunct DVI format – forcing developers to write external utilities to convert the DVI file to a PDF. Users also came to expect colour and images in their documents, but once again TeX was not modified; instead, extra features were embedded in the ancient DVI format as “special strings” for a different tool to interpret and render further down the pipeline.
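To give a flavour of how such “special strings” look, the sketch below writes colour instructions straight into the DVI file using the colour specials understood by dvips. It assumes the latex/dvips chain; other engines and drivers are likely to ignore or reject these particular strings, and in everyday use a colour package would normally generate them for you.

```latex
% Compile with latex, then convert the DVI with dvips
\documentclass{article}
\begin{document}

% TeX itself does nothing with the argument of \special;
% it copies the string into the DVI file for dvips to interpret.
Plain text, then
\special{color push rgb 1 0 0}% ask dvips to switch to red
red text%
\special{color pop}% restore the previous colour
, then plain text again.

\end{document}
```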

Inevitably, a PDF-producing variant of TeX, called pdfTeX, was written. But thanks to Knuth’s prohibition against altering TeX, pdfTeX soon became a separate and competing typesetting tool. Many packages will run on both systems, but some run on pdfTeX only, and others on the original TeX only. Unfortunately, this “forking” set a pattern in TeX development, splitting not only the code but also the efforts of its community of developers. Today there are five widely used typesetting engines – TeX, pdfTeX, XeTeX, luaTeX and pTeX – as well as some less common ones, such as kerTeX.

Out of many, one?

At this point, non-LaTeX users may be wondering why anyone bothers with such a complex and clunky system. LaTeX’s longevity rests on one simple fact: it produces beautiful documents. For most users, that is all that matters. Not everyone needs to know about what is going on “under the hood”, and for those physicists who do take a peek, the intricate nature of LaTeX can be appealing. Its interlinking architecture conjures up images of a patchwork of tools running in harmony – the typesetting equivalent of Charles Babbage’s difference engine.

However, a difference engine won’t fit in your pocket, and those of us working behind the scenes also recognize that this particular engine would spin more smoothly if it had fewer cogs. The incompatibility between LaTeX’s heritage and structure and the sandboxing on most tablet devices has merely heightened an existing problem. For us, the only solution has been to do what the LaTeX community should have done long ago: choose.

To create our LaTeX tablet app, we have selected a single typesetting engine, pdfTeX, and a single file format, PDF. We have merged this with the BibTeX bibliography tool into a single software component, or “library”, that is plugged into Texpad, our editor. The result is a pleasantly snappy typesetter, and a starting point from which we are modernizing LaTeX’s internal architecture to make it both compatible with tablets and amenable to further development.

It is often said that “the path to hell is paved with good intentions” and today’s fractured LaTeX lies at the end of a long trail of well-intentioned rewrites. We are mindful that an ecosystem blighted by incompatible standards cannot be cured with another incompatible standard – even if it does come in tablet-friendly form. To avoid this trap, we are following XeTeX’s example of supporting standards rather than creating them. XeTeX (and the related XeLaTeX) has discarded the TeX-specific character encodings in favour of Unicode, an encoding system containing virtually all characters ever written – from the familiar Latin alphabet to ancient Egyptian hieroglyphics. Unicode has long been the standard encoding in all other areas of the computer world, and because it supports Unicode, XeTeX is also capable of working with modern font standards, such as the OpenType format mentioned above, rather than just the antiquated, and TeX-specific, Metafont files.
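A minimal sketch of what this means for an author is shown below. It assumes the document is compiled with XeLaTeX and uses the fontspec package, the usual way of selecting OpenType fonts under that engine; the font name in the commented-out line is hypothetical.

```latex
% Compile with xelatex (not latex or pdflatex)
\documentclass{article}
\usepackage{fontspec}  % OpenType font selection for XeLaTeX
% \setmainfont{Some OpenType Font}  % hypothetical name; any installed
%                                   % OpenType font could be chosen here
\begin{document}

% Accented characters are typed directly as Unicode text,
% with no TeX-specific input encoding or accent macros.
Schrödinger, Ångström and Poincaré need no special commands.

\end{document}
```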

Other approaches are possible, especially on tablets based on the more loosely sandboxed Android operating system, on which a single application can consist of multiple interacting programs. As this article was being prepared for publication at the end of 2012, an Android developer, Vu An Hoa, released the TeXPortal application in which he packaged the entirety of TeX Live in its original multiple-program form within a single application sandbox. A great deal of effort is being spent on adapting LaTeX for tablet computers, and this effort is a testament to the importance of Knuth’s system.

As important as it is for TeX to keep abreast of changes in the computer world, the typographical quality of documents produced by the 1978 version of TeX still stands up against today’s word processors. That is why LaTeX has survived several technology revolutions already, and it is why it will also survive the advent of tablet computers.

Copyright © 2018 by IOP Publishing Ltd and individual contributors