Use the “caption” package to customize your tables and figures in LaTeX

The “caption” package is one of the first packages I found when I began learning LaTeX. After discovering how to include graphics and attach a caption to any figure, I likely Googled “latex bold caption labels” when I noticed that the default LaTeX behavior did not match that of many of the chemistry journals I was seeking to emulate.

The default look of a LaTeX caption is:


(Note: for instructions on how to create beautiful native LaTeX figures using GNUplot, see my post here!)

Not bad – but there are a few ways we can make it better. To start, you can invoke the “caption” package in your preamble.


Now, that won’t do anything yet; we need to tell it what to do. First: bold labels. Modify the line above:
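As a minimal sketch, assuming the normal-size caption text and bold labels discussed in the next paragraph, the two forms of the preamble line might look like this (the option values are taken from that discussion):

```latex
% Plain invocation: loads the package but changes nothing visible.
% \usepackage{caption}

% Modified invocation: normal-size caption text, bold "Figure 1:" label.
\usepackage[font=normalsize,labelfont=bf]{caption}
```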


As you see, a couple of options we have are “font=” and “labelfont=”. The first sets the font of the entire caption; in this case, I prefer normal size (relative to that defined in your document class), but you can set it to any of the standard sizes the package supports (small, footnotesize, scriptsize, etc.). The second styles only the label (the “Figure 1:” part) – in this case, boldface (bf). Note that you can combine styles here; to make labels bold and italic, use labelfont={bf,it}, etc. Here’s how it looks in bold:


For more details on these formatting options, see Peter Yu’s helpful tutorial.

But wait – there’s more! The figure is still not quite right. Do you see it? The spacing between the x-axis and its label (pH) is smaller than the distance between the x-axis label and the caption! This can be visually jarring in an otherwise symmetrically spaced page, and depending on other settings in your preamble, can put the caption closer to the body text than the figure to which it is actually referring! (Why the typesetting engine would not safeguard against this in the first place is beyond me.)

Fortunately, the caption package gives us access to more sophisticated parameters (per the CTAN documentation) so we can tighten things up a bit. In your preamble (ideally immediately after your package declaration), put:
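A sketch of what this could look like, assuming the skip=6pt and aboveskip=2pt values that the following paragraph mentions (tune both to your own layout):

```latex
\usepackage[font=normalsize,labelfont=bf]{caption}

% Figures: tighten the gap between the figure and its caption.
\captionsetup[figure]{skip=6pt}

% Tables with the caption above the table: uncomment to use.
% \captionsetup[table]{aboveskip=2pt}
```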


You can change the “skip=” parameter to whatever distance suits your document layout, and you can also apply the same modifications to table captions (change “figure” to “table” and, if your preference is to put your table captions above your table, as is proper, [ahem…], use “aboveskip=2pt” instead of “skip=6pt”). Thus, we get a more visually pleasing spacing:


And for tables:


Hope this helps, and happy typesetting!


LaTeX output in GNUplot

As I began to prepare to write my Ph.D. dissertation in graduate school, I became curious about better ways to create professional-looking graphs. Commonly used software like Microsoft Excel and Apple’s Numbers (then part of their iWork suite) made it easy to enter and visualize data, but both programs produced a product that looked amateurish, tacky, and disjointed, and lacked the powerful customization tools that I imagined I might want to employ. Both had an antiseptic quality to them, making them more suitable for a “big picture,” flashy business/sales presentation than for a detailed, nitty-gritty scientific publication. Moreover, I wanted every single graph – I expected to have over 75 in my dissertation – to have a congruent, homogeneous look and feel, including font, style, resolution, and size. In my view, a Ph.D. dissertation was the crowning document of my education, and I wanted it to be absolutely, unequivocally perfect.

I asked our lab coordinator – a seasoned Linux/UNIX veteran who preferred emacs over Word – if he had any ideas.

“GNUplot,” he said. So off to Google I went.

The first results I found were pages and pages of examples, and the plots were exactly what I had in mind: detailed, highly technical, and infinitely customizable, but also polished and of a high quality. Even better was that plots could be generated by a shell script that reads the data from an external file! This meant that all the parameters for the graph – dimensions, color, font, point size and style, line thickness and style, axes, ranges, etc. – could be defined in a single shell script. To make multiple graphs, I would only need to point the script at a different data set. Output could be set to PNG, JPG, and – even better – PostScript/EPS or PDF. This was SO COOL.

As I started experimenting with data I/O and various settings, I came across an unexpected output option: “epslatex”. I was planning on using LaTeX to typeset my dissertation because of its powerful equation, bibliography, and cross-referencing capabilities (its native PDF output and its easy inclusion of PDF figures made GNUplot’s PDF output even more attractive), so my curiosity about GNUplot’s epslatex output option was piqued.

Here’s how it works:

  1. Create a plain text file that will be your GNUplot generator script. Include the following lines:
    gnuplot << TOEND
    set terminal epslatex size 3.5,2.62 color colortext
    set output 'graph1.tex'
    set grid xtics ytics
    set mxtics
    set mytics
    set xzeroaxis linetype 1 lc 0 linewidth 2.5
    set yzeroaxis linetype 1 lc 0 linewidth 2.5
    set xrange [-10:10]
    set yrange [-10:10]
    set xlabel "x label" 
    set ylabel "y label"
    set bmargin 2
    plot "data.dat" using 1:2 notitle w linespoints pointtype 5 linetype 1 lc 7 lw 3
    TOEND
  2. Go ahead and save that file (call it “plotcmd” or something of that sort).
  3. Place the data you wish to plot in columns (here I am using two columns for (x,y) coordinates) and save it in a plain text file called “data.dat” (or whatever you choose – just make sure it matches the script above in the 2nd-to-last line).
  4. Execute the script (for example, by running “sh plotcmd” in the directory that holds both files).

The script will output two files: graph1.tex and graph1.eps. These two files are very important! Upon inspection, you will find that the EPS file contains the graph only – no text, numbers, or labels. The tex file contains “instructions” on how to place the graphics. There are 3 important lines therein. They will look something like this:

  • \put(176,1973){\rotatebox{-270}{\makebox(0,0){\strut{}y-label}}}%
  • \put(2860,-110){\makebox(0,0){\strut{}x-label}}%
  • \put(0,0){\includegraphics{graph1}}%

As you can probably deduce, the first line places your y-axis label in your figure, the second line places your x-axis label in your figure, and the third line places the graph itself (the EPS file). If you had specified any other names for your labels in your script, they would be reflected here. Note that at this point, you can modify these labels using any LaTeX markup you desire, such as math, phonetic, or Greek characters! The \put() macro specifies (x,y) coordinates of the objects, where (0,0) is the bottom-left corner of your figure. If you want to tinker with their placement or fine-tune their appearance in some way, you can do so here.
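As an illustration of that kind of edit, here is a sketch of the two label lines after hand-editing graph1.tex. The coordinates are the ones shown above; the label text itself is made up for the example and is not from any real data set:

```latex
% y-axis label with inline math (illustrative text only).
\put(176,1973){\rotatebox{-270}{\makebox(0,0){\strut{}Absorbance at $\lambda_{\max}$}}}%
% x-axis label with a chemical concentration and units (illustrative text only).
\put(2860,-110){\makebox(0,0){\strut{}$[\mathrm{H}^{+}]$ (mol\,L$^{-1}$)}}%
```

Keep in mind that rerunning the gnuplot script overwrites graph1.tex, so hand edits like these would need to be reapplied (or moved into the xlabel/ylabel lines of the script).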

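To place your figure in your main document, load the graphicx package in your preamble (and color or xcolor, which the colortext option relies on), then \input the generated tex file inside a figure environment in the body of your document:

\begin{figure}
\centering
\input{graph1}
\caption{Your caption here.}
\end{figure}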

And you’re done!

Now, that may seem like a lot of work – and it is! (Hence this post going in our “Not So Lazy” category…) but consider the benefits of using this method for your graphs:

  1. Homogeneous size and styling – once you decide on a preferred look/style, you can use the exact same script on all your data sets. Just update your x- and y-ranges and axis labels and you’re good to go. Consistency and attention to detail separate the good from the great and the great from the best.
  2. Native language – all text on your graph will match the text in the body of your document, including proper fonts, mathematics, symbols, kerning, ligatures, etc., because it’s all LaTeX anyway!
  3. High resolution – with EPS/PDF vector images, your graphs will look as pristine as your document text in print and in all resolutions.
  4. Professional – these graphs don’t look like they were made by your 6th-grade computer teacher (sorry, Mrs. Tuinstra…). Readers will (or, they should) notice.
  5. No corrupt files – like LaTeX, the important material (your data and the script to plot it) is stored in plain text. No longer do you risk a software crash corrupting your precious data in Excel or figures in Word.
  6. Limitless functionality in GNUplot – we have barely scratched the surface of GNUplot’s functionality. If you need to represent your data in any imaginable way, GNUplot can do it.

Speaking of functionality in GNUplot, let me further explain each line in our script:

set terminal epslatex size 3.5,2.62 color colortext
# this instructs gnuplot to use EPS+LaTeX output, create a plot with roughly 4:3 dimensions (3.5" width, 2.62" height),
# and enables color output (more below). you can change the size to whatever you prefer.

set output 'graph1.tex'
# this sets the filename of the tex file, as well as the name of the EPS file referenced therein.
# if you change the name of the EPS file, its name needs to be updated within the tex file.

set grid xtics ytics
# this sets the option for gridlines on the graph along the x- and y-tics.
# this is my personal preference and can be excluded if you choose.

set mxtics
set mytics
# these set the option for "mini-tics" or "mid-tics" (gnuplot calls them minor tics), to include smaller tic increments between the major tic marks.
# this is my personal preference and can be excluded if you choose.

set xzeroaxis linetype 1 lc 0 linewidth 2.5
set yzeroaxis linetype 1 lc 0 linewidth 2.5
# GNUplot does not, by default, visually differentiate the x- and y- zero axes from any other gridline.
# for purposes of visual orientation, I like to include the x- and y- zero axes on my plots.
# here, linetype 1 = solid line, lc 0 = black linecolor, and linewidth = 2.5x standard width.
# this is my personal preference and can be excluded if you choose.

set xrange [-10:10]
set yrange [-10:10]
# these specify your x- and y-ranges. GNUplot will determine these automatically if they are excluded.
# feel free to modify these per your specific needs.

set xlabel "x label"
set ylabel "y label"
# these specify your x- and y-axis labels. you can modify these here, or keep them as
# placeholders and modify them in the resultant tex file.

set bmargin 2
# this specifies a smaller than usual bottom margin. I prefer it to keep less space
# between figure and body text. this is my personal preference and can be modified/excluded if you choose.

plot "data.dat" using 1:2 notitle w linespoints pointtype 5 linetype 1 lc 7 lw 3
# this is where the magic happens. this tells GNUplot to plot the data in our file (data.dat)
# using column 1 as "x" and column 2 as "y". (a 3-column datafile could accommodate plots as 1:2, 1:3, 3:1, 2:3, etc.) cool!
# notitle means not to include a title for the data set (as in a key). you could change this to "title 'data set 1'", for instance.
# linespoints means to plot data points connected with lines. you could use "points" instead.
# see the GNUplot manual for options for pointtype, lc, lw, etc.

An example of epslatex output is below.


Good luck and happy typesetting! Feedback is welcome!

Put section numbers in the margins of your LaTeX reports

For the first installment of our LaTeX mini-series, I thought I’d share one neat little trick involving a tweak to the layout of section headers in LaTeX documents: putting the section number counter in the margin.

This will create the illusion of a “hanging” paragraph below each header. Just add this snippet to your preamble:

\makeatletter\def\@seccntformat#1{\llap{\csname the#1\endcsname\quad}}\makeatother
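For context, here is a minimal sketch of a complete document using the snippet, assuming the standard article class (classes or packages that redefine sectioning, such as titlesec, may ignore \@seccntformat). The section titles and body text are placeholders:

```latex
\documentclass{article}

% \@seccntformat is the kernel hook that typesets the number in
% \section, \subsection, etc. Wrapping it in \llap makes the number
% take up no width, so it sticks out to the left of the heading text.
\makeatletter
\def\@seccntformat#1{\llap{\csname the#1\endcsname\quad}}
\makeatother

\begin{document}

\section{Introduction}
The heading text lines up with this paragraph; the number hangs in the margin.

\subsection{Background}
Subsection numbers hang in the margin the same way.

\end{document}
```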

The effect should look something like this:

As a side note: margins are a sore spot for me in my LaTeX career. I appreciate the default margin-width algorithms, which try to maximize readability by restricting the number of characters per line to ~60. But for many practical layout purposes, smaller margins are desirable. Nevertheless, when wider margins are in use, I particularly like to employ this design aspect in order to balance out the distribution of text on the page without impacting the readability of the paragraphs.

LaTeX mini-series coming soon!

In the coming days and weeks, I’ll be sharing more of my favorite tips and tricks to make your LaTeX documents even more functional and beautiful. I hope to focus on aspects of document and macro design and usage, as well as formatting and aesthetic tricks that I like to employ in my own papers. 

Remember, I am not a LaTeX expert! Indeed, most of these tricks have been discovered on internet message boards or in documents shared by others over the years—and many have been modified, tweaked, or re-invented for my own specific use and taste. 

Happy typesetting!

ACS formatting using NatBib in LaTeX

I recently came up with a clean and efficient way of managing references/citations using LaTeX, my favorite typesetting system for scientific writing. In my documents I prefer to use endnote-style references with ACS-style citations. To make things a bit easier, I also use a little macro to automatically format my citations. Here’s what I’m doing:

In your document preamble, invoke the natbib package using the super, comma, and sort&compress options.

\usepackage[super, comma, sort&compress]{natbib}

These settings force superscript numeration and comma separation in the text, and the sort&compress option condenses multiple references. For example:

The world is flat.6,7,10-13,16

Additionally, I tweak the following commands:
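One way to get the effect described next is sketched below. This is an assumed reconstruction, not necessarily the exact commands used in the post; it relies on natbib's \bibnumfmt hook and on the article class, where the bibliography heading is stored in \refname:

```latex
% natbib: print bibliography labels as "1." instead of the default "[1]".
\renewcommand{\bibnumfmt}[1]{#1.}

% article class: heading of the bibliography section.
% (In report or book, the corresponding macro is \bibname.)
\renewcommand{\refname}{References}
```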


This uses a standard list for numbering in your endnotes, and titles the bibliography section References.

Finally, I use a macro to automatically format my references.

\providecommand*\myref[6]{\bibitem{#1}#2 \textit{#3} \textbf{#4}, \textit{#5}, #6.}

This macro, called \myref, accepts 6 arguments and encapsulates the standard \bibitem command. The arguments are:

#1 – label
#2 – authors
#3 – journal
#4 – year
#5 – volume
#6 – pages

It is used as such:

\myref{anders2009}{Anders, B.}{J. Am. Chem. Soc.}{2009}{8}{1223--1230}

In the body of your LaTeX document, the text should look like this:

The world is flat. \cite{anders2009, george2001, thompson1999}

Then to create your bibliography section,

\myref{anders2009}{Anders, B.}{J. Am. Chem. Soc.}{2009}{8}{1223--1230}
\myref{george2001}{George, K. L.}{Nature}{2001}{33}{8774--8775}
\myref{thompson1999}{Thompson, A.}{Science}{1999}{9}{900--954}
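Putting the pieces together, a minimal sketch of a complete document might look like the following. The thebibliography environment wrapping the \myref lines and the \bibnumfmt tweak are assumptions added to make the example compile on its own; the citations are the same fictitious ones used above. It needs two LaTeX runs so the citation numbers resolve.

```latex
\documentclass{article}
\usepackage[super,comma,sort&compress]{natbib}

% Assumption: print list labels as "1." rather than "[1]".
\renewcommand{\bibnumfmt}[1]{#1.}

% The \myref macro from the post: label, authors, journal, year, volume, pages.
\providecommand*\myref[6]{\bibitem{#1}#2 \textit{#3} \textbf{#4}, \textit{#5}, #6.}

\begin{document}

The world is flat.\cite{anders2009,george2001,thompson1999}

% The argument "9" is only a width hint for the widest label.
\begin{thebibliography}{9}
\myref{anders2009}{Anders, B.}{J. Am. Chem. Soc.}{2009}{8}{1223--1230}
\myref{george2001}{George, K. L.}{Nature}{2001}{33}{8774--8775}
\myref{thompson1999}{Thompson, A.}{Science}{1999}{9}{900--954}
\end{thebibliography}

\end{document}
```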

This will produce the following output (note that the citations are completely fictitious):

The world is flat.1-3


1. Anders, B. J. Am. Chem. Soc. 2009, 8, 1223-1230.
2. George, K. L. Nature 2001, 33, 8774-8775.
3. Thompson, A. Science 1999, 9, 900-954.

Keep on typesetting!

Removing duplicates in unsorted lists

In a recent project I was working on in my spare time, I came across the need to remove duplicate entries from an unsorted list of data.

Under circumstances when the data can be sorted, removing duplicates is fairly easy: one simply uses the “sort” command, followed by the “uniq” command. For example, if data.txt contains the following unsorted list, duplicates can be removed in this manner:


cat data.txt | sort | uniq



Easy enough, right?

For this particular project, however, I did not want to sort the data first; each entry was added in chronological order along with a timestamp, but in this particular case the timestamp had to be stripped out ahead of time. Therefore, sorting was not an option. Furthermore, the “uniq” command only removes duplicate entries that are adjacent to one another (which is why the “sort” command has to be invoked before passing the data to “uniq”).

The solution: a particularly nasty sed one-liner, which is probably not worth deciphering at this point. Suffice it to say that it keeps a buffer in memory (sed's hold space) and compares each new entry to the entries already in the buffer; if the entry already exists in the buffer, it is discarded; if the entry does not exist in the buffer, it is added to the buffer and printed. The output therefore keeps the first occurrence of each entry, in the original order.

Here it is:

cat data.txt | /bin/sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'



Because this relies somewhat heavily on memory, it could be problematic when used on humongous data sets, so keep that in mind. But, these days, I don’t imagine that would ever be much of a problem.