# ACS formatting using NatBib in LaTeX

I recently came up with a clean and efficient way of managing references/citations using LaTeX, my favorite typesetting system for scientific writing. In my documents I prefer to use endnote-style references with ACS-style citations. To make things a bit easier, I also use a little macro to automatically format my citations. Here’s what I’m doing:

In your document preamble, invoke the natbib package using the super, comma, and sort&compress options.

\usepackage[super, comma, sort&compress]{natbib}

These settings force superscript numeration and comma separation in the text, and the sort&compress option condenses multiple references. For example:

The world is flat.6,7,10-13,16

Additionally, I tweak the following commands:

\renewcommand{\bibnumfmt}[1]{#1.}
\renewcommand{\bibname}{References}

The first line formats each number in the reference list as the number followed by a period (for example, 1.), and the second titles the bibliography section References.

Finally, I use a macro to automatically format my references.

\providecommand*\myref[6]{\bibitem{#1}#2 \textit{#3} \textbf{#4}, \textit{#5}, #6.}

This macro, called \myref, accepts 6 arguments and encapsulates the standard \bibitem command. The arguments are:

#1 – label
#2 – authors
#3 – journal
#4 – year
#5 – volume
#6 – pages

It is used as such:

\myref{anders2009}{Anders, B.}{J. Am. Chem. Soc.}{2009}{8}{1223--1230}

In the body of your LaTeX document, the text should look like this:

The world is flat. \cite{anders2009, george2001, thompson1999}

Then to create your bibliography section,

\newpage
\bibliographystyle{plainnat}
\begin{thebibliography}{200}
\myref{anders2009}{Anders, B.}{J. Am. Chem. Soc.}{2009}{8}{1223--1230}
\myref{george2001}{George, K. L.}{Nature}{2001}{33}{8774--8775}
\myref{thompson1999}{Thompson, A.}{Science}{1999}{9}{900--954}
\end{thebibliography}

This will produce the following output (note that the citations are completely fictitious):

The world is flat.1-3

References

1. Anders, B. J. Am. Chem. Soc. 2009, 8, 1223-1230.
2. George, K. L. Nature 2001, 33, 8774-8775.
3. Thompson, A. Science 1999, 9, 900-954.

Keep on typesetting!

# Removing duplicates in unsorted lists

In a recent project I was working on in my spare time, I came across the need to remove duplicate entries from an unsorted list of data.

Under circumstances when the data can be sorted, removing duplicates is fairly easy: one simply uses the “sort” command, followed by the “uniq” command. For example, if data.txt contains the following unsorted list, duplicates can be removed in this manner:

data.txt
--------
12
7
6
9
6
7
12
13

cat data.txt | sort -n | uniq

output:

6
7
9
12
13

Easy enough, right?

For this particular project, however, I needed to preserve the original order of the data. Each entry had been added in chronological order along with a timestamp, but the timestamp had to be stripped out ahead of time, so sorting would have destroyed the only remaining record of that order. Furthermore, the “uniq” command only removes duplicate entries that are adjacent to one another (which is why the “sort” command has to be invoked before passing the data to “uniq”).
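To see the adjacency limitation concretely, here is a small sketch that runs “uniq” on its own over the sample list from data.txt. The variable names `data` and `uniq_only` are just placeholders for this example, and the list is built with printf so the snippet does not depend on the file being present.

```shell
# The sample list from data.txt, held in a variable so the sketch is self-contained.
data=$(printf '%s\n' 12 7 6 9 6 7 12 13)

# No two equal values sit next to each other, so uniq alone removes nothing.
uniq_only=$(printf '%s\n' "$data" | uniq)

# Count the surviving lines; all of the original entries are still there.
printf '%s\n' "$uniq_only" | wc -l
```

Since none of the repeated values are neighbours, all eight lines come back unchanged.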

The solution: a particularly nasty sed one-liner, which is probably not worth deciphering at this point. Suffice it to say that it keeps a buffer in memory and compares each new entry to the entries already in the buffer: if the entry already exists in the buffer, it is discarded; if it does not, it is printed and added to the buffer.

Here it is:

cat data.txt | /bin/sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'

output:

12
7
6
9
13
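
For anyone who does want to decipher it, here is the same script spread over several lines with comments. The comments are my reading of each command, and the list is fed in through a shell variable instead of data.txt so the block stands alone (`data` and `sed_deduped` are names chosen for this sketch, and plain `sed` is used in place of `/bin/sed`).

```shell
# The sample list from data.txt, held in a variable so the sketch is self-contained.
data=$(printf '%s\n' 12 7 6 9 6 7 12 13)

sed_deduped=$(printf '%s\n' "$data" | sed -n '
# Append the hold space (every line kept so far) to the current line.
G
# Double the first newline so the current line is clearly separated from the buffer.
s/\n/&&/
# If the current line shows up again as a whole line in the buffer, drop it.
/^\([ -~]*\n\).*\n\1/d
# Otherwise remove the extra newline again,
s/\n//
# copy the current line plus the buffer back into the hold space,
h
# and print only the first line of the pattern space, which is the new entry.
P
')

printf '%s\n' "$sed_deduped"
```

Note that the `[ -~]` range only matches printable ASCII characters, so lines containing anything else would not be recognized as duplicates by this pattern.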

Because the buffer holds every unique entry in memory and each new entry is compared against the whole buffer, this can become slow and memory-hungry on humongous data sets, so keep that in mind. For lists of modest size, I don’t imagine it would ever be much of a problem.
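
For comparison, the same order-preserving clean-up is often written with awk. This is a different technique from the sed one-liner, not a rewrite of it: it remembers lines in an associative array (called `seen` here, an arbitrary name) rather than in sed's hold space. It still keeps every unique line in memory, so the caveat about very large inputs applies to it as well. The variable names `data` and `deduped` are placeholders for this sketch.

```shell
# The sample list from data.txt, held in a variable so the sketch is self-contained.
data=$(printf '%s\n' 12 7 6 9 6 7 12 13)

# seen[$0]++ evaluates to 0 the first time a line appears, so !seen[$0]++ is
# true only for first occurrences; awk prints those lines by default.
deduped=$(printf '%s\n' "$data" | awk '!seen[$0]++')

printf '%s\n' "$deduped"
```

On the sample list this prints 12, 7, 6, 9, 13, one per line, matching the sed output.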