Thursday, February 3, 2011

Total Paper Domination

It started, as so many large undertakings do, with a simple problem. I wanted to look up an experimental parameter, and I knew I had seen a paper doing exactly the experiment I wanted to copy.

Could I find that paper? Of course not.

Could I remember that detail? Can Miss Outlier EVER remember a detail?

Clearly there was a problem. I looked for a printed copy of the paper, no luck. In an email? Not so much. Somewhere buried in a folder on my hard drive? Probably.

Before I knew it, I was rounding up ALL the papers I have on my computer - stashed in various folders, "Remember This For PhD," "Related to Useful Experiments," "PhD Exploration" - that have since lost their meaning. I also have a stash of old papers, sort of a lab historical file, that was given to me in a huge zip file from the most senior grad student when I joined the lab. I've always meant to see what was in there, because it would look terribly bad if I forgot to cite one of my own lab's papers. (I did find the paper I was looking for, by the way, it happened to be on my desktop. But at this stage, that was entirely beside the point.)

And then the project got even more involved - because you see, I've always meant to get everything organized in proper bibliography management software. For my Master's thesis, my strategy was pretty much to find all the papers I could, print them out in a binder, and then refer to them as I wrote Chapter 1: Introduction and Background. I put all the citations for the papers in a BibTeX file, so I could reference them as I wrote the thesis in LaTeX with a LyX overlay. (If you don't know what that means, it couldn't possibly interest you anyway, don't worry.) So now that I was rounding up all my papers, it was the perfect time to choose a reference software and do things RIGHT.

This, by the way, is classic nerd behavior. I adore Rands in Repose - if you are a nerd, or must deal with one on a daily basis, he is a must read. I'd like to share an excerpt which is relevant in this case:
Chasing the Two Highs

The First High: When the nerd sees a knot, they want to unravel it. After each Christmas, someone screws up the Christmas tree lights. They remove the lights from the tree and carefully fold the lights as they lay them in the box. Mysteriously, somewhere between last year’s folding and this year’s Joy of Finding the Lights, these lights become a knotted mess.

The process of unknotting the lights is a seemingly haphazard one — you sit on the floor swearing and slowly pulling a single green cable through a mess of wires and lights and feeling like you’re making no progress — until you do. There’s a magical moment when the knot feels solved. There’s still a knot in front of you, but it’s collapsing on itself and unencumbered wire is just spilling out of it.

This mental achievement is the first nerd high. It’s the liberating moment when we suddenly understand the problem, but right behind that solution is something greater. It’s….

The Second High: Complete knot domination. The world is full of knots and untying each has its own unique high. Your nerd spends a good portion of their day busily untying these knots, whether it’s that subtle tweak to a mail filter that allows them to parse their mail faster, or the 30 seconds they spend tweaking the font size in their favorite editor to achieve perfect readability. This constant removal of friction is satisfying, but eventually they’ll ask, “What’s with all the fucking knots?” and attack.

A switch flips when your nerd drops into this mode. They’re no longer trying to unravel the knot, they want to understand why all knots exist. They have a razor focus on a complete understanding of the system that is currently pissing them off and they use this understanding to build a completely knot-free product - this is the Second High.

Finding the paper on the desktop was the First High. But I was on to the Second High - complete paper domination.

I was going to need a system. I had a choice - what software to use? Actually I already had a software, because as I said, I've always meant to organize things. I use Papers - which you have to pay a small amount for, but you get a student price and it is AMAZING. There are a bunch of other options, of course. Here at World's Best School they like you to use RefWorks or EndNote, and a lot of people do just fine with those. Zotero is also useful, and has nice browser plug-ins, and JabRef is used by the open-source crowd.

But I've always felt that what I really want the software to do is keep track of the actual .pdf FILES, not so much the citations. I only need to cite things occasionally when I write articles, but I'd like to have all the information in those .pdfs available for me to easily find on a regular basis. The software Papers does just that - it's first and foremost an organizer for your files. (I've been told Mendeley works much the same way, and is free.) It acts like iTunes, where you have all your files in the main library, and then you can make "Playlists," sort of, where you group the papers. So I have a bunch of playlists, for things like People Doing Manufacturing, and People Who Did Cool Stuff But Only Once, and Same Material As Me, and Same Process As Me. And any single journal article can be in as many playlists as you want, without making a physical copy of the file to put in another folder on the hard drive. I also make Smart Playlists that add files automatically - for instance, Papers My LabMates Wrote, which is based on authorship.
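
For the curious, here is a minimal Python sketch of that playlist idea. It is not how Papers works internally (I have no idea about that); the names `Paper`, `Library`, `add_to_playlist` and `smart_playlist` are all made up for illustration. The point is only that a playlist holds references to a paper, not copies of the file, and that a smart playlist is just a rule evaluated against the whole library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Paper:
    """One paper in the library (hypothetical structure)."""
    title: str
    authors: List[str]
    pdf_path: str  # a single file on disk, however many playlists list it


@dataclass
class Library:
    """Main library plus named playlists that point at its papers."""
    papers: List[Paper] = field(default_factory=list)
    playlists: Dict[str, List[Paper]] = field(default_factory=dict)

    def add(self, paper: Paper) -> None:
        self.papers.append(paper)

    def add_to_playlist(self, name: str, paper: Paper) -> None:
        # Stores a reference to the same Paper object; nothing is copied.
        self.playlists.setdefault(name, []).append(paper)

    def smart_playlist(self, labmates: List[str]) -> List[Paper]:
        # Recomputed on demand: every paper with at least one labmate author.
        wanted = set(labmates)
        return [p for p in self.papers if wanted & set(p.authors)]
```

A paper added to both "Same Material As Me" and "Same Process As Me" is still one object with one `pdf_path`, and calling `smart_playlist` with a list of labmate names picks up new papers by those authors as soon as they land in the library.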

And of course Papers also ties in to LaTeX, in whatever IEEE or other proper format, so the citation process is quick and painless when I need it.

But the killer feature? Search, baby, search. Because all those files are .pdfs, I can search for a term, and find every single paper I have with that term in it. Even if it's not in the title. Awesome. Especially for papers titled things like, "Microfluidics: A Review". Um, geez, I might need a little more detail.

And I can take notes on each paper, like, "Put in Section 3" and those are searchable too. So when I go to write Section 3 of my article, I can look up all the papers I wanted to include. Or, ahem, "Experimental Parameter for Temperature," so when I go to do experiments, I might be able to find what I need...
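
If you want to picture what that search is doing, here is a rough Python sketch. It assumes the text of each .pdf has already been extracted into a plain string (the extraction step is left out), and the `search` function and the dictionary keys `title`, `text` and `notes` are my own invention, not anything Papers actually exposes.

```python
def search(papers, term):
    """Return titles of papers whose body text or notes contain the term.

    Each paper is a dict with hypothetical keys: 'title', 'text', 'notes'.
    Matching is case-insensitive and deliberately ignores the title,
    since titles are often too vague to be useful.
    """
    needle = term.lower()
    return [
        p["title"]
        for p in papers
        if needle in p["text"].lower() or needle in p["notes"].lower()
    ]
```

So a paper titled "Microfluidics: A Review" still turns up when the term only appears somewhere in the body, or in a note I left on it.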

So, dear readers, I am pleased to announce that a day and a half later, I have gathered every paper I currently have in .pdf form, which is 632. I have populated the proper citation data for all 632 of those papers, and sorted them into meaningful groups (there are 200 that may be relevant to my PhD thesis). And at the end of all that, I did a search for the experimental parameter I was interested in.

And exactly one result popped up - the paper that had once been lost, and prompted the whole organization spree.

I have achieved that Second High, ladies and gentlemen - total paper domination.

How do you keep your papers organized? Do you have, or does your lab dictate, a software of choice?


  1. You forgot to mention the vicarious nerd high, achieved by reading about another nerd's domination... This of course prompts some cognitive dissonance, as in, "Wow, it's REALLY nerdy when a post about ORGANIZING DATA makes me smile!"

  2. I have a Linux server running in the background. For now, it's getting built and configured to my tastes (the latest task is figuring out how to build an extension for PHP to run scripts and apps on the _really powerful_ graphics card in the server).

    If I had to keep a stash of papers, I'd look at eprints, but in the past I've used Mendeley with varying results (my fault, I change focus every day and I am eternally curious...)


  3. Zotero has been doing all this (pdf indexing, search, tag cloud, notes, associated files, etc.) for at least a year and a half. It also syncs your pdf attachments, so they are accessible anywhere, from a browser or another of your computers where you have Zotero installed. Also great for when you dual boot XP/Linux because it is obviously cross-platform, and auto attachment and download of files is nice. Oh yeah, did I mention automatic smart metadata retrieval for any pdf you drop into Zotero? And there are plugins for ranking by citation, LyX integration, visual analysis, semantic association, and god knows what else.

    A nerd that sees a knot where there is none, and pays for some randomness called Papers, must be a mac-fake-dummkopf. I snorted with disappointment at least 5 times while reading this, and am snorting with the fake wannabe nerdiness right now. Seriously lose that mac and stop paying for others to untie non-existent knots.

    Total domination all right, but Apple's not yours.