Breaking down the art of e-mail discovery

By Ian Fisher (2L)

Jason R. Baron just received more than 200 million e-mails.

Jason R. Baron

But that’s nothing compared to what someone in his position could get in eight years.

Baron, the director of litigation for the National Archives and Records Administration, spoke to students on Feb. 26 in UF Law’s groundbreaking class on e-discovery, taught by adjunct professors William Hamilton (JD 83) and Ralph Losey (JD 79). The moment President George W. Bush’s term ended, Baron’s office took possession of all of the e-mails that went through the White House in Bush’s eight years. Baron expects a lot more from the Obama administration.

“What I’ve estimated in my law review article, is that whoever was the next president — I didn’t know it was President Obama at the time — but now President Obama, if he lasts two terms, at the end of eight years, he will have generated, at the rate that we’re going, a billion e-mails.”

Although Baron plans to retire in two years, he acknowledges that e-mail discovery would be a logistical nightmare for his office if it had to search a billion White House e-mails for litigation.

Baron should know. He and his office were involved in United States v. Philip Morris, a multibillion-dollar case. In that case, Baron was responsible for searching more than 20 million e-mails from the Clinton administration as well as 50 years of tobacco-related documents. He used 12 keywords to search all of the e-mails, narrowing the number to 200,000.

After that, 25 lawyers spent six months going through every one of those e-mails to determine which were relevant. They determined that about 100,000 were relevant and produced about 80,000. Only a few were ever introduced at trial, which Baron finds troubling.

“The natural inclination is to figure out a bunch of keywords that you can then go query your own client’s database or think of keywords to propound to the other side,” Baron said. “That’s not wrong. I guess my proposition is that it’s a little naïve to think that 12 keywords are going to reliably and efficiently get the relevant evidence that’s in a haystack in a giant collection, like White House e-mail, of 20 million documents.”

Baron gave one example of the many problems that arose with keyword searching: a search for “Marlboro” also returned many e-mails mentioning Upper Marlboro, a town in Maryland.
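The Marlboro problem can be illustrated with a short Python sketch. It is not the search Baron's office ran; the sample messages and the function names naive_hits and filtered_hits are invented for illustration. It only shows how a plain keyword match pulls in the place name, and how one narrow rule can exclude it.

```python
import re

# Hypothetical sample messages, invented for illustration only.
emails = [
    "Marlboro ad spending is up this quarter.",
    "The hearing is at the courthouse in Upper Marlboro.",
    "Please forward the Marlboro marketing memo.",
]

def naive_hits(messages, keyword):
    """Return every message that contains the keyword, ignoring case."""
    return [m for m in messages if keyword.lower() in m.lower()]

def filtered_hits(messages):
    """Match 'Marlboro' unless it is immediately preceded by 'Upper '."""
    pattern = re.compile(r"(?<!upper )\bmarlboro\b", re.IGNORECASE)
    return [m for m in messages if pattern.search(m)]

naive = naive_hits(emails, "Marlboro")
filtered = filtered_hits(emails)
print(len(naive), "naive hits;", len(filtered), "after excluding the place name")
```

A rule like this fixes only the one false hit someone already knows about, which is part of why a short keyword list is hard to trust on a collection of millions of messages.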

Because of these and other problems with information retrieval, Baron got involved with the Text Retrieval Conference, or TREC, which is operated by the National Institute of Standards and Technology. TREC’s goal is to promote research into the science of information retrieval.

Until TREC, Baron said, only one study had been done on how well lawyers find relevant documents. That study involved 40,000 documents totaling 350,000 pages. The lawyers estimated that they had found 75 percent of the relevant documents; however, a research team found that they had identified only about 20 percent.
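The gap the study describes is a gap in what information retrieval researchers call recall: the share of all relevant documents that a search actually finds. A minimal Python sketch of the arithmetic follows. The document counts are hypothetical, chosen only to reproduce the 20 percent figure, and are not taken from the study.

```python
def recall(relevant_found, relevant_total):
    """Fraction of all relevant documents that the review actually found."""
    if relevant_total <= 0:
        raise ValueError("relevant_total must be positive")
    return relevant_found / relevant_total

# Hypothetical counts, chosen only to reproduce the 20 percent figure.
relevant_total = 1000
relevant_found = 200

measured = recall(relevant_found, relevant_total)
believed = 0.75  # the lawyers' own estimate, as reported in the article
missed = relevant_total - relevant_found

print(f"measured recall: {measured:.0%}")
print(f"believed recall: {believed:.0%}")
print(f"relevant documents missed: {missed}")
```

Under these assumed counts, a team that believed it had found three of every four relevant documents would in fact have missed four of every five.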

Many software companies are trying to solve these search problems with new programs that promise more efficient searches, but Baron said it is unclear which of the expensive programs to buy and whether any of them actually works as promised. Given all of these e-discovery issues, Baron recommended that students learn the area thoroughly, because knowledge of it will give them a head start in a rough job market.

“We are just at the beginning, sort of the dawn of some new paradigm in the law,” Baron said. “There is something happening out there, something different — and you can feel it.”