“If your clients don’t have a records management system, they may as well take their money out into the parking lot and set it on fire.”

– Former U.S. District Court Magistrate Judge John Facciola

We all know that ediscovery is expensive, and research confirms it. The definitive RAND study, Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery, found that median costs for collection, processing, and review are $17,507 per gigabyte (roughly 3,500 documents or 10,000 e-mails). The math is not pretty: at that rate, a case involving 482 GB of source data could exceed $8 million in ediscovery costs.
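For readers who want to check the arithmetic, here is a minimal sketch in Python. It assumes cost scales linearly with data volume at the median per-gigabyte figure quoted above, which is a simplification; the function name and constant are purely illustrative and do not come from the RAND study.

```python
# Median cost per gigabyte for collection, processing, and review,
# as quoted above. Treating it as a flat rate is an assumption.
COST_PER_GB_USD = 17_507


def estimated_ediscovery_cost(source_gb, cost_per_gb=COST_PER_GB_USD):
    """Rough linear estimate: data volume times per-gigabyte cost."""
    if source_gb < 0:
        raise ValueError("source_gb must be non-negative")
    return source_gb * cost_per_gb


cost = estimated_ediscovery_cost(482)
print(f"Estimated cost for 482 GB: ${cost:,.0f}")  # $8,438,374
```

Under the same linear assumption, every gigabyte that never enters the funnel takes about $17,507 off the estimate.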

And on top of that come preservation costs. The Preservation Costs Survey demonstrated that large companies incur significant fixed costs for preservation (for in-house ediscovery personnel and for procurement and maintenance of legal hold management and data preservation technology systems), averaging $2.5 million annually. More significant is the cost of employee time lost in complying with legal holds. While companies with up to 10,000 employees incur an average time cost of over $428,000 per year, costs for the largest companies exceed $38 million per year.

There is indeed great complexity in how to cost-effectively process huge amounts of data through the ediscovery funnel. Tighter management of ediscovery processes continues to be important.

But as we ponder how to cut costs, let’s not confuse symptoms with causes:

What if the primary driver of ediscovery cost is not how we manage the collection-processing-review-production funnel, but is instead the breathtaking volumes of data we cram into the ediscovery funnel in the first place?

Sure, over-preservation is indeed a contributor to ediscovery cost. For example, Microsoft disclosed in testimony to the Judicial Conference’s Advisory Committee on Civil Rules (in advance of the 2015 FRCP amendments) that it preserved nearly 700 times the data it eventually produced in litigation.

But there’s more going on here than the Legal Department’s concerns about spoliation exposure. Simply put, we retain way too much data, without any legal requirement or business need. And once the preservation duty arises, the ediscovery volume damage is already done.

The damage arises most commonly with data sources that are largely uncontrolled. Participants in the Preservation Costs Survey reported that, regardless of company size, their most difficult and burdensome data sources for preservation are email and hard drives, followed by legacy data. The common denominator? A lack of effective governance for unstructured business data, which allows rampant over-retention, which in turn, unsurprisingly, results in over-preservation. And on it goes.

As presaged in the 2015 Annual Report of the Information Governance Initiative:

Our inattention to information is a creeping disaster in an age where we can no longer feel the weight of information, but we completely rely on it for our success and even our survival. Silicon Valley is an enabler, with business models that allow—and even depend upon—our atavistic fear of throwing things away by making information storage (and its costs) all but invisible.

Whether visible or not, the costs are there. And while a price will be paid later, during litigation, the damage was done long before, when we kept information far beyond any compliance or business need. We should stop doing that to ourselves.