Questions About the Collection Phase of Ediscovery? We’ve Got Answers

Collection is part of the second stage of the ediscovery pipeline, in which data that may be relevant to a civil litigation matter is collected for preservation and/or discovery. Subsequent stages may include data processing, document review, analysis, and production.

Collection can be technologically complex due to the wide variety of data types that may need to be collected, such as emails, spreadsheets, text messages, voicemails, and more. Depending on the volume of data involved in a case and any encryption or archival methods employed, collection can range from quick and easy to slow and painful. While legal staff can handle collection in most straightforward cases, more complicated data should generally be collected in cooperation with the IT department to ensure that data and metadata are not inadvertently modified. Employee self-collection may be particularly problematic, since the mere act of accessing a file can modify its metadata, opening the door to an accusation of data spoliation.
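To make the metadata concern concrete, here is a minimal Python sketch of copying a single file into a collection folder while recording its original modification time and a content hash. The function names (`sha256_of`, `collect_file`) and the shape of the manifest record are assumptions made for illustration, not part of any particular ediscovery tool, and a sketch like this is no substitute for purpose-built forensic collection software.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_file(source: Path, repository: Path) -> dict:
    """Copy one file into the repository and return a manifest record.

    Hypothetical helper: the record layout is an assumption for this sketch.
    """
    # Capture the file's timestamps before its content is read.
    before = source.stat()
    source_hash = sha256_of(source)

    repository.mkdir(parents=True, exist_ok=True)
    target = repository / source.name
    # copy2 copies the content and attempts to carry over file metadata
    # such as the modification time; a plain copy would not.
    shutil.copy2(source, target)

    return {
        "source": str(source),
        "collected_copy": str(target),
        "sha256": source_hash,
        "source_modified_ns": before.st_mtime_ns,
        # True only if the collected copy's content matches the original.
        "content_verified": sha256_of(target) == source_hash,
    }
```

Recording the hash at collection time gives you a value to compare against later if anyone questions whether the collected copy was altered.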

It’s often worth asking whether you even need to collect data or whether you can simply preserve it in place. If you can be assured that data won’t be modified or lost, then preserving in place may be more cost-effective. However, the more serious the case and the more expedited the timeline, the sooner you should begin collecting data so that it’s available when it’s needed for later stages of ediscovery.

To that end, bear in mind that relevant electronically stored information (ESI) will at some point make its way through the remainder of the ediscovery pipeline. If the matter goes to litigation, you or your partners, which may include an external law firm, will need to be able to access critical discoverable data to complete processing, review, analysis, and production. Collecting this data into a secure repository ensures that it is available, that it can be accessed only by authorized personnel, and that it is shielded from modification or deletion.

Collection, like every stage of ediscovery, carries its own risks. As with preservation, the goal is a “Goldilocks” approach: gathering just enough data. First, there’s the risk of collecting too much irrelevant data. This results not only in expensive and unnecessary data storage, but also in inflated costs for later processing and review. Over-collection is particularly likely when the IT department is tasked with collecting discoverable data without sufficient guidance from legal. To avoid complicated judgment calls, IT often indiscriminately collects entire hard drives or email accounts without parsing their contents.

On the other hand, under-collecting data is risky to the extent that preservation might be imperfect. If data that should have been preserved is somehow lost or modified before it can be collected, spoliation sanctions are possible.

To offset these risks, do your best to define the scope of production early on in a matter. Consider using early case assessment to develop targeted and tiered data collection strategies. With these approaches, you collect the smallest amount of data necessary at each juncture, while employing strong data preservation practices to ensure that data that hasn’t yet been collected will be available later if needed.
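As a rough illustration of a targeted, tiered approach, the Python sketch below splits an inventory of items into those to collect now and those to leave preserved in place, based on custodian and date range. The `Item` fields, the `plan_collection` function, and the sample custodians and dates are all hypothetical; real scoping criteria would come from legal's assessment of the matter.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    """One inventoried item; the fields are assumptions for this sketch."""
    custodian: str
    modified: date
    path: str


def plan_collection(items, key_custodians, start, end):
    """Split items into a 'collect now' tier and a 'preserve in place' tier."""
    collect_now, preserve_in_place = [], []
    for item in items:
        # First tier: key custodians, within the relevant date range.
        in_scope = item.custodian in key_custodians and start <= item.modified <= end
        (collect_now if in_scope else preserve_in_place).append(item)
    return collect_now, preserve_in_place


# Hypothetical inventory and scope, for illustration only.
inventory = [
    Item("alice", date(2023, 3, 1), "mail/alice/contract.eml"),
    Item("alice", date(2019, 6, 1), "mail/alice/old-newsletter.eml"),
    Item("bob", date(2023, 4, 15), "share/bob/budget.xlsx"),
]
now, later = plan_collection(
    inventory,
    key_custodians={"alice"},
    start=date(2023, 1, 1),
    end=date(2023, 12, 31),
)
print([i.path for i in now])    # first tier: collect immediately
print([i.path for i in later])  # remaining items: keep preserved in place
```

Items in the second list are not discarded. They stay under preservation so that a later tier can widen the custodians or the date range if the matter calls for it.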

While collection can be complex, in the majority of garden-variety cases, it’s straightforward. By working in cooperation with your IT department and trusted vendors and delineating a reasonable scope for discovery at the outset, you can avoid common risks while getting the most from your collection efforts.

Glossary definition

Collection, part of the second stage of the ediscovery pipeline, is where data marked for preservation is collected into a secure repository. That collected data is then available for later processing, review, and production, but safe from deletion or modification, either intentional or accidental.