“Hash” values are a useful tool for the examination, discovery and authentication of electronic evidence
Electronic evidence is becoming more commonplace in civil and criminal cases. In December 2006, the Federal Rules of Civil Procedure were amended to establish new standards concerning the preservation and discovery of electronically stored information (ESI). As part of a continuing series reviewing electronic evidence issues, this posting notes how “hash” values (or hash algorithms) are used as an important tool in examining, discovering and authenticating electronic evidence.
What is a “hash value”? A judicial guide defines “hash value” as:
“A unique numerical identifier that can be assigned to a file, a group of files, or a portion of a file, based on a standard mathematical algorithm applied to the characteristics of the data set. The most commonly used algorithms, known as MD5 and SHA, will generate numerical values so distinctive that the chance that any two data sets will have the same hash value, no matter how similar they appear, is less than one in one billion. ‘Hashing’ is used to guarantee the authenticity of an original data set and can be used as a digital equivalent of the Bates stamp used in paper document production.” “Managing Discovery of Electronic Information: A Pocket Guide for Judges,” Federal Judicial Center, at 24 (2007).
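To illustrate the point that similar-looking data sets receive different hash values, the following minimal sketch uses Python's standard hashlib module to compute MD5 values for two sentences that differ only in their final character. The sample sentences and the helper name md5_hex are hypothetical and chosen only for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 hash value of the data as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


# Two hypothetical inputs that differ only in the last character.
first = md5_hex(b"The parties agree to the terms stated herein.")
second = md5_hex(b"The parties agree to the terms stated herein!")

print(first)
print(second)
print("same value:", first == second)
```

Running the same input through the function again always yields the same value, which is what allows a hash value to serve as an identifier for a file.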
What is unique about a hash value? One commentator has noted the distinctive qualities of a hash value (or algorithm):
“The range of values generated from commonly used hash algorithms is huge. For example, the prolific algorithm MD-5 can generate more than 340,000,000,000,000,000,000,000,000,000,000,000,000 (that’s 340 billion, billion, billion, billion) possible values. The widely used SHA-1 algorithm generates a range of values over four billion times larger than that. Thus, although there is a finite number of possible hash values and an infinite number of possible data inputs, the odds of a collision are infinitesimally small.” Richard P. Salgado, “Fourth Amendment Search And The Power Of The Hash,” 119 HARV. L. REV. F. 38, 39 n.6 (2006).
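The figures in the quotation follow from the length of each algorithm's output: MD5 produces a 128-bit value and SHA-1 a 160-bit value. A short sketch of that arithmetic, reading the output lengths from Python's standard hashlib module (the variable names are illustrative only):

```python
import hashlib

# digest_size is reported in bytes; multiply by 8 to get bits.
md5_bits = hashlib.md5().digest_size * 8     # 128
sha1_bits = hashlib.sha1().digest_size * 8   # 160

# Number of distinct values each algorithm can produce.
md5_values = 2 ** md5_bits
sha1_values = 2 ** sha1_bits

# How many times larger the SHA-1 range is than the MD5 range (2 ** 32).
ratio = sha1_values // md5_values

print(f"MD5 range:   {md5_values:,}")
print(f"SHA-1 range: {sha1_values:,}")
print(f"SHA-1 / MD5: {ratio:,}")
```

The MD5 range works out to roughly 3.4 x 10^38, and the ratio is 2^32, or 4,294,967,296, which matches "over four billion times larger."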
Hash values are used at several stages in the handling of electronic evidence. First, in the computer forensic examination process, hash values are used to ensure that the examined copy has not been altered. A hash value is taken of the original hard drive. Under accepted protocols, an image is then made of the original, and the examination is conducted on the image in order to preserve the integrity of the original. A hash value is taken of the image before any examination. If the two values are the same, the copy is treated the same as the original; if they differ, the integrity of the copy is called into question. At the end of the forensic examination, a third value is commonly taken. The three hash values (original hard drive, imaged hard drive before the examination, and imaged hard drive after the examination) must match. See also Salgado, 119 HARV. L. REV. F. at 46 (“The hash algorithm has afforded digital media forensic analysis a highly reliable and efficient means to ensure that the integrity of the digital evidence collected remains uncompromised. It also provides a means to discard from the examination the irrelevant, and focus in on the important, while exposing little, if any, ancillary information.”).
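The three-value comparison described above can be sketched in a few lines. This is an illustration under stated assumptions, not a forensic tool: a small temporary file stands in for the original hard drive, an ordinary file copy stands in for forensic imaging, SHA-256 is chosen here as one member of the SHA family, and the names hash_file and integrity_preserved are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def hash_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex hash value of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def integrity_preserved(original_hash, image_hash_before, image_hash_after):
    """True only when all three hash values are identical."""
    return original_hash == image_hash_before == image_hash_after


# Stand-in files (assumption: a real examination would image a drive).
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "original.bin")
image = os.path.join(workdir, "image.bin")

with open(original, "wb") as f:
    f.write(b"stand-in contents of the original drive")

original_hash = hash_file(original)       # value 1: the original
shutil.copyfile(original, image)          # make the working copy
image_hash_before = hash_file(image)      # value 2: image before examination
# ... the examination of the image would take place here ...
image_hash_after = hash_file(image)       # value 3: image after examination

print(integrity_preserved(original_hash, image_hash_before, image_hash_after))
```

If even a single byte of the image changes during the examination, the third value no longer matches the first two and the comparison fails.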
Second, hash values can be used to authenticate evidence introduced in court under FRE 901. Few published decisions discuss the role of hash values. One opinion noted that a hash value (or hash algorithm) may be used to authenticate an electronic document by distinctive means. See Lorraine v. Markel American Ins. Co., 241 F.R.D. 534, 546-47 (D. Md. 2007) (“Hash values can be inserted into original electronic documents when they are created to provide them with distinctive characteristics that will permit their authentication under Rule 901(b)(4).”).
Third, hash values may be used during the discovery process. For example, some courts have recently listed hash values in their discovery protocols as a topic for the parties to discuss at the FED. R. CIV. P. 26(f) Conference:
“Because identifying information may not be placed on ESI as easily as bates-stamping paper documents, methods of identifying pages or segments of ESI produced in discovery should be discussed, and, specifically, and without limitation, the following alternatives may be considered by the parties: electronically paginating Native File ESI pursuant to a stipulated agreement that the alteration does not affect admissibility; renaming Native Files using bates-type numbering systems, e.g., ABC0001, ABC0002, ABC0003, with some method of referring to unnumbered “pages” within each file; using software that produces “hash marks” or “hash values” for each Native File; placing pagination on Static Images; or any other practicable method. The parties are encouraged to discuss the use of a digital notary for producing Native Files.” “Suggested Protocol for Discovery of Electronically Stored Information (“ESI”),” U.S. District Court for the District of Maryland, at 20-21.
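As a rough illustration of what software that produces a hash value for each Native File might do, the sketch below builds a simple list pairing each file name with its hash value and flags files whose contents are identical. The file names, contents, and function names are hypothetical, and SHA-1 is used only because it is one of the algorithms mentioned above; the protocol itself does not prescribe any particular tool or algorithm.

```python
import hashlib


def build_manifest(files):
    """Map each file name to the SHA-1 hash value of its contents."""
    return {name: hashlib.sha1(data).hexdigest() for name, data in files.items()}


def find_duplicates(manifest):
    """Group file names that share a hash value (identical contents)."""
    by_hash = {}
    for name, value in manifest.items():
        by_hash.setdefault(value, []).append(name)
    return [names for names in by_hash.values() if len(names) > 1]


# Hypothetical production set: file name -> file contents.
production = {
    "memo.doc": b"Quarterly results memo",
    "memo_copy.doc": b"Quarterly results memo",
    "budget.xls": b"Budget figures",
}

manifest = build_manifest(production)
for name, value in manifest.items():
    print(name, value)

print("identical files:", find_duplicates(manifest))
```

Unlike a Bates-type number, the hash value is derived from the file's contents, so two files with the same contents receive the same value regardless of their names.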
These are just a few brief examples of the role hash values serve concerning electronic evidence. Are you aware of recent cases using hash values for electronic evidence? If so, the editors would be interested in learning about the role the hash value served in the case.