Blown To Bits

Is Computing a Hash a Search Under the Constitution?

Sunday, November 2nd, 2008 by Harry Lewis

Talk about cases the Founding Fathers could not have anticipated. A federal court has ruled that computing the hash of a data file (a picture, for example) is a search, and is therefore subject to Fourth Amendment restrictions (that is, the police are supposed to get a search warrant before doing it).

What’s a hash? Hashing is a way of squeezing a lot of data down into a few bits. The same input will always give you the same output (which is called the hash, or the hash value). But because some information is inevitably thrown away in the squeezing process, it’s possible (in general) for two different inputs to give you the same output. The trick in the design of hashing algorithms is to make that unlikely.

Let’s take an example. Suppose we want to check to see if the photograph we have is one of a list of bad photographs (known child pornography, for example). Just storing all the photos on the bad list would take a huge amount of space. But we could hash each of them and just store the hash values. Then we could check our suspect photo against the list of bad photos by computing its hash and seeing if that value was in the list of hash values of bad photos. That check would be quick. Of course, if we got a match, before we arrested anyone, we’d want to compare the photos themselves just to make sure we hadn’t gotten an accidental “collision” where two photos happened to have the same hash.
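The check described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the choice of SHA-256 from the standard `hashlib` module is an assumption, since any well-designed hash would serve. The list holds only hash values, never the photos themselves, and a hash hit is treated as a reason to compare the actual files, not as proof.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Return a hex digest of the data (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(data).hexdigest()


def build_hash_list(photos) -> set:
    """Store only the hash values of the listed photos, not the photos."""
    return {file_hash(photo) for photo in photos}


def is_possible_match(suspect: bytes, bad_hashes: set) -> bool:
    """Quick check: is the suspect photo's hash in the list of hashes?"""
    return file_hash(suspect) in bad_hashes


def confirm_match(suspect: bytes, original: bytes) -> bool:
    """Slow check after a hit: compare the actual contents byte for byte."""
    return suspect == original


# Stand-in byte strings play the role of image files here.
bad_hashes = build_hash_list([b"listed photo 1", b"listed photo 2"])
hit = is_possible_match(b"listed photo 1", bad_hashes)
miss = is_possible_match(b"holiday snapshot", bad_hashes)
```

Looking up a value in a set takes about the same time no matter how long the list is, which is why the check is quick.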

A simple example of a hashing algorithm would be to treat the image as a sequence of 24-bit numbers and just add them all up, throwing away any numerical overflows. (Like doing arithmetic and just hanging onto the rightmost digits.)
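Here is a minimal Python sketch of that additive hash. The function name, the big-endian byte order, and the zero padding of a trailing partial number are all assumptions made for the example. It also shows why nobody would use this hash in practice: swapping two pixels changes the image but not the sum, so collisions are trivially easy to produce.

```python
def additive_hash_24(data: bytes) -> int:
    """Add the data up as 24-bit numbers, keeping only the low 24 bits."""
    # Assumption: pad with zero bytes so the length is a multiple of 3.
    padded = data + b"\x00" * (-len(data) % 3)
    total = 0
    for i in range(0, len(padded), 3):
        value = int.from_bytes(padded[i:i + 3], "big")  # one 24-bit number
        total = (total + value) & 0xFFFFFF  # throw away any overflow
    return total


red = b"\xff\x00\x00"   # one 24-bit pixel
blue = b"\x00\x00\xff"  # another 24-bit pixel

# Two different two-pixel "images" made of the same pixels in a different order.
image_a = red + blue
image_b = blue + red
same_hash = additive_hash_24(image_a) == additive_hash_24(image_b)
```

Real hashing algorithms mix the bits far more thoroughly, so that even a tiny change to the input scrambles the output.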

Here’s how Ars Technica reports the relation of all this to the situation of one Robert Crist.

Crist had fallen behind on his rent, and his landlord hired a father-and-son pair to move the delinquent tenant’s belongings out to the curb, where a friend of one of the movers, Seth Hipple, picked up Crist’s computer. When Crist returned home, he began freaking out over his vanished machine, while Hipple was freaking out over what he’d found in a folder on the hard drive: Videos appearing to depict underage sex, which he promptly deleted.

Hipple called the East Pennsboro Township Police Department, and though the computer had been reported stolen, it soon found its way to the Pennsylvania Attorney General’s Office, where special agent David Buckwash made an image of the hard drive and began sifting through its contents using a specialized forensics program called EnCase. Rather than directly examining the contents of the hard drive, Buckwash initially ran the imaged files through an MD5 hash algorithm, producing a unique (for practical purposes) digital fingerprint, or hash value, for each one. He then compared these smaller hash values with a database of the hash values of known and suspected child porn, maintained by the National Center for Missing and Exploited Children. He came up with five definite hits and 171 videos containing “suspected” child porn. He then moved to gallery view, inspecting all the photos on the drive, and ultimately finding nearly 1,600 images that appeared to be child pornography.

No warrant had been sought to do any of this, however, and the judge threw out the evidence gathered from Crist’s computer as a result.

The government is likely to appeal, and a lot rides on the case. If, for example, the ruling is overturned and hashing isn’t a search, then the government would not need a warrant to go to your service provider’s central servers and hash every file, looking for illegal materials.
