Lots of things in the news recently have been pushing me to do an article on cryptography. Unfortunately, cryptography is hard, and the longer I try to work on this, the longer the article gets. Meanwhile, I realize that at more than 1,000 words, I still haven’t really explained anything useful and my readership probably gave up on me the third time I said “algorithm.”
So, here is some xkcd to distract you:
Instead, let me give you the shortest version, something you can use, even though cryptography is hard.
Encrypt everything*. Use “https” whenever possible; in fact, get the HTTPS Everywhere extension for your browser and leave it turned on. Encrypt your email. Encrypt your data at rest, especially if you don’t control the server it’s on. Encrypt your chats.
Unfortunately, most of those things require some work. Soon, I’ll do a howto on PKI (public key infrastructure, a toolset that will help you encrypt), where to get a PKI keypair and what to do with it.
I’d even recommend encrypting everything you have in Dropbox, iCloud, Amazon S3 and the like, beyond whatever encryption they are providing for you. I do, actually, trust that my data is safely encrypted by them against outside hackers. I’m less comfortable that someone inside my chosen cloud storage site won’t have access to my stuff. I’m even less comfortable that they won’t be compelled to turn over the data and/or encryption keys to a government at some point.
It used to be that there was too much data out there to worry about yours being compromised. Unless someone was specifically trying to spy on you, there was a certain level of security by obscurity. Yes, technically, everyone’s (unencrypted) email is readable by anyone who can see it while it is “on the wire” in transit between email servers, but there was so freakin’ much of it out there that it was hardly worth the time to find, collect and analyze it. Chats, FTP, whatever: there was so much data it was effectively impossible to cast a dragnet unless you already knew what you were looking for and who was either sending or receiving it.
Let’s take a look at email. There’s a good physical analogue, I can assume we’re all familiar with it, and it’s a pretty good exemplar for many types of data.
Email is about as secure as a postcard. Anyone who has access to it can read it. Getting access to it just requires standing by a mailbox and waiting for a postcard to show up (or mugging the mailman). But who has time to stake out my mailbox waiting for some potentially interesting postcard to arrive? And who has the time to read all the damn postcards looking for the interesting one? Email scaled the same way. There were lots of haystacks to search when there might not even be a needle.
Several things have changed that paradigm.
1) Storage became cheap. There is a vanishingly small cost to store each email, so there is no need to worry about only saving the ones that I’m pretty sure are interesting. Think about Google; my current free quota is 15 GB. Multi-terabyte arrays can be had for just tens of thousands of dollars. A quick scan of my inbox suggests that the average email is in the 10-100 kilobyte range; let’s call it 50 KB. That’s 20,000,000 emails per TB. While those costs do eventually add up, it’s peanuts compared to 10 or even 5 years ago.
2) Hadoop happened. MapReduce happened. NoSQL happened. GPGPU happened. It used to be that even if I had 20,000,000 emails, searching them was nearly impossible. Databases, indexes, search algorithms didn’t scale well to that many entities. Nor were multi-terabyte datasets manageable with most tools. That’s no longer true. Now enormous clustered datasets can be happily chewed on by clustered computation engines using simple, powerful algorithms and reasonably low-cost hardware. Searches that would have taken hours or days can be done, literally, in seconds. This is Big Data: it’s extremely powerful and, all hype aside, it does amazing things.
3) The USA PATRIOT Act happened. It compelled lots of ISPs and other providers to turn over the huge volumes of data (or metadata, not that there’s any difference) that are needed to do the analysis in the first place.
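The back-of-the-envelope numbers in point 1 are easy to check in a few lines of Python. This is only a sketch: it assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and the 50 KB average I guessed at above, and the variable names are just for illustration.

```python
# Back-of-the-envelope check of the storage numbers in point 1.
# Assumption: decimal units (1 KB = 1,000 bytes, 1 TB = 1,000,000,000,000 bytes).
KB = 1_000
GB = 1_000 ** 3
TB = 1_000 ** 4

avg_email_bytes = 50 * KB  # the rough average guessed at in point 1

# How many average-sized emails fit in one terabyte?
emails_per_tb = TB // avg_email_bytes

# How many fit in a 15 GB free mail quota?
emails_per_free_quota = (15 * GB) // avg_email_bytes

print(f"{emails_per_tb:,} emails per TB")
print(f"{emails_per_free_quota:,} emails in 15 GB")
```

If you prefer binary units (1 KB = 1,024 bytes), the per-terabyte figure comes out a little higher, at roughly 21 million, which doesn’t change the argument.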
Now I don’t even have to stake out your mailbox; the USPS will send me a copy of every postcard you get. I can get a huge, cheap warehouse to house all those postcards and I have thousands of hyperactive, caffeinated postcard readers with photographic memories to read them all. I need merely shout questions into the air and wait for an answer to be shouted back.
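To make the “shout a question, get an answer” idea a little more concrete, here is a toy sketch of the map-and-reduce pattern in plain Python. It is not Hadoop, and it is not a claim about how any provider or agency actually does this. The sample messages and the `search_shard` and `search_all` names are made up for illustration, and a thread pool on one machine stands in for a cluster of many.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical data: each "shard" stands in for the mail held on one machine.
SHARDS = [
    ["lunch at noon?", "the package ships friday"],
    ["re: package tracking number", "happy birthday!"],
    ["meeting moved to 3pm", "did the package arrive?"],
]

def search_shard(shard, keyword):
    """Map step: one worker scans only its own shard for the keyword."""
    needle = keyword.lower()
    return [msg for msg in shard if needle in msg.lower()]

def search_all(shards, keyword):
    """Fan the question out to every shard, then merge the answers."""
    with ThreadPoolExecutor() as pool:
        # pool.map yields the per-shard results in shard order.
        partials = pool.map(lambda s: search_shard(s, keyword), shards)
        # Reduce step: combine the per-shard hit lists into one list.
        return reduce(lambda a, b: a + b, partials, [])

hits = search_all(SHARDS, "package")
print(hits)
```

The point of the pattern is that each reader only ever looks at its own pile of postcards, so adding more readers (machines) shortens the wait for an answer.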
Unfortunately, even though collection, storage and search have gotten much, much easier for bad guys and governments alike, for the user, cryptography is still hard. I’ll do what I can in later posts to make it a little easier.
*While I really do mean “everything,” this is a standard even I can’t live up to. Email cryptography, for instance, requires that both the sender and the receiver do something before an encrypted email can happen.