Paradux: a scheme to recover from maximum personal data disaster
The recent California wildfires got me thinking: if my house and my town burned down so quickly that I did not have time to grab even my laptop or document binder, how would I recover my data, my accounts and my passwords? I’m not sure I could. I wrote about that earlier.
Other calamities not involving natural disasters can have similarly catastrophic consequences: what if Google disables my account, and all my files on Google Drive and all my e-mail become inaccessible? How could I ever recover?
Or, more locally: what if my password manager dies and takes all stored passwords with it? Or I don’t remember the master password. What then?
Here’s a scheme for how to recover from these disasters and others. I call it Paradux, from “Paradise Redux” — “Paradise” being the name of the California town that was burnt off the map recently. Let me explain; feedback appreciated!
First, a few definitions to make the explanation easier:
- A “Data Stash” is a bunch of data, managed together, whose use I’d like to recover after the calamity. I can (and do) have many Data Stashes.
- The concept of a Data Stash lets me express which data lives and dies together, and which doesn’t: Data that is likely to become unavailable at the same time is part of the same Data Stash; data that likely stays available when some other data becomes unavailable is part of a different Data Stash.
- Examples for Data Stashes: all the files on the disk in my laptop; all the files in my Amazon S3 account; all the files in my Google Drive; the data on the stack of DVDs I mailed to my in-laws; the data on the SD Card my bank has in their safe deposit box.
- Data Stashes consist of data (like photos, music, e-mail, tax returns etc). They do not contain (Paradux-relevant) Metadata (see below).
- All Data Stashes are assumed to be access-controlled; somebody without the appropriate credentials cannot make use of the Data Stash. For files, that usually means encryption; for data in online-accounts, the account username and password might be sufficient.
- A “Metadata Stash” is a bunch of data (called “Metadata“), managed together, that describes where to find one or more Data Stashes, how to access the data in the Data Stash in clear text, and the relationship between Data Stashes.
- For example, an Amazon S3 Data Stash’s Metadata would contain the name of the S3 bucket, the Amazon user name and the corresponding password. If I have the Metadata, I can get at and use all the data in the Data Stash as before the calamity.
- For a USB Disk Data Stash, the Metadata is a descriptor of the disk (“blue, has a No Panic sticker on it”), and the credentials (such as the gpg key) to decrypt files on the disk.
- Metadata Stashes are always encrypted, using credentials that are different from those of any Data Stash (more below).
- Some Data Stashes are related to each other by being members of the same “Replica Set“. The idea for the Replica Set is that all members of the Replica Set contain the same data, and I can recover the data lost by one Data Stash in the Replica Set from any other Data Stash in the same Replica Set. Replica Sets are defined within the Metadata Stash.
- For example, if I back up my laptop to Amazon S3, and to an external USB disk, the Replica Set contains the laptop Data Stash, the Amazon S3 Data Stash, and the USB disk Data Stash.
- When I examine the Metadata Stash, having found the Metadata for the Data Stash that became unavailable during the calamity, I can now find the Metadata for one or more other Data Stashes (part of the same Replica Set) that have copies of the lost data from which I can restore.
- The more members in a Replica Set, the more likely I can recover my data. And the more expensive and clumsy the scheme :-)
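The definitions above can be sketched as a small data model. This is only an illustration: the class names, field names and the `surviving_replicas` helper are made up for this example, and the credentials are placeholders rather than anything a real setup would store in plain text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataStash:
    """Metadata for one Data Stash: where it is and how to unlock it."""
    name: str
    location: str     # e.g. a bucket name or a physical description
    credential: str   # placeholder only; real credentials live in the password manager


@dataclass
class MetadataStash:
    """Maps each Replica Set name to the Data Stashes that hold the same data."""
    replica_sets: dict[str, list[DataStash]] = field(default_factory=dict)

    def add(self, replica_set: str, stash: DataStash) -> None:
        self.replica_sets.setdefault(replica_set, []).append(stash)

    def surviving_replicas(self, lost_name: str) -> list[DataStash]:
        """Return the other members of every Replica Set that contains the lost stash."""
        survivors = []
        for members in self.replica_sets.values():
            if any(m.name == lost_name for m in members):
                survivors.extend(m for m in members if m.name != lost_name)
        return survivors


meta = MetadataStash()
meta.add("laptop-files", DataStash("laptop", "disk in my laptop", "<disk passphrase>"))
meta.add("laptop-files", DataStash("s3", "Amazon S3 bucket", "<account password>"))
meta.add("laptop-files", DataStash("usb", "blue USB disk, No Panic sticker", "<gpg key>"))

# If the laptop is gone, these are the stashes to restore from.
print([s.name for s in meta.surviving_replicas("laptop")])  # ['s3', 'usb']
```

The point of the lookup is the one made in the list above: starting from the stash that died, the Replica Set leads straight to the stashes that still hold a copy.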
As is apparent, what we call a Metadata Stash here is basically the information that a password manager maintains, with the exception of the management of Replica Sets. But that’s not critical because we can use some conventions, like folders or tags provided by the password manager, to emulate Replica Sets. So think of your password manager as the manager of the Metadata Stash.
Before we get into how it works, let me document my assumptions:
- Any Data Stash can become unavailable at any point in time, and may never recover. The only way to recover is to recover from another Data Stash in the same Replica Set. This means that every piece of data, for this scheme, needs to have at least two copies that are unlikely to die at the same time (e.g., are stored on different continents, in different jurisdictions, using different storage technology and different credentials). How exactly you allocate Data Stashes in the same Replica Set is up to you.
- Nothing in this scheme can have a single point of failure. For example, even if an on-line password management app should have an extremely redundant implementation across multiple continents, it is insufficient for our purposes because I might fail to remember how to log into the password management app, or they may suddenly go out of business. If either occurred, I could not recover from the calamity.
- I’d like to handle the case where I have amnesia and cannot recall anything. I still want to recover my data (after it occurs to me that perhaps I had some data …)
- This, however, can only work if I am willing to trust certain third parties. (If I can’t trust myself — I just turned amnesiac — somebody else must be trustworthy instead.)
- Obviously there are tradeoffs. If I am not willing to trust any third party, and I have amnesia, there is no way of recovering.
- However, I want to trust third parties as little as possible. Specifically, I want to make it hard for a single person, or a small group of conspirators, to access my data.
So this is how it works:
In normal operation:
- I have identified my primary Data Stashes, i.e. the places where I usually use my data (e.g. laptop disk, home server, cloud server etc.) I have recorded the Metadata for each of my Data Stashes in my Password Manager.
- The Password Manager stores its data (i.e. the Metadata Stash) on my local disk (or some other place), in encrypted form.
- Each of my Data Stashes is a member of a Replica Set (e.g. simulated in the Password Manager using a folder) that has at least one other member. This other member is the Metadata for the Data Stash that contains the backup of my primary Data Stash. For example, if I notice that Data Stash “Living room PC” does not have a second member in its Replica Set, I will set up an automated backup scheme for it in which it backs up to an offsite location. This offsite location will be the second member of the Replica Set.
- The Password Manager’s data (the Metadata Stash) is protected by two sets of credentials, the “Operational Credential” and the “Recovery Credential“. Either can be used to decrypt the Metadata.
- The Operational Credential can be a password, as long as it is sufficiently complex so it cannot be brute forced. I use the Operational Credential on a day-to-day basis to access the Metadata managed by the Password Manager. This Operational Credential exists only in my head; it is never written down or shared with anybody. If I forget it, it is lost forever.
- The Metadata Stash is regularly backed up to several other Metadata Stashes (i.e. I have a Metadata Replica Set with a sufficient number of members). These Metadata Stashes are located in places that are easily accessible, and that are unlikely to all die at the same time. There could be many copies: on public websites, even on IPFS. (For extra security, those other copies might not be accessible with the Operational Credential.)
- The location of the Metadata Stashes is written down on a piece of paper. Copies of this piece of paper are distributed to the Stewards (see below).
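How can two different credentials open the same Metadata Stash? One common approach, assumed here and not prescribed by the scheme, is "key slots": the Metadata is encrypted once with a random master key, and that master key is stored twice, wrapped once under each credential. The sketch below uses only the Python standard library; the XOR wrapping is a toy stand-in for real authenticated key wrapping, and all function names are invented for this example. A real setup should rely on a vetted tool instead.

```python
import hashlib
import hmac
import secrets
from typing import Optional

CHECK_LABEL = b"paradux-slot-check"  # arbitrary constant used for the check value


def _kek(credential: str, salt: bytes) -> bytes:
    """Derive a 32-byte key-encryption key from a credential (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", credential.encode(), salt, 200_000)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_slot(master_key: bytes, credential: str) -> dict:
    """Wrap the 32-byte master key under one credential (toy wrapping, illustration only)."""
    salt = secrets.token_bytes(16)
    return {
        "salt": salt,
        "wrapped": _xor(master_key, _kek(credential, salt)),
        # Lets open_slot() tell a right credential from a wrong one.
        "check": hmac.new(master_key, CHECK_LABEL, "sha256").digest(),
    }


def open_slot(slot: dict, credential: str) -> Optional[bytes]:
    """Return the master key if the credential fits this slot, else None."""
    candidate = _xor(slot["wrapped"], _kek(credential, slot["salt"]))
    check = hmac.new(candidate, CHECK_LABEL, "sha256").digest()
    return candidate if hmac.compare_digest(check, slot["check"]) else None


master_key = secrets.token_bytes(32)  # the key that would actually encrypt the Metadata
operational_slot = make_slot(master_key, "example operational passphrase")
recovery_slot = make_slot(master_key, "example recovery credential")

print(open_slot(operational_slot, "example operational passphrase") == master_key)  # True
print(open_slot(recovery_slot, "example recovery credential") == master_key)        # True
print(open_slot(operational_slot, "wrong guess"))                                   # None
```

With this layout, a backup copy that should not open with the Operational Credential simply ships without the operational slot.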
Summary of regular operation: encrypt all your data, backup all your data, use a password manager to manage credentials and to capture your data inventory; distribute copies of the password manager’s database widely and let others know where the copies are.
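The rule that every Data Stash needs at least one other member in its Replica Set is easy to check mechanically once the inventory exists. A minimal sketch, assuming the inventory has been exported from the password manager as a mapping from Replica Set name to member names (that export format and the function name are assumptions for this example):

```python
def under_replicated(replica_sets: dict[str, list[str]], minimum: int = 2) -> list[str]:
    """Return the names of Replica Sets with fewer than `minimum` distinct members."""
    return sorted(
        name for name, members in replica_sets.items()
        if len(set(members)) < minimum
    )


inventory = {
    "laptop-files": ["laptop disk", "offsite backup", "USB disk"],
    "living-room-pc": ["Living room PC"],
}

print(under_replicated(inventory))  # ['living-room-pc']
```

Anything this reports is data that would be gone for good if its single stash died, so it is the list of backups still to set up.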
Now for recovery case 1: I have lost everything, but I am not an amnesiac:
- I buy a new laptop.
- I download a copy of my (encrypted) Metadata. If I forgot all locations where the Metadata is stored, I ask one of the Stewards who will show me their piece of paper that has the locations.
- I open the Metadata with the Password Manager, enter the Operational Credential, and now know where all my backups are, what they were backups of (by looking at which Data Stashes are in the same Replica Set) and restore the data to the new laptop and/or other machines.
- When I’m done, I update the Metadata, and make sure the Metadata Stashes are updated as well.
But what if I’m an amnesiac? To support this, some extra setup is required first:
- Before the calamity, I pick a set of N “Stewards“. A Steward is a friend who will do a few things for me when I ask, and somebody who I have trust in. The larger the number N, the more secure, but the more overhead there is when recovery is necessary. N should probably always be at least 3; 8 or 10 might be much better. If the different Stewards do not know each other, or at least not well, that might be an advantage.
- I send a sheet of paper to each Steward. The sheet contains:
- the set of locations of the Metadata Stashes
- a fragment of the Recovery Credential
- the name and contact information of some (but best not all) other Stewards. 1 or 2 might be best.
- The fragments are constructed by splitting the Recovery Credential into N fragments, one per Steward, in such a way that any M of them are enough to reconstruct it (a threshold scheme; see Wikipedia). M should be larger than 1, and less than N; depending on the choice of M, there is a tradeoff between security and the possibility of failure due to an insufficient number of available Stewards. So this is another number that can be chosen by the user based on their needs.
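One well-known way to split a secret so that any M of N fragments reconstruct it is Shamir's Secret Sharing; whether that is exactly the method the Wikipedia reference had in mind is an assumption. Here is a minimal sketch of the idea, with invented names (`split`, `combine`, `PRIME`). It is meant to show the mechanics, not to replace a vetted implementation.

```python
import secrets

PRIME = 2**521 - 1  # a Mersenne prime; the secret must be smaller than this


def split(secret: int, n: int, m: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares such that any m of them reconstruct it."""
    if not (1 < m <= n) or not (0 <= secret < PRIME):
        raise ValueError("need 1 < m <= n and 0 <= secret < PRIME")
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest coefficient first
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares: list[tuple[int, int]]) -> int:
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


credential = b"example recovery credential"
shares = split(int.from_bytes(credential, "big"), n=5, m=3)  # 5 Stewards, any 3 suffice

recovered = combine(shares[1:4])  # any three shares will do
print(recovered.to_bytes(len(credential), "big") == credential)  # True
```

Each Steward's sheet would carry one `(x, y)` pair. Fewer than M pairs reveal nothing useful about the credential, which is what keeps a single Steward, or a small group, from opening the Metadata on their own.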
Now what if I’m hit by a bus and turn amnesiac? This might be easier to explain if I’m out of commission, and a friend restores things for me:
- The friend, if not one of the Stewards, asks around among other friends who might be a Steward.
- Once a Steward is found, we know the location of the Metadata Stashes and can recover the Metadata from one of them.
- Given a Steward, we ask them for other Stewards. They can refer to their sheet of paper to find others.
- Once we have M Stewards, we bring their key fragments together to reconstitute the Recovery Credential.
- Using the Recovery Credential, we can decrypt the content of the Metadata Stash, and recover the data from the remaining Data Stashes.
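The search for Stewards in the steps above is a walk through the referrals on the sheets of paper. A small sketch, with hypothetical Steward names and a made-up `find_stewards` helper, shows how following referrals from the first Steward found gathers the M that are needed:

```python
from collections import deque


def find_stewards(first: str, referrals: dict[str, list[str]], needed: int) -> list[str]:
    """Follow referrals breadth-first from one known Steward until `needed` are found."""
    found = [first]
    seen = {first}
    queue = deque([first])
    while queue and len(found) < needed:
        steward = queue.popleft()
        for other in referrals.get(steward, []):
            if other not in seen:
                seen.add(other)
                found.append(other)
                queue.append(other)
                if len(found) == needed:
                    break
    return found


# Who is listed on whose sheet of paper (names are hypothetical).
sheets = {
    "Alice": ["Bob"],
    "Bob": ["Carol", "Dave"],
    "Carol": ["Alice"],
    "Dave": ["Erin"],
}

print(find_stewards("Alice", sheets, needed=3))  # ['Alice', 'Bob', 'Carol']
```

If the function returns fewer names than needed, the referrals do not connect enough Stewards from that starting point. That is worth checking when deciding which one or two names go on each sheet.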
A similar process would be followed if I am fine but simply forgot my Operational Credential. Sorry to bother you, Stewards, but well, that’s the price of security.
Obviously this scheme can be improved and also made more secure. For example:
- Some of the necessary Metadata recording and management could be automated.
- When Stewards sign up, they provide their public keys, which are distributed with their contact information on the other Stewards’ sheets of paper. That makes it harder for attackers to impersonate Stewards, although nothing particularly bad happens if somebody manages to.
- When copying the Metadata Stash around, we also copy a digital signature of the Metadata Stash that was created with my private key. The corresponding public key was previously distributed to the Stewards on their sheet of paper. That way, we can verify that the Metadata Stash was created by me, and it’s harder for an attacker to “poison” the Metadata Stashes with false info and thus hinder recovery.
- Instead of a piece of paper, the Stewards can manage the information using their Password Manager, in particular if they use Paradux as well.
Obviously, this also solves the problem of recovering any other password, and if I scan all important documents that I have (e.g. birth certificates, passports etc) and store them to a Data Stash, I can recover at least copies of all of those at the same time.
So. What do you think? Viable? Secure? Useful? Worth the overhead? I appreciate all comments!