The Sovrin system for “self-sovereign identity” is the latest kid on the digital identity block. It follows many other initiatives for “solving” digital identity in the almost 20 years since Microsoft Passport, which arguably was the first one, and which was implemented so badly that it spurred a lot of people into action :-) Some of them were Liberty Alliance, Microsoft CardSpace and OpenID. I had my own, Light-Weight Digital Identity (LID). All of them had issues :-)
After taking a break from this community for a few years, I attended Internet Identity Workshop XXVI this week to catch up. The talk about Sovrin was everywhere, and I thought I would write down my impressions. Note: I may have misunderstood some of this, and I am likely missing major pieces of information, but I will make corrections to this post if you point out mistakes (and I hope some of my friends involved in Sovrin will).
Like in most digital identity systems, it all starts with an identifier. Identifiers in Sovrin can point to people, things, software, organizations, data: anything, really. Unlike in many other identity systems, identifiers in Sovrin can come from a variety of name spaces. All look like random numbers or strings. On Zooko’s Triangle, Sovrin identifiers are not human readable, but decentralized, and (possibly) secure. For example, bitcoin addresses (hashes of public keys) are valid identifiers in Sovrin, as are random strings (I think), as long as a suitable name space has been defined. Sovrin calls them “Decentralized Identifiers” aka “DIDs”.
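To make the “hash of a public key inside a name space” idea concrete, here is a minimal sketch. The `did:<name space>:<string>` layout follows the general shape of DID syntax as I understand it; the `example` name space, the function names and the truncated SHA-256 are my assumptions for illustration, not anything Sovrin prescribes (and not how bitcoin addresses are actually computed).

```python
import hashlib
from typing import Tuple


def make_did(namespace: str, public_key: bytes) -> str:
    """Derive an identifier by hashing a public key within a name space.

    Assumption: truncated SHA-256 hex is used purely for illustration.
    """
    digest = hashlib.sha256(public_key).hexdigest()[:32]
    return f"did:{namespace}:{digest}"


def parse_did(did: str) -> Tuple[str, str]:
    """Split an identifier into (name space, name-space-specific string)."""
    parts = did.split(":", 2)
    if len(parts) != 3 or parts[0] != "did" or not parts[1] or not parts[2]:
        raise ValueError(f"not a DID: {did!r}")
    return parts[1], parts[2]


# Hypothetical key material; a real system would use actual public key bytes.
did = make_did("example", b"stand-in public key bytes")
namespace, specific = parse_did(did)
```

The result is not human readable, which is exactly the trade-off on Zooko’s Triangle mentioned above.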
Each identifier (DID) resolves to a file, called the “DID Document”. How it resolves is not that important, architecturally, and different schemes could be used. (The default is that the DID Document can be found on a special-purpose blockchain, operated by Sovrin member organizations. But we can mostly ignore blockchain to understand most of Sovrin.) The DID and its DID Document are bound to each other through a pair of public / private cryptographic keys. In a way, that makes the DID Document more important than the DID, because the document contains the public key and the DID, and it is signed by the corresponding private key, which is kept confidential by the signer.
This is a great primitive from which to build more complex systems. I happen to like it because we did the same in LID 14 years ago, employing GPG. Sadly, that wasn’t picked up by others at the time, so it’s great to see the idea resurrected in Sovrin.
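Here is a rough sketch of that binding: a document that carries the DID and the public key, signed with the matching private key. Everything in it is an assumption for illustration: the field names (`id`, `publicKey`, `signature`) are made up, and because Python’s standard library has no public-key signatures, the sketch swaps in toy textbook RSA over small fixed primes. That is not secure and is not whatever scheme Sovrin actually uses.

```python
import hashlib
import json

# TOY textbook RSA (assumption: stand-in only, NOT secure, no padding).
E = 65537


def make_keypair(p: int, q: int):
    """Return ((e, n), (d, n)) as (public, private) from two distinct primes."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, needs Python 3.8+
    return (E, n), (d, n)


def _digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, private) -> int:
    d, n = private
    return pow(_digest(data, n), d, n)


def verify(data: bytes, signature: int, public) -> bool:
    e, n = public
    return pow(signature, e, n) == _digest(data, n)


def canonical(obj) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_did_document(did: str, public, private) -> dict:
    """Build a document holding the DID and public key, signed by the private key."""
    body = {"id": did, "publicKey": {"e": public[0], "n": public[1]}}
    return {**body, "signature": sign(canonical(body), private)}


def check_did_document(did: str, doc: dict) -> bool:
    """Check the document names this DID and is signed by the key it contains."""
    body = {k: v for k, v in doc.items() if k != "signature"}
    public = (body["publicKey"]["e"], body["publicKey"]["n"])
    return body.get("id") == did and verify(canonical(body), doc["signature"], public)


# Two Mersenne primes, chosen only because they are easy to write down.
public, private = make_keypair(2**127 - 1, 2**107 - 1)
doc = make_did_document("did:example:alice", public, private)
forged = {**doc, "id": "did:example:mallory"}  # tampering breaks the signature
```

Note that the self-signature only shows the document is internally consistent. Deciding which document is the valid one for a given DID is a separate question, which is where the resolution scheme comes in.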
Why is it a great primitive? Because it enables any two entities, pointed to by some identifier each:
- to authenticate each other;
- to prevent eavesdropping on data exchange between them (by encrypting data with the public key of the other party);
- to be certain that received data was originated by the party that claimed to have sent it (by means of digital signatures).
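The three capabilities in this list can be sketched in a few lines. As a loud caveat: this uses toy textbook RSA with small fixed primes so that it runs on the Python standard library alone. It is an insecure stand-in I picked for illustration, and the party names and messages are hypothetical.

```python
import hashlib
import secrets

# TOY textbook RSA (assumption: stand-in only, NOT secure, no padding).
E = 65537


def make_keypair(p: int, q: int):
    """Return ((e, n), (d, n)) as (public, private) from two distinct primes."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, needs Python 3.8+
    return (E, n), (d, n)


def _digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, private) -> int:
    d, n = private
    return pow(_digest(data, n), d, n)


def verify(data: bytes, signature: int, public) -> bool:
    e, n = public
    return pow(signature, e, n) == _digest(data, n)


def encrypt(message: bytes, public) -> int:
    e, n = public
    m = int.from_bytes(message, "big")
    if m >= n:
        raise ValueError("message too long for this toy key")
    return pow(m, e, n)


def decrypt(ciphertext: int, private) -> bytes:
    d, n = private
    m = pow(ciphertext, d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big")


alice_public, alice_private = make_keypair(2**127 - 1, 2**107 - 1)
bob_public, bob_private = make_keypair(2**89 - 1, 2**61 - 1)

# 1. Authenticate: Bob sends a fresh random challenge, Alice signs it,
#    and Bob checks the response against Alice's public key.
challenge = secrets.token_bytes(16)
response = sign(challenge, alice_private)
alice_is_authentic = verify(challenge, response, alice_public)

# 2. Prevent eavesdropping: Alice encrypts with Bob's public key,
#    so only Bob's private key can recover the message.
ciphertext = encrypt(b"hello bob", bob_public)
plaintext = decrypt(ciphertext, bob_private)

# 3. Origin: Alice signs her message; Bob verifies who it came from.
message = b"I graduated"
signature = sign(message, alice_private)
```

In each case the only thing the other party needs is the public key, which is what the DID Document provides.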
Once you have entities that can be certain that information exchanged between them has not been tampered with or intercepted, you can do great things. (And conversely, if you can’t be certain, you get the insecure internet without an identity layer but full of password forms and repeated form fields containing outdated, non-verified information as we have it today!)
The DID Document contains something else: a communications endpoint for what they call an “Agent”. Think of this entry as the modern, decentralized version of the DNS “MX” (“mail exchange”) record. If you send e-mail to joe@example.com, your e-mail software will look up the MX record for example.com in the DNS, which contains the address of the computer that is willing to receive incoming e-mail on behalf of all users at example.com, such as joe. Similarly, the communications endpoint described in the DID Document for a particular identifier contains the location (it’s a URL, rather than the name of a computer, but that’s a minor detail) of a piece of software that is willing to receive messages on behalf of the entity behind the DID. As the DIDs are not hierarchical names (unlike DNS, and thus e-mail addresses), the communications endpoint is defined per DID, not for all users at a domain; there is no concept like a domain in Sovrin.
It appears all data in Sovrin is expected to be JSON of some kind. As a result, things are extensible, and the DID Document can contain all sorts of other things beyond the basics. So there could be other services associated with a DID (this part of the spec looks an awful lot like the Yadis spec we worked on in the early days of OpenID, just in JSON instead of XML).
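The MX-style lookup, and the idea of additional services hanging off the same document, might look roughly like this. The in-memory `REGISTRY`, the `service` / `type` / `endpoint` field names and the URLs are all hypothetical; they stand in for whatever resolution scheme and document layout Sovrin really uses.

```python
from typing import Optional

# Hypothetical stand-in for whatever resolves a DID to its DID Document.
REGISTRY = {
    "did:example:joe": {
        "id": "did:example:joe",
        "service": [
            {"type": "agent", "endpoint": "https://agents.example.com/joe"},
            {"type": "photos", "endpoint": "https://photos.example.com/joe"},
        ],
    },
}


def resolve(did: str) -> dict:
    """Return the DID Document for a DID, or raise KeyError if unknown."""
    return REGISTRY[did]


def find_endpoint(did: str, service_type: str = "agent") -> Optional[str]:
    """Look up the endpoint of one service type, per DID rather than per domain."""
    for service in resolve(did).get("service", []):
        if service.get("type") == service_type:
            return service["endpoint"]
    return None


agent_url = find_endpoint("did:example:joe")
photos_url = find_endpoint("did:example:joe", "photos")
missing = find_endpoint("did:example:joe", "calendar")
```

Because the document is just JSON, adding another service is a matter of appending one more entry to the list.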
Of course, all parties need to agree on how those JSON documents are structured and what rules they follow. To keep Sovrin decentralized even on this layer, apparently anybody can define the rules for the various types of documents (I’m skipping the details of what types of documents there are, such as credentials, or schemas) and submit those to the Sovrin blockchain. From that point in time, anybody can use them. Adoption of any of them is entirely voluntary.
Now that we have primitives, and extensibility, we can build interesting stuff. One example is privacy-preserving credentialing and verification by third parties. Take a university diploma as an example:
Let’s say I graduated from Sly’s School of Sleaziness with a master’s degree in Financial Manipulation. I was so good at it, they instantly hired me to teach the next generation of manipulators. But I don’t really like it there, and would like to teach at Pumpkin’s School of Political Posturing. They want me to prove to them that I indeed have the degree I claim from Sly’s. But I don’t want Sly’s to know that I’m applying at Pumpkin’s. How do we do this?
In the physical world, I will probably show my paper diploma to Pumpkin’s. That’s often good enough for them, but of course I could merely possess a desktop publishing program and a good printer, rather than a degree. Alternatively, Pumpkin’s could call up Sly’s, but I don’t want them to do that. Can Sovrin do better? It turns out it can:
First, when I graduate at Sly’s, Sly’s gives me a file that puts the following information together (yes, in a JSON file):
- the identifier (DID) for me (because I earned the degree);
- information about the degree I earned (maybe it has a DID, too; it could just be text), and:
- the DID for Sly’s (because they issued the degree).
There could be other info, such as the date when I earned the degree etc., none of which matters for us here. Then, Sly’s uses their private key and digitally signs this “degree document”. In Sovrin parlance, such a document would be called a Credential. (I’m not particularly happy with that term because it is used for all sorts of things in computing, most of which are unrelated to its specific use in Sovrin. But well…)
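A minimal sketch of such a “degree document” and its signature follows. The field names (`subject`, `degree`, `issuer`, `signature`) and the DIDs are invented for illustration, not Sovrin’s actual Credential format. The signature is toy textbook RSA with small fixed primes, an insecure stand-in chosen only so the example runs on the Python standard library.

```python
import hashlib
import json

# TOY textbook RSA (assumption: stand-in only, NOT secure, no padding).
E = 65537


def make_keypair(p: int, q: int):
    """Return ((e, n), (d, n)) as (public, private) from two distinct primes."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, needs Python 3.8+
    return (E, n), (d, n)


def _digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, private) -> int:
    d, n = private
    return pow(_digest(data, n), d, n)


def verify(data: bytes, signature: int, public) -> bool:
    e, n = public
    return pow(signature, e, n) == _digest(data, n)


def canonical(obj) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_credential(subject_did: str, degree: str, issuer_did: str, issuer_private) -> dict:
    """Put the three pieces of information together and sign them as the issuer."""
    body = {"subject": subject_did, "degree": degree, "issuer": issuer_did}
    return {**body, "signature": sign(canonical(body), issuer_private)}


# Hypothetical DIDs and keys for the two parties in the story.
MY_DID = "did:example:me"
SLY_DID = "did:example:slys"
sly_public, sly_private = make_keypair(2**127 - 1, 2**107 - 1)

credential = issue_credential(
    MY_DID, "Master of Financial Manipulation", SLY_DID, sly_private
)
credential_json = json.dumps(credential, indent=2)  # what Sly's would hand to me
```

The signature covers all three fields at once, so none of them can be changed afterwards without invalidating it.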
Then, they give this signed document to me in some way. For example, it could be printed on the back of my paper diploma :-) or more likely, they send it to my Agent, which they can simply find by looking at the DID Document that goes with my DID. Depending on the use case, such a Credential might be published somewhere, or kept very secret; Sovrin doesn’t care.
When my Agent receives such a Credential, the Agent will most likely (that’s something that I have control over through my choice of which software I run as my Agent) put it into my Sovrin “Wallet”. This Wallet, of which there currently are only early implementations as I’m given to understand, could look very similar to Apple’s Wallet on iOS, for example: each Credential is shown as a card of some kind with branding by the issuer.
Now how do I prove to Pumpkin’s that I have the degree from Sly’s? Simple: I take the Credential, which is just a JSON file with a signature, and send it to Pumpkin’s. Pumpkin’s now knows:
- it is me who earned the degree (it has my DID in the document where it says who earned the degree);
- it was Sly’s that issued the degree (it has Sly’s DID in the document where it says which school);
- Sly’s makes this statement (because the whole document was digitally signed with the key pair that goes with Sly’s DID) and no aspect of the document was forged by me or anybody else.
Note that nobody other than me and Pumpkin’s needs to know that I have shown them my Credential: there is no callback of any kind to the issuer (Sly’s) as long as Pumpkin’s already has Sly’s DID Document. (And if they don’t, and the DID Document is retrieved from the blockchain, nobody can tell whether Pumpkin’s was looking for Sly’s keys or anybody else’s, and certainly not that I was applying at Pumpkin’s. This is great, and as far as I can tell, something novel.)
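Here is a sketch of what Pumpkin’s check could look like when it works purely from a locally held copy of Sly’s DID Document, with no call to the issuer. The credential layout, the DIDs and the `known_documents` cache are hypothetical. The signature is toy textbook RSA with small fixed primes, an insecure stand-in so that the example needs only the Python standard library.

```python
import hashlib
import json

# TOY textbook RSA (assumption: stand-in only, NOT secure, no padding).
E = 65537


def make_keypair(p: int, q: int):
    """Return ((e, n), (d, n)) as (public, private) from two distinct primes."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, needs Python 3.8+
    return (E, n), (d, n)


def _digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, private) -> int:
    d, n = private
    return pow(_digest(data, n), d, n)


def verify(data: bytes, signature: int, public) -> bool:
    e, n = public
    return pow(signature, e, n) == _digest(data, n)


def canonical(obj) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_credential(credential: dict, subject_did: str, issuer_did: str,
                      known_documents: dict) -> bool:
    """Check subject, issuer and signature using only locally held DID Documents."""
    body = {k: v for k, v in credential.items() if k != "signature"}
    signature = credential.get("signature")
    if not isinstance(signature, int):
        return False
    if body.get("subject") != subject_did or body.get("issuer") != issuer_did:
        return False
    issuer_doc = known_documents.get(issuer_did)
    if issuer_doc is None:
        return False  # the verifier would have to resolve the issuer's DID first
    return verify(canonical(body), signature, issuer_doc["publicKey"])


# Hypothetical setup: Sly's issued this earlier and is not involved below.
MY_DID, SLY_DID = "did:example:me", "did:example:slys"
sly_public, sly_private = make_keypair(2**127 - 1, 2**107 - 1)
body = {"subject": MY_DID, "degree": "Master of Financial Manipulation", "issuer": SLY_DID}
credential = {**body, "signature": sign(canonical(body), sly_private)}

# Pumpkin's side: only the credential I sent plus a cached DID Document.
pumpkins_cache = {SLY_DID: {"id": SLY_DID, "publicKey": sly_public}}
accepted = verify_credential(credential, MY_DID, SLY_DID, pumpkins_cache)
```

Nothing in `verify_credential` touches the network, which is the point: the check needs the issuer’s public key, not the issuer.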
So far, this is all pretty straightforward and elegant. Due to my work on LID, it all seems very familiar (of course there are many differences, but mostly in the details: JSON vs HTML/XML; DIDs vs URLs; blockchain-based resolution vs HTTP GET etc. We did leak information about validation back to the issuer, something Sovrin solved. Also, LID was a much more “web-centric” system than Sovrin; pros and cons there).
Credential revocation in Sovrin appears to be a bit messier. I say “appears” because it involves a tricky application of zero-knowledge proofs, with all needed data on the blockchain, and I cannot claim so far that I understand it. There is also only very high-level documentation that I can find; I wonder what the implementation status is?
Finally, let’s return to blockchain. What does it actually contribute to the picture here? I think many applications of this architecture will not actually need to use a blockchain, which is Good News as far as I’m concerned. What blockchain does contribute is:
- a globally visible record of valid associations between identifiers and associated metadata (DIDs and DID Documents). Without it, because DIDs are not necessarily self-authenticating (like a public key would be if used as an identifier), you could imagine an attacker associating an alternate key pair with a given DID, and wreaking havoc in the part of the universe that does not have access to the valid one.
- a global repository of schema-related documents that define data formats used with Sovrin. I guess they could also have been put on a website :-) but blockchain makes them more available in the long term, and more definitive.
- revocation, apparently :-)
- a blockchain token that has monetary value. They talk about using it as a utility token to let “money” flow between those making statements, and those relying on them. It is the blockchain equivalent of a title insurance premium: “if it turns out that the statement you made was false, but I relied on it, I have already received monetary value to compensate me for that risk.” I don’t think Sovrin has figured out the gory details of how to really make this work, though, so far. (That’s ok. They’ll get to it.)
I’m leaving out some details that I don’t think are necessary to understand how the system generally works. For example: the exact messages Sovrin has defined that are supposed to flow to introduce two parties to each other and establish relationship-specific DIDs/key pairs, or that they are distinguishing between “cloud” and “edge” agents / wallets. There’s clearly a lot of work they have done, which undoubtedly is also going to evolve as real use cases and real users are thrown at the system. Those details can happily change if the foundation is clear and solid, and that’s what I focus on here.
So, that’s all I know (so far). What did I get wrong, or do I still need to learn?