How do we deal with multiple ontologies for identity data?


In recent weeks, we at NetMesh have been working on a comparison of the ontologies of various identity systems. (Or schemas, or vocabularies, or models, or what-shall-we-call-them.) In other words, we’ve been comparing the quantity and quality of the meta-data that defines the meaning of the identity information that can be exchanged between parties according to an identity system or standard, without having to agree on anything other than the standard.

Why is this important? Well, if I send you a piece of identity information that looks like this:

<foo>
 <bar>Johannes</bar>
 <baz>Ernst</baz>
</foo>

you have to be able to interpret it somehow. If you know me, that’s easy. But if you don’t, you need to know whether Johannes is my first name or last name, and the same for Ernst (yep, there are people with the same names as me, but with first and last name reversed). Unless you and I agree that tag <bar/> means "first name", and tag <baz/> means "last name", all sorts of undesirable things happen. (The suggestion that I simply should have called the tags something meaningful in the English language would not go very far: most of the six billion people world-wide do not speak English.)

This example uses XML, but any form of data representation, from name-value pairs to conceptual graphs and what have you, has the same problem.
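To make the point concrete, here is a minimal Python sketch of a receiver that can only interpret the tags it has an agreed meaning for. The function name interpret and the SHARED_ONTOLOGY mapping are made up for illustration; they are not part of any identity standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical agreement between sender and receiver about what each tag means.
SHARED_ONTOLOGY = {"bar": "first_name", "baz": "last_name"}


def interpret(xml_text, ontology):
    """Split the message into fields with an agreed meaning and tags without one."""
    root = ET.fromstring(xml_text)
    understood = {}
    unknown = []
    for child in root:
        meaning = ontology.get(child.tag)
        if meaning is None:
            # No agreement on this tag: the receiver can only guess.
            unknown.append(child.tag)
        else:
            understood[meaning] = child.text
    return understood, unknown


message = "<foo><bar>Johannes</bar><baz>Ernst</baz></foo>"

with_agreement = interpret(message, SHARED_ONTOLOGY)
without_agreement = interpret(message, {})
print(with_agreement)
print(without_agreement)
```

With the mapping in place, both values come back labeled. With an empty mapping, the very same message yields nothing the receiver can use, only a list of tags it does not understand.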

So the number, expressiveness and documentation of the various elements of the identity ontology are very important for interoperability. If an ontology didn’t have a separate item for "first name" and "last name", but only one for "name", for example, there is little chance that the receiving end could figure out with any degree of certainty which is which. (If there were a convention such as "look for the comma", that’s still a convention that both sender and receiver need to agree on and thus part of the shared ontology; not that this would work very well with many names around the world.)
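The "look for the comma" idea can be sketched in a few lines of Python. The function split_name and the "Last, First" convention it encodes are assumptions for illustration only; the point is that the convention lives in code that both sides must share, and that it gives up as soon as the sender does not follow it.

```python
def split_name(name):
    """Apply a hypothetical 'Last, First' convention to a single "name" value.

    Returns (first, last), or None when the value does not follow the convention.
    """
    if name.count(",") != 1:
        # No comma (or several): the convention says nothing, so we cannot tell.
        return None
    last, first = (part.strip() for part in name.split(","))
    return first, last


follows_convention = split_name("Ernst, Johannes")
no_comma = split_name("Johannes Ernst")
reversed_no_comma = split_name("Ernst Johannes")
print(follows_convention, no_comma, reversed_no_comma)
```

Without the comma, "Johannes Ernst" and "Ernst Johannes" are equally uninterpretable to this receiver, which is exactly the ambiguity a single "name" item leaves behind.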

In the general case, no identity data exchange is possible if there is no shared ontology at all between sender and receiver.

[That’s of course true for all data exchange, not just identity data exchange. As Wittgenstein reportedly said: "If a lion could speak, we would not understand him." That is because of the lack of a shared life experience and thus of a shared ontology between lions and people.]

[Another side note: while architecturally, it may be a very good idea to define identity standards without an identity ontology, by definition, this means that the identity standard is not sufficient in itself to allow any meaningful information exchange between sender and receiver of identity information without further agreements beyond the standard. I think this is very much worth pointing out if one of our goals is to create a universal, plug-and-play kind of identity layer on the internet.]

I will write about the results of the comparison that we did at NetMesh in a bit, but let me just observe that based on what we’ve found so far, we as an industry are fairly far away from a shared ontology for identity data, across the various standards and product development efforts, that actually meets the requirements as we see them. For the foreseeable future, it appears that we have to work under the assumption that many, if not all, real-world deployments of identity systems which desire to interoperate with other identity systems have to deal with multiple identity ontologies, many of which are rather basic and which work only under certain assumptions about use cases.

So I have only two suggested answers to the question that this post raises:

  • All of our professed goals (as a community) of creating this interoperable identity layer all across the net are rather shallow unless we work proactively on creating a shared identity ontology, preferably across standards and products.
  • Products that claim to be "interoperability" products in identity had better have a plan for how they handle differing identity ontologies, otherwise this claim sounds rather hollow. Imagine the whiz-bang product that lets the lion speak …
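As a rough idea of what such a plan might involve, here is a minimal Python sketch that translates a record from one ontology to another by way of a shared set of concepts. All field names (a_first, b_given and so on) and both mappings are hypothetical and not taken from any real standard or product.

```python
# Hypothetical mappings: system A's fields to shared concepts, and shared
# concepts to system B's fields. Neither comes from a real identity standard.
A_TO_SHARED = {
    "a_first": "first_name",
    "a_last": "last_name",
    "a_favorite_color": "favorite_color",
}
SHARED_TO_B = {"first_name": "b_given", "last_name": "b_family"}


def translate(record, to_shared, from_shared):
    """Translate a record field by field and report what could not be carried over."""
    translated = {}
    lost = []
    for field, value in record.items():
        concept = to_shared.get(field)
        target = from_shared.get(concept) if concept is not None else None
        if target is None:
            # Either the field has no shared meaning, or the target ontology
            # has no item for that concept.
            lost.append(field)
        else:
            translated[target] = value
    return translated, lost


record_a = {"a_first": "Johannes", "a_last": "Ernst", "a_favorite_color": "blue"}
record_b, lost_fields = translate(record_a, A_TO_SHARED, SHARED_TO_B)
print(record_b, lost_fields)
```

The list of lost fields is the interesting part: whatever one ontology can express and the other cannot simply falls out of the exchange, and a product has to decide what to do about that.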

I hope that our systematic comparison of the various ontologies is going to be helpful towards those goals, if only in making it clear where the many holes are.

[If you want to review our draft before we put it out, drop me a note, okay? We’re trying our best to be complete and correct, but there’s a good chance that some people reading this post understand some of those ontologies much better than we do because they created them …]