In a recent post I argued that OpenID identifiers such as www.davidrecordon.com, =eek and mylid.net/jernst are much more natural than those generated by Yahoo! or Google that might look like this: me.yahoo.com/a/vIxu8Lll29jYXQEYBNg86tIZgY7Bs8c7.
Eric Sachs, the product manager in charge at Google, gave me a hard time over it; actually, he didn’t because he’s way too nice to do such a thing. But he let me blog some of our conversation. He writes:
Google & Yahoo both use OpenID URLs by default that are not human readable, and if someone visits them, the pages have no information about the user…
One of the reasons that Blogger’s OpenID service launched before the generic Google service is that those users by default had already been through the pain of picking a "human readable" name for a URL.
For our E-mail users (@gmail.com/@yahoo.com) we could have chosen to return URLs with the user’s E-mail username as AOL does, but chose not to for what are hopefully obvious reasons.
So we were left with the options of (1) not launching IDPs at all, (2) launching the IDPs with machine generated IDs, or (3) forcing our users to pick a "human readable" name for an OpenID URL (but one that was not their E-mail address).
Unfortunately, for both of us option 3 requires a user to try on average 5 times to find a name that is available. We have tried to force users to pick such a name for other services at Google, and the abandonment rate is 90-95%. Yahoo’s experience is similar. The RPs we have talked to (both big ones and small ones) have said they would not use our IDP if we forced users to go through a process with such a high abandonment rate.
So while I understand that in a perfect world our choice of 2 over 3 is not great, the alternative is 1, i.e. not launching IDPs at all. And if we went down the path of #1, then the only people who could use OpenID would be bloggers and users of some social networks that use human readable URLs for profiles (though that excludes Facebook,
To which I responded that I think there is another alternative, à la tinyurl: short, automatically generated identifiers. Here is Eric again:
Yep, that is what we tried for orkut.com initially. Unfortunately we got the standard problem of dictionary attacks against those URLs for screen-scraping. We experimented to find the smallest length ID that would still enable us to implement DoS-style blocking for screen scraping. Unfortunately that length was 2 characters longer than what people could remember for their own ID, so we sadly gave up :-( And that was actually back when the orkut.com user base was a LOT smaller.
You could check with … MySpace/Facebook because I think they tried the same thing and had the same problem.
Me again: "Then I’m missing something. Why is that a problem? What advantage do I gain as an attacker if I guess that somebody’s URL is me.yahoo.com/a/vIxu8Lll29jYXQEYBNg86tIZgY7Bs8c7? In particular if sites, like this case in point [the openid.net site that sparked this thread], will show it publicly anyway?"
Eric:
The problem a bunch of social networks have had is evil websites who just try to crawl our entire systems by guessing our users’ profile URLs. They then use that data for a number of nefarious purposes, some of which are pretty sophisticated. Others are more basic, like trying to track the size of a social network’s user base for competitive purposes.
If hackers can still guess (or find elsewhere) some URLs, that is not nearly as damaging as them being able to easily crawl the whole social network. And by making our profile IDs longer, we can monitor for hackers who are trying to guess profile IDs because we see lots of requests for non-existing URLs.
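To make that last point concrete, here is a minimal sketch of the idea. It is an illustration only, not a description of what Google, Yahoo or orkut actually run: the names `guess_hit_rate` and `MissTracker`, the thresholds, and the notion of a "client" key (an IP address, say) are all assumptions made for the example. The first function shows why longer IDs help: the fraction of random guesses that land on a real profile shrinks quickly with length, so a guesser produces almost nothing but misses. The class then flags any client whose misses pile up inside a sliding time window.

```python
from collections import defaultdict, deque


def guess_hit_rate(num_users, alphabet_size, id_length):
    """Fraction of uniformly random guesses that hit an existing ID.

    Assumes IDs are drawn uniformly from alphabet_size ** id_length values.
    """
    return num_users / (alphabet_size ** id_length)


class MissTracker:
    """Flags clients that request many non-existent profile URLs in a short window.

    Hypothetical helper: thresholds are illustrative, not tuned values.
    """

    def __init__(self, max_misses=20, window_seconds=60.0):
        self.max_misses = max_misses
        self.window_seconds = window_seconds
        self._misses = defaultdict(deque)  # client -> timestamps of recent misses

    def record_miss(self, client, now):
        """Record a 'not found' for client at time now (seconds).

        Returns True if the client exceeded max_misses within the window.
        """
        recent = self._misses[client]
        recent.append(now)
        # Drop misses that have fallen out of the sliding window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_misses
```

A server would call `record_miss` whenever a profile lookup fails and throttle or block the client once it returns `True`. With short IDs the hit rate is high enough that a crawler collects real profiles before it ever trips such a threshold, which is the trade-off Eric describes.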
I thought this exchange was worth blogging. This is not the first time I’ve had this conversation, but I had not been aware of that last argument, which, as made, seems to apply mostly to social networking sites and not to other types of sites.
What I’m concluding is:
- If you can get your users to pick human-readable names that are not also e-mail addresses, that’s the best alternative.
- Otherwise, use a tinyurl-style automatically-generated scheme, unless it conflicts with the goal Eric outlined.
- If all previous approaches fail, use what is essentially a randomly generated UUID.
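The second and third options in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the 62-character alphabet, the default length of 6 and the `idp.example.com` base URL are all made up for the example, and none of it reflects how Google or Yahoo actually mint their identifiers.

```python
import secrets
import string
import uuid

# Assumed alphabet for short IDs: letters and digits, 62 symbols in total.
ALPHABET = string.ascii_letters + string.digits


def short_id(length=6):
    """tinyurl-style ID: short enough to remember, but easier to guess."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def opaque_id():
    """Randomly generated UUID as 32 hex characters: hard to guess or remember."""
    return uuid.uuid4().hex


def identifier_url(base, local_id):
    """Join a (hypothetical) provider base URL and a local ID."""
    return f"{base.rstrip('/')}/{local_id}"


if __name__ == "__main__":
    base = "https://idp.example.com/a"  # placeholder provider, not a real IDP
    print(identifier_url(base, short_id()))
    print(identifier_url(base, opaque_id()))
```

The only real difference between the two generators is the size of the space they draw from, which is exactly the tension in this discussion: the shorter the ID, the friendlier it is to users and to anyone trying to enumerate profiles.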
It remains for me to say that I now understand the argument being made, but there are substantial counter-arguments to be made as well, including the user-unfriendliness of a non-human-readable scheme and its much higher susceptibility to what I call Phriend Phishing.
The last word hasn’t been spoken, but hopefully this discussion helps in understanding the current state of the best thinking.