Sunday, December 26, 2010

Best description of what architecture is ... by Charles Darwin (yes, THE Charles Darwin)

Here is a quote attributed to Darwin:

"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change.

Right there, for me, is the best definition of software architecture I have seen. Software architecture is whatever you do that allows your system to be adaptable to whatever changes its environments throws at it (scale, hardware failure, legal changes, change of business model, new forms of distribution etc.), better yet Darwin goes to to say

"In the long history of humankind (and animal kind, too) those who learned to collaborate and improvise most effectively have prevailed."

So in his view the keys to adaptability are collaboration and improvisation, which I'd translate it into modern systems as "Communication" and "Experimentation". Communication is THE key to collaboration and a quality lacking in a large number of leading engineer/architects. The core communication skill here is not about communicating what one's particular solution to a problem is, but more a style that is about inquiry, a style that illicit input and views from all significant parties and synthesize them into models and hypothesis and communicate them back and form common understanding (does your architect do that?)
 
 Experimentation is both a personal quality and something that must built into any system as a first class architecture requirement. Experimentation is generally under-valued in architecture discussion. Systems in general have multiple views, the most popular being the 4+1-views, I would add an "Experimentation View", a view of architecture that describes which sub-systems can be experimented with and in what ways, and how would an experiment success be measured. More on "Experiment View" later, but for, I am even more convinced that analogies with and inspiration from biological systems is worthy guide for designing software systems.

Wednesday, December 22, 2010

Cannonical Use Cases for a Relying Party

I have always felt that in identity community we spend most of our time discussing identity providers and their concerns (such as token format, protocol etc.) and do not spend enough time on and attention to relying parties. At eBay we play both roles i.e. we are both an identity provider (we provide sellers' identity and attributes to 3rd party developers) and relying party (accepting identities provisioned outside eBay marketplaces as eBay users). I can say that building a architecturally sound relying party is as challenging as building an identity provider.

In this post I simply want to enumerate the major use cases that any major relying party (that is anyone that for example plans to accept Facebook Connect) has to account for. I used the qualifier "major" to denote that these use cases are important if a given relying party has millions of uses and many services and applications (much like eBay does).

The list of use case are:


1-Sign-In and out

2-Connect and Reception

3-Link/Unlink

4-Profile Access Extension

5-Role Elevation

6-Recovery


7-Disconnect


8-Force Authentication

9-Customer Support 


10-Capturing Alternative/Second Secret 


and here is the descriptions:


1-Sign-In and out: This includes changes to your standard sign-in page to make it a "federated sign-in" page. The challenge here is mostly user experience i.e. how to design the UI correctly to achieve two goals:
   - Not confuse existing users who will not sign up with external IDP
   - Communicate to the user of external IDP what they need to do (without creating the NASCAR problem)
 There are also techniques for detecting the IDPs that user may have an account with and to show a "smart" list of IDPs

2-Connect and Reception: Once users clicks on "Connect" button (if you are using connect style IDPs such as FB) or entered her OpenId URI (although this is unlikely to be adopted by users), the user is send to IDPs to sign-in and given consent to his/her information to be send to you site - let's refer to this process as Connect - then user is sent back to your site to a page/application that we refer to as "Reception". This is the processes that greets the users for the first time and provision an account in RP for him/her. I use the word "reception" to make it distinct from "registration" which is when provision is done based on data collected by RP itself. The reception process is significant b/c it covers the gap between data received from IDP and what is needed for a user to be provisioned, also it assigns the roles for new user. These roles are typically minimal since data coming from external IDPs are normally not trusted or verified. Also during reception token received from external IDP together with associated meta data is stored in a central location accessible to different functional units (application) of RP

3-Link/Unlink: Another use case (often part of reception) is to detect whether the user connecting to RP is one who already has an account. The detection can be done based on mapping the data received from IDPs to existing account, the simplest form is to check whether email addressed returned by IDP already exists.Once an account is detected, user has to prove s/he actually owns it (normally by providing password) and the accounts are link. Since architecture hygiene calls for symmetric operation, you should also allow for unlinking of accounts.

4-Profile Access Extension: RP obtain a token during reception (such as OAuth token that comes with FB Connect), this token stores a set of access permissions to user resources (perhaps hosted by IDP). Any large RP has a set of applications that will use this token (for example MyeBay application as well as eBay Search Application) it is likely that one of these applications requires more information/access privileges that user originally consented to, in these case RPs should provide a central capabilities that conduct the process of requesting, receiving extended permissions from user and updating token meta information associated with user

5-Role Elevation: The first time users connect to RP they are granted a certain role (roles), normally this is a basic role since data provided by most IDPs are not reliable (eBay as an IDP does provides verified reliable data), at some point during the user's life cycle, users needs to perform an action that requires higher role assignment, in this cases RPs should provide capabilities to assign users higher role, this normally requires users to enter more information or go thru verification. This processes produce more attributes that will become part of users profile at RP.

6-Recovery: Every RP always has to establish a method for externally provisioned identities to authenticate WITHOUT the presence of external IDP. What does this mean? suppose you accept FB Connect and FB is down for 6 hours (an event that recently happened), further imagine that you operate a site that every minute of users not being able to login means financial loss. What do you do in this scenario? You may say, this is easy, ask users to enter a password during the first time reception, but wouldn't this defeat the whole purpose (or a big part of it) of users not having to remember many passwords?

7-Disconnect: All RPs must provide the capability for a user to "disconnect" i.e. close the account that was created based on an identity provided by external IDP. I personally believe that user owns his/her data and if user wants to disconnect and remove all of his/her activities from the record. s/he should be able to (to the extent that is legal)

8-Force Authentication: This is actually a capability of IDP, but RPs need to use this when they require user to be authenticated regardless of session's authentication state as seen by IDP. For certain operation RPs require a fresh session (or session that started in the past N minutes), in this cases RPs should request a forced authentication (I am using SAML terminology here) from IDP.

Monday, December 6, 2010

OAuth protocol and "Emdedded Browser" : Illustration of a simple security decision making framework

The discussion about whether to use an embedded browser or a full browser while performing OAuth protocol ,although passionate, is nothing new. Instead of expressing a position (I am for a full browser for the record), I want to use this case to illustrate a simple framework I use to make security design/implementation decision.


For every question/decision we designate one engineer as "good guy" and one engineer as "bad guy" and we ask each how this decision help or hurt you. I emphasis that it is critical to think and view a decision from a bad guy point of view. Most of us are basically not fraudsters and think in terms of "getting things done" and "being helpful", we rarely look at a feature and think to ourselves "How can I abuse it" (if you do, please contact me, I may have a job for you), so we assign an engineer just to do that.

Of course we want all our decision to be just as shown in the figure above :Help good guys and hurt bad guys. But that is not always the case, take using the embedded browser (and not a full pop up) to perform OAuth. This decision looks more like this:
It helps good guys (by not having every app render a login panel, perform authentication and deal with errors etc.), but it also helps bad guys (as illustrated nicely by my friend and colleague Yitao Yao  here - facebook is only one example) by making phishing easier. (it is worth pointing out that having application capture user secrets directly is along the counter diagonal in matrix above: hurts good guys and helps bad guys)

Does this mean you should never do this? I wish there was a simple and universal answer, but there is not.
In general you should not use embedded browser unless there is a good reason and more importantly the decision maker is fully aware of the risk (at least seen the matrix above and know the bad guys will benefit). Mitigating circumstances could be unattractive economy of fraud (your application is not a prime target fraudsters), you believe fraudsters can not effectively distribute their application to a large enough group of people, your user base is exceptionally savvy (I envy you) or your mobile operator has a good control over nature of applications and what they do (think Apple-like).

Sunday, December 5, 2010

Kitchen Meeting

I started an experimentation a few weeks back. Instead of scheduling and holding a meeting in one of eBay's formal meeting rooms, I tried to meet with people in kitchen or cafe (or starbucks across the street from the Hamilton campus) - now this is partly b/c it is getting too tough to find a free room, but I also had a hypothesis that when people meet in informal setting - specially a place that has to do with eating - they are more likely to collaborate.

I have to say the results are mixed so far, on one hand it indeed does make a difference, people seem to be friendlier and more willing to see "the other side" of issue - whatever it might be. However I also noticed that they are more likely to "forget" or back-off from agreements or conclusions that were reached too.

For now I keep doing this as much as I can, at least it gives me an incentive to keep the number of people in a meeting to 3 people max, this alone helps increase the effectiveness of meetings.