I work for a division within eBay Inc. called "Managed Market Places". The name is a bit curious. I was asked, more than once and by a range of people, what a "managed marketplace" really is. Is it a new type of marketplace by eBay (no it is not!)? Is it a vertical/niche marketplace within eBay (no it is not!)? Someone on Quora even interpreted it to mean that eBay simply "manages" the marketplace as opposed to growing it! (If that were the case, why would eBay announce it to the whole world by labeling it as such?)
So then what exactly is MMP (as it is known internally) and why is it important?
The nature of the Internet lends itself perfectly to the basic concept of a "marketplace": a mechanism for buyers and sellers to find each other. Marketplaces were, and still are, an important and growing part of the Internet. The growing list of niche marketplaces includes Etsy, Zaarly, oDesk, Airbnb, TaskRabbit, Yardsellr, Zimride and many, many more (not to mention marketplaces from Facebook, Google, Yahoo and other major players).
At first glance, it looks simple enough: create a site that brings the parties to a transaction together (from the buyer and seller of an antique to two people who want to share a ride or a room), and either take a cut of the transaction or make money through advertising. This is indeed the basic concept behind a marketplace - or rather an unmanaged marketplace. The marketplace itself is not a party to any transaction; buyer and seller deal with each other directly and take the risk (or the bulk of the risk) of a direct transaction. eBay operated, more or less, as an unmanaged marketplace for a while too.
In a managed marketplace, on the other hand, neither party to a transaction takes on the risk; in other words, the marketplace guarantees the success of the transaction - no risk (at least ideally). Of course a managed marketplace can "manage" other aspects of the interaction - inventory, quantity, price, promotions, etc. - as well, but for now we focus only on risk, since that is the focus of eBay MMP as well.
The evolution of simple Internet marketplaces into managed marketplaces is an important trend, as Internet users become more sophisticated and demand more from the services they use online. The AirBnB incident back in July of 2011 is a perfect illustration of how "unmanaged" marketplaces will be forced to offer a higher level of assurance/risk mitigation and become managed marketplaces.
What does this mean from a systems and architecture point of view? Here are five aspects that are particularly different when dealing with managed marketplaces:
1- The first significant change is one of mindset: you have to see yourself as being in the risk management business, or at least assume that risk management is a major part of your operations. What this changes, first and foremost, is that you now have to identify, assess, prioritize, mitigate (or plan to mitigate) and measure risk. In all likelihood, all of these activities (and the tools and systems you need to perform them) are new to you if you have been dealing with a simple/unmanaged marketplace.
2- Central to any consumer risk management scheme is "Identity" - and I don't mean OpenID or OAuth or SSO... I mean attributes, assurances, verification, accuracy, uniqueness and mapping a real-world entity to a digital identity (Entity Resolution).
3- Data is core to efficient risk management; big data, and your ability to collect and analyze it, becomes central to your ability to operate the marketplace at a reasonable cost (minimum losses).
4- Coherent architecture becomes even more important, simply because your systems become more complex and more integrated. A simple marketplace is just that: a marketplace site/application. A managed marketplace also includes identity provisioning and verification; risk definition, measurement and management at the user and the transaction level; a system for filing claims and disputes; systems for dealing with an ever-changing legal and business landscape that dictates what you can and cannot do with the data you collect; and finally the integration of all these systems in a productive way (seamless, but without coupling them).
5- Event-Driven and Complex Event Processing: this already plays a big role in distributed systems, but it plays an increasingly important role in distributed risk management. Real-time assessment of risk becomes critical and, given the cost/performance of risk assessment, incremental assessment of risk based on primitive and complex events generated over an entire session (or even the lifetime of a user) is the only practical solution.
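As a minimal sketch of that last point (illustrative Java only; the event types, weights and threshold are hypothetical, not eBay's actual model), primitive events can be folded incrementally into a per-user score so that a full, expensive risk assessment does not have to be recomputed on every transaction:

```java
// Illustrative sketch only: incremental, event-driven risk scoring.
// Event types, weights and the threshold are hypothetical, not eBay's actual rules.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IncrementalRiskScorer {

    public enum EventType { LOGIN_NEW_DEVICE, SHIPPING_ADDRESS_CHANGED, HIGH_VALUE_LISTING, PASSWORD_RESET }

    // Hypothetical weights per primitive event.
    private static final Map<EventType, Integer> WEIGHTS = Map.of(
            EventType.LOGIN_NEW_DEVICE, 10,
            EventType.SHIPPING_ADDRESS_CHANGED, 15,
            EventType.HIGH_VALUE_LISTING, 25,
            EventType.PASSWORD_RESET, 20);

    // Running score per user, maintained across the session (or lifetime) instead of
    // recomputing the full risk model on every transaction.
    private final Map<String, Integer> scores = new ConcurrentHashMap<>();

    /** Folds one primitive event into the user's running score and returns the new value. */
    public int onEvent(String userId, EventType event) {
        return scores.merge(userId, WEIGHTS.getOrDefault(event, 0), Integer::sum);
    }

    /** A complex event: several primitives within one session pushing the score over a threshold. */
    public boolean requiresStepUpReview(String userId) {
        return scores.getOrDefault(userId, 0) >= 40; // threshold is illustrative
    }
}
```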
Wednesday, November 23, 2011
The Uncommon Security Common Sense
I cannot claim that I have actually counted or classified all the reasons people cite for not taking security (or, for that matter, sound and well-thought-through system design) seriously from the start, but the following three lines seem to be the most common ones:
1- The "it is too contained" line: So what is the big deal? at worst it may affect a very small percentage of my users.
2- The "it is too early" line: Oh my system/site/project is too small and we only have a few users, we really don't have time/resources for this.
3- The "it is too small" line: My project is too small or too obscure for anyone to care.
By the way, I have heard these lines, or their equivalents, not only when it comes to security engineering (or re-engineering) but also in designing business policies or risk management measures to prevent fraud or, more generally, negative user experiences, as well as in general system design.
Now, to be fair, these reasons all sound like "common sense". After all, why would you take on additional cost and time for your project, or accept the expense and risk of re-engineering your code, to fix an issue that may only affect 1% or 0.01% of your users? Why should you spend two weeks to fortify a system that took you three days to design and is "just an experiment"? And who really cares about a small project somewhere, with some obscure URL that takes an email address as one of its inputs and shows a useful error message if the email is not registered? Does ANYONE really care?
Well, as it turns out, security common sense (like many other forms of common sense) is actually quite uncommon! Let's look at this frequently cited common-sense logic a bit closer.
To demonstrate the fallacy behind the first line (it is too contained, it only affects 0.01% of users) I cannot think of a better illustration than the words of presidential candidate Herman Cain, who said that "for each woman who has accused him of harassment there are probably thousands who haven't" - and he is 100% accurate and right! But does that make any difference? In all likelihood his presidential bid is all but over. Or could the Washington D.C. police chief during the "D.C. Sniper Attacks" have possibly argued that the whole thing was not a big deal because only 0.001% of the D.C. metro population was actually killed, and that therefore there was no need for a massive mobilization of police, the FBI, the ATF and even the Secret Service!?
The same math holds true for security: it does not matter if only 1,000 users out of 10MM become victims of a poorly secured or poorly designed system. What matters is how many people hear and learn about it - and you can be sure that, at least in this day and age, that number is a few orders of magnitude larger than the actual number of victims. The sense of insecurity this causes in the rest of the user community, and its economic cost, is the real math that matters - not the fact that only 1000/10MM = 0.01% of users were affected.
The second line, "it is too early" - or its equivalents, "we don't have enough time or resources" - is the most common line, not only in security matters but in system design and architecture as well. What is interesting here is that the exact premise cited for not focusing on security (or sound design, for that matter) is precisely why security should be taken seriously, i.e. "I am too new to afford not to be secure". If you are releasing a new product (or brand, or site) you REALLY DO NOT HAVE A SECOND CHANCE TO MAKE A FIRST IMPRESSION. If you are not secure, or if your first few users get taken advantage of (think of the AirBnB incident), you are doomed. To further demonstrate the risk in this argument I submit the following picture of one of the more famous car design mistakes: the 1998 Honda Odyssey.
Honda designed this in a hurry to get into the growing minivan market dominated by Dodge/Chrysler. They decided to differentiate by replacing a convenient power sliding door with a traditional door! Imagine what would have happened if this had been a new no-name company without Honda's established brand. Of course Honda corrected the mistake in the 1999 model and beyond and went on to have one of the most successful minivans. But if you are not Honda, you had better spend the time and money on designers and marketers to tell you, on the first try, that whoever buys a minivan *needs* a sliding door.
Now we get to the third line: "Who really cares about me?" I have to admit that I have the most sympathy for people who resort to this logic. After all, it is tough to imagine how capable and resourceful the modern fraudster/hacker community is without actually having had a brush with them. I will not get into the details - if you are interested you can briefly scan Rick Howard's excellent book "Cyber Fraud: Tactics, Techniques and Procedures" - but for the purpose of this writing I'd suggest you assume the following is true:
In the game of "who wants to break into my system", your adversary is more motivated (financially or politically) than you are, more experienced than you are, more innovative than you are, more nimble than you are, wants it more than you do, and has a smaller cost base than you do - and therefore all he needs is 0.01% (or fewer) of your users. The ONLY advantage you have is that you write the rules of the game. Do not give up that advantage easily; without it, you WILL lose the game.
By the way, the endpoint URL that takes an email address and very nicely checks and displays an error if the email does not belong to a valid user - it was actually found (even though it was an obscure URL not linked to from anywhere) and used to extract valid company X user emails (worth $5,000+) from a large list of unverified harvested emails (worth $50) - a vital part of the phishing industry's value chain.
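As an aside, here is a minimal sketch of how such an endpoint can avoid becoming an email-validation oracle (the classes and method names below are hypothetical, not the system described above): return the same generic response whether or not the address is registered, and do the real work out of band.

```java
// Hypothetical sketch: a password-reset style endpoint that does not reveal
// whether an email address is registered (avoids the enumeration oracle above).
public class PasswordResetService {

    private final UserDirectory users;   // assumed lookup interface
    private final MailSender mailer;     // assumed mail abstraction

    public PasswordResetService(UserDirectory users, MailSender mailer) {
        this.users = users;
        this.mailer = mailer;
    }

    /** Always returns the same message, so callers learn nothing about account existence. */
    public String requestReset(String email) {
        if (users.exists(email)) {
            mailer.sendResetLink(email);  // real work happens only for valid accounts
        }
        // Identical response either way; in a real system also keep timing roughly constant.
        return "If this address is registered, a reset link has been sent.";
    }

    public interface UserDirectory { boolean exists(String email); }
    public interface MailSender { void sendResetLink(String email); }
}
```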
1- The "it is too contained" line: So what is the big deal? at worst it may affect a very small percentage of my users.
2- The "it is too early" line: Oh my system/site/project is too small and we only have a few users, we really don't have time/resources for this.
3- The "it is too small" line: My project is too small or too obscure for anyone to care.
By the way, I have heard these lines or their equivalent not only when it comes to security engineering (or re-engineering) but also in designing business policies or risk management measure to prevent fraud, or in general negative user experiences as well as general system design.
Now to be fair, these reasons all sound like "common sense", after all why would you take on additional cost and time for your project or accept the expense and risk of re-engineering your code to fix an issue that may only affect 1% or 0.01% of your users? or why should you spend two weeks to fortify a system that takes you 3 days to design and it is "just an experiment"? and finally who really cares about a small project some where with some obscure URLs that takes an email address as one of its inputs and shows some useful error message if the email is not registered? does ANYONE really care?
Well, as it turns out, security common sense (like many other form of common sense) is actually quite uncommon ! Let's look at these frequently cited common sense logic a bit closer.
To demonstrate the fallacy behind the first logic (it is too contained, it only affect 0.01% of users) I cannot think of any better illustration than the words of presidential candidate Herman Cain where he said that "for each woman who has accused him of harassment there are probably thousands who haven't" and he is 100% accurate and right! But does that make any difference? In all likelihood his presidential bid is all but over. Or could the Washington D.C police chief during the "D.C Sniper Attacks" have possibly argued that the whole thing was not a big deal b/c only 0.001% of D.C metro population were actually killed and therefore there is no need for massive mobilization of police, FBI, ATF and even secret service !?
The same math is thru for security, it does not matter if only 1000 users out of 10MM become victim of a
poorly secured or design system. What matters is how many people hear and learn about it - and you can be
sure that at least in this day and age that number is a few order of magnitude larger than the actual number of
victims. The sense of insecurity that this causes in the rest of the user community and its economic cost is the real math that matters not the fact that only 1000/10MM=0.01% users were affected.
The second line "it is too early" or its equivalents "we don't have enough time or resources" is the most commons line not only in security matters but also system design and architecture aspects as well. What is interesting here is that the exact premise cited for not focusing on security (or sound design for that matter), is why security should be taken seriously i.e. "I am too new to afford not to be secure", if you are releasing a new product (or brand or a site) you REALLY DO NOT HAVE A SECOND CHANCE TO MAKE A FIRST IMPRESSION. If you are not secure, or if your first few user gets taken advantage off (think of AirBnB incident) you are doomed. To further demonstrate the risk in this argument I submit the following picture of one of the more famous car design mistakes : Honda Odyssey 1998
Honda designed this in a hurry to get into the growing minivan market dominated by Dodge/Chrysler. They decided to differentiate by replacing a convenient power sliding door with a traditional door! Imagine what would have happened of this was a new no-name company without Honda's established brand? Of course Honda corrected the mistake in 1999 model and beyond and went on to have one of the most successful Minivans. But if you are not Honda, you better spend time and money on designers and marketers to tell you, in the first try, that whoever buys a minivan *needs* a sliding door.
Now we get to the third line "Who really cares about me?" I have to admit that I have the most sympathy with people who resort to this logic. After all it is tough to imagine how capable and resourceful the modern fraudester/hacker community is without actually having a brush with them. I do not get into the details - if you are interested you can briefly scan Rick Howard's excellent book "Cyber Fraud, tactics, Techniques and Procedure" - for the purpose of this writing I'd suggest you assume the following is true:
In the game of "Who wants to break into my system" your adversary is more motivated (financially or politically) than you are, more experienced than you are, is more innovative than you are, is more nimble than you are, wants it worst than you do, has a smaller cost base than you do (and therefore) all he needs is 0.01% (or smaller) of your users - the ONLY advantage that you have is that you right the rule of the game. Do not give up that advantage easily. You WILL lose the game.
Btw, the end point URL that takes an email and very nicely checks and display an error if email does not belong to a valid user - was actually found (although it was a little obscure URL not linked to from anywhere - and used to extract valid company X user emails (cost $5000+) from a large list of non-verified harvested emails (cost $50) - a vital part of phishing industry value chain.
Wednesday, October 12, 2011
PayPal Access & Commercial Identity
Today eBay Inc. announced an identity and attribute provider product called PayPal Access. Some described it as a "Facebook Connect for Commerce"; others described it as an easy registration tool for mobile sites. Today at the X.commerce Innovate Conference someone suggested to me that this is the first step for eBay Inc. to offer full cloud-based user management for e-commerce sites and merchants. You can also see the official press release from eBay Inc. here.
Most of the press and coverage today focused on "Consumer Identity" - or, more accurately, Consumer Commercial Identity - and the benefits of PayPal Access for consumers and for the online merchants those consumers visit. Consumer identity is indeed one facet of "commercial identity" - but there is another side to commercial identity, a less understood - and arguably less sexy - side, and that is Merchant Identity. What do I mean by this? Let's look at a scenario:
Merchants themselves are consumers of many online and offline services (think of them as B2B services) - a company that sells on eBay, or any other online channel, has an eBay account, an account with a shipping company (FedEx), a Facebook account, perhaps another account with an email marketing service, a bank account, etc. Clearly merchants suffer from the same "account and password hell" that consumers do - but this hell is a lot deeper and hotter for merchants. Consider these facts:
- Most merchants have employees/contractors who create these accounts on behalf of the merchant,
- A lot of these employees (for smaller merchants) are part time or temps
- Employee turnover is high
Here, in addition to the usual forgetting of one's password - which for a merchant leads to lost productivity and money - sometimes the person who created the account simply leaves. If you are lucky and s/he left on good terms, you end up having to chase the employee down and restore your access; if not, you are exposed to unauthorized access by the employee or downright account takeover.
You might say: what is the difference between this and consumer identity? These employees are consumers too, and technically there is no difference. But look closer. Merchant use cases are fundamentally different. In consumer identity use cases, a consumer is a principal and gives consent on his/her own behalf to an agent (another site or application); the IDP itself recognizes the consumer as the principal and allows her (and ONLY her) to change or revoke this access. In merchant cases, what appears to be the consumer is really not a principal bound to the merchant identity but an employee. In this case the IDP must recognize this "hierarchical relationship" and allow an "admin" employee of the merchant to monitor and manage the life cycle of the tokens (and identities) of employees.
In the use case above, merchant X would not reveal its primary eBay user name and password to any employee; they would provision an account for each employee. Each employee then logs into eBay using her own account - via PayPal Access - all the while PayPal Access monitors and manages all the tokens issued to all employees of merchant X. Should an employee leave or change function, the token can be revoked by merchant X's admin regardless of the employee's wishes.
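A minimal sketch of the hierarchical model described above (illustrative classes only, not PayPal Access's actual API): tokens are bound to individual employees, but a merchant admin can revoke any of them.

```java
// Illustrative data model only - not PayPal Access's actual API - showing the
// "hierarchical relationship" described above: tokens are bound to employees,
// but the merchant's admin can revoke any of them.
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class MerchantTokenRegistry {

    public record EmployeeToken(String employeeId, String tokenId, boolean admin) {}

    private final String merchantId;
    private final List<EmployeeToken> activeTokens = new ArrayList<>();

    public MerchantTokenRegistry(String merchantId) { this.merchantId = merchantId; }

    /** Provision a token for an employee acting on behalf of the merchant. */
    public EmployeeToken issue(String employeeId, boolean admin) {
        EmployeeToken token = new EmployeeToken(employeeId, UUID.randomUUID().toString(), admin);
        activeTokens.add(token);
        return token;
    }

    /** An admin employee can revoke any employee's tokens, e.g. when someone leaves. */
    public void revokeAllFor(String requestingEmployeeId, String targetEmployeeId) {
        boolean requesterIsAdmin = activeTokens.stream()
                .anyMatch(t -> t.employeeId().equals(requestingEmployeeId) && t.admin());
        if (!requesterIsAdmin) {
            throw new SecurityException("only an admin of merchant " + merchantId + " may revoke other employees' tokens");
        }
        activeTokens.removeIf(t -> t.employeeId().equals(targetEmployeeId));
    }
}
```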
If this sounds similar to LDAP or Active Directory, that is because it really serves the same function: enterprise identity - in this case the enterprise is really a merchant. This is not unexpected in a world where enterprise identity and consumer identity (a.k.a. social identity) are converging - and where there is a need for cloud-based enterprise user management.
Please note that this is NOT an announcement (or leak) of a PayPal Access cloud-based user directory. IT IS NOT, REALLY. I just wanted to point out that there are two sides to commercial identity: a sexy side (consumers) and a side that can make you money (merchants).
In the next post, I will write a bit about Consumer Commercial Identity and how it may be different from social identity.
Friday, September 9, 2011
Interviewing @ eBay Part III - Software Architecture Interview
I don't know of any job title/role in technology that is more controversial, and evokes more emotional reactions, than that of an "architect". Engineer, engineering manager, product manager, accountant, business developer etc. all have almost the same definition/responsibilities from company to company; the architect's role, though, varies widely: in some firms one cannot do anything without an architect's permission, and in others the role is completely eliminated.
You should first know that architecture is a role with a wide definition (TOGAF alone defines five types of architect - enterprise, business, data, application, IT). eBay architects play a combination of tech lead, internal evangelist, tech management and product management, and the role is often the agent of change for eBay's technical direction, tech stack, technology choices, processes and methodologies...
Interviewing and selecting an architect is especially challenging. In addition to the core skills of a software engineer (yes, if you are interviewing for an architect position, you should be comfortable coding - not necessarily a Java guru, but able to code), the main attributes I am looking for are:
- Integrity: Change in technology often brings about change in organization and power structure; people currently in power know this and may not be enthusiastic about it. An architect should have the integrity and courage to call for change when it is not popular.
- Leadership: Integrity and courage are necessary but not sufficient. In this role you need leadership, i.e. the ability to influence, inspire and induce changes in direction (often major ones) in a way that makes people want to make the change rather than feel forced to (you will have no formal power anyway).
- Clarity: Last but not least, architects MUST bring clarity to situations where goals are unclear, the definition of the problem is fuzzy, needs are uncertain, data is incomplete, assumptions are inaccurate, yet delivery is urgent and pressure is high... bringing clarity to all aspects of such situations is often the most important function of an architect at eBay.
So for the interview, expect some of the core software engineering questions, with much more emphasis on modeling and problem solving, plus a few of the following:
- When you are asked to "architect" a system - say a photo album app - what does that mean to you? What tasks do you perform? What would be your deliverables? How would you interact with engineers?
- How do you ensure the delivered system conforms to your architecture?
- Model and design eBay.
- From the time you type in www.ebay.com to when you see the eBay home page, explain what happens under the hood, at all layers.
- How does Ajax-style interaction impact a traditional/classical page-oriented architecture? What changes would it force on the classic architecture?
- How would the proliferation of mobile applications impact the classical web-based architecture?
- Explain Map/Reduce in simple but reasonably accurate terms, in a way a marketing person can appreciate.
- Describe challenges and best practices in developing a distributed system - such as an SOA-based system.
- Describe the qualities of a well-designed API or service interface.
- Describe your favorite application development framework or design, and explain its benefits and shortcomings (e.g. Spring or Struts, or your own framework).
- Compare and contrast SQL and NoSQL DBs; when do you use each?
- How do you store a social graph like LinkedIn's or Facebook's?
- How do you decide whether to buy or build a piece of technology?
- eBay, like other online merchants and marketplaces, has a policy against the sale of firearms; how do you design a system to enforce this policy?
- How do you design an application - such as a cart or checkout flow - in a way that product and UI folks can experiment with and optimize different aspects of it?
- At any given time, eBay supports a set of widely used browsers; for the rest, it displays a warning message and asks users to upgrade to another browser. How do you design this system?
- In a large and distributed system, how do you ensure data consistency for critical functions such as authentication/login?
- Discuss a few significant technology trends: why do you think they are important? How would you anticipate their impact on current architectures/systems?
- What would you do in your first month of working for eBay?
If some of the questions sound vague, it is because they are! (By the way, they are a lot clearer than what you'd be faced with in reality.) Remember that you need to ask questions and seek and bring clarity to the problem definition before you jump into the solution.
Again, if you are interviewing for a particular specialty such as Security, I18N, Messaging, Operations etc., you should expect specific questions in those areas (I will post a list of questions for my security and identity architecture interviews later), but for system and application architecture, be prepared for at least 3 or 4 questions from the list above.
Wednesday, July 20, 2011
BrowserID...and the search for perfect Identity Selector
This is a long post, and here is the gist of it:
In general I think we, as an identity community, are still looking for a practical solution to a simple yet adequate "identity selector". CardSpace represents a solution on the rich and complex end and BrowserID represents a solution on the simple (but not rich) end; every solution is a valuable iteration toward the perfect identity selector - and I feel the iterations will continue.
Now, here are the details for those of you who are interested:
A few days ago Mozilla announced the availability of BrowserID. Identity is a hot topic these days, so almost immediately the community (and my mailbox) was buzzing with predictable questions such as:
- Is BrowserID the next generation of OpenID?
- What are the differences between OpenID and BrowserID?
- Is BrowserID the answer to identity on the web?
- Is BrowserID secure?
- What is the relationship between BrowserID and OAuth or OpenID Connect?
- Should I implement BrowserID now?
All legitimate questions to be sure, but perhaps a few of them could have been clarified more easily if Mozilla had called this feature, say, "Verified Email" - after the title of the specification BrowserID is based on (see A verified email protocol for the browser). I think that would have been a name that reflected the intent and use of this feature and helped answer these questions.
In this post I write about where I think BrowserID fits in the space and, as risky as it is, venture a guess as to whether it will be adopted by identity providers and relying parties (not wise, I know... but hey, this is a personal blog). If you want to read more about the implementation details of BrowserID, see a very good writeup by Lloyd Hilaiel here, and I also recommend reading Dick Hardt's notes on BrowserID as well.
To start, let me summarize what BrowserID is. BrowserID is a protocol for:
- Verifying a user's email addresses (as many as a user wants) as reported by an email provider (or identity provider)
- Storing them in the browser (Firefox as of now) safely - or so one would hope
- Transmitting an email address safely to a "relying party", that is, a site that requires an email address
It uses PK cryptography to ensure the integrity of all communications. BrowserID requires two key pairs: one generated by the IDP (email provider) to sign the assertion ("email x belongs to subject y"), and another generated later by the browser to sign the (subject + email + IDP public key) token and pass it on to the relying party (that is, your web site). So it is reasonably safe, and it uses a subject confirmation method known as "holder of key".
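As a rough illustration of the relying-party side of such a holder-of-key scheme (simplified, and not the actual BrowserID wire format; parsing and key distribution are glossed over), the check boils down to two signature verifications:

```java
// Simplified illustration of holder-of-key style verification, NOT the real BrowserID format:
// the IDP signs (email, browser public key); the browser then signs the assertion it presents.
import java.security.PublicKey;
import java.security.Signature;

public class AssertionVerifier {

    /** Verifies that 'data' was signed by the holder of the private key matching 'signerKey'. */
    static boolean verify(PublicKey signerKey, byte[] data, byte[] signatureBytes) throws Exception {
        Signature sig = Signature.getInstance("SHA256withRSA");
        sig.initVerify(signerKey);
        sig.update(data);
        return sig.verify(signatureBytes);
    }

    /**
     * Relying-party check, with all parameters assumed to have been parsed from the
     * presented assertion: first the IDP's certification of the browser key and email,
     * then the browser's signature over the assertion for this site.
     */
    static boolean acceptAssertion(PublicKey idpKey, byte[] idpCertifiedBlob, byte[] idpSignature,
                                   PublicKey browserKey, byte[] assertionForThisSite, byte[] browserSignature)
            throws Exception {
        boolean idpVouchesForKey = verify(idpKey, idpCertifiedBlob, idpSignature);
        boolean browserHoldsKey = verify(browserKey, assertionForThisSite, browserSignature);
        return idpVouchesForKey && browserHoldsKey;
    }
}
```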
So how should one use BrowserID? Is this OpenID? Is this a replacement for OAuth?
I think the proper use of BrowserID, if it ever gets support from other browser vendors, is to replace the annoying "email verification dance". BrowserID is a fantastic lower-level solution for this problem. Is it a replacement for OpenID or OpenID Connect? I think not! Does it replace OAuth? Absolutely not - they are simply two different things.
BrowserID, in my view, really is not an ID - my definition of identity is something I can replace registration with, not just the login panel.
An identity is more about attributes than about authenticating an identifier (such as an email address), and BrowserID is by design silent on that matter. In the words of Mike Hanson, the author of the Email Verification Protocol:
"The idea that led to the BrowserID work was not "how can we fix identity on the web", but "what is the smallest possible claim we could make to make progress on the browser as a claims agent?"
And
"...Attribute exchange is deliberately out of scope"
Mike states the design goals clearly and the solution achieves its goals perfectly; the only thing that complicates the matter is the name BrowserID and the fact that identity means so many things to so many people.
Will BrowserID be adopted widely as a form of identity? Will it replace your login button? I have my doubts on both counts.
First off, let me say that I would use BrowserID for email verification (if it is adopted by consumers); it is a very elegant solution for that. As for wider adoption as a form of identity, let's look at the three main actors in any identity ecosystem: RPs, IDPs and users/consumers.
For a large number of relying parties, an identity provider that simply asserts one attribute (an email address) is not valuable enough to dedicate the scarce real estate of the "login page" to (and to add to their NASCAR complexity); RPs will opt for richer IDPs (Facebook, G+, Twitter etc.) - that way they get not only an email address but a rich set of information about a user (you may ask: what about privacy? more on that later).
IDPs (for example email providers such as Gmail, Yahoo, MSN etc.) also do not have a clear-cut incentive to support BrowserID - maybe receiving fewer "confirmation emails"? Entities such as FB and Twitter are unlikely to support BrowserID (x@y.com does not have to be a real email address, it is simply an identifier, so FB could decide to support user@facebook.com) since they are strategically committed to having a presence on RPs' login pages and not being inter-mediated by Mozilla's sign-in button.
There are also a few questions about user adoption, prime among them what happens when a user switches browser or device (with most users today using more than one device/browser to access the Internet), and, to a lesser degree, whether the general population of users will sufficiently understand the user experience.
Finally, I have to say I am impressed by one aspect of the implementation (which I assume was a lesson learned from CardSpace's lack of adoption), and that is the near dead-simple implementation for RP sites.
In general I think we, as an identity community, are still looking for a practical solution to a simple yet adequate "identity selector". CardSpace represents a solution on the rich and complex end and BrowserID represents a solution on the simple (but not rich) end; every solution is a valuable iteration toward the perfect identity selector - and I feel the iterations will continue.
Tuesday, July 12, 2011
Power to the People - Data Sharing Power That is - UMA Draft Recommendation Released
After more than a year of hard work, the UMA WG (User Managed Access Working Group) has announced the draft recommendation for the UMA protocol.
UMA is a communication protocol, defined by its formal FAQ as:
User-Managed Access (UMA, pronounced "OOH-mah" like the given name) is a protocol designed to give a web user a unified control point for authorizing who and what can get access to their online personal data (such as identity attributes), content (such as photos), and services (such as viewing and creating status updates), no matter where all those things live on the web.
Former colleague and current lead of the UMA WG at Kantara, Eve Maler, gave a very good presentation last year at the Cloud Identity Summit that you can see here. There is also a webinar on July 13.
To explain UMA and what it tries to do, I feel an example works best (the spec and the terminology are not the easiest way to understand UMA).
Alice is an on-demand speaker and celebrity chef; she travels a lot. She uses:
- A calendar application
- A general purpose social networking site
- A niche professional networking/community site for food enthusiasts
- A selling account with a popular marketplace for high-end kitchen gadgets and appliances
(All of the above services are called resources, and Alice is the resource owner.)
She uses the services of several useful applications:
- Travelers Guide: An application that recommends restaurants and activities in all major cities based on Alice's preferences and profile on the general purpose social network.
- Chef Tracker: An application that notifies Alice’s professional connections and fans when she is coming to their town and where she speaks.
- Merchandise Lister: A web application that lists Alice’s autographed recipe books and limited edition appliances to multiple online channels including the marketplace.
(All of the above are called consumers or requesters.)
Alice can grant authorization to the three applications above using OAuth directly. In this case the four resource servers (social network, professional network, calendar server and marketplace) each maintain Alice's authorizations for the respective application(s).
However, Alice (a busy professional) does not have a centralized, easy-to-access location where she can see all the authorizations she has granted to different applications. It is true that each resource server maintains a record of Alice's granted authorizations, but since each server has a different way of displaying them (and they are often buried in random places around the resource server), it is unlikely that Alice ever gets a clear view of how different applications are using her information.
Alternatively, Alice can go to one server - called the Authorization Manager (AM) - and grant authorization to the three applications in one place.
The benefits are clear. Alice can literally see all the applications that access her information (and the extent of their access) in one place. For example, Alice can limit the access of Travelers Guide to one week (during which she travels).
To understand why UMA is important and what UMA does, we should first understand some of the implicit assumptions (or, some may argue, simplifications) behind OAuth 2.0 authorization.
OAuth 2.0 does require an authorization step, where the "resource owner" authorizes a consumer application to access a resource on his/her behalf. This authorization, however, requires and assumes that:
- Authorization is performed at the time of resource access by the consumer
- The resource owner is present to grant authorization
- Future access to the resource is always made on behalf of the resource owner by the requester (consumer application) - not on behalf of someone else (say, Alice's assistant in the example above). This is a very important point and deserves its own post :-) OAuth scopes do not cover this class of use cases at all.
And, more importantly in my view (and from the giving-power-back-to-the-people point of view):
- The resource owner (user) has to keep track of the applications/consumers s/he has authorized to access various resources on his/her behalf (how many of you know how many applications you have authorized to access your collection of online accounts? And exactly what type of privileges you granted them? Think about it!)
Now, granted, this is not an OAuth problem; it is the byproduct of each individual consumer obtaining authorization to access a variety of resources individually.
UMA sets out to address all of the above with the introduction of a central Access Manager (AM, in UMA terminology) and the protocols to connect resource servers and consumers to the AM.
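To make the contrast concrete, here is a rough sketch (hypothetical interfaces, not the actual UMA wire protocol) of what a central authorization manager gives Alice that per-resource-server OAuth grants do not: one place to list, scope and revoke every grant.

```java
// Hypothetical sketch of a central authorization manager, NOT the actual UMA protocol:
// every grant lives in one registry, so the resource owner can list and revoke them in one place.
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class AuthorizationManager {

    public record Grant(String owner, String requesterApp, String resource, String scope, Instant expiresAt) {}

    private final List<Grant> grants = new ArrayList<>();

    public void grant(String owner, String requesterApp, String resource, String scope, Instant expiresAt) {
        grants.add(new Grant(owner, requesterApp, resource, scope, expiresAt));
    }

    /** The unified view Alice lacks when each resource server keeps its own consent records. */
    public List<Grant> grantsFor(String owner) {
        return grants.stream().filter(g -> g.owner().equals(owner)).toList();
    }

    /** Resource servers ask the AM at access time whether the requester is still authorized. */
    public boolean isAuthorized(String owner, String requesterApp, String resource, Instant now) {
        return grants.stream().anyMatch(g -> g.owner().equals(owner)
                && g.requesterApp().equals(requesterApp)
                && g.resource().equals(resource)
                && g.expiresAt().isAfter(now));
    }

    /** e.g. Alice limits Travelers Guide to one week, then removes it entirely. */
    public void revoke(String owner, String requesterApp) {
        grants.removeIf(g -> g.owner().equals(owner) && g.requesterApp().equals(requesterApp));
    }
}
```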
UMA is an ambitious undertaking (in my experience, all authorization initiatives are!), primarily because its success depends on whether users (resource owners) can successfully express the policies that govern consumer access to the resources they own. This is the linchpin of any authorization project I have ever seen or been part of.
Whether UMA will be adopted and implemented as a viable product remains to be seen, but the UMA protocol is a firm step in the right direction.
In the next entry, I will focus on the UMA model and the associated communication protocol among its participants.
Sunday, May 15, 2011
To OSGi or not to OSGi ... that is NOT the question
The topic of OSGi is attracting some attention these days, at least in my neck of the woods. The short of it is that a lot of web application developers are asking whether they should use OSGi or not. My answer: this is not the question you should be asking!
Yes, I know the popular narrative is that OSGi makes your system more modular, makes dependency management a thing of the past, certainly solves all CLASSPATH issues, allows you to have multiple versions of the same bundle running at the same time, and enables you to start, stop and deploy new bundles without restarting your framework (or server).
The reality, though, is that OSGi is a component technology (much like SOA or EJB) and a tool; it can perhaps make a well-designed system easier to implement with more fidelity to its original design, but it can NOT do anything for a poorly designed or organically grown system - in fact it makes such a system more complex. So the right question to consider is: what qualities does my system have to have so that OSGi can actually help me?
To me, the answer is similar to that for an SOA system, or any granularly componentized system:
1- Granularity and boundaries of components: This is perhaps the most important aspect of a distributed system. OSGi's unit of componentization is the "bundle"; physically a bundle is a jar file, but logically it can be a single Java class or a full subsystem - such as Jetty or a large web application - or anything in between. OSGi does not offer any hint here - nor should it. Granularity of modules is an architectural matter. For an existing system, breaking the system apart into logical modules is almost always the most difficult step toward any modularization. If your system does not already have clear modules with defined boundaries, and if it has grown organically, there is no easy or automated way to decide what the modules are; needless to say, simply creating one massive OSGi module does not help you at all and just adds one more layer of useless abstraction on top of everything else. Your best bet here is to use a dependency graph tool and try to isolate and bundle packages/jar files based on some topological sort method. This often requires refactoring of existing code to remove bad dependencies and transform the graph into the architectural layering you intend. This brings me to the second aspect of a well-designed modular system.
2- Layering and velocity: In order to define your logical modules correctly, you need to define some form of layering that informs your dependency management, i.e. a lowest layer (let's call it Kernel) with no dependencies other than the standard runtime; one layer above it (say Core) with dependencies on Kernel. Core is a layer and may include multiple logical modules (jars/packages); then you may have Service and Application layers, etc. You need to decompose and map your entire code base onto your own pre-defined layers and, in addition, decide a velocity (release cycle) for each layer, as well as whether a release of a lower layer will remain forever backward compatible or will impact higher layers (more on this in 4).
Again, OSGi does not offer help here - nor should it - it simply is a technology.
3- Dependency management: Defining layers does not guarantee that the dependency schema will be enforced; you still need to manage it (preferably using tools, and automatically) to make sure that, for example, your Kernel does not depend on your Core layer and that there are no cyclic dependencies among your logical modules. Trickier yet is the nature of the dependencies. If dependencies are not managed, you may notice that there is a very large module in, say, the Core layer with a large number of services and applications depending on it. At first glance it may look like a useful module, but the large size should make you suspicious: often the upper layers simply leak what should live in the application or service layer down into a lower layer - for lack of engineering discipline, knowledge, time, or all of the above.
In this case, OSGi would help you capture and discover the dependency, but it does not tell you that it should not be there to begin with (a minimal sketch of such an automated layering check appears after this list).
4- Versioning policy: As I said in (2), well-designed systems have layers, from "lower" layers to "upper" layers - based on a topological sort of the dependency graph. Typically each layer has a version number visible to other layers (some may choose to give each module in a layer a version number visible to all modules in upper layers; this makes life a bit more difficult for modules in upper layers). One should decide how many versions of each layer (or module) can be active at any given time. This seemingly straightforward decision has significant implications; the options are:
- If at any given time you maintain only one version, everything is a bit easier, but you either have to maintain perpetual backward compatibility or force all the upper layers to change at the same pace as the lower layers.
- If you maintain multiple versions, you don't need to be backward compatible and you can transition upper layers gradually - very desirable. But you have to deal with two versions at the same time (not only at runtime, but in development branches, testing...).
For most web applications, people maintain one version and deal with the downsides - often in the form of backward-compatible changes. OSGi can help with maintaining multiple versions at the same time - something that is certainly useful for client-side applications; for web applications, most people I have talked to are not planning to use this feature.
5- Testing strategy: Distributed systems are tough to test. A monolithic system is one large binary: you can build it, deploy it and test it. For a distributed system, a test environment has to be set up; you build only your own module, and the other modules you depend on must be ready (either as out-of-process services in SOA, or bundles in OSGi) and at the right version. If you are using, say, five modules and each of them has two active versions, there are 32 possible combinations you need to test (to be exhaustive) - one reason having only one version at a time is often preferable. Again, OSGi does not help you design your test strategy; you should have one regardless of the technology you use to modularize your system.
6- Deployment: Last but not least is the deployment of your system/web application. You need to decide whether to deploy the OSGi framework as a web application under your servlet container, or to deploy your servlet container as a bundle in your OSGi framework. If you are using SOA, you need to decide where to deploy each service and how to bundle service stubs with your application (if any stubs are needed), or you may use a combination by deciding that each service stub is an OSGi bundle. In any case, there should be a clear design for the correct deployment of a distributed/modular system.
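Here is the minimal layering-check sketch promised in point 3 (module names, layer numbers and the "depend only on lower layers" rule are illustrative assumptions, not an OSGi API): OSGi records the dependencies, but a check like this is what tells you whether they respect your intended architecture.

```java
// Minimal sketch of an automated layering check over a module dependency graph.
// Layers, module names and the rule are illustrative, not an OSGi feature.
import java.util.List;
import java.util.Map;

public class LayeringCheck {

    // Lower number = lower layer (Kernel below Core below Service below Application).
    static final Map<String, Integer> LAYER_OF = Map.of(
            "kernel", 0, "core", 1, "service", 2, "application", 3);

    record Dependency(String fromModule, int fromLayer, String toModule, int toLayer) {}

    /** Returns the dependencies that point "upward" and therefore violate the layering. */
    static List<Dependency> violations(List<Dependency> dependencies) {
        return dependencies.stream()
                .filter(d -> d.toLayer() > d.fromLayer())   // e.g. Kernel depending on Core
                .toList();
    }

    public static void main(String[] args) {
        List<Dependency> deps = List.of(
                new Dependency("core.auth", LAYER_OF.get("core"), "kernel.logging", LAYER_OF.get("kernel")),
                new Dependency("kernel.logging", LAYER_OF.get("kernel"), "service.search", LAYER_OF.get("service")));
        violations(deps).forEach(v ->
                System.out.println("Layering violation: " + v.fromModule() + " -> " + v.toModule()));
    }
}
```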
If you design a system in a way that takes these aspects into account, then OSGi can probably help you implement it more easily - although for web applications the issue of "two runtimes" is a bit too much for my taste - but then again, if all of the above aspects are taken into account, you may not have an urgent need for OSGi anyway (banks offer credit to people with good credit, who probably don't need it anyway...). Often, engineers and managers who work on poorly designed or organically grown systems stumble upon OSGi in an effort to reduce complexity and increase the productivity of the people working on them... if you fit into this category, my recommendation is to focus on fixing the underlying issues that make your system complex, inter-dependent and coupled. Until you do that, OSGi (or any other alphabet soup of technology) will not help you.
Sunday, January 2, 2011
Kingdom of Nouns ...
The other day I took advantage of a rare couple of hours of peace and decided to catch up on my blog reading, and I came across a nice post by Joel Spolsky of Joel on Software fame on Map/Reduce. The post is probably one of the clearest writings on the origin of Map/Reduce and what it does - if you haven't read it I highly recommend it, even if you are a Map/Reduce pro. In the course of explaining Map/Reduce he talks about how thinking in terms of functional languages made inventing Map/Reduce possible, implying that if one thinks only in terms of OO languages, where functions (or verbs) are not first-class citizens, coming up with an abstraction such as a map() or reduce() function is tough. He includes a link to another interesting post titled "Execution in the Kingdom of Nouns", which contends that Java is a kingdom of nouns where verbs are "owned" by nouns. I like this post too, especially the witty narration. However, I disagree with the conclusion (or implication) that one cannot effectively or elegantly model certain classes of thoughts and abstractions with a statically typed language like Java.
Look at any natural language: they ALL seem to be kingdoms of nouns. Look at how babies start to talk: they almost exclusively use nouns for a while and THEN add verbs to form sentences (and thoughts). The core example of the post, for taking the garbage out, is copied here:
get the garbage bag from under the sink carry it out to the garage dump it in the garbage can walk back inside wash your hands plop back down on the couch resume playing your video game (or whatever you were doing)
(italic emphasis is from original post)
This example is there to show that a normal function is a sequence of verbs and can easily be expressed without needing a "noun", but the noun here is hidden: it is the "subject" that is actually doing the work. In a functional language like JavaScript, the replacement would be the "global" scope, or some function the author would create called "takeOutTheGarbage()" and call from different "context"s; those contexts would be the nouns.
I admit I am not a language guy, I am more of a modeling guy, but I do think that type spaces formed around collaborating nouns (subjects, objects etc.) are capable of modeling and abstracting any concept or thought. I do agree that some folks in "Javaland" overuse classes and have too many factories, adapters, mediators, visitors etc., but this is more of a "fashion" issue than a basic OO modeling issue.
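Just to ground that claim, here is a toy sketch (my own example, not from either post) of how the hidden subject becomes an explicit noun in Java while the verb can still be passed around as a value:

```java
// A toy illustration: the "hidden subject" becomes an explicit noun, and the verb
// can still be handed around as a value in Java.
public class KingdomOfNounsExample {

    interface Chore { void perform(); }   // a verb, reified so it can be passed around

    static class Person {
        private final String name;
        Person(String name) { this.name = name; }

        // The noun owns the verb...
        void takeOutTheGarbage() {
            System.out.println(name + " carries the bag out and dumps it in the can.");
        }

        // ...and can also accept an arbitrary verb as an argument.
        void doChore(Chore chore) { chore.perform(); }
    }

    public static void main(String[] args) {
        Person kid = new Person("The kid");
        kid.takeOutTheGarbage();
        kid.doChore(() -> System.out.println("The kid washes their hands and plops back on the couch."));
    }
}
```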
I also agree (and have seen) that functional languages are better for certain tasks, but wishing to live in a "kingdom of verbs" is a bit extreme in my view; after all, verbs are all performed by somebody or something - unless you prefer to live in the land of the anonymous.
Sunday, December 26, 2010
Best description of what architecture is ... by Charles Darwin (yes, THE Charles Darwin)
Here is a quote attributed to Darwin:
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Right there, for me, is the best definition of software architecture I have seen. Software architecture is whatever you do that allows your system to be adaptable to whatever changes its environment throws at it (scale, hardware failure, legal changes, change of business model, new forms of distribution etc.). Better yet, Darwin goes on to say:
"In the long history of humankind (and animal kind, too) those who learned to collaborate and improvise most effectively have prevailed."
So in his view the keys to adaptability are collaboration and improvisation, which I'd translate, for modern systems, into "communication" and "experimentation". Communication is THE key to collaboration, and a quality lacking in a large number of leading engineers/architects. The core communication skill here is not about communicating what one's particular solution to a problem is; it is more a style of inquiry: a style that elicits input and views from all significant parties, synthesizes them into models and hypotheses, communicates them back, and forms a common understanding (does your architect do that?)
Experimentation is both a personal quality and something that must be built into any system as a first-class architectural requirement. Experimentation is generally under-valued in architecture discussions. Systems in general have multiple views, the most popular being the 4+1 views; I would add an "Experimentation View": a view of the architecture that describes which sub-systems can be experimented with, in what ways, and how an experiment's success would be measured. More on the "Experimentation View" later, but for now, I am even more convinced that analogies with, and inspiration from, biological systems are a worthy guide for designing software systems.
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Right there, for me, is the best definition of software architecture I have seen. Software architecture is whatever you do that allows your system to be adaptable to whatever changes its environments throws at it (scale, hardware failure, legal changes, change of business model, new forms of distribution etc.), better yet Darwin goes to to say
"In the long history of humankind (and animal kind, too) those who learned to collaborate and improvise most effectively have prevailed."
So in his view the keys to adaptability are collaboration and improvisation, which I'd translate it into modern systems as "Communication" and "Experimentation". Communication is THE key to collaboration and a quality lacking in a large number of leading engineer/architects. The core communication skill here is not about communicating what one's particular solution to a problem is, but more a style that is about inquiry, a style that illicit input and views from all significant parties and synthesize them into models and hypothesis and communicate them back and form common understanding (does your architect do that?)
Experimentation is both a personal quality and something that must built into any system as a first class architecture requirement. Experimentation is generally under-valued in architecture discussion. Systems in general have multiple views, the most popular being the 4+1-views, I would add an "Experimentation View", a view of architecture that describes which sub-systems can be experimented with and in what ways, and how would an experiment success be measured. More on "Experiment View" later, but for, I am even more convinced that analogies with and inspiration from biological systems is worthy guide for designing software systems.
Wednesday, December 22, 2010
Canonical Use Cases for a Relying Party
I have always felt that in the identity community we spend most of our time discussing identity providers and their concerns (such as token formats, protocols etc.) and do not spend enough time and attention on relying parties. At eBay we play both roles, i.e. we are both an identity provider (we provide sellers' identity and attributes to 3rd party developers) and a relying party (accepting identities provisioned outside eBay marketplaces as eBay users). I can say that building an architecturally sound relying party is as challenging as building an identity provider.
In this post I simply want to enumerate the major use cases that any major relying party (that is, anyone who, for example, plans to accept Facebook Connect) has to account for. I use the qualifier "major" to denote that these use cases are important if a given relying party has millions of users and many services and applications (much like eBay does).
The list of use cases is:
1-Sign-In and out
2-Connect and Reception
3-Link/Unlink
4-Profile Access Extension
5-Role Elevation
6-Recovery
7-Disconnect
8-Force Authentication
9-Customer Support
10-Capturing Alternative/Second Secret
and here are the descriptions:
1-Sign-In and out: This includes changes to your standard sign-in page to make it a "federated sign-in" page. The challenge here is mostly user experience, i.e. how to design the UI correctly to achieve two goals:
- Not confusing existing users who will not sign up with an external IDP
- Communicating to users of an external IDP what they need to do (without creating the NASCAR problem)
There are also techniques for detecting the IDPs with which a user may have an account and showing a "smart" list of IDPs.
2-Connect and Reception: Once the user clicks on the "Connect" button (if you are using connect-style IDPs such as FB) or enters her OpenID URI (although this is unlikely to be adopted by users), the user is sent to the IDP to sign in and give consent for his/her information to be sent to your site - let's refer to this process as Connect - then the user is sent back to your site, to a page/application that we refer to as "Reception". This is the process that greets the user for the first time and provisions an account in the RP for him/her. I use the word "reception" to make it distinct from "registration", which is when provisioning is done based on data collected by the RP itself. The reception process is significant because it covers the gap between the data received from the IDP and what is needed for a user to be provisioned, and it also assigns the roles for the new user. These roles are typically minimal since data coming from external IDPs is normally not trusted or verified. Also, during reception the token received from the external IDP, together with its associated metadata, is stored in a central location accessible to the different functional units (applications) of the RP.
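To make the reception step concrete, here is a minimal sketch of what such a handler could look like; every name here (the token store, the provisioning logic, the BASIC role) is a hypothetical placeholder for whatever your RP infrastructure provides, not an actual implementation.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a relying-party "reception" step.
public class ReceptionHandler {
    // Central token store, keyed by "idp:subject", shared by all RP applications.
    private final Map<String, String> tokenStore = new ConcurrentHashMap<>();

    public String receive(String idp, String subject, String accessToken, Map<String, String> idpAttributes) {
        // 1. Store the token and its metadata centrally so any RP application can use it later.
        tokenStore.put(idp + ":" + subject, accessToken);
        // 2. Cover the gap between what the IDP returned and what provisioning needs;
        //    anything mandatory and missing has to be collected from the user here.
        String email = idpAttributes.getOrDefault("email", "");
        // 3. Provision the account with a minimal role, since external attributes are unverified.
        String accountId = "rp-" + subject;  // stand-in for real account creation
        System.out.println("Provisioned " + accountId + " (" + email + ") with role=BASIC");
        return accountId;
    }
}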
3-Link/Unlink: Another use case (often part of reception) is to detect whether the user connecting to the RP is one who already has an account. The detection can be done by mapping the data received from the IDP to existing accounts; the simplest form is to check whether the email address returned by the IDP already exists. Once an account is detected, the user has to prove s/he actually owns it (normally by providing a password) and the accounts are linked. Since architectural hygiene calls for symmetric operations, you should also allow for unlinking of accounts.
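A sketch of the detection and linking logic, under the same caveat that every name here is made up for illustration:

import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of link/unlink during reception.
public class AccountLinker {
    private final Map<String, String> accountsByEmail; // existing RP accounts: email -> accountId

    public AccountLinker(Map<String, String> accountsByEmail) {
        this.accountsByEmail = accountsByEmail;
    }

    // Simplest detection: does the email asserted by the IDP already exist at the RP?
    public Optional<String> detectExistingAccount(String emailFromIdp) {
        return Optional.ofNullable(accountsByEmail.get(emailFromIdp));
    }

    // The user must prove ownership (normally with a password) before linking.
    public boolean link(String accountId, boolean ownershipProven, String externalSubject) {
        if (!ownershipProven) return false;
        System.out.println("Linked " + accountId + " <-> " + externalSubject);
        return true;
    }

    // Symmetric operation: whatever can be linked must also be unlinkable.
    public void unlink(String accountId, String externalSubject) {
        System.out.println("Unlinked " + accountId + " <-> " + externalSubject);
    }
}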
4-Profile Access Extension: The RP obtains a token during reception (such as the OAuth token that comes with FB Connect); this token carries a set of access permissions to user resources (perhaps hosted by the IDP). Any large RP has a set of applications that will use this token (for example the MyeBay application as well as the eBay Search application), and it is likely that one of these applications requires more information or access privileges than the user originally consented to. In these cases the RP should provide a central capability that conducts the process of requesting and receiving extended permissions from the user and updating the token metadata associated with the user.
5-Role Elevation: The first time users connect to the RP they are granted a certain role (or roles); normally this is a basic role since the data provided by most IDPs is not reliable (eBay as an IDP does provide verified, reliable data). At some point during the user's life cycle, the user needs to perform an action that requires a higher role assignment; in these cases the RP should provide the capability to assign the user a higher role. This normally requires the user to enter more information or go thru verification, and the process produces more attributes that become part of the user's profile at the RP.
6-Recovery: Every RP has to establish a method for externally provisioned identities to authenticate WITHOUT the presence of the external IDP. What does this mean? Suppose you accept FB Connect and FB is down for 6 hours (an event that recently happened); further imagine that you operate a site where every minute of users not being able to log in means financial loss. What do you do in this scenario? You may say this is easy: ask users to enter a password during first-time reception. But wouldn't this defeat the whole purpose (or a big part of it) of users not having to remember many passwords?
7-Disconnect: All RPs must provide the capability for a user to "disconnect", i.e. close the account that was created based on an identity provided by an external IDP. I personally believe that the user owns his/her data, and if the user wants to disconnect and remove all of his/her activities from the record, s/he should be able to (to the extent that is legal).
8-Force Authentication: This is actually a capability of the IDP, but RPs need to use it when they require the user to be authenticated regardless of the session's authentication state as seen by the IDP. For certain operations RPs require a fresh session (or a session that started in the past N minutes); in these cases the RP should request forced authentication (I am using SAML terminology here) from the IDP.
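As a rough illustration, in SAML 2.0 the request-side knob is the ForceAuthn attribute on the AuthnRequest; the sketch below just shows where that flag lives (a real RP would use a SAML library and sign the request rather than build XML by hand):

import java.time.Instant;
import java.util.UUID;

// Sketch: the skeleton of a SAML 2.0 AuthnRequest asking the IDP to re-authenticate
// the user even if it already has a live session (ForceAuthn="true").
public class ForceAuthnRequest {
    public static String build(String issuer, String destination) {
        return "<samlp:AuthnRequest xmlns:samlp=\"urn:oasis:names:tc:SAML:2.0:protocol\"\n"
             + "    xmlns:saml=\"urn:oasis:names:tc:SAML:2.0:assertion\"\n"
             + "    ID=\"_" + UUID.randomUUID() + "\" Version=\"2.0\"\n"
             + "    IssueInstant=\"" + Instant.now() + "\"\n"
             + "    Destination=\"" + destination + "\"\n"
             + "    ForceAuthn=\"true\">\n"
             + "  <saml:Issuer>" + issuer + "</saml:Issuer>\n"
             + "</samlp:AuthnRequest>";
    }
}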
Thursday, October 7, 2010
MongoDB is Web scale ...
This is a funny clip - produced with the Xtranormal technology - courtesy of my friend Gunnar Peterson. It pokes fun at people jumping on the NoSQL and MongoDB bandwagon. I have to say I can relate to the sentiment: NoSQL, KV stores, etc. are suited for certain use cases and access patterns, but a vast majority of day-to-day use cases can be handled just fine with SQL. Even if you are a Mongo fan it is fun to watch this.
- Warning: The clip uses adult language.
Sunday, September 19, 2010
what is a "Platform"?
Usually the answer to the question “What is X”, at least in the context of software engineering, is given in two different ways:
- What does X do, or what is X supposed to do?
- How does X work?
The former is more of a philosophical answer and the latter more of a pragmatic one. For example, "what is a service?" could be answered in one of two ways:
- A unit of functionality that is exposed thru a well-defined interface and is loosely coupled to its consumers; it is autonomous, reusable, discoverable and stateless.
Or it can be answered as
- It is a unit of code exposed thru a WSDL and invoked using SOAP and it is language neutral
Those who know me know that I am more inclined toward philosophy. So when I attempt to answer "what is a platform?" - as I had to recently when we were building the eBay Application Platform - I opt for what it does.
To me the answer is simple, at least in the realm of software engineering:
A software platform is any set of functionality that increases developers’ productivity, plain and simple.
Operating systems do that, languages do that, APIs do that, and so do IDEs such as Eclipse. So what is the difference between tools and platforms? Tools are not programmable; platforms are. Developers can "program" platforms to suit their needs: tools are used to accomplish one task, while platforms can be used (i.e. programmed) to perform many different tasks. Some platforms start as tools (like Eclipse or Excel) but evolve to become platforms.
Why, besides philosophical clarity, is this important? It can be used to define a clear goal and metrics for success of whatever is called a “Platform”.
Tuesday, September 7, 2010
Authorization: One Step At a Time
In my experience authorization (much like identity and authentication) is a poorly understood topic among most engineers, architects and product managers. The prevailing narrative about authorization is of a magic box protecting a resource that knows every policy applicable to the resource and how to correctly enforce them, or at least knows who can access the resource and in what way.
Both of these views are inaccurate (or only partially true) and often lead to the construction of single-layer systems that are complex to implement and impossible to manage. Authorization by nature is a hierarchical filtering mechanism; the operative keyword by far is hierarchical. The successful authorization systems are the ones that consist of several collaborating layers of authorization and filtering, where each layer controls one dimension of access.
For example, imagine a company with a few departments: Executives, Marketing, Accounting, Sales and Product Development. Further imagine that each department has resources (data and services that operate on data) and applications (software that users use to access services, view and manipulate data). In particular, Accounting has three applications: a data entry application, a reporting application and a full book management application (web based or native app does not matter here). Here are the logical authorization rules expressed as typical requirement statements:
1. No person or application in marketing can access any resources in accounting
2. The data entry application cannot access accounts payable, any payment services or reporting services
3. Reporting applications cannot make any changes to (write, edit) data
4. Full Book Management application can perform any function
5. Only Accounting manager can pay an invoice greater than $1000
6. Only CFO can run quarterly profit and loss reports.
Do you see the hierarchy here? Can you translate it into an AuthZ system hierarchy?
Rule 1 talks about a large granularity, the "department"; rules 2, 3 and 4 talk about applications; and rules 5 and 6 talk about roles within a particular application or set of apps.
The first rule should be enforced thru a router or a gateway that blocks access to any application from the marketing department. That is an effective isolation mechanism implementing an authorization rule.
The second set of rules (2, 3, 4) should be enforced via a system-level guard that operates only on request headers and the tokens bound to them. Examples of such systems are ESBs or pipeline-style authorization handlers.
The last set of rules (5, 6) should be enforced by an application-level authorization system or guard that is aware of the different roles within an application and their privileges vis-à-vis resources.
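Here is a toy sketch of the three layers, each enforcing only its own dimension of access; all names and rules are made up to mirror the example above, not a real product.

import java.util.Map;

// Each layer sees only what it needs: origin, calling application, or role plus payload.
public class LayeredAuthorization {

    // Layer 1 - gateway/router: block a whole department based on origin alone.
    static boolean gatewayAllows(String originDepartment) {
        return !"marketing".equals(originDepartment);
    }

    // Layer 2 - system-level guard (ESB / pipeline handler): decide per calling
    // application using only request headers and the token bound to them.
    static boolean applicationAllows(String callingApp, String operation) {
        if ("data-entry".equals(callingApp))
            return !operation.startsWith("payments.") && !operation.startsWith("reports.");
        if ("reporting".equals(callingApp))
            return operation.startsWith("reports.read");
        return "full-book-management".equals(callingApp); // may perform any function
    }

    // Layer 3 - application-level guard: role-aware and business-logic-aware.
    static boolean roleAllows(String role, String operation, Map<String, Object> params) {
        if ("payments.payInvoice".equals(operation)) {
            double amount = ((Number) params.getOrDefault("amount", 0)).doubleValue();
            return amount <= 1000 || "accounting-manager".equals(role);
        }
        if ("reports.quarterlyProfitAndLoss".equals(operation))
            return "cfo".equals(role);
        return true;
    }
}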
Now what happens if you collapse the three systems into one? In short, the authorization system becomes complex to implement and tough to manage, and three different layers with three different velocities of change become one constantly changing piece of code.
Such an authorization system must scan everything in the request, from the originating IP address, to the headers identifying the calling application, to the payload determining the parameters of the operation. It has to understand a wide range of concerns, from deployment (impacting IP addresses) to business logic (the $1000 limit).
Authorization is tough, but single-layer authorization systems like this are nightmares of manageability.
Imagine what would happen if all the checks at the airport (from entering the terminal until you sit in your seat on the plane) were performed by the security officer up front who today only checks your driver's license and matches it against your ticket. At the airport there are three different levels: the people who check your driver's license and ticket, the TSA agents who screen your bags, and the crew at the gate who check your ticket and make sure you don't sit in first class.
Sunday, August 8, 2010
CAP Theorem and Digital Identity
If you read this blog, chances are you are familiar with the CAP theorem. It basically states that any distributed system operating at scale can choose at most two of the following three:
- Consistency
- Availability
- Partition Tolerance
Other examples of "pick any 2 out of 3" in life are:
- The management rule of thumb: Good, Cheap, Fast
- Graduate student dilemma: Fun, Grades, Sleep (replace fun with your own idea of it)
- Investment advice: Low Risk, High Return, Legality - if you pick low risk and high return chances are you are compromising legality :-)
The way I look at all these "rules" is that the space of each of these domains offers only two degrees of freedom; once you choose two points (which effectively fix those degrees of freedom), the third point is chosen for you.
For example, in "Good, Cheap, Fast" your degrees of freedom are basically time and money; once you choose how much time and money you want to spend, all three qualities are determined. So if you choose time and money not directly, but indirectly via the choice of, say, good and fast, you have automatically also chosen "not cheap".
Interestingly, digital identity offers the same 2-out-of-3 dynamic among its three main attributes:
- Quality of Identity
- Usability
- Cost
"Quality of Identity" is a measure of how uniquely a set of data represents a real world person and how strongly an IDP stands behind such assertion (for example whether IDP guarantees up to a certain amount damages resulting from inaccurate data), usability is how easy it is for IDP to provision such identities.
It is clear that if an IDP chooses to provide high quality identity and also wants to makes its provisioning easy to use (or scalable for that matter), it has to spend a lot of money.
In practice though, IDPs segment the user base and only provide high quality identity for users to whom maximum credit are extended (e.g. users who can sell the most on eBay).
Friday, June 25, 2010
Wanna Make Your Life Hard? There is an app for that, or maybe 100,000
These days it is not uncommon to find someone with 100+ apps installed on his/her phone (maybe you are one of them?). I imagine most of those apps are used once or twice and never again; the only time you see them (or their icons) is when you scroll past them to get to the 2 or 3 (or five) apps you actually use frequently.
This app mania clutters UIs and wastes a lot of time, and I can't help thinking: did we simply forget all the lessons and progress of the last 10-15 years because a device came along with attractive aesthetics?
What happened to "the browser is the operating system"? In the days when most desktop apps are moving to the cloud and being delivered thru browsers, why is it that most (if not all) mobile apps are native code? Yes, native apps still have certain capabilities that browsers do not offer YET, but with HTML5 and WebKit (and maybe a little bit of industry support to close the remaining gaps) browser-based apps would be sufficient for the large majority of mobile apps.
Don't get me wrong, there will always be a need for native apps, but much like on the desktop, over time only a few apps will be privileged to be native; most others will, and should, be browser-based, standards-based, write-once-run-in-any-browser apps.
This would make development of mobile apps simpler and more cost effective: instead of maintaining three apps for iPhone, Android and Blackberry, most developers would maintain only one version, and you would install only the few apps that truly need to be native.
Friday, April 2, 2010
Key to Scalability: Distributed System Development
Today I came across this nice presentation about Google's internal architecture practices by Jeff Dean. Plenty of valuable advice and common sense (the most uncommon of all senses). I just want to highlight one item that I feel is a bit under-appreciated - on page 20, where he talks about the key attributes of distributed systems, there is one bullet point that reads:
Development cycles largely decoupled
– lots of benefits: small teams can work independently
On one hand this is so obvious! After all, they are "distributed" systems, how could they have coupled life cycles? On the other hand, so many people complain about the lack of productivity and agility in large "distributed" systems. When you look closer, you find out that they had a fairly decoupled system initially, and more or less got the boundaries right, but they coupled the life cycles of all applications and services together!
This means the whole organization releases with one giant push, the whole system all together. Everyone has to be on the same page, and over time this unified "beat" brings down the boundaries and soon "the distributed" system becomes a monolith.
I will write a few more entries about the basic principles that enable true distributed (and autonomous) application development.
By the way, here is the eBay version of scalability advice (a bit dated, but updating it is on my todo list) by Dan Pritchett.
Wednesday, March 31, 2010
Meetings kill Loose Coupling
There is a brilliant, yet underutilized, sociological observation called Conway's Law, stated by Mel Conway in his 1968 paper:
Any organization that designs a system will inevitably produce a design whose structure is a copy of the organization's communication structure.
The key phrase is "communication structure", which is roughly, but not always, equal to the organization structure.
You may have not heard of or read his paper, but I am sure you have seen the effect of his observation:
If you have two teams called Application and Kernel, chances are you will end up with two deployment units called application and kernel (jars or folders or zip files or any other form of bundling code). On the other hand, if your engineering organization has two teams called Books and Games, then you end up with a book and a game application; if there is no other team/group, it is unlikely that you will see a kernel or core subsystem that encapsulates the constructs common to the two.
Now, we all know about the principle of loose coupling, and how it enables flexibility, efficiency, increased productivity, etc. One very effective way of creating loose coupling between two systems (let's call them consumer and producer) is to make sure that members of the producer team do not meet with members of the consumer team! You may convey requirements to them, but forbid meetings. If there is no communication, it is hard to build hard coupling. It is a bit strange and counter-intuitive, but we have used this with success in building key infrastructure services.
I have found that a good early predictor of the level of coupling (or the quality of interfaces in general) of a system is the list of invitees to its early design meetings. Whenever one or a few engineers from a future consumer are part of the meeting, there is a good indication that the systems will be somehow coupled (either thru the types or domain values of the parameters passed, the way errors are handled, the marshalling of the result, or even the terminology used). This of course is a generalization; the team somehow has to collect requirements, share results and get early feedback, and that is all OK. But if your goal is to create "loosely coupled" systems, you should make sure your communication structures are loosely coupled as well.
Take a service that verifies a credit card number to ensure that it is valid and that a provided name indeed matches the credit card issuer's record. This service may have a method/operation in its interface such as:
Result VerifyCreditCard(UserId id)
It assumes that somehow the service can obtain a credit card number from a supplied UserId. This is a very tightly coupled interface that shows the service provider has too much knowledge about its consumer.
Here is a less tightly coupled (better) example:
Result VerifyCreditCard(CreditCard card, BuyerName buyerName)
This is not too bad, but the choice of "BuyerName" indicates that the provider knows that its consumer (the one she probably met with) happens to deal with principals that are buyers.
Consider the loosely coupled version that one would probably write with no additional knowledge of potential consumers:
Result VerifyCreditCard(CreditCard card, Name name)
Communication channels are very important to interface design. This means the people invited to each design meeting should be selected carefully: if system A should not be dependent on system B, the best way to ensure that is to reduce communication between the developers of system A and system B.
Tuesday, March 30, 2010
Evolving Gracefully - Example I
In my posts, "What is Architecture", Part I and Part II, I argued that architecture is essentially what enables a system to evolve gracefully and traverse an optimal path in response to changes (requirements, assumptions, context, inputs etc.). In this post (and in a few more) I give examples of such changes and possible optimal/graceful (or non-optimal/disruptive) responses of a system.
Take "Accessibility Requirement" as an example. Few software systems are designed with the explicit goal of accessibility as one of the main architecture and design goals. However, some systems at some point must comply to certain accessibility standards and guidelines (it is good usability practices, the right thing to do, good business and the law) - see a very good introduction to accessibility from WebAIM.org here.
Now, if some one came one morning and asked you to make sure every image in every single HTML page of your site has an alt tag, what would be your response? Remember this maybe 1000s or image tag in 10s or 100s of applications produced by developers across three continents.
If your architecture (and its design and implementation) decoupled model construction from the actual rendering, with a centralized rendering engine that takes a data structure (such as a DOM) and uses a rendering strategy to produce XHTML, then your architecture (and you) has no problem responding to this new requirement: you simply make sure that when the renderer visits an image element, it emits an alt attribute. This is an optimal response to change. Although the system was not particularly designed with accessibility in mind, since it was built on the right architecture, it can evolve to accommodate the change.
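A minimal sketch of what that centralized rendering hook could look like; the renderer and its attribute map are hypothetical stand-ins for whatever your rendering engine actually provides.

import java.util.Map;

// Hypothetical rendering strategy: one central place that emits XHTML from a model,
// so a new rule like "every image carries an alt attribute" is a one-line change.
public class XhtmlRenderer {
    public String renderImage(Map<String, String> attributes) {
        // Enforce the accessibility rule centrally, for every application on the site.
        String alt = attributes.getOrDefault("alt", "");
        String src = attributes.getOrDefault("src", "");
        return "<img src=\"" + src + "\" alt=\"" + alt + "\" />";
    }
}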
However, if the 1000s of HTML pages on your site are produced by 100s of JSP scripts, a few PHP scripts, some XSL, a few instances of image tags created dynamically in JavaScript, and so on, you would have a nightmare on your hands. This is an example of an architecture that cannot evolve.
Take "Accessibility Requirement" as an example. Few software systems are designed with the explicit goal of accessibility as one of the main architecture and design goals. However, some systems at some point must comply to certain accessibility standards and guidelines (it is good usability practices, the right thing to do, good business and the law) - see a very good introduction to accessibility from WebAIM.org here.
Now, if some one came one morning and asked you to make sure every image in every single HTML page of your site has an alt tag, what would be your response? Remember this maybe 1000s or image tag in 10s or 100s of applications produced by developers across three continents.
If your architecture (and the design and implementation of it) decoupled model construction from actual rendering, and a centralized rendering engine that takes a data structure (such as DOM) and uses a rendering strategy to produce XHTML, then your architecture (and you) has no problem responding to this new requirement easily. You'd simply make sure that when the render visit an
However, if the 1000s HTML pages on your site are produced by 100s of JSP scripts, a few PHP scripts, some XSL, a few instances of dynamically created
Friday, March 26, 2010
What is Software Architecture - Part II
In Part I, I offered my view of software architecture by illustrating what "architecture" does for you: it allows you to deal with change optimally. This is basically saying that architecture is what allows a system to change without changing fundamentally what it is (forgetting about dialectics for a moment). In other words:
Architecture is what enables a system to evolve gracefully
You may ask, how else would a system change? How is this for an example? Don't let the glitz and glitter fool you. This is an example of an architecture that failed to evolve with the changes in its context, and the accumulation of demands for change finally resulted in its destruction. However, I doubt that you will see an implosion of the Taj Mahal - architect Ustad (Master) Ahmad Lahouri - any time soon.
Now you may say, well, these are buildings, what do they have to do with software? I don't want to overplay the building metaphor, but software systems act similarly. How many times have you seen a multi-million dollar investment in, say, a customer support system, CRM system, messaging infrastructure or authorization system, only to see it scrapped two or three years later for another multi-million dollar system?
On the start-up side, often the first and only concern is to make it work and to survive, but when the time comes to scale, they often have to re-write from scratch; that is the software counterpart of a Vegas hotel implosion.
In my next post, I will give you a few software and system examples of graceful evolution courtesy of good architecture practice and "implosion" for lack thereof.
Thursday, March 25, 2010
What is Software Architecture - Part I
My job title carries the word “architect” - although my Mom never fails to point out that I am not a real architect like my brother is. This title among the software engineering community in general and in the Silicon Valley in particular almost always is received as an incomplete description of what a person does. People always expect an additional clarification; they give you an inquiring look “… so what is it exactly that you do?”
The title reminds me of the "food supplement" industry: unlike drugs it is unregulated, and anyone can claim anything, from a magical weight-loss pill to a cure-for-emphysema powder. In our industry too, anyone can claim the title and define it as he/she sees fit. The title also conjures up images of a two-class software society: a lower class of software engineers who actually code, and a ruling, elite class of architects who do not code or even read and understand code - and who inevitably, over time, lose touch with reality. Out of concern for the latter, some companies even eliminate the title altogether.
But what exactly do architects do?
I try to answer this question by answering the closely related question: What is Software Architecture?
Let me get to the bottom line first and explain later: I view architecture as the discipline that enables a system to deal with change optimally.
There are two key terms in this definition: "change" and "optimally".
Let’s start with change.
Non-trivial software is a system, and like any other system is subject to change over time. The change happens in all aspects and along all dimensions: scale, cost, expectations, visibility, criticality, operating environment, competitive landscape, technology, economy, people, organization etc.
One prominent and widely known example of "change" is rapid growth in the demand on a software system, or what is commonly known as "scale". Our general expectation of a well-architected system is that it handles scale gracefully and with no disruption. However, some systems crash and need to be fully re-designed for the new level of demand (remember the early days of Twitter?).
To be precise, this type of scale is scalability to handle homogeneous users, i.e. all users of eBay were assumed to be individual buyers and sellers (no medium to large sellers), all Facebook users were assumed to be college students, or all users of an enterprise system could be assumed to be internal users.
Another example of change is scale of operation, or non-homogeneous user demand, i.e. the system does not necessarily experience a change in the volume of demand from one type of user, but has to handle demand from different types of users; for example, eBay wanting to deal with large merchants, or Facebook wanting to open its platform to everyone (not just college students).
These were only two examples of change in the form of scale. What differentiates a well-architected system from a "system that works" is how well the system responds to those changes.
The second key term in our definition was "optimally". Optimally means the change can be absorbed and handled by the system with no (or minimum) disruption; in other words, the system is designed in a way that the scope of change is predetermined and its extent is contained: no crisis, no rebuilding the system from scratch, no revolution! The system simply evolves along a smooth path. That is the essence of architecture (at least the software kind).
The pictures below depict the graceful and disruptive responses to change:
This one shows the same change scenario, but one where the change is handled gracefully, presumably with a well-architected system.
In the next post, I will give a few examples to make this view of what architecture is clearer.