Monday, September 1, 2014

What is CommerceOS - Part I


I am often ask what is CommerceOS, equally as often I answer based on what a successful end state for CommerceOS looks like: It is a set of capabilities exposed largely as RESTful APIs that allows anyone to rapidly and cost effectively develop applications that facilitate commerce between or among real or legal entities.

This means if you want to develop an application for a consignment store to sell its merchandize online (on or off eBay) CommerceOs makes it easy for you. If you work for a large multi-national corporation and want to list your products on eBay and on your site, then let local, regional and international buyers purchase your product, CommerceOS is your APIs. If you want to write a pop-up app on iPhone and Android that showcases adventure riding gears and products by mashing up content and pictures with commerce and community, CommerceOS should make your job easier and cheaper. And countless other scenarios like these, you got the picture.

This answer, although accurate from utility and use point of view, does not describe CommerceOS from technical and architectural point of view. A casual observer may not appreciate the technical and organizational complexities that should be overcome to achieve CommerceOS vision. In this series of posts I try to explain CommerceOS from technical aspect.

The rest of this post dive a bit deeper into motivations and problem definitions, in the second part I will focus on how we have approached the problem and architectural details of CommerceOS. In the last part I write a bit about the tough yet important job of  measurement and metrics (both tracking operations and goals).

First a bit of a context. Back in 2009-2010 timeframe (where CommerceOS transformation started) eBay had a global engineering workforce in the range of thousands (as it does today). The collective job of this engineering team was to work on an essentially web-based architecture (like so many other web era companies). This architecture assumed (perhaps more implicitly) that
- Buyer interacted with marketplace largely via a web browser, running on a desktop/laptop.
- Sellers were small(er) and interacted either via web UI to APIs designed to replicate Web UI.
- API code base and the business logic powering web UI were largely two independent code basis
- Data and table were shared among all applications and services
- Both data and code for each subsystem and functionality (e.g. Checkout UI or Payment services) was hard to isolate
- Operation technology was fixed capacity, manually provisioned, non-optimized.

The platform was stable and scalable (albit with cost curve implied but then current state of technology) with its own set of processes that made it possible for global teams to get their job done. However, in 2009-2010, it was very clear that we are a few years into development of trends that would re-shaped (or ram through depending on your point of view) traditional web-based architecture and require a different cost and productivity trajectories:

1- The ever present trend for efficiency and cost control driven by the need to expand or at least preserve margins or reallocate resource. This is natural trend for any market and segment as they become more mature and scale.
2- The mobile and connected device revolution
3- The advances in cloud computing
4- The shift of eBay merchant mix from smaller, more casual, lower volume sellers/merchant to larger, more architectural and operationally formal and higher volume sellers.

(I will talk about how technologies known as Big Data changed and impacted eBay architecture in a different post)

It was very clear that the forces of the four factors above require a different architectural approach from what, then, current web-oriented architecture was. We needed to bring down the cost of both infrastructure and operation as well as increase the productivity of our developers to address #1 above.

To address the multi-screen revolution (#2), we needed more than just a set of APIs (which eBay has had since 2003) - MP needed all screens (included desktop) to consume one set of APIs not two sets of code-bases one power the "main" web UI and the other to power all API/devices. This is a much harder problem than simply developing APIs. Additionally, we needed all APIs, regardless of which one of nearly 100 teams across the globe develop it, to look like a coherent portfolio of APIs (given the laws of thermodynamics, you can guess how tough this is).

To take advantage of promises of cloud computing (elasticity of capacity, related cost saving etc.) we needed our business logic and data to be encapsulated in well-defined and isolated modules, with clear dependencies that lend themselves to packaging suitable for cloud environment. Refactoring code to isolate systems and services is tough enough, but isolating data is an order of magnitude more challenging.

And finally, the larger, more sophisticated our merchants and partners became, the more we needed to use formal integration methodologies and feed-based interactions built on de-facto industry standard models of Order, Product, Inventory, Cart, Returns and less on auction-centered models such as Item, transaction etc. This represented a deep change in entity models and re-architecture of code and, perhaps more importantly, migration from old models to new models.

Given where we were and where we wanted to evolve MP architecture, we came up with a definition of a vision for CommerceOS as the basis for alignment and execution, knowing full well that no one paragraph accurately covers all aspect of this initiative. Having said that, here is the definition we came up with:

CommerceOS is eBay MP initiative to transform its architecture in such a way that large majority of its platform capabilities and business processes are exposed as a set of RESTful APIs in the cloud to all marketplace ecosystem participants, uniformly, securely, effectively and with high quality.

This may sound a bit abstract, but let me parse it:

The primary goal is to "transform architecture" - this means both technology stack (for services and applications) and processes not just technology (it never works that way).

Platform capabilities and business processes: That basically means everything that MP does from identity, verification of attributes, and caching to listing, pricing, order management and search need to be re-designed as RESTFul services and exposed to correct consumers. The modifier "Majority" indicates that some interactions are not strictly RESTful for legacy or integration requirements. We are pragmatic about it.

Our services are RESTful, it is a de-facto standard and we developed a few internal spec to uniformly covers aspects such as use of non-http verbs, security, tracking, internationalization, filtering, constraints etc.

The deployment target for services is "Cloud". This is not a feel good or casual term, it strictly mean that a service must be produced in a way that it can be packaged fully isolated from other services with well known dependency, clear version number. This allow us to deal with a unit of capability as a building block and use technologies such as Docker effectively (we had a few de-tour here with OSGi but course corrected).

"All participants" indicates that the consumers will be applications written not only for buyers and small sellers who interact with marketplace via a browser, but everyone else on all devices including large sellers who never interact via ebay.com, CS and their internal tools, partners such as logistical service providers as well as internal eBay staff.

Then there is a set of non-functional requirements, each key to how APIs are built, they include

Uniform: basically one code base supporting all consumer, with no "primary" web consumer.
Secure: self-explanatory, platform MUST be secure with well defined identity, AM and auditing. Secure also include availability.
Efficient: APIs must be efficient to consume, i.e. application developer must spent min amount of time and effort to consume it (docs, samples and sandboxes), APIs also should be cost effective to operate (see cloud requirement)
High Quality: An umbrella term covering functional quality (no-bugs) as well as performance.

CommerceOS started with this vision definition and context, in the next post, I will describe the architecture and structure of CommerceOS, both as a technology platform and from organization/process point of view.



Saturday, November 24, 2012

Via Quora: How has eBay's review system evolved in the past few years?



The feedback system and "feedback score" is one of the first things associated with eBay (maybe second only to auction). The main purpose of any feedback system is to harness the "signals" community members send to "feedback processor" to encourage and "enforce" desirable behavior. Anyone who ever thought about or design a rating or feedback system knows that the task is not as easy as it may first seem. And the first step is to understand what the different attribute of a feedback system is.

A member of Quora community asked me to answer the question "How has eBay's review system evolved in the past few years?". The question provided me with an opportunity (and a little push) to talk about six attributes of any feedback system. See here for the full answer and if you are not using Quora, I strongly recommend you consider using it. It is addictive. 

Wednesday, March 14, 2012

What really is this "Managed Market Place" anyway?

I work for a division within eBay Inc. called "Managed Market Places". The name is a bit curious. I was asked, more than once and by range of people, what really "managed marketplace" is? Is it a new type of marketplace by eBay (no it is not!), is it a vertical/niche marketplace within eBay (no it is not!), some one on Quora even interpreted it as it means that eBay simply "manages" the marketplace as oppose to growing it! ( if this was the case why would eBay announce that to the whole word by labeling it as such?)

So then what exactly is MMP (as it is known internally) and why is it important?

the nature of the Internet lends itself perfectly to the basic concept of a "marketplace": a mechanism for buyer and seller to find each other. Marketplaces were and still are an important and growing part of the internet. The growing list of niche marketplaces include etsy, zarrly, odesk, airbnb, taskrabbit, yardsellr, zimride and many many more. (not to mention marketplaces from facebook, google, yahoo and other major players)

At the first glance, it looks simple enough: create a site that brings the parties to a transaction together (from buyer and seller of antique to two people who want to share a ride or a room), and either take a cut of the transaction or make money by advertising. This is indeed the basic concept behind a marketplace - or an unmanaged marketplace. Marketplace itself is not a party to any transaction. Buyer and seller deal with each other directly and take the risk (or bulk of the risk) of direct transaction. EBay operated, more or less, as an un-managed marketplace for a while too.

In managed marketplace on the other hand, neither party to a transaction takes a risk, in other word marketplace guarantees the success of transaction, no risk (at least ideally). Of course a managed marketplace can "manage" other aspect of interaction such as inventory, quantity, price, promotions etc. as well but for now we only focus on risk as it is the focus of eBay MMP as well.

The evolution of simple internet marketplaces to managed marketplaces is an important trend, as the Internet users become more sophisticated and demand more from services they use online. The AirBnB incident back in July of 2011 is a perfect illustration of how "unmanaged" marketplaces will be forced to offer a higher level of assurance/risk mitigation and become managed marketplaces.

What does it mean from systems and architecture point of view? Here are five main aspects that is particularly different in dealing with managed marketplaces

1- The first significant change is that of people's mind set: You have to see yourself in risk management business, or at least assume that risk management is a major part of your operations. What this changes first, and foremost, is that you now have to identify, assess, prioritize, mitigated (or plan to) and measure risk. In all likelihood all of these activities (and the tools and systems you need to perform them) are new to you if you are dealing with a simple/un-managed marketplace.
2- Central to any consumer risk management scheme is "Identity", and I don't mean OpenID or OAuth or SSO... I meant attribute, assurances, verification, accuracy, uniqueness or mapping a real world entity to a digital identity (Entity Resolution)
3- Data is the core to efficient risk management, and big data and your ability to collect and analysis them becomes central to your ability to operate the marketplace at a reasonable cost (minimum losses)
4- Coherent Architecture become even more important. Simply because your systems becomes more complex and more integrated. A simple marketplace is just that, a marketplace site/application. A managed marketplace would include identity provisioning and verification, risk definition, measurement, management at user and at transaction level, a system for filing claims and disputes, systems dealing with ever changing legal and business landscape that enforces what you can and can not do with data you collect and finally integrating all these system in a productive way (seamless but without coupling them)
5- Even Driven and Complex Event processing: This already has a big role in distributed system, but it plays more and more important role in distributed risk management. Real time assessment of risk becomes critical and due the cost/performance of risk assessment, incremental assessment or risk based on primitive and complex event generated over entire session (or even life time of a user) will be the only practical solution.

Wednesday, November 23, 2011

The Uncommon Security Common Sense

I can not claim that I actually counted or classified all the reasons peoples cite for not taking security (or for that matter sound and well thought through system design) seriously from the start, but the three following lines seems to be the most common ones:

1- The "it is too contained" line: So what is the big deal? at worst it may affect a very small percentage of my users.
2- The "it is too early" line: Oh my system/site/project is too small and we only have a few users, we really don't have time/resources for this.
3- The "it is too small" line: My project is too small or too obscure for anyone to care.

By the way, I have heard these lines or their equivalent not only when it comes to security engineering (or re-engineering) but also in designing business policies or risk management measure to prevent fraud, or in general negative user experiences as well as general system design.

Now to be fair, these reasons all sound like "common sense", after all why would you take on additional cost and time for your project or accept the expense and risk of re-engineering your code to fix an issue that may only affect 1% or 0.01% of your users? or why should you spend two weeks to fortify a system that takes you 3 days to design and it is "just an experiment"? and finally who really cares about a small project some where with some obscure URLs that takes an email address as one of its inputs and shows some useful error message if the email is not registered? does ANYONE really care?

Well, as it turns out, security common sense (like many other form of common sense) is actually quite uncommon ! Let's look at these frequently cited common sense logic a bit closer.

To demonstrate the fallacy behind the first logic (it is too contained, it only affect 0.01% of users) I cannot think of any better illustration than the words of presidential candidate Herman Cain where he said that "for each woman who has accused him of harassment there are probably thousands who haven't" and he is 100% accurate and right! But does that make any difference? In all likelihood his presidential bid is all but over. Or could the Washington D.C police chief during the "D.C Sniper Attacks" have possibly argued that the whole thing was not a big deal b/c only 0.001% of D.C metro population were actually killed and therefore there is no need for massive mobilization of police, FBI, ATF and even secret service !?

The same math is thru for security, it does not matter if only 1000 users out of 10MM become victim of a
poorly secured or design system. What matters is how many people hear and learn about it - and you can be
sure that at least in this day and age that number is a few order of magnitude larger than the actual number of
victims. The sense of insecurity that this causes in the rest of the user community and its economic cost is the real math that matters not the fact that only 1000/10MM=0.01% users were affected.

The second line "it is too early" or its equivalents "we don't have enough time or resources" is the most commons line not only in security matters but also system design and architecture aspects as well. What is interesting here is that the exact premise cited for not focusing on security (or sound design for that matter), is why security should be taken seriously i.e. "I am too new to afford not to be secure", if you are releasing a new product (or brand or a site) you REALLY DO NOT HAVE A SECOND CHANCE TO MAKE A FIRST IMPRESSION. If you are not secure, or if your first few user gets taken advantage off (think of AirBnB incident) you are doomed. To further demonstrate the risk in this argument I submit the following picture of one of the more famous car design mistakes : Honda Odyssey 1998



Honda designed this in a hurry to get into the growing minivan market dominated by Dodge/Chrysler. They decided to differentiate by replacing a convenient power sliding door with a traditional door! Imagine what would have happened of this was a new no-name company without Honda's established brand? Of course Honda corrected the mistake in 1999 model and beyond and went on to have one of the most successful Minivans. But if you are not Honda, you better spend time and money on designers and marketers to tell you,  in the first try, that whoever buys a minivan *needs* a sliding door.

Now we get to the third line "Who really cares about me?" I have to admit that I have the most sympathy with people who resort to this logic. After all it is tough to imagine how capable and resourceful the modern fraudester/hacker community is without actually having a brush with them. I do not get into the details -  if you are interested you can briefly scan Rick Howard's excellent book "Cyber Fraud, tactics, Techniques and Procedure" - for the purpose of this writing I'd suggest you assume the following is true:

In the game of "Who wants to break into my system" your adversary is more motivated (financially or politically) than you are, more experienced than you are, is more innovative than you are, is more nimble than you are, wants it worst than you do, has a smaller cost base than you do (and therefore) all he needs is 0.01% (or smaller) of your users - the ONLY advantage that you have is that you right the rule of the game. Do not give up that advantage easily. You WILL lose the game.

Btw, the end point URL that takes an email and very nicely checks and display an error if email does not belong to a valid user - was actually found (although it was a little obscure URL not linked to from anywhere -  and used to extract valid company X user emails (cost $5000+) from a large list of non-verified harvested emails (cost $50) - a vital part of phishing industry value chain.

Sunday, November 6, 2011

OIX Attribute Exchange Summit - Washington DC

Open Identity Exchange (OIX) is holding this year's Attribute Exchange Summit in Washington, DC.

Identity attributes are core of the concept of digital identity. As federated identity ecosystem getting more mature and adoption grows among more sophisticated RPs  - with more consequential use cases such as government, health, education, commerce ... - so does the need for wider sets of attributes with more accurate and fresh values. This presents both tough challenges and opportunities for IDPs.

The challenges center, as one may expect:, around aggregating, correlating, transform and maintaining fresh copy of attributes in a cost effective manner and in a way that it does not compromise the privacy (and other rights) of the principle owner. IDPs can differentiate based on the range of attributes they provide in this way and there in lies the opportunities.

I will be talking more about identity attributes, their life cycle, uses cases and how they help establish and elevate trust among parties to commercial transactions (online and off-line) as part of a panel with Don Thibeau, OIX/OIDF chairman and Abbie Barbir, VP BoA.

If you are planning to attend, I'd be happy to hear from you.

Monday, October 17, 2011

OAuth vs. OpenID Connect ?

OpenID Connet 1.0 Spec is finally released (actually it was release back in Aug). Its release was accompanied by two predictable categories of questions/sentiments, one not very well informed and the other one a legitimate question:

-        OpenID is dead
-        OpenID Connect is really OAuth so why do we need a new protocol?

Granted, this is normally coming from software engineers and social application programmer community and not from identity community, but I feel they are significant enough to be addressed, especially at the time that more and more entities contemplating to become identity providers and they need to decide which protocol they should implement.
First, on the demise of “OpenId”:  It is true that the earlier versions of Open ID (version 1 and version 2) are, for all intent and purposes, depreciated and will not gain a whole lot of traction. But the general idea of “Open” standards for communicating between RPs and IDPs that enables users to provision fewer accounts and have a portable identity while still maintaining control over their privacy and data is alive and well and actually is even more vital than before.
Second, on relationship between OAuth and OpenID Connect, OAuth is a general protocol for authorizing an agent to access a resource on behalf of resource’s owner. OAuth does not assume any particular knowledge about the resource itself. What does this mean? Let’s go back to the canonical OAuth use case of a user who would like to authorize a printing services to access her photos from a Photo service provider. Now imagine that the photo service is slightly sophisticated and recognizes a few properties associated with photos e.g. resolution, size, whether they are shots with no humans, and if there shots with humans, who appears in the photos – basically let’s assume the resource served by SP has more semantics that simple “access”.
Now imagine that the user wants to grant access to only JPEG photos of himself and not a full access to all photos. How would the IDP encode this semantics in the authorization request and response? How would the SP know that they should only provide access to a subset of images?
To be sure, this is doable using OAuth, but the implementer has to add additional parameters to request and response or possibly constraint the input values of some other parameters.
A protocol that is built this way to access a specialized resource, would be a photo access protocol built on top of OAuth.
In essence this is exactly what OpenID Connect it: It is a protocol built on top of OAuth that supports features that are often desired and used when the resources being delegated is “identity” and attribute about an identity.
To illustrate the point, here are what we, at eBay, had to do for an internal authentication protocol on top of OAuth:
-  
      Force Authentication: adding parameters to authorization request to force users to authenticate no matter what the authentication state with IDP is
-        Authorization Behavior: adding parameters to authorization request to indicate to IDPs whether it should display the consent page and how to display the login page (overlay, full page)  
-        Standard Claim Set: Defining the default set of attributes returned by IDP
-        Requested Attributes: adding mechanism to allow RPs to ask for additional attributes, and annotating them to indicate whether explicit user consent is required.
-        Authentication Context: adding a fragment to response to communicated authentication context (single v.s multi-factor, PIN vs. Password, number of retires etc.)
-        Protection: adding parameters to indicate how access tokens should be protected (encryption, signature and order of operations)
-        Token Validation end point: adding an endpoint to introspect access tokens on demand.

These are all features and facets that OpenID Connect enables in a standard and interoperable fashion. In absence of a standard such as OpenID Connect though, any RPs integrating with our IDP had to implement basically a proprietary protocol, be it on top of OAuth.
The point is that if you want to operate an IDP and you want to use just OAuth, you have to add a few things to OAuth, depend on the depth of your requirements, to make it work for “Identity” resource. This is exactly what Facebook did with FB Connect – and they also did a good job of wrapping it with JavaScript plug-ins. The goal of OpenID Connect is to use OAuth as the basic access authorization protocol and add identity specific features to it so that it becomes a standard “identity protocol” that can enable seamless interoperability. 

Wednesday, October 12, 2011

PayPal Access & Commercial Identity

Today eBay Inc. announced an identity and attribute provider product called PayPal Access. Some described as a "Facebook Connect for Commerce", others described it as an easy registration tool for mobile site. Today at the X,Commerce Innovate Conference someone suggested to me that this is the first step for eBay Inc. to offer full cloud based user management for e-commerce sites and merchants. You can also see the official press release from eBay Inc. here.

Most of the press and coverage today focused on "Consumer Identity" - or more accurately Consumer Commercial Identity - and the benefit of PayPal Access for consumers and online merchants visited by those consumer. Consumer identity is indeed one facet of "Commercial identity" - but there is another side to commercial identity, a less understood - and arguably less sexy - side and that is Merchant Identity. What do I mean by this? Let's look at a scenario:

Merchants themselves are consumers of so many online and offline services (think of it as B2B services) - a company that sells on eBay - or any other online channel - has an eBay account, an account with a shipping company (FedEx), a Facebook account, perhaps another account with a email marketing service, bank account etc. Clearly merchants suffer from the same "account and password hell" that consumers do - but this hell is a lot deeper and hotter for merchants, consider these facts

- Most merchants have employees/contractors who create these accounts on behalf of the merchant,
- A lot of these employees (for smaller merchants) are part time or temps
- Employee turn over is high

Here in addition to the usual forgetting one's password - which for merchant leads to loss of productivity and money - sometimes the person who created the account simply leaves - if you are lucky and s/he good terms, you end up having to chase the employee and restore your access, if not, you are exposed to unauthorized access by the employee or down right "account take over".

You might say, what is the difference between this and consumer identity, these employee are consumers to and technically there is no difference. But look closer. Merchant use cases are fundamentally different. In consumer identity use cases, a consumer is a principle and gives consent on his/her own behalf to a agent (another site or application), the IDP itself recognizes the consumer is the principle and allows her (and ONLY her) to change or revoke this access. In Merchant cases, what appear to be the consumer is really not a principle binded to the merchant identity but an employee. In this case IDP must recognize this "hierarchical relationship" and allow and "admin" employee of merchant to monitor and manage the life cycle of tokens (and identities) of employees.

In the use case above, merchant X would not reveal its primary eBay user name and password to any employee, the would provision an account for each employee. Employee then logs into eBay using her own account - and via PayPalAccess - All the while PayPal Access monitors and manage all the tokens issued to all employees of merchant X. Should an employee leave or changes function, the token can be revoked by merchant X admin regardless of employee's decision.

If this sounds familiar to LDAP or ActiveDirectory, b/c it really serves the same function: Enterprise Identity, in this case enterprise is really a merchant. This is not unexpected in the world where enterprise identity, consumer identity (a.k.a social identity) are converging - and there is a need for a cloud based enterprise user management.

Please note that this is NOT an annoucement (or leak) for PayPal Access Cloud-Base user directory. IT IS NOT, REALLY. I just wanted to point out the there is two sides to commercial identity, a sexy side (consumer) and a side that can make you money (merchants).

In the next post, I will write a bit about Consumer Commercial Identity and how it may be different that social identity.