Saturday, July 3, 2010

Recall and Precision: It is not how many bad guys you caught, it is how many good guys suffered

The measurement of "Recall and Precision" is front and center in all of our fraud prevention measures and algorithms. You can read a general description of this concept and its mathematical definition on Wikipedia, but I have had better luck explaining the concept with this example:

Imagine there is a band of armed robbers (say 5 guys) in your town and you send your best cops to round them up. After a day they come back having arrested a group of men. How do you know if they did a good job?

The obvious answer is whether they have arrested ALL the gang members. So the measurement is "how many gang members have they arrested?" In this regard, 5 is better than 4 and 4 is better than 3. Simple.

But is that enough? Let's imagine three outcomes:

1 - The cops came back having arrested 5 guys, all of them gang members. This is perfect: they arrested ALL the RIGHT people and ZERO WRONG people, so recall = precision = 100%.

2 - The cops came back having arrested 10 guys: 5 gang members and 5 random, innocent guys. In this case recall = 100% but precision = 50%, which is clearly not acceptable. (Even worse, they could have arrested all the men in the community: recall would still be 100%, but precision would be near zero. That is called Carpet Bombing.)

3 - The cops came back with 3 guys, all gang members; no innocent guy was arrested. In this case recall = 60% but precision = 100%. This is called Proof Beyond a Reasonable Doubt, i.e. a design philosophy whereby it is better to let a bad guy go free than to harm a good guy.
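For readers who prefer to see the arithmetic, here is a minimal sketch of the three outcomes in Python. The function name `recall_precision` and the labels for gang members and innocents are made up for illustration. It assumes we know exactly who the gang members are and that at least one arrest was made.

```python
def recall_precision(arrested, gang):
    """Return (recall, precision) for a set of arrests.

    recall    = gang members arrested / all gang members
    precision = gang members arrested / all people arrested
    Assumes both sets are non-empty (hypothetical helper, for illustration).
    """
    arrested, gang = set(arrested), set(gang)
    caught = arrested & gang  # the RIGHT people who were arrested
    return len(caught) / len(gang), len(caught) / len(arrested)


# Hypothetical labels: 5 gang members and 5 innocent bystanders.
gang = {"g1", "g2", "g3", "g4", "g5"}
innocents = {"i1", "i2", "i3", "i4", "i5"}

outcomes = {
    "1 - all 5 gang members, nobody else": gang,
    "2 - 5 gang members and 5 innocents": gang | innocents,
    "3 - only 3 gang members": {"g1", "g2", "g3"},
}

for name, arrested in outcomes.items():
    recall, precision = recall_precision(arrested, gang)
    print(f"{name}: recall={recall:.0%}, precision={precision:.0%}")

# Carpet Bombing: arrest every man in a (hypothetical) town of 1000.
town = gang | {f"citizen{n}" for n in range(995)}
carpet_recall, carpet_precision = recall_precision(town, gang)
print(f"carpet bombing: recall={carpet_recall:.0%}, precision={carpet_precision:.1%}")
```

Note that recall only looks at the gang members, while precision only looks at the people arrested. That is why arresting more people can never lower recall, but it can destroy precision.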

In modeling risk and fraud and designing algorithms to prevent them, we always have to measure each algorithm by its recall and precision. Low-precision methods typically cost a lot in terms of customer support and friction in the user experience; low-recall algorithms and methods result in higher losses for the company.

In designing a Risk Management strategy, I tend to side with lower recall rather than lower precision, and then manage the loss/revenue ratio with the higher revenue generated by higher precision, that is, by the right customers who were let in.

