CAP Curve
The cumulative accuracy profile (CAP) is used in data science to visualize the discriminative power of a classification model. The CAP plots the cumulative number of positive outcomes on the y-axis against the cumulative number of cases, ordered by the model's classifying parameter (for example, a predicted probability), on the x-axis. The CAP is distinct from the receiver operating characteristic (ROC) curve, which plots the true-positive rate against the false-positive rate.
Example
Suppose a clothing store has a total of 100,000 customers, and experience shows that whenever an offer is sent, whether to all customers or to any random sample of them, approximately 10 percent respond and purchase the product. Plot the number of customers contacted on the horizontal axis and the number of purchases on the vertical axis. Under random selection the result is a straight line: contacting 0 customers yields 0 purchases, 20,000 customers yield about 2,000 purchases, 40,000 yield about 4,000, 60,000 yield about 6,000, 80,000 yield about 8,000, and all 100,000 yield about 10,000. This diagonal is the random CAP, and its slope equals the 10 percent baseline response rate. The question is whether customers can be targeted more effectively than by random sampling, so that a higher response rate is achieved for the same number of offers sent.
To do so, a classification model can be built. Since "purchased" is a binary outcome (yes or no), data from a previous campaign can be examined in hindsight: who purchased, together with characteristics such as gender, country, age, and whether the customer browsed on a mobile device or a computer. Feeding these factors into, for example, a logistic regression yields a model that estimates the likelihood of purchase for each type of customer based on their demographic and behavioral characteristics.
Once the model is built, it can be used to select which customers receive the offer: each customer is assigned a probability of purchasing, and customers are contacted in descending order of that probability. Contacting 0 customers still yields 0 responses, but contacting the 20,000 highest-ranked customers now yields considerably more than the 2,000 purchases expected at random, and the advantage persists at 40,000 and beyond. If the model is good, nearly all 10,000 buyers may be reached after contacting only about 60,000 customers, just over half of the customer base. Plotting purchases against customers contacted under this ranking produces a curve that lies above the random diagonal; this curve is the cumulative accuracy profile of the model.
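The ranking procedure described above can be sketched as follows. This is a minimal illustration, not a standard library routine; the helper name `cap_curve` and the toy data are assumptions, and NumPy is used for the sorting and cumulative sums.

```python
import numpy as np

def cap_curve(y_true, y_score):
    """Return (x, y): cumulative fraction of cases contacted (sorted by
    descending model score) vs cumulative fraction of positives captured."""
    order = np.argsort(-np.asarray(y_score))     # best-scored cases first
    y_sorted = np.asarray(y_true)[order]
    cum_pos = np.cumsum(y_sorted)                # positives captured so far
    x = np.arange(1, len(y_sorted) + 1) / len(y_sorted)
    y = cum_pos / cum_pos[-1]
    return x, y

# Toy data: 10 customers, 3 buyers; the model ranks the buyers highly.
y_true  = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
y_score = [0.9, 0.2, 0.8, 0.3, 0.1, 0.7, 0.4, 0.2, 0.1, 0.05]
x, y = cap_curve(y_true, y_score)
```

Here the three buyers carry the three highest scores, so the curve reaches 100% of positives after only 30% of customers have been contacted, well above the random diagonal.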
Analyzing a CAP
The CAP can be used to evaluate a model by comparing its curve to the perfect CAP, in which the maximum number of positive outcomes is achieved directly, and to the random CAP, in which the positive outcomes are distributed uniformly. A good model has a CAP between the perfect CAP and the random CAP; the closer the curve lies to the perfect CAP, the better the model.
The accuracy ratio (AR) is defined as the ratio of the area between the model CAP and the random CAP to the area between the perfect CAP and the random CAP.[1] For a successful model the AR lies between zero and one, with a higher value indicating a stronger model.
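The two areas in this definition can be computed numerically. The sketch below is one possible implementation under stated assumptions: the function name `accuracy_ratio` and the trapezoid helper are hypothetical, the curves are placed on cumulative-fraction axes anchored at (0, 0), and the perfect CAP is taken to rise with slope 1/(positive rate) until it reaches 1.

```python
import numpy as np

def _trapezoid(y, x):
    # Trapezoidal area under y(x); written out to avoid the
    # NumPy 1.x/2.x trapz-vs-trapezoid rename.
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def accuracy_ratio(y_true, y_score):
    """AR = area(model CAP - random CAP) / area(perfect CAP - random CAP)."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score))
    cum = np.cumsum(y_true[order]) / y_true.sum()
    n = len(y_true)
    x = np.arange(0, n + 1) / n                  # fraction of cases contacted
    model = np.concatenate(([0.0], cum))         # model CAP, anchored at origin
    random_cap = x                               # random CAP is the diagonal
    rate = y_true.mean()                         # overall positive rate
    perfect = np.minimum(x / rate, 1.0)          # perfect CAP
    return (_trapezoid(model - random_cap, x)
            / _trapezoid(perfect - random_cap, x))

# A perfect ranking (every positive outscores every negative) gives AR = 1.
ar = accuracy_ratio([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
```

With a partially wrong ranking, the AR drops below one; with a purely random ranking it tends toward zero.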
Another indication of model strength is the cumulative number of positive outcomes captured within the top 50% of cases, ranked by the classifying parameter. For a successful model this value should lie between 50% and 100% of the maximum, with a higher percentage for stronger models.
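This statistic is straightforward to compute from the same ranking. The function name `captured_at_half` below is a hypothetical label for illustration; NumPy is assumed.

```python
import numpy as np

def captured_at_half(y_true, y_score):
    """Fraction of all positives captured in the top 50% of cases
    when sorted by descending model score."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score))
    half = len(y_true) // 2                      # top 50% of the ranking
    return y_true[order][:half].sum() / y_true.sum()

# Same toy data as before: all 3 buyers fall in the top half.
frac = captured_at_half([1, 0, 1, 0, 0, 1, 0, 0, 0, 0],
                        [0.9, 0.2, 0.8, 0.3, 0.1, 0.7, 0.4, 0.2, 0.1, 0.05])
```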
In rare cases the accuracy ratio can be negative, meaning the model performs worse than the random CAP.
Applications
The CAP and the ROC are both commonly used by banks and regulators to analyze the discriminatory power of rating systems that evaluate credit risk.[2][3]
References
- ^ Calabrese, Raffaella (2009), "The Validation of Credit Rating and Scoring Models" (PDF), Swiss Statistics Meeting, Geneva, Switzerland.
- ^ Engelmann, Bernd; Hayden, Evelyn; Tasche, Dirk (2003), "Measuring the Discriminative Power of Rating Systems", Discussion Paper, Series 2: Banking and Financial Supervision, No. 01.
- ^ Sobehart, Jorge; Keenan, Sean; Stein, Roger (2000-05-15), "Validation Methodologies for Default Risk Models" (PDF), Moody's Risk Management Services.