Answering the “when” query
On a earlier submit I made the case that survival evaluation is crucial for higher churn prediction. My fundamental argument was that churn will not be a query of “who” however somewhat of “when”.
Within the “when” query we ask when will a subscriber churn? Put in a different way how lengthy does a subscriber keep subscribed on common? We will then reply one of the vital necessary questions: What’s the common subscriber life time worth?
Let’s roll up our sleeves and dive proper in: The survival curve S(t) measures the chance a subscriber will “survive” (not churn) till time t since beginning his subscription. For instance S(3)=0.8 means a subscriber has %80 likelihood of not churning by the third month of subscription.
The most typical means of estimating S(t) is by utilizing the Kaplan-Meier curve who’s method is given by:
the place t_i are all occasions the place at the very least one subscriber has churned, d_i is the variety of subscribers who’ve churned at time t_i and n_i is the variety of subscribers who survived until at the very least t_i. We will consider the time period d_i/n_i because the churn fee at time t_i.
For instance let’s calculate the survival curve for the next subscriber information:
The column t denotes the time a person has been subscribed till right this moment. If he churned that might be the time until he churned.
Now we have 2 occasions at which churn occasions occurred: t_i = {2,6}.
For t < 2 we’ve got S(t)=1 since nobody churned as much as that time.
At t_1=2 we’ve got d_1=2 (subscribers 3 and 6) and n_1=5 (all subscribers however 4). Utilizing the above method we get:
At t_2=6 we’ve got d_2=1 (subscriber 2) and n_2=1 (once more, simply subscriber 2).
We thus have:
Let’s plot that curve:
One factor to note right here is that at that each level alongside the curve we solely think about subscribers who survived as much as that time. If a subscriber joined very lately (e.g. subscriber 4) he received’t play a serious position within the calculation.
In observe you’d be higher off utilizing the survival curve implementation within the R survival
bundle or the python lifelines
library.
So why undergo the trouble of calculating S(t) within the first place? Seems that the anticipated life time is the world underneath the survival curve (I received’t go into proving that right here).
So in our instance above:
If a customers’ month-to-month plan invoice is for instance $10 then we are able to say that his anticipated LTV (life time worth) is $44.
On this submit we’ve seen how utilizing survival curves we are able to reply the “when” query — how lengthy is the common subscription. We noticed this may then be used to point what’s the $ worth of a subscriber.
Typically we may very well have an interest within the “who” query as effectively. For instance “What subscribers are most definitely to churn inside the first month of subscription”? On my subsequent submit I’ll present that utilizing survival curves we are able to higher reply that query as effectively!