See here for the full IPython notebook code. Some of the descriptions there are outdated, but the code is almost the same.
After getting to know what lifetimes can provide, I started applying it from a financial perspective. I wanted to answer the most important question for Zakka Canada: Leveraging our customer analytic models, how can I estimate the firm value that Zakka Canada is worth as of today?
The rest of this post is divided into two parts: 1) modelling the monetary value of our customer base and 2) estimating the price of Zakka Canada through a simple present value cash flow valuation model.
I. The Gamma Spend/Profit Model
Currently our BG/NBD model talks in expected number of transactions, while financial valuation models talk in dollar amounts. Thus, we need a model of the dollar value attached to each transaction. Conveniently, Fader and Hardie have introduced the Gamma-Gamma Spend Model, which estimates our customers' per-transaction dollar value. If you have been following the mathematical boxes so far, the concepts used in this model are very similar: we first pick a distribution to model a customer's mean spend value and then allow its parameters to vary over the entire customer base as a separate probability distribution. Furthermore, we are able to infer the spend value for our unobserved customers (a best guess) as well as make more precise predictions for the current customers we do have data on.
Mathematical and Python Box: Gamma Gamma Model

If we assume a customer's transaction values \(z_1, z_2, \ldots\) follow a gamma distribution, \(z_i \sim \Gamma(p, \nu)\) with shape \(p\) and rate \(\nu\), it follows from the gamma distribution's convolution and scaling properties that the average transaction value of a customer given that he/she had \(x\) repeat transactions is also gamma distributed:

\[ \bar{z} = \frac{1}{x}\sum_{i=1}^{x} z_i \sim \Gamma(px, \nu x). \]

To break it down: summing \(x\) independent \(\Gamma(p, \nu)\) variables gives \(\Gamma(px, \nu)\) (convolution), and dividing the sum by \(x\) rescales the rate to \(\nu x\) (scaling).
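As a quick sanity check of these two properties, here is a minimal simulation sketch (the parameter values are illustrative, not the fitted ones):

import numpy as np

rng = np.random.RandomState(42)
p, nu, x = 4.0, 0.1, 5  # illustrative shape, rate and repeat-transaction count

# average of x draws from Gamma(shape=p, rate=nu); numpy's scale = 1/rate
zbar = rng.gamma(shape=p, scale=1. / nu, size=(100000, x)).mean(axis=1)

# theory: zbar ~ Gamma(shape=p*x, rate=nu*x)
print zbar.mean(), p / nu          # sample vs. theoretical mean: both ~ 40
print zbar.var(), p / (nu**2 * x)  # sample vs. theoretical variance: both ~ 80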
Next, let the rate parameter \(\nu\) itself be a gamma distributed random variable, \(\nu \sim \Gamma(q, \gamma)\) (the parameter \(\gamma\) appears as v in the code), which represents the heterogeneity of our customer base. Intuitively, this allows the model to be flexible enough to cover both big and small spenders. The code for fitting the model is below.
from lifetimes import GammaGammaFitter

ggf = GammaGammaFitter(penalizer_coef=0)
ret_cust_data = data[data['frequency'] > 0]  # only look at customers with repeat purchases
ret_cust_data['monetary_value'] = ret_cust_data['monetary_value'] * 0.55  # set monetary value to gross profit only (see below)
ggf.fit(ret_cust_data['frequency'], ret_cust_data['monetary_value'])
p, q, v = ggf._unload_params('p', 'q', 'v')
print ggf
lifetimes.GammaGammaFitter: fitted with 3599 subjects, p: 3.95, q: 3.17, v: 28.89
Much like the BG/NBD model, we want to view our customers in terms of the observed and the unobserved, where the latter describes the unconditional distribution while the former applies Bayes' theorem for a more precise distribution for a specific customer. Starting with the observed, the authors first derive the marginal distribution \(f(\bar{z} \mid p, q, \gamma; x)\) by integrating \(\nu\) out, through the same process as in part 1. This is the probability of an average transaction value \(\bar{z}\) given \(x\) repeat transactions. This distribution also serves as the likelihood function later on when we derive the observed (posterior) distribution. Below is a plot of \(f(\bar{z} \mid p, q, \gamma; x)\) under 1, 3, 5, 25 and 100 repeat transactions after fitting the model.
Customers with different numbers of repeat transactions fall under different spending behaviors. With a low number of repeat transactions (e.g., 1 or 2), the distribution is wider and has a lower mean spend than that of customers with more repeat transactions. For example, at a $40 transaction value, it's much more likely for a loyal customer of ours to spend that much than for someone who is new and has had only one transaction with the company.
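If you want to reproduce a plot like this by simulation (a sketch assuming the fitted p, q, v from above and the usual scipy/seaborn imports), draw each customer's \(\nu\) from its gamma mixing distribution and then draw \(\bar{z}\) given \(\nu\) and \(x\):

from scipy.stats import gamma
import matplotlib.pyplot as plt
import seaborn as sns

n = 100000
plt.figure(figsize=(12, 6))
for x in [1, 3, 5, 25, 100]:
    nu = gamma.rvs(q, scale=1. / v, size=n)       # nu ~ Gamma(q, rate v)
    zbar = gamma.rvs(p * x, scale=1. / (nu * x))  # zbar | nu, x ~ Gamma(px, rate nu*x)
    sns.distplot(zbar, hist=False, label='x = %d' % x)
plt.xlabel('$ average transaction value')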
Next we derive the unobserved distribution for our customers' mean transaction value \(\zeta\). Referring back to the first definition of the model, each customer's transaction \(z_i\) follows a gamma distribution with an expected value of \(E[Z \mid p, \nu] = p/\nu\). Since \(\nu\) is defined by a probability distribution, \(\zeta = p/\nu\) is also a probability distribution (\(g(\zeta \mid p, q, \gamma)\)). From the change of variables technique, it follows that \(\nu = p/\zeta\) and \(\left|\tfrac{d\nu}{d\zeta}\right| = p/\zeta^2\). After breaking \(g(\zeta)\) apart into the mixing density \(f(p/\zeta \mid q, \gamma)\) and the Jacobian \(p/\zeta^2\), we can simplify to

\[ g(\zeta \mid p, q, \gamma) = \frac{(p\gamma)^q}{\Gamma(q)}\, \zeta^{-(q+1)} e^{-p\gamma/\zeta}, \]

which is an inverse-gamma p.d.f. with expected value of

\[ E[\zeta] = \frac{p\gamma}{q-1} \]

and variance of

\[ \operatorname{Var}[\zeta] = \frac{(p\gamma)^2}{(q-1)^2(q-2)}. \]
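A quick numerical check of the above, assuming scipy: the unconditional mean-spend distribution is invgamma(q, scale=p*v), and its mean should agree with the closed-form population average used later on.

from scipy.stats import invgamma

pop_spend = invgamma(q, scale=p * v)
print pop_spend.mean(), (p * v) / (q - 1)  # identical; roughly the ~$52 population average profit
print pop_spend.std()                      # spread of mean spend across the customer base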
To recap, this distribution represents the unconditional probability distribution of the mean transaction value over our entire customer base, rather than of individual transactions. This is also the reason why we used the gamma properties to derive the distribution of \(\zeta = p/\nu\) rather than of \(z_i\) directly. Recall that the origin of \(\zeta\) came from \(E[Z \mid p, \nu] = p/\nu\). If \(\nu\) had been a constant all along, each customer would be assumed to have a single expected transaction value. By instead allowing \(\nu\) to range over probabilistic values, we allow our model to be flexible toward different customers and their mean spending behavior.
With the underlying unobserved distribution made, we can go further for the observed customers who have shopped with us before and statistically infer their specific spending behavior. Similar to part 1, using Bayes' theorem, the posterior distribution of \(\zeta\) for a given customer with average spend \(\bar{m}\) over \(x\) repeat transactions is equivalent to

\[ g(\zeta \mid p, q, \gamma; \bar{m}, x) \propto f(\bar{m} \mid \zeta, x)\, g(\zeta \mid p, q, \gamma). \]

Derivation of the above is provided in the paper. Recall again from the previous derivation that \(g(\zeta)\) was an inverse-gamma with shape \(q\) and scale \(p\gamma\). We can simply swap the prior parameters for the posterior parameters, such that \(q\) becomes \(px + q\) and \(p\gamma\) becomes \(p(\gamma + \bar{m}x)\). The new customer-specific spend distribution is an inverse-gamma distribution with an expected value of

\[ E[\zeta \mid \bar{m}, x] = \frac{p(\gamma + \bar{m}x)}{px + q - 1} \]

and variance of

\[ \operatorname{Var}[\zeta \mid \bar{m}, x] = \frac{p^2(\gamma + \bar{m}x)^2}{(px + q - 1)^2 (px + q - 2)}. \]
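As a hypothetical worked example using the rounded fitted parameters (so the numbers are approximate): a customer with \(x = 5\) repeat transactions averaging \(\bar{m} = \$100\) gets a posterior expected spend of

\[ \frac{p(\gamma + \bar{m}x)}{px + q - 1} = \frac{3.95\,(28.89 + 500)}{19.75 + 3.17 - 1} \approx \$95.3, \]

pulled from the observed $100 toward the roughly $52 population mean, with the pull shrinking as \(x\) grows.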
Let's do just that. Referring back to the RFM matrix, we now focus on M, which stands for monetary value: the sample average revenue from each Zakka Canada customer over their repeat transactions. For the purpose of valuation, we're more interested in gross profit. It was reported to me that the approximate profit margin on each sale is 55%; while this is a crude measure, it's good enough for now. We can fit our model and calculate the expected conditional average profit, the population average profit and, lastly, the sample average profit.
Expected conditional average profit: $53.27, Population average profit: $52.43, Sample average profit: $52.61

Python Box: Average Profits
avg_cond_profit = ggf.conditional_expected_average_profit(ret_cust_data['frequency'],
                                                          ret_cust_data['monetary_value'])
# conditional_expected_average_profit returns a per-customer series, so take its mean
print "Expected conditional average profit: %s, Population average profit: %s, Sample average profit: %s" % (
    avg_cond_profit.mean(),
    (p * v) / (q - 1),
    ret_cust_data['monetary_value'].mean()
)
It is worth noting here that the first two measures are very similar. The expected conditional average profit is what the model expects the mean profit per transaction to be for our currently observed customers: it mixes each customer's observed average with the population average profit, which is what the model expects over the long term for all customers. The degree of mixing reflects the uncertainty that comes from how many orders we have observed for an individual customer (their frequency).
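To make the mixing explicit, the posterior mean above rearranges into a weighted average of the population mean and the customer's observed average, with the weight on the data growing in x. A minimal sketch, assuming the fitted p, q and v from earlier:

def weight_on_population_mean(x):
    # E[zeta | m, x] = w * (p*v/(q-1)) + (1 - w) * m, with w as below
    return (q - 1) / (p * x + q - 1)

for x in [1, 5, 25]:
    print 'x = %d: weight on population mean = %.2f' % (x, weight_on_population_mean(x))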
As an example, I plot our top six customers' mean spend/profit distributions below (see the first math/python box for detail on the distributions). The text in the plot is the expected profit per transaction, and the number that follows is the number of transactions each customer has had with us.

Python Box: Top 6 Customers (best_projected_cust was defined in part 1)
from scipy.stats import invgamma

# posterior inverse-gamma parameters for each observed customer
customer_scale = p * (v + ret_cust_data['frequency'] * ret_cust_data['monetary_value'])
customer_shape = p * ret_cust_data['frequency'] + q
ez = customer_scale / (customer_shape - 1)  # posterior expected profit per transaction

plt.figure(figsize=(12, 6))
plt.xlabel('$ Profit')
for ind, i in enumerate(best_projected_cust.index.tolist()):
    ccig = invgamma.rvs(customer_shape[i], scale=customer_scale[i], size=10000)
    plt.text(ez[i] + 5, invgamma.pdf(ez[i], customer_shape[i], scale=customer_scale[i]),
             '$' + str(round(ez[i], 2)) + '| ' + str(int(ret_cust_data.loc[i, 'frequency'])))
    sns.distplot(ccig, hist=False, rug=False, label=str(i))
A few interesting features of the model and data stem from how our best customers vary. Our best customer by far (baby blue) has an expected profit of $167 per transaction, and even though he/she is established as a long-time customer with 17 repeat transactions, it is extremely rare to see other customers purchase at that range of value. Thus the GG model expresses more uncertainty by fattening the distribution. Conversely, customers on the lower end of the spectrum (blue, yellow) tend to bring in less profit per transaction, but we find they are easier to predict in terms of their spend behavior (as shown by their peaked distributions). We can generalize all of this intuitively: as the nominal spending for a customer increases, the distribution shifts outward, and the probable range of profit per transaction is driven by (see the sketch after this list):
- The degree of the shift (higher transaction values lead to more uncertainty)
- The number of transactions that this customer has had previously (more transactions should lead to more certainty)
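Here is a sketch of those two effects, assuming the fitted p, q, v: the posterior standard deviation of per-transaction profit grows with nominal spend and shrinks with transaction count.

import numpy as np

def posterior_sd(x, m):
    # sd of an inverse-gamma with shape p*x + q and scale p*(v + m*x)
    shape, scale = p * x + q, p * (v + m * x)
    return scale / ((shape - 1) * np.sqrt(shape - 2))

print posterior_sd(1, 40.), posterior_sd(17, 40.)    # more transactions -> tighter
print posterior_sd(17, 40.), posterior_sd(17, 167.)  # bigger spend -> wider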
Next, we can plug this spend/profit model into our valuation model.
II. Valuation: Estimating from the Ground Up
Investors interested in a promising company often need to determine how much the company is worth today. According to traditional financial theory, the value (also referred to as the worth or price) of a company should be equivalent to its discounted future cash flows. Cash flow is defined as the excess cash a business generates after all asset costs and liabilities have been covered.
For a small business like Zakka Canada, it's pretty simple: the company earns a profit margin on each order and pays a fixed cost each month, and the difference is its cash flow. Furthermore,
- In 2015, Zakka Canada's overhead and fixed costs were estimated at $195,000, or $16,250 monthly.
- Average long-term equity returns range between 6% and 8% annually. We assume that an investor should expect a base annual 15% return (effectively 1.17% per month; see the snippet after this list) on his/her investment.
- We can break down our customers into their observed and unobserved components, valuing each of them separately.
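For reference, the monthly rate (mr in the code below) follows from compounding the annual 15%; a one-line sketch, assuming the variable name used later:

mr = (1 + 0.15) ** (1. / 12) - 1  # ~0.0117, i.e. about 1.17% per month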
Predicting New Customers
Before carrying out the full valuation, we need to ask ourselves how ZC has acquired new customers over time. At the end of each month, a new customer cohort arrives, composed of the unique new customers who made their first purchase with us that month. Below is a plot of cohort sizes since inception.

unique_cohort = trans_data[['Customer ID', 'Date']].sort('Date')
unique_cohort = unique_cohort.drop_duplicates('Customer ID')
unique_cohort['Date_M'] = unique_cohort['Date'].apply(lambda x: dt.datetime(day=1, month=x.month, year=x.year))
cohort_over_time = unique_cohort[['Date_M', 'Customer ID']].groupby('Date_M').count()
ax = cohort_over_time.plot(legend=False, figsize=(12, 4))
ax.set_xlabel('Date'); ax.set_ylabel('New Customer Cohort Size')
As shown, Zakka Canada had a prominent rise in new customers before stagnating between 150 and 200 per month over the last five years. Furthermore, there seem to be seasonal effects within each year; for example, sales tend to peak before the end of the year. We can use this information to develop a simple time-series model for forecasting future cohorts (ultimately to be used in the valuation model). Taking only the period of stagnation, I fit the time-series model and forecast out the next 4 years. As we can see from the plot below, the model captures most of the monthly fluctuations over time.

The time-series model is simply a constant plus 12 dummy variables, each denoting a unique month of the year:

\[ N_t = c + \sum_{m=1}^{12} \beta_m \mathbb{1}\{\text{month}(t) = m\} + \varepsilon_t, \]

where \(\mathbb{1}\{\text{month}(t) = m\}\) is the indicator variable for which month period \(t\) is in, \(c\) is equal to the mean monthly estimated new arrivals, and \(\varepsilon_t\) is the error term of the regression. The model is fitted by ordinary least squares.
from statsmodels.api import OLS

def prepareDesignMatrix(x):
    x['M'] = x.index.month
    for elem in x['M'].unique():
        x['M' + str(elem)] = (x['M'] == elem).astype(int)
    x['c'] = 1.
    return x.drop('M', axis=1)

cohort_over_time = cohort_over_time[cohort_over_time.index >= dt.datetime(2010, 2, 1)]
cohort_over_time = prepareDesignMatrix(cohort_over_time)
res = OLS(cohort_over_time['Customer ID'],
          cohort_over_time[['M1','M2','M3','M4','M5','M6','M7','M8','M9','M10','M11','M12','c']]).fit()
print res.summary()
cohort_over_time['predicted'] = res.predict()
f_cohort = pd.DataFrame(None, index=pd.date_range(cohort_over_time.index.max(), periods=5*12, freq='M'))
f_cohort = prepareDesignMatrix(f_cohort)
f_cohort['forecast'] = res.predict(f_cohort)
cohort_over_time['predicted'].plot(style='--')
cohort_over_time['Customer ID'].plot(legend=None)
ax = f_cohort['forecast'].plot(style='--', figsize=(12, 4))
ax.set_xlabel('Date'); ax.set_ylabel('New Customer Cohort Size')
                            OLS Regression Results
==============================================================================
Dep. Variable:            Customer ID   R-squared:                       0.686
Model:                            OLS   Adj. R-squared:                  0.627
Method:                 Least Squares   F-statistic:                     11.70
Date:                Thu, 07 Jul 2016   Prob (F-statistic):           3.40e-11
Time:                        23:34:26   Log-Likelihood:                -303.76
No. Observations:                  71   AIC:                             631.5
Df Residuals:                      59   BIC:                             658.7
Df Model:                          11
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
M1            -3.0231      8.150     -0.371      0.712       -19.331    13.285
M2            -3.1231      7.489     -0.417      0.678       -18.109    11.863
M3            22.0436      7.489      2.943      0.005         7.058    37.029
M4            23.7103      7.489      3.166      0.002         8.724    38.696
M5             6.7103      7.489      0.896      0.374        -8.276    21.696
M6            -4.6231      7.489     -0.617      0.539       -19.609    10.363
M7           -19.6231      7.489     -2.620      0.011       -34.609    -4.637
M8             6.5436      7.489      0.874      0.386        -8.442    21.529
M9            22.8769      7.489      3.055      0.003         7.891    37.863
M10           58.3769      7.489      7.795      0.000        43.391    73.363
M11           61.2103      7.489      8.173      0.000        46.224    76.196
M12          -22.4564      7.489     -2.999      0.004       -37.442    -7.471
c            148.6231      2.100     70.781      0.000       144.421   152.825
==============================================================================
Omnibus:                        1.285   Durbin-Watson:                   0.999
Prob(Omnibus):                  0.526   Jarque-Bera (JB):                1.307
Skew:                          -0.242   Prob(JB):                        0.520
Kurtosis:                       2.543   Cond. No.                     1.37e+16
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 4.12e-31. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
The Valuation Recipe
The present value of Zakka Canada for each month can be expressed as (recall (3) previously):
1. Plus the present value of all new and repeat gross profit from each new cohort of unobserved customers;
2. Plus the present value of all gross profit on repeat sales from our previously observed customers;
3. Less the present value of the fixed cost for each month.
I will refer to (1) as the Discounted Repeat Value (DRV) and (2) as the Residual Customer Value (RCV), and value them individually.
RCV
Our currently observed customer base will continue to purchase repeatedly until they have "died". Since we are valuing the company after the data ends, previously realized purchases have no bearing on the present value. The reason we separate our observed and unobserved customers is that, for current customers, we can infer a more precise distribution of spend and purchase counts. Every month, we discount the expected repeat purchases (multiplied by the expected gross profit per transaction) by the rate of return for customer A, B, C... and so on, and sum over every month to obtain the value of our current customer base as of today.
Zakka Canada's Residual Value: $193,224
Math/Python Box: Residual Customer Value
The RCV consists of first predicting the expected purchases of each observed customer, followed by their expected spend \(\zeta_i\) from the GG model given the \(i\)-th customer's \((x_i, \bar{m}_i)\). Note that since our periodicity is monthly while the model calculates in days, we use 30 days as a default proxy for 1 month. Our residual value formula becomes

\[ \mathrm{RCV} = \sum_{t=1}^{\infty} \frac{1}{(1+d)^t} \sum_{i} \zeta_i \left( E[X(30t) \mid x_i, t_{x_i}, T_i] - E[X(30(t-1)) \mid x_i, t_{x_i}, T_i] \right), \]

where \(d\) is the monthly discount rate.
def RCV(t, model_trans, d, rfm, zeta):
    # discounted expected repeat profit across all observed customers in month t
    discount_factor = 1. / (1 + d)**t
    dif = (model_trans.conditional_expected_number_of_purchases_up_to_time(t * 30, rfm['frequency'], rfm['recency'], rfm['T'])
           - model_trans.conditional_expected_number_of_purchases_up_to_time((t - 1) * 30, rfm['frequency'], rfm['recency'], rfm['T']))
    return (dif * zeta).sum() * discount_factor
We can approximate this value by incrementally increasing \(t\) until the incremental RCV falls below some tolerance value, which by default is 0.000001.
def approximate(fn, t=1, eps_tol=1e-6, **kwargs):
    prev_cf = 0
    cf = 0
    while True:
        cf += fn(t=t, **kwargs)
        if cf - prev_cf < eps_tol:  # stop once the marginal contribution is negligible
            break
        prev_cf = cf
        t += 1
    return cf
The code below executes both functions, using ez from the gamma spend section and the RFM matrix from part 1:
residual_value = approximate(RCV, model_trans=bgf, d=mr, rfm=ret_cust_data, zeta=ez)
DRV
The DRV deals exclusively with the customers we have yet to observe, so we rely on the unconditional distributions' expected values for spend and purchases. Refer to the example illustration below to get an idea of what's happening.
Each month, a new cohort of unique customers (colored bars), as dictated by the time-series forecast model, will arrive and make their first purchase. They will then continue to purchase repeatedly (over the timeline) until they all leave the company (white region). We discount both their first purchase and the "at-that-time" present value of their future repeat purchases back to the present (Dec 2015).
Math/Python Box: Discounted Repeated Value

Zakka Canada's Discounted Repeated Value: $1,393,291
We first estimate each time period's gross profit for a single unobserved customer, using the expected number of purchases \(E[X(t)]\) and the expected spend \(E[\zeta] = p\gamma/(q-1)\). Then we discount each repeat sale for that single customer the same way we did for RCV. This is a constant which we will call SDRV:

\[ \mathrm{SDRV} = E[\zeta] \left( 1 + \sum_{t=1}^{\infty} \frac{E[X(30t)] - E[X(30(t-1))]}{(1+d)^t} \right). \]
Recall that the BG/NBD model only measures in terms of repeat transactions. Each customer must already have made a purchase to be part of the cohort. Thus, we add in one purchase at the time the new cohort arrives (the leading 1 in the formula above).
def SDRV(t, model, d, zeta):
    # code's a bit different but the same idea: discounted expected repeat profit
    # in month t for one unobserved customer, plus the initial purchase at t == 1
    discount_factor = 1. / (1 + d)**t
    dif = (model.expected_number_of_purchases_up_to_time(t * 30)
           - model.expected_number_of_purchases_up_to_time((t - 1) * 30))
    return zeta * (dif * discount_factor + (t == 1))
That was one customer. At each time period \(t\), a cohort of size \(N_t\) arrives and has an "at-the-time" present value of \(N_t \cdot \mathrm{SDRV}\). We then need to discount each period back to the present day again to get our DRV:

\[ \mathrm{DRV} = \sum_{t=1}^{\infty} \frac{N_t \cdot \mathrm{SDRV}}{(1+d)^t}. \]
def DRV(t, model, d, zeta, ts_model, sdrv):
    # forecast the cohort size arriving in month t, then discount that cohort's
    # "at-the-time" value (sdrv per customer) back to the present
    month = dt.datetime(2016, 1, 1) + pd.DateOffset(months=t)
    x = pd.DataFrame([[0] * 12], index=[month],
                     columns=['M1','M2','M3','M4','M5','M6','M7','M8','M9','M10','M11','M12'])
    cohort_size = ts_model.predict(prepareDesignMatrix(x))[0]
    return sdrv * cohort_size / ((1 + d)**t)

drv = approximate(DRV, model=bgf, d=mr, zeta=(p*v)/(q-1), ts_model=res,
                  sdrv=approximate(SDRV, model=bgf, d=mr, zeta=(p*v)/(q-1)))
PV of Fixed cost
Lastly, we subtract the present value of the monthly fixed costs to arrive at Zakka Canada's firm value.
Math/Python Box: PV of Fixed Cost

Zakka Canada's Fixed Cost Present Value: $1,387,121
Assuming the fixed cost is paid in perpetuity, the present value is simply a perpetuity: \( PV = \text{monthly fixed cost} / d \).
pv_cost = monthly_fc / mr
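As a quick sanity check with the numbers above (monthly_fc = $16,250 and mr of roughly 1.17%), the perpetuity lands close to the reported figure:

monthly_fc = 16250.
mr = (1 + 0.15) ** (1. / 12) - 1
print monthly_fc / mr  # ~ $1,387,000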
The price of Zakka Canada today, where investors demand an annual 15% return, comes out to $199,395, which is much lower than initial estimates. A cool feature of breaking our customers down into two components is that we can examine how much our current customer base is worth as of today. Much like the models' original purpose, it is intuitive to say that the owners should not spend more than the RCV amount to obtain more sales from these customers.
Present Value of Zakka Canada: $199,395
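The final assembly isn't shown explicitly above; a minimal sketch, assuming the residual_value, drv and pv_cost variables computed in the earlier boxes:

firm_value = residual_value + drv - pv_cost
print 'Present Value of Zakka Canada: $%.0f' % firm_value  # ~ $199,395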
Recap & Future Considerations
In part 1, I introduced the BG/NBD model to understand quantity; in this post, the GG model helped us understand value. Both models break down our customer base into observed and unobserved components that we can infer from and sample from. Using a simple present value cash flow model, I introduced a new perspective on how to value a company from the bottom up, through its customers.
While this valuation was much lower than expected, it is a very rough estimate given the information known about Zakka Canada. A couple of things that would improve precision:
- Break down fixed costs into more forecastable or stable estimates.
- Look at exact profit margins on each specific product. Furthermore, we could probably forecast the expected purchases for each product through grouping.
- Estimate a confidence interval or distribution for firm value through our models.
- Value in continuous time rather than with static per-month discounting and the 30-day constant.
- Group our customers into different categories (e.g., geographic location, business size, etc.)
Hope you enjoyed this blog post. It took a while to put up since I've been working, travelling, and doing some CSGO stuff on the side. Stay tuned for more! 🙂
Hello Kevin!
I just wanted to say thank you for this tutorial, it has been very helpful! I do have a question for you. In your Python notebook you note that there is no closed-form solution for the unobserved values for repeat and new cohorts of customers under the BG/NBD model, but there is one for the Pareto/NBD model. How did you come to this conclusion? Also, where can the closed-form solution for the Pareto/NBD be found?
Thank you again!
Hi Ryan, I sent an email to one of the authors regarding this model. This is his response (DET and DERT refer to the present-value of expected transactions):
A grad student derived DET (and perhaps DERT; I can’t remember) for the BG/NBD a few years ago, but it was a big scary mess, and I haven’t checked it.
My suggestion would be to avoid using the BG/NBD at all, if you care about getting DET/DERT estimates, and use the BG/BB model instead.
PF
For the formula see: http://www.brucehardie.com/papers/020/fader_et_al_mksc_10.pdf
Hi Kevin,
I encountered some problems when using the Gamma Gamma Model:
1. The correlation between 'monetary_value' and 'frequency' is calculated to be -0.057015. Is it normal for the correlation to be negative?
2. The calculated values (the expected conditional average profit, population average profit and the average sample profit) are very different.
Expected conditional Average profit: 66.2569210220485, Population Average profit -1.5717530700135038, Average profit: 32.72737401711886
I calculated the data of each category separately and the data looks normal. I don't know what's wrong with this.
Is this based on your own data?
Yes, it's based on my own data.
There is another problem here:
cust_id    frequency recency T    monetary_value alive  predicted_purchases rlv      cate_1 cate_2 cate_3 cate_4 cate_5 cate_6 cate_7 cate_8
185269336  222       673     679  57.18          98.92% 4.42                12328.63 4.19   4.04   3.61   4.21   3.89   3.74   0.2    2.2
125772739  252       773     785  0              94.08% 4.14                0        4.01   3.75   3.24   3.83   3.35   3.53   0.2    2.31
81964731   287       1086    1088 1.49           99.79% 3.63                146.52   3.51   3.44   3.21   3.54   3.38   3.27   0.21   2.21
134016591  224       915     917  0              99.75% 3.35                0        3.22   3.14   2.9    3.25   3.08   2.96   0.21   1.91
191331747  160       706     717  45.45          97.58% 2.98                5658.05  2.85   2.72   2.4    2.82   2.56   2.53   0.2    1.57
125779433  68        261     279  15.53          72.49% 2.33                905.44   2.38   1.99   1.54   1.92   1.48   1.86   0.19   1
130407414  149       842     854  0.12           98.16% 2.35                348.34   2.27   2.18   1.96   2.25   2.07   2.05   0.2    1.34
The sum of predicted_purchases over every category (cate_1...cate_8) is much larger than predicted_purchases itself. In this case, is the rlv value calculated as a sum over every category correct?
How come you have not used the function ggf.customer_lifetime_value to estimate CLTV?
Thanks and great article