Lifetimes Part 2: Gamma Spend Model and Financial Valuation

See here for the full iPython Notebook code.  Some of the descriptions are outdated but the code is almost the same.

After getting to know what lifetimes can provide, I started applying it from a financial perspective.  I wanted to answer the most important question for Zakka Canada:  Leveraging our customer analytic models, how can I estimate the firm value that Zakka Canada is worth as of today?

The rest of this post is divided into two parts: 1) modelling the monetary value of our customer base and 2) estimating the price of Zakka Canada through a simple present value cash flow valuation model.

I. The Gamma Spend/Profit Model

Currently our bg/nbd model talks in expected number of transactions while financial valuation models talk in dollar amount.  Thus, we need to introduce a transaction value model alongside each unit of our transaction.  Conveniently, Fader and Hardie have introduced the Gamma-Gamma Spend Model that estimates our customer's per-transaction dollar value.  If you had been following the mathematical boxes before, the concepts used in this model is very similar in which we first pick a distribution to model a customer's mean spend value and then allow its parameters to vary over the entire customer-base as a separate probability distribution.  Furthermore, we are able to infer the spend value for our unobserved (or best guess) customers as well as make predictions for our current customers, who we do have data on, for a more precise spend value.

Mathematical and Python Box: Gamma Gamma Model

Reference Paper

If we assume a customer's transaction value () follow a gamma distribution , it follows from gamma distribution convolution and scaling properties that the average transaction value of a customer given that he/she had number of repeat transactions is .  To break it down:

Next let the rate parameter also be a gamma distributed random variable  which intuitively represents the heterogeneity of our customer base.  Intuitively, this allows the model to be more flexible for customers that are big and small spenders. The code for the model fitting is below

ggf = GammaGammaFitter(penalizer_coef = 0)
ret_cust_data = data[data['frequency'] > 0] # only look at customers with repeat purchases
ret_cust_data['monetary_value'] = ret_cust_data['monetary_value']*0.55 # Set Monetary value to gross profit only (see below)['frequency'], ret_cust_data['monetary_value'])
p,q,v = ggf._unload_params('p', 'q', 'v')
print ggf
lifetimes.GammaGammaFitter: fitted with 3599 subjects, p: 3.95, q: 3.17, v: 28.89

Much like the BG/NBD model, we want to view our customers in terms of the observed and the unobserved, where the latter describes the unconditional distribution while the former applies bayes theorem for a more precise distribution of a specific customer.  Starting with the observed, the authors first derive through the same process in part 1.  This is the probability of an average transaction value given number of repeat transactions.  The probability distribution also serves as a likelihood function later on when we derive the observed distribution.  Below is a plot of under 1, 3, 5, 25 and 100 repeat transactions after fitting the model.

Observed monetary distribution under different X

Customers with different number of repeat transactions can be placed under different spending behaviors.  With a low number of repeat transactions (e.g 1,2), the distribution itself is wider and has a lower mean spend than customers with higher repeat transactions.  For example, for a $40 transaction value, it's much more likely for a loyal customer of ours to spend that much than someone who is new and has had only 1 transaction with the company.

Next we derive the unobserved distribution for our customer's mean transaction value ().  Referring back to the first definition of the model, each customer's transaction follows a gamma distribution with an expected value of .  Since is defined by a probability distribution, is also a probability distribution ().  From change of variables technique, it follows that  and

After breaking apart into and , we can simplify to

which is an inverse-gamma p.d.f with expected value of

and variance of


To recap, this distribution represents the unconditional probability distribution for mean transaction value over our entire customer-base rather than individual transactions .  This is also the reason why we used the gamma properties to derive rather than .  Recall that the origin of came from .  If had been a constant all along, each customer is then assumed to have a single expected transaction value.  Therefore, when we allowed to function as a range of probabilistic values, we allow our model to be flexible toward different customers and their mean spending behavior.

With the underlying unobserved distribution made, we can further develop our observed customers that have shopped with us before to statistically infer their specific spending behavior.  Similar to part 1, using Bayes theorem, the posterior distribution of for a given customer is equivalent to

Derivation of the above is provided in the paper.  Recall again from previous derivation that was created as a function of .  We can simply swap the prior parameters with the posterior parameters such that  becomes and becomes

The new customer-specific spend distribution is an inverse-gamma distribution with an expected value of

and variance of


Let's do just that, referring back to the RFM matrix, we now focus on M which stands for monetary value, this is simply the sample average revenue from each Zakka Canada's customer over their repeat transactions.  We're going to be more interested in gross profit for the purpose of valuation.  It was reported to me that the approximate profit margin on each sale is 55%, while this is a crude measure, it's good enough for now.  We can fit our model and calculate the Expected conditional average profit, population average profit and lastly, the average sample profit.

Expected conditional Average profit: $53.27, Population Average profit $52.43, Average sample profit: $52.61
Python Box: Average Profits
avg_cond_profit = ggf.conditional_expected_average_profit(ret_cust_data['frequency'], ret_cust_data['monetary_value'])
print "Expected conditional Average profit: %s, Population Average profit %s, Average profit: %s" % (

It is worth noting here that the first two measures are very similar.  Expected conditional average profit calculates what we expect the mean profit to be for our currently observed customers and mixes it with the population average profit which is what the model thinks is to be over the long term for all customers.  This is because there's uncertainty from the amount of orders we have observed from an individual customer (customer frequency).

As an example, I plot our top six customer's mean spend/profit distribution below (see the first math/python box for detail on the distributions).  The text provided is the expected profit per transaction and the number that follows is the number of transactions they've had with us.

Top 6 Customer Mean Profit Distribution
Top 6 Customer Mean Profit Distribution
Python Box: Customer-Specific Profit Distributions

best_projected_cust was defined in part 1's Python Box: Top 6 Customers

customer_scale = p*(v+ret_cust_data['frequency']*ret_cust_data['monetary_value'])
customer_shape = p*ret_cust_data['frequency']+q
ez = customer_scale/(customer_shape-1)
plt.xlabel('$ Profit')
for ind,i in enumerate(best_projected_cust.index.tolist()):
    ccig = invgamma.rvs(customer_shape[i], scale=customer_scale[i], size = 10000)
    plt.text(ez[i]+5, invgamma.pdf(ez[i], customer_shape[i], scale=customer_scale[i]),
             '$'+str(round(ez[i],2))+'| '+str(int(ret_cust_data.loc[i,'frequency'])))
    sns.distplot(ccig, hist=False, rug=False, label=str(i));

A few interesting features that stem from the model and data is how our best customers vary.  Our best customer by far (baby blue) has an expected profit of $167 per transaction and even though he/she has established themselves as a long time customer with 17 repeat transactions, it is extremely rare to see other customers purchase at that range of value.  Thus the GG model emphasizes more uncertainty by fattening the distribution.  Conversely, customers on the lower end of the spectrum (blue, yellow) tend to bring in less profit per transaction but we find they are easier to predict in terms of their spend behavior (as shown by their peaked distributions).  We can generalize all of this intuitively: as the nominal spending for a customer increases, the distribution shifts outward and the probable range of profit per transaction is driven by

  1. The degree of the shift (higher transaction values lead to more uncertainty)
  2. The number of transactions that this customer has had previously (more transactions should lead to more certainty)

Next, we can apply this spend/profit model into our valuation model.

II. Valuation: Estimating from the Ground Up

Investors interested in a promising company often need to determine how much the company is worth present day. According to traditional financial theory, the cost (also referred to as worth or value) of a company should be equivalent to its discounted future cash flow.  Cash flow is defined as the excess cash that is derived from a business after all asset costs and liabilities have been covered.

For a small business like Zakka Canada, it's pretty simple: The company incurs a profit margin on each order and pays a fixed cost each month to get their cash flow. Furthermore,

  1. In 2015, Zakka Canada's overhead and fixed cost has been estimated at $195,000 or $16,250 monthly
  2. Average long-term equity returns range between 6-8% annually.  We assume that an investor should expect a base annual 15% return (or effectively 1.17% per month) on his/her investment.
  3. We can breakdown our customers into their observed and unobserved components, valuing each of them separately.

Predicting New Customers

Before carrying out the full valuation, we need to ask ourselves how has ZC acquired new customers over time.  At the end of each month, a new customer cohort arrives which composes of unique new customers that have made their first purchase with us.  Below is a plot of this since inception.

New Customer Cohort Sizes since June 2007
New Customer Cohort Sizes since June 2007
Python Box: Customer Cohorts
unique_cohort = trans_data[['Customer ID', 'Date']].sort('Date')
unique_cohort = unique_cohort.drop_duplicates('Customer ID')
unique_cohort['Date_M'] = unique_cohort['Date'].apply(lambda x: dt.datetime(day=1,month=x.month,year=x.year))
cohort_over_time = unique_cohort[['Date_M', 'Customer ID']].groupby('Date_M').count()
ax = cohort_over_time.plot(legend=False, figsize=(12,4))
ax.set_xlabel('Date'); ax.set_ylabel('New Customer Cohort Size')

As shown, Zakka Canada had a prominent rise in new customers until now where it had stagnated between 150 and 200 for the last five years.  Furthermore, there seems to be seasonality effects over each individual month, for example, sales tend to peak before the end of the year.  We can use this information to develop a simple time-series model for forecasting future cohorts (ultimately to be used in the valuation model).  Taking the period of stagnation, I fit the time-series model and forecast out the next 4 years.  As we can see from the plot below, the model captures most of the monthly fluctuations over time.

Red: Forecasted, Blue: Fitted, Green: Actual Subset of data I used
Red: Forecasted, Blue: Fitted, Green: Actual Subset of data I used
Math/Python Box: Forecasting Customer Cohorts

The time-series model is simply a constant plus 12 dummy variables, each denoting a unique month of the year

where is the indicator variable for which month is in.   is equal to the mean monthly  estimated new arrivals. is the error term of the regression.  The model is fitted over an ordinary least squares regression.

def prepareDesignMatrix(x):
 x['M'] = x.index.month
 for elem in x['M'].unique():
 x['M'+str(elem)] = (x['M'] == elem).astype(int)
 x['c'] = 1.
 return x.drop('M', axis=1)
cohort_over_time = cohort_over_time[cohort_over_time.index >= dt.datetime(2010,2,1)]
cohort_over_time = prepareDesignMatrix(cohort_over_time)
res = OLS(cohort_over_time['Customer ID'], cohort_over_time[['M1','M2','M3','M4','M5','M6','M7','M8','M9','M10','M11','M12','c']]).fit()
print res.summary()
cohort_over_time['predicted'] = res.predict()
f_cohort = pd.DataFrame(None, index= pd.date_range(cohort_over_time.index.max(), periods=5*12, freq='M'))
f_cohort = prepareDesignMatrix(f_cohort)
f_cohort['forecast'] = res.predict(f_cohort)
cohort_over_time['Customer ID'].plot(legend=None)
ax = f_cohort['forecast'].plot(style='--', figsize=(12,4))
ax.set_xlabel('Date'); ax.set_ylabel('New Customer Cohort Size')
                            OLS Regression Results                            
Dep. Variable:            Customer ID   R-squared:                       0.686
Model:                            OLS   Adj. R-squared:                  0.627
Method:                 Least Squares   F-statistic:                     11.70
Date:                Thu, 07 Jul 2016   Prob (F-statistic):           3.40e-11
Time:                        23:34:26   Log-Likelihood:                -303.76
No. Observations:                  71   AIC:                             631.5
Df Residuals:                      59   BIC:                             658.7
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
M1            -3.0231      8.150     -0.371      0.712       -19.331    13.285
M2            -3.1231      7.489     -0.417      0.678       -18.109    11.863
M3            22.0436      7.489      2.943      0.005         7.058    37.029
M4            23.7103      7.489      3.166      0.002         8.724    38.696
M5             6.7103      7.489      0.896      0.374        -8.276    21.696
M6            -4.6231      7.489     -0.617      0.539       -19.609    10.363
M7           -19.6231      7.489     -2.620      0.011       -34.609    -4.637
M8             6.5436      7.489      0.874      0.386        -8.442    21.529
M9            22.8769      7.489      3.055      0.003         7.891    37.863
M10           58.3769      7.489      7.795      0.000        43.391    73.363
M11           61.2103      7.489      8.173      0.000        46.224    76.196
M12          -22.4564      7.489     -2.999      0.004       -37.442    -7.471
c            148.6231      2.100     70.781      0.000       144.421   152.825
Omnibus:                        1.285   Durbin-Watson:                   0.999
Prob(Omnibus):                  0.526   Jarque-Bera (JB):                1.307
Skew:                          -0.242   Prob(JB):                        0.520
Kurtosis:                       2.543   Cond. No.                     1.37e+16

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 4.12e-31. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.


The Valuation Recipe

The present value of Zakka Canada for each month can be expressed as (recall (3) previously)

  1. Plus the present value of all new and repeat gross profit value from the new cohort of unobserved customers.
  2. Plus the present value of all gross profit on repeat sales from our previously observed customers.
  3. Less the present value of fixed cost for each month.

I will refer to (1) as Discounted Repeat Value (or DRV) and (2) as Residual Customer Value (RCV) and value them individually.


Our current observed customer base will continue to purchase repeatedly until they have died.  Since we are valuing the company after the data has ended, previous realized purchases will have no bearing on the present value.   The reason we want to separate our observed and unobserved customer is because we are able to infer a more precise distribution for each of our current customers spend and purchase amount.  Every month, we discount the expected repeat purchase (multiplied by the expected gross profit) by the rate of return for customer A,B,C... and so on and sum over every month to obtain the value of our current customer base as of today.

Zakka Canada's Residual Value: $193, 224

Math/Python Box: Residual Customer Value

The RCV consists of first predicting the expected purchases of each observed customer followed by their expected from the GG model given the th customer's . Note that since our periodicity is monthly while the model calculates in days, we use 30 as a default proxy for 1 month. Our residual value formula becomes:

where is the monthly discount rate.

def RCV(t, model_trans, d, rfm, zeta):
    discount_factor = 1./(1+d)**t
    dif = (model_trans.conditional_expected_number_of_purchases_up_to_time(t*30, rfm['frequency'], rfm['recency'], rfm['T'])
        - model_trans.conditional_expected_number_of_purchases_up_to_time((t-1)*30, rfm['frequency'], rfm['recency'], rfm['T']))
    return ((dif*zeta).sum())*discount_factor

We can approximate this value by incrementally increasing until the incremental RCV is below some tolerance value which by default is 0.000001.

def approximate(fn, t=1, eps_tol=1e-6, eps=0, **kwargs):
    eps = 0
    cf = 0
    while True:
        cf += fn(t=t, **kwargs)
        if(cf - eps < eps_tol):
        eps = cf; t+=1
    return cf

The code below executes both functions, using ez from the gamma spend section and the rfm matrix from part 1,

residual_value = approximate(RCV,model_trans=bgf,d=mr,rfm=ret_cust_data,zeta=ez)


The DRV deals exclusively with the customers we have yet to observe so we rely on the unconditional distribution's expected value on spend and purchases.  Refer to the example illustration below to get an idea of what's happening.

DRV process

For each month, a new cohort of unique customers (colored bars), as dictated by the time-series forecast model, will arrive and create their first purchase.  Furthermore, they will continue to repeatedly purchase (over the timeline) until they all leave the company (white region).  We will discount both their first purchase and the "at-that-time" present value of their future repeat purchases, back to the present value (Dec 2015).

Zakka Canada's Discounted Repeated Value: $1,393,291

Math/Python Box: Discounted Repeated Value

We first estimate each time period gross profit (of an unobserved customer) using expected and .  Then we discount each repeat sale for that single customer the same way we did for RCV.  This is a constant which we will call .

Recall that the BG/NBD model only measures in terms of repeat transactions.  Each customer would of had to already made a purchase to be part of the cohort.  Thus, we can add in one purchase at the time the new cohort arrives.

def SDRV(t, model, d, zeta): #code's a bit different but the same idea
    discount_factor = 1./(1+d)**t
    dif = model.expected_number_of_purchases_up_to_time(t*30) - model.expected_number_of_purchases_up_to_time((t-1)*30)
    return zeta*(dif*discount_factor+(t==1))

That was one customer, at each time period , a cohort of size arrives and has a "at-the-time" present value of .  We need to then discount each period back to the present day again to get our DRV

def DRV(t, model, d, zeta, ts_model, sdrv):
    q = dt.datetime(2016,1,1)+pd.DateOffset(months=t)
    x = pd.DataFrame([[0,0,0,0,0,0,0,0,0,0,0,0]], index=[q], columns=['M1','M2','M3','M4','M5','M6','M7','M8','M9','M10','M11','M12'])
    cohort_size = ts_model.predict(prepareDesignMatrix(x))[0]
    val = sdrv * cohort_size/( (1+d)**t )
    #print 'present value of cohort size %s at t = %s : %s' % (cohort_size, t,val)
    return val
drv = approximate(DRV, model=bgf, 
                  d=mr, zeta=(p*v)/(q-1), 
                  ts_model=res, sdrv=approximate(SDRV, model=bgf, d=mr, zeta=(p*v)/(q-1)))


PV of Fixed cost

Lastly, we can subtract the monthly fixed cost present value to derive Zakka Canada's firm value.

Zakka Canada's Fixed Cost Present Value: $1,387,121

Math/Python Box: PV of Fixed Cost

Assuming the fixed cost is paid until infinity, the present value is simply a perpetuity

pv_cost = monthly_fc / mr

The price of Zakka Canada today where investors demand an annual 15% return would be equivalent to $199,395 which is much lower than initial estimates. A cool feature about breaking down our customers into two components is that we are able to examine how much our current customer base is worth as of today.  Much like the model's original purpose, it is intuitive to say that the owners should not spend more than the RCV amount to obtain more sales from these customers.

Present Value of Zakka Canada: $199,395

Recap & Future Considerations

In part 1, I introduced the BG/NBD model to understand quantity, in this post, the GG model helped us understand value.  Both models breakdown our customer-base into both the observed and unobserved components for us to infer and sample from.  Using a simple present value cash flow model, I introduced a new perspective on how to value a company from the bottom up through its customers.

While this valuation was much lower than expected, it is a very rough estimate given the information known about Zakka Canada.  Couple of things to improve on for precision:

  • We can breakdown fixed cost into more forecastable or stable estimates
  • We can look at exact profit margins on each specific product.  Furthermore, we can probably forecast the expected purchases for each product through grouping.
  • Estimate a confidence interval or distribution for firm value through our models.
  • Valuation in continuous time rather than by static per-month discounting and the 30 day constant
  • Grouping our customers into different categories (e.g geographical locations, business size, etc)

Hope you enjoyed this blog post, it took awhile to put up since I've been working and now travelling as well as have been doing some CSGO stuff on the side.  Stay tuned for more! 🙂

7 thoughts on “Lifetimes Part 2: Gamma Spend Model and Financial Valuation

  1. Hello Kevin!

    I just wanted to say thank you for this tutorial it has been very helpful! I do have a question for you. In your python notebook you note that there is not a closed for solution of the unobserved values for repeat and new cohorts of customers for the BG/NBD model but there is one for the Pareto/NBD model. How did you come to this conclusion? Also, where can the closed form solution for Pareto/NBD be found?

    Thanks you again!

    1. Hi Ryan, I sent an email to one of the authors regarding this model. This is his response (DET and DERT refer to the present-value of expected transactions):

      A grad student derived DET (and perhaps DERT; I can’t remember) for the BG/NBD a few years ago, but it was a big scary mess, and I haven’t checked it.

      My suggestion would be to avoid using the BG/NBD at all, if you care about getting DET?DERT estimates, and use the BG/BB model instead.


      For the formula see:

  2. Hi Kevin,
    I encountered some problems when using the Gamma Gamma Model:
    1. The similarity of 'monetary_value' and 'frequency' is calculated to be -0.057015. Is it normal for the similarity to be negative?
    2.The calculated values(the Expected conditional average profit, population average profit and the average sample profit) are very different.
    Expected conditional Average profit: 66.2569210220485, Population Average profit -1.5717530700135038, Average profit: 32.72737401711886

      1. Yes, it based on my own data.
        There is another problem here,

        cust_id frequency recency T monetary_value alive predicted_purchases rlv cate_1 cate_2 cate_3 cate_4 cate_5 cate_6 cate_7 cate_8
        185269336 222 673 679 57.18 98.92% 4.42 12328.63 4.19 4.04 3.61 4.21 3.89 3.74 0.2 2.2
        125772739 252 773 785 0 94.08% 4.14 0 4.01 3.75 3.24 3.83 3.35 3.53 0.2 2.31
        81964731 287 1086 1088 1.49 99.79% 3.63 146.52 3.51 3.44 3.21 3.54 3.38 3.27 0.21 2.21
        134016591 224 915 917 0 99.75% 3.35 0 3.22 3.14 2.9 3.25 3.08 2.96 0.21 1.91
        191331747 160 706 717 45.45 97.58% 2.98 5658.05 2.85 2.72 2.4 2.82 2.56 2.53 0.2 1.57
        125779433 68 261 279 15.53 72.49% 2.33 905.44 2.38 1.99 1.54 1.92 1.48 1.86 0.19 1
        130407414 149 842 854 0.12 98.16% 2.35 348.34 2.27 2.18 1.96 2.25 2.07 2.05 0.2 1.34

        predicted_purchases sum of every category (cate 1...8) is more larger than predicted_purchases, In this case, is the value calculated by rlv (sum of every category)correct?

Leave a Reply

Your email address will not be published. Required fields are marked *