Probabilistic Models for Computing Customer-Centric Metrics

Introduction

In this post, I summarize the common statistical methods used to describe and predict customers’ purchase behavior in non-contractual settings. These methods fit probabilistic models to historical transaction records in order to compute customer-centric metrics of managerial interest.
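To make "historical transaction records" concrete: several of the models below are fitted not to the raw transaction log but to a per-customer summary of frequency, recency, and observation length. Here is a minimal base R sketch with made-up data. The column names (`cust`, `date`, `x`, `t_x`, `T_cal`) and the calibration end date are illustrative assumptions, not a package API.

```r
# Toy event log (hypothetical data): one row per transaction.
elog <- data.frame(
  cust = c("A", "A", "A", "B", "C", "C"),
  date = as.Date(c("2024-01-01", "2024-01-15", "2024-03-01",
                   "2024-02-10", "2024-01-20", "2024-02-03"))
)
cal_end <- as.Date("2024-03-31")  # assumed end of the calibration period

# Work in day numbers so that tapply() returns plain numerics.
day   <- as.numeric(elog$date)
first <- tapply(day, elog$cust, min)     # first purchase per customer
last  <- tapply(day, elog$cust, max)     # last purchase per customer
n     <- tapply(day, elog$cust, length)  # number of purchases per customer

# Per-customer summary, with time measured in weeks.
# Note: this sketch counts same-day purchases separately.
cbs <- data.frame(
  cust  = names(first),
  x     = as.vector(n) - 1,                               # repeat transactions
  t_x   = as.vector(last - first) / 7,                    # time of last repeat purchase
  T_cal = (as.numeric(cal_end) - as.vector(first)) / 7    # total observation time
)
print(cbs)
```

In practice, BTYDplus ships a helper, `elog2cbs()`, for this step, so the hand-rolled version above is only meant to show what the summary contains.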

Models

| Model | Author | Description | R package |
|---|---|---|---|
| NBD | Ehrenberg (1959) | Basic benchmark. Assumes a heterogeneous purchase process, but does not account for the possibility of customer defection. | BTYDplus |
| Pareto/NBD | Schmittlein, Morrison, and Colombo (1987) | Combines the NBD model for transactions of active customers with a heterogeneous dropout process. To this date, it can still be considered a gold standard for buy-till-you-die models. | BTYDplus, BTYD, CLVTools |
| BG/NBD | Fader, Hardie, and Lee (2005) | Adjusts the Pareto/NBD assumptions about the dropout process in order to speed up computation. Notably, it assumes that every customer without a repeat transaction has not defected yet, regardless of how long the customer has been inactive. | BTYDplus, BTYD, CLVTools |
| MBG/NBD | Batislam, Denizel, and Filiztekin (2007); Hoppe and Wagner (2007) | Improves on BG/NBD by allowing customers without any repeat transactions to have already dropped out. | BTYDplus |
| BG/CNBD-k | Reutterer, Platzer, and Schröder (2020) | Extends BG/NBD by allowing for regularity within the transaction timings. If such regularity is present, even in a mild form, the model can yield significant improvements in customer-level forecasting accuracy, while the computational cost stays at a similar order of magnitude. | BTYDplus |
| MBG/CNBD-k | Reutterer, Platzer, and Schröder (2020) | Extends MBG/NBD by allowing for regularity within the transaction timings. | BTYDplus |
| Pareto/NBD (HB) | Ma and Liu (2007) | Keeps the original Pareto/NBD assumptions, but estimates the model with an MCMC approach rather than maximum likelihood estimation (MLE). | BTYDplus |
| Pareto/NBD (Abe) | Abe (2009) | Relaxes the independence of the purchase and dropout processes, and can incorporate customer covariates. | BTYDplus |
| Pareto/GGG | Platzer and Reutterer (2016) | Allows for a varying degree of regularity within the transaction timings. | BTYDplus |
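The "regularity" in the CNBD-k entries above refers to inter-transaction times that are less dispersed than the exponential timings behind the NBD. A rough way to see this, assuming Erlang-k waiting times (a gamma distribution with integer shape k), is to simulate them in base R and compare the coefficient of variation. The helper name `simulate_itt_cv` and the rate value are hypothetical choices for illustration.

```r
set.seed(1)

# Coefficient of variation (sd / mean) of simulated Erlang-k inter-transaction times.
# The rate is scaled by k so the mean waiting time stays 1 / lambda for every k.
simulate_itt_cv <- function(k, lambda = 0.5, n = 1e5) {
  itt <- rgamma(n, shape = k, rate = k * lambda)
  sd(itt) / mean(itt)
}

ks <- c(1, 2, 4)
cv <- sapply(ks, simulate_itt_cv)
names(cv) <- paste0("k=", ks)
print(round(cv, 2))
```

In theory the coefficient of variation is 1/sqrt(k), so k = 1 reproduces the fully random exponential case and larger k means more clockwork-like purchasing.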

REMARK:

In practice, fitting the Pareto/NBD model can take too much computation time when the transaction data set is large. That’s when BG/NBD comes in: with limited computing resources, it is a good second choice because it is fast to estimate. However, the biggest issue with BG/NBD is that zero-repeaters (customers without any repeat transaction) are always treated as alive. The MBG/NBD model solves this issue while keeping the fast computation, which makes it very useful for large data sets.

# check mbgnbd
?zetaclv::mbgnbd_predict
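The zero-repeater issue can be illustrated with the closed-form P(alive) expressions of the two models. The following base R sketch uses made-up parameter values (`r`, `alpha`, `a`, `b`) rather than fitted ones, and the function names are hypothetical, so treat it as an illustration of the formulas as I understand them, not as any package's implementation.

```r
# P(alive) under BG/NBD: customers with x = 0 repeat transactions
# are alive with probability 1 by assumption.
bgnbd_palive <- function(x, t_x, T_cal, r, alpha, a, b) {
  ratio <- (a / (b + x - 1)) * ((alpha + T_cal) / (alpha + t_x))^(r + x)
  ifelse(x == 0, 1, 1 / (1 + ratio))
}

# P(alive) under MBG/NBD: dropout is also possible right after the
# first purchase, so the same expression applies for all x >= 0.
mbgnbd_palive <- function(x, t_x, T_cal, r, alpha, a, b) {
  ratio <- (a / (b + x)) * ((alpha + T_cal) / (alpha + t_x))^(r + x)
  1 / (1 + ratio)
}

# Hypothetical parameters; time is measured in weeks.
pars <- list(r = 0.25, alpha = 4, a = 0.8, b = 2.5)

# A zero-repeater observed for 4, 26, and 52 weeks since the first purchase.
T_obs <- c(4, 26, 52)
p_bg  <- do.call(bgnbd_palive,  c(list(x = 0, t_x = 0, T_cal = T_obs), pars))
p_mbg <- do.call(mbgnbd_palive, c(list(x = 0, t_x = 0, T_cal = T_obs), pars))
print(data.frame(T_cal = T_obs, bgnbd = p_bg, mbgnbd = p_mbg))
```

Under BG/NBD the zero-repeater stays at probability 1 no matter how long the inactivity lasts, whereas under MBG/NBD the probability falls as the observation period grows.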

Time-invariant/time-varying models

The CLVTools R package 📦 provides further models, including ones with time-invariant and time-varying contextual factors:

  1. Pareto/NBD model with time-invariant contextual factors (Fader & Hardie 2007)

  2. Pareto/NBD model with time-varying contextual factors (Bachmann, Meierer & Näf 2021)

  3. Standard BG/NBD model (Fader, Hardie, & Lee 2005)

  4. BG/NBD model with time-invariant contextual factors (Fader & Hardie 2007)

  5. Standard Gamma/Gompertz/NBD (Bemmaor & Glady 2012)

  6. Gamma/Gompertz/NBD model with time-invariant contextual factors (Näf, Bachmann & Meierer 2020)

  7. Gamma/Gamma model to estimate customer spending (Colombo & Jiang 1999; Fader, Hardie & Lee 2005; Fader & Hardie 2013)
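For the Gamma/Gamma model in item 7, the conditional expectation of a customer's mean spend can be read as a weighted average of the population mean and the customer's own observed mean. Below is a base R sketch of that formula as I understand it from Fader and Hardie's Gamma/Gamma note. The parameter values are made up, and `gg_expected_spend` is a hypothetical helper, not a CLVTools function.

```r
# Expected mean spend per transaction for a customer with x observed
# transactions and an observed average spend of m_x.
# p, q, gamma are the Gamma/Gamma parameters; q > 1 is required
# for the population mean to exist.
gg_expected_spend <- function(x, m_x, p, q, gamma) {
  pop_mean <- p * gamma / (q - 1)    # population mean spend
  w <- (p * x) / (p * x + q - 1)     # weight on the customer's own average
  (1 - w) * pop_mean + w * m_x
}

# Hypothetical parameters.
p <- 6; q <- 4; gamma <- 15

# Customers with 0, 2, and 20 transactions; the last two averaged 50 per purchase.
spend <- gg_expected_spend(x = c(0, 2, 20), m_x = c(0, 50, 50), p, q, gamma)
print(spend)
```

With no transactions the estimate equals the population mean, and it moves toward the customer's own average as more transactions are observed.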

Reference

  1. Customer Base Analysis with BTYDplus. This is the tutorial for the R package BTYDplus 📦. Most of the literature references in the table above can be found there.

  2. CLVTools R package walkthrough page

Chen Xing
Founder & Data Scientist
