CLVTools

"Estimation End" Date Incongruent with Period of Observation

Open CLVhunter opened this issue 2 years ago • 2 comments

  • After successfully testing the model with a holdout, I’m looking to use the entire cohort dataset to make a prediction into the future (no holdout).

  • I’m doing this on several years of monthly cohorts.

  • As the cohorts get older, their “estimation end” date gets further from the actual end of observation date (01/01/24 in this case).

  • For example: I have all transaction log data until 01/01/24. The observation period should be from time of first transaction until 01/01/24. The package defines the fitting period from the date of the first transaction to the date of the last transaction.

  • This becomes a problem for older cohorts who haven’t made a purchase on or near 01/01/24.

  • I have a cohort whose last purchase was on 03/23/22, but the actual end of observation is 01/01/24 (no one in this cohort purchased from 03/23/22 to 01/01/24). The model is going to show that a lot of people are still alive (as if it were 03/23/22), but they are no longer alive on 01/01/24 since they haven’t made a purchase in almost 2 years.

  • This is an extreme case, but for a lot of my cohorts, the Estimation End date is not the same as or close to the end of observation.

Is there a way to set the fitting period “Estimation End” to a certain date?

CLVhunter avatar Jan 22 '24 21:01 CLVhunter

This is a known pattern when modeling purchase records from some industries.

Peter Fader, Bruce Hardie, and Michael Ross address this briefly in their book "The customer-base audit" (https://www.pennpress.org/9781613631607/the-customer-base-audit/). In footnote 9 (page 193) they elaborate on this and provide a solution: "Even if you have complete records for every single customer from the day your firm started, you may choose to have a “pre-20xy” cohort to make the various plots that follow more legible. Besides, long after the shakeout (described in chapter 5) takes place, the distinctions between cohorts are less meaningful and interesting. The benefits arising from tracking old cohorts as separate entities will often not be worth the effort to do so."
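A minimal base R sketch of this pooling idea, assuming a transaction log with the columns Id and Date. The toy data, the cutoff of 2023-01-01, the label "pre-2023" and the names tx, cutoff and cohorts are illustrative assumptions; none of them come from the book or from CLVTools.

```r
# Toy transaction log (hypothetical data)
tx <- data.frame(
  Id   = c("a", "a", "b", "c", "c", "d"),
  Date = as.Date(c("2021-03-05", "2021-07-19", "2022-03-23",
                   "2023-02-10", "2023-11-02", "2023-02-27")),
  stringsAsFactors = FALSE
)

# First purchase date per customer, repeated on each of their transactions
tx$FirstDate <- as.Date(
  ave(as.numeric(tx$Date), tx$Id, FUN = min),
  origin = "1970-01-01"
)

# Customers acquired before the (hypothetical) cutoff share one pooled cohort;
# later customers keep their monthly acquisition cohort
cutoff <- as.Date("2023-01-01")
tx$Cohort <- ifelse(
  tx$FirstDate < cutoff,
  "pre-2023",
  format(tx$FirstDate, "%Y-%m")
)

# One transaction log per cohort, each of which could be modeled separately
cohorts <- split(tx[, c("Id", "Date")], tx$Cohort)
names(cohorts)
```

Because the pooled cohort contains every older customer, it is more likely to include recent purchases, which moves its last transaction date closer to the actual end of observation.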

mmeierer avatar Jan 22 '24 21:01 mmeierer

Besides the suggestion by @mmeierer to combine older cohorts into a single one, there is also the more "hands-on" solution of manually setting the estimation end to an arbitrary date. For this, the clv.time object inside the clv.data object needs to be adjusted:

library(CLVTools)

data('cdnow')
# create the clv.data object without holdout period
clv.cdnow <- clvdata(cdnow, "ymd", "w", estimation.split = NULL)

# set the estimation end to any desired date
# note that '1999-01-01' is beyond the last recorded transaction ('1998-06-30')
clv.cdnow@clv.time@timepoint.estimation.end <- as.Date('1999-01-01')

# also update the length of the estimation period (in number of periods)
# manually calculating the number of periods (weeks in this case)
clv.cdnow@clv.time@estimation.period.in.tu <- as.numeric(
  (clv.cdnow@clv.time@timepoint.estimation.end - clv.cdnow@clv.time@timepoint.estimation.start)/7
)

Model fitting and making predictions should work and respect these changes. For plotting, this approach will likely not work because plotting is based on the actual transaction data and not on clv.time.

Alternatively, you could also add a single fake transaction (maybe for a fake customer) on the actual observation end (01/01/24). If there are many customers, the impact of a single customer with a fake transaction should be negligible.
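A minimal sketch of the fake-transaction approach, assuming a transaction log whose columns follow the clvdata() default names (Id, Date, Price). The toy data, the id "fake_customer" and the choice of the mean price for the fake transaction are illustrative assumptions, not a recommendation from the package authors.

```r
# Toy transaction log (hypothetical data; column names match clvdata() defaults)
tx <- data.frame(
  Id    = c("1", "1", "2", "3"),
  Date  = as.Date(c("2021-11-02", "2022-01-15", "2021-12-20", "2022-03-23")),
  Price = c(20, 35, 15, 50),
  stringsAsFactors = FALSE
)

# Actual end of observation (01/01/24 in the question)
observation.end <- as.Date("2024-01-01")

# One fake transaction by a fake customer on the observation end
fake <- data.frame(
  Id    = "fake_customer",   # hypothetical id, must not collide with a real id
  Date  = observation.end,
  Price = mean(tx$Price),    # assumption: an average price distorts spending least
  stringsAsFactors = FALSE
)
tx.padded <- rbind(tx, fake)

# The last recorded transaction now falls on the observation end
max(tx.padded$Date)

# If CLVTools is installed, the padded log is passed to clvdata() as usual
if (requireNamespace("CLVTools", quietly = TRUE)) {
  clv.padded <- CLVTools::clvdata(tx.padded, date.format = "ymd",
                                  time.unit = "w", estimation.split = NULL)
}
```

Unlike editing the clv.time slots directly, this changes the transaction data itself, so anything derived from the data (including plots) sees the extended period, at the cost of one artificial customer in the cohort.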

pschil avatar Feb 13 '24 22:02 pschil