Change-point detection and segmentation are important statistical tasks with a broad range of applications across the sciences and engineering. These tasks have been studied extensively for continuous-valued time series. However, relatively less attention has been paid to discrete time series, where analogous problems naturally arise, e.g., in genetics, biomedicine, neuroscience, health system management, finance, and the social sciences.
We propose a new Bayesian modelling framework for piece-wise homogeneous variable-memory Markov chains, along with a collection of effective algorithmic tools for change-point detection and segmentation of discrete time series. Building on the recently introduced Bayesian Context Trees framework, the distributions of different segments in a discrete time series are described as variable-memory Markov chains. Inference for the presence and location of change-points is then performed via Markov chain Monte Carlo sampling. The key observation that facilitates effective sampling is that the prior predictive likelihood in each segment of the data can be computed exactly, averaged over all models and parameters. This makes it possible to sample directly from the posterior distribution of the number and location of the change-points, leading to accurate estimates and providing a natural quantitative measure of uncertainty in the results. Estimates of the actual model in each segment can also be obtained, at essentially no additional computational cost. Results on both simulated and real-world data indicate that the proposed methodology performs better than or as well as state-of-the-art techniques.