Package 'breakfast' reference manual

Title:	Methods for Fast Multiple Change-Point/Break-Point Detection and Estimation
Description:	A developing software suite for multiple change-point and change-point-type feature detection/estimation (data segmentation) in data sequences.
Authors:	Andreas Anastasiou [aut], Yining Chen [aut, cre], Haeran Cho [aut], Piotr Fryzlewicz [aut]
Maintainer:	Yining Chen <[email protected]>
License:	GPL-2
Version:	2.5
Built:	2025-02-21 04:45:22 UTC
Source:	https://github.com/cran/breakfast

Breakfast: Methods for Fast Multiple Change-point Detection and Estimation

Description

A developing software suite for multiple change-point detection/estimation (data segmentation) in data sequences.

Details

The current version implements methods for detecting changes in the data sequence modelled as (i) a piecewise-constant function plus i.i.d. Gaussian noise, (ii) a piecewise-constant function plus autoregressive time series, (iii) a piecewise-linear and continuous function plus i.i.d. Gaussian noise, and (iv) a piecewise-linear and discontinuous function plus i.i.d. Gaussian noise. This is carried out via a two-stage procedure combining solution path generation and model selection methodologies. Change-point detection in breakfast is carried out in two stages, first the computation of a solution path, followed by a model selection step along the path. A variety of solution path and model selection methods are included, which can be accessed individually, or through breakfast. Currently supported solution path methods are: sol.idetect, sol.idetect_seq, sol.wbs, sol.wbs2, sol.not, sol.tguh and sol.wcm.

Currently supported model selection methods are: model.ic, model.lp, model.sdll, model.thresh and model.gsa.

Check back future versions for more change-point models and the corresponding methods.

Author(s)

We would like to thank Shakeel Gavioli-Akilagun, Anica Kostic, Shuhan Yang and Christine Yuen for their comments and suggestions that helped improve this package.

Methods for fast detection of multiple change-points

Description

This function estimates the number and locations of change-points in a univariate data sequence, which is modelled as (i) a piecewise-constant function plus i.i.d. Gaussian noise, (ii) a piecewise-constant function plus autoregressive time series, (iii) a piecewise-linear and continuous function plus i.i.d. Gaussian noise, or (iv) a piecewise-linear and discontinuous function plus i.i.d. Gaussian noise. This is carried out via a two-stage procedure combining solution path generation and model selection methodologies.

Usage

breakfast(
  x,
  type = c("const", "lin.cont", "lin.discont"),
  solution.path = NULL,
  model.selection = NULL
)
breakfast(
  x,
  type = c("const", "lin.cont", "lin.discont"),
  solution.path = NULL,
  model.selection = NULL
)

Arguments

`x`	A numeric vector containing the data to be processed
`type`	The type of change-point models fitted to the data; currently supported models are: piecewise constant signals (`type = "const"`, chosen by default), piecewise linear and continuous signals (`type = "lin.cont"`) and piecewise linear and discontinuous signals (`type = "lin.discont"`).
`solution.path`	A string or a vector of strings containing the name(s) of solution path generating method(s); if individual methods are accessed via this option, default tuning parameters are used. Alternatively, you can directly access each solution path generating method via `sol.[method]`. If both `solution.path` and `model.selection` are unspecified, we return the output from the suggested combinations based on their performance, which depends on `type` as below: When `type = "const"`: `("idetect", "ic")`, `("idetect_seq", "thresh")`, `("not", "ic")`, `("tguh", "lp")`, `("wbs", "ic")`, `("wbs2", "sdll")` and `("wcm", "gsa")`. When `type = "lin.cont"` or `type = "lin.discont"`: `("idetect_seq", "thresh")`, `("not", "ic")` and `("idetect", "sdll")`. If `solution.path` is specified but `model.selection` is not, we return the output from the specified `solution.path` methods combined with the suggested model selection methods (respectively) as above. "idetect" IDetect, supporting `type = "const", "lin.cont", "lin.discont"`, see sol.idetect "idetect_seq" Sequential IDetect, supporting `type = "const", "lin.cont", "lin.discont"`, see sol.idetect_seq "not" Narrowest-Over-Threshold, supporting `type = "const", "lin.cont", "lin.discont"`, see sol.not "tguh" Tail-Greedy Unbalanced Haar, supporting `type = "const"`, see sol.tguh "wbs" Wild Binary Segmentation, supporting `type = "const"`, see sol.wbs "wbs2" Wild Binary Segmentation 2, supporting `type = "const"`, see sol.wbs2 "wcm" Wild Contrast Maximisation, supporting `type = "const"` in combination with model.gsa handling model (ii), see sol.wcm "all" All of the above that support the `type`
`model.selection`	A string or a vector of strings containing the name(s) of model selection method(s); if individual methods are accessed via this option, default tuning parameters are used. Alternatively, you can directly access each model selection method via `model.[method]`. If both `solution.path` and `model.selection` are unspecified, we return the output from the suggested combinations based on their performance, see `solution.path`. If `model.selection` is specified but `solution.path` is not, we return the output from the specified `model.selection` methods combined with the suggested solution path methods (respectively). Not all `solution.path` methods are supported by all `model.selection` methods; check the individual functions for more information. "ic" Strengthened Schwarz information criterion, supporting `type = "const", "lin.cont", "lin.discont"`, see model.ic "lp" Localised pruning, supporting `type = "const"`, see model.lp "sdll" Steepest Drop to Low Levels method, supporting `type = "const", "lin.cont", "lin.discont"`, see model.sdll "thresh" Thresholding, supporting `type = "const", "lin.cont", "lin.discont"`, see model.thresh "gsa" gappy Schwarz algorithm, supporting `type = "const"` in combination with sol.wcm handling model (ii), see model.gsa "all" All of the above that support the given `type`

Details

Please also take a look at the vignette for tips/suggestions/examples of using the breakfast package.

Value

An S3 object of class breakfast.cpts, which contains the following fields:

x: Input vector x
cptmodel.list: A list containing S3 objects of class cptmodel; each contains the following fields:

solution.path: The solution path method used
model.selection: The model selection method used to return the final change-point estimators object
no.of.cpt: The number of estimated change-points in the piecewise-constant mean of the vector cptpath.object$x
cpts: The locations of estimated change-points in the piecewise-constant mean of the vector cptpath.object$x. These are the end-points of the corresponding constant-mean intervals
est: An estimate of the piecewise-constant mean of the vector cptpath.object$x; the values are the sample means of the data (replicated a suitable number of times) between each pair of consecutive detected change-points

References

A. Anastasiou & P. Fryzlewicz (2022) Detecting multiple generalized change-points by isolating single ones. Metrika, 85(2), 141–174.

R. Baranowski, Y. Chen & P. Fryzlewicz (2019) Narrowest-over-threshold detection of multiple change points and change-point-like features. Journal of the Royal Statistical Society: Series B, 81(3), 649–672.

H. Cho & C. Kirch (2022) Two-stage data segmentation permitting multiscale change points, heavy tails and dependence. Annals of the Institute of Statistical Mathematics, 74(4), 653–684.

H. Cho & P. Fryzlewicz (2024) Multiple change point detection under serial dependence: Wild contrast maximisation and gappy Schwarz algorithm. Journal of Time Series Analysis, 45(3): 479–494.

P. Fryzlewicz (2014) Wild binary segmentation for multiple change-point detection. The Annals of Statistics, 42(6), 2243–2281.

P. Fryzlewicz (2018) Tail-greedy bottom-up data decompositions and fast multiple change-point detection. The Annals of Statistics, 46(6B), 3390–3421.

P. Fryzlewicz (2020) Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection. Journal of the Korean Statistical Society, 49(4), 1027–1070.

Examples

f <- rep(rep(c(0, 1), each = 50), 10)
x <- f + rnorm(length(f)) * .5
breakfast(x)
f <- rep(rep(c(0, 1), each = 50), 10)
x <- f + rnorm(length(f)) * .5
breakfast(x)

Estimate the location of change-points when the number of them is fixed

Description

Return a solution with the given number of change-points or change-point-type features from the solution path

Usage

model.fixednum(cptpath.object, fixednum = NULL)
model.fixednum(cptpath.object, fixednum = NULL)

Arguments

`cptpath.object`	A solution-path object, returned by a `sol.[name]` routine. Note that the field `cptpath.object$x` contains the input data sequence.
`fixednum`	The number of change-points or change-point-type features

Details

The model selection method which returns results with a given number of change-points or change-point-type features. If there are multiple such elements on the solution path, the one with the smaller residual sum of squares will be returned. On the other hand, if no such element exists, an empty set (i.e. with no change-points) will be returned.

Value

An S3 object of class cptmodel, which contains the following fields:

`solution.path`	The solution path method used to obtain `cptpath.object`
`type`	The model type used, inherited from the given `cptpath.object`
`model.selection`	The model selection method used to return the final change-point or change-point-type feature estimators object, here its value is `"ic"`
`no.of.cpt`	The number of estimated features in the mean of the vector `cptpath.object$x` based on the given `type` of the model
`cpts`	The locations of estimated features in the mean of the vector `cptpath.object$x`. These are the end-points of the corresponding constant-mean or constant-slope intervals
`est`	An estimate of the mean of the vector `cptpath.object$x`; for piecewise-constant signals, the values are the sample means of the data (replicated a suitable number of times) between each pair of consecutive detected change-points; for piecewise-linear but discontinuous signals, the values are the estimated linear trend (replicated a suitable number of times) between each pair of consecutive detected change of slopes; for piecewise-linear and continuous signals, it is similar to the previous case but with the continuity constraint enforced, which envolves solving a global least squares problem.

Examples

x <- c(rep(0, 100), rep(1, 100), rep(0, 100)) + rnorm(300)
model.fixednum(sol.wbs(x),2)
model.fixednum(sol.not(x),2)
x <- c(rep(0, 100), rep(1, 100), rep(0, 100)) + rnorm(300)
model.fixednum(sol.wbs(x),2)
model.fixednum(sol.not(x),2)

Estimating change-points in the piecewise-constant mean of a noisy data sequence with auto-regressive noise via gappy Schwarz algorithm

Description

This function estimates the number and locations of change-points in the piecewise-constant mean of a noisy data sequence with auto-regressive noise via gappy Schwarz algorithm from a candidate model sequence generated by sol.wcm.

Usage

model.gsa(cptpath.object, p.max = 10, pen = log(length(cptpath.object$x))^1.01)
model.gsa(cptpath.object, p.max = 10, pen = log(length(cptpath.object$x))^1.01)

Arguments

`cptpath.object`	A solution-path object, returned by a `sol.wcm` routine. Note that the field `cptpath.object$x` contains the input data sequence.
`p.max`	The maximum AR order. The default is `p.max = 10`.
`pen`	Penalty used for the Schwarz criterion. `log(length(cptpath.object$x))^1.01` is used as default.

Details

From the largest to the smallest (i.e. empty) candidate models generated by sol.wcm, gappy Schwarz algorithm locally evaluates the Schwarz criterion (SC, under piecewise constant signal + AR(p) noise model, with the AR order p to be determined adaptively) and its modification SC0 on each segment determined by the next smallest candidate model. It selects the larger model as the final model if over each segment, all newly introduced estimators are deemed ‘significant’ according to SC and SC0; see Cho and Fryzlewicz (2023) for details.

Value

An S3 object of class cptmodel, which contains the following fields:

`solution.path`	The solution path method used to obtain `cptpath.object`, here its value is `"wcm"`
`model.selection`	The model selection method used to return the final change-point estimators object, here its value is `"gsa"`
`no.of.cpt`	The number of estimated change-points in the piecewise-constant mean of the vector `cptpath.object$x`
`cpts`	The locations of estimated change-points in the piecewise-constant mean of the vector `cptpath.object$x`. These are the end-points of the corresponding constant-mean intervals
`est`	An estimate of the piecewise-constant mean of the vector `cptpath.object$x`; the values are the sample means of the data (replicated a suitable number of times) between each pair of consecutive detected change-points

References

H. Cho & P. Fryzlewicz (2024) Multiple change point detection under serial dependence: Wild contrast maximisation and gappy Schwarz algorithm. Journal of Time Series Analysis, 45(3): 479–494.

Examples

set.seed(111)
f <- rep(c(0, 5, 2, 8, 1, -2), c(100, 200, 200, 50, 200, 250))
x <- f + arima.sim(list(ar = c(.75, -.5), ma = c(.8, .7, .6, .5, .4, .3)), n = length(f), sd = 1)
model.gsa(sol.wcm(x))
set.seed(111)
f <- rep(c(0, 5, 2, 8, 1, -2), c(100, 200, 200, 50, 200, 250))
x <- f + arima.sim(list(ar = c(.75, -.5), ma = c(.8, .7, .6, .5, .4, .3)), n = length(f), sd = 1)
model.gsa(sol.wcm(x))

Estimating change-points or change-point-type features in the mean of a noisy data sequence via the strengthened Schwarz information criterion

Description

This function estimates the number and locations of change-points or change-point-type features in the mean of a noisy data sequence via the sSIC (strengthened Schwarz information criterion) method.

Usage

model.ic(cptpath.object, alpha = 1.01, q.max = NULL)
model.ic(cptpath.object, alpha = 1.01, q.max = NULL)

Arguments

`cptpath.object`	A solution-path object, returned by a `sol.[name]` routine. Note that the field `cptpath.object$x` contains the input data sequence.
`alpha`	The parameter associated with the sSIC. The default value is 1.01. Note that the SIC is recovered when alpha = 1.
`q.max`	The maximum number of features allowed. If nothing or `NULL` is provided, the default value of `min(100, n/log(n))` (rounded to an integer) will be used.

Details

The model selection method for algorithms that produce nested solution path is described in "Wild binary segmentation for multiple change-point detection", P. Fryzlewicz (2014), The Annals of Statitics, 42: 2243–2281. The corresponding description for those that produce non-nested solution set can be found in "Narrowest-over-threshold detection of multiple change points and change-point-like features", R. Baranowski, Y. Chen and P. Fryzlewicz (2019), Journal of Royal Statistical Society: Series B, 81(3), 649–672.

Value

An S3 object of class cptmodel, which contains the following fields:

`solution.path`	The solution path method used to obtain `cptpath.object`
`type`	The model type used, inherited from the given `cptpath.object`
`model.selection`	The model selection method used to return the final change-point or change-point-type feature estimators object, here its value is `"ic"`
`no.of.cpt`	The number of estimated features in the mean of the vector `cptpath.object$x` based on the given `type` of the model
`cpts`	The locations of estimated features in the mean of the vector `cptpath.object$x`. These are the end-points of the corresponding constant-mean or constant-slope intervals
`est`	An estimate of the mean of the vector `cptpath.object$x`; for piecewise-constant signals, the values are the sample means of the data (replicated a suitable number of times) between each pair of consecutive detected change-points; for piecewise-linear but discontinuous signals, the values are based on the estimated linear trend between each pair of consecutive detected change of slopes; for piecewise-linear and continuous signals, it is similar to the previous case but with the continuity constraint enforced, which envolves solving a global least squares problem.

References

P. Fryzlewicz (2014). Wild binary segmentation for multiple change-point detection. The Annals of Statistics, 42(6), 2243–2281.

R. Baranowski, Y. Chen & P. Fryzlewicz (2019). Narrowest-over-threshold detection of multiple change points and change-point-like features. Journal of the Royal Statistical Society: Series B, 81(3), 649–672.

Examples

x <- c(rep(0, 100), rep(1, 100), rep(0, 100)) + rnorm(300)
model.ic(sol.wbs(x))
model.ic(sol.not(x))
x <- c(rep(0, 100), rep(1, 100), rep(0, 100)) + rnorm(300)
model.ic(sol.wbs(x))
model.ic(sol.not(x))

Estimating change-points in the piecewise-constant mean of a noisy data sequence via the localised pruning

Description

This function estimates the number and locations of change-points in the piecewise-constant mean of a noisy data sequence via the localised pruning method, which performs a Schwarz criterion-based model selection on the given candidate set in a localised way.

Usage

model.lp(
  cptpath.object,
  min.d = 5,
  penalty = c("log", "polynomial"),
  pen.exp = 1.01,
  do.thr = TRUE,
  th.const = 0.5
)
model.lp(
  cptpath.object,
  min.d = 5,
  penalty = c("log", "polynomial"),
  pen.exp = 1.01,
  do.thr = TRUE,
  th.const = 0.5
)

Arguments

`cptpath.object`	A solution-path object, returned by a `sol.[name]` routine. Note that the field `cptpath.object$x` contains the input data sequence.
`min.d`	A number specifying the minimal spacing between change points; `min.d = 5` by default
`penalty`	A string specifying the type of penalty term to be used in Schwarz criterion; possible values are: `"log"` Use `penalty = log(length(x))^pen.exp` `"polynomial"` Use `penalty = length(x)^pen.exp`
`pen.exp`	Exponent for the penalty term (see `penalty`)
`do.thr`	If `do.thr = TRUE`, mild threshoding on the CUSUM test statistics is performed after internal standardisation step in order to "pre-prune down" the candidates
`th.const`	A constant multiplied to `sqrt(2log(length(x)))` to form a mild threshold; if not supplied, a default value (`0.5` the value suggested in Fryzlewicz (2020)) is used, see `th.const` in `model.sdll`

Details

Further information can be found in Cho and Kirch (2022).

Value

An S3 object of class cptmodel, which contains the following fields:

`solution.path`	The solution path method used to obtain `cptpath.object`
`model.selection`	The model selection method used to return the final change-point estimators object, here its value is `"lp"`
`no.of.cpt`	The number of estimated change-points in the piecewise-constant mean of the vector `cptpath.object$x`
`cpts`	The locations of estimated change-points in the piecewise-constant mean of the vector `cptpath.object$x`. These are the end-points of the corresponding constant-mean intervals
`est`	An estimate of the piecewise-constant mean of the vector `cptpath.object$x`; the values are the sample means of the data (replicated a suitable number of times) between each pair of consecutive detected change-points

References

H. Cho & C. Kirch (2022) Two-stage data segmentation permitting multiscale change points, heavy tails and dependence. Annals of the Institute of Statistical Mathematics, 74(4), 653–684.

Examples

f <- rep(rep(c(0, 1), each = 50), 10)
x <- f + rnorm(length(f)) * .5
model.lp(sol.not(x))
f <- rep(rep(c(0, 1), each = 50), 10)
x <- f + rnorm(length(f)) * .5
model.lp(sol.not(x))

Estimating change-points in the piecewise-constant or piecewise-linear mean of a noisy data sequence via the Steepest Drop to Low Levels method

Description

This function estimates the number and locations of change-points in the piecewise-constant or piecewise-linear mean of a noisy data sequence via the Steepest Drop to Low Levels method.

Usage

model.sdll(
  cptpath.object,
  sigma = stats::mad(diff(cptpath.object$x)/sqrt(2)),
  universal = TRUE,
  th.const = NULL,
  th.const.min.mult = 0.3,
  lambda = 0.9
)
model.sdll(
  cptpath.object,
  sigma = stats::mad(diff(cptpath.object$x)/sqrt(2)),
  universal = TRUE,
  th.const = NULL,
  th.const.min.mult = 0.3,
  lambda = 0.9
)

Arguments

`cptpath.object`	A solution-path object, returned by a `sol.[name]` routine. The `cptpath.object$type` variable decides the model type: piecewise-constant (`type == "const"`), piecewise-linear and continuous (`type == "lin.cont"`) or piecewise-linear and discontinuous (`type == "lin.discont"`). In the piecewise-constant model, SDLL model selection should work well when `cptpath.object` is an object returned by the `sol.wbs2` routine. In the piecewise-linear model (whether continuous or not), the output of `sol.idetect` should be supplied as `cptpath.object`. Note that the field `cptpath.object$x` contains the input data sequence.
`sigma`	An estimate of the standard deviation of the noise in the data `cptpath.object$x`. Can be a functional of `cptpath.object$x` or a specific value if known. The default in the piecewise-constant model is the Median Absolute Deviation of the vector `diff(cptpath.object$x)/sqrt(2)`, tuned to the Gaussian distribution. In the piecewise-linear models, `diff(cptpath.object$x, differences = 2)/sqrt(6)` is used by default. Note that `model.sdll` works particularly well when the noise is i.i.d. Gaussian.
`universal`	If `TRUE`, then the threshold that decides if there are any change-points is chosen automatically, so that the probability of type-I error (i.e. indicating change-points if there are none) is approximately `1 - alpha`. If `FALSE`, then `th.const` must be specified.
`th.const`	Only relevant if `universal == FALSE`; in that case a numerical value must be provided. Used to create the threshold (applicable to the contrast magnitudes stored in `cptpath.object`) that decides if there are any change-points in the mean vector; that threshold is then `th.const * sqrt(2 * log(n)) * sigma`, where `n` is the length of the data vector `cptpath.object$x`.
`th.const.min.mult`	A fractional multiple of the threshold, used to decide the lowest magnitude of contrasts from `cptpath.object` still considered by the SDLL model selection criterion as potentially change-point-carrying.
`lambda`	Only relevant if `universal == TRUE`; can be set to 0.9 or 0.95. The approximate probability of not detecting any change-points if the truth does not contain any.

Details

The Steepest Drop to Low Levels method is described in "Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection", P. Fryzlewicz (2020), Journal of the Korean Statistical Society, 49, 1027–1070.

Value

An S3 object of class cptmodel, which contains the following fields:

`solution.path`	The solution path method used to obtain `cptpath.object`
`type`	The model type used, inherited from the given `cptpath.object`
`model.selection`	The model selection method used to return the final change-point estimators object, here its value is `"sdll"`
`no.of.cpt`	The number of estimated change-points
`cpts`	The locations of estimated change-points
`est`	An estimate of the mean of the vector `cptpath.object$x`

References

P. Fryzlewicz (2020). Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection. Journal of the Korean Statistical Society, 49, 1027–1070.

Examples

f <- rep(rep(c(0, 1), each = 50), 10)
x <- f + rnorm(length(f))
model.sdll(sol.wbs2(x))
f <- rep(rep(c(0, 1), each = 50), 10)
x <- f + rnorm(length(f))
model.sdll(sol.wbs2(x))

Estimating change-points in the piecewise-constant or piecewise-linear mean of a noisy data sequence via thresholding

Description

This function estimates the number and locations of change-points in the piecewise-constant or piecewise-linear mean of a noisy data sequence via thresholding.

Usage

model.thresh(cptpath.object, sigma = NULL, th.const = NULL)
model.thresh(cptpath.object, sigma = NULL, th.const = NULL)

Arguments

`cptpath.object`	A solution-path object, returned by a `sol.[name]` routine. The `cptpath.object$type` variable decides the model type: piecewise-constant (`type == "const"`), piecewise-linear and continuous (`type == "lin.cont"`) or piecewise-linear and discontinuous (`type == "lin.discont"`). In the piecewise-linear model (whether continuous or not), the output of `sol.idetect_seq` or `sol.not` should be supplied as `cptpath.object`. Note that the field `cptpath.object$x` contains the input data sequence.
`sigma`	An estimate of the standard deviation of the noise in the data `cptpath.object$x`. Can be a functional of `cptpath.object$x` or a specific value if known. The default in the piecewise-constant model is the Median Absolute Deviation of the vector `diff(cptpath.object$x)/sqrt(2)`, tuned to the Gaussian distribution. In the piecewise-linear models, `diff(cptpath.object$x, differences = 2)/sqrt(6)` is used by default. Note that `model.thresh` works particularly well when the noise is i.i.d. Gaussian.
`th.const`	A positive real number used to define the threshold for the detection process. The default used in the piecewise-constant model is 1.15, while in the piecewise-linear model, the value is taken equal to 1.4.

Value

An S3 object of class cptmodel, which contains the following fields:

`solution.path`	The solution path method used to obtain `cptpath.object`
`type`	The model type used, inherited from the given `cptpath.object`
`model.selection`	The model selection method used to return the final change-point estimators object, here its value is `"thresh"`
`no.of.cpt`	The number of estimated change-points
`cpts`	The locations of estimated change-points
`est`	An estimate of the mean of the vector `cptpath.object$x`

Examples

f <- rep(rep(c(0, 1), each = 50), 10)
x <- f + rnorm(length(f))
model.thresh(sol.idetect_seq(x))
f <- rep(rep(c(0, 1), each = 50), 10)
x <- f + rnorm(length(f))
model.thresh(sol.idetect_seq(x))

Change-points estimated by the "breakfast" routine

Description

Plot method for objects of class breakfast.cpts

Usage

## S3 method for class 'breakfast.cpts'
plot(x, display.data = TRUE, ...)
## S3 method for class 'breakfast.cpts'
plot(x, display.data = TRUE, ...)

Arguments

`x`	a `breakfast.cpts` object
`display.data`	if `display.data = TRUE`, change-point estimators are plotted against the data by method. If `display.data = FALSE`, only the estimators are plotted; this option is recommended when `length(x)` is large.
`...`	current not in use

Examples

f <- rep(rep(c(0, 1), each = 50), 5)
x <- f + rnorm(length(f)) * .5
plot(breakfast(x, solution.path = 'all', model.selection = 'all'), display.data = TRUE)
plot(breakfast(x), display.data = FALSE)
f <- rep(rep(c(0, 1), each = 50), 5)
x <- f + rnorm(length(f)) * .5
plot(breakfast(x, solution.path = 'all', model.selection = 'all'), display.data = TRUE)
plot(breakfast(x), display.data = FALSE)

Change-points estimated by the "breakfast" routine

Description

Print method for objects of class breakfast.cpts

Usage

## S3 method for class 'breakfast.cpts'
print(x, by = c("method", "estimator"), ...)
## S3 method for class 'breakfast.cpts'
print(x, by = c("method", "estimator"), ...)

Arguments

`x`	a `breakfast.cpts` object
`by`	if `by = 'method'`, change-point estimators are printed by method; if `by = 'estimator'`, each change-point estimator is printed with the methods that detect it.
`...`	current not in use

Examples

f <- rep(rep(c(0, 1), each = 50), 5)
x <- f + rnorm(length(f)) * .5
print(breakfast(x, solution.path = 'all', model.selection = 'all'), by = 'method')
print(breakfast(x), by = 'estimator')
f <- rep(rep(c(0, 1), each = 50), 5)
x <- f + rnorm(length(f)) * .5
print(breakfast(x, solution.path = 'all', model.selection = 'all'), by = 'method')
print(breakfast(x), by = 'estimator')

Change-points estimated by solution path generation + model selection methods

Description

Print method for objects of class cptmodel

Usage

## S3 method for class 'cptmodel'
print(x, ...)
## S3 method for class 'cptmodel'
print(x, ...)

Arguments

`x`	a `cptmodel` object
`...`	current not in use

Examples

f <- rep(rep(c(0, 1), each = 50), 5)
x <- f + rnorm(length(f)) * .5
print(model.ic(sol.idetect(x)))
f <- rep(rep(c(0, 1), each = 50), 5)
x <- f + rnorm(length(f)) * .5
print(model.ic(sol.idetect(x)))

Solution path generation via the Isolate-Detect method

Description

This function arranges all possible change-points in the mean of the input vector, or in its linear trend, in the order of importance, via the Isolate-Detect (ID) method. It is developed to be used with the sdll and information criterion (ic) model selection rules.

Usage

sol.idetect(
  x,
  type = "const",
  thr_ic_cons = 0.9,
  thr_ic_lin = 1.25,
  points = 3
)
sol.idetect(
  x,
  type = "const",
  thr_ic_cons = 0.9,
  thr_ic_lin = 1.25,
  points = 3
)

Arguments

`x`	A numeric vector containing the data to be processed.
`type`	The model type considered. `type = "const"`, `type = "lin.cont"`, `type = "lin.discont"` mean, respectively, that the signal (mean of `x`) is piecewise constant, piecewise linear and continuous, and piecewise linear but not necessarily continuous. If not given, the default is `type = "const"`
`thr_ic_cons`	A positive real number with default value equal to 0.9. It is used to create the solution path for the piecewise-constant model. The lower the value, the longer the solution path.
`thr_ic_lin`	A positive real number with default value 1.25. Used to create the solution path if `type == "lin.cont"` or `type == "lin.discont"`
`points`	A positive integer with default value equal to 3. It defines the distance between two consecutive end- or start-points of the right- or left-expanding intervals, as described in the Isolate-Detect methodology.

Details

The Isolate-Detect method and its algorithm is described in "Detecting multiple generalized change-points by isolating single ones", A. Anastasiou & P. Fryzlewicz (2022), Metrika, https://doi.org/10.1007/s00184-021-00821-6.

Value

An S3 object of class cptpath, which contains the following fields:

`solutions.nested`	`TRUE`, i.e., the change-point outputs are nested
`solution.path`	Locations of possible change-points in the mean of `x`, arranged in decreasing order of change-point importance
`solution.set`	Empty list
`x`	Input vector `x`
`type`	The input parameter `type`
`cands`	Matrix of dimensions length(`x`) - 1 by 4. The first two columns are (start, end)-points of the detection intervals of the corresponding possible change-point location in the third column. The fourth column is a measure of strength of the corresponding possible change-point. The order of the rows is the same as the order returned in `solution.path`
`method`	The method used, which has value "idetect" here

References

A. Anastasiou & P. Fryzlewicz (2022). Detecting multiple generalized change-points by isolating single ones. Metrika, https://doi.org/10.1007/s00184-021-00821-6.

Examples

r3 <- rnorm(1000) + c(rep(0,300), rep(2,200), rep(-4,300), rep(0,200))
sol.idetect(r3)
r3 <- rnorm(1000) + c(rep(0,300), rep(2,200), rep(-4,300), rep(0,200))
sol.idetect(r3)

Solution path generation using the sequential approach of the Isolate-Detect method

Description

Usage

sol.idetect_seq(x, type = "const", points = 4)
sol.idetect_seq(x, type = "const", points = 4)

Arguments

`x`	A numeric vector containing the data to be processed
`type`	The model type considered. `type = "const"`, `type = "lin.cont"`, `type = "lin.discont"` mean, respectively, that the signal (mean of `x`) is piecewise constant, piecewise linear and continuous, and piecewise linear but not necessarily continuous. If not given, the default is `type = "const"`
`points`	A positive integer with default value equal to 4. It defines the distance between two consecutive end- or start-points of the right- or left-expanding intervals, as described in the Isolate-Detect methodology.

Details

Value

An S3 object of class cptpath, which contains the following fields:

`solutions.nested`	`TRUE`, i.e., the change-point outputs are nested
`solution.path`	Locations of possible change-points, arranged in decreasing order of change-point importance
`solution.set`	Empty list
`x`	Input vector `x`
`type`	The input parameter `type`
`cands`	Matrix of dimensions length(`x`) - 1 by 4. The first two columns are (start, end)-points of the detection intervals of the corresponding possible change-point location in the third column. The fourth column is a measure of strength of the corresponding possible change-point. The order of the rows is the same as the order returned in `solution.path`
`method`	The method used, which has value "idetect_seq" here

References

A. Anastasiou & P. Fryzlewicz (2022). Detecting multiple generalized change-points by isolating single ones. Metrika, https://doi.org/10.1007/s00184-021-00821-6.

Examples

r3 <- rnorm(1000) + c(rep(0,300), rep(2,200), rep(-4,300), rep(0,200))
sol.idetect_seq(r3)
r3 <- rnorm(1000) + c(rep(0,300), rep(2,200), rep(-4,300), rep(0,200))
sol.idetect_seq(r3)

Solution path generation via the Narrowest-Over-Threshold method

Description

This function arranges all possible features (e.g. change in the mean, change in the slope, etc) of the input vector in the order of importance, via the Narrowest-Over-Threshold (NOT) method.

Usage

sol.not(x, type = "const", M = 10000, systematic.intervals = TRUE, seed = NULL)
sol.not(x, type = "const", M = 10000, systematic.intervals = TRUE, seed = NULL)

Arguments

`x`	A numeric vector containing the data to be processed
`type`	The model type considered. `type = "const"` means the signals are the piecewise constant, `type = "lin.cont"` means the signals are the piecewise linear and continuous, and `type = "lin.discont"` means the signals are the piecewise linear but not necessarily continuous. If not given, the default is `type = "const"`
`M`	The maximum number of all data sub-samples at the beginning of the algorithm. The default is `M = 10000`
`systematic.intervals`	When drawing the sub-intervals, whether to use a systematic (and fixed) or random scheme. The default is `systematic.intervals = TRUE`
`seed`	If a random scheme is used, a random seed can be provided so that every time the same sets of random sub-intervals would be drawn. The default is `seed = NULL`, which means that this option is not taken

Details

The Narrowest-Over-Threshold method and its algorithm is described in "Narrowest-over-threshold detection of multiple change points and change-point-like features", R. Baranowski, Y. Chen and P. Fryzlewicz (2019), Journal of Royal Statistical Society: Series B, 81(3), 649–672.

Value

An S3 object of class cptpath, which contains the following fields:

`solutions.nested`	`FALSE`, i.e., the change-point outputs are not nested
`solution.path`	Empty list
`solution.set`	Locations of possible change-points in the mean of `x` for each threshold level (in the decreasing order), arranged in the form of a list of lists
`solution.set.th`	A list that contains threshold levels corresponding to the detections in `solution.set`
`x`	Input vector `x`
`type`	The model type used, which is given in the input. If not given, the default is `type="const"`
`M`	Input parameter `M`
`cands`	Matrix of dimensions length(`x`) - 1 by 4. The first two columns are (start, end)-points of the detection intervals of the corresponding possible change-point location in the third column resulted from applying NOT to all threshold levels. The fourth column is a measure of strength of the corresponding possible change-point. The order of the rows reflect the strength of each detection in decreasing order. To avoid repetition, each possible location would appear at most once in the matrix (with the sub-interval that carries its highest possible strength)
`method`	The method used, which has value "not" here

References

Examples

r3 <- rnorm(1000) + c(rep(0,300), rep(2,200), rep(-4,300), rep(0,200))
sol.not(r3)
r3 <- rnorm(1000) + c(rep(0,300), rep(2,200), rep(-4,300), rep(0,200))
sol.not(r3)

Solution path generation via the Tail-Greedy Unbalanced Haar method

Description

This function arranges all possible change-points in the mean of the input vector in the order of importance, via the Tail-Greedy Unbalanced Haar method.

Usage

sol.tguh(x, type = "const", p = 0.01)
sol.tguh(x, type = "const", p = 0.01)

Arguments

`x`	A numeric vector containing the data to be processed
`type`	The model type considered. `type = "const"` means piecewise-constant; this is the only type currently supported in `sol.tguh`
`p`	Specifies the number of region pairs merged in each pass through the data, as the proportion of all remaining region pairs. The default is `p = 0.01`

Details

The Tail-Greedy Unbalanced Haar decomposition algorithm is described in "Tail-greedy bottom-up data decompositions and fast multiple change-point detection", P. Fryzlewicz (2018), The Annals of Statistics, 46, 3390–3421.

Value

An S3 object of class cptpath, which contains the following fields:

`solutions.nested`	`TRUE`, i.e., the change-point outputs are nested
`solution.path`	Locations of possible change-points in the mean of `x`, arranged in decreasing order of change-point importance
`solution.set`	Empty list
`x`	Input vector `x`
`type`	Input parameter `type`
`p`	Input parameter `p`
`cands`	Matrix of dimensions length(`x`) - 1 by 4. The first two columns are (start, end)-points of the detection intervals of the corresponding possible change-point location in the third column. The fourth column is a measure of strength of the corresponding possible change-point. The order of the rows is the same as the order returned in `solution.path`
`method`	The method used, which has value "tguh" here

References

P. Fryzlewicz (2018). Tail-greedy bottom-up data decompositions and fast multiple change-point detection. The Annals of Statistics, 46, 3390–3421.

Examples

r3 <- rnorm(1000) + c(rep(0,300), rep(2,200), rep(-4,300), rep(0,200))
sol.tguh(r3)
r3 <- rnorm(1000) + c(rep(0,300), rep(2,200), rep(-4,300), rep(0,200))
sol.tguh(r3)

Solution path generation via the Wild Binary Segmentation method

Description

This function arranges all possible change-in-mean features of the input vector in the order of importance, via the Wild Binary Segmentation (WBS) method.

Usage

sol.wbs(x, type = "const", M = 10000, systematic.intervals = TRUE, seed = NULL)
sol.wbs(x, type = "const", M = 10000, systematic.intervals = TRUE, seed = NULL)

Arguments

`x`	A numeric vector containing the data to be processed
`type`	The model type considered. Currently `type = "const"` is the only accepted value. This assumes that the mean of the input vector is piecewise-constant.
`M`	The maximum number of all data sub-samples at the beginning of the algorithm. The default is `M = 10000`
`systematic.intervals`	When drawing the sub-intervals, whether to use a systematic (and fixed) or random scheme. The default is `systematic.intervals = TRUE`
`seed`	If a random scheme is used, a random seed can be provided so that every time the same sets of random sub-intervals would be drawn. The default is `seed = NULL`, which means that this option is not set

Details

The Wild Binary Segmentation algorithm is described in "Wild binary segmentation for multiple change-point detection", P. Fryzlewicz (2014), The Annals of Statistics, 42: 2243–2281.

Value

An S3 object of class cptpath, which contains the following fields:

`solutions.nested`	`TRUE`, i.e., the change-point outputs are nested
`solution.path`	Locations of possible change-points in the mean of `x`, arranged in decreasing order of change-point importance
`solution.set`	Empty list
`x`	Input vector `x`
`type`	The input parameter `type`
`M`	Input parameter `M`
`cands`	Matrix of dimensions length(`x`) - 1 by 4. The first two columns are (start, end)-points of the detection intervals of the corresponding possible change-point location in the third column. The fourth column is a measure of strength of the corresponding possible change-point. The order of the rows is the same as the order returned in `solution.path`
`method`	The method used, which has value "wbs" here

References

P. Fryzlewicz (2014). Wild binary segmentation for multiple change-point detection. The Annals of Statistics, 42(6), 2243–2281.

Examples

r3 <- rnorm(1000) + c(rep(0,300), rep(2,200), rep(-4,300), rep(0,200))
sol.wbs(r3)
r3 <- rnorm(1000) + c(rep(0,300), rep(2,200), rep(-4,300), rep(0,200))
sol.wbs(r3)

Solution path generation via the Wild Binary Segmentation 2 method

Description

This function arranges all possible change-points in the mean of the input vector in the order of importance, via the Wild Binary Segmentation 2 method.

Usage

sol.wbs2(x, type = "const", M = 1000, systematic.intervals = TRUE, seed = NULL)
sol.wbs2(x, type = "const", M = 1000, systematic.intervals = TRUE, seed = NULL)

Arguments

`x`	A numeric vector containing the data to be processed.
`type`	The model type considered. `type = "const"` means piecewise-constant; this is the only type currently supported in `sol.wbs2`
`M`	The maximum number of data sub-samples drawn at each recursive stage of the algorithm. The default is `M = 1000`. Setting `M = 0` executes the standard binary segmentation.
`systematic.intervals`	Whether data sub-intervals for CUSUM computation are drawn systematically (TRUE; start- and end-points taken from an approximately equispaced grid) or randomly (FALSE; obtained uniformly with replacement). The default is TRUE.
`seed`	If a random scheme is used, a random seed can be provided so that every time the same sets of random sub-intervals would be drawn. The default is `seed = NULL`, which means that this option is not set

Details

The Wild Binary Segmentation 2 algorithm is described in "Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection", P. Fryzlewicz (2020), Journal of the Korean Statistical Society, 49, 1027-1070.

Value

An S3 object of class cptpath, which contains the following fields:

`solutions.nested`	`TRUE`, i.e., the change-point outputs are nested
`solution.path`	Locations of possible change-points in the mean of `x`, arranged in decreasing order of change-point importance
`solution.set`	Empty list
`x`	Input vector `x`
`type`	Input parameter `type`
`M`	Input parameter `M`
`cands`	Matrix of dimensions length(`x`) - 1 by 4. The first two columns are (start, end)-points of the detection intervals of the corresponding possible change-point location in the third column. The fourth column is a measure of strength of the corresponding possible change-point. The order of the rows is the same as the order returned in `solution.path`
`method`	The method used, which has value "wbs2" here

References

P. Fryzlewicz (2020). Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection. Journal of the Korean Statistical Society, 49, 1027-1070.

Examples

r3 <- rnorm(1000) + c(rep(0,300), rep(2,200), rep(-4,300), rep(0,200))
sol.wbs2(r3)
r3 <- rnorm(1000) + c(rep(0,300), rep(2,200), rep(-4,300), rep(0,200))
sol.wbs2(r3)

Solution path generation via the Wild Contrast Maximisation method

Description

This function arranges all possible change-points in the mean of the input vector in the order of importance, via the Wild Binary Segmentation 2 method.

Usage

sol.wcm(
  x,
  type = "const",
  M = 100,
  min.d = NULL,
  Q = floor(log(length(x))^1.9),
  max.iter = 5
)
sol.wcm(
  x,
  type = "const",
  M = 100,
  min.d = NULL,
  Q = floor(log(length(x))^1.9),
  max.iter = 5
)

Arguments

`x`	A numeric vector containing the data to be processed.
`type`	The type of change-point models fitted to the data; currently the class of piecewise constant signals (`type = "const"`) is supported.
`M`	The maximum number of data sub-samples drawn at each recursive stage of the algorithm. The default is `M = 100`.
`min.d`	The minimum distance between candidate change-point estimators; if `min.d = NULL`, it is set to be `max(20, 10 + ceiling(log(length(x))^1.1)`.
`Q`	The maximum number of allowable change-points. The default is `Q = floor(log(length(x))^1.9)`.
`max.iter`	The maximum number of candidate change-point models considered; if a model with the number of change-point estimators exceeding `Q` is required to generate the sequence of required candidate models, this argument is ignored. The default is `max.iter = 5`.

Details

The Wild Contrast Maximisation (WCM) algorithm generates a nested sequence of candidate models by identifying large gaps in the solution path generated by WBS2, which aids the model selection step in the presence of large random fluctuations due to serial dependence. See Cho and Fryzlewicz (2023) for further details.

Value

An S3 object of class cptpath, which contains the following fields:

`solutions.nested`	`TRUE`, i.e., the change-point outputs are nested
`solution.path`	Locations of possible change-points in the mean of `x`, arranged in decreasing order of change-point importance; this is not used by `model.gsa`
`solution.set`	A list of candidate change-point models. Each model contains possible change-points in the mean of `x`; this is used by `model.gsa`
`x`	Input vector `x`
`type`	The type of the change-point model considered, which has value "const" here
`M`	Input parameter `M`
`cands`	Matrix of dimensions `Q` by 4. The first two columns are (start, end)-points of the detection intervals of the corresponding possible change-point location in the third column. The fourth column is a measure of strength of the corresponding possible change-point. The order of the rows is the same as the order returned in `solution.path`
`method`	The method used, which has value "wcm" here

References

H. Cho & P. Fryzlewicz (2024) Multiple change point detection under serial dependence: Wild contrast maximisation and gappy Schwarz algorithm. Journal of Time Series Analysis, 45(3): 479–494.

Examples

set.seed(111)
f <- rep(c(0, 5, 2, 8, 1, -2), c(100, 200, 200, 50, 200, 250))
x <- f + arima.sim(list(ar = c(.75, -.5), ma = c(.8, .7, .6, .5, .4, .3)), n = length(f), sd = 1)
sol.wcm(x)$solution.set
set.seed(111)
f <- rep(c(0, 5, 2, 8, 1, -2), c(100, 200, 200, 50, 200, 250))
x <- f + arima.sim(list(ar = c(.75, -.5), ma = c(.8, .7, .6, .5, .4, .3)), n = length(f), sd = 1)
sol.wcm(x)$solution.set

Package 'breakfast'

Help Index

Breakfast: Methods for Fast Multiple Change-point Detection and Estimation

Description

Details

Author(s)

See Also

Methods for fast detection of multiple change-points

Description

Usage

Arguments

Details

Value

References

Examples

Estimate the location of change-points when the number of them is fixed

Description

Usage

Arguments

Details

Value

See Also

Examples

Estimating change-points in the piecewise-constant mean of a noisy data sequence with auto-regressive noise via gappy Schwarz algorithm

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Estimating change-points or change-point-type features in the mean of a noisy data sequence via the strengthened Schwarz information criterion

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Estimating change-points in the piecewise-constant mean of a noisy data sequence via the localised pruning

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Estimating change-points in the piecewise-constant or piecewise-linear mean of a noisy data sequence via the Steepest Drop to Low Levels method

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Estimating change-points in the piecewise-constant or piecewise-linear mean of a noisy data sequence via thresholding

Description

Usage

Arguments

Value

See Also

Examples

Change-points estimated by the "breakfast" routine

Description

Usage

Arguments

Examples

Change-points estimated by the "breakfast" routine

Description

Usage

Arguments

Examples

Change-points estimated by solution path generation + model selection methods

Description

Usage

Arguments