Title: | Tools for Analyzing Finite Mixture Models |
---|---|
Description: | Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic, mixturegrams, and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772 and the Chan Zuckerberg Initiative: Essential Open Source Software for Science (Grant No. 2020-255193). |
Authors: | Derek Young [aut, cre] , Tatiana Benaglia [aut], Didier Chauveau [aut], David Hunter [aut], Kedai Cheng [aut], Ryan Elmore [ctb], Thomas Hettmansperger [ctb], Hoben Thomas [ctb], Fengjuan Xuan [ctb] |
Maintainer: | Derek Young <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.0.0 |
Built: | 2025-01-13 03:48:53 UTC |
Source: | https://github.com/dsy109/mixtools |
Performs a parametric bootstrap by producing B bootstrap realizations of the likelihood ratio statistic for testing the null hypothesis of a k-component fit versus the alternative hypothesis of a (k+1)-component fit to various mixture models. This is performed for up to a specified number of maximum components, k. A p-value is calculated for each test and once the p-value is above a specified significance level, the testing terminates. An optional histogram showing the distribution of the likelihood ratio statistic along with the observed statistic can also be produced.
boot.comp(y, x = NULL, N = NULL, max.comp = 2, B = 100, sig = 0.05, arbmean = TRUE, arbvar = TRUE, mix.type = c("logisregmix", "multmix", "mvnormalmix", "normalmix", "poisregmix", "regmix", "regmix.mixed", "repnormmix"), hist = TRUE, ...)
boot.comp(y, x = NULL, N = NULL, max.comp = 2, B = 100, sig = 0.05, arbmean = TRUE, arbvar = TRUE, mix.type = c("logisregmix", "multmix", "mvnormalmix", "normalmix", "poisregmix", "regmix", "regmix.mixed", "repnormmix"), hist = TRUE, ...)
y |
The raw data for |
x |
The predictor values required only for the regression mixtures |
N |
An n-vector of number of trials for the logistic regression type |
max.comp |
The maximum number of components to test for. The default is 2. This function will
perform a test of k-components versus (k+1)-components sequentially until we fail to reject the null hypothesis.
This decision rule is governed by the calculated p-value and |
B |
The number of bootstrap realizations of the likelihood ratio statistic to produce. The default is 100, but ideally, values of 1000 or more would be more acceptable. |
sig |
The significance level for which to compare the p-value against when performing the test of k-components versus (k+1)-components. |
arbmean |
If FALSE, then a scale mixture analysis can be performed for |
arbvar |
If FALSE, then a location mixture analysis can be performed for |
mix.type |
The type of mixture analysis you wish to perform. The data inputted for |
hist |
An argument to provide a matrix plot of histograms for the boostrapped likelihood ratio statistic. |
... |
Additional arguments passed to the various EM algorithms for the mixture of interest. |
boot.comp
returns a list with items:
p.values |
The p-values for each test of k-components versus (k+1)-components. |
log.lik |
The B bootstrap realizations of the likelihood ratio statistic. |
obs.log.lik |
The observed likelihood ratio statistic for each test which is used in determining the p-values. |
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley and Sons, Inc.
logisregmixEM
, multmixEM
, mvnormalmixEM
, normalmixEM
,
poisregmixEM
, regmixEM
, regmixEM.mixed
, repnormmixEM
## Bootstrapping to test the number of components on the RTdata. data(RTdata) set.seed(100) x <- as.matrix(RTdata[, 1:3]) y <- makemultdata(x, cuts = quantile(x, (1:9)/10))$y a <- boot.comp(y = y, max.comp = 1, B = 5, mix.type = "multmix", epsilon = 1e-3) a$p.values
## Bootstrapping to test the number of components on the RTdata. data(RTdata) set.seed(100) x <- as.matrix(RTdata[, 1:3]) y <- makemultdata(x, cuts = quantile(x, (1:9)/10))$y a <- boot.comp(y = y, max.comp = 1, B = 5, mix.type = "multmix", epsilon = 1e-3) a$p.values
Performs a parametric bootstrap by producing B bootstrap samples for the parameters in the specified mixture model.
boot.se(em.fit, B = 100, arbmean = TRUE, arbvar = TRUE, N = NULL, ...)
boot.se(em.fit, B = 100, arbmean = TRUE, arbvar = TRUE, N = NULL, ...)
em.fit |
An object of class |
B |
The number of bootstrap samples to produce. The default is 100, but ideally, values of 1000 or more would be more acceptable. |
arbmean |
If FALSE, then a scale mixture analysis can be performed for |
arbvar |
If FALSE, then a location mixture analysis can be performed for |
N |
An n-vector of number of trials for the logistic regression type |
... |
Additional arguments passed to the various EM algorithms for the mixture of interest. |
boot.se
returns a list with the bootstrap samples and standard errors for the mixture of interest.
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley and Sons, Inc.
## Bootstrapping standard errors for a regression mixture case. data(NOdata) attach(NOdata) set.seed(100) em.out <- regmixEM(Equivalence, NO, arbvar = FALSE) out.bs <- boot.se(em.out, B = 10, arbvar = FALSE) out.bs
## Bootstrapping standard errors for a regression mixture case. data(NOdata) attach(NOdata) set.seed(100) em.out <- regmixEM(Equivalence, NO, arbvar = FALSE) out.bs <- boot.se(em.out, B = 10, arbvar = FALSE) out.bs
This data set gives the gross national product (GNP) per capita in 1996 for various countries as well as their estimated carbon dioxide (CO2) emission per capita for the same year.
data(CO2data)
data(CO2data)
This data frame consists of 28 countries and the following columns:
GNP
The gross national product per capita in 1996.
CO2
The estimated carbon dioxide emission per capita in 1996.
country
An abbreviation pertaining to the country measured
(e.g., "GRC" = Greece and "CH" = Switzerland).
Hurn, M., Justel, A. and Robert, C. P. (2003) Estimating Mixtures of Regressions, Journal of Computational and Graphical Statistics 12(1), 55–79.
Plot the components' CDF via the posterior probabilities.
compCDF(data, weights, x=seq(min(data, na.rm=TRUE), max(data, na.rm=TRUE), len=250), comp=1:NCOL(weights), makeplot=TRUE, ...)
compCDF(data, weights, x=seq(min(data, na.rm=TRUE), max(data, na.rm=TRUE), len=250), comp=1:NCOL(weights), makeplot=TRUE, ...)
data |
A matrix containing the raw data. Rows are subjects and columns are repeated measurements. |
weights |
The weights to compute the empirical CDF; however, most of time they are the posterior probabilities. |
x |
The points at which the CDFs are to be evaluated. |
comp |
The mixture components for which CDFs are desired. |
makeplot |
Logical: Should a plot be produced as a side effect? |
... |
Additional arguments (other than |
When makeplot
is TRUE
, a line plot is produced of the
CDFs evaluated at x
. The plot is not a step function plot;
the points are simply joined by line segments.
A matrix with length(comp)
rows and length(x)
columns
in which each row gives the CDF evaluated at each point of x
.
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley and Sons, Inc.
Elmore, R. T., Hettmansperger, T. P. and Xuan, F. (2004) The Sign Statistic, One-Way Layouts and Mixture Models, Statistical Science 19(4), 579–587.
makemultdata
, multmixmodel.sel
, multmixEM
.
## The sulfur content of the coal seams in Texas set.seed(100) A <- c(1.51, 1.92, 1.08, 2.04, 2.14, 1.76, 1.17) B <- c(1.69, 0.64, .9, 1.41, 1.01, .84, 1.28, 1.59) C <- c(1.56, 1.22, 1.32, 1.39, 1.33, 1.54, 1.04, 2.25, 1.49) D <- c(1.3, .75, 1.26, .69, .62, .9, 1.2, .32) E <- c(.73, .8, .9, 1.24, .82, .72, .57, 1.18, .54, 1.3) dis.coal <- makemultdata(A, B, C, D, E, cuts = median(c(A, B, C, D, E))) temp <- multmixEM(dis.coal) ## Now plot the components' CDF via the posterior probabilities compCDF(dis.coal$x, temp$posterior, xlab="Sulfur", ylab="", main="empirical CDFs")
## The sulfur content of the coal seams in Texas set.seed(100) A <- c(1.51, 1.92, 1.08, 2.04, 2.14, 1.76, 1.17) B <- c(1.69, 0.64, .9, 1.41, 1.01, .84, 1.28, 1.59) C <- c(1.56, 1.22, 1.32, 1.39, 1.33, 1.54, 1.04, 2.25, 1.49) D <- c(1.3, .75, 1.26, .69, .62, .9, 1.2, .32) E <- c(.73, .8, .9, 1.24, .82, .72, .57, 1.18, .54, 1.3) dis.coal <- makemultdata(A, B, C, D, E, cuts = median(c(A, B, C, D, E))) temp <- multmixEM(dis.coal) ## Now plot the components' CDF via the posterior probabilities compCDF(dis.coal$x, temp$posterior, xlab="Sulfur", ylab="", main="empirical CDFs")
Takes an object of class npEM
and returns an object of class
density
giving the kernel density estimate for the selected
component and, if applicable, the selected block.
## S3 method for class 'npEM' density(x, u=NULL, component=1, block=1, scale=FALSE, ...)
## S3 method for class 'npEM' density(x, u=NULL, component=1, block=1, scale=FALSE, ...)
x |
An object of class |
u |
Vector of points at which the density is to be evaluated |
component |
Mixture component number; should be an integer from 1 to the
number of columns of |
block |
Block of repeated measures. Only applicable in repeated measures
case, for which |
scale |
Logical: If TRUE, multiply the density values by the
corresponding mixing proportions found in |
... |
Additional arguments; not used by this method. |
The bandwidth is taken to be the same as that used to produce the npEM
object, which is given by x$bandwidth
.
density.npEM
returns a list of type "density"
. See
density
for details. In particular, the output of
density.npEM
may be used directly by functions such as
plot
or lines
.
## Look at histogram of Old Faithful waiting times data(faithful) Minutes <- faithful$waiting hist(Minutes, freq=FALSE) ## Superimpose equal-variance normal mixture fit: set.seed(100) nm <- normalmixEM(Minutes, mu=c(50,80), sigma=5, arbvar=FALSE, fast=TRUE) x <- seq(min(Minutes), max(Minutes), len=200) for (j in 1:2) lines(x, nm$lambda[j]*dnorm(x, mean=nm$mu[j], sd=nm$sigma), lwd=3, lty=2) ## Superimpose several semiparametric fits with different bandwidths: bw <- c(1, 3, 5) for (i in 1:3) { sp <- spEMsymloc(Minutes, c(50,80), bw=bw[i], eps=1e-3) for (j in 1:2) lines(density(sp, component=j, scale=TRUE), col=1+i, lwd=2) } legend("topleft", legend=paste("Bandwidth =",bw), fill=2:4)
## Look at histogram of Old Faithful waiting times data(faithful) Minutes <- faithful$waiting hist(Minutes, freq=FALSE) ## Superimpose equal-variance normal mixture fit: set.seed(100) nm <- normalmixEM(Minutes, mu=c(50,80), sigma=5, arbvar=FALSE, fast=TRUE) x <- seq(min(Minutes), max(Minutes), len=200) for (j in 1:2) lines(x, nm$lambda[j]*dnorm(x, mean=nm$mu[j], sd=nm$sigma), lwd=3, lty=2) ## Superimpose several semiparametric fits with different bandwidths: bw <- c(1, 3, 5) for (i in 1:3) { sp <- spEMsymloc(Minutes, c(50,80), bw=bw[i], eps=1e-3) for (j in 1:2) lines(density(sp, component=j, scale=TRUE), col=1+i, lwd=2) } legend("topleft", legend=paste("Bandwidth =",bw), fill=2:4)
Takes an object of class spEM
and returns an object of class
density
giving the kernel density estimate.
## S3 method for class 'spEM' density(x, u=NULL, component=1, block=1, scale=FALSE, ...)
## S3 method for class 'spEM' density(x, u=NULL, component=1, block=1, scale=FALSE, ...)
x |
An object of class |
u |
Vector of points at which the density is to be evaluated |
component |
Mixture component number; should be an integer from 1 to the
number of columns of |
block |
Block of repeated measures. Only applicable in repeated measures
case, for which |
scale |
Logical: If TRUE, multiply the density values by the
corresponding mixing proportions found in |
... |
Additional arguments; not used by this method. |
The bandwidth is taken to be the same as that used to produce the npEM
object, which is given by x$bandwidth
.
density.spEM
returns a list of type "density"
. See
density
for details. In particular, the output of
density.spEM
may be used directly by functions such as
plot
or lines
.
set.seed(100) mu <- matrix(c(0, 15), 2, 3) sigma <- matrix(c(1, 5), 2, 3) x <- rmvnormmix(300, lambda = c(.4,.6), mu = mu, sigma = sigma) d <- spEM(x, mu0 = 2, blockid = rep(1,3), constbw = TRUE) plot(d, xlim=c(-10, 40), ylim = c(0, .16), xlab = "", breaks = 30, cex.lab=1.5, cex.axis=1.5) # plot.spEM calls density.spEM here
set.seed(100) mu <- matrix(c(0, 15), 2, 3) sigma <- matrix(c(1, 5), 2, 3) x <- rmvnormmix(300, lambda = c(.4,.6), mu = mu, sigma = sigma) d <- spEM(x, mu0 = 2, blockid = rep(1,3), constbw = TRUE) plot(d, xlim=c(-10, 40), ylim = c(0, .16), xlab = "", breaks = 30, cex.lab=1.5, cex.axis=1.5) # plot.spEM calls density.spEM here
Computation of spherical or elliptical depth.
depth(pts, x, Cx = var(x))
depth(pts, x, Cx = var(x))
pts |
A kxd matrix containing the k points that one wants to compute the depth. Each row is a point. |
x |
A nxd matrix containing the reference data. Each row is an observation. |
Cx |
A dxd scatter matrix for the data x where the default is var(x). When Cx = I(d), it returns the sphercial depth. |
depth
returns a k-vector where each entry is the elliptical depth of a point in pts
.
depth
is used in regcr
.
Elmore, R. T., Hettmansperger, T. P. and Xuan, F. (2000) Spherical Data Depth and a Multivariate Median, Proceedings of Data Depth: Robust Multivariate Statistical Analysis, Computational Geometry and Applications.
set.seed(100) x <- matrix(rnorm(200),nc = 2) depth(x[1:3, ], x)
set.seed(100) x <- matrix(rnorm(200),nc = 2) depth(x[1:3, ], x)
Density and log-density for the multivariate normal distribution
with mean equal to mu
and variance matrix equal to sigma
.
dmvnorm(y, mu=NULL, sigma=NULL) logdmvnorm(y, mu=NULL, sigma=NULL)
dmvnorm(y, mu=NULL, sigma=NULL) logdmvnorm(y, mu=NULL, sigma=NULL)
y |
Either a |
mu |
|
sigma |
This |
This code is written to be efficient, using the qr-decomposition of the
covariance matrix (and using it only once, rather than recalculating it
for both the determinant and the inverse of sigma
).
dmvnorm
gives the densities, while
logdmvnorm
gives the logarithm of the densities.
Draw a two-dimensional ellipse that traces a bivariate normal density contour for a given mean vector, covariance matrix, and probability content.
ellipse(mu, sigma, alpha = .05, npoints = 250, newplot = FALSE, draw = TRUE, ...)
ellipse(mu, sigma, alpha = .05, npoints = 250, newplot = FALSE, draw = TRUE, ...)
mu |
A 2-vector giving the mean. |
sigma |
A 2x2 matrix giving the covariance matrix. |
alpha |
Probability to be excluded from the ellipse. The default value is alpha = .05, which results in a 95% ellipse. |
npoints |
Number of points comprising the border of the ellipse. |
newplot |
If newplot = TRUE and draw = TRUE, plot the ellipse on a new plot. If newplot = FALSE and draw = TRUE, add the ellipse to an existing plot. |
draw |
If TRUE, draw the ellipse. |
... |
Graphical parameters passed to |
ellipse
returns an npoints
x2 matrix of the points forming the
border of the ellipse.
Johnson, R. A. and Wichern, D. W. (2002) Applied Multivariate Statistical Analysis, Fifth Edition, Prentice Hall.
## Produce a 95% ellipse with the specified mean and covariance structure. mu <- c(1, 3) sigma <- matrix(c(1, .3, .3, 1.5), 2, 2) ellipse(mu, sigma, npoints = 200, newplot = TRUE)
## Produce a 95% ellipse with the specified mean and covariance structure. mu <- c(1, 3) sigma <- matrix(c(1, .3, .3, 1.5), 2, 2) ellipse(mu, sigma, npoints = 200, newplot = TRUE)
Parametric EM algorithm for univariate finite mixture of exponentials distributions with randomly right censored data.
expRMM_EM(x, d=NULL, lambda = NULL, rate = NULL, k = 2, complete = "tdz", epsilon = 1e-08, maxit = 1000, verb = FALSE)
expRMM_EM(x, d=NULL, lambda = NULL, rate = NULL, k = 2, complete = "tdz", epsilon = 1e-08, maxit = 1000, verb = FALSE)
x |
A vector of |
d |
The vector of censoring indication, where 1 means observed lifetime data, and 0 means censored lifetime data. |
lambda |
Initial value of mixing proportions.
If |
rate |
Initial value of component exponential rates,
all set to 1 if |
k |
Number of components of the mixture. |
complete |
Nature of complete data involved within the EM machinery,
can be "tdz" for |
epsilon |
Tolerance limit for declaring algorithm convergence based on the change between two consecutive iterations. |
maxit |
The maximum number of iterations allowed, convergence
may be declared before |
verb |
If TRUE, print updates for every iteration of the algorithm as it runs |
expRMM_EM
returns a list of class "mixEM" with the following items:
x |
The input data. |
d |
The input censoring indicator. |
lambda |
The estimates for the mixing proportions. |
rate |
The estimates for the component rates. |
loglik |
The log-likelihood value at convergence of the algorithm. |
posterior |
An |
all.loglik |
The sequence of log-likelihoods over iterations. |
all.lambda |
The sequence of mixing proportions over iterations. |
all.rate |
The sequence of component rates over iterations. |
ft |
A character vector giving the name of the function. |
Didier Chauveau
Bordes, L., and Chauveau, D. (2016), Stochastic EM algorithms for parametric and semiparametric mixture models for right-censored lifetime data, Computational Statistics, Volume 31, Issue 4, pages 1513-1538. https://link.springer.com/article/10.1007/s00180-016-0661-7
Related functions:
plotexpRMM
,
summary.mixEM
.
Other models and algorithms for censored lifetime data:
weibullRMM_SEM
,
spRMM_SEM
.
n <- 300 # sample size m <- 2 # number of mixture components lambda <- c(1/3,1-1/3); rate <- c(1,1/10) # mixture parameters set.seed(1234) x <- rexpmix(n, lambda, rate) # iid ~ exponential mixture cs <- runif(n,0,max(x)) # Censoring (uniform) and incomplete data t <- apply(cbind(x,cs),1,min) # observed or censored data d <- 1*(x <= cs) # censoring indicator ###### EM for RMM, exponential lifetimes l0 <- rep(1/m,m); r0 <- c(1, 0.5) # "arbitrary" initial values a <- expRMM_EM(t, d, lambda = l0, rate = r0, k = m) summary(a) # EM estimates etc plotexpRMM(a, lwd=2) # default plot of EM sequences plot(a, which=2) # or equivalently, S3 method for "mixEM" object
n <- 300 # sample size m <- 2 # number of mixture components lambda <- c(1/3,1-1/3); rate <- c(1,1/10) # mixture parameters set.seed(1234) x <- rexpmix(n, lambda, rate) # iid ~ exponential mixture cs <- runif(n,0,max(x)) # Censoring (uniform) and incomplete data t <- apply(cbind(x,cs),1,min) # observed or censored data d <- 1*(x <= cs) # censoring indicator ###### EM for RMM, exponential lifetimes l0 <- rep(1/m,m); r0 <- c(1, 0.5) # "arbitrary" initial values a <- expRMM_EM(t, d, lambda = l0, rate = r0, k = m) summary(a) # EM estimates etc plotexpRMM(a, lwd=2) # default plot of EM sequences plot(a, which=2) # or equivalently, S3 method for "mixEM" object
Returns output for 2-component mixture of regressions with flaring using an EM algorithm with one step of Newton-Raphson requiring an adaptive barrier for maximization of the objective function. A mixture of regressions with flare occurs when there appears to be a common regression relationship for the data, but the error terms have a mixture structure of one normal component and one exponential component.
flaremixEM(y, x, lambda = NULL, beta = NULL, sigma = NULL, alpha = NULL, nu = NULL, epsilon = 1e-04, maxit = 10000, verb = FALSE, restart = 50)
flaremixEM(y, x, lambda = NULL, beta = NULL, sigma = NULL, alpha = NULL, nu = NULL, epsilon = 1e-04, maxit = 10000, verb = FALSE, restart = 50)
y |
An n-vector of response values. |
x |
An n-vector of predictor values. An intercept term will be added by default. |
lambda |
Initial value of mixing proportions. Entries should sum to 1. |
beta |
Initial value of |
sigma |
A vector of standard deviations. |
alpha |
A scalar for the exponential component's rate. |
nu |
A vector specifying the barrier constants to use. The first barrier constant where the algorithm converges is used. |
epsilon |
The convergence criterion. |
maxit |
The maximum number of iterations. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
restart |
The number of times to restart the algorithm in case convergence is not attained. The default is 50. |
flaremixEM
returns a list of class mixEM
with items:
x |
The set of predictors (which includes a column of 1's). |
y |
The response values. |
posterior |
An nx2 matrix of posterior probabilities for observations. |
lambda |
The final mixing proportions. |
beta |
The final regression coefficients. |
sigma |
The final standard deviations. |
alpha |
The final exponential rate. |
loglik |
The final log-likelihood. |
all.loglik |
A vector of each iteration's log-likelihood. |
ft |
A character vector giving the name of the function. |
## Simulation output. set.seed(100) j=1 while(j == 1){ x1 <- runif(30, 0, 10) x2 <- runif(20, 10, 20) x3 <- runif(30, 20, 30) y1 <- 3+4*x1+rnorm(30, sd = 1) y2 <- 3+4*x2+rexp(20, rate = .05) y3 <- 3+4*x3+rnorm(30, sd = 1) x <- c(x1, x2, x3) y <- c(y1, y2, y3) nu <- (1:30)/2 out <- try(flaremixEM(y, x, beta = c(3, 4), nu = nu, lambda = c(.75, .25), sigma = 1), silent = TRUE) if(any(class(out) == "try-error")){ j <- 1 } else j <- 2 } out[4:7] plot(x, y, pch = 19) abline(out$beta)
## Simulation output. set.seed(100) j=1 while(j == 1){ x1 <- runif(30, 0, 10) x2 <- runif(20, 10, 20) x3 <- runif(30, 20, 30) y1 <- 3+4*x1+rnorm(30, sd = 1) y2 <- 3+4*x2+rexp(20, rate = .05) y3 <- 3+4*x3+rnorm(30, sd = 1) x <- c(x1, x2, x3) y <- c(y1, y2, y3) nu <- (1:30)/2 out <- try(flaremixEM(y, x, beta = c(3, 4), nu = nu, lambda = c(.75, .25), sigma = 1), silent = TRUE) if(any(class(out) == "try-error")){ j <- 1 } else j <- 2 } out[4:7] plot(x, y, pch = 19) abline(out$beta)
Return EM algorithm output for mixtures of gamma distributions.
gammamixEM(x, lambda = NULL, alpha = NULL, beta = NULL, k = 2, mom.start = TRUE, fix.alpha = FALSE, epsilon = 1e-08, maxit = 1000, maxrestarts = 20, verb = FALSE)
gammamixEM(x, lambda = NULL, alpha = NULL, beta = NULL, k = 2, mom.start = TRUE, fix.alpha = FALSE, epsilon = 1e-08, maxit = 1000, maxrestarts = 20, verb = FALSE)
x |
A vector of length n consisting of the data. |
lambda |
Initial value of mixing proportions. If |
alpha |
Starting value of vector of component shape parameters. If non-NULL, |
beta |
Starting value of vector of component scale parameters. If non-NULL and a vector,
|
k |
Number of components. Initial value ignored unless |
mom.start |
Logical to indicate if a method of moments starting value strategy should be implemented. If |
epsilon |
The convergence criterion. Convergence is declared when the change in the observed data log-likelihood increases by less than epsilon. |
fix.alpha |
Logical to indicate if the components should have a common shape parameter |
maxit |
The maximum number of iterations. |
maxrestarts |
The maximum number of restarts allowed in case of a problem with the particular starting values chosen (each restart uses randomly chosen starting values). |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
gammamixEM
returns a list of class mixEM
with items:
x |
The raw data. |
lambda |
The final mixing proportions. |
gamma.pars |
A 2xk matrix where each column provides the component estimates of |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. This vector includes both the initial and the final values; thus, the number of iterations is one less than its length. |
ft |
A character vector giving the name of the function. |
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977) Maximum Likelihood From Incomplete Data Via the EM Algorithm, Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
Young, D. S., Chen, X., Hewage, D., and Nilo-Poyanco, R. (2019) Finite Mixture-of-Gamma Distributions: Estimation, Inference, and Model-Based Clustering, Advances in Data Analysis and Classification, 13(4), 1053–1082.
##Analyzing a 3-component mixture of gammas. set.seed(100) x <- c(rgamma(200, shape = 0.2, scale = 14), rgamma(200, shape = 32, scale = 10), rgamma(200, shape = 5, scale = 6)) out <- gammamixEM(x, lambda = c(1, 1, 1)/3, verb = TRUE) out[2:4]
##Analyzing a 3-component mixture of gammas. set.seed(100) x <- c(rgamma(200, shape = 0.2, scale = 14), rgamma(200, shape = 32, scale = 10), rgamma(200, shape = 5, scale = 6)) out <- gammamixEM(x, lambda = c(1, 1, 1)/3, verb = TRUE) out[2:4]
From Thomas et al (2011):
"Habituation is a standard method of studying infant behaviors. Indeed,
much of what is known about infant memory and perception rests on
habituation methods. Six-month infants (n = 51) were habituated to a
checker-board pattern on two occasions,
one week apart. On each occasion, the
infant was presented with the checkerboard pattern and the length of time
the infant viewed the pattern before disengaging was recorded; this denoted
the end of a trial. After disengagement, another trial was presented. The
procedure was implemented for eleven trials. The conventional index of
habituation performance is the summed observed fixation to the checkerboard
pattern over the eleven trials. Thus, an index of reliability focuses on how
these fixation times, in seconds, on the two assessment occasions correlate:
."
data(Habituationdata)
data(Habituationdata)
A data frame with two variables, m1
and m2
, and
51 cases. The two variables are the summed observations times
for the two occasions described above.
Hoben Thomas
Original source: Thomas et al. (2011). See references section.
Thomas, H., Lohaus, A., and Domsch, H. (2011), Extensions of Reliability Theory, in Nonparametric Statistics and Mixture Models: A Festschrift in Honor of Thomas Hettmansperger (Singapore: World Scientific), pp. 309-316.
Returns EM algorithm output for a mixture-of-experts model. Currently, this code only handles a 2-component mixture-of-experts, but will be extended to the general k-component hierarchical mixture-of-experts.
hmeEM(y, x, lambda = NULL, beta = NULL, sigma = NULL, w = NULL, k = 2, addintercept = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
hmeEM(y, x, lambda = NULL, beta = NULL, sigma = NULL, w = NULL, k = 2, addintercept = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
y |
An n-vector of response values. |
x |
An nxp matrix of predictors. See |
lambda |
Initial value of mixing proportions, which are modeled as an inverse
logit function of the predictors. Entries should sum to 1.
If NULL, then |
beta |
Initial value of |
sigma |
A vector of standard deviations. If NULL, then |
w |
A p-vector of coefficients for the way the mixing proportions are modeled. See |
k |
Number of components. Currently, only |
addintercept |
If TRUE, a column of ones is appended to the x matrix before the value of p is calculated. |
epsilon |
The convergence criterion. |
maxit |
The maximum number of iterations. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
hmeEM
returns a list of class mixEM
with items:
x |
The set of predictors (which includes a column of 1's if |
y |
The response values. |
w |
The final coefficients for the functional form of the mixing proportions. |
lambda |
An nxk matrix of the final mixing proportions. |
beta |
The final regression coefficients. |
sigma |
The final standard deviations. If |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
ft |
A character vector giving the name of the function. |
Jacobs, R. A., Jordan, M. I., Nowlan, S. J. and Hinton, G. E. (1991) Adaptive Mixtures of Local Experts, Neural Computation 3(1), 79–87.
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley and Sons, Inc.
## EM output for NOdata. data(NOdata) attach(NOdata) set.seed(100) em.out <- regmixEM(Equivalence, NO) hme.out <- hmeEM(Equivalence, NO, beta = em.out$beta) hme.out[3:7]
## EM output for NOdata. data(NOdata) attach(NOdata) set.seed(100) em.out <- regmixEM(Equivalence, NO) hme.out <- hmeEM(Equivalence, NO, beta = em.out$beta) hme.out[3:7]
Computes the integrated squared error for a selected estimated density
from npEM
output (selected by specifying the component
and block number),
relative to a true pdf that must be specified by the user.
The range for the numerical integration must be specified. This function
also returns (by default) a plot of the
true and estimated densities.
ise.npEM(npEMout, component=1, block=1, truepdf, lower=-Inf, upper=Inf, plots = TRUE, ...)
ise.npEM(npEMout, component=1, block=1, truepdf, lower=-Inf, upper=Inf, plots = TRUE, ...)
npEMout |
An object of class |
component , block
|
Component and block of particular density to analyze
from |
truepdf |
an R function taking a numeric first argument and returning a numeric vector of the same length. Returning a non-finite element will generate an error. |
lower , upper
|
the limits of integration. Can be infinite. |
plots |
logical: Should plots be produced? |
... |
additional arguments to be passed to |
This function calls the wkde
(weighted kernel
density estimate) function.
Just as for the integrate
function,
a list of class "integrate"
with components
value |
the final estimate of the integral. |
abs.error |
estimate of the modulus of the absolute error. |
subdivisions |
the number of subintervals produced in the subdivision process. |
message |
|
call |
the matched call. |
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18, 505-526.
Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. (2009), mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software, 32(6):1-29.
# Mixture with mv gaussian model set.seed(100) m <- 2 # no. of components r <- 3 # no. of repeated measures (coordinates) lambda <- c(0.4, 0.6) # Note: Need first 2 coordinates conditionally iid due to block structure mu <- matrix(c(0, 0, 0, 3, 3, 5), m, r, byrow=TRUE) # means sigma <- matrix(rep(1, 6), m, r, byrow=TRUE) # stdevs blockid = c(1,1,2) # block structure of coordinates n <- 200 x <- rmvnormmix(n, lambda, mu, sigma) # simulated data # fit the model with "arbitrary" initial centers centers <- matrix(c(0, 0, 0, 4, 4, 4), 2, 3, byrow=TRUE) a <- npEM(x, centers, blockid, eps=1e-8, verb=FALSE) # Calculate integrated squared error for j=2, b=1: j <- 2 # component b <- 1 # block coords <- a$blockid == b ise.npEM(a, j, b, dnorm, lower=0, upper=10, plots=TRUE, mean=mu[j,coords][1], sd=sigma[j, coords][1]) # The following (lengthy) example recreates the normal multivariate # mixture model simulation from Benaglia et al (2009). mu <- matrix(c(0, 0, 0, 3, 4, 5), m, r, byrow=TRUE) nbrep <- 5 # Benaglia et al use 300 replications # matrix for storing sums of Integrated Squared Errors ISE <- matrix(0,m,r,dimnames=list(Components=1:m, Blocks=1:r)) nblabsw <- 0 # no. of label switches for (mc in 1:nbrep) { print(paste("REPETITION", mc)) x <- rmvnormmix(n,lambda,mu,sigma) # simulated data a <- npEM(x, centers, verb=FALSE) #default: if (a$lambda[1] > a$lambda[2]) nblabsw <- nblabsw + 1 for (j in 1:m) { # for each component for (k in 1:r) { # for each coordinate; not assuming iid! # dnorm with correct mean, sd is the true density: ISE[j,k] <- ISE[j,k] + ise.npEM(a, j, k, dnorm, lower=mu[j,k]-5, upper=mu[j,k]+5, plots=FALSE, mean=mu[j,k], sd=sigma[j,k])$value } } MISE <- ISE/nbrep # Mean ISE sqMISE <- sqrt(MISE) # root-mean-integrated-squared error } sqMISE
# Mixture with mv gaussian model set.seed(100) m <- 2 # no. of components r <- 3 # no. of repeated measures (coordinates) lambda <- c(0.4, 0.6) # Note: Need first 2 coordinates conditionally iid due to block structure mu <- matrix(c(0, 0, 0, 3, 3, 5), m, r, byrow=TRUE) # means sigma <- matrix(rep(1, 6), m, r, byrow=TRUE) # stdevs blockid = c(1,1,2) # block structure of coordinates n <- 200 x <- rmvnormmix(n, lambda, mu, sigma) # simulated data # fit the model with "arbitrary" initial centers centers <- matrix(c(0, 0, 0, 4, 4, 4), 2, 3, byrow=TRUE) a <- npEM(x, centers, blockid, eps=1e-8, verb=FALSE) # Calculate integrated squared error for j=2, b=1: j <- 2 # component b <- 1 # block coords <- a$blockid == b ise.npEM(a, j, b, dnorm, lower=0, upper=10, plots=TRUE, mean=mu[j,coords][1], sd=sigma[j, coords][1]) # The following (lengthy) example recreates the normal multivariate # mixture model simulation from Benaglia et al (2009). mu <- matrix(c(0, 0, 0, 3, 4, 5), m, r, byrow=TRUE) nbrep <- 5 # Benaglia et al use 300 replications # matrix for storing sums of Integrated Squared Errors ISE <- matrix(0,m,r,dimnames=list(Components=1:m, Blocks=1:r)) nblabsw <- 0 # no. of label switches for (mc in 1:nbrep) { print(paste("REPETITION", mc)) x <- rmvnormmix(n,lambda,mu,sigma) # simulated data a <- npEM(x, centers, verb=FALSE) #default: if (a$lambda[1] > a$lambda[2]) nblabsw <- nblabsw + 1 for (j in 1:m) { # for each component for (k in 1:r) { # for each coordinate; not assuming iid! # dnorm with correct mean, sd is the true density: ISE[j,k] <- ISE[j,k] + ise.npEM(a, j, k, dnorm, lower=mu[j,k]-5, upper=mu[j,k]+5, plots=FALSE, mean=mu[j,k], sd=sigma[j,k])$value } } MISE <- ISE/nbrep # Mean ISE sqMISE <- sqrt(MISE) # root-mean-integrated-squared error } sqMISE
Returns EM algorithm output for mixtures of logistic regressions with arbitrarily many components.
logisregmixEM(y, x, N = NULL, lambda = NULL, beta = NULL, k = 2, addintercept = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
logisregmixEM(y, x, N = NULL, lambda = NULL, beta = NULL, k = 2, addintercept = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
y |
An n-vector of successes out of N trials. |
x |
An nxp matrix of predictors. See |
N |
An n-vector of number of trials for the logistic regression.
If NULL, then |
lambda |
Initial value of mixing proportions. Entries should sum to
1. This determines number of components. If NULL, then |
beta |
Initial value of |
k |
Number of components. Ignored unless |
addintercept |
If TRUE, a column of ones is appended to the x matrix before the value of p is calculated. |
epsilon |
The convergence criterion. |
maxit |
The maximum number of iterations. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
logisregmixEM
returns a list of class mixEM
with items:
x |
The predictor values. |
y |
The response values. |
lambda |
The final mixing proportions. |
beta |
The final logistic regression coefficients. |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
ft |
A character vector giving the name of the function. |
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley and Sons, Inc.
## EM output for data generated from a 2-component logistic regression model. set.seed(100) beta <- matrix(c(1, .5, 2, -.8), 2, 2) x <- runif(50, 0, 10) x1 <- cbind(1, x) xbeta <- x1%*%beta N <- ceiling(runif(50, 50, 75)) w <- rbinom(50, 1, .3) y <- w*rbinom(50, size = N, prob = (1/(1+exp(-xbeta[, 1]))))+ (1-w)*rbinom(50, size = N, prob = (1/(1+exp(-xbeta[, 2])))) out.1 <- logisregmixEM(y, x, N, verb = TRUE, epsilon = 1e-01) out.1 ## EM output for data generated from a 2-component binary logistic regression model. beta <- matrix(c(-10, .1, 20, -.1), 2, 2) x <- runif(500, 50, 250) x1 <- cbind(1, x) xbeta <- x1%*%beta w <- rbinom(500, 1, .3) y <- w*rbinom(500, size = 1, prob = (1/(1+exp(-xbeta[, 1]))))+ (1-w)*rbinom(500, size = 1, prob = (1/(1+exp(-xbeta[, 2])))) out.2 <- logisregmixEM(y, x, beta = beta, lambda = c(.3, .7), verb = TRUE, epsilon = 1e-01) out.2
## EM output for data generated from a 2-component logistic regression model. set.seed(100) beta <- matrix(c(1, .5, 2, -.8), 2, 2) x <- runif(50, 0, 10) x1 <- cbind(1, x) xbeta <- x1%*%beta N <- ceiling(runif(50, 50, 75)) w <- rbinom(50, 1, .3) y <- w*rbinom(50, size = N, prob = (1/(1+exp(-xbeta[, 1]))))+ (1-w)*rbinom(50, size = N, prob = (1/(1+exp(-xbeta[, 2])))) out.1 <- logisregmixEM(y, x, N, verb = TRUE, epsilon = 1e-01) out.1 ## EM output for data generated from a 2-component binary logistic regression model. beta <- matrix(c(-10, .1, 20, -.1), 2, 2) x <- runif(500, 50, 250) x1 <- cbind(1, x) xbeta <- x1%*%beta w <- rbinom(500, 1, .3) y <- w*rbinom(500, size = 1, prob = (1/(1+exp(-xbeta[, 1]))))+ (1-w)*rbinom(500, size = 1, prob = (1/(1+exp(-xbeta[, 2])))) out.2 <- logisregmixEM(y, x, beta = beta, lambda = c(.3, .7), verb = TRUE, epsilon = 1e-01) out.2
Change data into a matrix of multinomial counts using the cutpoint method and generate EM algorithm starting values for a k-component mixture of multinomials.
makemultdata(..., cuts)
makemultdata(..., cuts)
... |
Either vectors (possibly of different lengths) of raw data
or an nxm matrix (or data frame) of data. If |
cuts |
A vector of cutpoints. This vector is sorted by the algorithm. |
The (i, j)th entry of the matrix y
(for j < p)
is equal to the number of entries
in the ith column of x
that are less than or equal to cuts
[j].
The (i, p)th entry is equal to the number of entries greater than
cuts
[j].
makemultdata
returns an object which is a list with components:
x |
An nxm matrix of the raw data. |
y |
An nxp matrix of the discretized data where p is one more than the number of cutpoints. Each row is a multinomial vector of counts. In particular, each row should sum to the number of repeated measures for that sample. |
Elmore, R. T., Hettmansperger, T. P. and Xuan, F. (2004) The Sign Statistic, One-Way Layouts and Mixture Models, Statistical Science 19(4), 579–587.
compCDF
, multmixmodel.sel
, multmixEM
## Randomly generated data. set.seed(100) y <- matrix(rpois(70, 6), 10, 7) cuts <- c(2, 5, 7) out1 <- makemultdata(y, cuts = cuts) out1 ## The sulfur content of the coal seams in Texas. A <- c(1.51, 1.92, 1.08, 2.04, 2.14, 1.76, 1.17) B <- c(1.69, 0.64, .9, 1.41, 1.01, .84, 1.28, 1.59) C <- c(1.56, 1.22, 1.32, 1.39, 1.33, 1.54, 1.04, 2.25, 1.49) D <- c(1.3, .75, 1.26, .69, .62, .9, 1.2, .32) E <- c(.73, .8, .9, 1.24, .82, .72, .57, 1.18, .54, 1.3) out2 <- makemultdata(A, B, C, D, E, cuts = median(c(A, B, C, D, E))) out2 ## The reaction time data. data(RTdata) out3 <- makemultdata(RTdata, cuts = 100*c(5, 10, 12, 14, 16, 20, 25, 30, 40, 50)) dim(out3$y) out3$y[1:10,]
## Randomly generated data. set.seed(100) y <- matrix(rpois(70, 6), 10, 7) cuts <- c(2, 5, 7) out1 <- makemultdata(y, cuts = cuts) out1 ## The sulfur content of the coal seams in Texas. A <- c(1.51, 1.92, 1.08, 2.04, 2.14, 1.76, 1.17) B <- c(1.69, 0.64, .9, 1.41, 1.01, .84, 1.28, 1.59) C <- c(1.56, 1.22, 1.32, 1.39, 1.33, 1.54, 1.04, 2.25, 1.49) D <- c(1.3, .75, 1.26, .69, .62, .9, 1.2, .32) E <- c(.73, .8, .9, 1.24, .82, .72, .57, 1.18, .54, 1.3) out2 <- makemultdata(A, B, C, D, E, cuts = median(c(A, B, C, D, E))) out2 ## The reaction time data. data(RTdata) out3 <- makemultdata(RTdata, cuts = 100*c(5, 10, 12, 14, 16, 20, 25, 30, 40, 50)) dim(out3$y) out3$y[1:10,]
Construct a mixturegram for determining an apporpriate number of components.
mixturegram(data, pmbs, method = c("pca", "kpca", "lda"), all.n = FALSE, id.con = NULL, score = 1, iter.max = 50, nstart = 25, ...)
mixturegram(data, pmbs, method = c("pca", "kpca", "lda"), all.n = FALSE, id.con = NULL, score = 1, iter.max = 50, nstart = 25, ...)
data |
The data, which must either be a vector or a matrix. If a matrix, then the rows correspond to the observations. |
pmbs |
A list of length (K-1) such that each element is an nxk matrix of the posterior membership probabilities. These are obtained from each of the "best" estimated k-component mixture models, k = 2,...,K. |
method |
The dimension reduction method used. |
all.n |
A logical specifying whether the mixturegram should plot the profiles of all observations ( |
id.con |
An argument that allows one to impose some sort of (meaningful) identifiability constraint so that the mixture components are in some sort of comparable order between mixture models with different numbers of components. If |
score |
The value for the specified dimension reduction technique's score, which is used for constructing the mixturegram. By default, this value is |
iter.max |
The maximum number of iterations allowed for the k-means clustering algorithm, which is passed to the |
nstart |
The number of random sets chosen based on k centers, which is passed to the |
... |
Additional arguments that can be passed to the underlying |
mixturegram
returns a mixturegram where the profiles are plotted over component values of k = 1,...,K.
Young, D. S., Ke, C., and Zeng, X. (2018) The Mixturegram: A Visualization Tool for Assessing the Number of Components in Finite Mixture Models, Journal of Computational and Graphical Statistics, 27(3), 564–575.
##Data generated from a 2-component mixture of normals. set.seed(100) n <- 100 w <- rmultinom(n,1,c(.3,.7)) y <- sapply(1:n,function(i) w[1,i]*rnorm(1,-6,1) + w[2,i]*rnorm(1,0,1)) selection <- function(i,data,rep=30){ out <- replicate(rep,normalmixEM(data,epsilon=1e-06, k=i,maxit=5000),simplify=FALSE) counts <- lapply(1:rep,function(j) table(apply(out[[j]]$posterior,1, which.max))) counts.length <- sapply(counts, length) counts.min <- sapply(counts, min) counts.test <- (counts.length != i)|(counts.min < 5) if(sum(counts.test) > 0 & sum(counts.test) < rep) out <- out[!counts.test] l <- unlist(lapply(out, function(x) x$loglik)) tmp <- out[[which.max(l)]] } all.out <- lapply(2:5, selection, data = y, rep = 2) pmbs <- lapply(1:length(all.out), function(i) all.out[[i]]$post) mixturegram(y, pmbs, method = "pca", all.n = FALSE, id.con = NULL, score = 1, main = "Mixturegram (Well-Separated Data)")
##Data generated from a 2-component mixture of normals. set.seed(100) n <- 100 w <- rmultinom(n,1,c(.3,.7)) y <- sapply(1:n,function(i) w[1,i]*rnorm(1,-6,1) + w[2,i]*rnorm(1,0,1)) selection <- function(i,data,rep=30){ out <- replicate(rep,normalmixEM(data,epsilon=1e-06, k=i,maxit=5000),simplify=FALSE) counts <- lapply(1:rep,function(j) table(apply(out[[j]]$posterior,1, which.max))) counts.length <- sapply(counts, length) counts.min <- sapply(counts, min) counts.test <- (counts.length != i)|(counts.min < 5) if(sum(counts.test) > 0 & sum(counts.test) < rep) out <- out[!counts.test] l <- unlist(lapply(out, function(x) x$loglik)) tmp <- out[[which.max(l)]] } all.out <- lapply(2:5, selection, data = y, rep = 2) pmbs <- lapply(1:length(all.out), function(i) all.out[[i]]$post) mixturegram(y, pmbs, method = "pca", all.n = FALSE, id.con = NULL, score = 1, main = "Mixturegram (Well-Separated Data)")
Return EM algorithm output for mixtures of multinomial distributions.
multmixEM(y, lambda = NULL, theta = NULL, k = 2, maxit = 10000, epsilon = 1e-08, verb = FALSE)
multmixEM(y, lambda = NULL, theta = NULL, k = 2, maxit = 10000, epsilon = 1e-08, verb = FALSE)
y |
Either An nxp matrix of data (multinomial counts), where n is the
sample size and p is the number of multinomial bins, or the
output of the |
lambda |
Initial value of mixing proportions. Entries should sum to
1. This determines number of components. If NULL, then |
theta |
Initial value of |
k |
Number of components. Ignored unless |
epsilon |
The convergence criterion. |
maxit |
The maximum number of iterations. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
multmixEM
returns a list of class mixEM
with items:
y |
The raw data. |
lambda |
The final mixing proportions. |
theta |
The final multinomial parameters. |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
ft |
A character vector giving the name of the function. |
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley and Sons, Inc.
Elmore, R. T., Hettmansperger, T. P. and Xuan, F. (2004) The Sign Statistic, One-Way Layouts and Mixture Models, Statistical Science 19(4), 579–587.
compCDF
, makemultdata
, multmixmodel.sel
## The sulfur content of the coal seams in Texas set.seed(100) A <- c(1.51, 1.92, 1.08, 2.04, 2.14, 1.76, 1.17) B <- c(1.69, 0.64, .9, 1.41, 1.01, .84, 1.28, 1.59) C <- c(1.56, 1.22, 1.32, 1.39, 1.33, 1.54, 1.04, 2.25, 1.49) D <- c(1.3, .75, 1.26, .69, .62, .9, 1.2, .32) E <- c(.73, .8, .9, 1.24, .82, .72, .57, 1.18, .54, 1.3) dis.coal <- makemultdata(A, B, C, D, E, cuts = median(c(A, B, C, D, E))) em.out <- multmixEM(dis.coal) em.out[1:4]
## The sulfur content of the coal seams in Texas set.seed(100) A <- c(1.51, 1.92, 1.08, 2.04, 2.14, 1.76, 1.17) B <- c(1.69, 0.64, .9, 1.41, 1.01, .84, 1.28, 1.59) C <- c(1.56, 1.22, 1.32, 1.39, 1.33, 1.54, 1.04, 2.25, 1.49) D <- c(1.3, .75, 1.26, .69, .62, .9, 1.2, .32) E <- c(.73, .8, .9, 1.24, .82, .72, .57, 1.18, .54, 1.3) dis.coal <- makemultdata(A, B, C, D, E, cuts = median(c(A, B, C, D, E))) em.out <- multmixEM(dis.coal) em.out[1:4]
Assess the number of components in a mixture of multinomials model using the Akaike's information criterion (AIC), Schwartz's Bayesian information criterion (BIC), Bozdogan's consistent AIC (CAIC), and Integrated Completed Likelihood (ICL).
multmixmodel.sel(y, comps = NULL, ...)
multmixmodel.sel(y, comps = NULL, ...)
y |
Either An nxp matrix of data (multinomial counts), where n is the
sample size and p is the number of multinomial bins, or the
output of the |
comps |
Vector containing the numbers of components to consider. If NULL, this is set to be 1:(max possible), where (max possible) is floor((m+1)/2) and m is the minimum row sum of y. |
... |
Arguments passed to |
multmixmodel.sel
returns a table summarizing the AIC, BIC, CAIC, ICL, and log-likelihood
values along with the winner (the number with the lowest aforementioned values).
compCDF
, makemultdata
, multmixEM
##Data generated using the multinomial cutpoint method. set.seed(100) x <- matrix(rpois(70, 6), 10, 7) x.new <- makemultdata(x, cuts = 5) multmixmodel.sel(x.new$y, comps = c(1,2), epsilon = 1e-03)
##Data generated using the multinomial cutpoint method. set.seed(100) x <- matrix(rpois(70, 6), 10, 7) x.new <- makemultdata(x, cuts = 5) multmixmodel.sel(x.new$y, comps = c(1,2), epsilon = 1e-03)
Return EM algorithm output for mixtures of multivariate normal distributions.
mvnormalmixEM(x, lambda = NULL, mu = NULL, sigma = NULL, k = 2, arbmean = TRUE, arbvar = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
mvnormalmixEM(x, lambda = NULL, mu = NULL, sigma = NULL, k = 2, arbmean = TRUE, arbvar = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
x |
A matrix of size nxp consisting of the data. |
lambda |
Initial value of mixing proportions. Entries should sum to
1. This determines number of components. If NULL, then |
mu |
A list of size k consisting of initial values for the p-vector mean parameters.
If NULL, then the vectors are generated from a normal distribution with
mean and standard deviation according to a binning method done on the data.
If both |
sigma |
A list of size k consisting of initial values for the pxp variance-covariance matrices.
If NULL, then |
k |
Number of components. Ignored unless |
arbmean |
If TRUE, then the component densities are allowed to have different |
arbvar |
If TRUE, then the component densities are allowed to have different |
epsilon |
The convergence criterion. |
maxit |
The maximum number of iterations. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
normalmixEM
returns a list of class mixEM
with items:
x |
The raw data. |
lambda |
The final mixing proportions. |
mu |
A list of with the final mean vectors. |
sigma |
A list with the final variance-covariance matrices. |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
ft |
A character vector giving the name of the function. |
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley and Sons, Inc.
##Fitting randomly generated data with a 2-component location mixture of bivariate normals. set.seed(100) x.1 <- rmvnorm(40, c(0, 0)) x.2 <- rmvnorm(60, c(3, 4)) X.1 <- rbind(x.1, x.2) mu <- list(c(0, 0), c(3, 4)) out.1 <- mvnormalmixEM(X.1, arbvar = FALSE, mu = mu, epsilon = 1e-02) out.1[2:5] ##Fitting randomly generated data with a 2-component scale mixture of bivariate normals. x.3 <- rmvnorm(40, c(0, 0), sigma = matrix(c(200, 1, 1, 150), 2, 2)) x.4 <- rmvnorm(60, c(0, 0)) X.2 <- rbind(x.3, x.4) lambda <- c(0.40, 0.60) sigma <- list(diag(1, 2), matrix(c(200, 1, 1, 150), 2, 2)) out.2 <- mvnormalmixEM(X.2, arbmean = FALSE, sigma = sigma, lambda = lambda, epsilon = 1e-02) out.2[2:5]
##Fitting randomly generated data with a 2-component location mixture of bivariate normals. set.seed(100) x.1 <- rmvnorm(40, c(0, 0)) x.2 <- rmvnorm(60, c(3, 4)) X.1 <- rbind(x.1, x.2) mu <- list(c(0, 0), c(3, 4)) out.1 <- mvnormalmixEM(X.1, arbvar = FALSE, mu = mu, epsilon = 1e-02) out.1[2:5] ##Fitting randomly generated data with a 2-component scale mixture of bivariate normals. x.3 <- rmvnorm(40, c(0, 0), sigma = matrix(c(200, 1, 1, 150), 2, 2)) x.4 <- rmvnorm(60, c(0, 0)) X.2 <- rbind(x.3, x.4) lambda <- c(0.40, 0.60) sigma <- list(diag(1, 2), matrix(c(200, 1, 1, 150), 2, 2)) out.2 <- mvnormalmixEM(X.2, arbmean = FALSE, sigma = sigma, lambda = lambda, epsilon = 1e-02) out.2[2:5]
An extension of the original npEM
algorithm, for mixtures
of multivariate data where the coordinates of a row (case)
in the data matrix are assumed to be made of independent but multivariate blocks (instead of just coordinates),
conditional on the mixture
component (subpopulation) from which they are drawn (Chauveau and Hoang 2015).
mvnpEM(x, mu0, blockid = 1:ncol(x), samebw = TRUE, bwdefault = apply(x,2,bw.nrd0), init = NULL, eps = 1e-8, maxiter = 500, verb = TRUE)
mvnpEM(x, mu0, blockid = 1:ncol(x), samebw = TRUE, bwdefault = apply(x,2,bw.nrd0), init = NULL, eps = 1e-8, maxiter = 500, verb = TRUE)
x |
An |
mu0 |
Either an |
blockid |
A vector of length |
samebw |
Logical: If |
bwdefault |
Bandwidth default for density estimation,a simplistic application of the
default |
init |
Initialization method, based on an initial |
eps |
Tolerance limit for declaring algorithm convergence. Convergence
is declared whenever the maximum change in any coordinate of the
|
maxiter |
The maximum number of iterations allowed; convergence
may be declared before |
verb |
Verbose mode; if TRUE, print updates for every iteration of the algorithm as it runs |
mvnpEM
returns a list of class mvnpEM
with the following items:
data |
The raw data (an |
posteriors |
An |
lambda |
The sequence of mixing proportions over iterations. |
blockid |
The |
samebw |
The |
bandwidth |
The final bandwidth matrix
after convergence of the algorithm.
Its shape depends on the |
lambdahat |
The final mixing proportions. |
loglik |
The sequence of pseudo log-likelihood values over iterations. |
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18, 505-526.
Benaglia, T., Chauveau, D. and Hunter, D.R. (2011), Bandwidth Selection in an EM-like algorithm for nonparametric multivariate mixtures. Nonparametric Statistics and Mixture Models: A Festschrift in Honor of Thomas P. Hettmansperger. World Scientific Publishing Co., pages 15-27.
Chauveau, D., and Hoang, V. T. L. (2015), Nonparametric mixture models with conditionally independent multivariate component densities, Preprint under revision. https://hal.archives-ouvertes.fr/hal-01094837
# Example as in Chauveau and Hoang (2015) with 6 coordinates ## Not run: m=2; r=6; blockid <-c(1,1,2,2,3,3) # 3 bivariate blocks # generate some data x ... a <- mvnpEM(x, mu0=2, blockid, samebw=F) # adaptive bandwidth plot(a) # this S3 method produces 6 plots of univariate marginals summary(a) ## End(Not run)
# Example as in Chauveau and Hoang (2015) with 6 coordinates ## Not run: m=2; r=6; blockid <-c(1,1,2,2,3,3) # 3 bivariate blocks # generate some data x ... a <- mvnpEM(x, mu0=2, blockid, samebw=F) # adaptive bandwidth plot(a) # this S3 method produces 6 plots of univariate marginals summary(a) ## End(Not run)
This data set gives the equivalence ratios and peak nitrogen oxide emissions in a study using pure ethanol as a spark-ignition engine fuel.
data(NOdata)
data(NOdata)
This data frame consists of:
NO
The peak nitrogen oxide emission levels.
Equivalence
The equivalence ratios for the engine at compression ratios from 7.5 to 18.
Brinkman, N. D. (1981) Ethanol Fuel – A Single-Cylinder Engine Study of Efficiency and Exhaust Emissions, S.A.E. Transactions, 68.
Hurn, M., Justel, A. and Robert, C. P. (2003) Estimating Mixtures of Regressions, Journal of Computational and Graphical Statistics 12(1), 55–79.
Return EM algorithm output for mixtures of normal distributions.
normalmixEM(x, lambda = NULL, mu = NULL, sigma = NULL, k = 2, mean.constr = NULL, sd.constr = NULL, epsilon = 1e-08, maxit = 1000, maxrestarts = 20, verb = FALSE, fast = FALSE, ECM = FALSE, arbmean = TRUE, arbvar = TRUE)
normalmixEM(x, lambda = NULL, mu = NULL, sigma = NULL, k = 2, mean.constr = NULL, sd.constr = NULL, epsilon = 1e-08, maxit = 1000, maxrestarts = 20, verb = FALSE, fast = FALSE, ECM = FALSE, arbmean = TRUE, arbvar = TRUE)
x |
A vector of length n consisting of the data. |
lambda |
Initial value of mixing proportions. Automatically
repeated as necessary
to produce a vector of length |
mu |
Starting value of vector of component means. If non-NULL and a
scalar, |
sigma |
Starting value of vector of component standard deviations
for algorithm. If non-NULL
and a scalar, |
k |
Number of components. Initial value ignored unless |
mean.constr |
Equality constraints on the mean parameters, given as
a vector of length |
sd.constr |
Equality constraints on the standard deviation parameters.
See |
epsilon |
The convergence criterion. Convergence is declared when the change in the observed data log-likelihood increases by less than epsilon. |
maxit |
The maximum number of iterations. |
maxrestarts |
The maximum number of restarts allowed in case of a problem with the particular starting values chosen due to one of the variance estimates getting too small (each restart uses randomly chosen starting values). It is well-known that when each component of a normal mixture may have its own mean and variance, the likelihood has no maximizer; in such cases, we hope to find a "nice" local maximum with this algorithm instead, but occasionally the algorithm finds a "not nice" solution and one of the variances goes to zero, driving the likelihood to infinity. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
fast |
If TRUE and k==2 and arbmean==TRUE, then use
|
ECM |
logical: Should this algorithm be an ECM algorithm in the sense of Meng and Rubin (1993)? If FALSE, the algorithm is a true EM algorithm; if TRUE, then every half-iteration alternately updates the means conditional on the variances or the variances conditional on the means, with an extra E-step in between these updates. |
arbmean |
If TRUE, then the component densities are allowed to have different |
arbvar |
If TRUE, then the component densities are allowed to have different |
This is the standard EM algorithm for normal mixtures that maximizes
the conditional expected complete-data
log-likelihood at each M-step of the algorithm.
If desired, the
EM algorithm may be replaced by an ECM algorithm (see ECM
argument)
that alternates between maximizing with respect to the mu
and lambda
while holding sigma
fixed, and maximizing with
respect to sigma
and lambda
while holding mu
fixed. In the case where arbmean
is FALSE
and arbvar
is TRUE
, there is no closed-form EM algorithm,
so the ECM option is forced in this case.
normalmixEM
returns a list of class mixEM
with items:
x |
The raw data. |
lambda |
The final mixing proportions. |
mu |
The final mean parameters. |
sigma |
The final standard deviations. If |
scale |
If |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. This vector includes both the initial and the final values; thus, the number of iterations is one less than its length. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
ft |
A character vector giving the name of the function. |
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley and Sons, Inc.
Meng, X.-L. and Rubin, D. B. (1993) Maximum Likelihood Estimation Via the ECM Algorithm: A General Framework, Biometrika 80(2): 267-278.
Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software, 32(6):1-29, 2009.
mvnormalmixEM
, normalmixEM2comp
,
normalmixMMlc
, spEMsymloc
##Analyzing the Old Faithful geyser data with a 2-component mixture of normals. data(faithful) attach(faithful) set.seed(100) system.time(out<-normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03)) out system.time(out2<-normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03, fast=TRUE)) out2 # same thing but much faster
##Analyzing the Old Faithful geyser data with a 2-component mixture of normals. data(faithful) attach(faithful) set.seed(100) system.time(out<-normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03)) out system.time(out2<-normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03, fast=TRUE)) out2 # same thing but much faster
Return EM algorithm output for mixtures of univariate normal distributions for the special case of 2 components, exploiting the simple structure of the problem to speed up the code.
normalmixEM2comp(x, lambda, mu, sigsqrd, eps= 1e-8, maxit = 1000, verb=FALSE)
normalmixEM2comp(x, lambda, mu, sigsqrd, eps= 1e-8, maxit = 1000, verb=FALSE)
x |
A vector of length |
lambda |
Initial value of first-component mixing proportion. |
mu |
A 2-vector of initial values for the mean parameters. |
sigsqrd |
Either a scalar or a 2-vector with initial value(s) for the variance parameters. If a scalar, the algorithm assumes that the two components have equal variances; if a 2-vector, it assumes that the two components do not have equal variances. |
eps |
The convergence criterion. Convergence is declared when the change in the observed data log-likelihood increases by less than epsilon. |
maxit |
The maximum possible number of iterations. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
This code is written to be very fast, sometimes more than an order of magnitude
faster than normalmixEM
for the same problem. It is less numerically
stable that normalmixEM
in the sense that it does not safeguard
against underflow as carefully.
Note that when the two components are assumed to have unequal variances, the loglikelihood is unbounded. However, in practice this is rarely a problem and quite often the algorithm converges to a "nice" local maximum. Since this algorithm requires starting values for all parameters, it is recommended that the user initialize this routine from different starting values to gauge the reasonableness of the outputted solutions. Cases where, say, the components are not well-separated and/or the component variances are somewhat similar, may have quite different estimates depending on the starting values used. Such situations could also be accompanied by slow convergence of the algorithm relative to the convergence criteria used.
normalmixEM2comp
returns a list of class mixEM
with items:
x |
The raw data. |
lambda |
The final mixing proportions (lambda and 1-lambda). |
mu |
The final two mean parameters. |
sigma |
The final one or two standard deviations. |
loglik |
The final log-likelihood. |
posterior |
An nx2 matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. This vector includes both the initial and the final values; thus, the number of iterations is one less than its length. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values (always zero). |
ft |
A character vector giving the name of the function. |
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley and Sons, Inc.
##Analyzing the Old Faithful geyser data with a 2-component mixture of normals. data(faithful) attach(faithful) set.seed(100) system.time(out <- normalmixEM2comp(waiting, lambda=.5, mu=c(50,80), sigsqrd=100)) out$all.loglik # Note: must be monotone increasing # Compare elapsed time with more general version system.time(out2 <- normalmixEM(waiting, lambda=c(.5,.5), mu=c(50,80), sigma=c(10,10), arbvar=FALSE)) out2$all.loglik # Values should be identical to above
##Analyzing the Old Faithful geyser data with a 2-component mixture of normals. data(faithful) attach(faithful) set.seed(100) system.time(out <- normalmixEM2comp(waiting, lambda=.5, mu=c(50,80), sigsqrd=100)) out$all.loglik # Note: must be monotone increasing # Compare elapsed time with more general version system.time(out2 <- normalmixEM(waiting, lambda=c(.5,.5), mu=c(50,80), sigma=c(10,10), arbvar=FALSE)) out2$all.loglik # Values should be identical to above
Return EC-MM (see below) algorithm output for mixtures of normal distributions
with linear constraints on the means and variances parameters,
as in Chauveau and Hunter (2013).
The linear constraint for the means is of the form
, where
and
are matrix
and vector specified as parameters.
The linear constraints for the variances are actually specified on
the inverse variances, by
, where
is the vector of inverse variances, and
is a matrix
specified as a parameter (see below).
normalmixMMlc(x, lambda = NULL, mu = NULL, sigma = NULL, k = 2, mean.constr = NULL, mean.lincstr = NULL, mean.constant = NULL, var.lincstr = NULL, gparam = NULL, epsilon = 1e-08, maxit = 1000, maxrestarts=20, verb = FALSE)
normalmixMMlc(x, lambda = NULL, mu = NULL, sigma = NULL, k = 2, mean.constr = NULL, mean.lincstr = NULL, mean.constant = NULL, var.lincstr = NULL, gparam = NULL, epsilon = 1e-08, maxit = 1000, maxrestarts=20, verb = FALSE)
x |
A vector of length n consisting of the data. |
lambda |
Initial value of mixing proportions. Automatically
repeated as necessary
to produce a vector of length |
mu |
Starting value of vector of component means.
If non-NULL and a vector,
|
sigma |
Starting value of vector of component standard deviations
for algorithm.
Obsolete for linear constraints on the inverse variances;
use |
k |
Number of components. Initial value ignored unless |
mean.constr |
First, simplest way to define
equality constraints on the mean parameters, given as
a vector of length |
mean.lincstr |
Matrix |
mean.constant |
Vector of |
var.lincstr |
Matrix |
gparam |
Vector of |
epsilon |
The convergence criterion. Convergence is declared when the change in the observed data log-likelihood increases by less than epsilon. |
maxit |
The maximum allowed number of iterations. |
maxrestarts |
The maximum number of restarts allowed in case of a problem with the particular starting values chosen due to one of the variance estimates getting too small (each restart uses randomly chosen starting values). It is well-known that when each component of a normal mixture may have its own mean and variance, the likelihood has no maximizer; in such cases, we hope to find a "nice" local maximum with this algorithm instead, but occasionally the algorithm finds a "not nice" solution and one of the variances goes to zero, driving the likelihood to infinity. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
This is a specific "EC-MM" algorithm for normal mixtures
with linear constraints on the means and variances parameters.
EC-MM here means that this algorithm is similar to
an ECM algorithm as in Meng and Rubin (1993),
except that it uses conditional MM
(Minorization-Maximization)-steps
instead of simple M-steps. Conditional means that it
alternates between maximizing with respect to the mu
and lambda
while holding sigma
fixed, and maximizing with
respect to sigma
and lambda
while holding mu
fixed. This ECM generalization of EM is forced in the case of linear constraints because there is no closed-form EM algorithm.
normalmixMMlc
returns a list of class mixEM
with items:
x |
The raw data. |
lambda |
The final mixing proportions. |
mu |
The final mean parameters. |
sigma |
The final standard deviation(s) |
scale |
Scale factor for the component standard deviations, if applicable. |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. This vector includes both the initial and the final values; thus, the number of iterations is one less than its length. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
beta |
The final |
gamma |
The final |
ft |
A character vector giving the name of the function. |
Didier Chauveau
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley & Sons, Inc.
Meng, X.-L. and Rubin, D. B. (1993) Maximum Likelihood Estimation Via the ECM Algorithm: A General Framework, Biometrika 80(2): 267-278.
Chauveau, D. and Hunter, D.R. (2013) ECM and MM algorithms for mixtures with constrained parameters, preprint https://hal.archives-ouvertes.fr/hal-00625285.
Thomas, H., Lohaus, A., and Domsch, H. (2011) Stable Unstable Reliability Theory, British Journal of Mathematical and Statistical Psychology 65(2): 201-221.
normalmixEM
, mvnormalmixEM
,
normalmixEM2comp
, tauequivnormalmixEM
## Analyzing synthetic data as in the tau equivalent model ## From Thomas et al (2011), see also Chauveau and Hunter (2013) ## a 3-component mixture of normals with linear constraints. lbd <- c(0.6,0.3,0.1); m <- length(lbd) sigma <- sig0 <- sqrt(c(1,9,9)) # means constaints mu = M beta M <- matrix(c(1,1,1,0,-1,1), 3, 2) beta <- c(1,5) # unknown constrained mean mu0 <- mu <- as.vector(M %*% beta) # linear constraint on the inverse variances pi = A.g A <- matrix(c(1,1,1,0,1,0), m, 2, byrow=TRUE) iv0 <- 1/(sig0^2) g0 <- c(iv0[2],iv0[1] - iv0[2]) # gamma^0 init # simulation and EM fits set.seed(50); n=100; x <- rnormmix(n,lbd,mu,sigma) s <- normalmixEM(x,mu=mu0,sigma=sig0,maxit=2000) # plain EM # EM with var and mean linear constraints sc <- normalmixMMlc(x, lambda=lbd, mu=mu0, sigma=sig0, mean.lincstr=M, var.lincstr=A, gparam=g0) # plot and compare both estimates dnormmixt <- function(t, lam, mu, sig){ m <- length(lam); f <- 0 for (j in 1:m) f <- f + lam[j]*dnorm(t,mean=mu[j],sd=sig[j]) f} t <- seq(min(x)-2, max(x)+2, len=200) hist(x, freq=FALSE, col="lightgrey", ylim=c(0,0.3), ylab="density",main="") lines(t, dnormmixt(t, lbd, mu, sigma), col="darkgrey", lwd=2) # true lines(t, dnormmixt(t, s$lambda, s$mu, s$sigma), lty=2) lines(t, dnormmixt(t, sc$lambda, sc$mu, sc$sigma), col=1, lty=3) legend("topleft", c("true","plain EM","constr EM"), col=c("darkgrey",1,1), lty=c(1,2,3), lwd=c(2,1,1))
## Analyzing synthetic data as in the tau equivalent model ## From Thomas et al (2011), see also Chauveau and Hunter (2013) ## a 3-component mixture of normals with linear constraints. lbd <- c(0.6,0.3,0.1); m <- length(lbd) sigma <- sig0 <- sqrt(c(1,9,9)) # means constaints mu = M beta M <- matrix(c(1,1,1,0,-1,1), 3, 2) beta <- c(1,5) # unknown constrained mean mu0 <- mu <- as.vector(M %*% beta) # linear constraint on the inverse variances pi = A.g A <- matrix(c(1,1,1,0,1,0), m, 2, byrow=TRUE) iv0 <- 1/(sig0^2) g0 <- c(iv0[2],iv0[1] - iv0[2]) # gamma^0 init # simulation and EM fits set.seed(50); n=100; x <- rnormmix(n,lbd,mu,sigma) s <- normalmixEM(x,mu=mu0,sigma=sig0,maxit=2000) # plain EM # EM with var and mean linear constraints sc <- normalmixMMlc(x, lambda=lbd, mu=mu0, sigma=sig0, mean.lincstr=M, var.lincstr=A, gparam=g0) # plot and compare both estimates dnormmixt <- function(t, lam, mu, sig){ m <- length(lam); f <- 0 for (j in 1:m) f <- f + lam[j]*dnorm(t,mean=mu[j],sd=sig[j]) f} t <- seq(min(x)-2, max(x)+2, len=200) hist(x, freq=FALSE, col="lightgrey", ylim=c(0,0.3), ylab="density",main="") lines(t, dnormmixt(t, lbd, mu, sigma), col="darkgrey", lwd=2) # true lines(t, dnormmixt(t, s$lambda, s$mu, s$sigma), lty=2) lines(t, dnormmixt(t, sc$lambda, sc$mu, sc$sigma), col=1, lty=3) legend("topleft", c("true","plain EM","constr EM"), col=c("darkgrey",1,1), lty=c(1,2,3), lwd=c(2,1,1))
Returns nonparametric EM algorithm output (Benaglia et al, 2009) for mixtures of multivariate (repeated measures) data where the coordinates of a row (case) in the data matrix are assumed to be independent, conditional on the mixture component (subpopulation) from which they are drawn.
npEM(x, mu0, blockid = 1:ncol(x), bw = bw.nrd0(as.vector(as.matrix(x))), samebw = TRUE, h = bw, eps = 1e-8, maxiter = 500, stochastic = FALSE, verb = TRUE)
npEM(x, mu0, blockid = 1:ncol(x), bw = bw.nrd0(as.vector(as.matrix(x))), samebw = TRUE, h = bw, eps = 1e-8, maxiter = 500, stochastic = FALSE, verb = TRUE)
x |
An |
mu0 |
Either an |
blockid |
A vector of length |
bw |
Bandwidth for density estimation, equal to the standard deviation
of the kernel density. By default, a simplistic application of the
default |
samebw |
Logical: If |
h |
Alternative way to specify the bandwidth, to provide backward compatibility. |
eps |
Tolerance limit for declaring algorithm convergence. Convergence
is declared whenever the maximum change in any coordinate of the
|
maxiter |
The maximum number of iterations allowed, for both
stochastic and non-stochastic versions;
for non-stochastic algorithms ( |
stochastic |
Flag, if FALSE (the default), runs the non-stochastic version
of the npEM algorithm, as in Benaglia et al (2009). Set to TRUE to
run a stochastic version which simulates the posteriors at each
iteration, and runs for |
verb |
If TRUE, print updates for every iteration of the algorithm as it runs |
npEM
returns a list of class npEM
with the following items:
data |
The raw data (an |
posteriors |
An |
bandwidth |
If |
blockid |
Same as the |
lambda |
The sequence of mixing proportions over iterations. |
lambdahat |
The final mixing proportions if |
loglik |
The sequence of log-likelihoods over iterations. |
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18, 505-526.
Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. (2009), mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software, 32(6):1-29.
Benaglia, T., Chauveau, D. and Hunter, D.R. (2011), Bandwidth Selection in an EM-like algorithm for nonparametric multivariate mixtures. Nonparametric Statistics and Mixture Models: A Festschrift in Honor of Thomas P. Hettmansperger. World Scientific Publishing Co., pages 15-27.
Bordes, L., Chauveau, D., and Vandekerkhove, P. (2007), An EM algorithm for a semiparametric mixture model, Computational Statistics and Data Analysis, 51: 5429-5443.
plot.npEM
, normmixrm.sim
, spEMsymloc
,
spEM
, plotseq.npEM
## Examine and plot water-level task data set. ## First, try a 3-component solution where no two coordinates are ## assumed i.d. data(Waterdata) set.seed(100) ## Not run: a <- npEM(Waterdata[,3:10], mu0=3, bw=4) # Assume indep but not iid plot(a) # This produces 8 plots, one for each coordinate ## End(Not run) ## Next, same thing but pairing clock angles that are directly opposite one ## another (1:00 with 7:00, 2:00 with 8:00, etc.) ## Not run: b <- npEM(Waterdata[,3:10], mu0=3, blockid=c(4,3,2,1,3,4,1,2), bw=4) # iid in pairs plot(b) # Now only 4 plots, one for each block ## End(Not run)
## Examine and plot water-level task data set. ## First, try a 3-component solution where no two coordinates are ## assumed i.d. data(Waterdata) set.seed(100) ## Not run: a <- npEM(Waterdata[,3:10], mu0=3, bw=4) # Assume indep but not iid plot(a) # This produces 8 plots, one for each coordinate ## End(Not run) ## Next, same thing but pairing clock angles that are directly opposite one ## another (1:00 with 7:00, 2:00 with 8:00, etc.) ## Not run: b <- npEM(Waterdata[,3:10], mu0=3, blockid=c(4,3,2,1,3,4,1,2), bw=4) # iid in pairs plot(b) # Now only 4 plots, one for each block ## End(Not run)
Returns nonparametric Smoothed Likelihood algorithm output (Levine et al, 2011) for mixtures of multivariate (repeated measures) data where the coordinates of a row (case) in the data matrix are assumed to be independent, conditional on the mixture component (subpopulation) from which they are drawn.
npMSL(x, mu0, blockid = 1:ncol(x), bw = bw.nrd0(as.vector(as.matrix(x))), samebw = TRUE, bwmethod = "S", h = bw, eps = 1e-8, maxiter=500, bwiter = maxiter, nbfold = NULL, ngrid=200, post=NULL, verb = TRUE)
npMSL(x, mu0, blockid = 1:ncol(x), bw = bw.nrd0(as.vector(as.matrix(x))), samebw = TRUE, bwmethod = "S", h = bw, eps = 1e-8, maxiter=500, bwiter = maxiter, nbfold = NULL, ngrid=200, post=NULL, verb = TRUE)
x |
An |
mu0 |
Either an |
blockid |
A vector of length |
bw |
Bandwidth for density estimation, equal to the standard deviation
of the kernel density. By default, a simplistic application of the
default |
samebw |
Logical: If |
bwmethod |
Define the adaptive bandwidth strategy when |
h |
Alternative way to specify the bandwidth, to provide backward compatibility. |
eps |
Tolerance limit for declaring algorithm convergence. Convergence
is declared whenever the maximum change in any coordinate of the
|
maxiter |
The maximum number of iterations allowed, convergence
may be declared before |
bwiter |
The maximum number of iterations allowed for adaptive bandwidth stage,
when |
nbfold |
A parameter passed to the internal function |
ngrid |
Number of points in the discretization of the intervals over which are approximated the (univariate) integrals for non linear smoothing of the log-densities, as required in the E step of the npMSL algorithm, see Levine et al (2011). |
post |
If non-NULL, an |
verb |
If TRUE, print updates for every iteration of the algorithm as it runs |
npMSL
returns a list of class npEM
with the following items:
data |
The raw data (an |
posteriors |
An |
bandwidth |
If |
blockid |
Same as the |
lambda |
The sequence of mixing proportions over iterations. |
lambdahat |
The final mixing proportions. |
loglik |
The sequence of log-likelihoods over iterations. |
f |
An array of size |
meanNaN |
Average number of |
meanUdfl |
Average number of "underflow" that occured over iterations (for internal testing and control purpose). |
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18, 505-526.
Benaglia, T., Chauveau, D. and Hunter, D.R. (2011), Bandwidth Selection in an EM-like algorithm for nonparametric multivariate mixtures. Nonparametric Statistics and Mixture Models: A Festschrift in Honor of Thomas P. Hettmansperger. World Scientific Publishing Co., pages 15-27.
Chauveau D., Hunter D. R. and Levine M. (2014), Semi-Parametric Estimation for Conditional Independence Multivariate Finite Mixture Models. Preprint (under revision).
Levine, M., Hunter, D. and Chauveau, D. (2011), Maximum Smoothed Likelihood for Multivariate Mixtures, Biometrika 98(2): 403-416.
npEM
, plot.npEM
,
normmixrm.sim
, spEMsymloc
,
spEM
, plotseq.npEM
## Examine and plot water-level task data set. ## Block structure pairing clock angles that are directly opposite one ## another (1:00 with 7:00, 2:00 with 8:00, etc.) set.seed(111) # Ensure that results are exactly reproducible data(Waterdata) blockid <- c(4,3,2,1,3,4,1,2) # see Benaglia et al (2009a) ## Not run: a <- npEM(Waterdata[,3:10], mu0=3, blockid=blockid, bw=4) # npEM solution b <- npMSL(Waterdata[,3:10], mu0=3, blockid=blockid, bw=4) # smoothed version # Comparisons on the 4 default plots, one for each block par(mfrow=c(2,2)) for (l in 1:4){ plot(a, blocks=l, breaks=5*(0:37)-92.5, xlim=c(-90,90), xaxt="n",ylim=c(0,.035), xlab="") plot(b, blocks=l, hist=FALSE, newplot=FALSE, addlegend=FALSE, lty=2, dens.col=1) axis(1, at=30*(1:7)-120, cex.axis=1) legend("topleft",c("npMSL"),lty=2, lwd=2)} ## End(Not run)
## Examine and plot water-level task data set. ## Block structure pairing clock angles that are directly opposite one ## another (1:00 with 7:00, 2:00 with 8:00, etc.) set.seed(111) # Ensure that results are exactly reproducible data(Waterdata) blockid <- c(4,3,2,1,3,4,1,2) # see Benaglia et al (2009a) ## Not run: a <- npEM(Waterdata[,3:10], mu0=3, blockid=blockid, bw=4) # npEM solution b <- npMSL(Waterdata[,3:10], mu0=3, blockid=blockid, bw=4) # smoothed version # Comparisons on the 4 default plots, one for each block par(mfrow=c(2,2)) for (l in 1:4){ plot(a, blocks=l, breaks=5*(0:37)-92.5, xlim=c(-90,90), xaxt="n",ylim=c(0,.035), xlab="") plot(b, blocks=l, hist=FALSE, newplot=FALSE, addlegend=FALSE, lty=2, dens.col=1) axis(1, at=30*(1:7)-120, cex.axis=1) legend("topleft",c("npMSL"),lty=2, lwd=2)} ## End(Not run)
Takes an object of class mixEM
and returns various graphical output for select mixture models.
## S3 method for class 'mixEM' plot(x, whichplots = 1, loglik = 1 %in% whichplots, density = 2 %in% whichplots, xlab1="Iteration", ylab1="Log-Likelihood", main1="Observed Data Log-Likelihood", col1=1, lwd1=2, xlab2=NULL, ylab2=NULL, main2=NULL, col2=NULL, lwd2=2, alpha = 0.05, marginal = FALSE, ...)
## S3 method for class 'mixEM' plot(x, whichplots = 1, loglik = 1 %in% whichplots, density = 2 %in% whichplots, xlab1="Iteration", ylab1="Log-Likelihood", main1="Observed Data Log-Likelihood", col1=1, lwd1=2, xlab2=NULL, ylab2=NULL, main2=NULL, col2=NULL, lwd2=2, alpha = 0.05, marginal = FALSE, ...)
x |
An object of class |
whichplots |
vector telling which plots to produce: 1 = loglikelihood
plot, 2 = density plot. Irrelevant if |
loglik |
If TRUE, a plot of the log-likelihood versus the EM iterations is given. |
density |
Graphics pertaining to certain mixture models. The details are given below. |
xlab1 , ylab1 , main1 , col1 , lwd1
|
Graphical parameters |
xlab2 , ylab2 , main2 , col2 , lwd2
|
Same as |
alpha |
A vector of significance levels when constructing confidence ellipses and confidence bands for the mixture of multivariate normals and mixture of regressions cases, respectively. The default is 0.05. |
marginal |
For the mixture of bivariate normals, should optional marginal histograms be included? |
... |
Graphical parameters passed to |
plot.mixEM
returns a plot of the log-likelihood versus the EM iterations by default for all objects of class
mixEM
. In addition, other plots may be produced for the following k-component mixture model functions:
normalmixEM |
A histogram of the raw data is produced along with k density curves determined by |
repnormmixEM |
A histogram of the raw data produced in a similar manner as for |
mvnormalmixEM |
A 2-dimensional plot with each point color-coded to denote its most probable component membership. In
addition, the estimated component means are plotted along with (1 - |
regmixEM |
A plot of the response versus the predictor with each point color-coded to denote its most probable component
membership. In addition, the estimated component regression lines are plotted along with (1 - |
logisregmixEM |
A plot of the binary response versus the predictor with each point color-coded to denote its most probable compopnent membership. In addition, the estimate component logistic regression lines are plotted. |
regmixEM.mixed |
Provides a 2x2 matrix of plots summarizing the posterior slope and posterior intercept terms from a
mixture of random effects regression. See |
##Analyzing the Old Faithful geyser data with a 2-component mixture of normals. data(faithful) attach(faithful) set.seed(100) out <- normalmixEM(waiting, arbvar = FALSE, verb = TRUE, epsilon = 1e-04) plot(out, density = TRUE, w = 1.1) ##Fitting randomly generated data with a 2-component location mixture of bivariate normals. x.1 <- rmvnorm(40, c(0, 0)) x.2 <- rmvnorm(60, c(3, 4)) X.1 <- rbind(x.1, x.2) out.1 <- mvnormalmixEM(X.1, arbvar = FALSE, verb = TRUE, epsilon = 1e-03) plot(out.1, density = TRUE, alpha = c(0.01, 0.05, 0.10), marginal = TRUE)
##Analyzing the Old Faithful geyser data with a 2-component mixture of normals. data(faithful) attach(faithful) set.seed(100) out <- normalmixEM(waiting, arbvar = FALSE, verb = TRUE, epsilon = 1e-04) plot(out, density = TRUE, w = 1.1) ##Fitting randomly generated data with a 2-component location mixture of bivariate normals. x.1 <- rmvnorm(40, c(0, 0)) x.2 <- rmvnorm(60, c(3, 4)) X.1 <- rbind(x.1, x.2) out.1 <- mvnormalmixEM(X.1, arbvar = FALSE, verb = TRUE, epsilon = 1e-03) plot(out.1, density = TRUE, alpha = c(0.01, 0.05, 0.10), marginal = TRUE)
Takes an object of class mixMCMC
and returns various graphical output for select mixture models.
## S3 method for class 'mixMCMC' plot(x, trace.plots = TRUE, summary.plots = FALSE, burnin = 2000, ...)
## S3 method for class 'mixMCMC' plot(x, trace.plots = TRUE, summary.plots = FALSE, burnin = 2000, ...)
x |
An object of class |
trace.plots |
If TRUE, trace plots of the various parameters estimated by the MCMC methods is given. |
summary.plots |
Graphics pertaining to certain mixture models. The details are given below. |
burnin |
The values 1 to |
... |
Graphical parameters passed to |
plot.mixMCMC
returns trace plots of the various parameters estimated by the MCMC methods for all objects of class
mixMCMC
. In addition, other plots may be produced for the following k-component mixture model functions:
regmixMH |
Credible bands for the regression lines in a mixture of linear regressions. See |
## M-H algorithm for NOdata with acceptance rate about 40%. data(NOdata) attach(NOdata) set.seed(100) beta <- matrix(c(1.3, -0.1, 0.6, 0.1), 2, 2) sigma <- c(.02, .05) MH.out <- regmixMH(Equivalence, NO, beta = beta, s = sigma, sampsize = 2500, omega = .0013) plot(MH.out, summary.plots = TRUE, burnin = 2450, alpha = 0.01)
## M-H algorithm for NOdata with acceptance rate about 40%. data(NOdata) attach(NOdata) set.seed(100) beta <- matrix(c(1.3, -0.1, 0.6, 0.1), 2, 2) sigma <- c(.02, .05) MH.out <- regmixMH(Equivalence, NO, beta = beta, s = sigma, sampsize = 2500, omega = .0013) plot(MH.out, summary.plots = TRUE, burnin = 2450, alpha = 0.01)
Takes an object of class mvnpEM
, as the one returned by the mvnpEM
algorithm, and returns a set of plots of the
density estimates for each coordinate within each multivariate block.
All the components are displayed on each plot so it is possible
to see the mixture structure for each coordinate and block. The final bandwidth values are also displayed,
in a format depending on the bandwidth strategy .
## S3 method for class 'mvnpEM' plot(x, truenorm = FALSE, lambda = NULL, mu = NULL, v = NULL, lgdcex = 1, ...)
## S3 method for class 'mvnpEM' plot(x, truenorm = FALSE, lambda = NULL, mu = NULL, v = NULL, lgdcex = 1, ...)
x |
An object of class |
truenorm |
Mostly for checking purpose, if the nonparametric model is to be compared with a multivariate Gaussian mixture as the true model. |
lambda |
true weight parameters, for Gaussian models only (see above) |
mu |
true mean parameters, for Gaussian models only (see above) |
v |
true covariance matrices, for Gaussian models only (see above) |
lgdcex |
Character expansion factor for |
... |
Any remaining arguments are passed to |
plot.mvnpEM
currently just plots the figure.
# example as in Chauveau and Hoang (2015) with 6 coordinates ## Not run: m=2; r=6; blockid <-c(1,1,2,2,3,3) # 3 bivariate blocks # generate some data x ... a <- mvnpEM(x, mu0=2, blockid, samebw=F) # adaptive bandwidth plot(a) # this S3 method produces 6 plots of univariate marginals summary(a) ## End(Not run)
# example as in Chauveau and Hoang (2015) with 6 coordinates ## Not run: m=2; r=6; blockid <-c(1,1,2,2,3,3) # 3 bivariate blocks # generate some data x ... a <- mvnpEM(x, mu0=2, blockid, samebw=F) # adaptive bandwidth plot(a) # this S3 method produces 6 plots of univariate marginals summary(a) ## End(Not run)
Takes an object of class npEM
and returns a set of plots of the
density estimates for each block and each component. There is one plot
per block, with all the components displayed on each block so it is possible
to see the mixture structure for each block.
## S3 method for class 'npEM' plot(x, blocks = NULL, hist=TRUE, addlegend = TRUE, scale=TRUE, title=NULL, breaks="Sturges", ylim=NULL, dens.col, newplot = TRUE, pos.legend = "topright", cex.legend = 1, ...) ## S3 method for class 'spEM' plot(x, blocks = NULL, hist=TRUE, addlegend = TRUE, scale=TRUE, title=NULL, breaks="Sturges", ylim=NULL, dens.col, newplot = TRUE, pos.legend = "topright", cex.legend = 1, ...)
## S3 method for class 'npEM' plot(x, blocks = NULL, hist=TRUE, addlegend = TRUE, scale=TRUE, title=NULL, breaks="Sturges", ylim=NULL, dens.col, newplot = TRUE, pos.legend = "topright", cex.legend = 1, ...) ## S3 method for class 'spEM' plot(x, blocks = NULL, hist=TRUE, addlegend = TRUE, scale=TRUE, title=NULL, breaks="Sturges", ylim=NULL, dens.col, newplot = TRUE, pos.legend = "topright", cex.legend = 1, ...)
x |
An object of class |
blocks |
Blocks (of repeated measures coordinates) to plot; not relevant for univariate case. Default is to plot all blocks. |
hist |
If TRUE, superimpose density estimate plots on a histogram of the data |
addlegend |
If TRUE, adds legend to the plot. |
scale |
If TRUE, scale each density estimate by its corresponding estimated mixing proportion, so that the total area under all densities equals 1 and the densities plotted may be added to produce an estimate of the mixture density. When FALSE, each density curve has area 1 in the plot. |
title |
Alternative vector of main titles for plots (recycled as many times as needed) |
breaks |
Passed directly to the |
ylim |
|
dens.col |
Color values to use for the individual component density
functions, repeated as necessary. Default value is |
newplot |
If TRUE, creates a new plot. |
pos.legend |
Single argument specifying the
position of the legend. See ‘Details’ section of
|
cex.legend |
Character expansion factor for |
... |
Any remaining arguments are passed to the |
plot.npEM
returns a list with two elements:
x |
List of matrices. The |
y |
|
npEM
, density.npEM
, spEMsymloc
,
plotseq.npEM
## Examine and plot water-level task data set. ## First, try a 3-component solution where no two coordinates are ## assumed i.d. data(Waterdata) set.seed(100) ## Not run: a <- npEM(Waterdata[,3:10], 3, bw=4) par(mfrow=c(2,4)) plot(a) # This produces 8 plots, one for each coordinate ## End(Not run) ## Not run: ## Next, same thing but pairing clock angles that are directly opposite one ## another (1:00 with 7:00, 2:00 with 8:00, etc.) b <- npEM(Waterdata[,3:10], 3, blockid=c(4,3,2,1,3,4,1,2), bw=4) par(mfrow=c(2,2)) plot(b) # Now only 4 plots, one for each block ## End(Not run)
## Examine and plot water-level task data set. ## First, try a 3-component solution where no two coordinates are ## assumed i.d. data(Waterdata) set.seed(100) ## Not run: a <- npEM(Waterdata[,3:10], 3, bw=4) par(mfrow=c(2,4)) plot(a) # This produces 8 plots, one for each coordinate ## End(Not run) ## Not run: ## Next, same thing but pairing clock angles that are directly opposite one ## another (1:00 with 7:00, 2:00 with 8:00, etc.) b <- npEM(Waterdata[,3:10], 3, blockid=c(4,3,2,1,3,4,1,2), bw=4) par(mfrow=c(2,2)) plot(b) # Now only 4 plots, one for each block ## End(Not run)
Plot mixture density for the semiparametric mixture model output by spEMsymlocN01, with one component known and set to normal(0,1), and a symmetric nonparametric density with location parameter.
## S3 method for class 'spEMN01' plot(x, bw = x$bandwidth, knownpdf = dnorm, add.plot = FALSE, ...)
## S3 method for class 'spEMN01' plot(x, bw = x$bandwidth, knownpdf = dnorm, add.plot = FALSE, ...)
x |
An object of class "spEMN01" as returned by spEMsymlocN01 |
bw |
Bandwidth for weighted kernel density estimation. |
knownpdf |
The known density of component 1, default to |
add.plot |
Set to TRUE to add to an existing plot. |
... |
further arguments passed to |
A plot of the density of the mixture
Didier Chauveau
Chauveau, D., Saby, N., Orton, T. G., Lemercier B., Walter, C. and Arrouys, D. Large-scale simultaneous hypothesis testing in soil monitoring: A semi-parametric mixture approach, preprint (2013).
Function for plotting sequences of estimates along iterations, from an object returned by the expRMM_EM
, an EM algorithm for mixture of exponential
distributions with randomly right censored data (see reference below).
plotexpRMM(a, title=NULL, rowstyle=TRUE, subtitle=NULL, ...)
plotexpRMM(a, title=NULL, rowstyle=TRUE, subtitle=NULL, ...)
a |
An object returned by |
title |
The title of the plot, set to some default value if |
rowstyle |
Window organization, for plots in rows (the default) or columns. |
subtitle |
A subtitle for the plot, set to some default value if |
... |
Other parameters (such as |
The plot returned
Didier Chauveau
Bordes, L., and Chauveau, D. (2016), Stochastic EM algorithms for parametric and semiparametric mixture models for right-censored lifetime data, Computational Statistics, Volume 31, Issue 4, pages 1513-1538. https://link.springer.com/article/10.1007/s00180-016-0661-7
Related functions:
expRMM_EM
, summary.mixEM
, plot.mixEM
.
Other models and algorithms for censored lifetime data
(name convention is model_algorithm):
weibullRMM_SEM
, spRMM_SEM
.
n=300 # sample size m=2 # number of mixture components lambda <- c(1/3,1-1/3); rate <- c(1,1/10) # mixture parameters set.seed(1234) x <- rexpmix(n, lambda, rate) # iid ~ exponential mixture cs=runif(n,0,max(x)) # Censoring (uniform) and incomplete data t <- apply(cbind(x,cs),1,min) # observed or censored data d <- 1*(x <= cs) # censoring indicator ###### EM for RMM, exponential lifetimes l0 <- rep(1/m,m); r0 <- c(1, 0.5) # "arbitrary" initial values a <- expRMM_EM(t, d, lambda=l0, rate=r0, k = m) summary(a) # EM estimates etc plotexpRMM(a, lwd=2) # plot of EM sequences
n=300 # sample size m=2 # number of mixture components lambda <- c(1/3,1-1/3); rate <- c(1,1/10) # mixture parameters set.seed(1234) x <- rexpmix(n, lambda, rate) # iid ~ exponential mixture cs=runif(n,0,max(x)) # Censoring (uniform) and incomplete data t <- apply(cbind(x,cs),1,min) # observed or censored data d <- 1*(x <= cs) # censoring indicator ###### EM for RMM, exponential lifetimes l0 <- rep(1/m,m); r0 <- c(1, 0.5) # "arbitrary" initial values a <- expRMM_EM(t, d, lambda=l0, rate=r0, k = m) summary(a) # EM estimates etc plotexpRMM(a, lwd=2) # plot of EM sequences
Plot FDR estimates against index of sorted p-values
from, e.g., normalmixEM or the semiparametric mixture model posterior probabilities
output by
spEMsymlocN01
, or any EM-algorithm like
normalmixEM
which returns posterior probabilities. The function
can simultaneously plot FDR estimates from two strategies for comparison.
Plot of the true FDR can be added if complete data are available
(typically in simulation studies).
plotFDR(post1, post2 = NULL, lg1 = "FDR 1", lg2 = NULL, title = NULL, compH0 = 1, alpha = 0.1, complete.data = NULL, pctfdr = 0.3)
plotFDR(post1, post2 = NULL, lg1 = "FDR 1", lg2 = NULL, title = NULL, compH0 = 1, alpha = 0.1, complete.data = NULL, pctfdr = 0.3)
post1 |
The matrix of posterior probabilities from objects such as the output
from |
post2 |
A second object like |
lg1 |
Text describing the FDR estimate in |
lg2 |
Text describing the FDR estimate in |
title |
Plot title, a default is provided if |
compH0 |
The component indicator associated to the null hypothesis H0,
normally 1 since it is defined in this way in |
alpha |
The target FDR level; the index at which the FDR estimate crosses the horizontal line for level |
complete.data |
An array with |
pctfdr |
The level up to which the FDR is plotted, i.e. the scale of the vertical axis. |
A plot of one or two FDR estimates, with the true FDR if available
Didier Chauveau
Chauveau, D., Saby, N., Orton, T. G., Lemercier B., Walter, C. and Arrouys, D. Large-scale simultaneous hypothesis testing in monitoring carbon content from French soil database – A semi-parametric mixture approach, Geoderma 219-220 (2014), 117-124.
plotly
Plot the components' CDF via the posterior probabilities using plotly
.
plotly_compCDF(data, weights, x=seq(min(data, na.rm=TRUE), max(data, na.rm=TRUE), len=250), comp=1:NCOL(weights), makeplot=TRUE, cex = 3, width = 3, legend.text = "Composition", legend.text.size = 15, legend.size = 15, title = "Empirical CDF", title.x = 0.5, title.y = 0.95, title.size = 15, xlab = "Data", xlab.size = 15, xtick.size = 15, ylab = "Probability", ylab.size = 15, ytick.size = 15, col.comp = NULL)
plotly_compCDF(data, weights, x=seq(min(data, na.rm=TRUE), max(data, na.rm=TRUE), len=250), comp=1:NCOL(weights), makeplot=TRUE, cex = 3, width = 3, legend.text = "Composition", legend.text.size = 15, legend.size = 15, title = "Empirical CDF", title.x = 0.5, title.y = 0.95, title.size = 15, xlab = "Data", xlab.size = 15, xtick.size = 15, ylab = "Probability", ylab.size = 15, ytick.size = 15, col.comp = NULL)
data |
A matrix containing the raw data. Rows are subjects and columns are repeated measurements. |
weights |
The weights to compute the empirical CDF; however, most of time they are the posterior probabilities. |
x |
The points at which the CDFs are to be evaluated. |
comp |
The mixture components for which CDFs are desired. |
makeplot |
Logical: Should a plot be produced as a side effect? |
cex |
Size of markers. |
width |
Line width. |
title |
Text of the main title. |
title.size |
Size of the main title. |
title.x |
Horsizontal position of the main title. |
title.y |
Vertical posotion of the main title. |
xlab |
Label of X-axis. |
xlab.size |
Size of the lable of X-axis. |
xtick.size |
Size of tick lables of X-axis. |
ylab |
Label of Y-axis. |
ylab.size |
Size of the lable of Y-axis. |
ytick.size |
Size of tick lables of Y-axis. |
legend.text |
Title of legend. |
legend.text.size |
Size of the legend title. |
legend.size |
Size of legend. |
col.comp |
Color of compositions. Number of color specified needs to be consistent with number of compositions. |
When makeplot
is TRUE
, a line plot is produced of the
CDFs evaluated at x
. The plot is not a step function plot;
the points are simply joined by line segments.
A matrix with length(comp)
rows and length(x)
columns
in which each row gives the CDF evaluated at each point of x
.
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley and Sons, Inc.
Elmore, R. T., Hettmansperger, T. P. and Xuan, F. (2004) The Sign Statistic, One-Way Layouts and Mixture Models, Statistical Science 19(4), 579–587.
makemultdata
, multmixmodel.sel
, multmixEM
, compCDF
.
## The sulfur content of the coal seams in Texas set.seed(100) A <- c(1.51, 1.92, 1.08, 2.04, 2.14, 1.76, 1.17) B <- c(1.69, 0.64, .9, 1.41, 1.01, .84, 1.28, 1.59) C <- c(1.56, 1.22, 1.32, 1.39, 1.33, 1.54, 1.04, 2.25, 1.49) D <- c(1.3, .75, 1.26, .69, .62, .9, 1.2, .32) E <- c(.73, .8, .9, 1.24, .82, .72, .57, 1.18, .54, 1.3) dis.coal <- makemultdata(A, B, C, D, E, cuts = median(c(A, B, C, D, E))) temp <- multmixEM(dis.coal) ## Now plot the components' CDF via the posterior probabilities plotly_compCDF(dis.coal$x, temp$posterior, xlab="Sulfur")
## The sulfur content of the coal seams in Texas set.seed(100) A <- c(1.51, 1.92, 1.08, 2.04, 2.14, 1.76, 1.17) B <- c(1.69, 0.64, .9, 1.41, 1.01, .84, 1.28, 1.59) C <- c(1.56, 1.22, 1.32, 1.39, 1.33, 1.54, 1.04, 2.25, 1.49) D <- c(1.3, .75, 1.26, .69, .62, .9, 1.2, .32) E <- c(.73, .8, .9, 1.24, .82, .72, .57, 1.18, .54, 1.3) dis.coal <- makemultdata(A, B, C, D, E, cuts = median(c(A, B, C, D, E))) temp <- multmixEM(dis.coal) ## Now plot the components' CDF via the posterior probabilities plotly_compCDF(dis.coal$x, temp$posterior, xlab="Sulfur")
plotly
This is an updated version of ellipse
. For more technical details, please refer to ellipse
.
plotly_ellipse(mu, sigma, alpha=.05, npoints=250, draw=TRUE, cex = 3, col = "#1f77b4", lwd = 3, title = "", title.x = 0.5, title.y = 0.95, title.size = 15, xlab = "X", xlab.size = 15, xtick.size = 15, ylab = "Y", ylab.size = 15, ytick.size = 15)
plotly_ellipse(mu, sigma, alpha=.05, npoints=250, draw=TRUE, cex = 3, col = "#1f77b4", lwd = 3, title = "", title.x = 0.5, title.y = 0.95, title.size = 15, xlab = "X", xlab.size = 15, xtick.size = 15, ylab = "Y", ylab.size = 15, ytick.size = 15)
mu |
A 2-vector giving the mean. |
sigma |
A 2x2 matrix giving the covariance matrix. |
alpha |
Probability to be excluded from the ellipse. The default value is alpha = .05, which results in a 95% ellipse. |
npoints |
Number of points comprising the border of the ellipse. |
draw |
If TRUE, draw the ellipse. |
cex |
Size of markers. |
lwd |
Line width of the ellipse. |
col |
Color of both markers and lines. |
title |
Text of the main title. |
title.size |
Size of the main title. |
title.x |
Horsizontal position of the main title. |
title.y |
Vertical posotion of the main title. |
xlab |
Label of X-axis. |
xlab.size |
Size of the lable of X-axis. |
xtick.size |
Size of tick lables of X-axis. |
ylab |
Label of Y-axis. |
ylab.size |
Size of the lable of Y-axis. |
ytick.size |
Size of tick lables of Y-axis. |
plotly_ellipse
returns an npoints
x2 matrix of the points forming the
border of the ellipse.
Johnson, R. A. and Wichern, D. W. (2002) Applied Multivariate Statistical Analysis, Fifth Edition, Prentice Hall.
## Produce a 95% ellipse with the specified mean and covariance structure. mu <- c(1, 3) sigma <- matrix(c(1, .3, .3, 1.5), 2, 2) plotly_ellipse(mu, sigma, npoints = 200)
## Produce a 95% ellipse with the specified mean and covariance structure. mu <- c(1, 3) sigma <- matrix(c(1, .3, .3, 1.5), 2, 2) plotly_ellipse(mu, sigma, npoints = 200)
plotly
This is an updated function of plotexpRMM
. For more technical details, please refer to plotexpRMM
.
plotly_expRMM(a , title = NULL , rowstyle = TRUE , subtitle=NULL, width = 2 , cex = 2 , col.comp = NULL, legend.text = NULL, legend.text.size = 15, legend.size = 15, title.x = 0.5, title.y = 0.95, title.size = 15, xlab.size = 15, xtick.size = 15, ylab.size = 15, ytick.size = 15)
plotly_expRMM(a , title = NULL , rowstyle = TRUE , subtitle=NULL, width = 2 , cex = 2 , col.comp = NULL, legend.text = NULL, legend.text.size = 15, legend.size = 15, title.x = 0.5, title.y = 0.95, title.size = 15, xlab.size = 15, xtick.size = 15, ylab.size = 15, ytick.size = 15)
a |
An object returned by |
title |
The title of the plot, set to some default value if |
rowstyle |
Window organization, for plots in rows (the default) or columns. |
subtitle |
A subtitle for the plot, set to some default value if |
width |
Line width. |
cex |
Size of dots. |
col.comp |
Color of different components. Number of color specified needs to be consistent with number of components. |
legend.text |
Title of legend. |
legend.text.size |
Size of the legend title. |
legend.size |
Size of legend. |
title.size |
Size of the main title. |
title.x |
Horsizontal position of the main title. |
title.y |
Vertical posotion of the main title. |
xlab.size |
Size of the lable of X-axis. |
xtick.size |
Size of tick lables of X-axis. |
ylab.size |
Size of the lable of Y-axis. |
ytick.size |
Size of tick lables of Y-axis. |
The plot returned
Didier Chauveau
Bordes, L., and Chauveau, D. (2016), Stochastic EM algorithms for parametric and semiparametric mixture models for right-censored lifetime data, Computational Statistics, Volume 31, Issue 4, pages 1513-1538. https://link.springer.com/article/10.1007/s00180-016-0661-7
Related functions:
expRMM_EM
, summary.mixEM
, plot.mixEM
, plotexpRMM
.
Other models and algorithms for censored lifetime data
(name convention is model_algorithm):
weibullRMM_SEM
, spRMM_SEM
.
n=300 # sample size m=2 # number of mixture components lambda <- c(1/3,1-1/3); rate <- c(1,1/10) # mixture parameters set.seed(1234) x <- rexpmix(n, lambda, rate) # iid ~ exponential mixture cs=runif(n,0,max(x)) # Censoring (uniform) and incomplete data t <- apply(cbind(x,cs),1,min) # observed or censored data d <- 1*(x <= cs) # censoring indicator ###### EM for RMM, exponential lifetimes l0 <- rep(1/m,m); r0 <- c(1, 0.5) # "arbitrary" initial values a <- expRMM_EM(t, d, lambda=l0, rate=r0, k = m) summary(a) # EM estimates etc plotly_expRMM(a , rowstyle = TRUE) # plot of EM sequences
n=300 # sample size m=2 # number of mixture components lambda <- c(1/3,1-1/3); rate <- c(1,1/10) # mixture parameters set.seed(1234) x <- rexpmix(n, lambda, rate) # iid ~ exponential mixture cs=runif(n,0,max(x)) # Censoring (uniform) and incomplete data t <- apply(cbind(x,cs),1,min) # observed or censored data d <- 1*(x <= cs) # censoring indicator ###### EM for RMM, exponential lifetimes l0 <- rep(1/m,m); r0 <- c(1, 0.5) # "arbitrary" initial values a <- expRMM_EM(t, d, lambda=l0, rate=r0, k = m) summary(a) # EM estimates etc plotly_expRMM(a , rowstyle = TRUE) # plot of EM sequences
plotly
This is an updated version of plotFDR
. For more technical details, please refer to plotFDR
.
plotly_FDR(post1, post2=NULL, lg1="FDR 1", lg2=NULL, compH0=1, alpha=0.1, complete.data =NULL, pctfdr=0.3, col = NULL, width = 3 , title = NULL , title.size = 15 , title.x = 0.5 , title.y = 0.95, xlab = "Index" , xlab.size = 15 , xtick.size = 15, ylab = "Probability" , ylab.size = 15 , ytick.size = 15, legend.text = "" , legend.text.size = 15 , legend.size = 15)
plotly_FDR(post1, post2=NULL, lg1="FDR 1", lg2=NULL, compH0=1, alpha=0.1, complete.data =NULL, pctfdr=0.3, col = NULL, width = 3 , title = NULL , title.size = 15 , title.x = 0.5 , title.y = 0.95, xlab = "Index" , xlab.size = 15 , xtick.size = 15, ylab = "Probability" , ylab.size = 15 , ytick.size = 15, legend.text = "" , legend.text.size = 15 , legend.size = 15)
post1 |
The matrix of posterior probabilities from objects such as the output
from |
post2 |
A second object like |
lg1 |
Text describing the FDR estimate in |
lg2 |
Text describing the FDR estimate in |
compH0 |
The component indicator associated to the null hypothesis H0,
normally 1 since it is defined in this way in |
alpha |
The target FDR level; the index at which the FDR estimate crosses the horizontal line for level |
complete.data |
An array with |
pctfdr |
The level up to which the FDR is plotted, i.e. the scale of the vertical axis. |
col |
Color of traces. |
width |
Width of traces. |
title |
Text of the main title. |
title.size |
Size of the main title. |
title.x |
Horsizontal position of the main title. |
title.y |
Vertical posotion of the main title. |
xlab |
Label of X-axis. |
xlab.size |
Size of the lable of X-axis. |
xtick.size |
Size of tick lables of X-axis. |
ylab |
Label of Y-axis. |
ylab.size |
Size of the lable of Y-axis. |
ytick.size |
Size of tick lables of Y-axis. |
legend.text |
Title of legend. |
legend.text.size |
Size of the legend title. |
legend.size |
Size of legend. |
A plot of one or two FDR estimates, with the true FDR if available
Didier Chauveau
Chauveau, D., Saby, N., Orton, T. G., Lemercier B., Walter, C. and Arrouys, D. Large-scale simultaneous hypothesis testing in monitoring carbon content from French soil database – A semi-parametric mixture approach, Geoderma 219-220 (2014), 117-124.
## Probit transform of p-values ## from a Beta-Uniform mixture model ## comparion of parametric and semiparametric EM fit ## Note: in actual situations n=thousands set.seed(50) n=300 # nb of multiple tests m=2 # 2 mixture components a=c(1,0.1); b=c(1,1); lambda=c(0.6,0.4) # parameters z=sample(1:m, n, rep=TRUE, prob = lambda) p <- rbeta(n, shape1 = a[z], shape2 = b[z]) # p-values o <- order(p) cpd <- cbind(z,p)[o,] # sorted complete data, z=1 if H0, 2 if H1 p <- cpd[,2] # sorted p-values y <- qnorm(p) # probit transform of the pvalues # gaussian EM fit with component 1 constrained to N(0,1) s1 <- normalmixEM(y, mu=c(0,-4), mean.constr = c(0,NA), sd.constr = c(1,NA)) s2 <- spEMsymlocN01(y, mu0 = c(0,-3)) # spEM with N(0,1) fit plotly_FDR(s1$post, s2$post, lg1 = "normalmixEM", lg2 = "spEMsymlocN01", complete.data = cpd) # with true FDR computed from z
## Probit transform of p-values ## from a Beta-Uniform mixture model ## comparion of parametric and semiparametric EM fit ## Note: in actual situations n=thousands set.seed(50) n=300 # nb of multiple tests m=2 # 2 mixture components a=c(1,0.1); b=c(1,1); lambda=c(0.6,0.4) # parameters z=sample(1:m, n, rep=TRUE, prob = lambda) p <- rbeta(n, shape1 = a[z], shape2 = b[z]) # p-values o <- order(p) cpd <- cbind(z,p)[o,] # sorted complete data, z=1 if H0, 2 if H1 p <- cpd[,2] # sorted p-values y <- qnorm(p) # probit transform of the pvalues # gaussian EM fit with component 1 constrained to N(0,1) s1 <- normalmixEM(y, mu=c(0,-4), mean.constr = c(0,NA), sd.constr = c(1,NA)) s2 <- spEMsymlocN01(y, mu0 = c(0,-3)) # spEM with N(0,1) fit plotly_FDR(s1$post, s2$post, lg1 = "normalmixEM", lg2 = "spEMsymlocN01", complete.data = cpd) # with true FDR computed from z
plotly
This is an updated visualization function for ise.npEM
. For more technical details, please refer to ise.npEM
.
plotly_ise.npEM(npEMout, component=1, block=1, truepdf=dnorm, lower=-Inf, upper=Inf, plots = TRUE , col = NULL , width = 3, title = NULL , title.size = 15 , title.x = 0.5 , title.y = 0.95, xlab = "t" , xlab.size = 15 , xtick.size = 15, ylab = "" , ylab.size = 15 , ytick.size = 15, legend.text = "" , legend.text.size = 15 , legend.size = 15, ...)
plotly_ise.npEM(npEMout, component=1, block=1, truepdf=dnorm, lower=-Inf, upper=Inf, plots = TRUE , col = NULL , width = 3, title = NULL , title.size = 15 , title.x = 0.5 , title.y = 0.95, xlab = "t" , xlab.size = 15 , xtick.size = 15, ylab = "" , ylab.size = 15 , ytick.size = 15, legend.text = "" , legend.text.size = 15 , legend.size = 15, ...)
npEMout |
An object of class |
component , block
|
Component and block of particular density to analyze
from |
truepdf |
an R function taking a numeric first argument and returning a numeric vector of the same length. Returning a non-finite element will generate an error. |
lower , upper
|
the limits of integration. Can be infinite. |
plots |
logical: Should plots be produced? |
... |
additional arguments to be passed to |
col |
Color of traces. |
width |
Line width of traces. |
title |
Text of the main title. |
title.size |
Size of the main title. |
title.x |
Horsizontal position of the main title. |
title.y |
Vertical posotion of the main title. |
xlab |
Label of X-axis. |
xlab.size |
Size of the lable of X-axis. |
xtick.size |
Size of tick lables of X-axis. |
ylab |
Label of Y-axis. |
ylab.size |
Size of the lable of Y-axis. |
ytick.size |
Size of tick lables of Y-axis. |
legend.text |
Title of legend. |
legend.text.size |
Size of the legend title. |
legend.size |
Size of legend. |
This function calls the wkde
(weighted kernel
density estimate) function.
Just as for the integrate
function,
a list of class "integrate"
with components
value |
the final estimate of the integral. |
abs.error |
estimate of the modulus of the absolute error. |
subdivisions |
the number of subintervals produced in the subdivision process. |
message |
|
call |
the matched call. |
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18, 505-526.
Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. (2009), mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software, 32(6):1-29.
npEM
, wkde
, integrate
, ise.npEM
## Not run: data(Waterdata) set.seed(100) a <- npEM(Waterdata[,3:10], mu0=3, bw=4) # Assume indep but not iid plotly_ise.npEM(a , plots = TRUE) ## End(Not run)
## Not run: data(Waterdata) set.seed(100) a <- npEM(Waterdata[,3:10], mu0=3, bw=4) # Assume indep but not iid plotly_ise.npEM(a , plots = TRUE) ## End(Not run)
mixEM
function using plotly
This is an updated version of plot.mixEM
. For more technical details, please refer to plot.mixEM
.
plotly_mixEM(x, loglik = TRUE, density = FALSE, xlab1="Iteration", xlab1.size=15 , xtick1.size=15, ylab1="Log-Likelihood", ylab1.size=15 , ytick1.size=15, title1="Observed Data Log-Likelihood", title1.size=15, title1.x = 0.5,title1.y=0.95, col1="#1f77b4", lwd1=3, cex1=6, xlab2=NULL, xlab2.size=15 , xtick2.size=15, ylab2=NULL, ylab2.size=15 , ytick2.size=15, title2=NULL, title2.size=15, title2.x = 0.5,title2.y=0.95, col.hist = "#1f77b4", col2=NULL, lwd2=3, cex2=6, alpha = 0.05, marginal = FALSE)
plotly_mixEM(x, loglik = TRUE, density = FALSE, xlab1="Iteration", xlab1.size=15 , xtick1.size=15, ylab1="Log-Likelihood", ylab1.size=15 , ytick1.size=15, title1="Observed Data Log-Likelihood", title1.size=15, title1.x = 0.5,title1.y=0.95, col1="#1f77b4", lwd1=3, cex1=6, xlab2=NULL, xlab2.size=15 , xtick2.size=15, ylab2=NULL, ylab2.size=15 , ytick2.size=15, title2=NULL, title2.size=15, title2.x = 0.5,title2.y=0.95, col.hist = "#1f77b4", col2=NULL, lwd2=3, cex2=6, alpha = 0.05, marginal = FALSE)
x |
An object of class |
loglik |
If TRUE, a plot of the log-likelihood versus the EM iterations is given. |
density |
Graphics pertaining to certain mixture models. The details are given below. |
xlab1 |
Label of x-axis to be passed to the loglikelihood plot. Trying to change these parameters using |
xlab1.size |
Font of |
xtick1.size |
Font of tick labels of x-axis to be passed to the loglikelihood plot. |
ylab1 |
Label of y-axis to be passed to the loglikelihood plot. Trying to change these parameters using |
ylab1.size |
Font of |
ytick1.size |
Font of tick labels of y-axis to be passed to the loglikelihood plot. |
title1 |
Title to be passed to the loglikelihood plot. |
title1.size |
Tile size of the loglikelihood plot. |
title1.x |
Horizontal position of the loglikelihood plot. |
title1.y |
Verticle position of the loglikelihood plot. |
col1 |
Color of the loglikelihood plot. |
lwd1 |
Width of the density curve of the loglikelihood plot. |
cex1 |
Dot size of the loglikelihood plot. |
xlab2 |
Label of x-axis to be passed to the density plot. Trying to change these parameters using |
xlab2.size |
Font of |
xtick2.size |
Font of tick labels of x-axis to be passed to the density plot. |
ylab2 |
Label of y-axis to be passed to the density plot. Trying to change these parameters using |
ylab2.size |
Font of |
ytick2.size |
Font of tick labels of y-axis to be passed to the density plot. |
title2 |
Title to be passed to the density plot. |
title2.size |
Tile size of the density plot. |
title2.x |
Horizontal position of the density plot. |
title2.y |
Verticle position of the density plot. |
col2 |
Color of the density plot. |
lwd2 |
Width of the density curve of the density plot. |
cex2 |
Dot size of the density plot. |
col.hist |
Color of the histogram of the density plot |
alpha |
A vector of significance levels when constructing confidence ellipses and confidence bands for the mixture of multivariate normals and mixture of regressions cases, respectively. The default is 0.05 |
marginal |
If |
A plot of the output of mixEM
function is presented depends on output type.
## Not run: ## EM output for data generated from a 2-component binary logistic regression model. beta <- matrix(c(-10, .1, 20, -.1), 2, 2) x <- runif(500, 50, 250) x1 <- cbind(1, x) xbeta <- x1 w <- rbinom(500, 1, .3) y <- w*rbinom(500, size = 1, prob = (1/(1+exp(-xbeta[, 1]))))+ (1-w)*rbinom(500, size = 1, prob = (1/(1+exp(-xbeta[, 2])))) out.2 <- logisregmixEM(y, x, beta = beta, lambda = c(.3, .7), verb = TRUE, epsilon = 1e-01) plotly_mixEM(out.2 , col2 = c("red" , "green") , density = TRUE) ## Fitting randomly generated data with a 2-component location mixture of bivariate normals. set.seed(100) x.1 <- rmvnorm(40, c(0, 0)) x.2 <- rmvnorm(60, c(3, 4)) X.1 <- rbind(x.1, x.2) mu <- list(c(0, 0), c(3, 4)) out.1 <- mvnormalmixEM(X.1, arbvar = FALSE, mu = mu, epsilon = 1e-02) plotly_mixEM(out.1 , col2 = c("brown" , "blue") , alpha = c(0.01 , 0.05 , 0.1), density = TRUE , marginal = FALSE) ## Fitting randomly generated data with a 2-component scale mixture of bivariate normals. x.3 <- rmvnorm(40, c(0, 0), sigma = matrix(c(200, 1, 1, 150), 2, 2)) x.4 <- rmvnorm(60, c(0, 0)) X.2 <- rbind(x.3, x.4) lambda <- c(0.40, 0.60) sigma <- list(diag(1, 2), matrix(c(200, 1, 1, 150), 2, 2)) out.2 <- mvnormalmixEM(X.2, arbmean = FALSE, sigma = sigma, lambda = lambda, epsilon = 1e-02) plotly_mixEM(out.1 , col2 = c("brown" , "blue") , alpha = c(0.01 , 0.05 , 0.1), density = TRUE , marginal = TRUE) ## EM output for simulated data from 2-component mixture of random effects. data(RanEffdata) set.seed(100) x <- lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 2:3], ncol = 2)) x <- x[1:20] y <- lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 1], ncol = 1)) y <- y[1:20] lambda <- c(0.45, 0.55) mu <- matrix(c(0, 4, 100, 12), 2, 2) sigma <- 2 R <- list(diag(1, 2), diag(1, 2)) em.out <- regmixEM.mixed(y, x, sigma = sigma, arb.sigma = FALSE, lambda = lambda, mu = mu, R = R, addintercept.random = FALSE, epsilon = 1e-02, verb = TRUE) plotly_mixEM(em.out , col2 = c("gold" , "purple") , density = TRUE , lwd2 = 1 , cex2 =9) ## Analyzing the Old Faithful geyser data with a 2-component mixture of normals. data(faithful) attach(faithful) set.seed(100) out <- normalmixEM(waiting, arbvar = FALSE, verb = TRUE, epsilon = 1e-04) plotly_mixEM(out, density = TRUE , col2 = c("gold" , "purple")) ## EM output for the water-level task data set. data(Waterdata) set.seed(100) water <- t(as.matrix(Waterdata[,3:10])) em.out <- repnormmixEM(water, k = 2, verb = TRUE, epsilon = 1e-03) plotly_mixEM(em.out, density = TRUE , col2 = c("gold" , "purple")) ## End(Not run)
## Not run: ## EM output for data generated from a 2-component binary logistic regression model. beta <- matrix(c(-10, .1, 20, -.1), 2, 2) x <- runif(500, 50, 250) x1 <- cbind(1, x) xbeta <- x1 w <- rbinom(500, 1, .3) y <- w*rbinom(500, size = 1, prob = (1/(1+exp(-xbeta[, 1]))))+ (1-w)*rbinom(500, size = 1, prob = (1/(1+exp(-xbeta[, 2])))) out.2 <- logisregmixEM(y, x, beta = beta, lambda = c(.3, .7), verb = TRUE, epsilon = 1e-01) plotly_mixEM(out.2 , col2 = c("red" , "green") , density = TRUE) ## Fitting randomly generated data with a 2-component location mixture of bivariate normals. set.seed(100) x.1 <- rmvnorm(40, c(0, 0)) x.2 <- rmvnorm(60, c(3, 4)) X.1 <- rbind(x.1, x.2) mu <- list(c(0, 0), c(3, 4)) out.1 <- mvnormalmixEM(X.1, arbvar = FALSE, mu = mu, epsilon = 1e-02) plotly_mixEM(out.1 , col2 = c("brown" , "blue") , alpha = c(0.01 , 0.05 , 0.1), density = TRUE , marginal = FALSE) ## Fitting randomly generated data with a 2-component scale mixture of bivariate normals. x.3 <- rmvnorm(40, c(0, 0), sigma = matrix(c(200, 1, 1, 150), 2, 2)) x.4 <- rmvnorm(60, c(0, 0)) X.2 <- rbind(x.3, x.4) lambda <- c(0.40, 0.60) sigma <- list(diag(1, 2), matrix(c(200, 1, 1, 150), 2, 2)) out.2 <- mvnormalmixEM(X.2, arbmean = FALSE, sigma = sigma, lambda = lambda, epsilon = 1e-02) plotly_mixEM(out.1 , col2 = c("brown" , "blue") , alpha = c(0.01 , 0.05 , 0.1), density = TRUE , marginal = TRUE) ## EM output for simulated data from 2-component mixture of random effects. data(RanEffdata) set.seed(100) x <- lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 2:3], ncol = 2)) x <- x[1:20] y <- lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 1], ncol = 1)) y <- y[1:20] lambda <- c(0.45, 0.55) mu <- matrix(c(0, 4, 100, 12), 2, 2) sigma <- 2 R <- list(diag(1, 2), diag(1, 2)) em.out <- regmixEM.mixed(y, x, sigma = sigma, arb.sigma = FALSE, lambda = lambda, mu = mu, R = R, addintercept.random = FALSE, epsilon = 1e-02, verb = TRUE) plotly_mixEM(em.out , col2 = c("gold" , "purple") , density = TRUE , lwd2 = 1 , cex2 =9) ## Analyzing the Old Faithful geyser data with a 2-component mixture of normals. data(faithful) attach(faithful) set.seed(100) out <- normalmixEM(waiting, arbvar = FALSE, verb = TRUE, epsilon = 1e-04) plotly_mixEM(out, density = TRUE , col2 = c("gold" , "purple")) ## EM output for the water-level task data set. data(Waterdata) set.seed(100) water <- t(as.matrix(Waterdata[,3:10])) em.out <- repnormmixEM(water, k = 2, verb = TRUE, epsilon = 1e-03) plotly_mixEM(em.out, density = TRUE , col2 = c("gold" , "purple")) ## End(Not run)
plotly
This is an updated version of plot.mixMCMC
. For technical details, please refer to plot.mixMCMC
.
plotly_mixMCMC(x, trace.plot = TRUE, summary.plot = FALSE, burnin = 2000, credit.region = 0.95, col.cr = NULL, cex.trace = 3, width.trace = 3, cex.summary = 3, width.summary = 1, title.trace = "", title.trace.x = 0.5, title.trace.y = 0.95, title.trace.size = 15, xlab.trace = "Index", xlab.trace.size = 15, xtick.trace.size = 15, ylab.trace = NULL, ylab.trace.size = 15, ytick.trace.size = 15, title.summary = "Credible Regions", title.summary.x = 0.5, title.summary.y = 0.95, title.summary.size = 15, xlab.summary = "Predictor", xlab.summary.size = 15, xtick.summary.size = 15, ylab.summary = "Response", ylab.summary.size = 15, ytick.summary.size = 15 )
plotly_mixMCMC(x, trace.plot = TRUE, summary.plot = FALSE, burnin = 2000, credit.region = 0.95, col.cr = NULL, cex.trace = 3, width.trace = 3, cex.summary = 3, width.summary = 1, title.trace = "", title.trace.x = 0.5, title.trace.y = 0.95, title.trace.size = 15, xlab.trace = "Index", xlab.trace.size = 15, xtick.trace.size = 15, ylab.trace = NULL, ylab.trace.size = 15, ytick.trace.size = 15, title.summary = "Credible Regions", title.summary.x = 0.5, title.summary.y = 0.95, title.summary.size = 15, xlab.summary = "Predictor", xlab.summary.size = 15, xtick.summary.size = 15, ylab.summary = "Response", ylab.summary.size = 15, ytick.summary.size = 15 )
x |
An object of class |
trace.plot |
If TRUE, trace plots of the various parameters estimated by the MCMC methods is given. |
summary.plot |
Graphics pertaining to certain mixture models. The details are given below. |
burnin |
The values 1 to |
credit.region |
Confidence level of credit region. |
col.cr |
Color of credit region. Number of color specified needs to be consistent with number of components. |
cex.trace |
Dot size of trace plots. |
width.trace |
Line width of trace plots. |
cex.summary |
Dot size of summary plots. |
width.summary |
Line width of summary plots. |
title.trace |
Text of the main title of trace plots. |
title.trace.x |
Horizontal position of main title of trace plots. |
title.trace.y |
Vertical position of main title of trace plots. |
title.trace.size |
Text sise of main title of trace plots. |
xlab.trace |
Label of X-axis of trace plots. |
xlab.trace.size |
Size of the lable of X-axis of trace plots. |
xtick.trace.size |
Size of tick lables of X-axis of trace plots. |
ylab.trace |
Label of Y-axis of trace plots. |
ylab.trace.size |
Size of the lable of Y-axis of trace plots. |
ytick.trace.size |
Size of tick lables of Y-axis of trace plots. |
title.summary |
Text of the main title of summar plot. |
title.summary.x |
Horizontal position of main title of summary plot. |
title.summary.y |
Vertical position of main title of summary plot. |
title.summary.size |
Text sise of main title of summary plot. |
xlab.summary |
Label of X-axis of summary plot. |
xlab.summary.size |
Size of the lable of X-axis of summary plot. |
xtick.summary.size |
Size of tick lables of X-axis of summary plot. |
ylab.summary |
Label of Y-axis of summary plot. |
ylab.summary.size |
Size of the lable of Y-axis of summary plot. |
ytick.summary.size |
Size of tick lables of Y-axis of summary plot. |
plotly_mixMCMC
returns trace plots of the various parameters estimated by the MCMC methods for all objects of class
mixMCMC
. In addition, other plots may be produced for the following k-component mixture model functions:
regmixMH |
Credible bands for the regression lines in a mixture of linear regressions. See |
regcr
, plot.mixMCMC
## Not run: data(NOdata) attach(NOdata) set.seed(100) beta <- matrix(c(1.3, -0.1, 0.6, 0.1), 2, 2) sigma <- c(.02, .05) MH.out <- regmixMH(Equivalence, NO, beta = beta, s = sigma, sampsize = 2500, omega = .0013) plotly_mixMCMC(x = MH.out, summary.plot = TRUE, col.cr = c("red", "green")) ## End(Not run)
## Not run: data(NOdata) attach(NOdata) set.seed(100) beta <- matrix(c(1.3, -0.1, 0.6, 0.1), 2, 2) sigma <- c(.02, .05) MH.out <- regmixMH(Equivalence, NO, beta = beta, s = sigma, sampsize = 2500, omega = .0013) plotly_mixMCMC(x = MH.out, summary.plot = TRUE, col.cr = c("red", "green")) ## End(Not run)
Construct a mixturegram for determining an apporpriate number of components using plotly
.
plotly_mixturegram(data, pmbs, method=c("pca","kpca","lda"), all.n=FALSE, id.con=NULL, score=1, iter.max=50, nstart=25, xlab = "K", xlab.size = 15, xtick.size = 15, ylab = NULL, ylab.size = 15, ytick.size = 15, cex = 12, col.dot = "red", width = 1, title = "Mixturegram", title.size = 15, title.x = 0.5, title.y = 0.95)
plotly_mixturegram(data, pmbs, method=c("pca","kpca","lda"), all.n=FALSE, id.con=NULL, score=1, iter.max=50, nstart=25, xlab = "K", xlab.size = 15, xtick.size = 15, ylab = NULL, ylab.size = 15, ytick.size = 15, cex = 12, col.dot = "red", width = 1, title = "Mixturegram", title.size = 15, title.x = 0.5, title.y = 0.95)
data |
The data, which must either be a vector or a matrix. If a matrix, then the rows correspond to the observations. |
pmbs |
A list of length (K-1) such that each element is an nxk matrix of the posterior membership probabilities. These are obtained from each of the "best" estimated k-component mixture models, k = 2,...,K. |
method |
The dimension reduction method used. |
all.n |
A logical specifying whether the mixturegram should plot the profiles of all observations ( |
id.con |
An argument that allows one to impose some sort of (meaningful) identifiability constraint so that the mixture components are in some sort of comparable order between mixture models with different numbers of components. If |
score |
The value for the specified dimension reduction technique's score, which is used for constructing the mixturegram. By default, this value is |
iter.max |
The maximum number of iterations allowed for the k-means clustering algorithm, which is passed to the |
nstart |
The number of random sets chosen based on k centers, which is passed to the |
title |
Text of the main title. |
title.size |
Size of the main title. |
title.x |
Horsizontal position of the main title. |
title.y |
Vertical posotion of the main title. |
xlab |
Label of X-axis. |
xlab.size |
Size of the lable of X-axis. |
xtick.size |
Size of tick lables of X-axis. |
ylab |
Label of Y-axis. |
ylab.size |
Size of the lable of Y-axis. |
ytick.size |
Size of tick lables of Y-axis. |
cex |
Size of dots. |
col.dot |
Color of dots. |
width |
Line width. |
plotly_mixturegram
returns a mixturegram where the profiles are plotted over component values of k = 1,...,K.
Young, D. S., Ke, C., and Zeng, X. (2018) The Mixturegram: A Visualization Tool for Assessing the Number of Components in Finite Mixture Models, Journal of Computational and Graphical Statistics, 27(3), 564–575.
## Not run: ##Data generated from a 2-component mixture of normals. set.seed(100) n <- 100 w <- rmultinom(n,1,c(.3,.7)) y <- sapply(1:n,function(i) w[1,i]*rnorm(1,-6,1) + w[2,i]*rnorm(1,0,1)) selection <- function(i,data,rep=30){ out <- replicate(rep,normalmixEM(data,epsilon=1e-06, k=i,maxit=5000),simplify=FALSE) counts <- lapply(1:rep,function(j) table(apply(out[[j]]$posterior,1, which.max))) counts.length <- sapply(counts, length) counts.min <- sapply(counts, min) counts.test <- (counts.length != i)|(counts.min < 5) if(sum(counts.test) > 0 & sum(counts.test) < rep) out <- out[!counts.test] l <- unlist(lapply(out, function(x) x$loglik)) tmp <- out[[which.max(l)]] } all.out <- lapply(2:5, selection, data = y, rep = 2) pmbs <- lapply(1:length(all.out), function(i) all.out[[i]]$post) plotly_mixturegram(y, pmbs, method = "pca", all.n = TRUE, id.con = NULL, score = 1, title = "Mixturegram (Well-Separated Data)") ## End(Not run)
## Not run: ##Data generated from a 2-component mixture of normals. set.seed(100) n <- 100 w <- rmultinom(n,1,c(.3,.7)) y <- sapply(1:n,function(i) w[1,i]*rnorm(1,-6,1) + w[2,i]*rnorm(1,0,1)) selection <- function(i,data,rep=30){ out <- replicate(rep,normalmixEM(data,epsilon=1e-06, k=i,maxit=5000),simplify=FALSE) counts <- lapply(1:rep,function(j) table(apply(out[[j]]$posterior,1, which.max))) counts.length <- sapply(counts, length) counts.min <- sapply(counts, min) counts.test <- (counts.length != i)|(counts.min < 5) if(sum(counts.test) > 0 & sum(counts.test) < rep) out <- out[!counts.test] l <- unlist(lapply(out, function(x) x$loglik)) tmp <- out[[which.max(l)]] } all.out <- lapply(2:5, selection, data = y, rep = 2) pmbs <- lapply(1:length(all.out), function(i) all.out[[i]]$post) plotly_mixturegram(y, pmbs, method = "pca", all.n = TRUE, id.con = NULL, score = 1, title = "Mixturegram (Well-Separated Data)") ## End(Not run)
This is an updater version of plot.npEM
function by using plotly
. For technical details, please refer to plot.npEM
.
plotly_npEM(x, blocks = NULL, hist=TRUE, addlegend=TRUE, scale = TRUE, title=NULL, breaks="Sturges", dens.col = NULL, newplot=TRUE, ylim = NULL , col.hist = "#1f77b4", width = 3, title.x = 0.5 , title.y = 0.95, title.size = 15, xlab = "X" , xlab.size = 15 , xtick.size = 15, ylab = "Density" , ylab.size = 15 , ytick.size = 15, legend.text = "Posteriors", legend.text.size = 15, legend.size = 15) plotly_spEM(x, blocks = NULL, hist=TRUE, addlegend=TRUE, scale = TRUE, title=NULL, breaks="Sturges", dens.col = NULL, newplot=TRUE, ylim = NULL , col.hist = "#1f77b4", width = 3, title.x = 0.5 , title.y = 0.95, title.size = 15, xlab = "X" , xlab.size = 15 , xtick.size = 15, ylab = "Density" , ylab.size = 15 , ytick.size = 15, legend.text = "Posteriors", legend.text.size = 15, legend.size = 15)
plotly_npEM(x, blocks = NULL, hist=TRUE, addlegend=TRUE, scale = TRUE, title=NULL, breaks="Sturges", dens.col = NULL, newplot=TRUE, ylim = NULL , col.hist = "#1f77b4", width = 3, title.x = 0.5 , title.y = 0.95, title.size = 15, xlab = "X" , xlab.size = 15 , xtick.size = 15, ylab = "Density" , ylab.size = 15 , ytick.size = 15, legend.text = "Posteriors", legend.text.size = 15, legend.size = 15) plotly_spEM(x, blocks = NULL, hist=TRUE, addlegend=TRUE, scale = TRUE, title=NULL, breaks="Sturges", dens.col = NULL, newplot=TRUE, ylim = NULL , col.hist = "#1f77b4", width = 3, title.x = 0.5 , title.y = 0.95, title.size = 15, xlab = "X" , xlab.size = 15 , xtick.size = 15, ylab = "Density" , ylab.size = 15 , ytick.size = 15, legend.text = "Posteriors", legend.text.size = 15, legend.size = 15)
x |
An object of class |
blocks |
Blocks (of repeated measures coordinates) to plot; not relevant for univariate case. Default is to plot all blocks. |
hist |
If TRUE, superimpose density estimate plots on a histogram of the data |
addlegend |
If TRUE, adds legend to the plot. |
scale |
If TRUE, scale each density estimate by its corresponding estimated mixing proportion, so that the total area under all densities equals 1 and the densities plotted may be added to produce an estimate of the mixture density. When FALSE, each density curve has area 1 in the plot. |
title |
Alternative vector of main titles for plots (recycled as many times as needed) |
breaks |
Passed directly to the |
ylim |
|
dens.col |
Color values to use for the individual component density
functions, repeated as necessary. Default value is |
newplot |
If TRUE, creates a new plot. |
col.hist |
Color of the histogram to plot. |
width |
Line width. |
title.size |
Size of the main title. |
title.x |
Horsizontal position of the main title. |
title.y |
Vertical posotion of the main title. |
xlab |
Label of X-axis. |
xlab.size |
Size of the lable of X-axis. |
xtick.size |
Size of tick lables of X-axis. |
ylab |
Label of Y-axis. |
ylab.size |
Size of the lable of Y-axis. |
ytick.size |
Size of tick lables of Y-axis. |
legend.text |
Title of legend. |
legend.text.size |
Size of the legend title. |
legend.size |
Size of legend. |
plotly_npEM
returns a list with two elements:
x |
List of matrices. The |
y |
|
npEM
, density.npEM
, spEMsymloc
,
plotseq.npEM
, plot.npEM
## Not run: ## Examine and plot water-level task data set. ## First, try a 3-component solution where no two coordinates are ## assumed i.d. data(Waterdata) set.seed(100) a <- npEM(Waterdata[,3:10], 3, bw=4) plotly_npEM(a , newplot = FALSE) ## Next, same thing but pairing clock angles that are directly opposite one ## another (1:00 with 7:00, 2:00 with 8:00, etc.) b <- npEM(Waterdata[,3:10], 3, blockid=c(4,3,2,1,3,4,1,2), bw=4) plotly_npEM(b , newplot = FALSE) ## End(Not run)
## Not run: ## Examine and plot water-level task data set. ## First, try a 3-component solution where no two coordinates are ## assumed i.d. data(Waterdata) set.seed(100) a <- npEM(Waterdata[,3:10], 3, bw=4) plotly_npEM(a , newplot = FALSE) ## Next, same thing but pairing clock angles that are directly opposite one ## another (1:00 with 7:00, 2:00 with 8:00, etc.) b <- npEM(Waterdata[,3:10], 3, blockid=c(4,3,2,1,3,4,1,2), bw=4) plotly_npEM(b , newplot = FALSE) ## End(Not run)
plotly
This is an updated version of plotseq.npEM
. For technical details, please refer to plotseq.npEM
.
plotly_seq.npEM (x, col = '#1f77b4' , width = 6, xlab = "Iteration" , xlab.size = 15 , xtick.size = 15, ylab.size = 15 , ytick.size = 15, title.size = 15 , title.x = 0.5 , title.y = 0.95)
plotly_seq.npEM (x, col = '#1f77b4' , width = 6, xlab = "Iteration" , xlab.size = 15 , xtick.size = 15, ylab.size = 15 , ytick.size = 15, title.size = 15 , title.x = 0.5 , title.y = 0.95)
x |
an object of class |
col |
Line color. |
width |
Line width. |
title |
Text of the main title. |
title.size |
Size of the main title. |
title.x |
Horsizontal position of the main title. |
title.y |
Vertical posotion of the main title. |
xlab |
Label of X-axis. |
xlab.size |
Size of the lable of X-axis. |
xtick.size |
Size of tick lables of X-axis. |
ylab.size |
Size of the lable of Y-axis. |
ytick.size |
Size of tick lables of Y-axis. |
plotly_seq.npEM
returns a figure with one plot for each component
proportion, and, in the case of spEMsymloc
, one plot for each
component mean.
Didier Chauveau
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics (to appear).
Bordes, L., Chauveau, D., and Vandekerkhove, P. (2007), An EM algorithm for a semiparametric mixture model, Computational Statistics and Data Analysis, 51: 5429-5443.
plot.npEM
, rnormmix
,
npEM
, spEMsymloc
, plotly_seq.npEM
## Not run: ## Examine and plot water-level task data set. ## First, try a 3-component solution where no two coordinates are ## assumed i.d. data(Waterdata) set.seed(100) ## Not run: a <- npEM(Waterdata[,3:10], mu0=3, bw=4) # Assume indep but not iid plotly_seq.npEM(a) ## End(Not run)
## Not run: ## Examine and plot water-level task data set. ## First, try a 3-component solution where no two coordinates are ## assumed i.d. data(Waterdata) set.seed(100) ## Not run: a <- npEM(Waterdata[,3:10], mu0=3, bw=4) # Assume indep but not iid plotly_seq.npEM(a) ## End(Not run)
spEMsymlocN01
using plotly
.This is an updated version of plotlspEMN01
function by using plotly
. For technical details, please refer to plot.spEMN01
.
plotly_spEMN01(x, bw=x$bandwidth, knownpdf=dnorm, add.plot=FALSE, width = 3 , col.dens = NULL, col.hist = '#1f77b4', title = NULL , title.size = 15 , title.x = 0.5 , title.y = 0.95, xlab = "t" , xlab.size = 15 , xtick.size = 15, ylab = "Density" , ylab.size = 15 , ytick.size = 15, legend.text = "Densities" , legend.text.size = 15 , legend.size = 15)
plotly_spEMN01(x, bw=x$bandwidth, knownpdf=dnorm, add.plot=FALSE, width = 3 , col.dens = NULL, col.hist = '#1f77b4', title = NULL , title.size = 15 , title.x = 0.5 , title.y = 0.95, xlab = "t" , xlab.size = 15 , xtick.size = 15, ylab = "Density" , ylab.size = 15 , ytick.size = 15, legend.text = "Densities" , legend.text.size = 15 , legend.size = 15)
x |
An object of class "spEMN01" as returned by spEMsymlocN01 |
bw |
Bandwidth for weighted kernel density estimation. |
knownpdf |
The known density of component 1, default to |
add.plot |
Set to TRUE to add to an existing plot. |
width |
Line width. |
col.dens |
Color of density lines. Number of colors specified needs to be consistent with number of components. |
col.hist |
Color of histogram. |
title |
Text of the main title. |
title.size |
Size of the main title. |
title.x |
Horsizontal position of the main title. |
title.y |
Vertical posotion of the main title. |
xlab |
Label of X-axis. |
xlab.size |
Size of the lable of X-axis. |
xtick.size |
Size of tick lables of X-axis. |
ylab |
Label of Y-axis. |
ylab.size |
Size of the lable of Y-axis. |
ytick.size |
Size of tick lables of Y-axis. |
legend.text |
Title of legend. |
legend.text.size |
Size of the legend title. |
legend.size |
Size of legend. |
A plot of the density of the mixture
Didier Chauveau
Chauveau, D., Saby, N., Orton, T. G., Lemercier B., Walter, C. and Arrouys, D. Large-scale simultaneous hypothesis testing in soil monitoring: A semi-parametric mixture approach, preprint (2013).
## Probit transform of p-values ## from a Beta-Uniform mixture model ## comparion of parametric and semiparametric EM fit ## Note: in actual situations n=thousands set.seed(50) n=300 # nb of multiple tests m=2 # 2 mixture components a=c(1,0.1); b=c(1,1); lambda=c(0.6,0.4) # parameters z=sample(1:m, n, rep=TRUE, prob = lambda) p <- rbeta(n, shape1 = a[z], shape2 = b[z]) # p-values o <- order(p) cpd <- cbind(z,p)[o,] # sorted complete data, z=1 if H0, 2 if H1 p <- cpd[,2] # sorted p-values y <- qnorm(p) # probit transform of the pvalues # gaussian EM fit with component 1 constrained to N(0,1) s1 <- normalmixEM(y, mu=c(0,-4), mean.constr = c(0,NA), sd.constr = c(1,NA)) s2 <- spEMsymlocN01(y, mu0 = c(0,-3)) # spEM with N(0,1) fit plotly_spEMN01(s2 , add.plot = FALSE)
## Probit transform of p-values ## from a Beta-Uniform mixture model ## comparion of parametric and semiparametric EM fit ## Note: in actual situations n=thousands set.seed(50) n=300 # nb of multiple tests m=2 # 2 mixture components a=c(1,0.1); b=c(1,1); lambda=c(0.6,0.4) # parameters z=sample(1:m, n, rep=TRUE, prob = lambda) p <- rbeta(n, shape1 = a[z], shape2 = b[z]) # p-values o <- order(p) cpd <- cbind(z,p)[o,] # sorted complete data, z=1 if H0, 2 if H1 p <- cpd[,2] # sorted p-values y <- qnorm(p) # probit transform of the pvalues # gaussian EM fit with component 1 constrained to N(0,1) s1 <- normalmixEM(y, mu=c(0,-4), mean.constr = c(0,NA), sd.constr = c(1,NA)) s2 <- spEMsymlocN01(y, mu0 = c(0,-3)) # spEM with N(0,1) fit plotly_spEMN01(s2 , add.plot = FALSE)
plotly
.
This is an updated version of plotspRMM
function. For technical details, please refer to plotspRMM.
plotly_spRMM(sem, tmax = NULL, width = 3 , col = '#1f77b4', cex = 3, title.size = 15 , title.x = 0.5 , title.y = 0.95, xlab.size = 15 , xtick.size=15 , ylab.size = 15 , ytick.size=15)
plotly_spRMM(sem, tmax = NULL, width = 3 , col = '#1f77b4', cex = 3, title.size = 15 , title.x = 0.5 , title.y = 0.95, xlab.size = 15 , xtick.size=15 , ylab.size = 15 , ytick.size=15)
sem |
An object returned by |
tmax |
The max time for |
width |
Width of lines. |
col |
Color of lines. |
cex |
Size of dots. |
title.size |
Size of the main title. |
title.x |
Horizontal position of the main title. |
title.y |
Vertical position of the main title. |
xlab.size |
Size of the label of X-axis. |
xtick.size |
Size of the tick of X-axis. |
ylab.size |
Size of the label of Y-axis. |
ytick.size |
Size of the tick of Y-axis. |
The four plots returned.
Didier Chauveau
Bordes, L., and Chauveau, D. (2016), Stochastic EM algorithms for parametric and semiparametric mixture models for right-censored lifetime data, Computational Statistics, Volume 31, Issue 4, pages 1513-1538. https://link.springer.com/article/10.1007/s00180-016-0661-7
Related functions: spRMM_SEM
, plotspRMM
.
Other models and algorithms for censored lifetime data
(name convention is model_algorithm):
expRMM_EM
,
weibullRMM_SEM
.
## Not run: n=500 # sample size m=2 # nb components lambda=c(0.4, 0.6) # parameters meanlog=3; sdlog=0.5; scale=0.1 set.seed(12) # simulate a scaled mixture of lognormals x <- rlnormscalemix(n, lambda, meanlog, sdlog, scale) cs=runif(n,20,max(x)+400) # Censoring (uniform) and incomplete data t <- apply(cbind(x,cs),1,min) d <- 1*(x <= cs) tauxc <- 100*round( 1-mean(d),3) cat(tauxc, "percents of data censored.\n") c0 <- c(25, 180) # data-driven initial centers (visible modes) sc0 <- 25/180 # and scaling s <- spRMM_SEM(t, d, scaling = sc0, centers = c0, bw = 15, maxit = 100) plotly_spRMM(s) # default summary(s) # S3 method for class "spRMM" ## End(Not run)
## Not run: n=500 # sample size m=2 # nb components lambda=c(0.4, 0.6) # parameters meanlog=3; sdlog=0.5; scale=0.1 set.seed(12) # simulate a scaled mixture of lognormals x <- rlnormscalemix(n, lambda, meanlog, sdlog, scale) cs=runif(n,20,max(x)+400) # Censoring (uniform) and incomplete data t <- apply(cbind(x,cs),1,min) d <- 1*(x <= cs) tauxc <- 100*round( 1-mean(d),3) cat(tauxc, "percents of data censored.\n") c0 <- c(25, 180) # data-driven initial centers (visible modes) sc0 <- 25/180 # and scaling s <- spRMM_SEM(t, d, scaling = sc0, centers = c0, bw = 15, maxit = 100) plotly_spRMM(s) # default summary(s) # S3 method for class "spRMM" ## End(Not run)
plotly
This is an updated version of plotweibullRMM
function by using plotly
function. For technical details, please refer to plotweibullRMM
.
plotly_weibullRMM(a, title=NULL, rowstyle=TRUE, subtitle=NULL, width = 3 , col = NULL , title.size = 15 , title.x = 0.5 , title.y = 0.95, xlab = "Iterations" , xlab.size = 15 , xtick.size = 15, ylab = "Estimates" , ylab.size = 15 , ytick.size = 15, legend.size = 15)
plotly_weibullRMM(a, title=NULL, rowstyle=TRUE, subtitle=NULL, width = 3 , col = NULL , title.size = 15 , title.x = 0.5 , title.y = 0.95, xlab = "Iterations" , xlab.size = 15 , xtick.size = 15, ylab = "Estimates" , ylab.size = 15 , ytick.size = 15, legend.size = 15)
a |
An object returned by |
title |
The title of the plot, set to some default value if |
rowstyle |
Window organization, for plots in rows (the default) or columns. |
subtitle |
A subtitle for the plot, set to some default value if |
width |
Line width. |
col |
Color of lines. Number of colors specified needs to be consistent with number of components. |
title.size |
Size of the main title. |
title.x |
Horsizontal position of the main title. |
title.y |
Vertical posotion of the main title. |
xlab |
Label of X-axis. |
xlab.size |
Size of the lable of X-axis. |
xtick.size |
Size of tick lables of X-axis. |
ylab |
Label of Y-axis. |
ylab.size |
Size of the lable of Y-axis. |
ytick.size |
Size of tick lables of Y-axis. |
legend.size |
Size of legend. |
The plot returned.
Didier Chauveau
Bordes, L., and Chauveau, D. (2016), Stochastic EM algorithms for parametric and semiparametric mixture models for right-censored lifetime data, Computational Statistics, Volume 31, Issue 4, pages 1513-1538. https://link.springer.com/article/10.1007/s00180-016-0661-7
Related functions:
weibullRMM_SEM
, summary.mixEM
, plotweibullRMM
.
Other models and algorithms for censored lifetime data
(name convention is model_algorithm):
expRMM_EM
,
spRMM_SEM
.
n = 500 # sample size m = 2 # nb components lambda=c(0.4, 0.6) shape <- c(0.5,5); scale <- c(1,20) # model parameters set.seed(321) x <- rweibullmix(n, lambda, shape, scale) # iid ~ weibull mixture cs=runif(n,0,max(x)+10) # iid censoring times t <- apply(cbind(x,cs),1,min) # censored observations d <- 1*(x <= cs) # censoring indicator ## set arbitrary or "reasonable" (e.g., data-driven) initial values l0 <- rep(1/m,m); sh0 <- c(1, 2); sc0 <- c(2,10) # Stochastic EM algorithm a <- weibullRMM_SEM(t, d, lambda = l0, shape = sh0, scale = sc0, maxit = 200) summary(a) # Parameters estimates etc plotly_weibullRMM(a , legend.size = 20) # plot of St-EM sequences
n = 500 # sample size m = 2 # nb components lambda=c(0.4, 0.6) shape <- c(0.5,5); scale <- c(1,20) # model parameters set.seed(321) x <- rweibullmix(n, lambda, shape, scale) # iid ~ weibull mixture cs=runif(n,0,max(x)+10) # iid censoring times t <- apply(cbind(x,cs),1,min) # censored observations d <- 1*(x <= cs) # censoring indicator ## set arbitrary or "reasonable" (e.g., data-driven) initial values l0 <- rep(1/m,m); sh0 <- c(1, 2); sc0 <- c(2,10) # Stochastic EM algorithm a <- weibullRMM_SEM(t, d, lambda = l0, shape = sh0, scale = sc0, maxit = 200) summary(a) # Parameters estimates etc plotly_weibullRMM(a , legend.size = 20) # plot of St-EM sequences
Returns plots of the sequences of scalar parameter
estimates along iterations from an object of class npEM
.
## S3 method for class 'npEM' plotseq(x, ...)
## S3 method for class 'npEM' plotseq(x, ...)
x |
an object of class |
... |
further parameters that are passed to |
plotseq.npEM
returns a figure with one plot for each component
proportion, and, in the case of spEMsymloc
, one plot for each
component mean.
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics (to appear).
Bordes, L., Chauveau, D., and Vandekerkhove, P. (2007), An EM algorithm for a semiparametric mixture model, Computational Statistics and Data Analysis, 51: 5429-5443.
plot.npEM
, rnormmix
,
npEM
, spEMsymloc
## Example from a normal location mixture n <- 200 set.seed(100) lambda <- c(1/3,2/3) mu <- c(0, 4); sigma<-rep(1, 2) x <- rnormmix(n, lambda, mu, sigma) b <- spEMsymloc(x, mu0=c(-1, 2), stochastic=FALSE) plotseq(b) bst <- spEMsymloc(x, mu0=c(-1, 2), stochastic=TRUE) plotseq(bst)
## Example from a normal location mixture n <- 200 set.seed(100) lambda <- c(1/3,2/3) mu <- c(0, 4); sigma<-rep(1, 2) x <- rnormmix(n, lambda, mu, sigma) b <- spEMsymloc(x, mu0=c(-1, 2), stochastic=FALSE) plotseq(b) bst <- spEMsymloc(x, mu0=c(-1, 2), stochastic=TRUE) plotseq(bst)
Function for plotting various results from an object returned by spRMM_SEM
, a Stochastic EM algorithm for semiparametric scaled
mixture of randomly right censored lifetime data.
Four plots of sequences of estimates along iterations, survival and density estimates
(see reference below).
plotspRMM(sem, tmax = NULL)
plotspRMM(sem, tmax = NULL)
sem |
An object returned by |
tmax |
The max time for |
The four plots returned
Didier Chauveau
Bordes, L., and Chauveau, D. (2016), Stochastic EM algorithms for parametric and semiparametric mixture models for right-censored lifetime data, Computational Statistics, Volume 31, Issue 4, pages 1513-1538. https://link.springer.com/article/10.1007/s00180-016-0661-7
Related functions: spRMM_SEM
.
Other models and algorithms for censored lifetime data
(name convention is model_algorithm):
expRMM_EM
,
weibullRMM_SEM
.
# See example(spRMM_SEM)
# See example(spRMM_SEM)
Function for plotting sequences of estimates along iterations, from an object returned by weibullRMM_SEM
, a Stochastic EM algorithm for mixture of Weibull
distributions with randomly right censored data (see reference below).
plotweibullRMM(a, title = NULL, rowstyle = TRUE, subtitle = NULL, ...)
plotweibullRMM(a, title = NULL, rowstyle = TRUE, subtitle = NULL, ...)
a |
An object returned by |
title |
The title of the plot, set to some default value if |
rowstyle |
Window organization, for plots in rows (the default) or columns. |
subtitle |
A subtitle for the plot, set to some default value if |
... |
Other parameters (such as |
The plot returned
Didier Chauveau
Bordes, L., and Chauveau, D. (2016), Stochastic EM algorithms for parametric and semiparametric mixture models for right-censored lifetime data, Computational Statistics, Volume 31, Issue 4, pages 1513-1538. https://link.springer.com/article/10.1007/s00180-016-0661-7
Related functions:
weibullRMM_SEM
, summary.mixEM
.
Other models and algorithms for censored lifetime data
(name convention is model_algorithm):
expRMM_EM
,
spRMM_SEM
.
n = 500 # sample size m = 2 # nb components lambda=c(0.4, 0.6) shape <- c(0.5,5); scale <- c(1,20) # model parameters set.seed(321) x <- rweibullmix(n, lambda, shape, scale) # iid ~ weibull mixture cs=runif(n,0,max(x)+10) # iid censoring times t <- apply(cbind(x,cs),1,min) # censored observations d <- 1*(x <= cs) # censoring indicator ## set arbitrary or "reasonable" (e.g., data-driven) initial values l0 <- rep(1/m,m); sh0 <- c(1, 2); sc0 <- c(2,10) # Stochastic EM algorithm a <- weibullRMM_SEM(t, d, lambda = l0, shape = sh0, scale = sc0, maxit = 200) summary(a) # Parameters estimates etc plotweibullRMM(a) # default plot of St-EM sequences
n = 500 # sample size m = 2 # nb components lambda=c(0.4, 0.6) shape <- c(0.5,5); scale <- c(1,20) # model parameters set.seed(321) x <- rweibullmix(n, lambda, shape, scale) # iid ~ weibull mixture cs=runif(n,0,max(x)+10) # iid censoring times t <- apply(cbind(x,cs),1,min) # censored observations d <- 1*(x <= cs) # censoring indicator ## set arbitrary or "reasonable" (e.g., data-driven) initial values l0 <- rep(1/m,m); sh0 <- c(1, 2); sc0 <- c(2,10) # Stochastic EM algorithm a <- weibullRMM_SEM(t, d, lambda = l0, shape = sh0, scale = sc0, maxit = 200) summary(a) # Parameters estimates etc plotweibullRMM(a) # default plot of St-EM sequences
Returns EM algorithm output for mixtures of Poisson regressions with arbitrarily many components.
poisregmixEM(y, x, lambda = NULL, beta = NULL, k = 2, addintercept = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
poisregmixEM(y, x, lambda = NULL, beta = NULL, k = 2, addintercept = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
y |
An n-vector of response values. |
x |
An nxp matrix of predictors. See |
lambda |
Initial value of mixing proportions. Entries should sum to
1. This determines number of components. If NULL, then |
beta |
Initial value of |
k |
Number of components. Ignored unless |
addintercept |
If TRUE, a column of ones is appended to the x matrix before the value of p is calculated. |
epsilon |
The convergence criterion. |
maxit |
The maximum number of iterations. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
poisregmixEM
returns a list of class mixEM
with items:
x |
The predictor values. |
y |
The response values. |
lambda |
The final mixing proportions. |
beta |
The final Poisson regression coefficients. |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
ft |
A character vector giving the name of the function. |
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley and Sons, Inc.
Wang, P., Puterman, M. L., Cockburn, I. and Le, N. (1996) Mixed Poisson Regression Models with Covariate Dependent Rates, Biometrics, 52(2), 381–400.
## EM output for data generated from a 2-component model. set.seed(100) beta <- matrix(c(1, .5, .7, -.8), 2, 2) x <- runif(50, 0, 10) xbeta <- cbind(1, x)%*%beta w <- rbinom(50, 1, .5) y <- w*rpois(50, exp(xbeta[, 1]))+(1-w)*rpois(50, exp(xbeta[, 2])) out <- poisregmixEM(y, x, verb = TRUE, epsilon = 1e-03) out
## EM output for data generated from a 2-component model. set.seed(100) beta <- matrix(c(1, .5, .7, -.8), 2, 2) x <- runif(50, 0, 10) xbeta <- cbind(1, x)%*%beta w <- rbinom(50, 1, .5) y <- w*rpois(50, exp(xbeta[, 1]))+(1-w)*rpois(50, exp(xbeta[, 2])) out <- poisregmixEM(y, x, verb = TRUE, epsilon = 1e-03) out
Returns a 2x2 matrix of plots summarizing the posterior intercept and slope terms in a mixture of random effects regression with arbitrarily many components.
post.beta(y, x, p.beta, p.z)
post.beta(y, x, p.beta, p.z)
y |
A list of N response trajectories with (possibly) varying dimensions of
length |
x |
A list of N predictor values of dimension |
p.beta |
A list of N 2xk matrices giving the posterior intercept and slope values from the output of an EM algorithm. |
p.z |
An Nxk matrix of posterior membership probabilities from the output of an EM algorithm. |
This is primarily used for within plot.mixEM
.
post.beta
returns a 2x2 matrix of plots giving:
(1 , 1)
|
The data plotted on the x-y axes with all posterior regression lines. |
(1 , 2)
|
The data plotted on the x-y axes with most probable posterior regression lines. |
(2 , 1)
|
A beta-space plot of all posterior regression coefficients. |
(1 , 1)
|
A beta-space plot of most probable posterior regression coefficients. |
Young, D. S. and Hunter, D. R. (2015) Random Effects Regression Mixtures for Analyzing Infant Habituation, Journal of Applied Statistics, 42(7), 1421–1441.
## Not run: ## EM output for simulated data from 2-component mixture of random effects. data(RanEffdata) set.seed(100) x <- lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 2:3], ncol = 2)) x <- x[1:20] y <- lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 1], ncol = 1)) y <- y[1:20] lambda <- c(0.45, 0.55) mu <- matrix(c(0, 4, 100, 12), 2, 2) sigma <- 2 R <- list(diag(1, 2), diag(1, 2)) em.out <- regmixEM.mixed(y, x, sigma = sigma, arb.sigma = FALSE, lambda = lambda, mu = mu, R = R, addintercept.random = FALSE, epsilon = 1e-02, verb = TRUE) ## Obtaining the 2x2 matrix of plots. x.ran <- lapply(1:length(x), function(i) x[[i]][, 2]) p.beta <- em.out$posterior.beta p.z <- em.out$posterior.z post.beta(y, x.ran, p.beta = p.beta, p.z = p.z) ## End(Not run)
## Not run: ## EM output for simulated data from 2-component mixture of random effects. data(RanEffdata) set.seed(100) x <- lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 2:3], ncol = 2)) x <- x[1:20] y <- lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 1], ncol = 1)) y <- y[1:20] lambda <- c(0.45, 0.55) mu <- matrix(c(0, 4, 100, 12), 2, 2) sigma <- 2 R <- list(diag(1, 2), diag(1, 2)) em.out <- regmixEM.mixed(y, x, sigma = sigma, arb.sigma = FALSE, lambda = lambda, mu = mu, R = R, addintercept.random = FALSE, epsilon = 1e-02, verb = TRUE) ## Obtaining the 2x2 matrix of plots. x.ran <- lapply(1:length(x), function(i) x[[i]][, 2]) p.beta <- em.out$posterior.beta p.z <- em.out$posterior.z post.beta(y, x.ran, p.beta = p.beta, p.z = p.z) ## End(Not run)
print
method for class mvnpEM
.
## S3 method for class 'mvnpEM' print(x, ...)
## S3 method for class 'mvnpEM' print(x, ...)
x |
an object of class |
... |
Additional arguments to |
print.mvnpEM
prints the elements of an mvnpEM
object
without printing the data or the posterior probabilities.
(These may still be accessed as x$data
and x$posteriors
.)
print.mvnpEM
returns (invisibly) the full value of x
itself,
including the data
and posteriors
elements.
mvnpEM
,
plot.mvnpEM
summary.mvnpEM
# Example as in Chauveau and Hoang (2015) with 6 coordinates ## Not run: m=2; r=6; blockid <-c(1,1,2,2,3,3) # 3 bivariate blocks # generate some data x ... a <- mvnpEM(x, mu0=2, blockid, samebw=F) # adaptive bandwidth print(a) ## End(Not run)
# Example as in Chauveau and Hoang (2015) with 6 coordinates ## Not run: m=2; r=6; blockid <-c(1,1,2,2,3,3) # 3 bivariate blocks # generate some data x ... a <- mvnpEM(x, mu0=2, blockid, samebw=F) # adaptive bandwidth print(a) ## End(Not run)
print
method for class npEM
.
## S3 method for class 'npEM' print(x, ...)
## S3 method for class 'npEM' print(x, ...)
x |
an object of class |
... |
Additional arguments to |
print.npEM
prints the elements of an npEM
object
without printing the data or the posterior probabilities.
(These may still be accessed as x$data
and x$posteriors
.)
print.npEM
returns (invisibly) the full value of x
itself,
including the data
and posteriors
elements.
data(Waterdata) set.seed(100) ## Not run: npEM(Waterdata[,3:10], 3, bw=4, verb=FALSE) # Assume indep but not iid
data(Waterdata) set.seed(100) ## Not run: npEM(Waterdata[,3:10], 3, bw=4, verb=FALSE) # Assume indep but not iid
This data set was generated from a 2-component mixture of regressions with random effects.
data(RanEffdata)
data(RanEffdata)
This data set consists of a list with 100 25x3 matrices. The first column is the response variable, the second column is a column of 1's and the last column is the predictor variable.
Produce a confidence or credible region for regression lines based on a sample of bootstrap beta values or posterior beta values. The beta parameters are the intercept and slope from a simple linear regression.
regcr(beta, x, em.beta = NULL, em.sigma = NULL, alpha = .05, nonparametric = FALSE, plot = FALSE, xyaxes = TRUE, ...)
regcr(beta, x, em.beta = NULL, em.sigma = NULL, alpha = .05, nonparametric = FALSE, plot = FALSE, xyaxes = TRUE, ...)
beta |
An nx2 matrix of regression parameters. The first column gives the intercepts and the second column gives the slopes. |
x |
An n-vector of the predictor variable which is necessary when nonparametric = TRUE. |
em.beta |
The estimates for beta required when obtaining confidence regions. This is required for performing the standardization necessary when obtaining nonparametric confidence regions. |
em.sigma |
The estimates for the regression standard deviation required when obtaining confidence regions. This is required for performing the standardization necessary when obtaining nonparametric confidence regions. |
alpha |
The proportion of the beta sample to remove. In other words, 1-alpha is the level of the credible region. |
nonparametric |
If nonparametric = TRUE, then the region is based on the convex hull of the remaining beta after trimming, which is accomplished using a data depth technique. If nonparametric = FALSE, then the region is based on the asymptotic normal approximation. |
plot |
If plot = TRUE, lines are added to the existing plot. The type of plot created depends on the value of xyaxes. |
xyaxes |
If xyaxes = TRUE and plot = TRUE, then a confidence or credible region for the regression lines is plotted on the x-y axes, presumably overlaid on a scatterplot of the data. If xyaxes = FALSE and plot = TRUE, the (convex) credible region for the regression line is plotted on the beta, or intercept-slope, axes, presumably overlaid on a scatterplot of beta. |
... |
Graphical parameters passed to |
regcr
returns a list containing the following items:
boundary |
A matrix of points in beta, or intercept-slope, space arrayed along the boundary of the confidence or credible region. |
upper |
A matrix of points in x-y space arrayed along the upper confidence or credible limit for the regression line. |
lower |
A matrix of points in x-y space arrayed along the lower confidence or credible limit for the regression line. |
## Nonparametric credible regions fit to NOdata. data(NOdata) attach(NOdata) set.seed(100) beta <- matrix(c(1.3, -0.1, 0.6, 0.1), 2, 2) sigma <- c(.02, .05) MH.out <- regmixMH(Equivalence, NO, beta = beta, s = sigma, sampsize = 2500, omega = .0013) attach(data.frame(MH.out$theta)) beta.c1 <- cbind(beta0.1[2400:2499], beta1.1[2400:2499]) beta.c2 <- cbind(beta0.2[2400:2499], beta1.2[2400:2499]) plot(NO, Equivalence) regcr(beta.c1, x = NO, nonparametric = TRUE, plot = TRUE, col = 2) regcr(beta.c2, x = NO, nonparametric = TRUE, plot = TRUE, col = 3)
## Nonparametric credible regions fit to NOdata. data(NOdata) attach(NOdata) set.seed(100) beta <- matrix(c(1.3, -0.1, 0.6, 0.1), 2, 2) sigma <- c(.02, .05) MH.out <- regmixMH(Equivalence, NO, beta = beta, s = sigma, sampsize = 2500, omega = .0013) attach(data.frame(MH.out$theta)) beta.c1 <- cbind(beta0.1[2400:2499], beta1.1[2400:2499]) beta.c2 <- cbind(beta0.2[2400:2499], beta1.2[2400:2499]) plot(NO, Equivalence) regcr(beta.c1, x = NO, nonparametric = TRUE, plot = TRUE, col = 2) regcr(beta.c2, x = NO, nonparametric = TRUE, plot = TRUE, col = 3)
Returns EM algorithm output for mixtures of multiple regressions with arbitrarily many components.
regmixEM(y, x, lambda = NULL, beta = NULL, sigma = NULL, k = 2, addintercept = TRUE, arbmean = TRUE, arbvar = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
regmixEM(y, x, lambda = NULL, beta = NULL, sigma = NULL, k = 2, addintercept = TRUE, arbmean = TRUE, arbvar = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
y |
An n-vector of response values. |
x |
An nxp matrix of predictors. See |
lambda |
Initial value of mixing proportions. Entries should sum to
1. This determines number of components. If NULL, then |
beta |
Initial value of |
sigma |
A vector of standard deviations. If NULL, then 1/ |
k |
Number of components. Ignored unless all of |
addintercept |
If TRUE, a column of ones is appended to the x matrix before the value of p is calculated. |
arbmean |
If TRUE, each mixture component is assumed to have a different set of regression coefficients
(i.e., the |
arbvar |
If TRUE, each mixture component is assumed to have a different |
epsilon |
The convergence criterion. |
maxit |
The maximum number of iterations. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
regmixEM
returns a list of class mixEM
with items:
x |
The set of predictors (which includes a column of 1's if |
y |
The response values. |
lambda |
The final mixing proportions. |
beta |
The final regression coefficients. |
sigma |
The final standard deviations. If |
scale |
If |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
ft |
A character vector giving the name of the function. |
de Veaux, R. D. (1989), Mixtures of Linear Regressions, Computational Statistics and Data Analysis 8, 227-245.
Hurn, M., Justel, A. and Robert, C. P. (2003) Estimating Mixtures of Regressions, Journal of Computational and Graphical Statistics 12(1), 55–79.
McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models, John Wiley and Sons, Inc.
## EM output for NOdata. data(NOdata) attach(NOdata) set.seed(100) em.out <- regmixEM(Equivalence, NO, verb = TRUE, epsilon = 1e-04) em.out[3:6]
## EM output for NOdata. data(NOdata) attach(NOdata) set.seed(100) em.out <- regmixEM(Equivalence, NO, verb = TRUE, epsilon = 1e-04) em.out[3:6]
Returns output for one step of an EM algorithm output for mixtures of multiple regressions where the mixing proportions are estimated locally.
regmixEM.lambda(y, x, lambda = NULL, beta = NULL, sigma = NULL, k = 2, addintercept = TRUE, arbmean = TRUE, arbvar = TRUE, epsilon = 1e-8, maxit = 10000, verb = FALSE)
regmixEM.lambda(y, x, lambda = NULL, beta = NULL, sigma = NULL, k = 2, addintercept = TRUE, arbmean = TRUE, arbvar = TRUE, epsilon = 1e-8, maxit = 10000, verb = FALSE)
y |
An n-vector of response values. |
x |
An nxp matrix of predictors. See |
lambda |
An nxk matrix of initial local values of mixing proportions.
Entries should sum to 1. This determines number of components.
If NULL, then |
beta |
Initial value of |
sigma |
k-vector of initial global values of standard deviations.
If NULL, then |
k |
The number of components. Ignored unless all of |
addintercept |
If TRUE, a column of ones is appended to the x matrix before the value of p is calculated. |
arbmean |
If TRUE, each mixture component is assumed to have a different set of regression coefficients
(i.e., the |
arbvar |
If TRUE, each mixture component is assumed to have a different |
epsilon |
The convergence criterion. |
maxit |
The maximum number of iterations. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
Primarily used within regmixEM.loc
.
regmixEM.lambda
returns a list of class mixEM
with items:
x |
The set of predictors (which includes a column of 1's if |
y |
The response values. |
lambda |
The inputted mixing proportions. |
beta |
The final regression coefficients. |
sigma |
The final standard deviations. If |
scale |
If |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
ft |
A character vector giving the name of the function. |
## Compare a 2-component and 3-component fit to NOdata. data(NOdata) attach(NOdata) set.seed(100) out1 <- regmixEM.lambda(Equivalence, NO) out2 <- regmixEM.lambda(Equivalence, NO, k = 3) c(out1$loglik, out2$loglik)
## Compare a 2-component and 3-component fit to NOdata. data(NOdata) attach(NOdata) set.seed(100) out1 <- regmixEM.lambda(Equivalence, NO) out2 <- regmixEM.lambda(Equivalence, NO, k = 3) c(out1$loglik, out2$loglik)
Iterative algorithm returning EM algorithm output for mixtures of multiple regressions where the mixing proportions are estimated locally.
regmixEM.loc(y, x, lambda = NULL, beta = NULL, sigma = NULL, k = 2, addintercept = TRUE, kern.l = c("Gaussian", "Beta", "Triangle", "Cosinus", "Optcosinus"), epsilon = 1e-08, maxit = 10000, kernl.g = 0, kernl.h = 1, verb = FALSE)
regmixEM.loc(y, x, lambda = NULL, beta = NULL, sigma = NULL, k = 2, addintercept = TRUE, kern.l = c("Gaussian", "Beta", "Triangle", "Cosinus", "Optcosinus"), epsilon = 1e-08, maxit = 10000, kernl.g = 0, kernl.h = 1, verb = FALSE)
y |
An n-vector of response values. |
x |
An nxp matrix of predictors. See |
lambda |
An nxk matrix of initial local values of mixing proportions.
Entries should sum to 1. This determines number of components.
If NULL, then |
beta |
Initial global values of |
sigma |
A k-vector of initial global values of standard deviations.
If NULL, then |
k |
Number of components. Ignored unless all of |
addintercept |
If TRUE, a column of ones is appended to the x matrix before the value of p is calculated. |
kern.l |
The type of kernel to use in the local estimation of |
epsilon |
The convergence criterion. |
maxit |
The maximum number of iterations. |
kernl.g |
A shape parameter required for the symmetric beta kernel for local estimation of |
kernl.h |
The bandwidth controlling the size of the window used in the local estimation of lambda around x. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
regmixEM.loc
returns a list of class mixEM
with items:
x |
The set of predictors (which includes a column of 1's if |
y |
The response values. |
lambda.x |
The final local mixing proportions. |
beta |
The final global regression coefficients. |
sigma |
The final global standard deviations. |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
ft |
A character vector giving the name of the function. |
## Compare a 2-component and 3-component fit to NOdata. data(NOdata) attach(NOdata) set.seed(100) out1 <- regmixEM.loc(Equivalence, NO, kernl.h = 2, epsilon = 1e-02, verb = TRUE) out2 <- regmixEM.loc(Equivalence, NO, kernl.h = 2, k = 3, epsilon = 1e-02, verb = TRUE) c(out1$loglik, out2$loglik)
## Compare a 2-component and 3-component fit to NOdata. data(NOdata) attach(NOdata) set.seed(100) out1 <- regmixEM.loc(Equivalence, NO, kernl.h = 2, epsilon = 1e-02, verb = TRUE) out2 <- regmixEM.loc(Equivalence, NO, kernl.h = 2, k = 3, epsilon = 1e-02, verb = TRUE) c(out1$loglik, out2$loglik)
Returns EM algorithm output for mixtures of multiple regressions with random effects and an option to incorporate fixed effects and/or AR(1) errors.
regmixEM.mixed(y, x, w = NULL, sigma = NULL, arb.sigma = TRUE, alpha = NULL, lambda = NULL, mu = NULL, rho = NULL, R = NULL, arb.R = TRUE, k = 2, ar.1 = FALSE, addintercept.fixed = FALSE, addintercept.random = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
regmixEM.mixed(y, x, w = NULL, sigma = NULL, arb.sigma = TRUE, alpha = NULL, lambda = NULL, mu = NULL, rho = NULL, R = NULL, arb.R = TRUE, k = 2, ar.1 = FALSE, addintercept.fixed = FALSE, addintercept.random = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
y |
A list of N response trajectories with (possibly) varying dimensions of
length |
x |
A list of N design matrices of dimensions |
w |
A list of N known explanatory variables having dimensions |
sigma |
A vector of standard deviations. If NULL, then |
arb.sigma |
If TRUE, then |
alpha |
A q-vector of unknown regression parameters for the fixed effects. If NULL and |
lambda |
Initial value of mixing proportions for the assumed mixture structure on the regression coefficients.
Entries should sum to 1. This determines number of components. If NULL, then |
mu |
A pxk matrix of the mean for the mixture components of the random regression coefficients. If NULL, then the columns
of |
rho |
An Nxk matrix giving initial values for the correlation term in an AR(1) process. If NULL, then these values are simulated from a uniform distribution on the interval (-1, 1). |
R |
A list of N pxp covariance matrices for the mixture components of the random regression coefficients. If NULL, then each matrix is random from a standard Wishart distribution according to a binning method done on the data. |
arb.R |
If TRUE, then |
k |
Number of components. Ignored unless |
ar.1 |
If TRUE, then an AR(1) process on the error terms is included. The default is FALSE. |
addintercept.fixed |
If TRUE, a column of ones is appended to the matrices in w. |
addintercept.random |
If TRUE, a column of ones is appended to the matrices in x before p is calculated. |
epsilon |
The convergence criterion. |
maxit |
The maximum number of iterations. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
regmixEM
returns a list of class mixEM
with items:
x |
The predictor values corresponding to the random effects. |
y |
The response values. |
w |
The predictor values corresponding to the (optional) fixed effects. |
lambda |
The final mixing proportions. |
mu |
The final mean vectors. |
R |
The final covariance matrices. |
sigma |
The final component error standard deviations. |
alpha |
The final regression coefficients for the fixed effects. |
rho |
The final error correlation values if an AR(1) process is included. |
loglik |
The final log-likelihood. |
posterior.z |
An Nxk matrix of posterior membership probabilities. |
posterior.beta |
A list of N pxk matrices giving the posterior regression coefficient values. |
all.loglik |
A vector of each iteration's log-likelihood. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
ft |
A character vector giving the name of the function. |
Xu, W. and Hedeker, D. (2001) A Random-Effects Mixture Model for Classifying Treatment Response in Longitudinal Clinical Trials, Journal of Biopharmaceutical Statistics, 11(4), 253–273.
Young, D. S. and Hunter, D. R. (2015) Random Effects Regression Mixtures for Analyzing Infant Habituation, Journal of Applied Statistics, 42(7), 1421–1441.
## EM output for simulated data from 2-component mixture of random effects. data(RanEffdata) set.seed(100) x <- lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 2:3], ncol = 2)) x <- x[1:20] y <- lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 1], ncol = 1)) y <- y[1:20] lambda <- c(0.45, 0.55) mu <- matrix(c(0, 4, 100, 12), 2, 2) sigma <- 2 R <- list(diag(1, 2), diag(1, 2)) em.out <- regmixEM.mixed(y, x, sigma = sigma, arb.sigma = FALSE, lambda = lambda, mu = mu, R = R, addintercept.random = FALSE, epsilon = 1e-02, verb = TRUE) em.out[4:10]
## EM output for simulated data from 2-component mixture of random effects. data(RanEffdata) set.seed(100) x <- lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 2:3], ncol = 2)) x <- x[1:20] y <- lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 1], ncol = 1)) y <- y[1:20] lambda <- c(0.45, 0.55) mu <- matrix(c(0, 4, 100, 12), 2, 2) sigma <- 2 R <- list(diag(1, 2), diag(1, 2)) em.out <- regmixEM.mixed(y, x, sigma = sigma, arb.sigma = FALSE, lambda = lambda, mu = mu, R = R, addintercept.random = FALSE, epsilon = 1e-02, verb = TRUE) em.out[4:10]
Return Metropolis-Hastings (M-H) algorithm output for mixtures of multiple regressions with arbitrarily many components.
regmixMH(y, x, lambda = NULL, beta = NULL, s = NULL, k = 2, addintercept = TRUE, mu = NULL, sig = NULL, lam.hyp = NULL, sampsize = 1000, omega = 0.01, thin = 1)
regmixMH(y, x, lambda = NULL, beta = NULL, s = NULL, k = 2, addintercept = TRUE, mu = NULL, sig = NULL, lam.hyp = NULL, sampsize = 1000, omega = 0.01, thin = 1)
y |
An n-vector of response values. |
x |
An nxp matrix of predictors. See |
lambda |
Initial value of mixing proportions. Entries should sum to
1. This determines number of components. If NULL, then |
beta |
Initial value of |
s |
k-vector of standard deviations. If NULL, then |
k |
Number of components. Ignored unless all of |
addintercept |
If TRUE, a column of ones is appended to the x matrix before the value of p is calculated. |
mu |
The prior hyperparameter of same size as |
sig |
The prior hyperparameter of same size as |
lam.hyp |
The prior hyperparameter of length |
sampsize |
Size of posterior sample returned. |
omega |
Multiplier of step size to control M-H acceptance rate. Values closer to zero result in higher acceptance rates, generally. |
thin |
Lag between parameter vectors that will be kept. |
regmixMH
returns a list of class mixMCMC
with items:
x |
A nxp matrix of the predictors. |
y |
A vector of the responses. |
theta |
A ( |
k |
The number of components. |
Hurn, M., Justel, A. and Robert, C. P. (2003) Estimating Mixtures of Regressions, Journal of Computational and Graphical Statistics 12(1), 55–79.
## M-H algorithm for NOdata with acceptance rate about 40%. data(NOdata) attach(NOdata) set.seed(100) beta <- matrix(c(1.3, -0.1, 0.6, 0.1), 2, 2) sigma <- c(.02, .05) MH.out <- regmixMH(Equivalence, NO, beta = beta, s = sigma, sampsize = 2500, omega = .0013) MH.out$theta[2400:2499,]
## M-H algorithm for NOdata with acceptance rate about 40%. data(NOdata) attach(NOdata) set.seed(100) beta <- matrix(c(1.3, -0.1, 0.6, 0.1), 2, 2) sigma <- c(.02, .05) MH.out <- regmixMH(Equivalence, NO, beta = beta, s = sigma, sampsize = 2500, omega = .0013) MH.out$theta[2400:2499,]
Assess the number of components in a mixture of regressions model using the Akaike's information criterion (AIC), Schwartz's Bayesian information criterion (BIC), Bozdogan's consistent AIC (CAIC), and Integrated Completed Likelihood (ICL).
regmixmodel.sel(x, y, w = NULL, k = 2, type = c("fixed", "random", "mixed"), ...)
regmixmodel.sel(x, y, w = NULL, k = 2, type = c("fixed", "random", "mixed"), ...)
x |
An nxp matrix (or list) of predictors. If an intercept is required, then |
y |
An n-vector (or list) of response values. |
w |
An optional list of fixed effects predictors for type "mixed" or "random". |
k |
The maximum number of components to assess. |
type |
The type of regression mixture to use. If "fixed", then a mixture of regressions with fixed effects will be used. If "random", then a mixture of regressions where the random effects regression coefficients are assumed to come from a mixture will be used. If "mixed", the mixture structure used is the same as "random", except a coefficient of fixed effects is also assumed. |
... |
Additional arguments passed to the EM algorithm used for calculating the type of regression mixture specified
in |
regmixmodel.sel
returns a matrix of the AIC, BIC, CAIC, and ICL values along with the winner (i.e., the highest
value given by the model selection criterion) for various types of regression mixtures.
Biernacki, C., Celeux, G. and Govaert, G. (2000) Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(7), 719–725.
Bozdogan, H. (1987) Model Selection and Akaike's Information Criterion (AIC): The General Theory and its Analytical Extensions, Psychometrika 52, 345–370.
## Assessing the number of components for NOdata. data(NOdata) attach(NOdata) set.seed(100) regmixmodel.sel(x = NO, y = Equivalence, k = 3, type = "fixed")
## Assessing the number of components for NOdata. data(NOdata) attach(NOdata) set.seed(100) regmixmodel.sel(x = NO, y = Equivalence, k = 3, type = "fixed")
Returns EM algorithm output for mixtures of normals with repeated measurements and arbitrarily many components.
repnormmixEM(x, lambda = NULL, mu = NULL, sigma = NULL, k = 2, arbmean = TRUE, arbvar = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
repnormmixEM(x, lambda = NULL, mu = NULL, sigma = NULL, k = 2, arbmean = TRUE, arbvar = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
x |
An mxn matrix of data. The columns correspond to the subjects and the rows correspond to the repeated measurements. |
lambda |
Initial value of mixing proportions. Entries should sum to
1. This determines number of components. If NULL, then |
mu |
A k-vector of component means. If NULL, then |
sigma |
A vector of standard deviations. If NULL, then |
k |
Number of components. Ignored unless all of |
arbmean |
If TRUE, then the component densities are allowed to have different |
arbvar |
If TRUE, then the component densities are allowed to have different |
epsilon |
The convergence criterion. |
maxit |
The maximum number of iterations. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
repnormmixEM
returns a list of class mixEM
with items:
x |
The raw data. |
lambda |
The final mixing proportions. |
mu |
The final mean parameters. |
sigma |
The final standard deviations. If |
scale |
If |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
ft |
A character vector giving the name of the function. |
Hettmansperger, T. P. and Thomas, H. (2000) Almost Nonparametric Inference for Repeated Measures in Mixture Models, Journal of the Royals Statistical Society, Series B 62(4) 811–825.
## EM output for the water-level task data set. data(Waterdata) set.seed(100) water <- t(as.matrix(Waterdata[,3:10])) em.out <- repnormmixEM(water, k = 2, verb = TRUE, epsilon = 1e-03) em.out
## EM output for the water-level task data set. data(Waterdata) set.seed(100) water <- t(as.matrix(Waterdata[,3:10])) em.out <- repnormmixEM(water, k = 2, verb = TRUE, epsilon = 1e-03) em.out
Assess the number of components in a mixture model with normal components and repeated measures using the Akaike's information criterion (AIC), Schwartz's Bayesian information criterion (BIC), Bozdogan's consistent AIC (CAIC), and Integrated Completed Likelihood (ICL).
repnormmixmodel.sel(x, k = 2, ...)
repnormmixmodel.sel(x, k = 2, ...)
x |
An mxn matrix of observations. The rows correspond to the repeated measures and the columns correspond to the subject. |
k |
The maximum number of components to assess. |
... |
Additional arguments passed to |
repnormmixmodel.sel
returns a matrix of the AIC, BIC, CAIC, and ICL values along with the winner (i.e., the highest
value given by the model selection criterion) for a mixture of normals with repeated measures.
Biernacki, C., Celeux, G., and Govaert, G. (2000). Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):719-725.
Bozdogan, H. (1987). Model Selection and Akaike's Information Criterion (AIC): The General Theory and its Analytical Extensions. Psychometrika, 52:345-370.
## Assessing the number of components for the water-level task data set. data(Waterdata) water<-t(as.matrix(Waterdata[,3:10])) set.seed(100) out <- repnormmixmodel.sel(water, k = 3, epsilon = 5e-01) out
## Assessing the number of components for the water-level task data set. data(Waterdata) water<-t(as.matrix(Waterdata[,3:10])) set.seed(100) out <- repnormmixmodel.sel(water, k = 3, epsilon = 5e-01) out
Simulate from a mixture of univariate exponential distributions.
rexpmix(n, lambda = 1, rate = 1)
rexpmix(n, lambda = 1, rate = 1)
n |
Number of cases to simulate. |
lambda |
Vector of mixture probabilities, with length equal to |
rate |
Vector of component rates. |
rexpmix
returns an -vector sampled from an
-component
mixture of univariate exponential distributions.
rnormmix
, rmvnormmix
for Gaussian mixtures,
rweibullmix
for mixture of Weibull distributions.
## Generate data from a 2-component mixture of exponentials. n=300 # sample size m=2 # nb components lambda=c(1/3, 2/3); rate = c(1,1/10) # parameters set.seed(1234) x <- rexpmix(n, lambda, rate) # iid ~ exp mixture ## histogram of the simulated data. hist(x, col=8)
## Generate data from a 2-component mixture of exponentials. n=300 # sample size m=2 # nb components lambda=c(1/3, 2/3); rate = c(1,1/10) # parameters set.seed(1234) x <- rexpmix(n, lambda, rate) # iid ~ exp mixture ## histogram of the simulated data. hist(x, col=8)
Simulate from a multiviate normal distribution
rmvnorm(n, mu=NULL, sigma=NULL)
rmvnorm(n, mu=NULL, sigma=NULL)
n |
Number of vectors to simulate |
mu |
mean vector |
sigma |
covariance matrix, assumed symmetric and nonnegative definite |
This function uses an eigen
decomposition assuming sigma
is symmetric.
In particular, the upper triangle of sigma
is ignored.
An matrix in which each row is an independently
generated realization from the desired multivariate normal distribution
Simulate from a mixture of multivariate zero-correlation normal distributions
rmvnormmix(n, lambda=1, mu=0, sigma=1)
rmvnormmix(n, lambda=1, mu=0, sigma=1)
n |
Number of cases to simulate. |
lambda |
Vector of mixture probabilities with length equal to |
mu |
Matrix of means of dimensions |
sigma |
Matrix of standard deviations, same dimensions as |
It is possible to generate univariate standard normal random variables using the default values (but why bother?). The case of conditionally iid coordinates is covered by the situation in which all columns in mu and sigma are identical.
rmvnormmix
returns an matrix in which each row is
a sample from one of the components of a mixture of zero-correlation
multivariate normals. The mixture structure
induces nonzero correlations among the coordinates.
##Generate data from a 2-component mixture of trivariate normals. set.seed(100) n <- 200 lambda <- rep(1, 2)/2 mu <- matrix(2*(1:6), 2, 3) sigma <- matrix(1,2,3) mydata<-rmvnormmix(n, lambda, mu, sigma) ## Now check to see if we can estimate mixture densities well: title <- paste("Does this resemble N(", mu[1,], ",1) and N(", mu[2,],",1)?", sep="") plot(npEM(mydata, 2), title=title)
##Generate data from a 2-component mixture of trivariate normals. set.seed(100) n <- 200 lambda <- rep(1, 2)/2 mu <- matrix(2*(1:6), 2, 3) sigma <- matrix(1,2,3) mydata<-rmvnormmix(n, lambda, mu, sigma) ## Now check to see if we can estimate mixture densities well: title <- paste("Does this resemble N(", mu[1,], ",1) and N(", mu[2,],",1)?", sep="") plot(npEM(mydata, 2), title=title)
Simulate from a mixture of univariate normal distributions.
rnormmix(n, lambda=1, mu=0, sigma=1)
rnormmix(n, lambda=1, mu=0, sigma=1)
n |
Number of cases to simulate. |
lambda |
Vector of mixture probabilities, with length equal to |
mu |
Vector of means. |
sigma |
Vector of standard deviations. |
This function simply calls rmvnormmix
.
rnormmix
returns an -vector sampled from an
-component
mixture of univariate normal distributions.
##Generate data from a 2-component mixture of normals. set.seed(100) n <- 500 lambda <- rep(1, 2)/2 mu <- c(0, 5) sigma <- rep(1, 2) mixnorm.data <- rnormmix(n, lambda, mu, sigma) ##A histogram of the simulated data. hist(mixnorm.data)
##Generate data from a 2-component mixture of normals. set.seed(100) n <- 500 lambda <- rep(1, 2)/2 mu <- c(0, 5) sigma <- rep(1, 2) mixnorm.data <- rnormmix(n, lambda, mu, sigma) ##A histogram of the simulated data. hist(mixnorm.data)
This data set involves assessing children longitudinally at 6 age points from ages 4 through 18 years for the rod and frame task. This task sits the child in a darkened room in front of a luminous square frame tilted at 28 degrees on its axis to the left or right. Centered inside the frame was a luminous rod also tilted 28 degrees to the left or right. The child's task was to adjust the rod to the vertical position and the absolute deviation from the vertical (in degrees) was the measured response.
data(RodFramedata)
data(RodFramedata)
This data frame consists of 140 children (the rows). Column 1 is the subject number and column 2 is the sex (0=MALE and 1=FEMALE). Columns 3 through 26 give the 8 responses at each of the ages 4, 5, and 7. Columns 27 through 56 give the 10 responses at each of the ages 11, 14, and 18. A value of 99 denotes missing data.
Thomas, H. and Dahlin, M. P. (2005) Individual Development and Latent Groups: Analytical Tools for Interpreting Heterogeneity, Developmental Review 25(2), 133–154.
This data set involves normally developing children 9 years of age presented with two visual simuli on a computer monitor. The left image is the target stimuli and on the right is either an exact copy or a mirror image of the target stimuli. The child must press one key if it is a copy or another key if it is a mirror image. The data consists of the reaction times (RT) of the 197 children who provided correct responses for all 6 task trials.
data(RTdata)
data(RTdata)
This data frame consists of 197 children (the rows) and their 6 responses (the columns) to the stimulus presented. The response (RT) is recorded in milliseconds.
Cruz-Medina, I. R., Hettmansperger, T. P. and Thomas, H. (2004) Semiparametric Mixture Models and Repeated Measures: The Multinomial Cut Point Model, Applied Statistics 53(3), 463–474.
Miller, C. A., Kail, R., Leonard, L. B. and Tomblin, J. B. (2001) Speed of Processing in Children with Specific Language Impairment, Journal of Speech, Language, and Hearing Research 44(2), 416–433.
This data set involves normally developing children 9 years of age presented visual simuli on a computer monitor. There are three different experimental conditions, according to the length of the delay after which the stimulus was displayed on the screen. Each subject experienced each condition eight times, and these 24 trials were given in random order. These data give the 82 children for whom there are complete measurements among over 200 total subjects.
data(RTdata2)
data(RTdata2)
This data frame consists of 82 children (the rows) and their 24 responses (the columns) to the stimulus presented. The response is recorded in milliseconds. The columns are not in the order in which the stimuli were presented to the children; rather, they are arranged into three blocks of eight columns each so that each eight-column block contains only trials from one of the three conditions.
Miller, C. A., Kail, R., Leonard, L. B. and Tomblin, J. B. (2001) Speed of Processing in Children with Specific Language Impairment, Journal of Speech, Language, and Hearing Research 44(2), 416–433.
Simulate from a mixture of univariate Weibull distributions.
rweibullmix(n, lambda = 1, shape = 1, scale = 1)
rweibullmix(n, lambda = 1, shape = 1, scale = 1)
n |
Number of cases to simulate. |
lambda |
Vector of mixture probabilities, with length equal to |
shape |
Vector of component shapes. |
scale |
Vector of component scales. |
rexpmix
returns an -vector sampled from an
-component
mixture of univariate Weibull distributions.
rnormmix
and rmvnormmix
for Gaussian mixtures,
rexpmix
for mixture of exponentials.
n = 500 # sample size m = 2 # nb components lambda=c(0.4, 0.6) shape <- c(0.5,5); scale <- c(1,20) # model parameters set.seed(321) x <- rweibullmix(n, lambda, shape, scale) # iid ~ weibull mixture ## histogram of the simulated data. hist(x, col=8)
n = 500 # sample size m = 2 # nb components lambda=c(0.4, 0.6) shape <- c(0.5,5); scale <- c(1,20) # model parameters set.seed(321) x <- rweibullmix(n, lambda, shape, scale) # iid ~ weibull mixture ## histogram of the simulated data. hist(x, col=8)
Returns ECM algorithm output for mixtures of multiple regressions with changepoints and arbitrarily many components.
segregmixEM(y, x, lambda = NULL, beta = NULL, sigma = NULL, k = 2, seg.Z, psi, psi.locs = NULL, delta = NULL, epsilon = 1e-08, maxit = 10000, verb = FALSE, max.restarts = 15)
segregmixEM(y, x, lambda = NULL, beta = NULL, sigma = NULL, k = 2, seg.Z, psi, psi.locs = NULL, delta = NULL, epsilon = 1e-08, maxit = 10000, verb = FALSE, max.restarts = 15)
y |
An n-vector of response values. |
x |
An nxp matrix of predictors. Note that this model assumes the presence of an intercept. |
lambda |
Initial value of mixing proportions. Entries should sum to
1. This determines number of components. If NULL, then |
beta |
Initial value of |
sigma |
A vector of standard deviations. If NULL, then 1/ |
k |
Number of components. Ignored unless all of |
seg.Z |
A list of length |
psi |
A kxp matrix specifying the number of changepoints for each predictor in each component. See below for more details. |
psi.locs |
A list of length |
delta |
An optional list of values quantifying the amount of separation at each changepoint if assuming discontinuities at the changepoints. This has the same dimensions as |
epsilon |
The convergence criterion. |
maxit |
The maximum number of iterations. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
max.restarts |
The number of times to try restarting the ECM algorithm if estimation problems occur - such as choice of poor initial values or a poorly chosen changepoint structure. |
seg.Z
is defined as a list of right-hand side linear model formulas that are used to identify which predictors have changepoints in each component. For example, suppose you have a dataframe with three predictors: V1
, V2
, V3
. Suppose now that you wish to model a 3-component mixture of regressions with changepoints structure such that the first component has changepoints in V1 and V2, the second component has changepoints in V3
, and the third component has no changepoints. Then you would define seg.Z = list(~V1+V2, ~V3, NULL)
. Note that you MUST place the variables in order with respect to how they appear in the predictor matrix x
.
psi
is a kxp matrix specifying the number of changepoints for each predictor in each component. For the example given above, suppose there are three changepoints for V1
, two changepoints for V2
, and four changepoints for V3
. Then you would define psi = rbind(c(3, 2, 0), c(0, 0, 4), c(0, 0, 0))
.
psi.locs
is a list of length k whose elements give the initial locations of the changepoints for each component. Each element of the list must have length equal to the total number of changepoints for that component's regression equation. For the example given above, in component 1, assume that the three changepoints for V1
are at 3, 7, and 10 and the two changepoints for V1
are at 5, 20, and 30. In component 2, assume that the four changepoints for V3
are at 2, 4, 6, and 8. Then you would define psi.locs = list(c(3, 7, 10, 5, 20, 30), c(2, 4, 6, 8), NULL)
. Note that the order of the changepoints is determined by first sorting the predictors by how they appear in the formulas in seg.Z
and then sorting in increasing order within each predictor.
segregmixEM
returns a list of class segregmixEM
with items:
x |
The set of predictors. |
y |
The response values. |
lambda |
The final mixing proportions. |
beta |
The final regression coefficients. |
sigma |
The final standard deviations. |
seg.Z |
The list of right-hand side formulas as defined by the user. |
psi.locs |
A list of length k with the final estimates for the changepoint locations. |
delta |
A list of the delta values that were optionally specified by the user. |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
ft |
A character vector giving the name of the function. |
As of version 0.4.6, this more general function has replaced the now defunct regmixEM.chgpt
and associated internal functions.
Young, D. S. (2014) Mixtures of Regressions with Changepoints, Statistics and Computing, 24(2), 265–281.
## Not run: ## Simulated example. set.seed(100) x <- 1:20 y1 <- 3 + x + rnorm(20) y2 <- 3 - x - 5*(x - 15)*(x > 15) + rnorm(20) y <- c(y1, y2) x <- c(x, x) set.seed(100) be <- list(c(3, -1, -5), c(3, 1)) s <- c(1, 1) psi.locs <- list(comp.1 = list(x = 15), comp.2 = NULL) out <- segregmixEM(y, cbind(1,x), verb = TRUE, k = 2, beta = be, sigma = s, lambda = c(1, 1)/2, seg.Z = list(~x, NULL), psi = rbind(1, 0), psi.locs = psi.locs, epsilon = 0.9) z <- seq(0, 21, len = 40) plot(x, y, col = apply(out$post, 1, which.max) + 1, pch = 19, cex.lab = 1.4, cex = 1.4) b <- out$beta d <- out$psi.locs lines(z, b[[1]][1] + b[[1]][2] * z + b[[1]][3] * (z - d[[1]][[1]]) * (z > d[[1]][[1]]) , col = 2, lwd = 2) lines(z, b[[2]][1] + b[[2]][2] * z, col = 3, lwd = 2) abline(v = out$psi.locs[[1]][1], col = 2, lty = 2) ## End(Not run) ## Not run: ## Example using the NOdata. data(NOdata) attach(NOdata) set.seed(100) be <- list(c(1.30, -0.13, 0.08), c(0.56, 0.09)) s <- c(0.02, 0.04) psi.locs <- list(comp.1 = list(NO = 1.57), comp.2 = NULL) out <- segregmixEM(Equivalence, cbind(NO), verb = TRUE, k = 2, beta = be, sigma = s, lambda = c(1, 1)/2, seg.Z = list(~NO, NULL), psi = rbind(1, 0), psi.locs = psi.locs, epsilon = 0.1) z <- seq(0, 5, len = 1000) plot(NOdata, col = apply(out$post, 1, which.max) + 1, pch = 19, cex.lab = 1.4, cex = 1.4, ylab = "Equivalence Ratio") b <- out$beta d <- out$psi.locs lines(z, b[[1]][1] + b[[1]][2] * z + b[[1]][3] * (z - d[[1]][[1]]) * (z > d[[1]][[1]]) , col = 2, lwd = 2) lines(z, b[[2]][1] + b[[2]][2] * z, col = 3, lwd = 2) abline(v = out$psi.locs[[1]][1], col = 2, lty = 2) detach(NOdata) ## End(Not run)
## Not run: ## Simulated example. set.seed(100) x <- 1:20 y1 <- 3 + x + rnorm(20) y2 <- 3 - x - 5*(x - 15)*(x > 15) + rnorm(20) y <- c(y1, y2) x <- c(x, x) set.seed(100) be <- list(c(3, -1, -5), c(3, 1)) s <- c(1, 1) psi.locs <- list(comp.1 = list(x = 15), comp.2 = NULL) out <- segregmixEM(y, cbind(1,x), verb = TRUE, k = 2, beta = be, sigma = s, lambda = c(1, 1)/2, seg.Z = list(~x, NULL), psi = rbind(1, 0), psi.locs = psi.locs, epsilon = 0.9) z <- seq(0, 21, len = 40) plot(x, y, col = apply(out$post, 1, which.max) + 1, pch = 19, cex.lab = 1.4, cex = 1.4) b <- out$beta d <- out$psi.locs lines(z, b[[1]][1] + b[[1]][2] * z + b[[1]][3] * (z - d[[1]][[1]]) * (z > d[[1]][[1]]) , col = 2, lwd = 2) lines(z, b[[2]][1] + b[[2]][2] * z, col = 3, lwd = 2) abline(v = out$psi.locs[[1]][1], col = 2, lty = 2) ## End(Not run) ## Not run: ## Example using the NOdata. data(NOdata) attach(NOdata) set.seed(100) be <- list(c(1.30, -0.13, 0.08), c(0.56, 0.09)) s <- c(0.02, 0.04) psi.locs <- list(comp.1 = list(NO = 1.57), comp.2 = NULL) out <- segregmixEM(Equivalence, cbind(NO), verb = TRUE, k = 2, beta = be, sigma = s, lambda = c(1, 1)/2, seg.Z = list(~NO, NULL), psi = rbind(1, 0), psi.locs = psi.locs, epsilon = 0.1) z <- seq(0, 5, len = 1000) plot(NOdata, col = apply(out$post, 1, which.max) + 1, pch = 19, cex.lab = 1.4, cex = 1.4, ylab = "Equivalence Ratio") b <- out$beta d <- out$psi.locs lines(z, b[[1]][1] + b[[1]][2] * z + b[[1]][3] * (z - d[[1]][[1]]) * (z > d[[1]][[1]]) , col = 2, lwd = 2) lines(z, b[[2]][1] + b[[2]][2] * z, col = 3, lwd = 2) abline(v = out$psi.locs[[1]][1], col = 2, lty = 2) detach(NOdata) ## End(Not run)
Returns semiparametric EM algorithm output (Benaglia et al, 2009) for mixtures of multivariate (repeated measures) data where the coordinates of a row (case) in the data matrix are assumed to be independent, conditional on the mixture component (subpopulation) from which they are drawn. For now, this algorithm only implements model (4.7) in Benaglia et al, in which each component and block has exactly the same (nonparametric) shape and they differ only by location and scale.
spEM(x, mu0, blockid = 1:ncol(x), bw = bw.nrd0(as.vector(as.matrix(x))), constbw = TRUE, h = bw, eps = 1e-8, maxiter = 500, stochastic = FALSE, verb = TRUE)
spEM(x, mu0, blockid = 1:ncol(x), bw = bw.nrd0(as.vector(as.matrix(x))), constbw = TRUE, h = bw, eps = 1e-8, maxiter = 500, stochastic = FALSE, verb = TRUE)
x |
An |
mu0 |
Either an |
blockid |
A vector of length |
bw |
Bandwidth for density estimation, equal to the standard deviation
of the kernel density. By default, a simplistic application of the
default |
constbw |
Logical: If |
h |
Alternative way to specify the bandwidth, to provide backward compatibility. |
eps |
Tolerance limit for declaring algorithm convergence. Convergence
is declared whenever the maximum change in any coordinate of the
|
maxiter |
The maximum number of iterations allowed, for both
stochastic and non-stochastic versions;
for non-stochastic algorithms ( |
stochastic |
Flag, if FALSE (the default), runs the non-stochastic version
of the npEM algorithm, as in Benaglia et al (2009). Set to TRUE to
run a stochastic version which simulates the posteriors at each
iteration, and runs for |
verb |
If TRUE, print updates for every iteration of the algorithm as it runs |
spEM
returns a list of class spEM
with the following items:
data |
The raw data (an |
posteriors |
An |
bandwidth |
If |
blockid |
Same as the |
lambda |
The sequence of mixing proportions over iterations. |
lambdahat |
The final mixing proportions if |
mu |
The sequence of location parameters over iterations. |
muhat |
The final location parameters if |
sigma |
The sequence of scale parameters over iterations. |
sigmahat |
The final scale parameters if |
loglik |
The sequence of log-likelihoods over iterations. |
Benaglia, T., Chauveau, D., and Hunter, D. R., An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18, 505-526, 2009.
Benaglia, T., Chauveau, D. and Hunter, D.R. Bandwidth Selection in an EM-like algorithm for nonparametric multivariate mixtures. Nonparametric Statistics and Mixture Models: A Festschrift in Honor of Thomas P. Hettmansperger. World Scientific Publishing Co., pages 15-27, 2011.
Bordes, L., Chauveau, D., and Vandekerkhove, P., An EM algorithm for a semiparametric mixture model, Computational Statistics and Data Analysis, 51: 5429-5443, 2007.
plot.spEM
, normmixrm.sim
, spEMsymloc
,
npEM
, plotseq.npEM
## Not run: ## simulate a 2-component gaussian mixture with 3 iid repeated measures set.seed(100) mu <- matrix(c(0, 15), 2, 3) sigma <- matrix(c(1, 5), 2, 3) x <- rmvnormmix(300, lambda = c(.4,.6), mu = mu, sigma = sigma) ## apply spEM with or without an iterative bandwidth selection d <- spEM(x, mu0 = 2, blockid = rep(1,3), constbw = FALSE) d2 <- spEM(x, mu0 = 2, blockid = rep(1,3), constbw = TRUE) plot(d, xlim=c(-10, 40), ylim = c(0, .16), xlab = "", breaks = 30, cex.lab=1.5, cex.axis=1.5, addlegend=FALSE) plot(d2, newplot=FALSE, addlegend=FALSE, lty=2) ## End(Not run)
## Not run: ## simulate a 2-component gaussian mixture with 3 iid repeated measures set.seed(100) mu <- matrix(c(0, 15), 2, 3) sigma <- matrix(c(1, 5), 2, 3) x <- rmvnormmix(300, lambda = c(.4,.6), mu = mu, sigma = sigma) ## apply spEM with or without an iterative bandwidth selection d <- spEM(x, mu0 = 2, blockid = rep(1,3), constbw = FALSE) d2 <- spEM(x, mu0 = 2, blockid = rep(1,3), constbw = TRUE) plot(d, xlim=c(-10, 40), ylim = c(0, .16), xlab = "", breaks = 30, cex.lab=1.5, cex.axis=1.5, addlegend=FALSE) plot(d2, newplot=FALSE, addlegend=FALSE, lty=2) ## End(Not run)
Returns semiparametric EM algorithm output (Bordes et al, 2007, and Benaglia et al, 2009) for location mixtures of univariate data and symmetric component density.
spEMsymloc(x, mu0, bw = bw.nrd0(x), h=bw, eps = 1e-8, maxiter = 100, stochastic = FALSE, verbose = FALSE)
spEMsymloc(x, mu0, bw = bw.nrd0(x), h=bw, eps = 1e-8, maxiter = 100, stochastic = FALSE, verbose = FALSE)
x |
A vector of length |
mu0 |
Either a vector specifying the initial
centers for the kmeans function, and from which the number of
component is obtained, or an integer |
bw |
Bandwidth for density estimation, equal to the standard deviation of the kernel density. |
h |
Alternative way to specify the bandwidth, to provide backward compatibility. |
eps |
Tolerance limit for declaring algorithm convergence. Convergence
is declared before |
maxiter |
The maximum number of iterations allowed, for both
stochastic and non-stochastic versions;
for non-stochastic algorithms ( |
stochastic |
Flag, if FALSE (the default), runs the non-stochastic version
of the algorithm, as in Benaglia et al (2009). Set to TRUE to
run a stochastic version which simulates the posteriors at each
iteration (as in Bordes et al, 2007),
and runs for |
verbose |
If TRUE, print updates for every iteration of the algorithm as it runs |
spEMsymloc
returns a list of class npEM
with the following items:
data |
The raw data (an |
posteriors |
An |
bandwidth |
Same as the |
lambda |
The sequence of mixing proportions over iterations. |
lambdahat |
The final estimate for mixing proportions if |
mu |
the sequence of component means over iterations. |
muhat |
the final estimate of component means if |
symmetric |
Flag indicating that the kernel density estimate is using a symmetry assumption. |
Benaglia, T., Chauveau, D., and Hunter, D. R., An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18, 505-526, 2009.
Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software, 32(6):1-29, 2009.
Bordes, L., Chauveau, D., and Vandekerkhove, P. (2007), An EM algorithm for a semiparametric mixture model, Computational Statistics and Data Analysis, 51: 5429-5443.
plot.npEM
, rnormmix
,
npEM
, spEMsymlocN01
, plotseq.npEM
## Example from a normal location mixture set.seed(100) n <- 200 lambda <- c(1/3,2/3) mu <- c(0, 4); sigma<-rep(1, 2) x <- rnormmix(n, lambda, mu, sigma) out.stoc <- spEMsymloc(x, mu0=c(-1, 2), stochastic=TRUE) out.nonstoc <- spEMsymloc(x, mu0=c(-1, 2))
## Example from a normal location mixture set.seed(100) n <- 200 lambda <- c(1/3,2/3) mu <- c(0, 4); sigma<-rep(1, 2) x <- rnormmix(n, lambda, mu, sigma) out.stoc <- spEMsymloc(x, mu0=c(-1, 2), stochastic=TRUE) out.nonstoc <- spEMsymloc(x, mu0=c(-1, 2))
Return semiparametric EM-like algorithm output for a 2-components
mixture model with one component set to Normal(0,1), and the other component
being a unspecified but symmetric density with a location parameter.
This model is tailored to
FDR estimation on probit transform (qnorm
) of p-values arising from multiple testing.
spEMsymlocN01(x, mu0 = 2, bw = bw.nrd0(x), h=bw, eps = 1e-8, maxiter = 100, verbose = FALSE, plotf = FALSE)
spEMsymlocN01(x, mu0 = 2, bw = bw.nrd0(x), h=bw, eps = 1e-8, maxiter = 100, verbose = FALSE, plotf = FALSE)
x |
A vector of length n consisting of the data, probit transform of pvalues, preferably sorted. |
mu0 |
Starting value of vector of component means. If not set then the initial value
is randomly generated by a |
bw |
Bandwidth for weighted kernel density estimation. |
h |
Alternative way to specify the bandwidth, to provide backward compatibility. |
eps |
Tolerance limit for declaring algorithm convergence.
Convergence is declared before |
maxiter |
The maximum number of iterations allowed; convergence
may be declared before |
verbose |
If TRUE, print updates for every iteration of the algorithm as it runs. |
plotf |
If TRUE, plots successive updates of the nonparametric density estimate over iterations. Mostly for testing purpose. |
This algorithm is a specific version of semiparametric EM-like algorithm
similar in spirit to spEMsymloc
, but specialized for FDR estimation on
probit transform (qnorm
) of p-values in multiple testing framework.
In this model, component 1 corresponds to the individuals under the null hypothesis,
i.e. theoretically normal(0,1) distributed, whereas component 2 corresponds to individuals in the
alternative hypothesis, with typically very small p-values and consequently
negative values for probit(p) data. This model only assumes
that these individuals come from an unspecified but symmetric density with a location parameter,
as in Bordes and Vandekerkhove (2010) and Chauveau et al. (2014).
spEMsymlocN01
returns a list of class spEMN01
with the following items:
data |
The raw data (an |
posteriors |
An |
bandwidth |
Same as the |
lambda |
The sequence of mixing proportions over iterations. |
lambdahat |
The final estimate for mixing proportions. |
mu |
the sequence of second component mean over iterations. |
muhat |
the final estimate of second component mean. |
symmetric |
Flag indicating that the kernel density estimate is using a symmetry assumption. |
Didier Chauveau
Bordes, L. and Vandekerkhove, P. (2010). Semiparametric two-component mixture model with a known component: an asymptotically normal estimator. Mathematical Methods of Statistics, 19(1):22-41
Chauveau, D., Saby, N., Orton, T. G., Lemercier B., Walter, C. and Arrouys, D. (2014) Large-scale simultaneous hypothesis testing in monitoring carbon content from french soil database: A semi-parametric mixture approach. Geoderma 219-220 (2014): 117-124.
spEMsymloc
, normalmixEM
,
npEM
, plot.spEMN01
,
plotFDR
## Probit transform of p-values ## from a Beta-Uniform mixture model ## comparion of parametric and semiparametric EM fit ## Note: in actual situations n=thousands set.seed(50) n=300 # nb of multiple tests m=2 # 2 mixture components a=c(1,0.1); b=c(1,1); lambda=c(0.6,0.4) # parameters z=sample(1:m, n, rep=TRUE, prob = lambda) p <- rbeta(n, shape1 = a[z], shape2 = b[z]) # p-values o <- order(p) cpd <- cbind(z,p)[o,] # sorted complete data, z=1 if H0, 2 if H1 p <- cpd[,2] # sorted p-values y <- qnorm(p) # probit transform of the pvalues # gaussian EM fit with component 1 constrained to N(0,1) s1 <- normalmixEM(y, mu=c(0,-4), mean.constr = c(0,NA), sd.constr = c(1,NA)) s2 <- spEMsymlocN01(y, mu0 = c(0,-3)) # spEM with N(0,1) fit hist(y, freq = FALSE, col = 8, main = "histogram of probit(pvalues)") plot(s2, add.plot = TRUE, lwd = 2) # Exemples of plot capabilities # Note: posteriors must be ordered by p for plot.FDR # plotFDR(s1$post) # when true complete data not observed # plotFDR(s1$post, s2$post) # comparing 2 strategies plotFDR(s1$post, s2$post, lg1 = "normalmixEM", lg2 = "spEMsymlocN01", complete.data = cpd) # with true FDR computed from z
## Probit transform of p-values ## from a Beta-Uniform mixture model ## comparion of parametric and semiparametric EM fit ## Note: in actual situations n=thousands set.seed(50) n=300 # nb of multiple tests m=2 # 2 mixture components a=c(1,0.1); b=c(1,1); lambda=c(0.6,0.4) # parameters z=sample(1:m, n, rep=TRUE, prob = lambda) p <- rbeta(n, shape1 = a[z], shape2 = b[z]) # p-values o <- order(p) cpd <- cbind(z,p)[o,] # sorted complete data, z=1 if H0, 2 if H1 p <- cpd[,2] # sorted p-values y <- qnorm(p) # probit transform of the pvalues # gaussian EM fit with component 1 constrained to N(0,1) s1 <- normalmixEM(y, mu=c(0,-4), mean.constr = c(0,NA), sd.constr = c(1,NA)) s2 <- spEMsymlocN01(y, mu0 = c(0,-3)) # spEM with N(0,1) fit hist(y, freq = FALSE, col = 8, main = "histogram of probit(pvalues)") plot(s2, add.plot = TRUE, lwd = 2) # Exemples of plot capabilities # Note: posteriors must be ordered by p for plot.FDR # plotFDR(s1$post) # when true complete data not observed # plotFDR(s1$post, s2$post) # comparing 2 strategies plotFDR(s1$post, s2$post, lg1 = "normalmixEM", lg2 = "spEMsymlocN01", complete.data = cpd) # with true FDR computed from z
Returns parameter estimates for finite mixtures of linear regressions with unspecified error structure. Based on Hunter and Young (2012).
spregmix(lmformula, bw = NULL, constbw = FALSE, bwmult = 0.9, z.hat = NULL, symm = TRUE, betamethod = "LS", m = ifelse(is.null(z.hat), 2, ncol(z.hat)), epsilon = 1e-04, maxit = 1000, verbose = FALSE, ...)
spregmix(lmformula, bw = NULL, constbw = FALSE, bwmult = 0.9, z.hat = NULL, symm = TRUE, betamethod = "LS", m = ifelse(is.null(z.hat), 2, ncol(z.hat)), epsilon = 1e-04, maxit = 1000, verbose = FALSE, ...)
lmformula |
Formula for a linear model, in the same format used by
|
bw |
Initial bandwidth value. If NULL, this will be chosen automatically by the algorithm. |
constbw |
Logical: If TRUE, the bandwidth is held constant throughout the algorithm; if FALSE, it adapts at each iteration according to the rules given in Hunter and Young (2012). |
bwmult |
Whenever it is updated automatically,
the bandwidth is equal to |
z.hat |
Initial nxm matrix of posterior probabilities. If NULL, this
is initialized randomly. As long as a parametric estimation method like least
squares is used to estimate |
symm |
Logical: If TRUE, the error density is assumed symmetric about zero. If FALSE, it is not. WARNING: If FALSE, the intercept parameter is not uniquely identifiable if it is included in the linear model. |
betamethod |
Method of calculating beta coefficients in the M-step. Current possible values are "LS" for least-squares; "L1" for least absolute deviation; "NP" for fully nonparametric; and "transition" for a transition from least squares to fully nonparametric. If something other than these four possibilities is used, then "NP" is assumed. For details of these methods, see Hunter and Young (2012). |
m |
Number of components in the mixture. |
epsilon |
Convergence is declared if the largest change in any lambda or
beta coordinate is smaller than |
maxit |
The maximum number of iterations; if convergence is never declared
based on comparison with |
verbose |
Logical: If TRUE, then various updates are printed during each iteration of the algorithm. |
... |
Additional parameters passed to the
|
regmixEM
returns a list of class npEM
with items:
x |
The set of predictors (which includes a column of 1's if |
y |
The response values. |
lambda |
The mixing proportions for every iteration in the form of a matrix with m columns and (#iterations) rows |
beta |
The final regression coefficients. |
posterior |
An nxm matrix of posterior probabilities for observations. |
np.stdev |
Nonparametric estimate of the standard deviation, as given in Hunter and Young (2012) |
bandwidth |
Final value of the bandwidth |
density.x |
Points at which the error density is estimated |
density.y |
Values of the error density at the points |
symmetric |
Logical: Was the error density assumed symmetric? |
loglik |
A quantity similar to a log-likelihood, computed just like a standard loglikelihood would be, conditional on the component density functions being equal to the final density estimates. |
ft |
A character vector giving the name of the function. |
Hunter, D. R. and Young, D. S. (2012) Semi-parametric Mixtures of Regressions, Journal of Nonparametric Statistics 24(1): 19-38.
Scott, D. W. (1992) Multivariate Density Estimation, John Wiley & Sons Inc., New York.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis, Chapman & Hall, London.
data(tonedata) ## By default, the bandwidth will adapt and the error density is assumed symmetric set.seed(100) a=spregmix(tuned~stretchratio, bw=.2, data=tonedata, verb=TRUE) ## Look at the sp mixreg solution: plot(tonedata) abline(a=a$beta[1,1],b=a$beta[2,1], col=2) abline(a=a$beta[1,2],b=a$beta[2,2], col=3) ## Look at the nonparametric KD-based estimate of the error density, ## constrained to be zero-symmetric: plot(xx<-a$density.x, yy<-a$density.y, type="l") ## Compare to a normal density with mean 0 and NP-estimated stdev: z <- seq(min(xx), max(xx), len=200) lines(z, dnorm(z, sd=sqrt((a$np.stdev)^2+a$bandwidth^2)), col=2, lty=2) # Add bandwidth^2 to variance estimate to get estimated var of KDE ## Now add the sp mixreg estimate without assuming symmetric errors: b=spregmix(tuned~stretchratio, bw=.2, , symm=FALSE, data=tonedata, verb=TRUE) lines(b$density.x, b$density.y, col=3)
data(tonedata) ## By default, the bandwidth will adapt and the error density is assumed symmetric set.seed(100) a=spregmix(tuned~stretchratio, bw=.2, data=tonedata, verb=TRUE) ## Look at the sp mixreg solution: plot(tonedata) abline(a=a$beta[1,1],b=a$beta[2,1], col=2) abline(a=a$beta[1,2],b=a$beta[2,2], col=3) ## Look at the nonparametric KD-based estimate of the error density, ## constrained to be zero-symmetric: plot(xx<-a$density.x, yy<-a$density.y, type="l") ## Compare to a normal density with mean 0 and NP-estimated stdev: z <- seq(min(xx), max(xx), len=200) lines(z, dnorm(z, sd=sqrt((a$np.stdev)^2+a$bandwidth^2)), col=2, lty=2) # Add bandwidth^2 to variance estimate to get estimated var of KDE ## Now add the sp mixreg estimate without assuming symmetric errors: b=spregmix(tuned~stretchratio, bw=.2, , symm=FALSE, data=tonedata, verb=TRUE) lines(b$density.x, b$density.y, col=3)
Stochastic EM algorithm for semiparametric scaled mixture for randomly right censored data.
spRMM_SEM(t, d = NULL, lambda = NULL, scaling = NULL, centers = 2, kernelft = triang_wkde, bw = rep(bw.nrd0(t),length(t)), averaged = TRUE, epsilon = 1e-08, maxit = 100, batchsize = 1, verb = FALSE)
spRMM_SEM(t, d = NULL, lambda = NULL, scaling = NULL, centers = 2, kernelft = triang_wkde, bw = rep(bw.nrd0(t),length(t)), averaged = TRUE, epsilon = 1e-08, maxit = 100, batchsize = 1, verb = FALSE)
t |
A vector of |
d |
The vector of censoring indication, where 1 means observed lifetime data, and 0 means censored lifetime data. |
lambda |
Initial value of mixing proportions.
If |
scaling |
Initial value of scaling between components,
set to 1 if |
centers |
initial centers for initial call to kmeans for initialization. |
kernelft |
. |
bw |
Bandwidth in the kernel hazard estimates. |
averaged |
averaged. |
epsilon |
Tolerance limit. |
maxit |
The number of iterations allowed. |
batchsize |
The batchsize (see reference below). |
verb |
If TRUE, print updates for every iteration of the algorithm as it runs |
spRMM_SEM
returns a list of class "spRMM"
with the following items:
t |
The input data. |
d |
The input censoring indicator. |
lambda |
The estimates for the mixing proportions. |
scaling |
The estimates for the components scaling. |
posterior |
An |
loglik |
The (pseudo) log-likelihood value at convergence of the algorithm. |
all.loglik |
The sequence of log-likelihood values over iterations. |
all.lambda |
The sequence of mixing proportions over iterations. |
all.scaling |
The sequence of scaling parameter over iterations. |
meanpost |
Posterior probabilities averaged over iterations. |
survival |
Kaplan-Meier last iteration estimate (a |
hazard |
Hazard rate last iteration estimate evaluated at |
final.t |
Last iteration unscaled sample (see reference). |
s.hat |
Kaplan-Meier average estimate. |
t.hat |
Ordered unscaled sample, for testing purpose. |
avg.od |
For testing purpose only. |
hazard.hat |
Hazard rate average estimate on |
batch.t |
Batch sample (not ordered), see reference. |
batch.d |
Associated event indicators just |
sumNaNs |
Internal control of numerical stability. |
ft |
A character vector giving the name of the function. |
Didier Chauveau
Bordes, L., and Chauveau, D. (2016), Stochastic EM algorithms for parametric and semiparametric mixture models for right-censored lifetime data, Computational Statistics, Volume 31, Issue 4, pages 1513-1538. https://link.springer.com/article/10.1007/s00180-016-0661-7
Related functions:
plotspRMM
,
summary.spRMM
.
Other models and algorithms for censored lifetime data
(name convention is model_algorithm):
expRMM_EM
,
weibullRMM_SEM
.
## Not run: n=500 # sample size m=2 # nb components lambda=c(0.4, 0.6) # parameters meanlog=3; sdlog=0.5; scale=0.1 set.seed(12) # simulate a scaled mixture of lognormals x <- rlnormscalemix(n, lambda, meanlog, sdlog, scale) cs=runif(n,20,max(x)+400) # Censoring (uniform) and incomplete data t <- apply(cbind(x,cs),1,min) d <- 1*(x <= cs) tauxc <- 100*round( 1-mean(d),3) cat(tauxc, "percents of data censored.\n") c0 <- c(25, 180) # data-driven initial centers (visible modes) sc0 <- 25/180 # and scaling s <- spRMM_SEM(t, d, scaling = sc0, centers = c0, bw = 15, maxit = 100) plotspRMM(s) # default summary(s) # S3 method for class "spRMM" ## End(Not run)
## Not run: n=500 # sample size m=2 # nb components lambda=c(0.4, 0.6) # parameters meanlog=3; sdlog=0.5; scale=0.1 set.seed(12) # simulate a scaled mixture of lognormals x <- rlnormscalemix(n, lambda, meanlog, sdlog, scale) cs=runif(n,20,max(x)+400) # Censoring (uniform) and incomplete data t <- apply(cbind(x,cs),1,min) d <- 1*(x <= cs) tauxc <- 100*round( 1-mean(d),3) cat(tauxc, "percents of data censored.\n") c0 <- c(25, 180) # data-driven initial centers (visible modes) sc0 <- 25/180 # and scaling s <- spRMM_SEM(t, d, scaling = sc0, centers = c0, bw = 15, maxit = 100) plotspRMM(s) # default summary(s) # S3 method for class "spRMM" ## End(Not run)
summary
method for class mixEM
.
## S3 method for class 'mixEM' summary(object, digits=6, ...)
## S3 method for class 'mixEM' summary(object, digits=6, ...)
object |
an object of class |
digits |
Significant digits for printing values |
... |
further arguments passed to |
summary.mixEM
prints parameter estimates for
each component of a fitted mixture model.
The estimates printed vary with the type of model.
The function summary.mixEM
prints the final loglikelihood
value at the solution as well as a matrix of values for each component
that could include:
lambda |
The estimated mixing weights |
mu |
The estimated mean parameters |
sigma |
The estimated standard deviations |
theta |
The estimated multinomial parameters |
beta |
The estimated regression parameters |
normalmixEM
,
logisregmixEM
,
multmixEM
,
mvnormalmixEM
,
poisregmixEM
,
regmixEM
,
regmixEM.lambda
,
regmixEM.loc
,
regmixEM.mixed
,
regmixEM.chgpt
,
repnormmixEM
,
expRMM_EM
,
weibullRMM_SEM
data(faithful) attach(faithful) set.seed(100) out <- normalmixEM(waiting, mu=c(50,80), sigma=c(5,5), lambda=c(.5,.5)) summary(out)
data(faithful) attach(faithful) set.seed(100) out <- normalmixEM(waiting, mu=c(50,80), sigma=c(5,5), lambda=c(.5,.5)) summary(out)
summary
method for class mvnpEM
.
## S3 method for class 'mvnpEM' summary(object, ...) ## S3 method for class 'summary.mvnpEM' print(x, digits=3, ...)
## S3 method for class 'mvnpEM' summary(object, ...) ## S3 method for class 'summary.mvnpEM' print(x, digits=3, ...)
object , x
|
an object of class |
digits |
Significant digits for printing values |
... |
further arguments passed to or from other methods. |
summary.mvnpEM
prints means and variances of each block for
each component. These quantities might not be part of the model, but they
are estimated nonparametrically based on the posterior probabilities and the
data.
The function summary.mvnpEM
returns a list of type summary.mvnpEM
with the following components:
n |
The number of observations |
m |
The number of mixture components |
B |
The number of blocks |
blockid |
The block ID (from 1 through B) for each of the coordinates
of the multivariate observations. The |
means |
A |
variances |
Same as |
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18(2), 505–526.
Chauveau, D., and Hoang, V. T. L. (2015), Nonparametric mixture models with conditionally independent multivariate component densities, Preprint under revision. https://hal.archives-ouvertes.fr/hal-01094837
# Example as in Chauveau and Hoang (2015) with 6 coordinates ## Not run: m=2; r=6; blockid <-c(1,1,2,2,3,3) # 3 bivariate blocks # generate some data x ... a <- mvnpEM(x, mu0=2, blockid, samebw=F) # adaptive bandwidth plot(a) # this S3 method produces 6 plots of univariate marginals summary(a) ## End(Not run)
# Example as in Chauveau and Hoang (2015) with 6 coordinates ## Not run: m=2; r=6; blockid <-c(1,1,2,2,3,3) # 3 bivariate blocks # generate some data x ... a <- mvnpEM(x, mu0=2, blockid, samebw=F) # adaptive bandwidth plot(a) # this S3 method produces 6 plots of univariate marginals summary(a) ## End(Not run)
summary
method for class npEM
.
## S3 method for class 'npEM' summary(object, ...) ## S3 method for class 'summary.npEM' print(x, digits=3, ...)
## S3 method for class 'npEM' summary(object, ...) ## S3 method for class 'summary.npEM' print(x, digits=3, ...)
object , x
|
an object of class |
digits |
Significant digits for printing values |
... |
further arguments passed to or from other methods. |
summary.npEM
prints means and variances of each block for
each component. These quantities might not be part of the model, but they
are estimated nonparametrically based on the posterior probabilities and the
data.
The function summary.npEM
returns a list of type summary.npEM
with the following components:
n |
The number of observations |
m |
The number of mixture components |
B |
The number of blocks |
blockid |
The block ID (from 1 through B) for each of the coordinates
of the multivariate observations. The |
means |
A |
variances |
Same as |
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18(2), 505–526.
data(Waterdata) set.seed(100) ## Not run: a <- npEM(Waterdata[,3:10], 3, bw=4) # Assume indep but not iid summary(a) b <- npEM(Waterdata[,3:10], 3, bw=4, blockid=rep(1,8)) # Now assume iid summary(b) ## End(Not run)
data(Waterdata) set.seed(100) ## Not run: a <- npEM(Waterdata[,3:10], 3, bw=4) # Assume indep but not iid summary(a) b <- npEM(Waterdata[,3:10], 3, bw=4, blockid=rep(1,8)) # Now assume iid summary(b) ## End(Not run)
summary
method for class spRMM
.
## S3 method for class 'spRMM' summary(object, digits = 6, ...)
## S3 method for class 'spRMM' summary(object, digits = 6, ...)
object |
an object of class |
digits |
Significant digits for printing values |
... |
Additional parameters passed to |
summary.spRMM
prints scalar parameter estimates for
a fitted mixture model: each component weight and the scaling factor, see reference below.
The functional (nonparametric) estimates of survival and hazard rate funcions can be obtained
using plotspRMM
.
The function summary.spRMM
prints the final loglikelihood
value at the solution as well as The estimated mixing weights and the scaling parameter.
Didier Chauveau
Bordes, L., and Chauveau, D. (2016), Stochastic EM algorithms for parametric and semiparametric mixture models for right-censored lifetime data, Computational Statistics, Volume 31, Issue 4, pages 1513-1538. https://link.springer.com/article/10.1007/s00180-016-0661-7
Function for plotting functional (nonparametric) estimates:
plotspRMM
.
Other models and algorithms for censored lifetime data
(name convention is model_algorithm):
expRMM_EM
,
weibullRMM_SEM
.
# See example(spRMM_SEM)
# See example(spRMM_SEM)
Return ECM algorithm output for a specific case of a three-component tau equivalence model
tauequivnormalmixEM (x, lambda = NULL, mu = NULL, sigma = NULL, k = 3, mean.constr = NULL, sd.constr = NULL, gparam = NULL, epsilon = 1e-08, maxit = 10000, maxrestarts=20, verb = FALSE, fast=FALSE, ECM = TRUE, arbmean = TRUE, arbvar = TRUE)
tauequivnormalmixEM (x, lambda = NULL, mu = NULL, sigma = NULL, k = 3, mean.constr = NULL, sd.constr = NULL, gparam = NULL, epsilon = 1e-08, maxit = 10000, maxrestarts=20, verb = FALSE, fast=FALSE, ECM = TRUE, arbmean = TRUE, arbvar = TRUE)
x |
A vector of length n consisting of the data,
passed directly to |
lambda |
Initial value of mixing proportions,
passed directly to |
mu |
Starting value of vector of component means for algorithm,
passed directly to |
sigma |
Starting value of vector of component standard deviations
for algorithm, passed directly to |
k |
Number of components, passed directly to |
mean.constr |
If non-NULL, this parameter is
passed directly to |
sd.constr |
Deprecated. |
gparam |
This argument is passed directly to |
epsilon |
The convergence criterion. Convergence is declared when the change in the observed data log-likelihood increases by less than epsilon. |
maxit |
The maximum number of iterations. |
maxrestarts |
The maximum number of restarts allowed in case of a problem with the particular starting values chosen due to one of the variance estimates getting too small (each restart uses randomly chosen starting values). It is well-known that when each component of a normal mixture may have its own mean and variance, the likelihood has no maximizer; in such cases, we hope to find a "nice" local maximum with this algorithm instead, but occasionally the algorithm finds a "not nice" solution and one of the variances goes to zero, driving the likelihood to infinity. |
verb |
If TRUE, then various updates are printed during each iteration of the algorithm. |
fast |
If TRUE and k==2 and arbmean==TRUE, then use
|
ECM |
logical: Should this algorithm be an ECM algorithm in the sense
of Meng and Rubin (1993)? If FALSE, the algorithm is a true EM algorithm;
if TRUE, then every half-iteration alternately updates the means conditional
on the variances or the variances conditional on the means, with an extra
E-step in between these updates. For |
arbmean |
Deprecated. |
arbvar |
Deprecated. |
The tauequivnormalmixEM
function is merely a wrapper for the
normalmixMMlc
function.
# This is the standard EM algorithm for normal mixtures that maximizes
# the conditional expected complete-data
# log-likelihood at each M-step of the algorithm.
# If desired, the
# EM algorithm may be replaced by an ECM algorithm (see ECM
argument)
# that alternates between maximizing with respect to the mu
# and lambda
while holding sigma
fixed, and maximizing with
# respect to sigma
and lambda
while holding mu
# fixed. In the case where arbmean
is FALSE
# and arbvar
is TRUE
, there is no closed-form EM algorithm,
# so the ECM option is forced in this case.
normalmixEM
returns a list of class mixEM
with items:
x |
The raw data. |
lambda |
The final mixing proportions. |
mu |
The final mean parameters. |
sigma |
The final standard deviation(s) |
scale |
Scale factor for the component standard deviations, if applicable. |
loglik |
The final log-likelihood. |
posterior |
An nxk matrix of posterior probabilities for observations. |
all.loglik |
A vector of each iteration's log-likelihood. This vector includes both the initial and the final values; thus, the number of iterations is one less than its length. |
restarts |
The number of times the algorithm restarted due to unacceptable choice of initial values. |
ft |
A character vector giving the name of the function. |
Thomas, H., Lohaus, A., and Domsch, H. (2011) Stable Unstable Reliability Theory, British Journal of Mathematical and Statistical Psychology 65(2): 201-221.
Meng, X.-L. and Rubin, D. B. (1993) Maximum Likelihood Estimation Via the ECM Algorithm: A General Framework, Biometrika 80(2): 267-278.
normalmixMMlc
, normalmixEM
, mvnormalmixEM
, normalmixEM2comp
## Analyzing synthetic data as in the tau equivalent model ## From Thomas et al (2011), see also Chauveau and Hunter (2013) ## a 3-component mixture of normals with linear constraints. lbd <- c(0.6,0.3,0.1); m <- length(lbd) sigma <- sig0 <- sqrt(c(1,9,9)) # means constaints mu = M beta M <- matrix(c(1,1,1,0,1,-1), 3, 2) beta <- c(1,5) # unknown constained mean mu0 <- mu <- as.vector(M %*% beta) # linear constraint on the inverse variances pi = A.g A <- matrix(c(1,1,1,0,1,0), m, 2, byrow=TRUE) iv0 <- 1/(sig0^2) g0 <- c(iv0[2],iv0[1] - iv0[2]) # gamma^0 init # simulation and EM fits set.seed(40); n=100; x <- rnormmix(n,lbd,mu,sigma) s <- normalmixEM(x,mu=mu0,sigma=sig0,maxit=2000) # plain EM # EM with var and mean linear constraints sc <- normalmixMMlc(x, lambda=lbd, mu=mu0, sigma=sig0, mean.lincstr=M, var.lincstr=A, gparam=g0) # Using tauequivnormalmixEM function to call normalmixMMlc tau <- tauequivnormalmixEM (x, lambda=lbd, mu=mu0, gparam=g0) # plot and compare both estimates dnormmixt <- function(t, lam, mu, sig){ m <- length(lam); f <- 0 for (j in 1:m) f <- f + lam[j]*dnorm(t,mean=mu[j],sd=sig[j]) f} t <- seq(min(x)-2, max(x)+2, len=200) hist(x, freq=FALSE, col="lightgrey", ylim=c(0,0.3), ylab="density",main="") lines(t, dnormmixt(t, lbd, mu, sigma), col="darkgrey", lwd=2) # true lines(t, dnormmixt(t, s$lambda, s$mu, s$sigma), lty=2) lines(t, dnormmixt(t, sc$lambda, sc$mu, sc$sigma), col=1, lty=3) lines(t, dnormmixt(t, tau$lambda, tau$mu, tau$sigma), col=2, lty=4) legend("topleft", c("true","plain EM","constr EM", "Tau Equiv"), col=c("darkgrey",1,1,2), lty=c(1,2,3,4), lwd=c(2,1,1,1))
## Analyzing synthetic data as in the tau equivalent model ## From Thomas et al (2011), see also Chauveau and Hunter (2013) ## a 3-component mixture of normals with linear constraints. lbd <- c(0.6,0.3,0.1); m <- length(lbd) sigma <- sig0 <- sqrt(c(1,9,9)) # means constaints mu = M beta M <- matrix(c(1,1,1,0,1,-1), 3, 2) beta <- c(1,5) # unknown constained mean mu0 <- mu <- as.vector(M %*% beta) # linear constraint on the inverse variances pi = A.g A <- matrix(c(1,1,1,0,1,0), m, 2, byrow=TRUE) iv0 <- 1/(sig0^2) g0 <- c(iv0[2],iv0[1] - iv0[2]) # gamma^0 init # simulation and EM fits set.seed(40); n=100; x <- rnormmix(n,lbd,mu,sigma) s <- normalmixEM(x,mu=mu0,sigma=sig0,maxit=2000) # plain EM # EM with var and mean linear constraints sc <- normalmixMMlc(x, lambda=lbd, mu=mu0, sigma=sig0, mean.lincstr=M, var.lincstr=A, gparam=g0) # Using tauequivnormalmixEM function to call normalmixMMlc tau <- tauequivnormalmixEM (x, lambda=lbd, mu=mu0, gparam=g0) # plot and compare both estimates dnormmixt <- function(t, lam, mu, sig){ m <- length(lam); f <- 0 for (j in 1:m) f <- f + lam[j]*dnorm(t,mean=mu[j],sd=sig[j]) f} t <- seq(min(x)-2, max(x)+2, len=200) hist(x, freq=FALSE, col="lightgrey", ylim=c(0,0.3), ylab="density",main="") lines(t, dnormmixt(t, lbd, mu, sigma), col="darkgrey", lwd=2) # true lines(t, dnormmixt(t, s$lambda, s$mu, s$sigma), lty=2) lines(t, dnormmixt(t, sc$lambda, sc$mu, sc$sigma), col=1, lty=3) lines(t, dnormmixt(t, tau$lambda, tau$mu, tau$sigma), col=2, lty=4) legend("topleft", c("true","plain EM","constr EM", "Tau Equiv"), col=c("darkgrey",1,1,2), lty=c(1,2,3,4), lwd=c(2,1,1,1))
Performs a likelihood ratio test of a location (or scale) normal or regression mixture versus the more general model. For a normal mixture, the alternative hypothesis is that each component has its own mean and variance, whereas the null is that all means (in the case of a scale mixture) or all variances (in the case of a location mixture) are equal. This test is asymptotically chi-square with degrees of freedom equal to k-1, where k is the number of components.
test.equality(y, x = NULL, arbmean = TRUE, arbvar = FALSE, mu = NULL, sigma = NULL, beta = NULL, lambda = NULL, ...)
test.equality(y, x = NULL, arbmean = TRUE, arbvar = FALSE, mu = NULL, sigma = NULL, beta = NULL, lambda = NULL, ...)
y |
The responses for |
x |
The predictors for |
arbmean |
If FALSE, then a scale mixture analysis is performed for |
arbvar |
If FALSE, then a location mixture analysis is performed for |
mu |
An optional vector for starting values (under the null hypothesis) for |
sigma |
An optional vector for starting values (under the null hypothesis) for |
beta |
An optional matrix for starting values (under the null hypothesis) for |
lambda |
An optional vector for starting values (under the null hypothesis) for |
... |
Additional arguments passed to the various EM algorithms for the mixture of interest. |
test.equality
returns a list with the following items:
chi.sq |
The chi-squared test statistic. |
df |
The degrees of freedom for the chi-squared test statistic. |
p.value |
The p-value corresponding to this likelihood ratio test. |
## Should a location mixture be used for the Old Faithful data? data(faithful) attach(faithful) set.seed(100) test.equality(y = waiting, arbmean = FALSE, arbvar = TRUE)
## Should a location mixture be used for the Old Faithful data? data(faithful) attach(faithful) set.seed(100) test.equality(y = waiting, arbmean = FALSE, arbvar = TRUE)
Performs a likelihood ratio test of either common variance terms between the response trajectories in a mixture of random (or mixed) effects regressions or for common variance-covariance matrices for the random effects mixture distribution.
test.equality.mixed(y, x, w=NULL, arb.R = TRUE, arb.sigma = FALSE, lambda = NULL, mu = NULL, sigma = NULL, R = NULL, alpha = NULL, ...)
test.equality.mixed(y, x, w=NULL, arb.R = TRUE, arb.sigma = FALSE, lambda = NULL, mu = NULL, sigma = NULL, R = NULL, alpha = NULL, ...)
y |
The responses for |
x |
The predictors for the random effects in |
w |
The predictors for the (optional) fixed effects in |
arb.R |
If FALSE, then a test for different variance-covariance matrices for the random effects mixture is performed. |
arb.sigma |
If FALSE, then a test for different variance terms between the response trajectories is performed. |
lambda |
A vector of mixing proportions (under the null hypothesis) with same purpose as outlined in |
mu |
A matrix of the means (under the null hypothesis) with same purpose as outlined in |
sigma |
A vector of standard deviations (under the null hypothesis) with same purpose as outlined in |
R |
A list of covariance matrices (under the null hypothesis) with same purpose as outlined in |
alpha |
An optional vector of fixed effects regression coefficients (under the null hypothesis) with same purpose as outlined
in |
... |
Additional arguments passed to |
test.equality.mixed
returns a list with the following items:
chi.sq |
The chi-squared test statistic. |
df |
The degrees of freedom for the chi-squared test statistic. |
p.value |
The p-value corresponding to this likelihood ratio test. |
##Test of equal variances in the simulated data set. data(RanEffdata) set.seed(100) x<-lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 2:3], ncol = 2)) x<-x[1:15] y<-lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 1], ncol = 1)) y<-y[1:15] out<-test.equality.mixed(y, x, arb.R = TRUE, arb.sigma = FALSE, epsilon = 1e-1, verb = TRUE, maxit = 50, addintercept.random = FALSE) out
##Test of equal variances in the simulated data set. data(RanEffdata) set.seed(100) x<-lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 2:3], ncol = 2)) x<-x[1:15] y<-lapply(1:length(RanEffdata), function(i) matrix(RanEffdata[[i]][, 1], ncol = 1)) y<-y[1:15] out<-test.equality.mixed(y, x, arb.R = TRUE, arb.sigma = FALSE, epsilon = 1e-1, verb = TRUE, maxit = 50, addintercept.random = FALSE) out
The tone perception data stem
from an experiment of Cohen (1980) and have been analyzed in de Veaux
(1989) and Viele and Tong (2002). The dataset and this documentation file
were copied from the fpc package by Christian Hennig.
A pure fundamental tone was played to a
trained musician. Electronically generated overtones were added, determined
by a stretching ratio of stretchratio
. stretchratio=2.0
corresponds to the harmonic pattern
usually heard in traditional definite pitched instruments. The musician was
asked to tune an adjustable tone to the octave above the fundamental tone.
tuned
gives the ratio of the adjusted tone to the fundamental,
i.e. tuned=2.0
would be the correct tuning for all
stretchratio
-values.
The data analyzed here belong to 150 trials
with the same musician. In the original study, there were four further
musicians.
data(tonedata)
data(tonedata)
A data frame with 2 variables, stretchratio
and
tuned
, and 150 cases.
Christian Hennig
Original source: Cohen, E. A. (1980), Inharmonic tone perception. Unpublished Ph.D. dissertation, Stanford University
R source: Hennig, Christian (2010), fpc: Flexible procedures for clustering, R package version 2.0-2. https://cran.r-project.org/package=fpc
de Veaux, R. D. (1989), Mixtures of Linear Regressions, Computational Statistics and Data Analysis 8, 227-245.
Viele, K. and Tong, B. (2002), Modeling with Mixtures of Linear Regressions, Statistics and Computing 12, 315-330.
This data set arises from the water-level task proposed by the Swiss psychologist Jean Piaget to assess children's understanding of the physical world. This involves presenting a child with a rectangular vessel with a cap, affixed to a wall, that can be tilted (like the minute hand of a clock) to point in any direction. A separate disk with a water line indicated on it, which can similarly be spun so that the water line may assume any desired angle with the horizontal, is positioned so that by spinning this disk, the child subject may make the hypothetical surface of water inside the vessel assume any desired orientation. For each of eight different orientations of the vessel, corresponding to the clock angles at 1:00, 2:00, 4:00, 5:00, 7:00, 8:00, 10:00, and 11:00, the child subject is asked to position the water level as it would appear in reality if water were in the vessel. The measurement is the acute angle with the horizontal, in degrees, assumed by the water line after it is positioned by the child. A sign is attached to the measurement to indicate whether the line slopes up (positive) or down (negative) from left to right. Thus, each child has 8 repeated measurements, one for each vessel angle, and the range of possible values are from -90 to 90.
The setup of the experiment, along with a photograph of the testing apparatus,
is given by Thomas and Jamison (1975). A more detailed analysis using a
subset of 405 of the original 579 subjects is given
by Thomas and Lohaus (1993); further analyses using the functions in
mixtools
are given by Benaglia et al (2008) and Levine et al (2011),
among others.
There are two versions of the dataset included in mixtools
. The full
dataset, called WaterdataFull
, has 579 individuals. The dataset
called Waterdata
is a subset of 405 individuals, comprising all children
aged 11 years or more and omitting any individuals with any observations equal
to 100, which in this context indicates a missing value (since all of the degree
measurements should be in the range from -90 to +90, 100 is not a possible value).
data(Waterdata)
data(Waterdata)
These data frames consist of 405 or 579 rows, one row for each child. There are ten columns: The age (in years) and sex (where 1=male and 0=female) are given for each individual along with the degree of deviation from the horizontal for 8 specified clock-hour orientations (11, 4, 2, 7, 10, 5, 1, and 8 o'clock, in order).
Benaglia, T., Chauveau, D., and Hunter, D.R. (2009), An EM-Like Algorithm for Semi- and Non-Parametric Estimation in Multivariate Mixtures, Journal of Computational and Graphical Statistics, 18: 505-526.
Levine, M., Hunter, D.R., and Chauveau, D. (2011), Maximum Smoothed Likelihood for Multivariate Mixtures, Biometrika, 98(2): 403-416.
Thomas, H. and Jamison, W. (1975), On the Acquisition of Understanding that Still Water is Horizontal, Merrill-Palmer Quarterly of Behavior and Development, 21(1): 31-44.
Thomas, H. and Lohaus, A. (1993), Modeling Growth and Individual Differences in Spatial Tasks, University of Chicago Press, Chicago, available on JSTOR.
Parametric Stochastic EM (St-EM) algorithm for univariate finite mixture of Weibull distributions with randomly right censored data.
weibullRMM_SEM(x, d = NULL, lambda = NULL, shape = NULL, scale = NULL, k = 2, maxit = 200, maxit.survreg = 200, epsilon = 1e-03, averaged = TRUE, verb = FALSE)
weibullRMM_SEM(x, d = NULL, lambda = NULL, shape = NULL, scale = NULL, k = 2, maxit = 200, maxit.survreg = 200, epsilon = 1e-03, averaged = TRUE, verb = FALSE)
x |
A vector of |
d |
The vector of censoring indication, where 1 means observed lifetime data, and 0 means censored lifetime data. |
lambda |
Initial value of mixing proportions.
If |
shape |
Initial value of Weibull component shapes,
all set to 1 if |
scale |
Initial value of Weibull component scales,
all set to 1 if |
k |
Number of components of the mixture. |
maxit |
The number of iterations allowed, since for St-EM algorithms convergence
is not based on stabilization, exactly |
maxit.survreg |
The number of iterations allowed in the computations of the
MLE for censored weibull data from the |
epsilon |
Tolerance parameter used in the numerical computations of the
MLE for censored weibull data by |
averaged |
The way of updating parameters at each iteration: if |
verb |
If TRUE, print updates for every iteration of the algorithm as it runs |
This St-EM algorithm calls functions from the survival
package to compute
parametric MLE for censored weibull data.
weibullRMM_SEM
returns a list of class "mixEM" with the following items:
x |
The input data. |
d |
The input censoring indicator. |
lambda |
The estimates for the mixing proportions. |
scale |
The estimates for the Weibull component scales. |
shape |
The estimates for the Weibull component shapes. |
loglik |
The log-likelihood value at convergence of the algorithm. |
posterior |
An |
all.loglik |
The sequence of log-likelihoods over iterations. |
all.lambda |
The sequence of mixing proportions over iterations. |
all.scale |
The sequence of component scales over iterations. |
all.shape |
The sequence of component shapes over iterations. |
ft |
A character vector giving the name of the function called. |
Didier Chauveau
Bordes, L., and Chauveau, D. (2016), Stochastic EM algorithms for parametric and semiparametric mixture models for right-censored lifetime data, Computational Statistics, Volume 31, Issue 4, pages 1513-1538. https://link.springer.com/article/10.1007/s00180-016-0661-7
Related functions:
plotweibullRMM
, summary.mixEM
.
Other models and algorithms for censored lifetime data
(name convention is model_algorithm):
expRMM_EM
,
spRMM_SEM
.
n = 500 # sample size m = 2 # nb components lambda=c(0.4, 0.6) shape <- c(0.5,5); scale <- c(1,20) # model parameters set.seed(321) x <- rweibullmix(n, lambda, shape, scale) # iid ~ weibull mixture cs=runif(n,0,max(x)+10) # iid censoring times t <- apply(cbind(x,cs),1,min) # censored observations d <- 1*(x <= cs) # censoring indicator ## set arbitrary or "reasonable" (e.g., data-driven) initial values l0 <- rep(1/m,m); sh0 <- c(1, 2); sc0 <- c(2,10) # Stochastic EM algorithm a <- weibullRMM_SEM(t, d, lambda = l0, shape = sh0, scale = sc0, maxit = 200) summary(a) # Parameters estimates etc plotweibullRMM(a) # plot of St-EM sequences plot(a, which=2) # or equivalently, S3 method for "mixEM" object
n = 500 # sample size m = 2 # nb components lambda=c(0.4, 0.6) shape <- c(0.5,5); scale <- c(1,20) # model parameters set.seed(321) x <- rweibullmix(n, lambda, shape, scale) # iid ~ weibull mixture cs=runif(n,0,max(x)+10) # iid censoring times t <- apply(cbind(x,cs),1,min) # censored observations d <- 1*(x <= cs) # censoring indicator ## set arbitrary or "reasonable" (e.g., data-driven) initial values l0 <- rep(1/m,m); sh0 <- c(1, 2); sc0 <- c(2,10) # Stochastic EM algorithm a <- weibullRMM_SEM(t, d, lambda = l0, shape = sh0, scale = sc0, maxit = 200) summary(a) # Parameters estimates etc plotweibullRMM(a) # plot of St-EM sequences plot(a, which=2) # or equivalently, S3 method for "mixEM" object
Evaluates a weighted kernel density estimate, using a Gaussian kernel, at a specified vector of points.
wkde(x, u=x, w=rep(1, length(x)), bw=bw.nrd0(as.vector(x)), sym=FALSE)
wkde(x, u=x, w=rep(1, length(x)), bw=bw.nrd0(as.vector(x)), sym=FALSE)
x |
Data |
u |
Points at which density is to be estimated |
w |
Weights (same length as |
bw |
Bandwidth |
sym |
Logical: Symmetrize about zero? |
A vector of the same length as u
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18, 505-526.
Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. (2009), mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software, 32(6):1-29.
# Mixture with mv gaussian model set.seed(100) m <- 2 # no. of components r <- 3 # no. of repeated measures (coordinates) lambda <- c(0.4, 0.6) mu <- matrix(c(0, 0, 0, 4, 4, 6), m, r, byrow=TRUE) # means sigma <- matrix(rep(1, 6), m, r, byrow=TRUE) # stdevs centers <- matrix(c(0, 0, 0, 4, 4, 4), 2, 3, byrow=TRUE) # initial centers for est blockid = c(1,1,2) # block structure of coordinates n = 100 x <- rmvnormmix(n, lambda, mu, sigma) # simulated data a <- npEM(x, centers, blockid, eps=1e-8, verb=FALSE) par(mfrow=c(2,2)) u <- seq(min(x), max(x), len=200) for(j in 1:2) { for(b in 1:2) { xx <- as.vector(x[,a$blockid==b]) wts <- rep(a$post[,j], length.out=length(xx)) bw <- a$bandwidth title <- paste("j =", j, "and b =", b) plot(u, wkde(xx, u, wts, bw), type="l", main=title) } }
# Mixture with mv gaussian model set.seed(100) m <- 2 # no. of components r <- 3 # no. of repeated measures (coordinates) lambda <- c(0.4, 0.6) mu <- matrix(c(0, 0, 0, 4, 4, 6), m, r, byrow=TRUE) # means sigma <- matrix(rep(1, 6), m, r, byrow=TRUE) # stdevs centers <- matrix(c(0, 0, 0, 4, 4, 4), 2, 3, byrow=TRUE) # initial centers for est blockid = c(1,1,2) # block structure of coordinates n = 100 x <- rmvnormmix(n, lambda, mu, sigma) # simulated data a <- npEM(x, centers, blockid, eps=1e-8, verb=FALSE) par(mfrow=c(2,2)) u <- seq(min(x), max(x), len=200) for(j in 1:2) { for(b in 1:2) { xx <- as.vector(x[,a$blockid==b]) wts <- rep(a$post[,j], length.out=length(xx)) bw <- a$bandwidth title <- paste("j =", j, "and b =", b) plot(u, wkde(xx, u, wts, bw), type="l", main=title) } }
Functions to compute weighted quantiles and the weighted interquartile range.
wquantile(wt = rep(1,length(x)), x, probs, already.sorted = FALSE, already.normalized = FALSE) wIQR(wt = rep(1,length(x)), x, already.sorted = FALSE, already.normalized = FALSE)
wquantile(wt = rep(1,length(x)), x, probs, already.sorted = FALSE, already.normalized = FALSE) wIQR(wt = rep(1,length(x)), x, already.sorted = FALSE, already.normalized = FALSE)
wt |
Vector of weights |
x |
Vector of data, same length as |
probs |
Numeric vector of probabilities with values in [0,1]. |
already.sorted |
If FALSE, sort |
already.normalized |
If FALSE, normalize |
wquantile
uses the findInterval
function. wIQR
calls the wquantile
function.
Returns the sample quantiles or interquartile range of a discrete distribution with
support points x
and corresponding probability masses wt
IQR(1:10) wIQR(x=1:10) # Note: Different algorithm than IQR function wIQR(1:10,1:10) # Weighted quartiles are now 4 and 8
IQR(1:10) wIQR(x=1:10) # Note: Different algorithm than IQR function wIQR(1:10,1:10) # Weighted quartiles are now 4 and 8