1 Introduction
On very large datasets the computing time of the classical expectation-maximization (EM) algorithm (Dempster et al., 1977), as well as of its variants such as MCEM, SAEM, MCMC-SAEM and others, is long, since all data are used in every iteration. To circumvent this problem, a number of EM-type algorithms have been proposed, namely various mini-batch (Neal and Hinton, 1999; Liang and Klein, 2009; Karimi et al., 2018; Nguyen et al., 2019) and online (Titterington, 1984; Lange, 1995; Cappé and Moulines, 2009; Cappé, 2011) versions of the EM algorithm. They all consist in using only a part of the observations during one iteration in order to accelerate convergence. While online algorithms process a single observation per iteration, handled in the order of arrival, mini-batch algorithms use larger, randomly chosen subsets of observations. The size of these subsets of data is called the mini-batch size. Choosing large mini-batch sizes entails long computing times, while very small mini-batch sizes, as well as online algorithms, may result in a loss of accuracy of the algorithm. Hence, an optimal mini-batch size would achieve a compromise between accuracy and computing time. However, this issue is generally overlooked.
In this article, we propose a mini-batch version of the MCMC-SAEM algorithm (Delyon et al., 1999; Kuhn and Lavielle, 2004). The original MCMC-SAEM algorithm is a powerful alternative to EM when the E-step is intractable. This is particularly interesting for nonlinear or non-Gaussian models, where the unobserved data cannot be simulated exactly from the conditional distribution. Moreover, the MCMC-SAEM algorithm is also computationally more efficient than the MCMC-EM algorithm, since only a single instance of the latent variable is sampled at every iteration of the algorithm. Nevertheless, when the dimension of the latent variable is huge, the sampling step can be time consuming. From this point of view, the mini-batch version proposed here is computationally more efficient than the original MCMC-SAEM, since at each iteration only a small proportion of the latent variable is simulated and only the corresponding data are visited. For curved exponential models, we prove almost-sure convergence of the sequence of estimates generated by the mini-batch MCMC-SAEM algorithm under classical conditions as the number of iterations of the algorithm increases. Moreover, we assess in simulation experiments the influence of the mini-batch size on the speed-up of the convergence in various models and highlight that an appropriate choice of the mini-batch size results in an important gain. We also present insights on the effect of the mini-batch size on the limit distribution of the estimates. Numerical results illustrate our findings.
2 Latent variable model and algorithm
This section first presents the general latent variable model and the required assumptions. Then the original MCMC-SAEM algorithm is described, before presenting the new mini-batch version of the MCMC-SAEM algorithm.
2.1 Model and assumptions
Consider the common latent variable model with observed (incomplete) data $y$ and latent (unobserved) variable $z = (z_1, \dots, z_N)$. Denote by $N$ the dimension of the latent variable $z$. In many latent variable models, $N$ corresponds to the number of observations, but it is not necessary that $y$ and $z$ have the same size or that each observation depends only on a single latent component $z_i$.
Denote by $\theta \in \Theta$ the model parameter of the joint distribution of the complete data $(y, z)$. In what follows, omitting all dependencies on the observations $y$, which are considered as fixed realizations in the analysis, we assume that the complete-data likelihood function has the following form
$$ f(z; \theta) = \exp\big( \langle S(z), \phi(\theta) \rangle - \psi(\theta) \big), \qquad (1) $$
where $\langle \cdot\,, \cdot \rangle$ denotes the scalar product, $S(z)$ denotes a vector of sufficient statistics of the model with values in an open subset $\mathcal{S}$ of $\mathbb{R}^d$, and $\psi$ and $\phi$ are functions on $\Theta$. The posterior distribution of the latent variables given the observations is denoted by $p(\,\cdot\,; \theta)$.
2.2 Description of the MCMC-SAEM algorithm
The original MCMC-SAEM algorithm proposed by Kuhn and Lavielle (2004) consists in replacing the E-step of the EM algorithm by a simulation step combined with a stochastic approximation step. Here, we focus on a version where the MCMC part is a Metropolis-Hastings-within-Gibbs algorithm (Robert and Casella, 2004). More precisely, at iteration $k$, the following three steps are performed.
Simulation step
A new realization $z_k$ of the latent variable is sampled from an ergodic Markov transition kernel $\Pi_{\theta_{k-1}}$, whose stationary distribution is the posterior distribution $p(\,\cdot\,; \theta_{k-1})$. In practice, one iteration of a Metropolis-Hastings-within-Gibbs algorithm is used. We consider a collection of symmetric random walk Metropolis kernels $\Pi_{\theta,i}$, $1 \le i \le N$, defined on $\mathbb{R}^N$ and acting only on the $i$-th coordinate, see Fort et al. (2003). More precisely, let $(e_1, \dots, e_N)$ be the canonical basis of $\mathbb{R}^N$. Then, for each $i$, starting from the vector $z$, the proposal in the direction of $e_i$ is given by $z + u\, e_i$, where $u$ is sampled from a symmetric increment density $q_i$. This proposal is then accepted with probability
$$ \min\Big\{ 1, \; \frac{p(z + u\, e_i; \theta)}{p(z; \theta)} \Big\}. $$
Stochastic approximation step
The approximation $s_k$ of the sufficient statistic is updated by
$$ s_k = s_{k-1} + \gamma_k \big( S(z_k) - s_{k-1} \big), \qquad (2) $$
where $(\gamma_k)_{k \ge 1}$ is a decreasing sequence of positive step sizes such that $\sum_{k} \gamma_k = \infty$ and $\sum_{k} \gamma_k^2 < \infty$. That is, the current approximation of the sufficient statistic is a weighted mean of its previous value and the value obtained at the current simulation step.
Maximization step
The model parameter is updated by
$$ \theta_k = \hat\theta(s_k), \quad \text{with} \quad \hat\theta(s) = \operatorname*{arg\,max}_{\theta \in \Theta} L(s, \theta), \quad \text{where} \quad L(s, \theta) = \langle s, \phi(\theta) \rangle - \psi(\theta). $$
Depending on the model, the maximization problem may not have a closed-form solution.
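To make the three steps concrete, the following Python sketch assembles one complete MCMC-SAEM iteration. It is only a minimal illustration, not the implementation used in the experiments: the object `model` and its methods `log_post` (complete-data log-likelihood up to a constant), `suff_stat` (the statistic $S(z)$) and `argmax` (the M-step solution $\hat\theta(s)$) are hypothetical placeholders, and the step-size choice is just one common option satisfying the conditions above.

```python
import numpy as np

def mcmc_saem_iteration(z, s, theta, k, model, rw_scale=1.0, rng=None):
    """One MCMC-SAEM iteration: MH-within-Gibbs simulation, SA update (Eq. (2)), M-step."""
    rng = rng if rng is not None else np.random.default_rng()
    gamma = 1.0 if k <= 50 else (k - 50) ** (-0.6)   # one common step-size choice (assumption)

    # Simulation step: one symmetric random-walk Metropolis update per coordinate.
    for i in range(len(z)):
        prop = z.copy()
        prop[i] += rw_scale * rng.standard_normal()
        log_ratio = model.log_post(prop, theta) - model.log_post(z, theta)
        if np.log(rng.uniform()) <= log_ratio:
            z = prop

    # Stochastic approximation step, Eq. (2).
    s = s + gamma * (model.suff_stat(z) - s)

    # Maximization step: theta_k = theta_hat(s_k).
    theta = model.argmax(s)
    return z, s, theta
```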
2.3 Mini-batch MCMC-SAEM algorithm
When $N$ is large, the simulation step can be very time-consuming. Indeed, simulating all components of the latent variable at every iteration is costly in time. Thus, in the spirit of other mini-batch algorithms, updating only a part of the latent components may speed up the convergence of the algorithm. With this idea in mind, denote by $\alpha \in (0, 1]$ the (mean) proportion of components of the latent variable that are updated during one iteration.
Mini-batch simulation step
In the mini-batch version of the MCMC-SAEM algorithm the simulation step consists of two parts. First, select the indices of the components of the latent variable that will be updated. That is, we sample the number $n_k$ of indices from a binomial distribution $\mathrm{Bin}(N, \alpha)$ and then randomly select $n_k$ indices among $\{1, \dots, N\}$ without replacement. Denote by $I_k$ this set of selected indices at iteration $k$. Next, at iteration $k$, instead of sampling all the components of $z_k$, we sample only the components $z_{k,i}$ with index $i \in I_k$ from the Markov kernel $\Pi_{\theta_{k-1},i}$, using one iteration of a Metropolis-Hastings-within-Gibbs algorithm; the other components are left unchanged.
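The following sketch illustrates this selection-and-update mechanism, reusing the hypothetical `model.log_post` placeholder from the sketch in Section 2.2; it is an illustration of the step, not the implementation used in the experiments.

```python
import numpy as np

def minibatch_simulation_step(z, theta, alpha, model, rw_scale=1.0, rng=None):
    """Mini-batch simulation step: refresh only a random subset of latent components."""
    rng = rng if rng is not None else np.random.default_rng()
    N = len(z)

    # Number of components to update: n_k ~ Bin(N, alpha),
    # then n_k indices drawn uniformly without replacement.
    n_k = rng.binomial(N, alpha)
    I_k = rng.choice(N, size=n_k, replace=False)

    # One Metropolis-Hastings-within-Gibbs update for each selected coordinate;
    # all other coordinates keep their previous value.
    for i in I_k:
        prop = z.copy()
        prop[i] += rw_scale * rng.standard_normal()
        log_ratio = model.log_post(prop, theta) - model.log_post(z, theta)
        if np.log(rng.uniform()) <= log_ratio:
            z = prop
    return z, I_k
```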
Stochastic approximation and maximization step
These two steps are identical to those of the original SAEM algorithm. However, in many models the evaluation of the sufficient statistic $S(z_k)$ can be performed by updating only a small part of the components of its previous value $S(z_{k-1})$. This may be computationally more efficient than computing $S(z_k)$ naively.
Initialization
Initial values $\theta_0$, $s_0$ and $z_0$ for the model parameter, the sufficient statistic and the latent variable, respectively, have to be chosen by the user or at random.
See Algorithm 1 for a complete description of the algorithm.
3 Convergence of the algorithm
In this section we show that the mini-batch MCMC-SAEM algorithm converges, as the number of iterations increases, under classical conditions (namely those that ensure convergence of the original MCMC-SAEM algorithm). Indeed, compared to the classical MCMC-SAEM algorithm, the mini-batch version involves an additional stochastic part that comes from the selection of the indices of the latent variable components to be updated. This additional randomness is governed by the value of $\alpha$.
3.1 Equivalent descriptions
The above description of the simulation step is convenient for achieving maximal computing efficiency. We now focus on an equivalent framework that underlines the fact that our mini-batch algorithm is a special case of the classical MCMC-SAEM algorithm. For two Markov kernels $\Pi$ and $\Pi'$ on $\mathbb{R}^N$, we denote their composition by $\Pi \circ \Pi'$, defined by $(\Pi \circ \Pi')(z, A) = \int \Pi(z, \mathrm{d}z')\, \Pi'(z', A)$.
With this notation at hand, the Metropolis-Hastings-within-Gibbs algorithm relies on the kernel $\Pi_\theta = \Pi_{\theta,1} \circ \dots \circ \Pi_{\theta,N}$. Now, in the mini-batch simulation step, the update of only a part of the latent variable is equivalent to generating latent vectors according to the Markov kernel $\Pi_{\theta,\alpha}$ defined as the following recursive composition
$$ \Pi_{\theta,\alpha} = \big( \alpha\, \Pi_{\theta,1} + (1-\alpha)\, \mathrm{Id} \big) \circ \dots \circ \big( \alpha\, \Pi_{\theta,N} + (1-\alpha)\, \mathrm{Id} \big), \qquad (3) $$
where $\mathrm{Id}$ denotes the identity kernel, $\mathrm{Id}(z, \cdot) = \delta_z(\cdot)$, and $\delta_z$ denotes the Dirac measure at $z$. That is, $\Pi_{\theta,\alpha}$ is the composition of mixtures of the original kernels $\Pi_{\theta,i}$ and a deterministic component, with mixing weight $\alpha$. In other words, the mini-batch MCMC-SAEM algorithm corresponds to the original MCMC-SAEM algorithm with a particular choice of the transition kernel. Note that it can also be seen as a mixture over different trajectories (the choices of indices to be updated) of Metropolis-Hastings-within-Gibbs kernels acting on a subpart of the latent vector $z$.
A third description of the algorithm will be appropriate for the analysis of its theoretical properties. In order to stress the role of the mini-batch procedure, we denote by $(z_k^\alpha)_{k \ge 0}$ the sequence of latent variables obtained by the mini-batch algorithm with proportion $\alpha$. In the $k$-th mini-batch simulation step, for each $i \in \{1, \dots, N\}$, we sample a Bernoulli random variable $b_{k,i}$ with parameter $\alpha$. This is an indicator of whether the $i$-th component of the latent variable is updated at iteration $k$. Next, we sample a realization $\tilde z_{k,i}$ from the transition kernel $\Pi_{\theta_{k-1},i}$ and set
$$ z_{k,i}^\alpha = b_{k,i}\, \tilde z_{k,i} + (1 - b_{k,i})\, z_{k-1,i}^\alpha. \qquad (4) $$
When $\alpha = 1$, the sequence $(z_k^1)_{k \ge 0}$ generated by the batch algorithm is a Markov chain with transition kernel $\Pi_\theta$.
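The equivalence between the binomial selection of Section 2.3 and the per-coordinate Bernoulli indicators in (4) holds because both procedures include each index independently with probability $\alpha$. A small sketch, with an empirical sanity check on the mean number of selected indices, could look as follows.

```python
import numpy as np

def select_indices_binomial(N, alpha, rng):
    # Algorithmic description (Section 2.3): n_k ~ Bin(N, alpha),
    # then n_k indices drawn uniformly without replacement.
    n_k = rng.binomial(N, alpha)
    return set(rng.choice(N, size=n_k, replace=False).tolist())

def select_indices_bernoulli(N, alpha, rng):
    # Equivalent description (4): one Bernoulli(alpha) indicator per coordinate.
    return set(np.flatnonzero(rng.uniform(size=N) < alpha).tolist())

rng = np.random.default_rng(0)
N, alpha = 1000, 0.2
m1 = np.mean([len(select_indices_binomial(N, alpha, rng)) for _ in range(500)])
m2 = np.mean([len(select_indices_bernoulli(N, alpha, rng)) for _ in range(500)])
print(m1, m2)   # both close to N * alpha = 200
```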
Fort et al. (2003) establish results on the geometric ergodicity of hybrid samplers and in particular of the random-scan Gibbs sampler. The latter is defined as $\Pi_\theta = \frac{1}{N} \sum_{i=1}^N \Pi_{\theta,i}$, where each $\Pi_{\theta,i}$ is a kernel on $\mathbb{R}^N$ acting only on the $i$-th component. More generally, the random-scan Gibbs sampler may be defined as $\Pi_\theta = \sum_{i=1}^N p_i\, \Pi_{\theta,i}$, where $(p_1, \dots, p_N)$ is a probability distribution. This means that at each step of the algorithm, only one component $i$ is drawn from the probability distribution $(p_1, \dots, p_N)$ and then updated. These probabilities may be chosen uniformly ($p_i = 1/N$) or non-uniformly, which can for example help focusing on a component that is more difficult to simulate. We generalize their results to a setup where, at each step $k$, we rely on a kernel iterated from a random-scan Gibbs sampler as follows
$$ \Pi_{\theta,\alpha} = \Big( \alpha \sum_{i=1}^N p_i\, \Pi_{\theta,i} + (1-\alpha)\, \mathrm{Id} \Big)^N, \qquad (5) $$
where $\mathrm{Id}$ denotes the identity kernel, $\mathrm{Id}(z, \cdot) = \delta_z(\cdot)$. Note that this is not exactly the kernel corresponding to the algorithm described above, as this one could formally update the same component more than once within one step of the algorithm. Nonetheless, we neglect this effect and establish our result for the algorithm corresponding to this kernel.
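Under the reading of (5) adopted here (the mixture kernel iterated $N$ times), one transition of $\Pi_{\theta,\alpha}$ can be sketched as follows; the `model.log_post` placeholder is again hypothetical. In contrast to the mini-batch sweep, nothing prevents the same coordinate from being refreshed several times within one transition.

```python
import numpy as np

def iterated_random_scan_step(z, theta, alpha, model, p=None, rw_scale=1.0, rng=None):
    """One transition of the kernel in (5): N inner stages; each stage does nothing
    with probability 1 - alpha, otherwise it updates one coordinate drawn from p."""
    rng = rng if rng is not None else np.random.default_rng()
    N = len(z)
    p = p if p is not None else np.full(N, 1.0 / N)   # uniform scan probabilities by default
    for _ in range(N):
        if rng.uniform() < alpha:
            i = rng.choice(N, p=p)
            prop = z.copy()
            prop[i] += rw_scale * rng.standard_normal()
            log_ratio = model.log_post(prop, theta) - model.log_post(z, theta)
            if np.log(rng.uniform()) <= log_ratio:
                z = prop
    return z
```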
3.2 Assumptions and result
Assume that all random variables are defined on the same probability space $(\Omega, \mathcal{A}, P)$. We denote by $(\mathcal{F}_k)_{k \ge 0}$ the increasing family of $\sigma$-algebras generated by the successive iterates of the algorithm. Assume that the following regularity conditions on the model hold.
(M1)
The parameter space $\Theta$ is an open subset of $\mathbb{R}^p$. The complete-data likelihood function is given by
$$ f(z; \theta) = \exp\big( \langle S(z), \phi(\theta) \rangle - \psi(\theta) \big), $$
where $S$ is a continuous function on $\mathbb{R}^N$ taking its values in an open subset $\mathcal{S}$ of $\mathbb{R}^d$. Moreover, the convex hull of $S(\mathbb{R}^N)$ is included in $\mathcal{S}$ and, for all $\theta \in \Theta$, $\int |S(z)|\, p(z; \theta)\, \mathrm{d}z < \infty$.
(M2) The functions $\psi$ and $\phi$ are twice continuously differentiable on $\Theta$.

(M3) The function $\bar s : \Theta \to \mathcal{S}$ defined as $\bar s(\theta) = \int S(z)\, p(z; \theta)\, \mathrm{d}z$ is continuously differentiable on $\Theta$.

(M4) The observed-data log-likelihood function $\ell$ defined as $\ell(\theta) = \log \int f(z; \theta)\, \mathrm{d}z$ is continuously differentiable on $\Theta$ and $\partial_\theta \int f(z; \theta)\, \mathrm{d}z = \int \partial_\theta f(z; \theta)\, \mathrm{d}z$.

(M5) Define $L : \mathcal{S} \times \Theta \to \mathbb{R}$ as $L(s, \theta) = \langle s, \phi(\theta) \rangle - \psi(\theta)$. There exists a continuously differentiable function $\hat\theta : \mathcal{S} \to \Theta$ such that, for all $s \in \mathcal{S}$ and all $\theta \in \Theta$, $L(s, \hat\theta(s)) \ge L(s, \theta)$.
We now introduce the usually required conditions for proving convergence of the SAEM procedure.

(SAEM1) For all $k$ in $\mathbb{N}$, $\gamma_k \in [0, 1]$, $\sum_{k} \gamma_k = \infty$ and $\sum_{k} \gamma_k^2 < \infty$.

(SAEM2) The functions $\ell : \Theta \to \mathbb{R}$ and $\hat\theta : \mathcal{S} \to \Theta$ are $d$ times differentiable, where we recall that $\mathcal{S}$ is an open subset of $\mathbb{R}^d$.
For any $s \in \mathcal{S}$, we define $H_s(z) = S(z) - s$ and denote by $h(s)$ its expectation with respect to the posterior distribution $p(\,\cdot\,; \hat\theta(s))$. We consider the following additional assumptions.
 (H1)

There exists a constant such that
In addition, there exists such that is a compact set.
 (H2)

The family of symmetric densities is such that there exist some constants and (for ) such that whenever .
 (H3)

There are constants and with such that
and, for any sequence with , we may extract a subsequence with the property that, for some , all and all
 (H4)

There exist , and such that for all ,
To state our convergence result, we consider the version of the algorithm with truncation on random boundaries studied by Andrieu et al. (2005). This additional projection step ensures in particular the stability of the algorithm for the theoretical analysis and is only a technical tool for the proof without any practical consequences.
Theorem 1.
Assume (M1)–(M5), (SAEM1)–(SAEM2) and (H1)–(H4). Then, for all $\alpha \in (0, 1]$, the sequence $(\theta_k)_{k \ge 0}$ generated by the mini-batch MCMC-SAEM algorithm with corresponding Markov kernel $\Pi_{\theta,\alpha}$ converges almost surely towards the set of critical points of the observed-data log-likelihood $\ell$ as the number of iterations increases.
Proof of Theorem 1.
The proof consists of two steps. First, we prove the convergence of the sequence of sufficient statistics $(s_k)_{k \ge 0}$ towards the set of zeros of the function $h$ using Theorem 5.5 in Andrieu et al. (2005). Then, in a second step, following the usual reasoning for EM-type algorithms, described for instance in Delyon et al. (1999), we deduce that the sequence $(\theta_k)_{k \ge 0}$ converges to the set of critical points of the observed-data log-likelihood $\ell$.
First step.
In order to apply Theorem 5.5 in Andrieu et al. (2005), we need to establish that their conditions (A1) to (A4) are satisfied. In what follows, (A1) to (A4) always refer to the conditions stated in Andrieu et al. (2005). First, note that under our assumptions (H1), (M1)–(M5) and (SAEM2), condition (A1) is satisfied. Indeed, this is a consequence of Lemma 2 in Delyon et al. (1999). To establish (A2) and (A3), as suggested in Andrieu et al. (2005), we establish their drift conditions (DRI), see Proposition 6.1 in Andrieu et al. (2005). We first focus on establishing (DRI1) in Andrieu et al. (2005). To this aim, we rely on Fort et al. (2003), who establish results for the random-scan Metropolis sampler, and we generalize their results to our setup according to (5). Following the lines of the proof of Theorem 2 in Fort et al. (2003), we can show that Equations (6.1) and (6.3) appearing in the drift condition (DRI1) in Andrieu et al. (2005) are satisfied as soon as (H2)–(H3) hold. Indeed, following the strategy developed in Allassonnière et al. (2010), we first establish Equations (6.1) and (6.3) using a drift function depending on the parameter, built from the quantities given in (H4). Then we define a common drift function that, on any compact subset of the parameter space, is bounded above and below, up to two positive constants, by the parameter-dependent drift functions, and we establish Equations (6.1) and (6.3) for this common drift function. Moreover, using Proposition 1 and Proposition 2 in Fort et al. (2003), we obtain that Equation (6.2) in (DRI1) from Andrieu et al. (2005) holds. Under assumption (H4) we obtain the first part of (DRI2) in Andrieu et al. (2005); the second part of the same equation also holds in our case. Finally, (DRI3) in Andrieu et al. (2005) holds in our context because the function involved is twice continuously differentiable, thus Lipschitz on any compact set. To prove this, we decompose the space into the acceptance and rejection regions and consider the integral over four sets leading to four different expressions of the acceptance ratio (see for example the proof of Lemma 4.7 in Fort et al., 2015). Hence (DRI), and therefore (A2)–(A3) in Andrieu et al. (2005), are satisfied. Notice that (SAEM1) ensures (A4). This concludes the first step of the proof.
Second step.
As the function $\hat\theta$ is continuous, the second step is immediate by applying Lemma 2 in Delyon et al. (1999). ∎
4 Experiments
We carry out various simulation experiments in a frailty model, a nonlinear mixed effects model and a Bayesian deformable template model to illustrate the performance of the proposed mini-batch MCMC-SAEM algorithm and the potential gain in efficiency.
4.1 Frailty model in survival analysis
In survival analysis the frailty model is an extension of the wellknown Cox model (Duchateau and Janssen, 2008). Indeed, the hazard rate function in the frailty model includes an additional random effect called frailty to account for unexplained heterogeneity.
Model
We observe survival times $t_{ij}$ measured over $N$ groups with $J$ measurements per group, $1 \le i \le N$, $1 \le j \le J$, together with covariates $x_{ij} \in \mathbb{R}^p$. The latent random variables $z = (z_1, \dots, z_N)$ correspond to the frailty terms, each component $z_i$ being the frailty of one group. The $z_i$'s are supposed to be i.i.d. with centered Gaussian distribution with variance $\sigma^2$. We denote by $\eta$ the baseline hazard function. Here we choose for $\eta$ the Weibull hazard function given by
$$ \eta(t) = \frac{b}{a} \Big( \frac{t}{a} \Big)^{b-1}, $$
with scale $a > 0$ and shape $b > 0$. The conditional hazard rate of observation $t_{ij}$ given the frailty $z_i$ is given by
$$ \eta_{ij}(t \mid z_i) = \eta(t)\, \exp\big( x_{ij}^\top \beta + z_i \big), $$
where $\beta \in \mathbb{R}^p$ denotes the regression parameter. Thus the unknown model parameter is $\theta = (\beta, a, b, \sigma^2)$. In practical applications the main interest lies in the estimation of the regression parameter $\beta$.
Likelihood
In the frailty model the conditional survival function is given by
$$ S(t \mid z_i) = \exp\Big( - \big( t/a \big)^{b}\, e^{\,x_{ij}^\top \beta + z_i} \Big). $$
In other words, the conditional distribution of the survival time $t_{ij}$ given $z_i$ is the Weibull distribution with scale $a\, e^{-(x_{ij}^\top \beta + z_i)/b}$ and shape $b$. For the conditional and the complete likelihood function we obtain
$$ f(t \mid z; \theta) = \prod_{i=1}^N \prod_{j=1}^J \eta_{ij}(t_{ij} \mid z_i)\, S(t_{ij} \mid z_i), \qquad f(t, z; \theta) = f(t \mid z; \theta)\, \prod_{i=1}^N \frac{1}{\sigma}\, \varphi\Big( \frac{z_i}{\sigma} \Big), $$
where $\varphi$ denotes the density of the standard normal distribution.
The complete likelihood may be written in the form (1), with a vector of sufficient statistics whose components are functions of the observed survival times $t_{ij}$ and of the frailty terms $z_i$.
Simulation step
In the simulation step we use the following sampling procedure. Let $I_k$ be the subset of indices of latent variable components that have to be updated at iteration $k$. For each $i \in I_k$, we use a Metropolis-Hastings procedure: first, draw a candidate $z_i^c$ from a normal proposal distribution, then compute the logarithm $r_i$ of the corresponding acceptance ratio. Then, for a realization $u$ of the uniform distribution on $[0, 1]$, we set $z_{k,i} = z_i^c$ if $\log u \le r_i$, and $z_{k,i} = z_{k-1,i}$ otherwise.
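As an illustration, a possible implementation of this Metropolis-Hastings update is sketched below. It assumes the Weibull hazard parametrization written above and, as a further assumption, an independence proposal drawn from the frailty prior $N(0, \sigma^2)$, in which case the prior terms cancel and the log acceptance ratio reduces to a difference of conditional log-likelihoods; a random-walk proposal would work similarly.

```python
import numpy as np

def cond_loglik_group(t_i, x_i, z_i, beta, a, b):
    """Conditional log-likelihood of the survival times of one group given its frailty z_i
    (assumes the Weibull baseline hazard eta(t) = (b/a) (t/a)^(b-1) written above)."""
    lin = x_i @ beta + z_i                       # linear predictors x_ij' beta + z_i
    log_haz = np.log(b / a) + (b - 1.0) * np.log(t_i / a) + lin
    cum_haz = (t_i / a) ** b * np.exp(lin)       # integrated conditional hazard
    return np.sum(log_haz - cum_haz)

def mh_update_frailties(z, I_k, t, x, beta, a, b, sigma2, rng):
    """Update the selected frailties with an (assumed) independence proposal from N(0, sigma2)."""
    for i in I_k:
        z_cand = np.sqrt(sigma2) * rng.standard_normal()
        log_ratio = (cond_loglik_group(t[i], x[i], z_cand, beta, a, b)
                     - cond_loglik_group(t[i], x[i], z[i], beta, a, b))
        if np.log(rng.uniform()) <= log_ratio:
            z[i] = z_cand
    return z
```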
Stochastic approximation step
Compute the stochastic approximation of the sufficient statistics according to Eq. (2), that is, for each sufficient statistic, compute the weighted mean of its previous approximation and its value at the current simulation, where the step-size sequence $(\gamma_k)$ is chosen constant during a first phase of iterations and decreasing afterwards.
Maximization step
The maximization of $L(s, \theta)$ is explicit for part of the parameters, with closed-form solutions. The updates of the remaining parameters are done by Newton's method, as these maximizations are not explicit.
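One simple way to carry out the non-explicit part of the M-step is a small Newton ascent on the map $\theta \mapsto L(s, \theta)$ restricted to those parameters, as in the generic sketch below; `obj_grad_hess` is a hypothetical user-supplied function returning the objective, its gradient and its Hessian.

```python
import numpy as np

def newton_maximize(obj_grad_hess, theta0, n_iter=25, tol=1e-8):
    """Generic Newton ascent for the parameters without closed-form M-step updates."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        _, grad, hess = obj_grad_hess(theta)
        step = np.linalg.solve(hess, grad)   # Newton direction (hess should be negative definite)
        theta = theta - step
        if np.linalg.norm(step) < tol:       # stop when the update becomes negligible
            break
    return theta
```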
Numerical results
In a simulation study we consider the frailty model with all parameters fixed, as well as the number of groups and the number of measurements per group. The covariates are drawn independently from a uniform distribution for every dataset.
In the first setting, 500 datasets are generated to which the mini-batch MCMC-SAEM with random initial values and different mini-batch sizes is applied, that is, with varying values of the proportion $\alpha$. Figure 1 shows the evolution of the precision of the mean of the estimates of one parameter component as a function of the number of passes through the data. That means that the value 10 on the $x$-axis, for example, corresponds to 10 iterations of the batch MCMC-SAEM ($\alpha = 1$) and to 100 iterations of the mini-batch MCMC-SAEM with $\alpha = 0.1$. That is, parameter estimators are compared when the different algorithms have visited (approximately) the same amount of data, or, to put it differently, when the algorithms have generated the same number of latent components.
Figure 1 a) shows 80% confidence bands for this parameter component, and graph b) shows the corresponding evolution of the logarithm of the mean squared error. Obviously, for all algorithms estimation improves when the number of passes through the data increases. Moreover, it is clear from both graphs that the rate of convergence depends strongly on the mini-batch size. Indeed, at (almost) any number of passes through the data, the mean squared error and the width of the confidence intervals are increasing in the proportion $\alpha$. In this sense, the best choice that achieves the fastest convergence is a mini-batch size corresponding to the smallest value of $\alpha$ considered. From graph b) we see that convergence seems to be attained after only three passes through the data for the smallest proportion, while almost 30 passes are required for an intermediate value. For larger mini-batch sizes convergence is even much slower.
In the second setting, we aim at studying the asymptotic behavior of the estimates, when the algorithms are supposed to have converged. Here, estimates of the different algorithms are compared at a fixed large number of iterations. That is, after $K$ iterations the batch algorithm has visited the whole dataset $K$ times, while the mini-batch version with e.g. proportion $\alpha = 0.5$ has only visited half as much information. To compute the variance of the estimates that is only due to the stochasticity of the algorithm, we fix a dataset and run the mini-batch MCMC-SAEM algorithm 500 times using different initial values. We choose a large number of iterations and consider several values of the proportion $\alpha$. Figure 2 illustrates the limit distributions of the estimates for different mini-batch sizes, which seem to be Gaussian, centered at the same value but with varying variances. Table 1 gives the corresponding standard deviations of the estimates, which are clearly decreasing in $\alpha$. This is expected, as more data are visited for larger $\alpha$. These results give some insight into the asymptotic behavior of the algorithms and in particular into the impact of the mini-batch size on the limit distribution, which is generally overlooked in the literature.
4.2 Nonlinear mixed model for pharmacokinetic study
In this section we consider a classical one-compartment model used in clinical pharmacokinetic studies. The model presented in Davidian and Giltinan (1995) serves to analyze the kinetics of the drug theophylline used in therapy for respiratory diseases. For $1 \le i \le N$ and $1 \le j \le J$, we define
$$ y_{ij} = \frac{D_i\, ka_i}{V_i\, ka_i - Cl_i} \Big( e^{-\frac{Cl_i}{V_i} t_{ij}} - e^{-ka_i t_{ij}} \Big) + \varepsilon_{ij}, $$
where the observation $y_{ij}$ is the measured drug concentration on individual $i$ at time $t_{ij}$. The drug dose administered to individual $i$ is denoted $D_i$. The parameters for individual $i$ are the volume of the central compartment $V_i$, the drug absorption rate constant $ka_i$, and the drug's clearance $Cl_i$. The random measurement error is denoted by $\varepsilon_{ij}$ and supposed to have a centered normal distribution with variance $\sigma^2$. For the individual parameters $V_i$, $ka_i$ and $Cl_i$, log-normal distributions are considered, given by
$$ \log V_i = \log \bar V + \eta_i^{V}, \qquad \log ka_i = \log \overline{ka} + \eta_i^{ka}, \qquad \log Cl_i = \log \overline{Cl} + \eta_i^{Cl}, $$
where the random effects $\eta_i^{V}, \eta_i^{ka}, \eta_i^{Cl}$ are independent, each following a centered normal distribution with its own variance. Then the model parameters are $\theta = \big( \bar V, \overline{ka}, \overline{Cl}, \omega_V^2, \omega_{ka}^2, \omega_{Cl}^2, \sigma^2 \big)$.
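For illustration, the structural part of this model and the simulation of one individual's measurements could be coded as follows; the parameter names (`log_V`, `omega2_V`, etc.) are purely illustrative placeholders, and the formula is the standard single-dose, first-order-absorption one-compartment expression written above.

```python
import numpy as np

def pk_concentration(t, dose, V, ka, Cl):
    """One-compartment model with first-order absorption: predicted concentration at times t."""
    ke = Cl / V                                           # elimination rate constant
    return dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

def simulate_individual(t, dose, theta, rng):
    """Draw individual parameters from their log-normal distributions and simulate observations."""
    V  = np.exp(theta["log_V"]  + np.sqrt(theta["omega2_V"])  * rng.standard_normal())
    ka = np.exp(theta["log_ka"] + np.sqrt(theta["omega2_ka"]) * rng.standard_normal())
    Cl = np.exp(theta["log_Cl"] + np.sqrt(theta["omega2_Cl"]) * rng.standard_normal())
    noise = np.sqrt(theta["sigma2"]) * rng.standard_normal(len(t))
    return pk_concentration(t, dose, V, ka, Cl) + noise
```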
In a simulation study we estimate the parameters by both the original MCMC-SAEM algorithm (see Kuhn and Lavielle, 2005, for implementation details) and our mini-batch version. More precisely, one dataset is generated with fixed parameter values. The dose is the same for all individuals, as are the observation times. Then we perform repetitions of the estimation task by using the mini-batch MCMC-SAEM algorithm with different values of $\alpha$. As before, the step-size sequence $(\gamma_k)$ is chosen constant during a first phase of iterations and decreasing afterwards.
The results for one parameter are shown in Figures 3 and 4; the other results are similar and therefore omitted. Figure 3 presents the mean parameter estimate sequence with respect to the number of passes through the dataset, as in Figure 1, for different values of the proportion $\alpha$. We observe that the smaller the proportion $\alpha$, the faster the sequence of estimates converges, in particular during the first passes through the data. Figure 4 presents, for different values of the proportion $\alpha$, the estimates of the empirical standard deviation with respect to the number of iterations. We observe that, as the number of iterations increases, the standard deviations are lower for higher values of $\alpha$. This illustrates in particular that including more data in the inference task leads to more accurate estimation results, which is indeed very intuitive. Therefore, choosing an optimal value for $\alpha$ amounts to achieving a trade-off between speeding up the convergence and involving enough data in the process to get accurate estimates.
4.3 Deformable template model for image analysis
We consider the dense deformation template model introduced in Allassonnière et al. (2007). Such models allow to represent observed images as deformations of a given common reference image called template. We deal with the formulation proposed in Allassonnière et al. (2010). We observe $n$ gray level images denoted by $y_1, \dots, y_n$. Each image $y_i$ is defined on a grid of pixels, where for each pixel $u$, $r_u$ is its location in a domain $D \subset \mathbb{R}^2$. We assume that each image derives from the deformation of a common unknown template $I_0$, which is a function from $\mathbb{R}^2$ to $\mathbb{R}$. Furthermore, we assume for each image the existence of an unobserved deformation field $\Phi_i : \mathbb{R}^2 \to \mathbb{R}^2$ such that
$$ y_i(u) = I_0\big( r_u - \Phi_i(r_u) \big) + \sigma\, \varepsilon_i(u), $$
where the $\varepsilon_i(u)$ are distributed with respect to a normalized Gaussian distribution and $\sigma^2$ denotes the noise variance. To formulate this complex problem in a simpler parametric way, the template and the deformation are supposed to have parametric forms as follows. Given a set of landmarks denoted by $(p_1, \dots, p_{k_p})$ which covers the domain $D$, the template function $I_0$ is parametrized by coefficients $w \in \mathbb{R}^{k_p}$ through
$$ I_0(r) = \sum_{l=1}^{k_p} K_p(r, p_l)\, w_l, $$
where $K_p$ is a fixed known kernel. Likewise, for another fixed set of landmarks $(g_1, \dots, g_{k_g})$, the deformation field is given by
$$ \Phi_i(r) = \sum_{l=1}^{k_g} K_g(r, g_l)\, z_i^{(l)}, $$
where $z_i = (z_i^{(l)})_{1 \le l \le k_g}$ with $z_i^{(l)} \in \mathbb{R}^2$, and again $K_g$ is a fixed known kernel. The variables $z_1, \dots, z_n$ are the latent variables of this model and are assumed to follow a centered Gaussian distribution with covariance matrix $\Gamma$.
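To fix ideas, the finite-dimensional parametrization of the template and of the deformation can be sketched as follows. The Gaussian form of the kernels and all variable names are assumptions made only for this illustration; the model merely requires the kernels $K_p$ and $K_g$ to be fixed and known.

```python
import numpy as np

def gaussian_kernel(r, landmarks, scale):
    """K(r, p_l) = exp(-||r - p_l||^2 / (2 scale^2)) for every location r and landmark p_l."""
    d2 = np.sum((r[:, None, :] - landmarks[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * scale ** 2))              # shape (n_locations, n_landmarks)

def template_at(locations, tpl_landmarks, w, scale_p):
    # I_0(r) = sum_l K_p(r, p_l) w_l
    return gaussian_kernel(locations, tpl_landmarks, scale_p) @ w

def deformation_at(locations, def_landmarks, z_i, scale_g):
    # Phi_i(r) = sum_l K_g(r, g_l) z_i^(l); z_i holds one 2-d displacement per landmark.
    return gaussian_kernel(locations, def_landmarks, scale_g) @ z_i   # shape (n_locations, 2)

def synthesize_image(pixels, tpl_landmarks, w, def_landmarks, z_i, scale_p, scale_g, sigma, rng):
    """Generate one image: deform the pixel grid, evaluate the template there, add noise."""
    deformed = pixels - deformation_at(pixels, def_landmarks, z_i, scale_g)
    return template_at(deformed, tpl_landmarks, w, scale_p) + sigma * rng.standard_normal(len(pixels))
```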
We refer to Allassonnière et al. (2010) for further details on the model and for the implementation of the MCMC-SAEM algorithm. We estimate all model parameters, namely the template coefficients $w$, the covariance matrix $\Gamma$ and the noise variance $\sigma^2$, with both algorithms, the batch one and the mini-batch one with a small proportion $\alpha$. For the first experiment we use images from the United States Postal Service database. We present the results obtained on one digit as an illustrating example in Figure 5. We observe that during the first iterations of the algorithms the convergence is sped up by using the mini-batch version, leading to a more contrasted and accurate template estimate after three passes through the dataset.
In the second experiment we assess the effect of the mini-batch version on the asymptotic behavior of the algorithm. Therefore we run the batch version on a subset of images of the United States Postal Service database and the mini-batch version, with a small proportion $\alpha$, on a larger set of images, both during the same large number of iterations to reach asymptotic behavior. Note that choosing a small mini-batch size allows us to increase the number of images in the input, while the computing time of both algorithms is of the same order. To assess simultaneously the accuracy of the obtained estimates for the template and the deformation, we generate new samples from the model using both estimates of the template and of the deformation covariance. These results are presented in Figure 6. We observe that the samples of the second row look more like usual handwritten digits of the database than those of the first row, highlighting that both template and deformation are better estimated by the mini-batch version performed on more images of the dataset. This shows that, given a constraint on the computing time, more accuracy can be obtained by using the mini-batch MCMC-SAEM instead of the original algorithm.
5 Conclusion
The proposed mini-batch version of the MCMC-SAEM algorithm has a sound theoretical foundation, as it is shown to be almost surely convergent whenever the original algorithm is. Moreover, in the simulations carried out in different models, the new algorithm turns out to achieve an important speed-up of convergence during the first passes through the data compared to the original algorithm. This opens the way to possibly drastic reductions of the computing time, or to increased accuracy by processing larger datasets while keeping the computing time fixed. Furthermore, our investigations on the limit distribution of the sequence of estimates generated by the algorithm yield the following insight: the larger the mini-batch size, the more concentrated the limit distribution.
These results encourage the development of other mini-batch EM-type algorithms, for example for the variational EM algorithm, in order to achieve similar speed-ups in further models and applications.
Finally, the question of an optimal choice of the mini-batch size arises naturally. According to our findings, an optimal value of the proportion $\alpha$ must simultaneously achieve two objectives: speeding up the convergence to drastically reduce the computing time, while estimating the model parameters accurately. Therefore, it is of great interest to develop an empirical criterion leading to an optimal choice of the mini-batch size by operating a trade-off between accuracy and computing time.
All programs are available on request from the authors.
Acknowledgement
Work partly supported by the grant ANR-18-CE02-0010 of the French National Research Agency ANR (project EcoNet).
References
Allassonnière, S., Y. Amit, and A. Trouvé (2007). Toward a coherent statistical framework for dense deformable template estimation. J. Roy. Statist. Soc. Ser. B 69, 3–29.
Allassonnière, S., E. Kuhn, and A. Trouvé (2010). Construction of Bayesian deformable models via a stochastic approximation algorithm: A convergence study. Bernoulli 16(3), 641–678.
Andrieu, C., E. Moulines, and P. Priouret (2005). Stability of stochastic approximation under verifiable conditions. SIAM J. Control Optim. 44(1), 283–312.
Cappé, O. (2011). Online EM algorithm for hidden Markov models. Journal of Computational and Graphical Statistics 20(3), 728–749.
Cappé, O. and E. Moulines (2009). Online expectation-maximization algorithm for latent data models. J. Roy. Statist. Soc. Ser. B 71(3), 593–613.
Davidian, M. and D. M. Giltinan (1995). Nonlinear Models for Repeated Measurement Data. Chapman & Hall/CRC Press.
Delyon, B., M. Lavielle, and E. Moulines (1999). Convergence of a stochastic approximation version of the EM algorithm. Ann. Statist. 27(1), 94–128.
Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39(1), 1–38.
Duchateau, L. and P. Janssen (2008). The frailty model. New York, NY: Springer.
Fort, G., B. Jourdain, E. Kuhn, T. Lelièvre, and G. Stoltz (2015). Convergence of the Wang-Landau algorithm. Mathematics of Computation 84(295), 2297–2327.
Fort, G., E. Moulines, G. O. Roberts, and J. S. Rosenthal (2003). On the geometric ergodicity of hybrid samplers. Journal of Applied Probability 40, 123–146.
Karimi, B., M. Lavielle, and E. Moulines (2018). On the convergence properties of the mini-batch EM and MCEM algorithms. Unpublished.
Kuhn, E. and M. Lavielle (2004). Coupling a stochastic approximation version of EM with an MCMC procedure. ESAIM: P&S 8, 115–131.
Kuhn, E. and M. Lavielle (2005). Maximum likelihood estimation in nonlinear mixed effects models. Comput. Stat. Data Anal. 49(4), 1020–1038.
Lange, K. (1995). A gradient algorithm locally equivalent to the EM algorithm. J. Roy. Statist. Soc. Ser. B 57(2), 425–437.
Liang, P. and D. Klein (2009). Online EM for unsupervised models. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL '09, pp. 611–619. Association for Computational Linguistics.
Neal, R. M. and G. E. Hinton (1999). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan (Ed.), Learning in Graphical Models, pp. 355–368. Cambridge, MA, USA: MIT Press.
Nguyen, H., F. Forbes, and G. McLachlan (2019). Mini-batch learning of exponential family finite mixture models. Technical report, arXiv:1902.03335.
Robert, C. P. and G. Casella (2004). Monte Carlo Statistical Methods (Second ed.). Springer Texts in Statistics. Springer-Verlag, New York.
Titterington, D. M. (1984). Recursive parameter estimation using incomplete data. J. Roy. Statist. Soc. Ser. B 46(2), 257–267.