BISP5
Fifth Workshop on
BAYESIAN INFERENCE IN STOCHASTIC PROCESSES
Valencia (Spain), June 14-16, 2007
 


Abstracts of invited contributed talks


Nicolas Chopin* and Elisa Varini: Likelihood inference for continuous-time hidden Markov models
We consider continuous-time models where the observed process depends on an unobserved jump Markov process. We develop a sequential Monte Carlo algorithm for filtering and smoothing this latent process, and for computing the likelihood pointwise. We show how this algorithm can be Rao-Blackwellised, in such a way that the Monte Carlo evaluated likelihood has smaller variance, is a smooth function of the parameter, and can be differentiated. This makes it possible to perform maximum likelihood estimation using standard maximisation algorithms. We also derive a Monte Carlo EM algorithm, based on the Rao-Blackwellised particle algorithm. The focus of the paper is on models where the observed process is Poisson, but we discuss how to extend the approach to other situations, such as switching diffusion processes. We illustrate our approach with a seismological application.
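A minimal sketch of the filtering step, assuming a two-state latent chain with illustrative rates and a short time discretisation. The abstract's actual algorithm is Rao-Blackwellised and operates in continuous time; this is a plain bootstrap particle filter on a discretised grid:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state latent jump Markov process (illustrative rates), Poisson observations
Q = np.array([[-0.5, 0.5],
              [1.0, -1.0]])       # generator of the latent chain
lam = np.array([2.0, 10.0])       # Poisson intensity in each latent state
dt = 0.1                          # time discretisation for the sketch
P = np.eye(2) + Q * dt            # one-step transition matrix (Euler)

# Simulate a latent path and the counts observed in each interval
T = 300
xtrue = np.zeros(T, dtype=int)
for t in range(1, T):
    xtrue[t] = rng.choice(2, p=P[xtrue[t - 1]])
y = rng.poisson(lam[xtrue] * dt)

# Bootstrap particle filter: propagate, weight by the Poisson likelihood, resample
N = 1000
part = rng.integers(2, size=N)
loglik = 0.0
filt = np.zeros(T)                # filtered P(X_t = 1 | y_1:t)
for t in range(T):
    part = (rng.random(N) < P[part, 1]).astype(int)
    w = np.exp(-lam[part] * dt) * (lam[part] * dt) ** y[t]
    loglik += np.log(w.mean())    # pointwise likelihood estimate accumulates here
    w /= w.sum()
    part = part[rng.choice(N, size=N, p=w)]
    filt[t] = part.mean()
```

The Rao-Blackwellised version of the abstract replaces the sampled discrete states by their exact conditional distributions, which is what makes the likelihood estimate smooth in the parameters.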



Pierpaolo De Blasi* and Nils Lid Hjort: Bayesian survival analysis in proportional hazard models with logistic relative risk
The traditional Cox proportional hazards regression model uses an exponential relative risk function. We argue that under various plausible scenarios, the relative risk part of the model should be bounded, suggesting also that the traditional model often might overdramatise the hazard rate assessment for individuals with unusual covariates. This motivates our working with proportional hazards models where the relative risk function takes a logistic form. We provide frequentist methods, based on the partial likelihood, and then go on to semiparametric Bayesian constructions. These involve a Beta process for the cumulative baseline hazard function and any prior with a density, for example that dictated by a Jeffreys-type argument, for the regression coefficients. The posterior is derived using machinery for Lévy processes, and a simulation recipe is devised for sampling from the posterior distribution of any quantity. Our methods are illustrated on real data. A Bernstein-von Mises theorem is reached for our class of semiparametric priors, guaranteeing asymptotic normality of the posterior processes.
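The boundedness argument can be made concrete by comparing the exponential form with one bounded logistic form, normalised so that both equal 1 at a zero linear predictor; the paper's exact parameterisation may differ, so this is purely illustrative:

```python
import math

def exp_rr(z):
    """Traditional Cox relative risk exp(x'beta): unbounded in z."""
    return math.exp(z)

def logistic_rr(z):
    """A bounded logistic relative risk, normalised so r(0) = 1 (illustrative form).
    It can never exceed 2, however extreme the covariates."""
    return 2.0 * math.exp(z) / (1.0 + math.exp(z))

# For an individual with an unusual covariate profile (large z), the exponential
# form inflates the hazard without bound while the logistic form saturates.
for z in (0.0, 1.0, 3.0, 6.0):
    print(z, exp_rr(z), logistic_rr(z))
```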



Paul Fearnhead: Bayesian analysis of the structure of GC content in the Human Genome
We consider inference for multiple changepoint models where there is an underlying Markov dependence between segments. We show how iid samples can be drawn from the posterior distribution for a class of such models. The models we consider have been motivated by an application in inferring structure in GC content in the human genome. We show that our Bayesian approach performs better than existing methods at segmenting the human genome into what are known as "Isochores", and also in producing smoothed estimates of GC content that correlate with other biological processes.
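The flavour of drawing iid samples from a changepoint posterior can be sketched with backward recursions followed by forward simulation. This toy version assumes independent Gaussian segments with a geometric segment-length prior, not the Markov dependence between segments treated in the talk, and all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: one changepoint at t = 50 (mean shifts from 0 to 4, unit variance)
y = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(4.0, 1.0, 50)])
n = len(y)
p = 0.05      # geometric changepoint-rate prior (assumed)
s0 = 10.0     # prior variance of each segment mean (assumed)

cs = np.concatenate([[0.0], np.cumsum(y)])
cs2 = np.concatenate([[0.0], np.cumsum(y ** 2)])

def seg_ll(t, s):
    """Log marginal likelihood of y[t..s] sharing one N(0, s0) mean, unit noise."""
    m = s - t + 1
    S, S2 = cs[s + 1] - cs[t], cs2[s + 1] - cs2[t]
    return (-0.5 * m * np.log(2 * np.pi) - 0.5 * np.log(1 + m * s0)
            - 0.5 * (S2 - s0 * S ** 2 / (1 + m * s0)))

def lse(a):
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def weights(t):
    """Log-weights for the next changepoint after t (last entry: no more CPs)."""
    terms = [seg_ll(t, s) + np.log(p) + (s - t) * np.log(1 - p) + Q[s + 1]
             for s in range(t, n - 1)]
    terms.append(seg_ll(t, n - 1) + (n - 1 - t) * np.log(1 - p))
    return np.array(terms)

# Backward recursion: Q[t] = log P(y[t:] | a segment starts at t)
Q = np.zeros(n + 1)
for t in range(n - 1, -1, -1):
    Q[t] = lse(weights(t))

def sample_changepoints():
    """Draw one exact (iid) sample of changepoint positions from the posterior."""
    cps, t = [], 0
    while t < n:
        lw = weights(t)
        w = np.exp(lw - lse(lw))
        w /= w.sum()
        s = t + rng.choice(len(w), p=w)
        if s == n - 1:
            break
        cps.append(s + 1)          # next segment starts at s + 1
        t = s + 1
    return cps
```

Because the draws are independent, no MCMC convergence diagnostics are needed, which is the practical appeal of the approach described in the abstract.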



Gonzalo García-Donato*, Jesús Palomo and Rui Paulo: Validation of computer models with multivariate functional outputs
The use of mathematically based computer models for the study of scientific and engineering processes has become very popular in recent years. Usually, these models are designed to emulate expensive physical experiments. The key question in evaluating a computer model is: does the computer model adequately represent reality? A six-step process for computer model validation is set out in Bayarri et al. (2005), based on comparison of computer model runs with field data of the process being modelled. More recently, Bayarri et al. (2007) used this methodology to validate computer models with irregular functional output, using a wavelet representation of the functions. In this work, we extend these ideas to handle computer models producing multiple related functional outputs. Implementation of this strategy is considered in an application involving functional data arising from road load dynamics of vehicles.



Andrew Golightly* and Darren J. Wilkinson: Bayesian inference for nonlinear multivariate diffusion processes
It is well known that likelihood inference for arbitrary nonlinear diffusion processes observed at discrete times is problematic since closed form transition densities are rarely available. One popular treatment of the problem involves the introduction of latent data points between every pair of observations to allow an Euler-Maruyama approximation of the unknown transition densities to become accurate. Markov chain Monte Carlo (MCMC) methods can then be used to sample the posterior distribution of latent data and model parameters. However, naive schemes suffer from a mixing problem that worsens with the degree of augmentation. We therefore implement a global MCMC scheme that uses a change of variables to overcome this problem. The methodology is applied to the estimation of parameters governing the diffusion approximation of a simple prokaryotic auto-regulatory gene network, using incomplete data that is subject to error.
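The role of the imputed latent points can be seen on an Ornstein-Uhlenbeck process, whose exact transition density is known: iterating the Euler-Maruyama scheme over a finer grid drives its transition moments towards the exact ones. Parameters are illustrative; the abstract's gene-network diffusion is nonlinear, where no such closed form exists:

```python
import numpy as np

# Ornstein-Uhlenbeck test case dX = -theta X dt + sigma dW, chosen because its
# exact transition density is known (parameter values are illustrative)
theta, sigma, x0, Delta = 1.5, 0.5, 1.0, 0.5

def euler_transition(x0, Delta, m, theta, sigma):
    """Mean and variance of X_Delta | X_0 under an m-step Euler-Maruyama scheme.
    For a linear drift the scheme stays Gaussian, so it can be iterated exactly."""
    dt = Delta / m
    mean, var = x0, 0.0
    for _ in range(m):
        mean *= (1.0 - theta * dt)
        var = var * (1.0 - theta * dt) ** 2 + sigma ** 2 * dt
    return mean, var

# Exact OU transition moments for comparison
exact_mean = x0 * np.exp(-theta * Delta)
exact_var = sigma ** 2 * (1.0 - np.exp(-2.0 * theta * Delta)) / (2.0 * theta)

# More intermediate latent points (larger m) -> smaller approximation error,
# which is exactly why the data augmentation scheme introduces them
for m in (1, 5, 100):
    mean_m, var_m = euler_transition(x0, Delta, m, theta, sigma)
    print(m, abs(mean_m - exact_mean), abs(var_m - exact_var))
```

The mixing problem mentioned in the abstract arises because, as m grows, the latent path and the diffusion parameters become increasingly dependent a posteriori; the change-of-variables scheme breaks that dependence.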



Lancelot F. James and John W. Lau*: A class of generalized hyperbolic continuous time integrated stochastic volatility likelihood models
This paper discusses and analyzes a class of likelihood models which are based on two distributional innovations in financial models for stock returns. That is, the notion that the marginal distribution of aggregate returns of log-stock prices are well approximated by generalized hyperbolic distributions, and that volatility clustering can be handled by specifying the integrated volatility as a random process such as that proposed in a recent series of papers by Barndorff-Nielsen and Shephard (BNS). The BNS models produce likelihoods for aggregate returns which can be viewed as a subclass of latent regression models where one has n conditionally independent Normal random variables whose mean and variance are representable as linear functionals of a common unobserved Poisson random measure. James (2005) recently obtained an exact analysis for such models yielding expressions of the likelihood in terms of quite tractable Fourier-Cosine integrals. Here, our idea is to analyze a class of likelihoods, which can be used for similar purposes, but where the latent regression models are based on n conditionally independent models with distributions belonging to a subclass of the generalized hyperbolic distributions and whose corresponding parameters are representable as linear functionals of a common unobserved Poisson random measure. Our models are perhaps most closely related to the Normal inverse Gaussian/GARCH/A-PARCH models of Barndorff-Nielsen (1997) and Jensen and Lunde (2001), where in our case the GARCH component is replaced by quantities such as INT-OU processes. It is seen that, importantly, such likelihood models exhibit quite different features structurally. One nice feature of the model is that it allows for more flexibility in terms of modelling of external regression parameters.



Paloma Botella-Rocamora, Antonio López-Quílez and Miguel-Ángel Martínez-Beneito*: Linking spatio-temporal disease mapping information
Disease mapping techniques have become an important tool in epidemiology during the last few years. These techniques, often based on the Bayesian paradigm, have repeatedly shown in the epidemiological literature the role of specific risk factors for health problems at a reasonable cost. Although several proposals have been successfully stated for making spatially or temporally neighbouring observations dependent, there is no similar agreement on how to link information in both space and time simultaneously in the disease mapping context. This work proposes an autoregressive approach to spatio-temporal disease mapping that fuses ideas from autoregressive time series, in order to link information in time, and from spatial modelling, to link information in space at every location. In fact, every risk estimate in every region is made to depend on the estimate for the previous period in that region, at the same time as it depends on risk estimates in contiguous regions. The model is applied to the spatio-temporal description of lung cancer in women in the Comunitat Valenciana from 1987 to 2004, with a yearly disaggregation.
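A minimal simulation of the kind of structure described, assuming regions on a line, an AR(1) coefficient in time, and exponentially decaying spatial correlation in the log relative risk innovations. All values are illustrative, not those of the lung cancer application:

```python
import numpy as np

rng = np.random.default_rng(6)

R, T = 10, 8          # regions and time periods (toy sizes)
rho = 0.8             # temporal autoregression of log relative risks (assumed)

# Spatially structured innovations: correlation decaying with distance on a line
d = np.abs(np.subtract.outer(np.arange(R), np.arange(R)))
S = 0.3 ** d
L = np.linalg.cholesky(S)

# Log relative risk: AR(1) in time, spatially correlated innovations, so each
# region's risk depends on its own past and on neighbouring regions
logrisk = np.zeros((T, R))
logrisk[0] = L @ rng.normal(size=R)
for t in range(1, T):
    logrisk[t] = rho * logrisk[t - 1] + np.sqrt(1 - rho ** 2) * (L @ rng.normal(size=R))

# Observed disease counts: Poisson around expected counts times relative risk
E = np.full(R, 50.0)
counts = rng.poisson(E * np.exp(logrisk))
```

In the inferential direction, the autoregressive prior above is what lets risk estimates borrow strength across both contiguous regions and consecutive periods.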



Alicia Quirós and Raquel Montes Diez*: Bayesian Inference for Detecting Brain Activity in fMRI
We are concerned with modelling and analysing fMRI data. fMRI is a non-invasive technique for obtaining pixel images of brain activity in response to specified periods of stimulation, actions or cognitive processes. A series of images is obtained over time under two different conditions, and regions of activity are detected by observing differences in blood magnetism due to haemodynamic response. In this work we propose a spatio-temporal model for detecting brain activity in functional magnetic resonance images. The model makes no assumptions about the shape or form of activated areas, except that they emit higher signals in response to a stimulus than non-activated areas do, and that they form connected regions as characterised by Markov Random Field (MRF) prior distributions. Due to the model complexity, we use Markov Chain Monte Carlo (MCMC) methods to make inference over the parameters. A simulation study is used to check the model's applicability and sensitivity. The Bayesian spatial prior distributions provide a framework for detecting active regions much as a neurologist might: based on posterior evidence over a wide range of spatial scales, simultaneously considering the level of the voxel magnitudes along with the size of the activated area. A single spatio-temporal Bayesian model allows us to obtain more information from the corresponding magnetic resonance study. Despite its higher computational cost, the spatio-temporal model improves inference because it accounts for uncertainty not only in the spatial dimension but also in the temporal one.
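A toy version of the spatial part of such a model, assuming a binary Ising (MRF) prior on activation labels, Gaussian voxel signals with a known active-state mean, and single-site Gibbs updates; the talk's full model is spatio-temporal and richer than this sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy activation map: a 6x6 "active" block inside a 20x20 grid
truth = np.zeros((20, 20), dtype=int)
truth[7:13, 7:13] = 1
mu1 = 2.0                                    # active-voxel mean signal (assumed known)
y = rng.normal(loc=mu1 * truth, scale=1.0)   # observed voxel signals

beta = 0.8                                   # Ising smoothing strength (assumed)
x = (y > mu1 / 2).astype(int)                # crude initial labelling

def gibbs_sweep(x):
    """One single-site Gibbs sweep for the Ising-prior activation labels."""
    n, m = x.shape
    for i in range(n):
        for j in range(m):
            nb = sum(x[a, b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                     if 0 <= a < n and 0 <= b < m)
            k = (i > 0) + (i < n - 1) + (j > 0) + (j < m - 1)
            # log-odds = Gaussian likelihood ratio + Ising neighbour agreement
            logit = (-0.5 * (y[i, j] - mu1) ** 2 + 0.5 * y[i, j] ** 2
                     + beta * (2 * nb - k))
            x[i, j] = int(rng.random() < 1.0 / (1.0 + np.exp(-logit)))
    return x

# Burn in, then accumulate posterior activation probabilities
post = np.zeros_like(y)
for sweep in range(40):
    x = gibbs_sweep(x)
    if sweep >= 20:
        post += x
post /= 20.0
```

The neighbour term is what encodes the "connected regions" assumption: isolated bright voxels are discounted, while clustered moderate signals reinforce one another, mimicking the neurologist-style trade-off between voxel magnitude and region size.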



Raquel Prado: Sequential estimation of features associated with states of mental alertness in EEG signals
Mental fatigue is one of the main causes of human performance failures that can lead to accidents in vehicle operation, air traffic control and space missions. Automatic detection of early signs of mental fatigue that trigger appropriate countermeasures is key for increasing safety and human performance. EEG signals recorded in human subjects who performed continuous mental arithmetic for a period of 90-180 minutes are studied. Several tests confirmed that individuals were alert prior to the experiment and showed signs of severe fatigue after the experiment ended. Analyses of the EEG signals have shown that changes in frontal theta and parietal alpha EEG rhythms over time can be associated with two or more states of mental alertness. Mixtures of structured autoregressive models are used to represent different brain states over time. Sequential Bayesian Monte Carlo methods are used for posterior estimation.



Fabio Rigat: Parallel hierarchical sampling
Markov chain Monte Carlo (MCMC) is currently the most popular tool to compute approximate posterior inferences for Bayesian models. MCMC algorithms using multiple chains, such as parallel tempering (PT), ensure better mixing with respect to single-chain samplers when the posterior distribution of interest is multimodal. This talk introduces the parallel hierarchical sampler (PHS) as a generalisation of parallel tempering. First, the PHS transition kernel is proved to dominate that of PT in the sense of Peskun. Second, a method to update the temperatures indexing the chains used within the PHS sampler is illustrated. The precision of the PHS sampler is also compared empirically with that of the single-chain Metropolis-Hastings algorithm and of PT for three Bayesian model selection problems, namely Gaussian clustering, covariate selection for the Normal linear regression model and selection of the structure of a survival CART model.
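As background, standard parallel tempering, which PHS generalises, can be sketched on a bimodal target: hot chains cross between modes freely and swap moves propagate those jumps down to the cold chain. The target, temperature ladder and step size here are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def logpi(x):
    """Bimodal target: equal mixture of N(-5, 1) and N(5, 1), up to a constant."""
    return np.logaddexp(-0.5 * (x + 5) ** 2, -0.5 * (x - 5) ** 2)

temps = np.array([1.0, 0.5, 0.25, 0.1])   # inverse temperatures; chain 0 is cold
K = len(temps)
x = np.zeros(K)
trace = []

for it in range(20000):
    # Within-chain random-walk Metropolis on the tempered targets pi^beta
    for k in range(K):
        prop = x[k] + rng.normal(0.0, 1.5)
        if np.log(rng.random()) < temps[k] * (logpi(prop) - logpi(x[k])):
            x[k] = prop
    # Swap move between a random adjacent pair of chains
    k = rng.integers(K - 1)
    log_a = (temps[k] - temps[k + 1]) * (logpi(x[k + 1]) - logpi(x[k]))
    if np.log(rng.random()) < log_a:
        x[k], x[k + 1] = x[k + 1], x[k]
    trace.append(x[0])                    # keep only the cold chain

samples = np.array(trace)
```

A single-chain Metropolis-Hastings sampler with the same step size would typically stay trapped in one of the two modes, which is the mixing failure both PT and PHS are designed to avoid.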



Marc Suchard: Phylogenetic repeated measures models via a Brownian diffusion process
Studies of gene expression profiles in response to external perturbation generate repeated measures data that generally follow non-linear curves. To explore the evolution of such profiles across a gene family, we introduce phylogenetic repeated measures (PR) models. These models draw strength from two forms of correlation in the data. Through gene duplication, the family's evolutionary relatedness induces the first form. The second is the correlation across time-points within taxonomic units, individual genes in this example. We borrow a Brownian diffusion process along a given phylogenetic tree to account for the relatedness and co-opt a repeated measures framework to model the latter. Through simulation studies, we demonstrate that repeated measures models outperform the previously available approaches that consider the longitudinal observations or their differences as independent and identically distributed by using deviance information criteria as Bayesian model selection tools; PR models that borrow phylogenetic information also perform better than non-phylogenetic repeated measures models when appropriate. We then analyze the evolution of gene expression in the yeast kinase family using splines to estimate non-linear behavior across three perturbation experiments. Again, the PR models outperform previous approaches and afford the prediction of ancestral expression profiles. To demonstrate PR model applicability more generally, we conclude with a discussion of diffusion process extensions for categorical data.
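The two correlation forms can be sketched directly: under Brownian diffusion along the tree, the covariance between two genes' traits is proportional to their shared branch length from the root, while the repeated-measures component enters through a within-gene correlation across time-points. The toy tree, the AR(1) time correlation, and the Kronecker combination below are all assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy rooted gene tree ((A:1, B:1):2, C:3): A and B diverged recently after a
# duplication; C split off at the root.  Under Brownian diffusion,
# Cov(trait_A, trait_B) = sigma2 * shared branch length from the root.
sigma2 = 1.0
shared = np.array([[3.0, 2.0, 0.0],
                   [2.0, 3.0, 0.0],
                   [0.0, 0.0, 3.0]])
Sigma = sigma2 * shared            # across-gene (phylogenetic) covariance

# Repeated-measures part: AR(1)-style correlation across 5 time-points (assumed)
rho = 0.7
t = np.arange(5)
R = rho ** np.abs(np.subtract.outer(t, t))

# Kronecker structure combines the two: rows index genes, columns time-points
full_cov = np.kron(Sigma, R)
profiles = rng.multivariate_normal(np.zeros(15), full_cov).reshape(3, 5)
```

Recently duplicated genes (A, B) thus get correlated expression curves while the distant gene (C) does not, which is the extra strength PR models borrow over non-phylogenetic repeated measures models.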



Zhen Wang* and Steven N. MacEachern: Bayesian inference for a distribution-valued stochastic process
We propose two nonparametric Bayesian models that are natural extensions of traditional weighted least squares analysis. One model is appropriate for error distributions in a scale family; the other for data which are totals or averages. As the work extends weighted least squares, we consider a linear mean structure and take a model of normal errors as our starting point. The two models are indistinguishable in the parametric, normal-theory case. They become different when the error distributions are non-normal. For our models, the nonparametric component relies on a smoothed Dirichlet process prior with a normal base measure. Posterior inference is made on the basis of an efficient Gibbs sampler. For the first model, latent variables are introduced to facilitate computation. We motivate and illustrate the performance of the two models with data on average stem units of apple shoots and with birthweights of 4540 boys and 4256 girls.



* denotes the speaker


Inquiries: bisp5@uv.es