|Abstracts of invited talks|
Stochastic (or spatial) processes are used in many aspects of the analysis of complex computer models. They are used to create fast emulators of the computer model if needed; they are used to model bias (or discrepancy from reality) of the computer model; and they are used to provide `random inputs' that can be of value in diagnosing model faults. In addition, many computer models themselves utilize stochastic processes in their construction. We will discuss some of the issues - both conceptual and computational - in utilizing stochastic processes for the analysis of computer models.
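A common way to build the fast emulators mentioned above is Gaussian process regression. The sketch below is purely illustrative and not the speaker's method: it emulates a cheap stand-in "computer model" (here `np.sin`) from a handful of runs; the kernel, lengthscale, and jitter values are assumptions chosen for the toy example.

```python
import numpy as np

def sq_exp_kernel(a, b, length=1.0, var=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_emulate(x_train, y_train, x_new, length=1.0, var=1.0, jitter=1e-8):
    """Posterior mean and pointwise variance of a zero-mean GP emulator."""
    K = sq_exp_kernel(x_train, x_train, length, var) + jitter * np.eye(len(x_train))
    Ks = sq_exp_kernel(x_new, x_train, length, var)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = sq_exp_kernel(x_new, x_new, length, var) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

# A few "expensive" model runs; np.sin stands in for the real simulator.
x_train = np.linspace(0.0, np.pi, 8)
y_train = np.sin(x_train)
x_new = np.array([0.5, 1.5, 2.5])
mean, var_pred = gp_emulate(x_train, y_train, x_new, length=0.7)
```

With eight training runs the emulator reproduces the toy model closely at new inputs, at a tiny fraction of the (notional) simulation cost.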
Stochastic compartmental models of the SEIR type are often used to make inferences on epidemic processes from partially observed data in which only removal times are available. For many epidemics, the assumption of constant removal rates is not plausible. We develop methods for models in which the removal rate is a time-dependent step function. A reversible jump MCMC algorithm is described that permits Bayesian inferences to be made on model parameters, particularly those associated with the step function. The method is applied to two datasets on outbreaks of smallpox and a respiratory disease. The analyses highlight the importance of allowing for time dependence by contrasting the predictive distributions for the removal times and comparing them with the observed data.
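To illustrate the data-generating model (not the reversible jump MCMC inference itself), the following toy simulation draws removal times from a stochastic SEIR epidemic whose removal rate is a step function of time. All parameter values and breakpoints are invented for the example; the Gillespie scheme is exact for piecewise-constant rates because the waiting time is redrawn whenever a breakpoint is crossed.

```python
import random

def step_rate(t, breaks, rates):
    """Step-function rate: rates[i] applies on [breaks[i], breaks[i+1])."""
    i = 0
    while i + 1 < len(breaks) and t >= breaks[i + 1]:
        i += 1
    return rates[i]

def simulate_seir(beta, sigma, breaks, rates, S, E, I, t_max, seed=1):
    """Gillespie simulation of a stochastic SEIR epidemic with a
    time-dependent step-function removal rate; returns removal times."""
    rng = random.Random(seed)
    t, R, removals = 0.0, 0, []
    while t < t_max and (E > 0 or I > 0):
        gamma = step_rate(t, breaks, rates)
        n = S + E + I + R
        rate_inf = beta * S * I / n     # S -> E
        rate_prog = sigma * E           # E -> I
        rate_rem = gamma * I            # I -> R (observed removal)
        total = rate_inf + rate_prog + rate_rem
        if total == 0.0:
            break
        dt = rng.expovariate(total)
        next_bp = next((b for b in breaks if b > t), None)
        if next_bp is not None and t + dt > next_bp:
            t = next_bp   # rates change here, so restart the exponential clock
            continue
        t += dt
        if t >= t_max:
            break
        u = rng.random() * total
        if u < rate_inf:
            S, E = S - 1, E + 1
        elif u < rate_inf + rate_prog:
            E, I = E - 1, I + 1
        else:
            I, R = I - 1, R + 1
            removals.append(t)
    return removals

removals = simulate_seir(beta=0.6, sigma=0.3, breaks=[0.0, 20.0],
                         rates=[0.15, 0.5], S=50, E=2, I=1, t_max=60.0)
```

In the inferential setting of the talk, only the `removals` list would be observed, and the breakpoints and rate levels would be targets of inference.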
The analysis of spatial point process data has historically been plagued by computational difficulties. Likelihoods often feature intractable integrals that must be approximated in some way. The problem is exacerbated when such models are incorporated in a fully hierarchical framework, since this requires evaluation of such integrals at every iteration of the associated Markov chain Monte Carlo (MCMC) algorithm. Existing Bayesian approaches to such problems are few, and typically involve several approximations whose accuracy is difficult to evaluate. This paper offers an approach for multivariate spatial point process modeling that attempts to reduce the level of approximation while incorporating two types of covariates: those whose values are specific to the locations of the points, and those that are not. We also allow for imprecise determination of some subset of the spatial locations. We illustrate the importance and applicability of our methods using a collection of breast cancer case locations collected over the mostly rural northern part of the state of Minnesota. The observed process is bivariate in that it includes some women who opted for mastectomy, and some who opted for breast conserving surgery (BCS, or ``lumpectomy''), which is less disfiguring but requires 6 weeks of follow-up radiation therapy. The key covariate substantively (driving distance to the nearest radiation treatment facility) is spatially referenced, but many other important covariates (notably age and stage) are not. Our bivariate spatial point process approach resolves the question of whether women who face long driving distances disproportionately opt for mastectomy, while still properly accounting for all sources of spatial and nonspatial variability in the data.
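As an illustrative sketch only (not the authors' bivariate hierarchical model), the following simulates an inhomogeneous spatial Poisson process by Lewis-Shedler thinning, with an intensity driven by a spatially referenced covariate in the spirit of the application: distance to a single hypothetical treatment facility. All locations and parameter values are invented.

```python
import numpy as np

def thin_poisson_process(intensity, lam_max, xmax, ymax, seed=7):
    """Lewis-Shedler thinning: candidate points from a homogeneous Poisson
    process with rate lam_max are kept with probability intensity/lam_max,
    requiring intensity(x, y) <= lam_max everywhere on the window."""
    rng = np.random.default_rng(seed)
    n = rng.poisson(lam_max * xmax * ymax)
    xs = rng.uniform(0.0, xmax, n)
    ys = rng.uniform(0.0, ymax, n)
    keep = rng.uniform(0.0, lam_max, n) < intensity(xs, ys)
    return xs[keep], ys[keep]

# Toy spatially referenced covariate: Euclidean distance to a hypothetical
# facility at (5, 5); intensity decays with distance.
facility = np.array([5.0, 5.0])
def intensity(x, y):
    d = np.hypot(x - facility[0], y - facility[1])
    return 3.0 * np.exp(-0.5 * d)

xs, ys = thin_poisson_process(intensity, lam_max=3.0, xmax=10.0, ymax=10.0)
```

The real analysis works in the reverse direction: given observed case locations, it infers how intensity (and the mastectomy/BCS mark) depends on the distance covariate while evaluating the intractable intensity integral as rarely and accurately as possible.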
For non-linear non-Gaussian state-space models, Bayesian inference is usually carried out using either Markov chain Monte Carlo (MCMC) for smoothing and parameter estimation or Sequential Monte Carlo (SMC) for filtering and marginal likelihood evaluation. However, MCMC techniques can perform poorly for complex models in which it is difficult to design 'good' proposal distributions for sampling the latent process. In this talk we demonstrate how it is possible to use SMC within an MCMC framework to develop a new class of efficient MCMC algorithms. These algorithms rely on the introduction of a non-standard set of auxiliary variables. The underlying idea is not limited to state-space models and also allows us to develop new algorithms to perform Bayesian inference in cases where the likelihood function admits an intractable normalizing constant.
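A minimal sketch of the SMC-within-MCMC idea, under assumptions of my own choosing rather than the speaker's construction: a bootstrap particle filter supplies a likelihood estimate for a toy AR(1)-plus-noise state-space model, and that estimate is plugged into a Metropolis-Hastings chain over the autoregression parameter `phi` (flat prior on (-1, 1); all other parameters fixed at 1).

```python
import numpy as np

def bootstrap_filter_loglik(y, phi, sigma_x, sigma_y, n_particles, rng):
    """Bootstrap particle filter for x_t = phi*x_{t-1} + N(0, sigma_x^2),
    y_t = x_t + N(0, sigma_y^2); returns a log-likelihood estimate."""
    x = rng.normal(0.0, sigma_x, n_particles)
    loglik = 0.0
    for yt in y:
        w = np.exp(-0.5 * ((yt - x) / sigma_y) ** 2) / (sigma_y * np.sqrt(2 * np.pi))
        loglik += np.log(np.mean(w) + 1e-300)
        idx = rng.choice(n_particles, n_particles, p=w / w.sum())  # resample
        x = phi * x[idx] + rng.normal(0.0, sigma_x, n_particles)   # propagate
    return loglik

def pmmh(y, n_iter=200, n_particles=100, step=0.1, seed=0):
    """Metropolis-Hastings over phi using the particle filter's
    likelihood estimate in the acceptance ratio."""
    rng = np.random.default_rng(seed)
    phi = 0.0
    ll = bootstrap_filter_loglik(y, phi, 1.0, 1.0, n_particles, rng)
    chain = []
    for _ in range(n_iter):
        prop = phi + step * rng.normal()
        if -1.0 < prop < 1.0:
            ll_prop = bootstrap_filter_loglik(y, prop, 1.0, 1.0, n_particles, rng)
            if np.log(rng.uniform()) < ll_prop - ll:
                phi, ll = prop, ll_prop
        chain.append(phi)
    return np.array(chain)

# Simulate toy data with true phi = 0.7, then run the sampler.
rng = np.random.default_rng(1)
x, y = 0.0, []
for _ in range(50):
    x = 0.7 * x + rng.normal()
    y.append(x + rng.normal())
chain = pmmh(np.array(y))
```

The auxiliary-variable construction discussed in the talk is what makes this kind of plug-in estimate yield an exact MCMC algorithm; the sketch above shows only the mechanics of combining the two samplers.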
We will present some recent work on flexible non-parametric modeling of epidemiologic data collected through the case-control sampling design. The Dirichlet process prior, which is a cornerstone of Bayesian non-parametrics, will be used to achieve robust inference in epidemiological problems where standard frequentist methods are not optimal due to high dimensionality or restrictive assumptions on the parameter space. In particular, we will discuss the problem of retrospective modeling of case-control data for studying gene-environment interactions in a semiparametric Bayesian framework. The special feature of gene-environment interaction studies is that in many situations it is scientifically plausible to assume that the genetic and environmental exposures are independent in the underlying population. It has been established that one may exploit this constraint on the space of exposure distributions in order to derive more efficient estimation techniques than the traditional prospective logistic regression analysis (Chatterjee and Carroll, 2005). However, the efficient estimates from the retrospective likelihood may be severely biased when the independence assumption is violated. Stratification effects present in the population could potentially introduce dependence between genetic factors and environmental exposures. We will provide a novel semiparametric Bayesian approach to model stratification effects under the assumption of gene-environment independence. We will then propose an alternative that relaxes the constraint of gene-environment independence in a natural Bayesian framework to strike a compromise between efficiency and robustness. Time permitting, we will also discuss non-parametric Bayesian analysis of familial effects in family-based case-control studies.
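For readers unfamiliar with the Dirichlet process prior mentioned above, the following minimal sketch (not the speaker's model) draws observations from a random distribution G ~ DP(alpha, G0) via truncated stick-breaking with a standard normal base measure; the truncation level and concentration parameter are arbitrary choices for illustration.

```python
import numpy as np

def stick_breaking_weights(alpha, k, rng):
    """Truncated stick-breaking: w_i = v_i * prod_{j<i}(1 - v_j),
    with v_i ~ Beta(1, alpha)."""
    v = rng.beta(1.0, alpha, k)
    w = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))
    return w / w.sum()   # fold the truncation remainder back in

def draw_dp_sample(alpha, base_draw, n, k=200, seed=3):
    """Draw n observations from a single realization G ~ DP(alpha, G0)."""
    rng = np.random.default_rng(seed)
    w = stick_breaking_weights(alpha, k, rng)
    atoms = base_draw(rng, k)          # atoms drawn from the base measure G0
    return rng.choice(atoms, size=n, p=w)

obs = draw_dp_sample(alpha=1.0, base_draw=lambda rng, k: rng.normal(0, 1, k), n=500)
n_clusters = len(np.unique(obs))
```

Because G is almost surely discrete, the 500 draws share a small number of distinct values; this induced clustering is what makes the DP prior useful for robust semiparametric modeling of exposure distributions.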
The collaborators on material related to the presented work are: Malay Ghosh (University of Florida), Nilanjan Chatterjee (National Cancer Institute), Li Zhang (Cleveland Clinic Foundation) and Samiran Sinha (Texas A&M University).
Hidden Markov models (HMMs) and related models have become almost a staple of statistics during the last 15-20 years, with applications in diverse areas like speech and other statistical signal processing, hydrology, financial statistics and econometrics, bioinformatics etc. Inference in HMMs is traditionally often carried out using the EM algorithm, but examples of Bayesian estimation are also frequent in the HMM literature. In this talk I will discuss some situations in which a Bayesian approach to inference in HMMs and related computational techniques are particularly useful; examples include model order selection, continuous-time HMMs and versions of HMMs in which the observed data depend on many hidden variables in an overlapping way. All these examples in some way or another originate from real-data applications, which will be presented. Bayesian analysis of HMMs is not an off-the-shelf methodology, however, and I will also illustrate some of the problems, like poor mixing, one may expect to encounter.
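As background to the inference methods discussed in the talk (both EM and Bayesian approaches rely on it), the following sketch implements the scaled forward algorithm for a discrete-emission HMM; the two-state parameter values are invented for illustration.

```python
import numpy as np

def hmm_forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log p(obs) for a discrete-emission HMM.
    pi: initial state distribution, A: transition matrix,
    B[state, symbol]: emission probabilities."""
    alpha = pi * B[:, obs[0]]
    loglik = 0.0
    for o in obs[1:]:
        c = alpha.sum()              # normalizing constant at each step
        loglik += np.log(c)
        alpha = (alpha / c) @ A * B[:, o]
    return loglik + np.log(alpha.sum())

# Toy two-state HMM with sticky transitions and noisy emissions.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.8, 0.2], [0.2, 0.8]])
ll = hmm_forward_loglik([0, 0, 1], pi, A, B)
```

In a Bayesian treatment, this likelihood would be embedded in an MCMC sampler over (pi, A, B) and possibly the model order; the poor mixing mentioned above typically arises from the label-switching and multimodality of that posterior.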