Multi-group Gaussian Processes
Gaussian processes (GPs) are pervasive in functional data analysis, machine learning, and spatial statistics for modeling complex dependencies. Modern scientific data sets are typically heterogeneous and often contain multiple known discrete subgroups of samples. For example, in genomics applications samples may be grouped according to tissue type or drug exposure. In the modeling process, it is desirable to leverage the similarity among groups while accounting for differences between them. While a substantial literature exists for GPs over Euclidean domains, GPs on domains suitable for multi-group data remain less explored. In this talk, I'll introduce multi-group Gaussian processes (MGGPs) defined on $\mathbb{R}^p \times \mathcal{C}$, where $\mathcal{C}$ is a finite set representing the group label. General methods to construct valid (positive definite) covariance functions on this domain are provided, together with algorithms for inference, estimation, and prediction. An application to gene expression data illustrates the behavior and advantages of the MGGP in the joint modeling of continuous and categorical variables.