University of Cambridge
Colloquium Tea held at 3:15 pm in 101A Crowley Hall
Title: Harnessing Extra Randomness: Replicability, Flexibility and Causality
Abstract: Many modern statistical procedures are randomized in the sense that the output is a random function of the data. For example, many procedures employ data splitting, which randomly divides the dataset into disjoint parts for separate purposes. Despite their flexibility and popularity, data splitting and other constructions of randomized procedures have obvious drawbacks. First, two analyses of the same dataset may lead to different results because of the extra randomness introduced. Second, randomized procedures typically lose statistical power because the sample is not fully utilized. To address these drawbacks, in this talk I will study how to properly combine the results from multiple realizations (such as multiple data splits) of a randomized procedure. I will introduce rank-transformed subsampling as a general method for delivering large-sample inference on the combined result under minimal assumptions. I will illustrate the method with three applications: (1) a “hunt-and-test” procedure for detecting cancer subtypes using high-dimensional gene expression data, (2) testing the hypothesis of no direct effect in a sequentially randomized trial, and (3) calibrating cross-fit “double machine learning” confidence intervals. For these problems, our method is able to de-randomize and improve power. Moreover, in contrast to existing approaches for combining p-values, our method has a type-I error rate that asymptotically approaches the nominal level. This new development opens up the possibility of designing procedures that explicitly randomize and de-randomize: extra randomness is introduced to make the problem easier and is then removed.
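The first drawback named in the abstract, that two analyses of the same dataset can disagree, can be seen in a toy example. The Python sketch below is only an illustration under assumed details and is not the speaker's method: the function name `split_p_value`, the simulated null data, and the simple "pick a column, then z-test it" rule are all hypothetical stand-ins for a split-sample "hunt-and-test" analysis. It does not implement rank-transformed subsampling.

```python
import math
import random
import statistics


def split_p_value(data, seed):
    """One realization of a toy split-sample 'hunt-and-test' analysis.

    Hypothetical setup: `data` is a list of rows of floats. A random half
    is used to pick the column with the largest absolute mean (hunt); the
    other half tests whether that column has mean zero (test).
    """
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)  # the extra randomness: a random data split
    half = len(idx) // 2
    hunt = [data[i] for i in idx[:half]]
    test = [data[i] for i in idx[half:]]

    # Hunt: choose the most promising column using the first half only.
    n_cols = len(data[0])
    best = max(
        range(n_cols),
        key=lambda j: abs(statistics.fmean([row[j] for row in hunt])),
    )

    # Test: two-sided z-test (normal approximation) on the held-out half.
    col = [row[best] for row in test]
    z = statistics.fmean(col) / (statistics.stdev(col) / math.sqrt(len(col)))
    return math.erfc(abs(z) / math.sqrt(2))


# Simulated data with no signal (assumption: 200 rows, 5 independent columns).
gen = random.Random(0)
data = [[gen.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]

# Same dataset, ten different random splits: the p-value changes with the split.
p_values = [split_p_value(data, seed) for seed in range(10)]
for seed, p in enumerate(p_values):
    print(f"split seed {seed}: p = {p:.3f}")
```

Each split is a valid analysis on its own, yet the reported p-value depends on the seed. Combining such realizations with a properly calibrated reference distribution is the problem the talk addresses.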