Detecting novel associations in large data sets.

Title: Detecting novel associations in large data sets.
Publication Type: Journal Article
Year of Publication: 2011
Authors: Reshef, DN; Reshef, YA; Finucane, HK; Grossman, SR; McVean, G; Turnbaugh, PJ; Lander, ES; Mitzenmacher, M; Sabeti, PC
Journal: Science
Volume: 334
Issue: 6062
Pagination: 1518-24
Date Published: 2011 Dec 16
ISSN: 1095-9203
Keywords: Algorithms; Animals; Baseball; Data Interpretation, Statistical; Female; Gene Expression; Genes, Fungal; Genomics; Humans; Intestines; Male; Metagenome; Mice; Obesity; Saccharomyces cerevisiae
Abstract

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
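
The abstract does not spell out how MIC is computed. As a rough illustration only, the idea of a grid-based, normalized mutual information score can be sketched in Python as follows. This is not the authors' algorithm: the published method searches over grid partition placements, whereas this sketch only tries equal-frequency grids, so it can understate the true MIC. The function names (`equal_frequency_bins`, `mutual_information`, `approx_mic`) are hypothetical, and the `n ** 0.6` cap on grid size is assumed here as the default bound.

```python
import math
from collections import Counter


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins of roughly equal count, by rank.

    Simplification: tied values may be split across neighbouring bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def mutual_information(bx, by):
    """Mutual information (in nats) between two sequences of bin labels."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def approx_mic(x, y, exponent=0.6):
    """Best normalized mutual information over equal-frequency grids.

    Grids with kx columns and ky rows are tried while kx * ky <= n ** exponent
    (assumed default bound). Returns 0.0 when n is too small for a 2 x 2 grid.
    """
    n = len(x)
    budget = n ** exponent
    best = 0.0
    kx = 2
    while kx * 2 <= budget:
        bx = equal_frequency_bins(x, kx)
        ky = 2
        while kx * ky <= budget:
            by = equal_frequency_bins(y, ky)
            # Dividing by log(min(kx, ky)) scales the score to about [0, 1].
            score = mutual_information(bx, by) / math.log(min(kx, ky))
            best = max(best, score)
            ky += 1
        kx += 1
    return best


if __name__ == "__main__":
    x = list(range(200))
    print(approx_mic(x, [2 * v + 1 for v in x]))  # noiseless linear relationship
```

For a noiseless monotone relationship the score is close to 1, in line with the abstract's statement that MIC tracks R² for functional relationships; for noisy or unrelated variables it should fall toward 0.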

DOI: 10.1126/science.1205438
Alternate Journal: Science
PubMed ID: 22174245
PubMed Central ID: PMC3325791
Grant List: 090532 / / Wellcome Trust / United Kingdom
P50 GM068763 / GM / NIGMS NIH HHS / United States
P50 GM068763-09 / GM / NIGMS NIH HHS / United States
T32 GM007753 / GM / NIGMS NIH HHS / United States
U54 GM088558 / GM / NIGMS NIH HHS / United States
U54 GM088558-03 / GM / NIGMS NIH HHS / United States
U54GM088558 / GM / NIGMS NIH HHS / United States