-
Notifications
You must be signed in to change notification settings - Fork 28
/
complex-demo.rmd
122 lines (108 loc) · 3.16 KB
/
complex-demo.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
```{r complex-demo-init, echo=FALSE, message=F}
library(knitr)
library(Cairo)
opts_chunk$set(warning=FALSE, fig.width=6, fig.height=6, dev="CairoPNG", stop=TRUE)
```
# Normalize an 450K, EPIC and EPIC v2 dataset
## Download example data set
```{r child = 'dataset-450k-demo.rmd'}
```
```{r child = 'dataset-epic-demo.rmd'}
```
```{r child = 'dataset-epic2-demo.rmd'}
```
```{r}
path.450k <- download.450k.demo.dataset()
path.epic <- download.epic.demo.dataset()
path.epic2 <- download.epic2.demo.dataset()
```
## Normalize dataset
Create samplesheet
```{r}
library(meffil)
samplesheet.450k <- meffil.create.samplesheet(path=path.450k)
samplesheet.epic <- meffil.read.samplesheet(base=path.epic, pattern="Demo_SampleSheet.csv")
samplesheet.epic2 <- meffil.read.samplesheet(base=path.epic2, pattern="SampleSheet")
samplesheet.450k$chip <- "450k"
samplesheet.epic$chip <- "epic"
samplesheet.epic2$chip <- "epic2"
common.cols <- intersect(colnames(samplesheet.450k),colnames(samplesheet.epic))
common.cols <- intersect(common.cols, colnames(samplesheet.epic2))
samplesheet <- rbind(
samplesheet.450k[1:10,common.cols],
samplesheet.epic[,common.cols],
samplesheet.epic2[,common.cols])
```
Parameters.
```{r}
qc.file <- "complex-demo/qc-report.html"
author <- "Illumina, et al."
study <- "450k/epic/epic2 demo dataset"
number.pcs <- 2
norm.file <- "complex-demo/normalization-report.html"
```
Generate QC objects.
```{r complex-demo-qc, cache=T}
options(mc.core=20)
qc.objects <- meffil.qc(samplesheet, featureset="450k:epic:epic2", cell.type.reference="blood gse35069 complete", verbose=T)
```
QC report.
```{r}
qc.summary <- meffil.qc.summary(qc.objects, verbose=T)
meffil.qc.report(
qc.summary,
output.file=qc.file,
author=author,
study=study)
```
Normalization dataset.
```{r complex-demo-norm, cache=T}
norm.objects <- meffil.normalize.quantiles(
qc.objects,
number.pcs=number.pcs,
verbose=T)
beta.meffil <- meffil.normalize.samples(
norm.objects,
just.beta=T,
cpglist.remove=qc.summary$bad.cpgs$name,
verbose=T)
```
Compute principal components of the normalized methylation matrix.
```{r complex-demo-pcs, cache=T}
pcs <- meffil.methylation.pcs(
beta.meffil,
sites=meffil.get.autosomal.sites("epic"),
verbose=T)
```
Normalization report.
```{r}
parameters <- meffil.normalization.parameters(norm.objects)
parameters$batch.threshold <- 0.01
norm.summary <- meffil.normalization.summary(
norm.objects=norm.objects,
pcs=pcs,
parameters=parameters,
verbose=T)
meffil.normalization.report(
norm.summary,
output.file=norm.file,
author=author,
study=study)
```
It looks like the normalized methylation retains
has a strong association with chip.
```{r}
chip <- samplesheet$chip
stats <- t(apply(pcs,2,function(pc) {
fit <- coef(summary(lm(pc ~ chip)))
fit[which.min(fit[-1,"Pr(>|t|)"])+1,]
}))
stats[stats[,"Pr(>|t|)"] < 0.05,]
```
This isn't surprising given that the data for each
chip comes from a different study.
If there was reason to believe that the the chip
associations were entirely technical,
then we could remove some of this variation
by including 'chip' as a random effect
in the functional normalization.