Workflow validation #16

royfrancis · 2019-03-27T21:38:08Z

Hi, I am testing the meffil workflow with a publicly available dataset to see if I get the same results. But, I am unable to find any DMPs. I was able to reproduce the results using minfi but not using meffil.

So I am running this minfi workflow and the data is in this R package which needs to be installed from source.

Here is the code I use. Perhaps, I am doing something wrong or there is some explanation. One sample is discarded due to high detection p-value, cell type info is not used, norm is quantile. Also, I am unsure how to model contrasts with variables that have more than two levels.

library(meffil)
options(mc.cores=10)

# read data --------------------------------------------------------------------
dataDirectory <- system.file("extdata", package = "methylationArrayAnalysis")
ss <- meffil.read.samplesheet(dataDirectory,pattern="sampleSheet.csv")

m_qc <- meffil.qc(ss,cell.type.reference="andrews and bakulski cord blood",verbose=T)
qc_summary <- meffil.qc.summary(m_qc,parameters=meffil.qc.parameters(detectionp.samples.threshold=0.1))
meffil.qc.report(qc_summary,output.file="meffil-qc-report.html")

# remove bad samples -----------------------------------------------------------
outlier <- qc_summary$bad.samples
m_qc <- meffil.remove.samples(m_qc,outlier$sample.name)
ss <- ss[ss$Sample_Name!="11",]

# normalisation ----------------------------------------------------------------
pc <- meffil.plot.pc.fit(m_qc)
m_norm <- meffil.normalize.quantiles(m_qc,number.pcs=2)
m_beta <- meffil.normalize.samples(m_norm)

for (i in 1:length(m_norm)) {
  m_norm[[i]]$samplesheet$Sample_Source <- as.factor(m_norm[[i]]$samplesheet$Sample_Source)
  m_norm[[i]]$samplesheet$Sample_Group <- as.factor(m_norm[[i]]$samplesheet$Sample_Group)
}

norm_params <- meffil.normalization.parameters(m_norm,control.pcs=1:5,
                                               batch.pcs=1:5,batch.threshold=0.01)

mpcs <- meffil.methylation.pcs(m_beta,probe.range=20000)
norm_summary <- meffil.normalization.summary(m_norm, pcs=mpcs,parameters=norm_params)
meffil.normalization.report(norm_summary, output.file="meffil-norm-report.html")

# ewas -------------------------------------------------------------------------

# select two levels to compare
ss1 <- filter(ss,Sample_Group=="act_naive" | Sample_Group=="act_rTreg")
m_beta1 <- m_beta[,ss1$Sample_Name]

cvar <- ss1[,"Sample_Source",drop=F]
cvar$Sample_Source <- factor(cvar$Sample_Source)

# sva off due to error
m_ewas <- meffil.ewas(m_beta1,variable=factor(ss1$Sample_Group),covariates=cvar,sva=F,isva=F,smartsva=F)
ewas_summary <- meffil.ewas.summary(m_ewas,m_beta1)
meffil.ewas.report(ewas_summary, output.file="meffil-ewas-report.html")

# number of probes with fdr<0.05
pv <- m_ewas$analyses$all$table[,"fdr",drop=F]
nrow(pv[pv$fdr<0.05,])

perishky · 2019-04-17T10:47:01Z

Sorry about the silence on this. Thank you for pointing out this problem. I might realistically be able to have a look at this in more detail in May.

It looks like you are doing everything correctly. The final line could be simpler: sum(pv$fdr < 0.05).

It could help if you provided some output for the CpG site associations you were expecting to see, e.g. a boxplot showing the methylation levels by group of those CpG site(s) in the meffil and minfi normalized datasets, a scatterplot of the methylation levels for those site(s) meffil vs minfi, and the summary statistics for those site(s) from meffil and minfi. I might be able to determine the reason for the discrepancy by looking at those.

royfrancis · 2019-04-19T09:38:03Z

Hi, Thanks for looking into this.

I have a small correction to the script above. I changed cell.type.reference="andrews and bakulski cord blood" to cell.type.reference=NA, because this dataset is not Cord Blood. The minfi workflow does not use cell type info for their workflow. Anyhow, this does not seem to affect meffil results.

Here is a plot of few top significant probes using minfi workflow. The top probes are selected from a testing between act_naive-act_rTreg. The values used in the plot is quantile beta values to make it comparable with the meffil plot. The actual data used in minfi for DM is M values.

Here is the same probes using meffil's beta values.

It seems like the groups are distinct, but they don't show up as significant in meffil results. Here are the meffil FDR values for these probes.

                 fdr
cg27470208 0.7580112
cg26788216 0.7302481
cg18808929 0.8199669
cg16789764 0.7302481
cg16266809 0.7302481
cg15253304 0.7302481
cg06994787 0.7593767
cg05501327 0.7302481
cg04434994 0.7302481
cg00355656 0.7302481

The full script used for minfi workflow, meffil workflow and plots is attached here.
workflow.zip

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Workflow validation #16

Workflow validation #16

royfrancis commented Mar 27, 2019

perishky commented Apr 17, 2019

royfrancis commented Apr 19, 2019

Workflow validation #16

Workflow validation #16

Comments

royfrancis commented Mar 27, 2019

perishky commented Apr 17, 2019

royfrancis commented Apr 19, 2019