Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow validation #16

Open
royfrancis opened this issue Mar 27, 2019 · 2 comments
Open

Workflow validation #16

royfrancis opened this issue Mar 27, 2019 · 2 comments

Comments

@royfrancis
Copy link

Hi, I am testing the meffil workflow with a publicly available dataset to see if I get the same results. But, I am unable to find any DMPs. I was able to reproduce the results using minfi but not using meffil.

So I am running this minfi workflow and the data is in this R package which needs to be installed from source.

Here is the code I use. Perhaps, I am doing something wrong or there is some explanation. One sample is discarded due to high detection p-value, cell type info is not used, norm is quantile. Also, I am unsure how to model contrasts with variables that have more than two levels.

library(meffil)
options(mc.cores=10)

# read data --------------------------------------------------------------------
dataDirectory <- system.file("extdata", package = "methylationArrayAnalysis")
ss <- meffil.read.samplesheet(dataDirectory,pattern="sampleSheet.csv")

m_qc <- meffil.qc(ss,cell.type.reference="andrews and bakulski cord blood",verbose=T)
qc_summary <- meffil.qc.summary(m_qc,parameters=meffil.qc.parameters(detectionp.samples.threshold=0.1))
meffil.qc.report(qc_summary,output.file="meffil-qc-report.html")

# remove bad samples -----------------------------------------------------------
outlier <- qc_summary$bad.samples
m_qc <- meffil.remove.samples(m_qc,outlier$sample.name)
ss <- ss[ss$Sample_Name!="11",]

# normalisation ----------------------------------------------------------------
pc <- meffil.plot.pc.fit(m_qc)
m_norm <- meffil.normalize.quantiles(m_qc,number.pcs=2)
m_beta <- meffil.normalize.samples(m_norm)

for (i in 1:length(m_norm)) {
  m_norm[[i]]$samplesheet$Sample_Source <- as.factor(m_norm[[i]]$samplesheet$Sample_Source)
  m_norm[[i]]$samplesheet$Sample_Group <- as.factor(m_norm[[i]]$samplesheet$Sample_Group)
}

norm_params <- meffil.normalization.parameters(m_norm,control.pcs=1:5,
                                               batch.pcs=1:5,batch.threshold=0.01)

mpcs <- meffil.methylation.pcs(m_beta,probe.range=20000)
norm_summary <- meffil.normalization.summary(m_norm, pcs=mpcs,parameters=norm_params)
meffil.normalization.report(norm_summary, output.file="meffil-norm-report.html")

# ewas -------------------------------------------------------------------------

# select two levels to compare
ss1 <- filter(ss,Sample_Group=="act_naive" | Sample_Group=="act_rTreg")
m_beta1 <- m_beta[,ss1$Sample_Name]

cvar <- ss1[,"Sample_Source",drop=F]
cvar$Sample_Source <- factor(cvar$Sample_Source)

# sva off due to error
m_ewas <- meffil.ewas(m_beta1,variable=factor(ss1$Sample_Group),covariates=cvar,sva=F,isva=F,smartsva=F)
ewas_summary <- meffil.ewas.summary(m_ewas,m_beta1)
meffil.ewas.report(ewas_summary, output.file="meffil-ewas-report.html")

# number of probes with fdr<0.05
pv <- m_ewas$analyses$all$table[,"fdr",drop=F]
nrow(pv[pv$fdr<0.05,])
@perishky
Copy link
Owner

Sorry about the silence on this. Thank you for pointing out this problem. I might realistically be able to have a look at this in more detail in May.

It looks like you are doing everything correctly. The final line could be simpler: sum(pv$fdr < 0.05).

It could help if you provided some output for the CpG site associations you were expecting to see, e.g. a boxplot showing the methylation levels by group of those CpG site(s) in the meffil and minfi normalized datasets, a scatterplot of the methylation levels for those site(s) meffil vs minfi, and the summary statistics for those site(s) from meffil and minfi. I might be able to determine the reason for the discrepancy by looking at those.

@royfrancis
Copy link
Author

Hi, Thanks for looking into this.

I have a small correction to the script above. I changed cell.type.reference="andrews and bakulski cord blood" to cell.type.reference=NA, because this dataset is not Cord Blood. The minfi workflow does not use cell type info for their workflow. Anyhow, this does not seem to affect meffil results.

Here is a plot of few top significant probes using minfi workflow. The top probes are selected from a testing between act_naive-act_rTreg. The values used in the plot is quantile beta values to make it comparable with the meffil plot. The actual data used in minfi for DM is M values.

minfi-act_naive-act_rTreg

Here is the same probes using meffil's beta values.

meffil-act_naive-act_rTreg

It seems like the groups are distinct, but they don't show up as significant in meffil results. Here are the meffil FDR values for these probes.

                 fdr
cg27470208 0.7580112
cg26788216 0.7302481
cg18808929 0.8199669
cg16789764 0.7302481
cg16266809 0.7302481
cg15253304 0.7302481
cg06994787 0.7593767
cg05501327 0.7302481
cg04434994 0.7302481
cg00355656 0.7302481

The full script used for minfi workflow, meffil workflow and plots is attached here.
workflow.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants