Extracting structural variants
To generate copy number variants from Illumina 450k methylation arrays you will need the original .idat files for your samples, because copy number variation is estimated from array intensities.
Load meffil and set how many cores to use for parallelization:
library(meffil)
options(mc.cores=16)
Generate a samplesheet for your samples. It can be generated automatically from the idat basenames by giving the directory that contains the idat files, or it can be built manually. It must contain at least the following columns: Sample_Name, Sex (possible values M, F or NA) and Basename. The automatic method also tries to parse the basenames to guess whether the Sentrix plate and positions are present.
samplesheet <- meffil.create.samplesheet("/path/to/idat/files")
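If you prefer to build the samplesheet manually, the following is a minimal sketch in base R. It assumes the idat files follow the usual `<basename>_Grn.idat` / `<basename>_Red.idat` naming. The directory, the file names and the object name `manual_samplesheet` are made up for illustration, and only the three required columns are created.

```r
# Toy directory with empty files named like Illumina idat files (hypothetical IDs).
idat_dir <- file.path(tempdir(), "idat_example")
dir.create(idat_dir, showWarnings = FALSE)
toy_files <- c("1234567890_R01C01_Grn.idat", "1234567890_R01C01_Red.idat",
               "1234567890_R02C01_Grn.idat", "1234567890_R02C01_Red.idat")
invisible(file.create(file.path(idat_dir, toy_files)))

# One basename per sample: the file path without the _Grn.idat suffix.
grn_files <- list.files(idat_dir, pattern = "_Grn\\.idat$", full.names = TRUE)
basenames <- sub("_Grn\\.idat$", "", grn_files)

# Only the required columns; Sample_Name and Sex are placeholders to be filled in.
manual_samplesheet <- data.frame(
    Sample_Name = basename(basenames),  # replace with the real sample IDs
    Sex = NA_character_,                # fill in with "M", "F" or NA
    Basename = basenames,
    stringsAsFactors = FALSE
)
print(manual_samplesheet)
```

A samplesheet built this way lacks the extra Sentrix columns that the automatic method tries to derive from the basenames.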
At this point please ensure that the Sample_Name column contains the actual sample IDs that are being used for the other data types. Please also add the sex values to the Sex column. Do not change these column names, though.
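One possible way to fill in the sample IDs and sex values is to match the samplesheet against a lookup table, then check the result. The sketch below uses toy data only: `samplesheet_toy` stands in for the output of `meffil.create.samplesheet()`, and the `pheno` table with its columns `array_id`, `study_id` and `sex` is hypothetical.

```r
# Toy samplesheet standing in for the output of meffil.create.samplesheet().
array_ids <- c("1234567890_R01C01", "1234567890_R02C01", "1234567890_R03C01")
samplesheet_toy <- data.frame(
    Sample_Name = array_ids,
    Sex = NA_character_,
    Basename = file.path("/path/to/idat/files", array_ids),
    stringsAsFactors = FALSE
)

# Hypothetical lookup table linking array IDs to study IDs and recorded sex.
pheno <- data.frame(
    array_id = c("1234567890_R02C01", "1234567890_R01C01"),
    study_id = c("ID002", "ID001"),
    sex = c("F", "M"),
    stringsAsFactors = FALSE
)

# Match by array ID; samples missing from the lookup keep their ID and get Sex = NA.
idx <- match(samplesheet_toy$Sample_Name, pheno$array_id)
samplesheet_toy$Sex <- pheno$sex[idx]
samplesheet_toy$Sample_Name <- ifelse(is.na(idx),
                                      samplesheet_toy$Sample_Name,
                                      pheno$study_id[idx])

# Sanity checks: required columns present, valid sex codes, unique sample IDs.
stopifnot(all(c("Sample_Name", "Sex", "Basename") %in% colnames(samplesheet_toy)))
stopifnot(all(samplesheet_toy$Sex %in% c("M", "F", NA)))
stopifnot(!anyDuplicated(samplesheet_toy$Sample_Name))
```

Matching with `match()` rather than `merge()` keeps the rows in their original order, so the Basename column stays aligned with the sample IDs.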
Copy number is estimated by comparison to a reference dataset. One is available from the Bioconductor package CopyNumber450kData. To use it, ensure that the package is installed:
BiocManager::install("IlluminaHumanMethylation450kmanifest")
BiocManager::install("IlluminaHumanMethylation450kanno.ilmn12.hg19")
BiocManager::install("CopyNumber450kData")
Note: use 'biocLite' instead of 'BiocManager::install' for older installations of Bioconductor.
CopyNumber450kData is not available in the most recent versions of Bioconductor. If the install fails, then you can install it from source. First download the source file:
https://bioc.ism.ac.jp/packages/3.3/data/experiment/src/contrib/CopyNumber450kData_1.8.0.tar.gz
Then install the package, either from the command line:
R CMD INSTALL CopyNumber450kData_1.8.0.tar.gz
or in R:
install.packages("CopyNumber450kData_1.8.0.tar.gz", repos = NULL, type="source")
(this assumes that the file was downloaded to your current working directory).
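To see which of the packages above still need installing, something like the following sketch may help. The helper `missing_packages()` is not part of meffil or Bioconductor; it is a hypothetical convenience function built on base R's `requireNamespace()`, which loads a package's namespace as a side effect when the package is present.

```r
# Hypothetical helper: return the packages from 'pkgs' that cannot be loaded.
missing_packages <- function(pkgs) {
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

needed <- c("IlluminaHumanMethylation450kmanifest",
            "IlluminaHumanMethylation450kanno.ilmn12.hg19",
            "CopyNumber450kData")

to_install <- missing_packages(needed)
if (length(to_install) > 0) {
    message("Still to install: ", paste(to_install, collapse = ", "))
} else {
    message("All reference packages are installed.")
}
```

If CopyNumber450kData is the only package reported, the source install described above is the route to try.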
Once installed, make the data available to meffil:
library(CopyNumber450kData)
controls <- meffil.add.copynumber450k.references()
Now estimate the CNVs:
cnv_values <- meffil.calculate.cnv(samplesheet, cnv.reference="copynumber450k", verbose=TRUE)
A matrix of genetic copy number variation at each probe can now be generated:
cnv <- meffil.cnv.matrix(cnv_values)
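Before saving, it may be worth checking that every sample in the samplesheet appears in the matrix. The sketch below uses a toy matrix and assumes that columns are named by Sample_Name and rows correspond to probes. That layout is an assumption made here, not something this page states, so inspect `dim(cnv)` and `colnames(cnv)` on your own object first. All names and values are made up.

```r
# Toy stand-in for the cnv matrix: 3 probes (rows) by 2 samples (columns).
cnv_toy <- matrix(
    c(0.1, -0.4, 0.0, 0.2, 0.3, -0.1),
    nrow = 3,
    dimnames = list(c("probe1", "probe2", "probe3"), c("ID001", "ID002"))
)

# Sample IDs expected from the samplesheet (hypothetical).
expected_ids <- c("ID001", "ID002", "ID003")

# Samples that are in the samplesheet but have no column in the matrix.
missing_ids <- setdiff(expected_ids, colnames(cnv_toy))
if (length(missing_ids) > 0) {
    message("Samples without CNV estimates: ", paste(missing_ids, collapse = ", "))
}
```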
Please save this object to the godmc/input_data folder:
save(cnv, file="/path/to/godmc/input_data/cnv.RData")
and make sure that the name of the object you are saving is cnv, as this is the name that the pipeline will be expecting. For ARIES, comprising 5469 samples, it took 30 hours to extract CNVs using 6 cores. It takes about 30 seconds for each sample.
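To confirm that the saved file holds an object called cnv, you can load it into a fresh environment and inspect the names it contains. The sketch below uses a toy matrix and a temporary file rather than the real output and the godmc/input_data path.

```r
# Toy matrix standing in for the real cnv object.
cnv <- matrix(c(0.1, -0.2, 0.0, 0.3), nrow = 2,
              dimnames = list(c("probe1", "probe2"), c("ID001", "ID002")))

# Save to a temporary file (use the godmc/input_data path for the real data).
cnv_file <- file.path(tempdir(), "cnv.RData")
save(cnv, file = cnv_file)

# load() returns the names of the objects it restored.
check_env <- new.env()
loaded_names <- load(cnv_file, envir = check_env)
stopifnot(identical(loaded_names, "cnv"))
stopifnot(is.matrix(check_env$cnv))
```

Loading into a separate environment avoids overwriting any object named cnv in the current session.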