Please add context to descibe the probabilistic imputation. #5

qingjian1991 · 2022-04-09T02:05:38Z

In the third step: "Development and application of probabilistic imputation model" of your GB paper, the DNAm levels of unexpressed genes are estimated by a 2-state mixture of gamma distributions of all expressed genes. Users may want to learn how to build this model from the expression matrices of SCM2 and RMAP. I found that the DNAm levels, the gene expression levels, and the final posterior probabilities are stored in the data("dbSCM2") and data("dbRMAP"). The final posterior probabilities are inferred from the gene expression matrix based on the 2-state mixture of gamma distributions. Could you add some example codes to describe how to infer the posterior probabilities? Your open code will provide valuable learning resources for beginners in bioinformatics.

Below is some code, I have tried.

topIntMR.m <- topIntMRrmap.m
intDNAm.m <- betaRMAPint.m
intExpR.m <- expRMAPint.m
pEgX.m <- pEgXrmap.m

#Get the gene expressions.
exp =c(intExpR.m); exp = exp[!is.na(c(exp))]

#The 2 gramma distributions. We assume one is an exponential distribution.
#One is exponential distribution. a = 1, b = 1/0.235734
mean(exp[exp <= 1.1]) #0.235734

#define function to calculate mode
find_mode <- function(x) { u <- unique(x); tab <- tabulate(match(x, u));u[tab == max(tab)]}

#The other one is gammam distribution.
x = mean(exp[exp > 1.1]) #3.66
y = find_mode(exp[exp > 1.1]) #1.186501 

#get the alpha and beta values.
a = x/(x-y) #1.479176 1.429793
b = 1/(x-y) #0.4038569 0.3903738

#Fit the model, using the estimated prior parameters.
out = gammamixEM(exp,  alpha = c(1, 1.48), beta = c(4.24, 0.40),  epsilon = 1e-06)

out$lambda
#[1] 0.94017109 0.05982891

out$gamma.pars
#comp.1         comp.2
#alpha 1.000000 1.48000000
#beta  2.670065 0.01694109

a = c(1, 1.48); b = c(2.67, 0.0169)
x = exp

#The posterior probabilities
P_exp = dgamma(x, shape = a[2],  rate = b[2])*0.06/( dgamma(x, shape = a[2],  rate = b[2] )*0.06  + dgamma(x, shape = a[1],  rate = b[1])*0.94)

plot(density(P_exp))

The text was updated successfully, but these errors were encountered:

aet21 · 2022-04-12T02:05:47Z

For a given imputable marker gene, the procedure to determine which samples are expressed and which ones are not, is very clearly explained in the Genome Biology paper. This is basic 2-state mixture modelling stuff. For the two databases SCM2 and RMAP, these analyses are precomputed for each imputable gene, as there is no point recalculating these every single time you want to use EpiSCORE, i.e these analyses are completely separate from the actual imputation of the DNAm reference matrix.

qingjian1991 · 2022-04-12T11:44:53Z

For a given imputable marker gene, the procedure to determine which samples are expressed and which ones are not, is very clearly explained in the Genome Biology paper. This is basic 2-state mixture modelling stuff. For the two databases SCM2 and RMAP, these analyses are precomputed for each imputable gene, as there is no point recalculating these every single time you want to use EpiSCORE, i.e these analyses are completely separate from the actual imputation of the DNAm reference matrix.

Thanks, I just want to learn how to do the bayesian inference according to the two-components distributions. I have know how to do that according to this site: https://www.cs.toronto.edu/~rgrosse/csc321/mixture_models.pdf. By the way, my code provides the right soultion in R.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Please add context to descibe the probabilistic imputation. #5

Please add context to descibe the probabilistic imputation. #5

qingjian1991 commented Apr 9, 2022 •

edited

Loading

aet21 commented Apr 12, 2022

qingjian1991 commented Apr 12, 2022 •

edited

Loading

Please add context to descibe the probabilistic imputation. #5

Please add context to descibe the probabilistic imputation. #5

Comments

qingjian1991 commented Apr 9, 2022 • edited Loading

aet21 commented Apr 12, 2022

qingjian1991 commented Apr 12, 2022 • edited Loading

qingjian1991 commented Apr 9, 2022 •

edited

Loading

qingjian1991 commented Apr 12, 2022 •

edited

Loading