Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please add context to descibe the probabilistic imputation. #5

Open
qingjian1991 opened this issue Apr 9, 2022 · 2 comments
Open

Comments

@qingjian1991
Copy link

qingjian1991 commented Apr 9, 2022

Hi @aet21 :

In the third step: "Development and application of probabilistic imputation model" of your GB paper, the DNAm levels of unexpressed genes are estimated by a 2-state mixture of gamma distributions of all expressed genes. Users may want to learn how to build this model from the expression matrices of SCM2 and RMAP. I found that the DNAm levels, the gene expression levels, and the final posterior probabilities are stored in the data("dbSCM2") and data("dbRMAP"). The final posterior probabilities are inferred from the gene expression matrix based on the 2-state mixture of gamma distributions. Could you add some example codes to describe how to infer the posterior probabilities? Your open code will provide valuable learning resources for beginners in bioinformatics.

Below is some code, I have tried.

topIntMR.m <- topIntMRrmap.m
intDNAm.m <- betaRMAPint.m
intExpR.m <- expRMAPint.m
pEgX.m <- pEgXrmap.m

#Get the gene expressions.
exp =c(intExpR.m); exp = exp[!is.na(c(exp))]

#The 2 gramma distributions. We assume one is an exponential distribution.
#One is exponential distribution. a = 1, b = 1/0.235734
mean(exp[exp <= 1.1]) #0.235734

#define function to calculate mode
find_mode <- function(x) { u <- unique(x); tab <- tabulate(match(x, u));u[tab == max(tab)]}

#The other one is gammam distribution.
x = mean(exp[exp > 1.1]) #3.66
y = find_mode(exp[exp > 1.1]) #1.186501 

#get the alpha and beta values.
a = x/(x-y) #1.479176 1.429793
b = 1/(x-y) #0.4038569 0.3903738

#Fit the model, using the estimated prior parameters.
out = gammamixEM(exp,  alpha = c(1, 1.48), beta = c(4.24, 0.40),  epsilon = 1e-06)

out$lambda
#[1] 0.94017109 0.05982891

out$gamma.pars
#comp.1         comp.2
#alpha 1.000000 1.48000000
#beta  2.670065 0.01694109

a = c(1, 1.48); b = c(2.67, 0.0169)
x = exp

#The posterior probabilities
P_exp = dgamma(x, shape = a[2],  rate = b[2])*0.06/( dgamma(x, shape = a[2],  rate = b[2] )*0.06  + dgamma(x, shape = a[1],  rate = b[1])*0.94)

plot(density(P_exp))

image

@aet21
Copy link
Owner

aet21 commented Apr 12, 2022

For a given imputable marker gene, the procedure to determine which samples are expressed and which ones are not, is very clearly explained in the Genome Biology paper. This is basic 2-state mixture modelling stuff. For the two databases SCM2 and RMAP, these analyses are precomputed for each imputable gene, as there is no point recalculating these every single time you want to use EpiSCORE, i.e these analyses are completely separate from the actual imputation of the DNAm reference matrix.

@qingjian1991
Copy link
Author

qingjian1991 commented Apr 12, 2022

For a given imputable marker gene, the procedure to determine which samples are expressed and which ones are not, is very clearly explained in the Genome Biology paper. This is basic 2-state mixture modelling stuff. For the two databases SCM2 and RMAP, these analyses are precomputed for each imputable gene, as there is no point recalculating these every single time you want to use EpiSCORE, i.e these analyses are completely separate from the actual imputation of the DNAm reference matrix.

Thanks, I just want to learn how to do the bayesian inference according to the two-components distributions. I have know how to do that according to this site: https://www.cs.toronto.edu/~rgrosse/csc321/mixture_models.pdf. By the way, my code provides the right soultion in R.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants