You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the third step: "Development and application of probabilistic imputation model" of your GB paper, the DNAm levels of unexpressed genes are estimated by a 2-state mixture of gamma distributions of all expressed genes. Users may want to learn how to build this model from the expression matrices of SCM2 and RMAP. I found that the DNAm levels, the gene expression levels, and the final posterior probabilities are stored in the data("dbSCM2") and data("dbRMAP"). The final posterior probabilities are inferred from the gene expression matrix based on the 2-state mixture of gamma distributions. Could you add some example codes to describe how to infer the posterior probabilities? Your open code will provide valuable learning resources for beginners in bioinformatics.
Below is some code, I have tried.
topIntMR.m <- topIntMRrmap.m
intDNAm.m <- betaRMAPint.m
intExpR.m <- expRMAPint.m
pEgX.m <- pEgXrmap.m
#Get the gene expressions.
exp =c(intExpR.m); exp = exp[!is.na(c(exp))]
#The 2 gramma distributions. We assume one is an exponential distribution.
#One is exponential distribution. a = 1, b = 1/0.235734
mean(exp[exp <= 1.1]) #0.235734
#define function to calculate mode
find_mode <- function(x) { u <- unique(x); tab <- tabulate(match(x, u));u[tab == max(tab)]}
#The other one is gammam distribution.
x = mean(exp[exp > 1.1]) #3.66
y = find_mode(exp[exp > 1.1]) #1.186501
#get the alpha and beta values.
a = x/(x-y) #1.479176 1.429793
b = 1/(x-y) #0.4038569 0.3903738
#Fit the model, using the estimated prior parameters.
out = gammamixEM(exp, alpha = c(1, 1.48), beta = c(4.24, 0.40), epsilon = 1e-06)
out$lambda
#[1] 0.94017109 0.05982891
out$gamma.pars
#comp.1 comp.2
#alpha 1.000000 1.48000000
#beta 2.670065 0.01694109
a = c(1, 1.48); b = c(2.67, 0.0169)
x = exp
#The posterior probabilities
P_exp = dgamma(x, shape = a[2], rate = b[2])*0.06/( dgamma(x, shape = a[2], rate = b[2] )*0.06 + dgamma(x, shape = a[1], rate = b[1])*0.94)
plot(density(P_exp))
The text was updated successfully, but these errors were encountered:
For a given imputable marker gene, the procedure to determine which samples are expressed and which ones are not, is very clearly explained in the Genome Biology paper. This is basic 2-state mixture modelling stuff. For the two databases SCM2 and RMAP, these analyses are precomputed for each imputable gene, as there is no point recalculating these every single time you want to use EpiSCORE, i.e these analyses are completely separate from the actual imputation of the DNAm reference matrix.
For a given imputable marker gene, the procedure to determine which samples are expressed and which ones are not, is very clearly explained in the Genome Biology paper. This is basic 2-state mixture modelling stuff. For the two databases SCM2 and RMAP, these analyses are precomputed for each imputable gene, as there is no point recalculating these every single time you want to use EpiSCORE, i.e these analyses are completely separate from the actual imputation of the DNAm reference matrix.
Thanks, I just want to learn how to do the bayesian inference according to the two-components distributions. I have know how to do that according to this site: https://www.cs.toronto.edu/~rgrosse/csc321/mixture_models.pdf. By the way, my code provides the right soultion in R.
Hi @aet21 :
In the third step: "Development and application of probabilistic imputation model" of your GB paper, the DNAm levels of unexpressed genes are estimated by a 2-state mixture of gamma distributions of all expressed genes. Users may want to learn how to build this model from the expression matrices of SCM2 and RMAP. I found that the DNAm levels, the gene expression levels, and the final posterior probabilities are stored in the
data("dbSCM2")
anddata("dbRMAP")
. The final posterior probabilities are inferred from the gene expression matrix based on the 2-state mixture of gamma distributions. Could you add some example codes to describe how to infer the posterior probabilities? Your open code will provide valuable learning resources for beginners in bioinformatics.Below is some code, I have tried.
The text was updated successfully, but these errors were encountered: