Statistical formulas

🎯 Summary of equations

| Quantity | Definition |
|---|---|
| **Genotype properties** | |
| MAF estimate for SNP $j$ | $\hat p_j = \frac{\bar{x}_j}{2}$ |
| Expected sampling variance of the MAF estimate | ${SE}^2_{\hat p_j} = \frac{\hat p_j(1 - \hat p_j)}{2N}$ |
| Expected variance of a SNP | $var(X_j) = 2\hat p_j (1 - \hat p_j)$ |
| LD matrix | $L = \frac{X^TX}{N}$ |
| GRM | $A = \frac{XX^T}{M}$ |
| LD score of a SNP | $l_j = \frac{1}{N^2}X^T_jXX^TX_j$ |
| **SNP effect estimates** | |
| OLS effect estimate for a model with one SNP ($\hat\beta_{GWAS}$) | $\hat\beta_{j,GWAS} = \frac{X^T_jy}{X^T_jX_j} = \frac{cov(X_j, y)}{var(X_j)}$ |
| Mixed linear model association (MLMA) estimate for one SNP | $\hat\beta_{j,MLMA} = \frac{X^T_jV^{-1}y}{X^T_jV^{-1}X_j}$ |
| OLS effect estimate for a model with all SNPs ($\hat\beta_{OLS}$) | $\hat\beta_{OLS} = (X^TX)^{-1}X^Ty$ |
| BLUP effect estimate | $\hat\beta_{BLUP} = (X^TX + \lambda I)^{-1}X^Ty$ |
| **Precision of SNP effect estimates** | |
| Expected sampling variance of $\hat\beta^*_{j,GWAS}$ | $SE^2_{\hat\beta^*_{j,GWAS}} = var(\hat\beta^*_{j,GWAS}) \approx \frac{1}{N \times var(X_j)}$ |
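
The effect-estimate rows of the table can be checked numerically on simulated data. Below is a minimal sketch under several assumptions. Genotypes are centered. The phenotype is standardized to unit variance and per-SNP effects are small, which is what makes $SE^2 \approx 1/(N \times var(X_j))$ hold. The values of `N`, `p`, `beta` and `lambda` are hypothetical. For BLUP under the model $\beta \sim N(0, \sigma^2_\beta I)$, $\lambda = \sigma^2_e / \sigma^2_\beta$; the value used here is arbitrary. The MLMA estimate is not shown because it requires the phenotypic covariance matrix $V$.

```r
set.seed(5)
N = 5000                                   # individuals (hypothetical)
p = c(0.1, 0.3, 0.5)                       # allele frequencies (hypothetical)
M = length(p)
beta = c(0.05, -0.03, 0.02)                # true per-allele effects (hypothetical)

x = sapply(p, function(pj) rbinom(N, size = 2, prob = pj))  # raw 0/1/2 genotypes
X = scale(x, scale = FALSE)                # centered genotypes
y = drop(X %*% beta) + rnorm(N)
y = as.numeric(scale(y))                   # phenotype standardized to mean 0, variance 1

# GWAS: one single-SNP OLS regression per SNP, in both forms given in the table
beta_gwas = colSums(X * y) / colSums(X^2)
beta_gwas_cov = apply(X, 2, function(xj) cov(xj, y) / var(xj))

# Joint OLS with all SNPs in the model
beta_ols = drop(solve(crossprod(X), crossprod(X, y)))

# BLUP (ridge form); lambda = sigma2_e / sigma2_beta, arbitrary value for illustration
lambda = 1000
beta_blup = drop(solve(crossprod(X) + lambda * diag(M), crossprod(X, y)))

# Sampling variance of the GWAS estimate for SNP j: lm() versus 1 / (N * var(X_j))
j = 3
se2_lm = summary(lm(y ~ x[, j]))$coefficients[2, 2]^2
p_hat_j = mean(x[, j]) / 2
se2_approx = 1 / (N * 2 * p_hat_j * (1 - p_hat_j))

rbind(beta_gwas, beta_ols, beta_blup)
c(se2_lm = se2_lm, se2_approx = se2_approx)
```

The two GWAS forms agree exactly because both $X_j$ and $y$ are centered. A positive $\lambda$ shrinks the BLUP estimates towards zero relative to the joint OLS estimates.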

✅ 1. Genotype Properties

1.1 MAF estimate for SNP $j$:

\[\hat p_j = \frac {\bar{x}_j} {2}\]
p_hat_j = mean(x[, j]) / 2
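
For a whole genotype matrix the estimate can be vectorized over SNPs. This is a minimal sketch on simulated data. It assumes `x` is an $N \times M$ matrix of 0/1/2 counts of the coded allele generated under Hardy-Weinberg equilibrium. `N`, `M` and `p_true` are hypothetical simulation settings.

```r
set.seed(1)
N = 5000                         # number of individuals (hypothetical)
M = 10                           # number of SNPs (hypothetical)
p_true = runif(M, 0.05, 0.5)     # true allele frequencies (hypothetical)

# Simulate an N x M genotype matrix under Hardy-Weinberg equilibrium:
# each entry counts copies (0, 1 or 2) of the coded allele, drawn from Binomial(2, p_j)
x = sapply(p_true, function(p) rbinom(N, size = 2, prob = p))

# Vectorized estimate for all SNPs: column means divided by 2
p_hat = colMeans(x) / 2
round(cbind(p_true, p_hat), 3)   # compare estimates with the true frequencies
```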

1.2 Expected sampling variance of the MAF estimate:

\[{SE}^2_{\hat p_j} = \frac {\hat p_j(1 - \hat p_j)} {2N}\]
se2_p_hat_j = p_hat_j * (1 - p_hat_j) / (2 * N)
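
The formula can be checked by simulation. Across many independent samples of size $N$, the variance of $\hat p_j$ should match $p(1-p)/(2N)$. This minimal sketch plugs in the true frequency `p` for $\hat p_j$; the values of `p`, `N` and `n_rep` are hypothetical.

```r
set.seed(2)
p = 0.3          # true allele frequency (hypothetical)
N = 1000         # individuals per sample (hypothetical)
n_rep = 5000     # number of independent samples (hypothetical)

# Estimate p in each of n_rep independent samples of N genotypes
p_hat_rep = replicate(n_rep, mean(rbinom(N, size = 2, prob = p)) / 2)

empirical = var(p_hat_rep)           # variance of the estimate across samples
expected = p * (1 - p) / (2 * N)     # formula above, with the true p plugged in
c(empirical = empirical, expected = expected)
```

Under Hardy-Weinberg equilibrium the allele count in $N$ diploid individuals is Binomial($2N$, $p$), so the two values agree up to simulation noise.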

1.3 Expected variance of a SNP (under Hardy-Weinberg equilibrium):

\[var(X_j) = 2\hat p_j (1 - \hat p_j)\]
var_x_j = 2 * p_hat_j * (1 - p_hat_j)
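
A quick numerical check follows. It assumes genotypes simulated under Hardy-Weinberg equilibrium with hypothetical `N` and `p`. The sample variance of the 0/1/2 genotype column should then be close to $2\hat p_j(1-\hat p_j)$.

```r
set.seed(3)
N = 1e5          # large sample so the sample variance is close to its expectation (hypothetical)
p = 0.2          # true allele frequency (hypothetical)
x_j = rbinom(N, size = 2, prob = p)   # 0/1/2 genotypes drawn under HWE

p_hat_j = mean(x_j) / 2
c(empirical = var(x_j), expected = 2 * p_hat_j * (1 - p_hat_j))
```

In real data, departures from Hardy-Weinberg equilibrium (for example from inbreeding) make the two values differ.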

1.4 LD matrix:

\[L = \frac {X^TX}{N}\]
X = scale(x) * sqrt(N / (N - 1))  # standardize columns to mean 0, variance 1 (denominator N)
L = t(X) %*% X / N
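
$L$ is the SNP correlation (LD) matrix only when the columns of $X$ are standardized to mean 0 and variance 1. This minimal sketch on simulated data checks that, after this standardization, $X^TX/N$ equals `cor(x)`. The simulation settings are hypothetical.

```r
set.seed(4)
N = 2000                         # individuals (hypothetical)
p_true = runif(5, 0.1, 0.5)      # allele frequencies of 5 SNPs (hypothetical)
x = sapply(p_true, function(p) rbinom(N, size = 2, prob = p))  # raw 0/1/2 genotypes

# Standardize each column to mean 0 and variance 1 with the N denominator.
# scale() divides by the (N - 1) standard deviation, hence the sqrt(N / (N - 1)) factor.
X = scale(x) * sqrt(N / (N - 1))

L = crossprod(X) / N             # same as t(X) %*% X / N
all.equal(L, cor(x), check.attributes = FALSE)   # TRUE: L is the SNP correlation matrix
```

`crossprod(X)` computes $X^TX$ without forming the transpose explicitly.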