Statistical formulas

🎯 Summary of equations

Quantity	Definition
Genotype properties
MAF estimate for a SNP $j$	$\hat p_j = \frac {\bar{x}_j} {2}$
Expected MAF sampling variance	${SE}^2_{\hat p_j} = \frac {\hat p_j(1 - \hat p_j)} {2N}$
Expected variance of a SNP genotype (under HWE)	$var(X_j) = 2\hat p_j (1 - \hat p_j)$
LD matrix ($X$ column-standardized)	$L = \frac {X^TX}{N}$
GRM ($X$ column-standardized)	$A = \frac {XX^T}{M}$
LD score of a SNP	$l_j = \frac {1}{N^2}X^T_jXX^TX_j$
SNP effect estimates
OLS effect estimate for model with one SNP ($\hat \beta_{GWAS}$)	$\hat \beta_{j, GWAS} = \frac{X^T_jy}{X^T_jX_j} = \frac{cov(X_j, y)}{var(X_j)}$
Mixed linear model association (MLMA) estimate for one SNP	$\hat \beta_{j,MLMA} = \frac {X^T_jV^{-1}y}{X^T_jV^{-1}X_j}$
OLS effect estimate for model with all SNPs ($\hat \beta_{OLS}$)	$\hat \beta_{OLS} = (X^TX)^{-1}X^Ty$
BLUP effect estimate	$\hat \beta_{BLUP} = (X^TX + \lambda I)^{-1}X^Ty$
Precision of SNP effect estimates
Expected sampling variance of $\hat \beta^*_{j,GWAS}$ (standardized phenotype)	$SE^2_{\hat \beta^*_{j,GWAS}} = var(\hat \beta^*_{j,GWAS}) \approx \frac{1}{N \times var(X_j)}$
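The summary table can be sanity-checked numerically. Below is a minimal simulation sketch, assuming hypothetical sizes `N` and `M`, independent SNPs in Hardy-Weinberg equilibrium, a simulated heritability `h2`, and a ridge parameter `lambda = M * (1 - h2) / h2` (residual variance over per-SNP effect variance); none of these values come from the original text. MLMA is left out because it needs an estimate of $V$, typically from a fitted mixed model.

```r
set.seed(5)
N = 1000  # individuals (hypothetical)
M = 20    # SNPs (hypothetical)
h2 = 0.2  # simulated SNP heritability (hypothetical)

# Simulate HWE genotypes, then standardize each column to mean 0, variance 1
# (variance uses divisor N, so that diag(X^T X / N) = 1 exactly)
p = runif(M, 0.1, 0.5)
x = sapply(p, function(pj) rbinom(N, size = 2, prob = pj))
X = scale(x) * sqrt(N / (N - 1))

# Phenotype from independent SNP effects plus noise, then standardized
beta = rnorm(M, 0, sqrt(h2 / M))
y = drop(X %*% beta) + rnorm(N, 0, sqrt(1 - h2))
y = (y - mean(y)) / sd(y)

A = tcrossprod(X) / M     # GRM (N x N)
L = crossprod(X) / N      # LD matrix (M x M)
ld_score = colSums(L^2)   # l_j = X_j^T X X^T X_j / N^2

# Marginal GWAS estimates, one SNP at a time: X_j^T y / X_j^T X_j
beta_gwas = drop(crossprod(X, y)) / colSums(X^2)
# Joint OLS estimates for all SNPs together
beta_ols = drop(solve(crossprod(X), crossprod(X, y)))
# Ridge-type BLUP; lambda = sigma_e^2 / sigma_beta^2 is an assumed choice
lambda = M * (1 - h2) / h2
beta_blup = drop(solve(crossprod(X) + lambda * diag(M), crossprod(X, y)))
# Expected sampling variance of beta_gwas, using var(y) = 1
se2_gwas = 1 / (N * colMeans(X^2))
```

With independent SNPs the marginal and joint estimates are close, while `beta_blup` has a smaller Euclidean norm than `beta_ols` because of the ridge penalty.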

\[\hat p_j = \frac {\bar{x}_j} {2}\]

Meaning: the estimated minor allele frequency (MAF) of SNP $j$, assuming genotypes are coded as 0/1/2 copies of the minor allele.
R script:

p_hat_j = mean(x[, j]) / 2
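To see the estimator at work on every SNP at once, here is a small simulation sketch. It assumes a hypothetical genotype matrix `x` (N individuals by M SNPs, 0/1/2 coding) drawn under Hardy-Weinberg equilibrium from known frequencies `p_true`; `colMeans()` is simply the vectorized form of `mean(x[, j]) / 2`.

```r
set.seed(1)
N = 5000  # number of individuals (hypothetical)
M = 10    # number of SNPs (hypothetical)
p_true = runif(M, 0.05, 0.5)  # true minor allele frequencies (simulated)
# Under HWE each genotype is a Binomial(2, p) count of the minor allele
x = sapply(p_true, function(p) rbinom(N, size = 2, prob = p))
p_hat = colMeans(x) / 2       # estimated MAF for all SNPs
round(cbind(p_true, p_hat), 3)
```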

\[{SE}^2_{\hat p_j} = \frac {\hat p_j(1 - \hat p_j)} {2N}\]

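Meaning: the expected sampling variance of $\hat p_j$ when $N$ individuals (that is, $2N$ alleles) are genotyped.
R script:

se2_p_hat_j = p_hat_j * (1 - p_hat_j) / (2 * N)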
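The formula can be checked by repeating the sampling many times. This sketch assumes a hypothetical true frequency `p`, sample size `N`, and number of replicates `n_rep`, draws HWE genotypes, and compares the empirical variance of $\hat p_j$ with $p(1-p)/(2N)$.

```r
set.seed(2)
N = 1000      # individuals per sample (hypothetical)
p = 0.2       # true allele frequency (hypothetical)
n_rep = 2000  # number of simulated samples
# Each replicate: draw N HWE genotypes and estimate p as mean / 2
p_hats = replicate(n_rep, mean(rbinom(N, size = 2, prob = p)) / 2)
emp_var = var(p_hats)               # empirical sampling variance
theo_var = p * (1 - p) / (2 * N)    # formula, evaluated at the true p
c(empirical = emp_var, theoretical = theo_var)
```

The two values should agree to within a few percent; the residual gap is Monte Carlo noise from using a finite `n_rep`.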

\[var(X_j) = 2\hat p_j (1 - \hat p_j)\]

Meaning: the theoretical variance of the 0/1/2-coded genotype of SNP $j$, assuming Hardy-Weinberg equilibrium (HWE).
Writing $\hat q_j = 1 - \hat p_j$ for the frequency of the other allele, this is $var(X_j) = 2\hat p_j \hat q_j$.
R script:

var_x_j = 2 * p_hat_j * (1 - p_hat_j)
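A quick way to see that $2\hat p_j(1 - \hat p_j)$ matches the sample variance of the genotype codes is to simulate one SNP under HWE. The sample size `N` and frequency `p` below are hypothetical.

```r
set.seed(3)
N = 100000  # individuals (hypothetical)
p = 0.3     # true allele frequency (hypothetical)
x_j = rbinom(N, size = 2, prob = p)     # HWE genotypes for one SNP
p_hat_j = mean(x_j) / 2
var_x_j = 2 * p_hat_j * (1 - p_hat_j)   # HWE-based variance
c(hwe = var_x_j, sample = var(x_j))     # both close to 2 * 0.3 * 0.7 = 0.42
```

If genotypes deviate from HWE (for example, an excess of homozygotes), the two numbers separate, so this comparison only holds under the HWE assumption.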

\[L = \frac {X^TX}{N}\]

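Meaning: the LD matrix, i.e. the pairwise correlations between SNPs, assuming the columns of $X$ are standardized to mean 0 and variance 1.
R script:

x_std = scale(x)              # center and scale each SNP column (sd uses divisor N - 1)
L = t(x_std) %*% x_std / N    # diagonal is (N - 1) / N, close to 1 for large N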
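Because $L$ is a correlation matrix only when $X$ is standardized with the divisor $N$, a direct check is to compare it with R's `cor()`. This sketch assumes hypothetical sizes `N` and `M` and induces LD artificially by letting each SNP copy the previous SNP's genotype with probability 0.8; this is a toy mechanism, not a population-genetic model.

```r
set.seed(4)
N = 2000  # individuals (hypothetical)
M = 5     # SNPs (hypothetical)
# Toy LD: SNP k copies SNP k - 1 with probability 0.8, otherwise it is redrawn
x = matrix(0, N, M)
x[, 1] = rbinom(N, size = 2, prob = 0.3)
for (k in 2:M) {
  keep = runif(N) < 0.8
  x[, k] = ifelse(keep, x[, k - 1], rbinom(N, size = 2, prob = 0.3))
}
# Standardize each column to mean 0 and variance 1 (dividing by N, not N - 1)
x_std = apply(x, 2, function(g) (g - mean(g)) / sqrt(mean((g - mean(g))^2)))
L = crossprod(x_std) / N    # same as t(x_std) %*% x_std / N
all.equal(L, cor(x), check.attributes = FALSE)
```

`crossprod(x_std)` computes $X^TX$ without forming the transpose explicitly, which is the idiomatic and slightly faster R form.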