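Bioinformatics Study Notes

Finding the confidence interval with confidence coefficient 1-α for the difference between the means of two normal populations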

%formula.tex
\documentclass{jsarticle}
\usepackage{amsmath}

\begin{document}
\title{\TeX: Confidence Interval for the Difference of Population Means}
\author{kappa}
\date{Sunday September 30}
\maketitle
We assume the following about the random variables $X$ and $Y$.\\
% align requires the amsmath package.
\begin{align}
X \sim N(\mu_1, \sigma_1^2) \\
Y \sim N(\mu_2, \sigma_2^2)
\end{align}
We now consider the confidence interval for the difference of the population means, $\mu_1 - \mu_2$.\\
Assuming that the two population variances are equal,\\
\begin{align}
\sigma_1^2 = \sigma_2^2 = \sigma^2
\end{align}
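Pooling the two samples, of sizes $m$ and $n$, which share a common population variance, we define the pooled variance $s^2$.\\
\begin{align}
s^2 &= \frac{\sum_{i=1}^m (X_i - \bar X)^2 + \sum_{j=1}^n (Y_j - \bar Y)^2}{m + n - 2}\notag \\
&= \frac{(m-1)s_1^2 + (n-1)s_2^2}{m + n - 2}
\end{align}
Here $s_1^2$ and $s_2^2$ are the unbiased variances of the two samples.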
Standardizing the difference of the two sample means gives
\begin{align}
Z = \frac{(\bar X - \bar Y) -(\mu_1 - \mu_2)}{\sqrt{\left(\frac{1}{m} + \frac{1}{n}\right)\sigma^2}}\\
Z \sim N(0, 1)
\end{align}
The two-sample $t$ statistic is
\begin{align}
t &= \frac{Z}{\sqrt{s^2/\sigma^2}}\\
&= \frac{(\bar X - \bar Y) -(\mu_1 - \mu_2)}{s \sqrt{\frac{1}{m} + \frac{1}{n}}}\\
t &\sim t(m + n - 2)
\end{align}
Therefore,
\begin{align}
P\left(-t_{\alpha / 2} (m + n - 2) \leq \frac{(\bar X - \bar Y) -(\mu_1 - \mu_2)}{s \sqrt{\frac{1}{m} + \frac{1}{n}}} \leq t_{\alpha / 2} (m + n - 2)\right)
= 1 - \alpha
\end{align}
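holds. Solving for the difference $\mu_1 - \mu_2$, the confidence interval with confidence coefficient $1-\alpha$ is
\begin{align}
\left[\bar X - \bar Y - t_{\alpha/2}(m + n - 2) s \sqrt{\frac{1}{m} + \frac{1}{n}} ,\ \bar X - \bar Y + t_{\alpha/2}(m + n - 2) s \sqrt{\frac{1}{m} + \frac{1}{n}} \right]
\end{align}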
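When the $1-\alpha$ confidence interval for the difference $\mu_1 - \mu_2$ does not contain 0, the two population means can be said to differ significantly at significance level $\alpha$.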
\end{document}
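The document above only states that the probability statement is "solved" for the difference of the means. A sketch of that algebra follows. The shorthand symbols $D$, $\delta$, $c$, and $t^*$ are introduced here for brevity and do not appear in the original; the step relies on $c > 0$.

```latex
% Shorthand (introduced here, not in the original):
%   D = \bar X - \bar Y,  \delta = \mu_1 - \mu_2,
%   c = s \sqrt{1/m + 1/n},  t^* = t_{\alpha/2}(m + n - 2)
\begin{align*}
-t^* \leq \frac{D - \delta}{c} \leq t^*
&\iff -t^* c \leq D - \delta \leq t^* c \\
&\iff D - t^* c \leq \delta \leq D + t^* c
\end{align*}
```

The last line is the interval $[D - t^* c,\ D + t^* c]$, that is, the difference of the sample means plus or minus the critical value times the estimated standard error.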

A formula collection with platex

I used platex to put together a collection of formulas with slightly tidier formatting.

The source is given below as well.

$ vim formula.tex

%formula.tex
\documentclass{jsarticle}
\usepackage{amsmath}
\begin{document}
\title{A Collection of Formulas in \TeX}
\author{kappa}
\date{Friday September 28}
\maketitle
\begin{enumerate}
\item
\begin{verbatim}
\[ \sum_{k=1}^5 a_k = a_1 + a_2 + a_3 + a_4 + a_5 \]
\end{verbatim}
\[ \sum_{k=1}^5 a_k = a_1 + a_2 + a_3 + a_4 + a_5 \]
\item
\begin{verbatim}
\[ y=\frac{1}{1-x} \]
\end{verbatim}
\[ y=\frac{1}{1-x} \]
\item
\begin{verbatim}
\[ A = \left(
\begin{array}{@{\,}cccc@{\,}}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mn} \\
\end{array}
\right) \]
\end{verbatim}
\[ A = \left(
\begin{array}{@{\,}cccc@{\,}}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mn} \\
\end{array}
\right) \]
\item
\begin{verbatim}
% requires \usepackage{amsmath} in the preamble
\[ A = \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mn} \\
\end{pmatrix} \]
\end{verbatim}
\[ A = \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mn} \\
\end{pmatrix} \]
\end{enumerate}
\end{document}

$ platex formula.tex

$ dvipdfm formula.dvi
# On a Mac, the Japanese text comes out garbled unless the dvipdfm command is used.

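Introducing TeXworks and TeXShop

When writing a document with inline formulas in TeX, going back and forth between editing the source and compiling it is a chore.

With TeXworks or TeXShop, you can easily compile a tex file into a PDF file.

Both become available automatically once you download and install MacTeX from http://www.tug.org/mactex/.

Fixing the source is quick, too.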

#TeXworks

#TeXShop

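The chi-square distribution and estimating the variance

When a random sample x1, x2, ..., xn of size n is drawn from a population following the normal distribution N(μ, σ^2), the point estimator of the population variance σ^2 is the unbiased variance V.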

##########################

\documentclass{jarticle}

\begin{document}

Normal distribution
\begin{eqnarray}
\hat{\sigma}^2 = V = \frac{S}{n - 1} = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2
\end{eqnarray}
\pagestyle{empty}
\end{document}

##########################

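To find a confidence interval for this σ^2, we need the distribution of S/σ^2.
For a random sample x1, x2, ..., xn of size n drawn from the standard normal distribution N(0, 1^2), the sum of squares
χ^2 = x1^2 + x2^2 + ... + xn^2
follows the chi-square distribution with n degrees of freedom, whose density is written f(χ^2).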
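The formula for the density appears to be missing from the post at this point. For reference, the standard form of the chi-square density with n degrees of freedom is sketched below, writing x for the value of χ^2 and Γ for the gamma function. This is the function that `dchisq(x, n)` evaluates in R.

```latex
% Density of the chi-square distribution with n degrees of freedom.
% x stands for the value of \chi^2; \Gamma is the gamma function.
\begin{equation*}
f(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{n/2 - 1} e^{-x/2}, \qquad x > 0
\end{equation*}
```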

Plotted in R:

#################################

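# Chi-square density for several degrees of freedom n
png("120927_chi.png")
# n = 1; sets the axis labels, y-axis range, and title
curve(dchisq(x,1), from=0, to=20, lty=1, xlab="Z", ylab="Tn(Z)", ylim=c(0,0.8), main = "chi square distribution")
# baseline at y = 0
abline(h=0)
# overlay n = 3, 5, 7 (add=TRUE draws on the existing plot)
curve(dchisq(x,3), add=TRUE, lty=2)
curve(dchisq(x,5), add=TRUE, lty=3)
curve(dchisq(x,7), add=TRUE, lty=4)
# the line types in the legend match the curves above
legend(x=15, y=0.8, lty=c(1,2,3,4), legend=c("n=1", "n=3", "n=5", "n=7"))
dev.off()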

#################################

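Furthermore, when a random sample x1, x2, ..., xn of size n is drawn from a population following the normal distribution N(μ, σ^2),

z = {(x1 - μ)^2 + (x2 - μ)^2 + ... + (xn - μ)^2} / σ^2

follows the chi-square distribution with n degrees of freedom.
Similarly, when the population mean is unknown and the sample mean x̄ is used as its estimate, the expression

S/σ^2 = {(x1 - x̄)^2 + (x2 - x̄)^2 + ... + (xn - x̄)^2} / σ^2

follows the chi-square distribution with n - 1 degrees of freedom. Because using the sample mean reduces the number of freely varying variables by one, the degrees of freedom become n - 1.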
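The loss of one degree of freedom can also be seen from a decomposition of the sum of squares. The following is a sketch of the usual argument rather than something taken from the post, with S denoting the sum of squared deviations from the sample mean, as in the unbiased variance V = S/(n - 1).

```latex
% S = \sum_{i=1}^n (x_i - \bar{x})^2, as in V = S/(n-1).
% The cross term vanishes because \sum_i (x_i - \bar{x}) = 0.
\begin{align*}
\sum_{i=1}^n (x_i - \mu)^2
&= \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 \\
\frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu)^2
&= \frac{S}{\sigma^2} + \left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right)^2
\end{align*}
```

The left-hand side follows a chi-square distribution with n degrees of freedom, and the last term is the square of a standard normal variable, so it has one degree of freedom. For normal samples x̄ and S are independent, which leaves n - 1 degrees of freedom for S/σ^2.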

#################################

\documentclass{jarticle}

\begin{document}

unbiased variance

\begin{eqnarray}
z = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{\sigma^2}
\end{eqnarray}

\pagestyle{empty}

\end{document}

#################################

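Using the fact that S/σ^2 follows the chi-square distribution with n - 1 degrees of freedom, a confidence interval for σ^2 can be obtained in this way.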
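The formulas that belonged here seem to have been lost from the post. A likely reconstruction is the standard equal-tailed interval sketched below. The notation χ^2_p(n-1) for the upper 100p% point is an assumption made here to mirror the t_{α/2} notation used for the means.

```latex
% \chi^2_{p}(n-1): upper 100p% point of the chi-square distribution with
% n-1 degrees of freedom (notation assumed here, not from the original).
\begin{align*}
P\left( \chi^2_{1-\alpha/2}(n-1) \leq \frac{S}{\sigma^2} \leq \chi^2_{\alpha/2}(n-1) \right) &= 1 - \alpha \\
\frac{S}{\chi^2_{\alpha/2}(n-1)} \leq \sigma^2 &\leq \frac{S}{\chi^2_{1-\alpha/2}(n-1)}
\end{align*}
```

The second line follows from the first by taking reciprocals and multiplying by S, which swaps the two bounds.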

Normal distribution

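We plot the probability density function of the normal distribution, varying the standard deviation over 0.5, 1.0, and 2.0.

The probability density function of the normal distribution is as follows.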

########################

\documentclass{jarticle}

\begin{document}

Normal distribution

\begin{eqnarray}
f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{- \frac{(x - \mu)^2}{2 \sigma ^2} }
\end{eqnarray}

\pagestyle{empty}

\end{document}

#######################
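As a quick check on the density, its maximum is at x = μ, where the exponential factor equals 1. The peak heights for the three standard deviations plotted in this post work out roughly as follows (values rounded to three decimals), which is why `ylim=c(0,0.8)` in the R script is just enough to show the σ=0.5 curve.

```latex
% Peak height of the normal density, attained at x = \mu.
\begin{align*}
f(\mu) &= \frac{1}{\sqrt{2\pi\sigma^2}} \\
\sigma = 0.5 &: \quad f(\mu) \approx 0.798 \\
\sigma = 1.0 &: \quad f(\mu) \approx 0.399 \\
\sigma = 2.0 &: \quad f(\mu) \approx 0.199
\end{align*}
```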

Here is the R script.

#script.R

# Normal distribution
png("120923_normal distribution.png")
# Normal density with σ=0.5; sets the axis labels, y-axis range, and title
curve(dnorm(x,0,0.5), from=-7,to=7, lty=2, xlab="x", ylab="y", ylim=c(0,0.8), main="Normal distribution")
# Baseline at y = 0
abline(h=0)
# Normal density with σ=1.0; add=TRUE overlays it on the existing plot
curve(dnorm(x,0,1.0),add=TRUE, lty=1)
# Normal density with σ=2.0, also overlaid
curve(dnorm(x,0,2.0),add=TRUE, lty=3)
# Figure legend
legend(x=4,y=0.6,lty=c(1,2,3),legend=c("σ=1","σ=0.5","σ=2.0"))
dev.off()

$ R

> source("script.R")

> q()
