# Connect to the NCBI FTP server
$ ftp ftp.ncbi.nlm.nih.gov
> cd ./refseq
> ls
dr-xr-xr-x 3 ftp anonymous 4096 Mar 10 2008 B_taurus
dr-xr-xr-x 3 ftp anonymous 4096 Feb 19 2004 D_rerio
-r--r--r-- 1 ftp anonymous 10585 Feb 28 2005 FTP_CHANGE_NOTICE
dr-xr-xr-x 5 ftp anonymous 4096 Mar 13 2008 H_sapiens
dr-xr-xr-x 3 ftp anonymous 4096 Apr 27 2007 LocusLink
dr-xr-xr-x 3 ftp anonymous 4096 Dec 20 2001 M_musculus
-r--r--r-- 1 ftp anonymous 9692 Jun 11 2010 README
dr-xr-xr-x 3 ftp anonymous 4096 Jun 23 2000 R_norvegicus
dr-xr-xr-x 3 ftp anonymous 4096 Mar 10 2008 X_tropicalis
dr-xr-xr-x 3 ftp anonymous 28672 Jul 29 06:19 daily
dr-xr-xr-x 17 ftp anonymous 4096 Jul 8 13:18 release
dr-xr-xr-x 3 ftp anonymous 86016 Jul 29 07:00 removed
dr-xr-xr-x 5 ftp anonymous 8192 Jul 27 03:06 special_requests
dr-xr-xr-x 2 ftp anonymous 4096 Oct 2 2007 uniprotkb
dr-xr-xr-x 3 ftp anonymous 98304 Jul 28 05:58 wgs
> cd D_rerio
> ls
-r--r--r-- 1 ftp anonymous 2762 Mar 10 2008 README
dr-xr-xr-x 3 ftp anonymous 4096 Jul 25 15:29 mRNA_Prot
> cd mRNA_Prot
> ls
dr-xr-xr-x 2 ftp anonymous 4096 Jul 12 2007 tmpold
-r--r--r-- 1 ftp anonymous 8947160 Jul 25 15:26 zebrafish.protein.faa.gz
-r--r--r-- 1 ftp anonymous 22553087 Jul 25 15:26 zebrafish.protein.gpff.gz
-r--r--r-- 1 ftp anonymous 18512295 Jul 25 15:26 zebrafish.rna.fna.gz
-r--r--r-- 1 ftp anonymous 53048114 Jul 25 15:26 zebrafish.rna.gbff.gz
> get zebrafish.rna.fna.gz
# Leave the FTP server
> exit
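The interactive session above can also be reduced to a single URL. The following is a minimal sketch, assuming curl is installed and that the file is still published at the path shown in the session. The helper name fetch is hypothetical, not part of any tool.

```shell
# URL assembled from the host, directories, and file name used in the FTP session.
host="ftp.ncbi.nlm.nih.gov"
dir="refseq/D_rerio/mRNA_Prot"
file="zebrafish.rna.fna.gz"
url="ftp://${host}/${dir}/${file}"

# Hypothetical helper: download one URL into the current directory,
# keeping the remote file name and failing on server errors.
fetch() {
    curl --fail --remote-name "$1"
}

# Show what would be downloaded.
echo "$url"

# To actually download the file, run:
#   fetch "$url"
```

Building the URL from variables makes it easy to switch to another file in the same directory, such as zebrafish.protein.faa.gz.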
# Decompress the .gz file
$ gunzip zebrafish.rna.fna.gz
# Check the contents of the decompressed file
$ less zebrafish.rna.fna
>gi|68369925|ref|XM_696518.1| PREDICTED: Danio rerio si:rp71-1g13.2 (si:rp71-1g13.2), mRNA
ATGGACTCTTTTCAGAAAGAAATAGAGAAGTATGAAGTAGTGATAAGGTTTAAAGAAACAAACCAAGAAATTATAAAGAA
AGCAAACCCATTTGGGTTAACAACTAGCCTGGCAAATAAAATAGGACAGATAGAGTACGCAAAGATCCTTAATGATGGTA
ACCTACTAATAAGATGTGCTGACGCTGGGCAAATGGAAAAAGCCCTAAAAATTAAGGATGTGGTCAAATGTAAGGTGGAG
AATACAGCTAGGGTGGGAATGGGAAGGAAATGTGTAGCTAAAGGGGTGATCACAGGAGTATCATTAAGTATAACAGAAGA
AGAAATGAAAAAGAATATAAAAGGAGCAAAAGTAGTGAATGTTACAAGAATGAAAACAACTAGAGATGGAGAAGCTAAAG
ACAGTAAAACCGTGCTATTAGAATTCGATGAAGTGGTAGTGCCAAAGAAAGTATTTCTTGAATTTGTAAATTATCCAGTG
AGATTGTATGTACCAAAACCATTGAGGTGCTATAACTGCCAAAGATTTGACCACACAGCAAAAATCTGTAATAGGCAAAG
AAGGTGTGCAAGGTGTGGAGGGGATCATGATTACGAAAACTGTGGAGCAGGCGTTCAACCAAAATGTTGCAATTGCGGAG
GTGCTCACAATGTGGCATTCAGTGGATGTGAAGTCATGCAGAGGGAGACAAATATACAAAAGATAAGAGTGGAGAAAAAA
ATCACATACGCTGAAGCGGTTAAAGTGTCAAGAGAAAAGAAAACCAAAGAAAATGAAGTGGTTATGGATTCTCAACAGCA
AGACAATTCAGAGAAAATCTACGTCAAAAATAAAAGAACTAGTAACGTTTATAGCAGGTGTGATAAATAG
>gi|18859030|ref|NM_131477.1| Danio rerio major histocompatibility complex class II DEB gene (mhc2deb), mRNA
ATGTCTTTGCAAAACCTTTTTATTTTTCATCTCCTGTTGTTTCTATTTCCTGACGGGTATTATCACAGTAGGCTTACAAA
ATGCATCTTCCAGCTCCAGGATCTCAGTGACATAGGTGTTCATGATAATTATATCTTCAATAAAGATGTGTACATACGAT
TCAACAGCACTTTGGGGTACTTTGTTGGGTACACTGAACATGGAGTATATAATGCACAATTATGGAAGCAACGATACCAG
CTTCTCGAGCAAGAGAGAGCTCACGAGGATCGATTCTGCAAATACAATGCTGAGATTGACTACAACAACATTCTAGGAAA
AACAGTAAAACCACAGGTTAAGCTTAATTCAGTGAAGCAGGCTGGTGGCAGACAGCCAGCTGTGTTGGTGTGCAGTGCAT
ATGACTTCTATCCCAAAAGAATCAAAGTCACCTGGTTAAGAAATGGTAAACCAGTGACCACTGATGTAACCTCCACTGAG
GAGCTGGCTGATGGGGACTGGTACTACCAAATTCATTCCCACCTGGAATACACCCCCAAATCTGGAGAAAAGATTTCCTG
TATGGTGGATCATGCCAGCTCAACTGAACCCATCATCATAGCCTGGGATTCATCTCTCTCTGAGCCTGAGAGGAATAAAA
TTGCTATTGGAGCATCTGGTTTGGTGCTGGGAATCATCATTGCCACTGCTGGACTCATTTATTACAAGAAGAAATCATCA
GGTCAGTTTAAATAA
>gi|18859028|ref|NM_131706.1| Danio rerio major histocompatibility complex class II DCB gene (mhc2dcb), mRNA
ATGATTTTGTCTGCTTTATTGGAAAAAGTATGTGGAAATTACGGCTATCTTCAAAGTCAATGTCGAGTACTGAGCTCTAC
AAAGAAAGTTGAGCTCATCTTCTCATTCATCTTCAACAAGATTGAATACATTAGATACAACAGTACTGATCAGAAAATTG
TTGGCTACACTGAATTTGGAGAGAAATTTGTTGAAAACTATAAAAATAACACATTTGTGCTAGTCCTGGCTGAGTTTGGG
ATTTACAACTGCAAAAAAATTGCAAAGGCACTAATCTCTGATGGAATGCTGAATCATGTGACAGTGAAACCAGAAGTCAT
TATTCGGTCAGTTACTGAAGCTAAAGGCAATCAGAAAGCTGTCCTGGTGTGCAGTGAATATGACTTTTACCCCAAAGCCA
TTAAACTGACGTGGATGAGGAATGATAAAAGGGTTACAGCTGATGTGACGTCCATTGAGGAGATGGCTGATGGAGACTGG
TATTATCAGATTCACTCCCACCTGGAATATTTTCCTCAACCTGGAGAGAAGATCTCCTGTGTGGTGGATCATGCCAGCTT
CCATAAACCCATGATCTATTACTGGGATCCCTCTCTCCCCGAGACTGAAAGATCTAAGATCATTCTTGGGGCTGTGGGGC
TGCTGATGGGGATCTTTACAGCAGCTGCAGGAGTGATCTATTATAAAAGAAATCAAACAGGTTAG
>gi|18858320|ref|NM_131670.1| Danio rerio ATPase, Na+/K+ transporting, beta 3b polypeptide (atp1b3b), mRNA
TGGCAAGCCCGAGCCGACGCTTTCTTTGATTTGTCCTCATCCATCGCTCTCAAACTGGTTTATCTATCCTCTCCACACTA
TGGCCAACAAAGAGGAGAAAGCTGACGAGAAGCAGTCGAGTTGGAAAGATTTTATCTACAACCCGCGGACAGGGGAATTC
ATCGGGCGCACGGCGAGCAGTTGGGCTCTTATATTCCTCTTTTATTTGGTCTTCTATGGCTTTCTGGCGGGAATGTTCAC
GCTTACCATGTGGGTGATGCTACAGACACTGGATGACCATACTCCCAAATACAGGGACCGAGTGGCCAATCCAGGGCTGA
TGATCAGACCAAGGTCCTTGGATATTGCATTTAACCGGTCTATTCCTCAGCAATACAGCAAGTATGTGCAGCATCTGGAG
################# The listing continues at length ######################
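Paging through the whole file with less is slow, so a quick summary can help. The sketch below counts the records and lists their accessions, assuming every record starts with a header of the form >gi|...|ref|ACCESSION| as in the listing above. The helper names count_records and list_accessions are hypothetical, and the sample file is a two-record excerpt with truncated sequences so the block runs without the full download.

```shell
# Count FASTA records: each record starts with one '>' header line.
count_records() {
    grep -c '^>' "$1"
}

# List accessions: the 4th '|'-separated field of each header line.
list_accessions() {
    grep '^>' "$1" | cut -d'|' -f4
}

# Small sample file: two headers copied from the listing, sequences truncated.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>gi|18859030|ref|NM_131477.1| Danio rerio major histocompatibility complex class II DEB gene (mhc2deb), mRNA
ATGTCTTTGCAAAACCTTTTTATTTTTCATCTCCTGTTGT
>gi|18859028|ref|NM_131706.1| Danio rerio major histocompatibility complex class II DCB gene (mhc2dcb), mRNA
ATGATTTTGTCTGCTTTATTGGAAAAAGTATGTGGAAATT
EOF

n=$(count_records "$sample")
accessions=$(list_accessions "$sample")
echo "records: $n"
echo "$accessions"
rm -f "$sample"
```

On the real data, pass zebrafish.rna.fna instead of the sample file. The record count should agree with the number of sequences that BLAST later reports for the database.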
# Install BLAST
$ sudo apt-get install blast
# Build the BLAST database (-p F: nucleotide input, -i: input FASTA file, -n: database name)
$ formatdb -p F -i zebrafish.rna.fna -n zebrafish
$ ls
formatdb.log zebrafish.nhr zebrafish.nin zebrafish.nsq zebrafish.rna.fna
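Next, as a positive control, place the tp53 mRNA sequence in FASTA format in the current directory under the name tp53.fasta. Replace the first line of the file, which carries the gene description, with some arbitrary text.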
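Editing the first line by hand works, but it can also be scripted. The sketch below is one way to do it with sed, assuming the new label is the word test (the BLAST report further down shows "Query= test") and that the original sequence was saved as tp53.orig.fasta. Both the file name tp53.orig.fasta and the helper name relabel_fasta are hypothetical. The sample input uses the tp53 header and the first 60 bases shown in the BLAST output.

```shell
# Hypothetical helper: replace the FASTA description line with ">" plus a new label.
# $1 = input FASTA file, $2 = new label (must not contain '/' or '&',
# because it is substituted into a sed expression).
relabel_fasta() {
    sed "1s/.*/>$2/" "$1"
}

# Sample input: the tp53 header and the first 60 bases from the BLAST output.
work=$(mktemp -d)
cat > "$work/tp53.orig.fasta" <<'EOF'
>gi|18859502|ref|NM_131327.1| Danio rerio tumor protein p53 (tp53), mRNA
gtttagtggagaggaggtcggcaaaatcaattcttgcaaagcaatggcgcaaaacgacag
EOF

# Write the relabelled copy and inspect its first two lines.
relabel_fasta "$work/tp53.orig.fasta" "test" > "$work/tp53.fasta"
first_line=$(sed -n 1p "$work/tp53.fasta")
second_line=$(sed -n 2p "$work/tp53.fasta")
echo "$first_line"
echo "$second_line"
rm -rf "$work"
```

Only line 1 is rewritten, so the sequence itself is untouched and the query should still match its database entry.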
Now run BLAST, keeping only hits with an E-value of 1e-90 or lower (-e 1e-90).
$ blastall -p blastn -i tp53.fasta -d zebrafish -e 1e-90 | less
Finally, here is part of the output.
############### Output begins here ###########################
BLASTN 2.2.21 [Jun-14-2009]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= test
(2105 letters)
Database: zebrafish
28,321 sequences; 57,848,304 total letters
Searching..................................................done
Score E
Sequences producing significant alignments: (bits) Value
gi|18859502|ref|NM_131327.1| Danio rerio tumor protein p53 (tp53... 2218 0.0
>gi|18859502|ref|NM_131327.1| Danio rerio tumor protein p53 (tp53),
mRNA
Length = 2105
Score = 2218 bits (1119), Expect = 0.0
Identities = 1173/1200 (97%)
Strand = Plus / Plus
Query: 1 gtttagtggagaggaggtcggcaaaatcaattcttgcaaagcaatggcgcaaaacgacag 60
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 gtttagtggagaggaggtcggcaaaatcaattcttgcaaagcaatggcgcaaaacgacag 60
Query: 61 ccaagagttcgcggagctctgggagaagaatttgattattcagcccccaggtggtggctc 120
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 ccaagagttcgcggagctctgggagaagaatttgattattcagcccccaggtggtggctc 120
Query: 121 ttgctgggacatcattaatgatgaggagtacttgccgggatcgtttgaccccaannnnnn 180
################## The output continues ##########################