Building a Bioinformatics Pipeline (14): Downloading the Chicken Reference Genome and Annotation File

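The setup is similar to Building a Bioinformatics Pipeline (13): Downloading the Arabidopsis Reference Genome and Annotation File.

Look up the Latin name of the chicken (red junglefowl):

From the post "Latin and Chinese names of some common species" we learn that the Latin name of the chicken is Gallus gallus.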

Download from the Ensembl database

Animal reference genomes: http://asia.ensembl.org/index.html

Plant reference genomes: http://plants.ensembl.org/index.html

Other reference genomes (fungi, bacteria, etc.): http://ensemblgenomes.org/

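Then find the chicken entry in the species list: it turns out to be simpler than expected, it is just listed as chicken.

Click it to open the species page.

Then choose the version.

Usually the toplevel file is the one to pick.

Then download it with Xunlei (Thunder); in general a paid membership makes the download faster.

Not necessarily, though; I did not pay for a membership.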

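Download the GTF annotation file

Modify the URL slightly:

Download the file marked with the red box (Gallus_gallus.GRCg6a.98.gtf.gz, the file used in the commands further down).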
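The manual clicks above can also be written down as URLs. The sketch below builds the paths for the two files used in this post, assuming the usual Ensembl FTP layout (`release-<N>/fasta/<species>/dna/` and `release-<N>/gtf/<species>/`). The host name, the layout, and the helper name `ensembl_urls` are assumptions that are not shown in the original post, so check the result against the site before relying on it.

```python
def ensembl_urls(species="Gallus_gallus", assembly="GRCg6a", release=98,
                 host="ftp://ftp.ensembl.org/pub"):
    """Build download URLs for the toplevel genome FASTA and the GTF.

    Assumption: the standard Ensembl FTP directory layout. Verify on the
    site before use; this function only formats strings.
    """
    species_dir = species.lower()
    base = f"{host}/release-{release}"
    fasta = (f"{base}/fasta/{species_dir}/dna/"
             f"{species}.{assembly}.dna.toplevel.fa.gz")
    gtf = f"{base}/gtf/{species_dir}/{species}.{assembly}.{release}.gtf.gz"
    return {"fasta": fasta, "gtf": gtf}


if __name__ == "__main__":
    # Print the two URLs so they can be pasted into a download tool.
    for kind, url in ensembl_urls().items():
        print(kind, url)
```

The printed URLs can be handed to whatever download tool is at hand; nothing is fetched by the sketch itself.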

Transfer the files to the server with Xftp

Decompress

gzip -d Gallus_gallus.GRCg6a.98.gtf.gz

gzip -d Gallus_gallus.GRCg6a.dna.toplevel.fa.gz

Take a look at the contents of the GTF file
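For a quick look at the GTF structure, a small parser can help. The following is a minimal sketch, assuming the standard nine tab-separated GTF columns with `key "value";` pairs in the last one. The function names `parse_gtf_line` and `peek_gtf` are made up for this illustration, and the demo record is invented rather than copied from the real file.

```python
import io


def parse_gtf_line(line):
    """Split one GTF record into its 8 fixed fields plus an attribute dict.

    Simplification: repeated keys (for example several "tag" entries) keep
    only the last value, and ';' inside quoted values is not handled.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    attrs = {}
    for item in fields[8].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip('"')
    return fields[:8], attrs


def peek_gtf(handle, n=5):
    """Return the first n non-comment records from an open GTF handle."""
    records = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        records.append(parse_gtf_line(line))
        if len(records) >= n:
            break
    return records


if __name__ == "__main__":
    # Invented example record, only to show the shape of the data.
    demo = io.StringIO(
        "#!genome-build GRCg6a\n"
        '1\tensembl\tgene\t100\t200\t.\t+\t.\t'
        'gene_id "GENE0001"; gene_biotype "protein_coding";\n'
    )
    for fixed, attrs in peek_gtf(demo):
        print(fixed[0], fixed[2], fixed[3], fixed[4], attrs["gene_id"])
```

To inspect the real file, open `Gallus_gallus.GRCg6a.98.gtf` and pass the handle to `peek_gtf` instead of the in-memory demo.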

=======================================================================

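The rest of this post describes how to build the reference required by the 10X single-cell pipeline; readers doing bulk sequencing can skip it.

Use cellranger to check the GTF and generate a version suitable for the 10X pipeline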

$cellranger mkgtf Gallus_gallus.GRCg6a.98.gtf Gallus_gallus.GRCg6a.98_new.gtf

/opt/biosoft/cellranger-expression/cellranger-cs/3.1.0/bin

cellranger mkgtf (3.1.0)

Copyright (c) 2019 10x Genomics, Inc. All rights reserved.

-------------------------------------------------------------------------------

Writing new genes GTF file (may take 10 minutes for a 1GB input GTF file)...

...done

For the downstream analysis workflow, add an "Mt" marker to the mitochondrial genes.

This requires writing a small Perl or Python script yourself.

python ../add_mt_marker.py Gallus_gallus.GRCg6a.98_new.gtf Gallus_gallus.GRCg6a.98_new2.gtf

mv Gallus_gallus.GRCg6a.98_new2.gtf Gallus_gallus.GRCg6a.98.gtf

less -S Gallus_gallus.GRCg6a.98.gtf
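The post does not show `add_mt_marker.py`, so the following is only a guess at what such a script might look like, not the author's version. It assumes that the mitochondrial contig is named `MT` in column 1, that the marker is a `Mt-` prefix on `gene_name`, and that records without a `gene_name` should get one derived from `gene_id`. All three are assumptions to adapt to the actual file and downstream tools.

```python
import re
import sys

GENE_NAME = re.compile(r'gene_name "([^"]*)"')
GENE_ID = re.compile(r'gene_id "([^"]*)"')


def mark_mt_line(line, mt_contig="MT", prefix="Mt-"):
    """Return the GTF line with gene_name prefixed if it lies on mt_contig.

    Comment lines and records on other contigs are returned unchanged.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[0] != mt_contig:
        return line
    attrs = fields[8]
    match = GENE_NAME.search(attrs)
    if match:
        # Skip names that already carry the prefix so reruns are harmless.
        if not match.group(1).startswith(prefix):
            attrs = GENE_NAME.sub(
                lambda m: f'gene_name "{prefix}{m.group(1)}"', attrs, count=1)
    else:
        # No gene_name present: derive one from gene_id (assumed behaviour).
        gene_id = GENE_ID.search(attrs)
        if gene_id:
            attrs = attrs.rstrip()
            if not attrs.endswith(";"):
                attrs += ";"
            attrs += f' gene_name "{prefix}{gene_id.group(1)}";'
    fields[8] = attrs
    return "\t".join(fields) + "\n"


def main(in_path, out_path):
    """Copy in_path to out_path, marking mitochondrial records on the way."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(mark_mt_line(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

It takes the same two arguments as the command above (input GTF, output GTF). Before running it, it is worth confirming the contig name by listing the distinct values in column 1 of the GTF.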

Use cellranger to check the inputs and generate the reference for the 10X pipeline

$cellranger mkref --genome=chicken --fasta=Gallus_gallus.GRCg6a.dna.toplevel.fa --genes=Gallus_gallus.GRCg6a.98.gtf

/opt/biosoft/cellranger-expression/cellranger-cs/3.1.0/bin

cellranger mkref (3.1.0)

Copyright (c) 2019 10x Genomics, Inc. All rights reserved.

-------------------------------------------------------------------------------

Creating new reference folder at /share/nas1/Data/Users/luohb/Data/Reference/chicken/chicken

...done

Writing genome FASTA file into reference folder...

...done

Computing hash of genome FASTA file...

...done

Indexing genome FASTA file...

...done

Writing genes GTF file into reference folder...

...done

Computing hash of genes GTF file...

...done

Writing genes index file into reference folder (may take over 10 minutes for a 3Gb genome)...

...done

Writing genome metadata JSON file into reference folder...

...done

Generating STAR genome index (may take over 8 core hours for a 3Gb genome)...

Jan 15 18:01:55 ..... Started STAR run

Jan 15 18:01:55 ... Starting to generate Genome files

Jan 15 18:02:55 ... starting to sort Suffix Array. This may take a long time...

Jan 15 18:02:59 ... sorting Suffix Array chunks and saving them to disk...

Jan 15 18:42:16 ... loading chunks from disk, packing SA...

Jan 15 18:42:51 ... Finished generating suffix array

Jan 15 18:42:51 ... Generating Suffix Array index

Jan 15 18:45:49 ... Completed Suffix Array index

Jan 15 18:45:49 ..... Processing annotations GTF

Jan 15 18:45:55 ..... Inserting junctions into the genome indices

Jan 15 18:52:03 ... writing Genome to disk ...

Jan 15 18:52:04 ... writing Suffix Array to disk ...

Jan 15 18:52:13 ... writing SAindex to disk

Jan 15 18:52:14 ..... Finished successfully

...done.

>>> Reference successfully created! <<<

You can now specify this reference on the command line:

cellranger --transcriptome=/share/nas1/Data/Users/luohb/Data/Reference/chicken/chicken ...

This step takes a while = =

The newly generated directory

$cd chicken/

$tree

.
├── fasta
│   ├── genome.fa
│   └── genome.fa.fai
├── genes
│   └── genes.gtf
├── pickle
│   └── genes.pickle
├── reference.json
└── star
    ├── chrLength.txt
    ├── chrNameLength.txt
    ├── chrName.txt
    ├── chrStart.txt
    ├── exonGeTrInfo.tab
    ├── exonInfo.tab
    ├── geneInfo.tab
    ├── Genome
    ├── genomeParameters.txt
    ├── SA
    ├── SAindex
    ├── sjdbInfo.txt
    ├── sjdbList.fromGTF.out.tab
    ├── sjdbList.out.tab
    └── transcriptInfo.tab

4 directories, 20 files
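A quick way to confirm that the build produced the files listed above is to check for a few of them programmatically. This is a minimal sketch: the `EXPECTED` list is taken from the tree output of cellranger 3.1.0 shown in this post and may not match other versions, and the function name `missing_reference_files` is made up for illustration.

```python
from pathlib import Path

# Key files from the tree listing above (cellranger 3.1.0); other
# versions may lay the folder out differently.
EXPECTED = [
    "reference.json",
    "fasta/genome.fa",
    "fasta/genome.fa.fai",
    "genes/genes.gtf",
    "pickle/genes.pickle",
    "star/Genome",
    "star/SA",
    "star/SAindex",
]


def missing_reference_files(ref_dir, expected=EXPECTED):
    """Return the expected relative paths that are not files under ref_dir."""
    root = Path(ref_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # "chicken" is the folder created by --genome=chicken above.
    missing = missing_reference_files("chicken")
    print("missing:", missing if missing else "none")
```

An empty result only means the files exist; it does not validate their contents.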

Keep the original compressed files together with a README that records where the files came from.

cd ..

mkdir source

cd source/

vi README.txt
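The README above is written by hand in `vi`. As an alternative, a short script can record each file's origin together with a checksum. This is a sketch under assumptions: the README layout, the use of MD5, and the names `md5sum` and `write_readme` are choices made for this example, not something the post prescribes.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_readme(source_dir, sources, readme_name="README.txt"):
    """Write a README listing each file, its source URL and its MD5.

    `sources` maps a file name inside source_dir to the URL it came from.
    """
    source_dir = Path(source_dir)
    lines = ["Source files for the chicken reference", ""]
    for name, url in sources.items():
        lines.append(name)
        lines.append(f"  url: {url}")
        lines.append(f"  md5: {md5sum(source_dir / name)}")
    readme = source_dir / readme_name
    readme.write_text("\n".join(lines) + "\n")
    return readme
```

A call might look like `write_readme("source", {"Gallus_gallus.GRCg6a.98.gtf.gz": "<download URL>"})`, with the placeholder replaced by the URL that was actually used.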

All done~
