Bowtie2 Read Alignment Tutorial

2024-01-09 16:44:13

Learn, summarize, and share as you go!


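Before we start

As our tutorials are released, the transcriptome analysis series is also being opened chapter by chapter. If you need it, go straight to the transcriptome upstream analysis tutorial [for beginners (complete)].

These are personal notes and may contain errors!

If our posts are useful to you, please like, bookmark, and share; that is the greatest support for Xiao Du.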

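Bowtie2 and BWA are both aligners for short reads. Bowtie2 is mainly used to align reads of roughly 50-1000 bp and writes its alignments in SAM format. Before a transcriptome analysis, sequences such as tRNA are usually filtered out of the RNA-seq data, and Bowtie2 is a common tool for this filtering step.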

Bowtie2 manual

Bowtie 2: (https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#how-is-bowtie-2-different-from-bowtie-1)

Bowtie 2 was published in Nature Methods in 2012.

1. Installing bowtie2

Install with conda

conda install -y -c conda-forge -c bioconda bowtie2

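Install from a precompiled release
Download page: https://sourceforge.net/projects/bowtie-bio/files/bowtie2/

# Download the precompiled Linux binaries and unpack them
wget https://nchc.dl.sourceforge.net/project/bowtie-bio/bowtie2/2.5.1/bowtie2-2.5.1-linux-x86_64.zip
unzip bowtie2-2.5.1-linux-x86_64.zip
cd bowtie2-2.5.1-linux-x86_64
## Add the unpacked directory to PATH (assumed here to live under /software; adjust to your own location)
echo 'export PATH=$PATH:/software/bowtie2-2.5.1-linux-x86_64' >> ~/.bashrc
source ~/.bashrc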
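
After either installation route, it can help to confirm that the executables are actually visible on PATH. The following is a minimal sketch, assuming a POSIX-compatible shell. The helper name have_tool is made up for illustration, and samtools is checked only because it is used later in this tutorial.

```shell
# Return 0 if the named program can be found on PATH, 1 otherwise.
# (have_tool is a hypothetical helper, not part of Bowtie2.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the tools this tutorial relies on.
for tool in bowtie2 bowtie2-build samtools; do
    if have_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```

If bowtie2 is reported as missing right after editing ~/.bashrc, opening a new shell (or sourcing the file) is usually the first thing to try.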

Help output:

$ bowtie2 -h
Bowtie 2 version 2.4.5 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: 
  bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i> | -b <bam>} [-S <sam>]

2. Using bowtie2

2.1 Building the bowtie2 index

bowtie2-build [options]* <reference_in> <bt2_index_base>

Example (the Bowtie2-index directory must already exist):

bowtie2-build --threads 30 Sl.fa Bowtie2-index/Tomato-bowtie-index

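Parameters:

--threads
	Number of threads to use.
--large-index
	Force a large index (.bt2l files) even when the reference is shorter than about 4 billion nucleotides. For longer references, bowtie2-build switches to a large index automatically.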
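
Building an index can take a while, so it is often worth checking whether one already exists before calling bowtie2-build again. The sketch below assumes the usual layout of six files per index (suffix .bt2 for a small index, .bt2l for a large one). The function name index_exists is hypothetical.

```shell
# Return 0 if a complete Bowtie2 index exists for the given basename.
# A complete index has six files: .1 .2 .3 .4 .rev.1 .rev.2, each with
# the suffix .bt2 (small index) or .bt2l (large index).
index_exists() {
    base=$1
    for ext in bt2 bt2l; do
        complete=yes
        for part in 1 2 3 4 rev.1 rev.2; do
            [ -f "$base.$part.$ext" ] || complete=no
        done
        if [ "$complete" = yes ]; then
            return 0
        fi
    done
    return 1
}

# Possible usage with the paths from this tutorial (left commented out,
# because it would start a real index build):
# index_exists Bowtie2-index/Tomato-bowtie-index || \
#     bowtie2-build --threads 30 Sl.fa Bowtie2-index/Tomato-bowtie-index
```

The check only looks at file names, not at file contents, so an index left behind by an interrupted build could still pass it.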

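2.2 Aligning with Bowtie2

In the commands below, ** stands for the sample name. Bowtie2 prints its alignment summary to standard error, which is why 2> is used to save it as a log.

Single-end:

bowtie2 -p 10 -x 02_Geneome_index/Bowtie2-index/Tomato-bowtie-index -U input.fq -S **.sam 2> **.bowtie2.log

Paired-end:

bowtie2 -p 10 -x 02_Geneome_index/Bowtie2-index/Tomato-bowtie-index -1 **_1.fq.gz -2 **_2.fq.gz -S **.sam 2> **.bowtie2.log

Without -S, Bowtie2 writes SAM to standard output, so the result can be piped (|) straight into samtools sort:

bowtie2 -p 10 -x 02_Geneome_index/Bowtie2-index/Tomato-bowtie-index -U input.fq | samtools sort -O bam -@ 10 -o output.bam -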
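
With several samples, the paired-end command is usually wrapped in a loop. The sketch below only prints the commands (a dry run) so they can be inspected first. It assumes the files are named <sample>_1.fq.gz and <sample>_2.fq.gz and that paths contain no spaces. The function name print_pe_commands and the .sorted.bam output name are illustrative choices, not Bowtie2 conventions.

```shell
# Print (do not run) one Bowtie2 + samtools command per sample, pairing
# every <sample>_1.fq.gz in a directory with its <sample>_2.fq.gz mate file.
print_pe_commands() {
    index=$1      # Bowtie2 index basename
    fq_dir=$2     # directory holding the FASTQ files
    threads=$3    # value passed to bowtie2 -p and samtools -@
    for r1 in "$fq_dir"/*_1.fq.gz; do
        [ -e "$r1" ] || continue          # the glob matched nothing
        r2=${r1%_1.fq.gz}_2.fq.gz
        if [ ! -e "$r2" ]; then
            echo "skip: no mate file for $r1" >&2
            continue
        fi
        sample=$(basename "$r1" _1.fq.gz)
        echo "bowtie2 -p $threads -x $index -1 $r1 -2 $r2 2> $sample.bowtie2.log | samtools sort -@ $threads -O bam -o $sample.sorted.bam -"
    done
}
```

Calling, for example, print_pe_commands 02_Geneome_index/Bowtie2-index/Tomato-bowtie-index ./fastq 10 (the ./fastq directory is an assumption) prints one line per sample; once the output looks right it can be piped to sh to actually run the alignments.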

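Bowtie2 parameter settings:

Main arguments:

-x
	Basename of the index built with bowtie2-build.
-1
	File(s) containing mate 1 of the paired-end reads.
-2
	File(s) containing mate 2 of the paired-end reads.
-U
	File(s) containing unpaired (single-end) reads.
-S
	Output SAM file (default: standard output).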

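Input options (optional):

-q
	Input files are in FASTQ format (the default).
--qseq
	Input files are in QSEQ format.
-f
	Input files are in FASTA format. Selecting this also implies --ignore-quals.
-r
	Input files contain one sequence per line, with no read names or quality values.

Other options can be listed with bowtie2 -h.

Alignment options: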

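-N
	Number of mismatches allowed in a seed alignment; may be 0 or 1. default: 0
-L
	Seed length.
-i
	Interval between adjacent seeds, given as a function of read length (for example S,1,1.15).
--ignore-quals
	Ignore base qualities when computing mismatch penalties. This is applied automatically when the input mode is -f, -r or -c.
--nofw/--norc
	--nofw: do not align reads against the forward reference strand; --norc: do not align reads against the reverse-complement reference strand.
--end-to-end
	Align the whole read against the reference, without trimming either end (the default mode). In this mode --ma is 0.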

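Scoring options:

--ma
	Match bonus: in --local mode, the score added for each read base that matches the reference. Not used in --end-to-end mode. default: 2
--mp MX,MN
	Mismatch penalty, with maximum MX and minimum MN. default: MX = 6, MN = 2
--np
	Penalty for positions where the read, the reference, or both contain an ambiguous base such as N. default: 1
--rfg <int1>,<int2>
	Reference gap penalties: <int1> for opening a gap and <int2> for extending it. default: 5,3 (--rdg sets the same two penalties for gaps in the read.)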
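
To make the --mp MX,MN setting more concrete: according to the Bowtie2 manual, when base qualities are used, the penalty for a mismatch at a base with Phred quality Q is MN + floor((MX - MN) * min(Q, 40) / 40), and with --ignore-quals it is always MX. The sketch below simply evaluates that formula in shell arithmetic. The function name mismatch_penalty is hypothetical, and it assumes Q >= 0 and MX >= MN.

```shell
# Mismatch penalty for one base, following the formula in the Bowtie2 manual:
#   MN + floor((MX - MN) * min(Q, 40) / 40)
# Arguments: Phred quality Q, then optional MX and MN (defaults 6 and 2).
mismatch_penalty() {
    q=$1
    mx=${2:-6}
    mn=${3:-2}
    if [ "$q" -gt 40 ]; then
        q=40
    fi
    # Integer division acts as floor here because no operand is negative.
    echo $(( mn + (mx - mn) * q / 40 ))
}

# Show how the penalty grows with base quality under the default --mp 6,2.
for q in 0 10 20 30 40; do
    echo "Q=$q penalty=$(mismatch_penalty "$q")"
done
```

With the defaults, a mismatch at a low-quality base (Q=0) costs 2 while one at a high-quality base (Q=40 or above) costs the full 6, so mismatches at confident base calls weigh more heavily on the alignment score.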

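Paired-end options:

-I/--minins <int>
	Minimum fragment (insert) length. default: 0
-X/--maxins <int>
	Maximum fragment (insert) length. default: 500
--fr/--rf/--ff
	Expected relative orientation of the two mates. default: --fr
--no-mixed
	By default, when a pair of reads cannot be aligned as a pair, Bowtie2 tries to align each mate on its own. --no-mixed disables this behaviour.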

Output options:

-t/--time          print wall-clock time taken by search phases
--un <path>        write unpaired reads that didn't align to <path>
--al <path>        write unpaired reads that aligned at least once to <path>
--un-conc <path>   write pairs that didn't align concordantly to <path>
--al-conc <path>   write pairs that aligned concordantly at least once to <path>
    (Note: for --un, --al, --un-conc, or --al-conc, add '-gz' to the option name, e.g.
    --un-gz <path>, to gzip compress output, or add '-bz2' to bzip2 compress output.)
--quiet            print nothing to stderr except serious errors
--met-file <path>  send metrics to file at <path> (off)
--met-stderr       send metrics to stderr (off)
--met <int>        report internal counters & metrics every <int> secs (1)
--no-unal          suppress SAM records for unaligned reads
--no-head          suppress header lines, i.e. lines starting with @
--no-sq            suppress @SQ header lines
--rg-id <text>     set read group id, reflected in @RG line and RG:Z: opt field
--rg <text>        add <text> ("lab:value") to @RG line of SAM header.
                     Note: @RG line only printed when --rg-id is set.
--omit-sec-seq     put '*' in SEQ and QUAL fields for secondary alignments.
--sam-no-qname-trunc Suppress standard behavior of truncating readname at first whitespace 
                      at the expense of generating non-standard SAM.
--xeq              Use '='/'X', instead of 'M,' to specify matches/mismatches in SAM record.
--soft-clipped-unmapped-tlen Exclude soft-clipped bases when reporting TLEN
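
The --un-conc family of options is what makes the tRNA filtering mentioned at the start of this tutorial possible: align the reads against an index built from the sequences to be removed, and keep only the pairs that do not align concordantly. The sketch below prints such a command rather than running it. The function name print_filter_command, the index path, and the output file names are all assumptions made for illustration. It relies on the documented behaviour that a % in the --un-conc path is replaced by 1 or 2 for the two mates.

```shell
# Print (do not run) a Bowtie2 command that aligns one read pair against a
# "contaminant" index (for example tRNA sequences) and writes the pairs that
# fail to align concordantly to gzip-compressed FASTQ files.
print_filter_command() {
    index=$1     # basename of the index built from the sequences to remove
    r1=$2        # mate 1 FASTQ
    r2=$3        # mate 2 FASTQ
    sample=$4    # prefix for the output files
    # '%' becomes 1 or 2, giving <sample>.clean_1.fq.gz and <sample>.clean_2.fq.gz.
    # The SAM output itself is not needed for filtering, so it goes to /dev/null.
    echo "bowtie2 -p 10 -x $index -1 $r1 -2 $r2 --un-conc-gz $sample.clean_%.fq.gz -S /dev/null 2> $sample.filter.log"
}

# Hypothetical example call:
print_filter_command tRNA-index/tRNA S1_1.fq.gz S1_2.fq.gz S1
```

The resulting .clean_1/.clean_2 files would then serve as input for the genome alignment in section 2.2, and the log file reports what fraction of pairs matched the contaminant index.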


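小杜的生信筆記 (Xiao Du's Bioinformatics Notes) mainly publishes and collects bioinformatics tutorials, along with R-based analysis and visualization (data analysis, plotting, and so on), and shares interesting papers and study materials.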


Source: https://blog.csdn.net/kanghua_du/article/details/135481332