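Bowtie Alignment
2017-02-24 16:47
[Bowtie] Principles of DNA sequence assembly
[Jenny's comment]
I had always assumed that Bowtie was a short-read assembly tool, but that is wrong. It is not an assembler; it is only a sequence alignment tool. Its output is the position of each short read relative to the index.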
------------------
How does short-read alignment work? Which short-read alignment programs are in common use today?
ok
http://blog.sina.cn/dpool/blog/s/blog_9617895f01011npk.html?vt=4
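Answer: Sequence alignment means arranging two or more sequences according to defined rules in order to determine their similarity and, by extension, their homology. Short-read alignment has characteristics that set it apart from long-sequence alignment, so the two use different algorithms. Three classes of algorithm are commonly used for short-read alignment:
(1) Spaced-seed indexing, e.g. MAQ and ELAND. Each read is split into segments, and one or more segments are chosen as seeds to build a lookup index. Reads are then placed by index lookup followed by extension. Rotating through the seeds covers every combination of positions at which the allowed mismatches could occur.
(2) Burrows-Wheeler transform, e.g. Bowtie, BWA and SOAP2. The genome sequence is compressed by the B-W transform and indexed, and reads are located by lookup and backtracking. Allowed mismatches are handled by substituting bases during the search.
(3) Smith-Waterman dynamic programming, e.g. BFAST and SHRiMP. Starting from initial conditions and a recurrence relation, the scores of all possible alignments between two sequences are computed and stored in a matrix, and the optimal alignment is recovered by tracing back through the matrix.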
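To make approach (2) concrete, here is a minimal Python sketch of exact-match lookup through a Burrows-Wheeler index (FM-index backward search). It is a toy illustration, not Bowtie's implementation: the function names `bwt_index` and `find` are made up for this example, the suffix array is built naively, and mismatches and backtracking are left out.

```python
def bwt_index(text):
    """Build a toy FM-index (suffix array, C table, occurrence table) for text."""
    text += "$"  # sentinel; sorts before A, C, G, T
    # Naive suffix array: fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT = character preceding each sorted suffix (text[-1] is the sentinel).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in text that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def find(read, index):
    """Return sorted 0-based positions where read occurs exactly in the text."""
    sa, C, occ = index
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for ch in reversed(read):  # backward search: last character first
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


index = bwt_index("ACGTACGTTAGC")
print(find("ACGT", index))
print(find("TAGC", index))
```

Each character of the read narrows the suffix-array interval, so the lookup cost depends on the read length rather than on the genome length. That property is why this family of aligners is fast.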
BGI (Huada Gene) assembly, ok
http://www.ebiotrade.com/newsf/2010-1/2010128171022809.htm
Research on next-generation sequence assembly algorithms
http://www.fdurop.fudan.edu.cn/upload/stu/docs/rcYsXb_102804-1303180458.pdf
Genome sequencing and analysis
Good! Recommended reading!
http://ibi.zju.edu.cn/bioinplant/courses/chap4.pdf
Design of gene sequence assembly algorithms
http://www.doc88.com/p-741680604744.html
[Bowtie] Bowtie2 usage and detailed parameter reference
Quick version for the lazy:
bowtie2 -q --phred33 --sensitive --end-to-end -I 0 -X 500 --fr \
  --un unpaired --al aligned --un-conc unconc --al-conc alconc \
  -p 6 --reorder -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]
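Usage:
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]
-x <bt2-idx>  Prefix of the index files generated by bowtie2-build. bowtie2 looks first in the current directory, then in the directory named by the BOWTIE2_INDEXES environment variable.
-1 <m1>  File(s) containing mate 1 of paired-end reads. Several files may be given, separated by commas; they must correspond one-to-one with the files given to -2, e.g. '-1 flyA_1.fq,flyB_1.fq -2 flyA_2.fq,flyB_2.fq'. Reads in a file may have different lengths.
-2 <m2>  File(s) containing mate 2 of paired-end reads.
-U <r>  File(s) containing unpaired reads. Several files may be given, separated by commas. Reads in a file may have different lengths.
-S <hit>  File to write the SAM alignments to. By default, output goes to standard output.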
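As a hedged illustration of the usage line, the following Python sketch assembles a bowtie2 argument list from the options described so far. The helper name `bowtie2_command` and the index prefix `fly_index` are hypothetical; only flags that appear in this post are used, and the command is built but not executed.

```python
import shlex


def bowtie2_command(index, sam_out, mate1=None, mate2=None, unpaired=None, threads=1):
    """Build (but do not run) a bowtie2 command line as a list of arguments."""
    cmd = ["bowtie2", "-q", "--phred33", "--sensitive", "--end-to-end",
           "-p", str(threads), "-x", index]
    if mate1 and mate2:
        # -1 and -2 files must correspond file-for-file.
        if len(mate1) != len(mate2):
            raise ValueError("-1 and -2 need the same number of files")
        cmd += ["-I", "0", "-X", "500", "--fr",
                "-1", ",".join(mate1), "-2", ",".join(mate2)]
    elif unpaired:
        cmd += ["-U", ",".join(unpaired)]
    else:
        raise ValueError("give either mate1 and mate2, or unpaired")
    cmd += ["-S", sam_out]
    return cmd


cmd = bowtie2_command("fly_index", "out.sam",
                      mate1=["flyA_1.fq", "flyB_1.fq"],
                      mate2=["flyA_2.fq", "flyB_2.fq"],
                      threads=6)
print(shlex.join(cmd))
# To actually run it (requires bowtie2 on PATH and a real index):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.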
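Optional parameters:
-q  Input files are in FASTQ format. This is the default.
--qseq  Input files are in QSEQ format.
-f  Input files are in FASTA format. Selecting this implies --ignore-quals.
-r  Each line of the input file is one sequence, with no read names or qualities. Selecting this implies --ignore-quals.
-c  The read sequences themselves are given on the command line, separated by commas, instead of file names. Selecting this implies --ignore-quals.
-s/--skip <int>  Skip the first <int> reads or pairs in the input.
-u/--qupto <int>  Align only the first <int> reads or pairs (after the -s/--skip reads or pairs have been skipped). Default: no limit.
-5/--trim5 <int>  Trim <int> bases from the 5' end of each read before alignment (default: 0).
-3/--trim3 <int>  Trim <int> bases from the 3' end of each read before alignment (default: 0).
--phred33  Input qualities are ASCII characters equal to the Phred quality plus 33. Used by recent Illumina pipelines.
--phred64  Input qualities are ASCII characters equal to the Phred quality plus 64.
--solexa-quals  Convert Solexa qualities to Phred. Used in older GA Pipeline versions. Default: off.
--int-quals  Qualities in the input file are space-separated integers, e.g. 40 40 30 40..., rather than ASCII characters. Default: off.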
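The quality encodings are easier to see with a small example. This Python sketch decodes a FASTQ quality string under the Phred+33 or Phred+64 convention; the function names are illustrative. The conversion to an error probability uses the standard Phred definition P = 10^(-Q/10), which is not stated in the text above.

```python
def decode_qualities(qual, offset=33):
    """Convert a FASTQ quality string to Phred scores (offset 33 or 64)."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q):
    """Standard Phred definition: Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


print(decode_qualities("II?I"))             # Phred+33
print(decode_qualities("hh^h", offset=64))  # the same scores in Phred+64
print(error_probability(30))
```

With offset 33 the string `II?I` decodes to the same values as the --int-quals example `40 40 30 40`.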
Presets in --end-to-end mode
--very-fast  Same as: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
--fast  Same as: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
--sensitive  Same as: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default in --end-to-end mode)
--very-sensitive  Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
Presets in --local mode
--very-fast-local  Same as: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
--fast-local  Same as: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
--sensitive-local  Same as: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)
--very-sensitive-local  Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
-N <int>  Number of mismatches allowed in a seed alignment. Can be set to 0 or 1. Default: 0.
-L <int>  Sets the seed length.
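************************************************************
Function options
Some bowtie2 parameters take a function instead of a fixed value, so that the value scales with the read length. A function has three parts: (a) the function type, one of constant (C), linear (L), square root (S) or natural log (G); (b) a constant term; (c) a coefficient. For example:
L,-0.4,-0.6 gives f(x) = -0.4 + -0.6 * x
G,1,5.4 gives f(x) = 1.0 + 5.4 * ln(x)
************************************************************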
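The function notation can be checked with a few lines of Python. This is a sketch of how such a specification could be evaluated, not bowtie2's own code; the name `eval_func` is made up, and treating type C as "return the constant term and ignore the coefficient" is an assumption.

```python
import math


def eval_func(spec, x):
    """Evaluate a 'TYPE,constant,coefficient' function spec at read length x."""
    kind, const, coeff = spec.split(",")
    b, a = float(const), float(coeff)
    if kind == "C":          # constant (assumed: coefficient is ignored)
        return b
    if kind == "L":          # linear
        return b + a * x
    if kind == "S":          # square root
        return b + a * math.sqrt(x)
    if kind == "G":          # natural log
        return b + a * math.log(x)
    raise ValueError(f"unknown function type: {kind}")


print(eval_func("L,-0.4,-0.6", 10))
print(eval_func("S,1,1.15", 100))
print(eval_func("G,1,5.4", 1))
```

The second call evaluates the default --end-to-end seed interval function for a 100-base read.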
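-i <func>  Sets the number of bases between adjacent seeds, as a function of read length.
************************************************************
Example: if the read is 30 bases long, the seed length is 10 and the interval between adjacent seeds is 6, the extracted seeds are:
Read:      TAGCTACGCTCTACGCTATCATGCATAAAC
Seed 1 fw: TAGCTACGCT
Seed 1 rc: AGCGTAGCTA
Seed 2 fw: CGCTCTACGC
Seed 2 rc: GCGTAGAGCG
Seed 3 fw: ACGCTATCAT
Seed 3 rc: ATGATAGCGT
Seed 4 fw: TCATGCATAA
Seed 4 rc: TTATGCATGA
************************************************************
The default in --end-to-end mode is -i S,1,1.15, i.e. f(x) = 1 + 1.15 * sqrt(x). For a read of length 100, f(100) = 12.5, so adjacent seeds are about 12 bases apart.
--n-ceil <func>  Sets the maximum number of ambiguous bases (anything other than G, T, A or C, usually N) allowed in a read. Default: L,0,0.15, i.e. f(x) = 0 + 0.15 * x, so a read of length 100 may contain at most 15 ambiguous bases. A read with more than 15 is filtered out.
--dpad <int>  Pads dynamic programming problems by <int> columns on either side to allow gaps. Default: 15.
--gbar <int>  Disallow gaps within <int> bases of either end of the read. Default: 4.
--ignore-quals  Ignore base qualities when calculating the mismatch penalty. This is automatically the behavior when the input mode is -f, -r or -c.
--nofw/--norc  --nofw: do not align reads against the forward reference strand; --norc: do not align reads against the reverse-complement reference strand. Default: both strands enabled.
--end-to-end  Align the entire read against the reference. --ma is 0 in this mode. This is the default mode and is mutually exclusive with --local.
--local  Align the read locally: some bases at either end of the read may be left unaligned so that the alignment score is as high as possible. --ma defaults to 2 in this mode.
--ma <int>  Sets the match bonus. In --local mode, <int> is added to the score for each read base that matches the reference base. Not used in --end-to-end mode. Default: 2.
--mp MX,MN  Sets the mismatch penalty, where MX is the maximum and MN the minimum penalty. By default the penalty depends on base quality: MN + floor( (MX-MN) * (MIN(Q, 40.0)/40.0) ), where Q is the base quality. If --ignore-quals is set, a mismatch always costs MX. Default: MX = 6, MN = 2.
--np <int>  Sets the penalty for positions where the read, the reference, or both contain an ambiguous base such as N. Default: 1.
--rdg <int1>,<int2>  Sets the gap open and gap extend penalties for gaps in the read. Default: 5, 3.
--rfg <int1>,<int2>  Sets the gap open and gap extend penalties for gaps in the reference. Default: 5, 3.
--score-min <func>  Sets the minimum score for an alignment to be considered valid. The default is L,-0.6,-0.6 in --end-to-end mode and G,20,8 in --local mode.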
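-k <int>  By default, bowtie2 searches for distinct alignments of a read and reports the best one (if several best alignments have the same score, one is chosen at random). With -k, bowtie2 instead searches for at most <int> alignments per read and reports them in descending order of score.
-a  Like -k, but with no limit on the number of alignments searched for. All alignments found are reported in descending order of score. This option is mutually exclusive with -k. Note: if the genome contains many repeats, this option can make the program extremely slow.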
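Effort options
-D <int>  During alignment, a seed is extended to produce an alignment; if the extension does not yield a new best or new second-best alignment, the attempt counts as a failure. After <int> consecutive failures, Bowtie2 stops extending seeds for that read and moves on. Default: 15. When -k or -a is given, this limit is adjusted automatically.
-R <int>  If the seeds of a read match too many positions on the reference (more than 300 positions per seed on average), the seeds are regenerated at a different offset and aligned again. <int> is the maximum number of times seeds are regenerated. Default: 2.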
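Paired-end options
-I/--minins <int>  Sets the minimum fragment (insert) length. Default: 0.
-X/--maxins <int>  Sets the maximum fragment (insert) length. Default: 500.
--fr/--rf/--ff  Sets the expected orientation of the upstream and downstream mates relative to the forward reference strand. --fr: mate 1 lies upstream in the forward orientation and mate 2 lies downstream as its reverse complement, or mate 2 lies upstream and mate 1 lies downstream as its reverse complement; --rf: the upstream mate 1 is reverse-complemented and the downstream mate 2 is forward-oriented; --ff: both mates are forward-oriented. Default: --fr. The default suits Illumina paired-end data; for mate-pair libraries, choose --rf.
--no-mixed  By default, when a pair cannot be aligned as a pair, bowtie2 tries to align each mate on its own. This option disables that behavior.
--no-discordant  By default, when a pair has no concordant alignment (one that satisfies -I, -X and --fr/--rf/--ff), bowtie2 looks for a discordant alignment (both mates align uniquely, but the -I, -X, --fr/--rf/--ff constraints are not met). This option disables that behavior.
--dovetail  Treat mates that dovetail as concordant. By default, dovetailing is not concordant.
--no-contain  Treat a pair in which one mate contains the other as non-concordant. By default, containment is concordant.
--no-overlap  Treat a pair whose mates overlap as non-concordant. By default, overlapping mates are concordant.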
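-t/--time  Print the wall-clock time needed to load the index and align the reads.
--un <path>  Write unpaired reads that fail to align to <path>.
--un-gz <path>  Same, gzip compressed.
--un-bz2 <path>  Same, bzip2 compressed.
--al <path>  Write unpaired reads that align at least once to <path>.
--al-gz <path>  Same, gzip compressed.
--al-bz2 <path>  Same, bzip2 compressed.
--un-conc <path>  Write paired-end reads that fail to align concordantly to <path>.
--un-conc-gz <path>  Same, gzip compressed.
--un-conc-bz2 <path>  Same, bzip2 compressed.
--al-conc <path>  Write paired-end reads that align concordantly at least once to <path>.
--al-conc-gz <path>  Same, gzip compressed.
--al-conc-bz2 <path>  Same, bzip2 compressed.
--quiet  Quiet mode: print nothing besides alignments and serious errors.
--met-file <path>  Write bowtie2 metrics to the file <path>. Useful for debugging. Default: metrics disabled.
--met-stderr  Write bowtie2 metrics to standard error. Not mutually exclusive with --met-file. Default: metrics disabled.
--met <int>  Write a new metrics record every <int> seconds. Default: 1.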
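--no-unal  Do not write SAM records for reads that failed to align.
--no-hd  Do not write SAM header lines (those starting with @).
--no-sq  Do not write @SQ SAM header lines.
--rg-id <text>  Set the read group ID to <text>.
--rg <text>  Add <text> as a field on the @RG header line.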
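-o/--offrate <int>  Override the offrate of the index with <int>. The index default is 5. <int> must be greater than the offrate of the index; the larger the value, the longer the run time and the smaller the memory footprint.
-p/--threads NTHREADS  Sets the number of threads. Default: 1.
--reorder  With multiple threads, the order of the output alignments may differ from the order of the reads in the input file. This option makes the output order match the input order.
--mm  Use memory-mapped I/O to load the index instead of normal file I/O, so that several bowtie processes can share one copy of the index in memory and save memory.
--qc-filter  Filter out reads whose QSEQ filter field is non-zero. Only has an effect with --qseq. Default: off.
--seed <int>  Use <int> as the seed for the pseudo-random number generator. Default: 0.
--version  Print the program version and quit.
-h/--help  Print usage information and quit.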
For more details, see:
http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
Source of this section: http://www.hzaumycology.com/chenlianfu_blog/?p=178
[Bowtie] BOWTIE2: Manual (parameters)
http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]
Main arguments
-x <bt2-idx>
|
The basename of the index for the reference genome. The basename is the name of any of the index files up to but not including the final .1.bt2 / .rev.1.bt2 / etc. bowtie2 looks for the specified index first in the current directory, then in the directory specified in the BOWTIE2_INDEXES environment variable.
|
-1 <m1>
|
Comma-separated list of files containing mate 1s (filename usually includes _1), e.g. -1 flyA_1.fq,flyB_1.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m2>. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 1s from the 'standard in' or 'stdin' filehandle.
|
-2 <m2>
|
Comma-separated list of files containing mate 2s (filename usually includes _2), e.g. -2 flyA_2.fq,flyB_2.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m1>. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 2s from the 'standard in' or 'stdin' filehandle.
|
-U <r>
|
Comma-separated list of files containing unpaired reads to be aligned, e.g. lane1.fq,lane2.fq,lane3.fq,lane4.fq. Reads may be a mix of different lengths. If - is specified, bowtie2 gets the reads from the 'standard in' or 'stdin' filehandle.
|
-S <hit>
|
File to write SAM alignments to. By default, alignments are written to the 'standard out' or 'stdout' filehandle (i.e. the console).
|
-q
|
Reads (specified with <m1>, <m2>, <s>) are FASTQ files. FASTQ files usually have extension .fq or .fastq. FASTQ is the default format. See also: --solexa-quals and --int-quals.
|
--qseq
|
Reads (specified with <m1>, <m2>, <s>) are QSEQ files. QSEQ files usually end in _qseq.txt. See also: --solexa-quals and --int-quals.
|
-f
|
Reads (specified with <m1>, <m2>, <s>) are FASTA files. FASTA files usually have extension .fa, .fasta, .mfa, .fna or similar. FASTA files do not have a way of specifying quality values, so when -f is set, the result is as if --ignore-quals is also set.
|
-r
|
Reads (specified with <m1>, <m2>, <s>) are files with one input sequence per line, without any other information (no read names, no qualities). When -r is set, the result is as if --ignore-quals is also set.
|
-c
|
The read sequences are given on the command line. I.e. <m1>, <m2> and <singles> are comma-separated lists of reads rather than lists of read files. There is no way to specify read names or qualities, so -c also implies --ignore-quals.
|
-s/--skip <int>
|
Skip (i.e. do not align) the first <int> reads or pairs in the input.
|
-u/--qupto <int>
|
Align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped), then stop. Default: no limit.
|
-5/--trim5 <int>
|
Trim <int> bases from the 5' (left) end of each read before alignment (default: 0).
|
-3/--trim3 <int>
|
Trim <int> bases from the 3' (right) end of each read before alignment (default: 0).
|
--phred33
|
Input qualities are ASCII chars
equal to the Phred
quality plus 33.
This is also called the 'Phred+33' encoding, which is used by the
very latest Illumina pipelines.
|
--phred64
|
Input qualities are ASCII chars
equal to the Phred
quality plus 64.
This is also called the 'Phred+64' encoding.
|
--solexa-quals
|
Convert input qualities from Solexa (which can be negative) to Phred (which can't). This scheme was used in older Illumina GA Pipeline versions (prior to 1.3). Default: off.
|
--int-quals
|
Quality values are represented
in the read input file as space-separated ASCII integers, e.g.,
40 40 30 40..., rather than ASCII
characters, e.g., II?I....
Integers are treated as being on the Phred
quality scale
unless --solexa-quals
is also specified. Default:
off.
|
--very-fast
|
Same as: -D
5 -R 1 -N 0 -L 22 -i S,0,2.50
|
--fast
|
Same as: -D
10 -R 2 -N 0 -L 22 -i S,0,2.50
|
--sensitive
|
Same as: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default in --end-to-end mode)
|
--very-sensitive
|
Same as: -D
20 -R 3 -N 0 -L 20 -i S,1,0.50
|
--very-fast-local
|
Same as: -D
5 -R 1 -N 0 -L 25 -i S,1,2.00
|
--fast-local
|
Same as: -D
10 -R 2 -N 0 -L 22 -i S,1,1.75
|
--sensitive-local
|
Same as: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)
|
--very-sensitive-local
|
Same as: -D
20 -R 3 -N 0 -L 20 -i S,1,0.50
|
-N <int>
|
Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.
|
-L <int>
|
Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. Default: the --sensitive preset is used by default, which, per the preset definitions above, sets -L to 22 in --end-to-end mode and to 20 in --local mode.
|
-i <func>
|
Sets a function governing the interval between seed substrings to use during multiseed alignment. For instance, if the read has 30 characters, and seed length is 10, and the seed interval is 6, the seeds extracted will be:
Read:      TAGCTACGCTCTACGCTATCATGCATAAAC
Seed 1 fw: TAGCTACGCT
Seed 1 rc: AGCGTAGCTA
Seed 2 fw: CGCTCTACGC
Seed 2 rc: GCGTAGAGCG
Seed 3 fw: ACGCTATCAT
Seed 3 rc: ATGATAGCGT
Seed 4 fw: TCATGCATAA
Seed 4 rc: TTATGCATGA
Since it's best to use longer intervals for longer reads, this parameter sets the interval as a function of the read length, rather than a single one-size-fits-all number. For instance, specifying -i S,1,2.5 sets the interval function f to f(x) = 1 + 2.5 * sqrt(x), where x is the read length. See also: setting function options. If the function returns a result less than 1, it is rounded up to 1. Default: the --sensitive preset is used by default, which sets -i to S,1,1.15 in --end-to-end mode and to S,1,0.75 in --local mode.
|
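The seed table in the -i entry can be reproduced with a short Python sketch. It only illustrates how forward and reverse-complement seeds are cut from a read at a fixed interval; the function names are made up for this example, and it is not bowtie2's actual seed extraction code.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def extract_seeds(read, seed_len, interval):
    """Return (forward, reverse-complement) seed pairs taken every `interval` bases."""
    seeds = []
    for start in range(0, len(read) - seed_len + 1, interval):
        fw = read[start:start + seed_len]
        seeds.append((fw, revcomp(fw)))
    return seeds


read = "TAGCTACGCTCTACGCTATCATGCATAAAC"
for n, (fw, rc) in enumerate(extract_seeds(read, seed_len=10, interval=6), start=1):
    print(f"Seed {n} fw: {fw}")
    print(f"Seed {n} rc: {rc}")
```

For the 30-base read above, the seeds start at offsets 0, 6, 12 and 18, which gives the four forward/reverse-complement pairs listed in the manual's example.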
--n-ceil <func>
|
Sets a function governing the maximum number of ambiguous characters (usually Ns and/or .s) allowed in a read as a function of read length. For instance, specifying L,0,0.15 sets the N-ceiling function f to f(x) = 0 + 0.15 * x, where x is the read length. See also: setting function options. Reads exceeding this ceiling are filtered out. Default: L,0,0.15.
|
--dpad <int>
|
'Pads' dynamic programming problems by <int> columns on either side to allow gaps. Default: 15.
|
--gbar <int>
|
Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.
|
--ignore-quals
|
When calculating a mismatch
penalty, always consider the quality value at the mismatched
position to be the highest possible, regardless of the actual
value. I.e. input is treated as though all quality values are high.
This is also the default behavior when the input doesn't specify
quality values (e.g. in -f, -r, or -c
modes).
|
--nofw/--norc
|
If
--nofw is specified,
bowtie2 will not attempt to
align unpaired reads to the forward (Watson) reference strand. If
--norc is specified,
bowtie2 will not attempt to
align unpaired reads against the reverse-complement (Crick)
reference strand. In paired-end mode,
--nofw and
--norc pertain to the
fragments; i.e. specifying --nofw
causes bowtie2
to explore only those paired-end configurations
corresponding to fragments from the reverse-complement (Crick)
strand. Default: both strands enabled.
|
--no-1mm-upfront
|
By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed.
|
--end-to-end
|
In this mode, Bowtie 2 requires
that the entire read align from one end to the other, without any
trimming (or 'soft clipping') of characters from either end. The
match bonus --ma
always equals 0 in this mode,
so all alignment scores are less than or equal to 0, and the
greatest possible alignment score is 0. This is mutually exclusive
with --local. --end-to-end is
the default mode.
|
--local
|
In this mode, Bowtie 2 does not
require that the entire read align from one end to the other.
Rather, some characters may be omitted ('soft clipped') from the
ends in order to achieve the greatest possible alignment score. The
match bonus --ma
is used in this mode, and the
best possible alignment score is equal to the match bonus
(--ma) times the length of the read. Specifying
--local and one of the
presets (e.g. --local --very-fast)
is equivalent to specifying the local version of the preset
(--very-fast-local). This is mutually
exclusive with --end-to-end. --end-to-end is
the default mode.
|
--ma <int>
|
Sets the match bonus to <int>. In --local mode <int> is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in --end-to-end mode. Default: 2.
|
--mp MX,MN
|
Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor( (MX-MN) * (MIN(Q, 40.0)/40.0) ) where Q is the Phred quality value. Default: MX = 6, MN = 2.
|
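The --mp formula is simple enough to evaluate by hand, but a small Python sketch makes the quality dependence explicit. The function name `mismatch_penalty` is illustrative; the formula and defaults are the ones given in the --mp entry.

```python
import math


def mismatch_penalty(q, mx=6, mn=2, ignore_quals=False):
    """Penalty for one mismatch at a position with Phred quality q."""
    if ignore_quals:
        return mx  # --ignore-quals: always the maximum penalty
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))


for q in (0, 10, 20, 30, 40):
    print(q, mismatch_penalty(q))
```

A mismatch at a low-quality base is cheap (it is more likely to be a sequencing error), while a mismatch at a base with quality 40 or higher costs the full MX.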
--np <int>
|
Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as N. Default: 1.
|
--rdg <int1>,<int2>
|
Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5, 3.
|
--rfg <int1>,<int2>
|
Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5, 3.
|
--score-min <func>
|
Sets a function governing the minimum alignment score needed for an alignment to be considered 'valid' (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function f to f(x) = 0 + -0.6 * x, where x is the read length. See also: setting function options. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.
|
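Putting the scoring options together: the sketch below computes a simplified end-to-end alignment score from mismatch and gap penalties and compares it with the default --score-min threshold. It is an illustration under stated assumptions, not bowtie2's scoring code: the function names are made up, penalties for N positions (--np) are ignored, read and reference gaps use the same default 5,3 penalties, and "valid" is taken to mean score >= threshold.

```python
def gap_penalty(length, open_pen=5, extend_pen=3):
    """Gap of length N costs open + N * extend (defaults of --rdg / --rfg)."""
    return open_pen + length * extend_pen


def min_score_end_to_end(read_len, const=-0.6, coeff=-0.6):
    """Default --score-min in --end-to-end mode: L,-0.6,-0.6."""
    return const + coeff * read_len


def end_to_end_score(mismatch_penalties, gap_lengths=()):
    """In end-to-end mode the match bonus is 0, so the score is the negated sum of penalties."""
    return -sum(mismatch_penalties) - sum(gap_penalty(n) for n in gap_lengths)


def is_valid(score, read_len):
    return score >= min_score_end_to_end(read_len)


read_len = 50
print(min_score_end_to_end(read_len))
score_a = end_to_end_score([6, 6, 6, 6, 6])                   # five high-quality mismatches
score_b = end_to_end_score([6, 6, 6, 6, 6], gap_lengths=[1])  # plus a 1-base gap
print(score_a, is_valid(score_a, read_len))
print(score_b, is_valid(score_b, read_len))
```

For a 50-base read the threshold is -30.6, so five full-penalty mismatches (score -30) just pass, while adding a single 1-base gap (penalty 8) pushes the alignment below the threshold.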
-k <int>
|
By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). Information about the best alignments is used to estimate mapping quality and to set SAM optional fields, such as AS:i and XS:i.
When -k is specified, however, bowtie2 behaves differently. Instead, it searches for at most <int> distinct, valid alignments for each read. The search terminates when it can't find more distinct valid alignments, or when it finds <int>, whichever happens first. All alignments found are reported in descending order by alignment score. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. For reads that have more than <int> distinct, valid alignments, bowtie2 does not guarantee that the <int> alignments reported are the best possible in terms of alignment score. -k is mutually exclusive with -a.
Note: Bowtie 2 is not designed with large values for -k in mind, and when aligning reads to long, repetitive genomes large -k can be very, very slow.
|
-a
|
Like -k but with no upper limit on number of alignments to search for. -a is mutually exclusive with -k.
Note: Bowtie 2 is not designed with -a mode in mind, and when aligning reads to long, repetitive genomes this mode can be very, very slow.
|
-D <int>
|
Up to <int> consecutive seed extension attempts can 'fail' before Bowtie 2 moves on, using the alignments found so far. A seed extension 'fails' if it does not yield a new best or a new second-best alignment. This limit is automatically adjusted up when -k or -a are specified. Default: 15.
|
-R <int>
|
<int> is the maximum number of times Bowtie 2 will 're-seed' reads with repetitive seeds. When 're-seeding,' Bowtie 2 simply chooses a new set of seeds (same length, same number of mismatches allowed) at different offsets and searches for more alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300. Default: 2.
|
-I/--minins <int>
|
The minimum fragment length for valid paired-end alignments. E.g. if -I 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as -X is also satisfied). A 19-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -I constraint is applied with respect to the untrimmed mates.
The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.
Default: 0 (essentially imposing no minimum)
|
-X/--maxins <int>
|
The maximum fragment length for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates.
The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.
Default: 500.
|
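The two worked examples in the -I and -X entries amount to a range check on the fragment length. A minimal Python sketch, assuming non-overlapping mates so that the fragment length is simply mate + gap + mate (the function names are illustrative, and the orientation constraint is not modeled):

```python
def fragment_length(mate1_len, gap, mate2_len):
    """Fragment length for two non-overlapping mates separated by `gap` bases."""
    return mate1_len + gap + mate2_len


def within_insert_range(frag_len, minins=0, maxins=500):
    """True if the fragment satisfies both -I (minins) and -X (maxins)."""
    return minins <= frag_len <= maxins


# -I 60: two 20-bp mates with a 20-bp gap are valid, a 19-bp gap is not.
print(within_insert_range(fragment_length(20, 20, 20), minins=60))
print(within_insert_range(fragment_length(20, 19, 20), minins=60))
# -X 100: a 60-bp gap is valid, a 61-bp gap is not.
print(within_insert_range(fragment_length(20, 60, 20), maxins=100))
print(within_insert_range(fragment_length(20, 61, 20), maxins=100))
```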
--fr/--rf/--ff
|
The upstream/downstream mate
orientations for a valid paired-end alignment against the forward
reference strand. E.g., if --fr
is specified and there is a candidate paired-end
alignment where mate 1 appears upstream of the reverse complement
of mate 2 and the fragment length constraints
(-I
and
-X) are met, that alignment is valid. Also, if mate 2
appears upstream of the reverse complement of mate 1 and all other
constraints are met, that too is valid.
--rf likewise requires that
an upstream mate1 be reverse-complemented and a downstream mate2 be
forward-oriented. --ff
requires both an upstream mate 1 and a downstream mate
2 to be forward-oriented. Default: --fr
(appropriate for Illumina's Paired-end Sequencing
Assay).
|
--no-mixed
|
By default, when
bowtie2 cannot find a
concordant or discordant alignment for a pair, it then tries to
find alignments for the individual mates. This option disables that
behavior.
|
--no-discordant
|
By default,
bowtie2 looks for
discordant alignments if it cannot find any concordant alignments.
A discordant alignment is an alignment where both mates align
uniquely, but that does not satisfy the paired-end constraints
(--fr/--rf/--ff,-I, -X). This option disables that behavior.
|
--dovetail
|
If the mates 'dovetail', that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. See also: Mates can overlap, contain or dovetail each other. Default: mates cannot dovetail in a concordant alignment.
|
--no-contain
|
If one mate alignment contains the other, consider that to be non-concordant. See also: Mates can overlap, contain or dovetail each other. Default: a mate can contain the other in a concordant alignment.
|
--no-overlap
|
If one mate alignment overlaps the other at all, consider that to be non-concordant. See also: Mates can overlap, contain or dovetail each other. Default: mates can overlap in a concordant alignment.
|
-t/--time
|
Print the wall-clock time
required to load the index files and align the reads. This is
printed to the 'standard error' ('stderr') filehandle. Default:
off.
|
--un <path> --un-gz <path> --un-bz2 <path>
|
Write unpaired reads that fail to align to file at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit set and neither the 0x40 nor 0x80 bits set. If --un-gz is specified, output will be gzip compressed. If --un-bz2 is specified, output will be bzip2 compressed. Reads written in this way will appear exactly as they did in the input file, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the input.
|
--al <path> --al-gz <path> --al-bz2 <path>
|
Write unpaired reads that align at least once to file at <path>. These reads correspond to the SAM records with the FLAGS 0x4, 0x40, and 0x80 bits unset. If --al-gz is specified, output will be gzip compressed. If --al-bz2 is specified, output will be bzip2 compressed. Reads written in this way will appear exactly as they did in the input file, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the input.
|
--un-conc <path> --un-conc-gz <path> --un-conc-bz2 <path>
|
Write paired-end reads that fail to align concordantly to file(s) at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit set and either the 0x40 or 0x80 bit set (depending on whether it's mate #1 or #2). .1 and .2 strings are added to the filename to distinguish which file contains mate #1 and mate #2. If a percent symbol, %, is used in <path>, the percent symbol is replaced with 1 or 2 to make the per-mate filenames. Otherwise, .1 or .2 are added before the final dot in <path> to make the per-mate filenames. Reads written in this way will appear exactly as they did in the input files, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the inputs.
|
--al-conc <path> --al-conc-gz <path> --al-conc-bz2 <path>
|
Write paired-end reads that align concordantly at least once to file(s) at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit unset and either the 0x40 or 0x80 bit set (depending on whether it's mate #1 or #2). .1 and .2 strings are added to the filename to distinguish which file contains mate #1 and mate #2. If a percent symbol, %, is used in <path>, the percent symbol is replaced with 1 or 2 to make the per-mate filenames. Otherwise, .1 or .2 are added before the final dot in <path> to make the per-mate filenames. Reads written in this way will appear exactly as they did in the input files, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the inputs.
|
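The FLAGS bit tests quoted in the four entries above can be written out directly. The following Python sketch maps a SAM FLAGS value to the output option whose description matches it; `classify` and `is_secondary` are made-up helper names, and the mapping simply follows the bit descriptions in this section rather than bowtie2's source code.

```python
UNALIGNED = 0x4     # read did not align
MATE1 = 0x40        # first mate of a pair
MATE2 = 0x80        # second mate of a pair
SECONDARY = 0x100   # 256: alignment beyond the first reported with -k


def classify(flag):
    """Name the --un/--al/--un-conc/--al-conc option matching a SAM FLAGS value."""
    paired = bool(flag & (MATE1 | MATE2))
    unaligned = bool(flag & UNALIGNED)
    if paired:
        return "--un-conc" if unaligned else "--al-conc"
    return "--un" if unaligned else "--al"


def is_secondary(flag):
    return bool(flag & SECONDARY)


for flag in (0, 4, 16, 69, 67, 256):
    print(flag, classify(flag), is_secondary(flag))
```

Bitwise AND is used because FLAGS is a bit field: a value such as 69 (0x1 + 0x4 + 0x40) carries several properties at once.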
--quiet
|
Print nothing besides alignments
and serious errors.
|
--met-file <path>
|
Write bowtie2 metrics to file <path>. Having alignment metrics can be useful for debugging certain problems, especially performance issues. See also: --met. Default: metrics disabled.
|
--met-stderr
|
Write bowtie2 metrics to the 'standard error' ('stderr') filehandle. This is not mutually exclusive with --met-file. Having alignment metrics can be useful for debugging certain problems, especially performance issues. See also: --met. Default: metrics disabled.
|
--met <int>
|
Write a new bowtie2 metrics record every <int> seconds. Only matters if either --met-stderr or --met-file are specified. Default: 1.
|
--no-unal
|
Suppress SAM records for reads
that failed to align.
|
--no-hd
|
Suppress SAM header lines
(starting with
@).
|
--no-sq
|
Suppress
@SQ SAM header
lines.
|
--rg-id <text>
|
Set the read group ID to <text>. This causes the SAM @RG header line to be printed, with <text> as the value associated with the ID: tag. It also causes the RG:Z: extra field to be attached to each SAM output record, with value set to <text>.
|
--rg <text>
|
Add <text> (usually of the form TAG:VAL, e.g. SM:Pool1) as a field on the @RG header line. Note: in order for the @RG line to appear, --rg-id must also be specified. This is because the ID tag is required by the SAM Spec. Specify --rg multiple times to set multiple fields. See the SAM Spec for details about what fields are legal.
|
--omit-sec-seq
|
When printing secondary alignments, Bowtie 2 by default will write out the SEQ and QUAL strings. Specifying this option causes Bowtie 2 to print an asterisk in those fields instead.
|
-o/--offrate <int>
|
Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.
|
-p/--threads NTHREADS
|
Launch NTHREADS parallel search threads (default: 1). Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint. E.g. when aligning to a human genome index, increasing -p from 1 to 8 increases the memory footprint by a few hundred megabytes. This option is only available if bowtie is linked with the pthreads library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time).
|
--reorder
|
Guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when -p is set greater than 1. Specifying --reorder and setting -p greater than 1 causes Bowtie 2 to run somewhat slower and use somewhat more memory than if --reorder were not specified. Has no effect if -p is set to 1, since output order will naturally correspond to input order in that case.
|
--mm
|
Use memory-mapped I/O to load the index, rather than typical file I/O. Memory-mapping allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible or not preferable.
|
--qc-filter
|
Filter out reads for which the
QSEQ filter field is non-zero. Only has an effect when read format
is --qseq. Default: off.
|
--seed <int>
|
Use <int> as the seed for pseudo-random number generator. Default: 0.
|
--non-deterministic
|
Normally, Bowtie 2
re-initializes its pseudo-random generator for each read. It seeds
the generator with a number derived from (a) the read name, (b) the
nucleotide sequence, (c) the quality sequence, (d) the value of the
--seed
option. This means that if two
reads are identical (same name, same nucleotides, same qualities)
Bowtie 2 will find and report the same alignment(s) for both, even
if there was ambiguity. When
--non-deterministic is
specified, Bowtie 2 re-initializes its pseudo-random generator for
each read using the current time. This means that Bowtie 2 will not
necessarily report the same alignment for two identical reads. This
is counter-intuitive for some users, but might be more appropriate
in situations where the input consists of many identical
reads.
|
--version
|
Print version information and
quit.
|
-h/--help
|
Print usage information and
quit.
|