miRNA Target Gene Prediction Methods (Animals)
2013-08-04 21:35
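miRNA target prediction software relies on the seed region rule: bases 2-8 at the 5' end of the miRNA must pair with perfect complementarity to a 7-nt stretch in the 3' UTR of the mRNA.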
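The seed rule can be illustrated with a few lines of Python. This is only a minimal sketch of the plain 7-nt perfect-match idea, not what any of the tools below actually run; the function names are my own, and it ignores G:U wobbles, mismatches and site accessibility.

```python
def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


def find_seed_sites(mirna: str, utr: str) -> list:
    """Return 0-based UTR positions that perfectly match miRNA bases 2-8."""
    mirna = mirna.upper().replace("T", "U")  # accept DNA-style input
    utr = utr.upper().replace("T", "U")
    seed = mirna[1:8]                        # bases 2-8, counted from the 5' end
    site = reverse_complement(seed)          # 7-nt sequence expected in the 3' UTR
    hits = []
    start = utr.find(site)
    while start != -1:
        hits.append(start)
        start = utr.find(site, start + 1)    # also report overlapping sites
    return hits


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"         # hsa-let-7a, seed GAGGUAG
    print(find_seed_sites(let7a, "AAACUACCUCAAA"))
```

The miRNA is read 5' to 3' and the UTR is also given 5' to 3', which is why the seed has to be reverse-complemented before searching.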
Table: Summary of the features of commonly used software
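How do you quickly find the intersection of the predictions from different tools? A simple way is to use an online tool, for example bioinformatics.psb.ugent.be/webtools/Venn/
Paste the result list from each tool into the page, submit, and you get the overlap.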
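If you prefer to do this offline, the same intersection takes a few lines of Python. This is a sketch that assumes each tool's result has been saved as a plain text file with one gene identifier per line; the file names in the usage note are hypothetical.

```python
def read_gene_list(path):
    """Read one identifier per line, skipping blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def common_targets(*gene_sets):
    """Return the identifiers present in every input set."""
    if not gene_sets:
        return set()
    return set.intersection(*(set(genes) for genes in gene_sets))
```

For example, `common_targets(read_gene_list("pita.txt"), read_gene_list("rnahybrid.txt"))` returns the genes predicted by both tools. Make sure all lists use the same type of identifier before comparing them.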

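Most of the tools above can be used online. PITA, for example, lets you enter multiple gene sequences and multiple miRNAs and set your own parameters for target prediction. (URL: http://genie.weizmann.ac.il/pubs/mir07/mir07_prediction.html)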

Appendix: how to use PITA and RNAhybrid on Linux. (Reposted from another source.)
======PITA=====
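The basic PITA parameters [4] are roughly as follows:
- ΔΔG less than or equal to -10 kcal/mol
- Seed length of 7-8 nt
- No G:U pairing allowed
The PITA help text is below; I have annotated the points that deserve your attention: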
syntax:
    pita_prediction.pl [OPTIONS]

    Execute the PITA algorithm for identifying and scoring microRNA target sites.

options:
    -utr:         fasta file containing the UTRs to be scanned
                  (an mRNA file in FASTA format; it can be the full-length mRNA,
                  although I usually pick the read peaks from sequencing as the
                  input to scan)
    -mir:         fasta file containing the microRNA sequences
                  (your candidate miRNA database, e.g. the human miRNA set at
                  http://www.mirbase.org/cgi-bin/mirna_summary.pl?org=hsa)
    -upstream:    fasta file containing the upstream sequence for each UTR.
                  The IDs in this file should match the IDs found in the UTR file.
                  If less than 200 bp are given (or if no file is given), it is
                  padded with Poly-A.
    -flank_up
    -flank_down:  Flank requirement in basepairs (default: zero for both)
    -ddG_context: Number of bases upstream and downstream for target site that
                  are taken into account when folding the UTR (default: 70)
    -prefix:      Add the string as a prefix to the output files
                  (pita_results.tab and ext_utr.stab)
                  (i.e. the prefix of the output file names, so you can tell
                  your files apart)
    -gxp:         Produce a gxp (Genomica project file) output file.

Seed matching parameters:
(the parameters below are the ones you need to pay most attention to)
    -l <num1-num2>: Search for seed lengths of num1,...,num2 to the MicroRNA
                  (default: 6-8)
                  (the seed length; the default is 6-8, adjusted here to 7-8)
    -gu:          Lengths for which G:U wobbles are allowed and number of
                  allowed wobbles.
                  Format of nums: <length;num G:U>,<length;num G:U>,...
                  (default: 6;0,7;1,8;1)
                  (since G:U pairing is not allowed, set this to 6;0,7;0,8;0)
    -m:           Lengths for which mismatches are allowed and number of
                  allowed mismatches
                  Format of nums: <length;num mismatches>,<length;num mismatches>,...
                  (default: 6;0,7;0,8;1)
    -loop:        Lengths for which a single loop in either the target or the
                  microrna is allowed
                  Format of nums: <length>,<length>,...
                  (default: none)
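Putting the adjusted settings together, a run could be launched from Python roughly as below. This is a sketch only: it uses just the flags listed in the help text above, the input file names and the prefix are placeholders, and it assumes `pita_prediction.pl` is executable and on your PATH (on some installs you may need to call it through `perl`).

```python
import shlex
import subprocess


def build_pita_command(utr_fasta, mir_fasta, prefix):
    """Build a PITA call with seed length 7-8 and no G:U wobbles."""
    return [
        "pita_prediction.pl",
        "-utr", utr_fasta,        # UTR (or read-peak) sequences in FASTA
        "-mir", mir_fasta,        # candidate miRNA sequences in FASTA
        "-prefix", prefix,        # prefix of the output file names
        "-l", "7-8",              # seed length 7-8 instead of the default 6-8
        "-gu", "6;0,7;0,8;0",     # no G:U wobbles at any seed length
    ]


if __name__ == "__main__":
    cmd = build_pita_command("utr.fa", "mirna.fa", "run1_")  # placeholder names
    print(shlex.join(cmd))        # shell-safe form (the -gu value needs quoting)
    # subprocess.run(cmd, check=True)  # uncomment on a machine with PITA installed
```

Passing the arguments as a list avoids shell quoting problems with the semicolons in the `-gu` value.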
An example of the standard PITA output:
UTR             microRNA        Start  End  Seed   Loop  dGduplex  dG5    dG3     dG0     dG1     dGopen  ddG
chr1-32146379   hsa-miR-339-5p  82     74   8:1:1  0     -20.89    -10.5  -10.39  -39.82  -20.46  -19.35  -1.53
chr20-48173714  hsa-let-7a      39     31   8:1:1  0     -13.4     -5.7   -7.7    -19.48  -0.43   -19.04  5.64
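- As the output above shows, PITA has no cutoff option for the energy values, so you have to write your own script that reads the PITA output file and keeps the sites with ΔΔG less than or equal to -10 kcal/mol.
- PITA also does not report how the miRNA and mRNA pair, only the site positions. Being lazy, I let RNAhybrid draw the pairing diagram for me, so RNAhybrid ends up as nothing more than my drawing tool.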
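The filtering script mentioned in the first point can be very short. The sketch below assumes the results file is tab-delimited with the header row shown in the example output (in particular a column named `ddG`); the function name and the file name in the usage note are my own.

```python
import csv


def filter_pita_sites(lines, max_ddg=-10.0):
    """Return the PITA result rows whose ddG is <= max_ddg.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if float(row["ddG"]) <= max_ddg]


def print_sites(rows):
    """Print a compact summary of the retained sites."""
    for row in rows:
        print(row["UTR"], row["microRNA"], row["Start"], row["End"], row["ddG"], sep="\t")
```

Usage would look like `with open("run1_pita_results.tab") as fh: print_sites(filter_pita_sites(fh))`. Note that neither of the two example rows above would pass the cutoff, since their ddG values are -1.53 and 5.64.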
=====RNAhybrid=====
RNAhybrid has a cutoff parameter for the MFE (minimum free energy), so here I choose ΔG less than or equal to -20 kcal/mol [4].
Usage: RNAhybrid [options] [target sequence] [query sequence].

options:
    -b <number of hits per target>
    -c compact output
    -d <xi>,<theta>
    -f helix constraint
    -h help
    -m <max targetlength>
    -n <max query length>
    -u <max internal loop size (per side)>
    -v <max bulge loop size>
    -e <energy cut-off>
    -p <p-value cut-off>
    -s (3utr_fly|3utr_worm|3utr_human)
    -g (ps|png|jpg|all)
    -t <target file>
    -q <query file>

Either a target file has to be given (FASTA format)
or one target sequence directly.

Either a query file has to be given (FASTA format)
or one query sequence directly.

The helix constraint format is 'from,to', eg. -f 2,7 forces
structures to have a helix from position 2 to 7 with respect to the query.

<xi> and <theta> are the position and shape parameters, respectively,
of the extreme value distribution assumed for p-value calculation.
If omitted, they are estimated from the maximal duplex energy of the query.
In that case, a data set name has to be given with the -s flag.

PS graphical output not supported.

PNG and JPG graphical output not supported.
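- The miRNA and mRNA inputs can be plain sequences, or a FASTA file containing many sequences.
- The output is printed straight to the terminal, so I suggest redirecting it to a file with ">" in the shell. You can now see why I treat RNAhybrid as a downstream drawing tool.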
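The two points above can be combined into a small Python wrapper that runs RNAhybrid with the -20 kcal/mol cutoff and saves the terminal output to a file. It is a sketch under a few assumptions: `RNAhybrid` is on your PATH, the data set is `3utr_human` (needed because `-d` is omitted), and the file names are placeholders.

```python
import subprocess


def build_rnahybrid_command(target_fasta, query_fasta, energy_cutoff=-20, dataset="3utr_human"):
    """Build an RNAhybrid call using only flags from the help text above."""
    return [
        "RNAhybrid",
        "-e", str(energy_cutoff),  # report only hits with MFE <= cutoff
        "-s", dataset,             # data set used to estimate the p-value parameters
        "-t", target_fasta,        # target (mRNA / UTR) FASTA file
        "-q", query_fasta,         # query (miRNA) FASTA file
    ]


def run_to_file(cmd, out_path):
    """Run the command and write its standard output to out_path (like '>' in the shell)."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

A call such as `run_to_file(build_rnahybrid_command("sites.fa", "mirna.fa"), "rnahybrid.out")` is equivalent to redirecting the output with ">" on the command line.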
The standard RNAhybrid output looks like this:
target: ***** (the name of the specific UTR)
length: 30
miRNA : ***** (the name of the specific miRNA)
length: 22

mfe: -24.4 kcal/mol (MFE is the minimum free energy)
p-value: 0.001448

position 6
target 5' C G GG AU U 3'
GAU GA UAGG UGGUGCUG
UUG CU GUCU ACCACGAU
miRNA 3' A G AAA 5'
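To pull the summary fields out of a saved hit, a few regular expressions are enough. The sketch below is based only on the field labels visible in the example above (`target:`, `miRNA :`, `mfe:`, `p-value:`, `position`); real output may differ in spacing, and the function name is my own.

```python
import re

# One pattern per field, anchored at the start of a line.
FIELD_PATTERNS = {
    "target": re.compile(r"^target:\s*(\S+)", re.M),
    "mirna": re.compile(r"^miRNA\s*:\s*(\S+)", re.M),
    "mfe": re.compile(r"^mfe:\s*(-?\d+(?:\.\d+)?)\s*kcal/mol", re.M),
    "p_value": re.compile(r"^p-value:\s*(\S+)", re.M),
    "position": re.compile(r"^position\s+(\d+)", re.M),
}


def parse_hit(block):
    """Extract names, MFE, p-value and position from one RNAhybrid hit."""
    hit = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(block)
        hit[name] = match.group(1) if match else None
    # Convert the numeric fields when they were found.
    for key, cast in (("mfe", float), ("p_value", float), ("position", int)):
        if hit[key] is not None:
            hit[key] = cast(hit[key])
    return hit
```

Because the patterns require the colon after `target` and `miRNA`, the alignment rows that also start with those words are not mistaken for the name lines.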
This layout is not really suitable for publication, so I suggest writing your own code to read these files and tidy them up into something like this:
-24.4 kcal/mol
***      5' CCUACCACUCACCCUAGCA 3'
            | || |||| ||||||
******** 3' AGCGGGAGAGUUGGGUCGAAAA 5'
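One way to produce that compact three-line form is to collapse the four alignment rows column by column. This is a sketch, not the original author's script: it assumes you have already cut the four rows out of the RNAhybrid output with their column alignment intact and with the `target 5'` / `miRNA 3'` labels and end markers removed. Unpaired positions with no partner on the other strand are shown as `-`.

```python
def collapse_alignment(target_unpaired, target_paired, mirna_paired, mirna_unpaired):
    """Collapse four aligned RNAhybrid rows into (target, bars, miRNA) strings."""
    rows = [target_unpaired, target_paired, mirna_paired, mirna_unpaired]
    width = max(len(row) for row in rows)
    rows = [row.ljust(width) for row in rows]   # pad so columns line up
    top, bars, bottom = [], [], []
    for t_un, t_pair, m_pair, m_un in zip(*rows):
        t_base = t_pair if t_pair != " " else t_un
        m_base = m_pair if m_pair != " " else m_un
        if t_base == " " and m_base == " ":
            continue                            # empty padding column
        top.append(t_base if t_base != " " else "-")
        bottom.append(m_base if m_base != " " else "-")
        paired = t_pair != " " and m_pair != " "
        bars.append("|" if paired else " ")
    return "".join(top), "".join(bars), "".join(bottom)


if __name__ == "__main__":
    # Hypothetical, hand-aligned fragment used only to show the call.
    for line in collapse_alignment("C   G  ", " GAU GA", " UUG CU", "A   G  "):
        print(line)
```

G:U wobble pairs are drawn with the same `|` as Watson-Crick pairs here; use a different symbol if you want to tell them apart in a figure.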
-- end && reference
- [Repost] miRNA databases http://joseph.yy.blog.163.com/blog/static/50973959201192121757343/
- PITA http://genie.weizmann.ac.il/pubs/mir07/mir07_data.html
- RNAhybrid http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/
- Marín, R. M., & Vaníček, J. (2011). Efficient use of accessibility in microRNA target prediction. Nucleic Acids Research, 39(1), 19-29. doi:10.1093/nar/gkq768