
Alamut 1.5 Splicing Prediction Module


Alamut 1.3 (released Feb. 2008) introduced a new splicing module that integrates a number of prediction methods, each described under Background on prediction methods below.

Alamut 1.5 brings a number of enhancements to the splicing module:

Background on prediction methods

SpliceSiteFinder-like

This method is based on position weight matrices computed from a set of human constitutive exon/intron junctions for donor and acceptor sites (see below). For branch points, we use the matrix described by Zhang et al. (1998).

We use the same algorithms as those referenced by Alex Dong Li's SpliceSiteFinder web site.
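The SpliceSiteFinder algorithms themselves are not reproduced on this page. As a minimal sketch, assuming a Shapiro-Senapathy-style score (the sum of per-position base frequencies, rescaled so that the worst possible sequence scores 0 and the consensus scores 100), scoring a candidate site against a position weight matrix could look like this. `ss_score` is a hypothetical helper, and the matrix in the example holds made-up toy frequencies, not the Alamut donor or acceptor matrices.

```python
from typing import Dict, List

# One dict per matrix position: base -> frequency of that base at that position.
Matrix = List[Dict[str, float]]


def ss_score(seq: str, matrix: Matrix) -> float:
    """Score `seq` on a 0-100 scale relative to the best and worst possible
    sequences for `matrix` (Shapiro-Senapathy-style normalization; assumed)."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match matrix length")
    # Raw score: sum of the frequencies of the observed bases (KeyError on N).
    total = sum(col[base] for col, base in zip(matrix, seq.upper()))
    best = sum(max(col.values()) for col in matrix)   # consensus sequence
    worst = sum(min(col.values()) for col in matrix)  # least likely sequence
    return 100.0 * (total - worst) / (best - worst)


# Toy 3-position matrix (hypothetical values).
toy = [
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
print(ss_score("AGT", toy), ss_score("GGT", toy))
```

The 0-100 rescaling makes scores from matrices of different lengths roughly comparable, which is one reason this style of normalization is common for splice-site matrices.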

MaxEntScan

MaxEntScan is a method based on the Maximum Entropy principle, developed by the Burge Lab at MIT and described in Yeo et al., 2004.

The MaxEntScan splice site datasets and algorithms are fully integrated inside Alamut, with permission from Christopher Burge.
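For orientation, the MaxEntScan score can be summarized as a log-odds ratio, in bits, between a maximum-entropy model of true splice sites and a background model (Yeo et al., 2004). The formula below is a simplified summary, not Alamut's implementation: X is the candidate sequence (a 9-mer, 3 exonic + 6 intronic nucleotides, for donor sites; a 23-mer, 20 intronic + 3 exonic nucleotides, for acceptor sites), f_i are the constraint features of the model, λ_i their fitted weights, and Z a normalizing constant.

```latex
s(X) = \log_2 \frac{P_{\mathrm{ME}}(X \mid \text{site})}{P_{\mathrm{bg}}(X)},
\qquad
P_{\mathrm{ME}}(X \mid \text{site}) = \frac{1}{Z}\exp\Big(\sum_{i} \lambda_i f_i(X)\Big)
```

A positive score means the sequence is more likely under the splice-site model than under the background model.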

NNSPLICE

NNSPLICE (available at the Berkeley Drosophila Genome Project web site) is a prediction method based on neural networks (Reese et al. 1997). Although it is not fully integrated inside Alamut, it is queried transparently from within the software.

GeneSplicer

GeneSplicer is an open-source program available from the University of Maryland CBCB. It combines several splice site detection techniques, including Markov models (Pertea et al. 2001).

Known constitutive signals

In the splicing module, Alamut reports each occurrence of the 9-mers (3 exonic + 6 intronic nucleotides) found in the donor subset of human constitutive exon/intron junctions (see below), and each occurrence of the 6-mers (4 intronic + 2 exonic nucleotides) found in the acceptor subset. Acceptor 6-mers are reported only where at least 6 of the 8 upstream nucleotides are pyrimidines.
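The rule above can be sketched as a scan of every overlapping window against the two sets. This is a minimal illustration under stated assumptions: "upstream" is taken to mean the 8 nucleotides immediately 5' of the acceptor 6-mer (windows without 8 upstream nucleotides are skipped), positions are 0-based window starts, and `find_constitutive_signals` and the one-element example sets are hypothetical, not taken from the Alamut junction data.

```python
from typing import List, Set, Tuple

PYRIMIDINES = set("CT")


def find_constitutive_signals(
    seq: str, donor_9mers: Set[str], acceptor_6mers: Set[str]
) -> List[Tuple[int, str, str]]:
    """Return (start, site type, k-mer) for every known constitutive signal."""
    seq = seq.upper()
    hits = []
    # Donor signals: 9-mer windows (3 exonic + 6 intronic nucleotides).
    for i in range(len(seq) - 9 + 1):
        kmer = seq[i:i + 9]
        if kmer in donor_9mers:
            hits.append((i, "donor", kmer))
    # Acceptor signals: 6-mer windows (4 intronic + 2 exonic nucleotides),
    # kept only if >= 6 of the 8 nucleotides just upstream are C or T.
    for i in range(8, len(seq) - 6 + 1):
        kmer = seq[i:i + 6]
        if kmer in acceptor_6mers:
            upstream = seq[i - 8:i]
            if sum(base in PYRIMIDINES for base in upstream) >= 6:
                hits.append((i, "acceptor", kmer))
    return sorted(hits)


# Hypothetical example: a pyrimidine-rich stretch, an acceptor 6-mer, a donor 9-mer.
seq = "CTTCTCCT" + "TTAGGT" + "AAA" + "CAGGTAAGT"
print(find_constitutive_signals(seq, {"CAGGTAAGT"}, {"TTAGGT"}))
```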

ESEfinder

The ESEfinder method computes putative binding sites for Exonic Splicing Enhancers (Cartegni et al., 2003). We have embedded the ESEfinder matrices (licensed from Cold Spring Harbor Laboratory) inside Alamut so as to perform the same computation as the CSHL ESEfinder web site.
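The licensed ESEfinder matrices and thresholds are not reproduced here. As a minimal sketch of the general technique, assuming each matrix gives a per-position score for every base and a hit is any window whose summed score reaches a matrix-specific threshold, a scan could look like this. `scan_matrix`, the toy matrix and the threshold are hypothetical illustrations.

```python
from typing import Dict, List, Tuple

# One dict per matrix position: base -> score contribution at that position.
ScoreMatrix = List[Dict[str, float]]


def scan_matrix(seq: str, matrix: ScoreMatrix, threshold: float) -> List[Tuple[int, float]]:
    """Return (start, score) for every window whose summed score >= threshold."""
    seq = seq.upper()
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[base] for col, base in zip(matrix, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy 3-position matrix favouring "AGA" (hypothetical values).
toy = [
    {"A": 1.0, "C": -1.0, "G": 0.5, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
]
print(scan_matrix("CAGAAGA", toy, threshold=2.5))
```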

RESCUE-ESE

In the RESCUE-ESE approach, specific hexanucleotide sequences are identified as candidate ESEs (Fairbrother et al., 2002). The set of human hexamers available from the RESCUE-ESE web site is embedded inside Alamut.
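Finding RESCUE-ESE candidates amounts to checking every overlapping hexamer of the sequence against the embedded set. A minimal sketch, where `find_hexamers` is a hypothetical helper and the one-element hexamer set is purely illustrative (not taken from the RESCUE-ESE list):

```python
from typing import List, Set, Tuple


def find_hexamers(seq: str, hexamers: Set[str]) -> List[Tuple[int, str]]:
    """Return (start, hexamer) for every overlapping 6-mer found in `hexamers`."""
    seq = seq.upper()
    return [
        (i, seq[i:i + 6])
        for i in range(len(seq) - 6 + 1)
        if seq[i:i + 6] in hexamers
    ]


print(find_hexamers("GAAGAAGAA", {"GAAGAA"}))
```

Overlapping matches are all reported, since adjacent hexamers can each be candidate ESEs.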

Set of human constitutive exon/intron junctions

We gathered a set of human constitutive exon/intron junction sequences as follows. 10,728 human mRNA sequences with status 'reviewed' in the RefSeq database (as of Dec. 2007) were mapped onto the human reference genome (NCBI build 36). Based on this mapping, the genomic sequences at exon/intron boundaries were extracted into separate subsets for donor and acceptor sites.

With these sequences, we have built three position weight matrices: two matrices for donor sites (GT and GC sites), and one matrix for acceptor sites (AG sites). See sequence logos below.
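Building these matrices comes down to splitting the donor subset by its first two intronic bases and counting base frequencies at each aligned position. A minimal sketch, assuming each donor sequence is aligned so that the intron starts at index `exon_len` (a hypothetical parameter); `frequency_matrix`, `split_donors` and the three example sequences are hypothetical:

```python
from collections import Counter
from typing import Dict, List


def frequency_matrix(seqs: List[str]) -> List[Dict[str, float]]:
    """Per-position base frequencies for equal-length, aligned sequences."""
    if not seqs:
        raise ValueError("need at least one sequence")
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    matrix = []
    for pos in range(width):
        counts = Counter(s[pos].upper() for s in seqs)
        matrix.append({base: counts[base] / len(seqs) for base in "ACGT"})
    return matrix


def split_donors(donors: List[str], exon_len: int) -> Dict[str, List[str]]:
    """Group donor sequences by their first two intronic bases (GT or GC)."""
    groups: Dict[str, List[str]] = {"GT": [], "GC": []}
    for s in donors:
        dinucleotide = s[exon_len:exon_len + 2].upper()
        if dinucleotide in groups:  # other dinucleotides are ignored
            groups[dinucleotide].append(s)
    return groups


donors = ["CAGGTAAGT", "AAGGTGAGT", "CAGGCAAGT"]  # 3 exonic + 6 intronic
groups = split_donors(donors, exon_len=3)
gt_matrix = frequency_matrix(groups["GT"])
gc_matrix = frequency_matrix(groups["GC"])
print(gt_matrix[0])
```

The acceptor matrix would be built the same way with `frequency_matrix` on the acceptor subset.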

Using the Splicing Prediction Module

To open the splicing window, click the 'Splicing' button in the variant annotation window.

The splicing window then opens.

The window displays the reference (wild-type) and mutated sequences over the range shown in the main window when the Splicing button was clicked. Predictions are reported above and below each sequence. Exons are drawn as blue boxes.

Hits from SpliceSiteFinder-like, MaxEntScan, NNSPLICE and GeneSplicer are displayed as blue vertical bars for 5' (donor) sites and as green vertical bars for 3' (acceptor) sites. The height of each bar is proportional to the score, relative to the maximum possible score of the corresponding algorithm.

Known constitutive signals are displayed as small blue (5') or green (3') triangles, close to the sequence letters.

Moving the mouse over a vertical bar or triangle shows a tooltip with the corresponding score. To display the score number next to a hit bar, click the bar itself.

To reveal differences between wild-type and mutated scores, click the 'Highlight Differences' button. Unchanged scores are dimmed, while score numbers are displayed beside those that differ.
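The comparison behind 'Highlight Differences' can be pictured as a keyed diff of the two prediction sets. In this sketch the data model is an assumption, not Alamut's: each prediction is keyed by (position, site type, method), positions are assumed to share one coordinate system in both sequences (as for a single-nucleotide substitution), and `score_differences` and the example scores are hypothetical. Sites with identical scores are left out, mirroring the dimmed ones.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[int, str, str]  # (position, "donor"/"acceptor", method name)


def score_differences(
    wt: Dict[Key, float], mut: Dict[Key, float], tol: float = 1e-9
) -> Dict[Key, Tuple[Optional[float], Optional[float]]]:
    """Return {key: (wt_score, mut_score)} for every site whose score differs.

    A site predicted in only one sequence is reported with None for the other.
    """
    diffs: Dict[Key, Tuple[Optional[float], Optional[float]]] = {}
    for key in wt.keys() | mut.keys():
        a, b = wt.get(key), mut.get(key)
        if a is None or b is None or abs(a - b) > tol:
            diffs[key] = (a, b)
    return diffs


wt = {(10, "donor", "MaxEntScan"): 8.5, (40, "acceptor", "MaxEntScan"): 6.0}
mut = {
    (10, "donor", "MaxEntScan"): 3.2,     # weakened site
    (40, "acceptor", "MaxEntScan"): 6.0,  # unchanged: omitted from the diff
    (25, "donor", "MaxEntScan"): 5.1,     # site appearing only in the mutant
}
print(score_differences(wt, mut))
```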

To display ESE predictions, click the 'ESE Predictions' button. ESE hits from ESEfinder are then displayed above each sequence, and RESCUE-ESE hexamers are drawn below them.

Splicing reports

To generate a tabular report of splicing predictions, click the Report button. The report is generated in HTML format and can later be opened and edited in most word processors.

References

Cartegni et al. ESEfinder: A web resource to identify exonic splicing enhancers.
Nucleic Acids Res (2003) vol. 31 (13) pp. 3568-71

Fairbrother et al. Predictive identification of exonic splicing enhancers in human genes.
Science (2002) vol. 297 (5583) pp. 1007-13

Pertea et al. GeneSplicer: a new computational method for splice site prediction.
Nucleic Acids Res (2001) vol. 29 (5) pp. 1185-90

Reese et al. Improved Splice Site Detection in Genie.
J Comput Biol (1997) vol. 4 (3) pp. 311-23

Yeo et al. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals.
J Comput Biol (2004) vol. 11 (2-3) pp. 377-94

Zhang et al. Statistical features of human exons and their flanking regions.
Hum Mol Genet (1998) vol. 7 (5) pp. 919-32

Sequence logos

These sequence logos (computed by enoLOGOS) depict the position weight matrices used by the SpliceSiteFinder-like algorithm in Alamut.


2009 Interactive Biosoftware - Last modified: 4 May 2009