Mutation T@ster Documentation
Gene
You can identify your gene of interest by entering one of the following:
Transcript
You can also directly enter the Ensembl transcript ID (starting with ENST, e.g. ENST00000308868) of your gene of interest.
Position / snippet refers to
Choose coding sequence if you are working with coding sequence positions / sequence for localising the alteration of interest. Position 1 of the coding sequence (CDS, sometimes also called ORF, for open reading frame) refers to the A of the start ATG.
Alteration, all types by sequence
Choose all types by sequence if you have a sequence snippet around an alteration that you want to analyse. You can paste this sequence snippet into this field, putting square brackets [ ] around the altered base and the new base (e.g. ACGGTT[A/G]CTCTAAGGA for a base exchange from A to G). Comprehensive examples of the format are provided directly on the input mask. Additionally, you have to indicate (1) the HGNC symbol of the gene in question and (2) the transcript ID (or select one after entering a gene). All entries have to refer to the 5'-3' direction of the transcript sequence.
Alteration, single base exchange by position
Choose single base exchange by position if you are working with a single base exchange. This means that only a single base is altered. If you have named the mutation according to the HGVS variation nomenclature, the name should indicate (by its c. or g. prefix) whether you have to work in coding sequence (CDS) or gDNA mode.
Alteration, insertion or deletion by position
Choose insertion / deletion by position if you are working with an insertion, a deletion or a combination thereof. You do not need to specify exactly which kind of alteration you are dealing with, since this is automatically determined by the software (and displayed in the output).
options
Check 'show nucleotide alignment' if you want to see a multi-species alignment of the nucleotide sequence around the submitted alteration in the results. By default, nucleotide alignment is not run, since the BLAST call slows down MutationTaster and the results are not used by the Bayes classifier anyway.
Name of alteration
You can enter a self-chosen name for the alteration in question. This will be displayed in the output in order to facilitate the identification of printed outputs for different mutations in the same gene.
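The snippet format described under 'Alteration, all types by sequence' can be illustrated with a minimal Python sketch. It is not MutationTaster's own parser: the names Snippet, parse_snippet and alteration_type are hypothetical, and the sketch assumes that an empty side of the slash denotes an insertion or a deletion, which the input mask may express differently.

```python
import re
from typing import NamedTuple


class Snippet(NamedTuple):
    """Parts of a snippet such as ACGGTT[A/G]CTCTAAGGA (hypothetical type)."""
    upstream: str
    old: str
    new: str
    downstream: str

    @property
    def wild_type(self) -> str:
        return self.upstream + self.old + self.downstream

    @property
    def mutated(self) -> str:
        return self.upstream + self.new + self.downstream


# flanking bases, then [OLDBASES/NEWBASES], then flanking bases
_SNIPPET_RE = re.compile(r"^([ACGT]*)\[([ACGT]*)/([ACGT]*)\]([ACGT]*)$")


def parse_snippet(text: str) -> Snippet:
    """Split a snippet into its parts; raise ValueError if it is malformed."""
    match = _SNIPPET_RE.match(text.strip().upper())
    if match is None:
        raise ValueError("Snippet not properly formatted.")
    snippet = Snippet(*match.groups())
    if not snippet.old and not snippet.new:
        raise ValueError("Snippet contains no alteration.")
    return snippet


def alteration_type(snippet: Snippet) -> str:
    """Name the alteration type (assumption: an empty side means ins/del)."""
    if len(snippet.old) == 1 and len(snippet.new) == 1:
        return "base exchange"
    if snippet.old and not snippet.new:
        return "deletion"
    if snippet.new and not snippet.old:
        return "insertion"
    return "combination of insertion and deletion"


example = parse_snippet("ACGGTT[A/G]CTCTAAGGA")
print(alteration_type(example), example.wild_type, example.mutated)
```

Keeping the flanking sequence separate from the altered bases makes it easy to rebuild both the wild-type and the mutated sequence for a later uniqueness check within the gene.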
Bayes classifier
MutationTaster employs a Bayes classifier to eventually predict the disease potential of an alteration. The Bayes classifier is fed with
the outcome of all tests and the features of the alterations and calculates probabilities for the alteration to be either
a disease mutation or a harmless polymorphism. For this prediction, the frequencies of all single features for known
disease mutations/polymorphisms were studied in a large training set composed of >390,000 known disease mutations from HGMD Professional
and >6,800,000 harmless SNPs and Indel polymorphisms from the 1000 Genomes Project (TGP).
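As an illustration of how per-feature frequencies can be combined into a probability, the following Python sketch implements a simple naive Bayes calculation. It assumes that the features are treated as independent, which this documentation does not state explicitly. The function name, the feature names and all frequencies are invented for the example and are not taken from the MutationTaster training set.

```python
import math
from typing import Dict, Hashable

# feature name -> feature value -> relative frequency in one class
Frequencies = Dict[str, Dict[Hashable, float]]


def naive_bayes_posterior(
    features: Dict[str, Hashable],
    freq_disease: Frequencies,
    freq_polymorphism: Frequencies,
    prior_disease: float = 0.5,
) -> float:
    """Return P(disease mutation | features) under a naive Bayes model."""
    log_disease = math.log(prior_disease)
    log_polymorphism = math.log(1.0 - prior_disease)
    for name, value in features.items():
        # multiply the class-conditional frequencies (sum in log space)
        log_disease += math.log(freq_disease[name][value])
        log_polymorphism += math.log(freq_polymorphism[name][value])
    # normalise the two scores so that they sum to 1
    top = max(log_disease, log_polymorphism)
    disease = math.exp(log_disease - top)
    polymorphism = math.exp(log_polymorphism - top)
    return disease / (disease + polymorphism)


# Hypothetical frequencies, for illustration only.
freq_disease = {"conserved": {True: 0.8, False: 0.2}}
freq_polymorphism = {"conserved": {True: 0.2, False: 0.8}}
p = naive_bayes_posterior({"conserved": True}, freq_disease, freq_polymorphism)
print(round(p, 3))
```

Working in log space avoids numerical underflow when many features are combined.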
Output: probability value
The probability value is the probability of the prediction, i.e. a value close to 1 indicates high confidence in the prediction. Please note that the probability value
used here is NOT the probability of error as used in t-test statistics.
Our results show that wrong predictions are usually not reflected by low probability values but are rather caused by polymorphisms or disease causing
alterations that show characteristics of the other case, e.g. SNPs that are highly conserved and destroy protein features or disease mutations that
appear to have no effect on the protein/gene at all.
If an alteration is a 'true' SNP (as confirmed by the existence of each of the three
genotypes AA, AB, BB in the HapMap data or by presence in TGP in homozygous state in > 4 cases),
it is automatically predicted to be a polymorphism. Alterations causing a premature termination codon and ultimately leading to
nonsense-mediated mRNA decay (NMD) are automatically assigned the 'disease causing' status.
In both cases, the Bayes classifier is run nevertheless and the
probability for the prediction that was automatically made is shown. Scores below 0.5 hence indicate that our classifier comes to a
different conclusion. A few SNPs listed in HapMap introduce premature stop codons and will cause NMD; these are likely to be mistaken for
disease mutations.
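The automatic assignments described above can be sketched in Python as follows. This is a simplified illustration, not the MutationTaster implementation: all names are hypothetical, the NMD test uses the border rule given under 'theoretical NMD border in CDS' (last intron/exon junction minus 50 bp), and giving NMD precedence over the 'true' SNP rule is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Prediction:
    label: str          # 'disease causing' or 'polymorphism'
    probability: float  # classifier probability for the shown label
    automatic: bool     # True if the label was assigned automatically


def is_true_snp(hapmap_genotypes: Iterable[str], tgp_homozygous_count: int) -> bool:
    """All three genotypes in HapMap, or homozygous in TGP in > 4 cases."""
    all_three = {"AA", "AB", "BB"} <= set(hapmap_genotypes)
    return all_three or tgp_homozygous_count > 4


def causes_nmd(ptc_position: int, last_junction: int) -> bool:
    """A premature termination codon 5' of (junction - 50 bp) leads to NMD."""
    nmd_border = last_junction - 50
    return ptc_position < nmd_border


def final_prediction(p_disease: float, true_snp: bool, nmd: bool) -> Prediction:
    """Apply the automatic rules, but still report the classifier's probability."""
    if nmd:  # assumption: NMD takes precedence over the 'true' SNP rule
        label, automatic = "disease causing", True
    elif true_snp:
        label, automatic = "polymorphism", True
    else:
        label = "disease causing" if p_disease >= 0.5 else "polymorphism"
        automatic = False
    probability = p_disease if label == "disease causing" else 1.0 - p_disease
    return Prediction(label, probability, automatic)


result = final_prediction(p_disease=0.3, true_snp=False, nmd=True)
# A probability below 0.5 means the classifier disagrees with the automatic label.
print(result.label, result.probability, result.probability < 0.5)
```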
We advise you not to exclude an alteration due to a dbSNP ID. Many SNPs from dbSNP are not validated and some are even known to be disease
causing variants (e.g. rs28939070 is
responsible for Trichorhinophalangeal Syndrome, type I).
Since we used 'true' SNPs from the 1000 Genomes Project as our polymorphism data set, we included neither genotype data from the
1000 Genomes Project nor HapMap frequencies in the training and optimisation of MutationTaster or in the comparison with other
applications.
The Bayes classifier is regularly updated, i.e. predictions might in some cases change over time.
prediction
MutationTaster predicts an alteration as one of four possible types:
summary
List of the most prominent features of the analysed alteration (e.g. 'at intron-exon boundary', 'spans start ATG', 'homozygous in TGP' etc.)
name of alteration
A user-specified name in order to identify printed outputs.
alteration (phys. location)
The alteration on "physical" i.e. chromosomal level (e.g. chr7:91623937_91623938insGGCAAT).
HGNC symbol
The official HGNC symbol.
Ensembl transcript ID
Ensembl [1] transcript ID, starting with ENST.
UniProt peptide (SwissProt ID)
UniProt KB / SwissProt [2] accession ID. Unfortunately, this does not always correctly correspond to the selected product of the transcript.
alteration type
Is either a base exchange, a combination of insertion and deletion, an insertion or a deletion.
alteration region
Is either 5'UTR (untranslated region), CDS (coding sequence), 3'UTR or intron.
DNA changes
Alteration on nucleotide level. The gDNA level (g.) is always displayed, the cDNA level (cDNA.) for alterations located in exons, and the CDS level (c.) only for alterations residing in an exon in the coding sequence.
AA changes
Any amino acid changes are shown here, displaying the original versus the new amino acid as well as the position of the substitution and a score for it. This score is taken from an amino acid substitution matrix (Grantham Matrix [3]) which takes into account the physico-chemical characteristics of amino acids and scores substitutions according to the degree of difference between the original and the new amino acid. Scores may range from 0.0 to 215. Since the Grantham matrix does not provide values for an amino acid insertion/deletion, no score is given in such cases. The score is displayed for information only and does not influence the MutationTaster prediction as generated by our Bayes classifier. An asterisk (*) stands for a stop codon, a minus (-) means that in the original AA sequence, there was no AA at this position. If the initial methionine codon (start ATG) is lost, MutationTaster searches for a potential new, downstream start ATG and informs you about AA changes based on the assumed alternative AA sequence.
position(s) of altered AA
Lists the positions of altered AA. For mutations resulting in a frameshift, the position of the first altered AA is displayed along with the information that due to a frameshift, there are further changes downstream.
frameshift
Can be either yes or no.
regulatory features
Our database contains so-called regulatory features from the Ensembl Regulation database, such as histone modification sites, open chromatin or transcription factor binding sites. For more information about Ensembl Regulation, please see their documentation. Since it is not yet clear if and how the regulatory features influence the gene under scrutiny or rather up- / downstream genes, the regulatory features are not used by the Bayes classifier for prediction, but only displayed for informational reasons here.
phyloP / phastCons
phastCons and phyloP are both methods to determine the grade of conservation of a given nucleotide [6]. MutationTaster uses values which are precomputed and offered by UCSC. phastCons values vary between 0 and 1 and reflect the probability that each nucleotide belongs to a conserved element, based on the multiple alignment of genome sequences of 46 different species (the closer the value is to 1, the more probable the nucleotide is conserved). It considers not just each individual alignment column, but also its flanking columns. By contrast, phyloP (values between -14 and +6) separately measures conservation at individual columns, ignoring the effects of their neighbors. Moreover, phyloP can not only measure conservation (slower evolution than expected under neutral drift) but also acceleration (faster than expected). Sites predicted to be conserved are assigned positive scores, while sites predicted to be fast-evolving are assigned negative scores. For more information about phyloP and phastCons, please see the cited paper or the description on the UCSC website.
splice sites
MutationTaster uses a locally installed third party splice site prediction program, namely NNSplice [7] from the Berkeley Drosophila Genome Project (a web-based version is available at http://fruitfly.org/seq_tools/splice.html) to analyse possible changes in splice sites.
Kozak consensus sequence altered
The Kozak consensus sequence (gccRccAUGG; R = purine) starts upstream of the start codon (AUG) and plays a major role in the initiation of translation. The purine (R) at position -3 as well as the G in position +4 are highly conserved. The program checks whether for a given alteration a previously strong consensus sequence has been weakened.
conservation on AA level
For conservation analysis, amino acid or nucleotide sequence homologues of ten other species (chimp, rhesus macaque, mouse, cat, chicken, clawed frog, pufferfish, zebrafish, fruitfly, and worm) are aligned with the corresponding human sequence of the gene in question. Sequences are aligned with blastp [8], which is installed as a stand-alone executable on our server, and analysed by MutationTaster.
protein features
The program checks whether any protein features are directly or indirectly affected by the alteration. Our database stores all human SwissProt protein features. Some features will not have an influence on the prediction; they are only displayed for information and should not have an impact on the disease-causing potential of the alteration (e.g. CONFLICT or MUTAGEN).
length of protein
MutationTaster checks if the resulting protein will be elongated (prolonged), truncated, or whether nonsense-mediated mRNA decay (NMD) is likely to occur. MutationTaster determines the NMD border as last intron/exon junction minus 50 bp and analyses if a given premature termination codon occurs 5' to this border, thus leading to NMD. An elongated protein is referred to as prolonged, i.e. the original termination codon is destroyed and the translation stops later than normal. A truncated protein is referred to as either slightly truncated (if less than 10% of the wild-type protein length is missing) or strongly truncated (if more than 10% of the original protein length is missing). In these two cases, the additional information 'might cause NMD' is given: the '-55 boundary rule' is not fulfilled, but it cannot be ruled out that NMD occurs nevertheless. If MutationTaster concludes that an alteration causes NMD, this alteration is automatically regarded as a disease mutation. The classifier is run nevertheless and the probability value for the prediction is shown.
AA sequence altered
Can be either yes (AA exchange) or no (no AA exchange).
position(s) of altered AA
If the alteration in question is located in the CDS, the position on amino acid level is shown here. If the alteration spans two or more amino acids, these are all displayed and separated by a comma.
position of stopcodon in wt / mu CDS
Position of the last base of the stop codon (this can either be TGA, TAA or TAG), position 1 refers to the A in the start ATG codon.
position (AA) of stopcodon in wt / mu AA sequence
Position of the stop asterisk (*) in the amino acid sequence, position 1 refers to the first amino acid of the protein.
poly(A) signal
MutationTaster uses a locally installed version of the program polyadq [9] for analysis of polyadenylation signals. More information at http://rulai.cshl.org/tools/polyadq/polyadq_form.html
conservation on nucleotide level
Conservation on nucleotide level is analysed similarly to AA level: using bl2seq, homologous DNA sequences of different species are compared to the human DNA sequence. Conservation status can be all identical (same base(s) in human and species sequence), not conserved (different base(s) in human and species sequence) or no alignment (if no local alignment around the indicated position(s) was found). If no homologous sequences are found, this is indicated by no homologue. Up to now, conservation on nucleotide level is not used for the prediction.
position of start ATG in wt / mu cDNA
Position of the A in the start ATG, position 1 refers to the first base of the cDNA. If the regular start ATG is changed by an alteration, MutationTaster searches for the next most 5'-ATG and assumes this to be the new start ATG for the mutated sequence.
position of termination codon in wt / mu cDNA
Position of the last base pair of the termination codon (this can be either TGA, TAA or TAG), position 1 refers to the first base pair of the cDNA.
chromosome
The chromosome the alteration is located on.
strand
Is either 1 for forward strand or -1 for reverse strand.
last intron/exon border
The last base of the exon before the last exon.
theoretical NMD border in CDS
In order to avoid truncated proteins which might act in a dominant-negative manner, the eukaryotic cell has a surveillance mechanism to ensure that only error-free mRNAs are translated. It was shown that mRNA shorter than a given length is nearly completely degraded. This process is known as nonsense-mediated mRNA decay or NMD. The rule seems to be that a termination codon occurring 50-55 nucleotides upstream of the final intron / exon junction initiates the NMD machinery and the mRNA gets degraded. Therefore, this program determines the NMD border as last intron / exon junction minus 50 bp and analyses if a given premature termination codon occurs 5' to this border thus eventually leading to NMD.
length of CDS
The length of the coding sequence from the A of the initiation codon (ATG) to the last base of the termination codon.
cDNA position
Gives the last wild-type base before alteration and first wild-type base after alteration in coding DNA sequence context (positions relative to start of transcribed coding DNA reference sequence) e.g. 1203 / 1205, the altered base is at position 1204.
gDNA position
Gives the last wild-type base pair before alteration and first wild-type base pair after alteration in genomic DNA sequence context (positions relative to start of genomic DNA reference sequence) e.g. 53,344 / 53,346, the altered base is at position 53,345.
chromosomal position
Gives the last wild-type base before alteration and first wild-type base after alteration in chromosomal sequence context (position relative to start of chromosomal reference sequence) e.g. 154,372,337 / 154,372,339, the altered base is at position 154,372,338.
gDNA and cDNA sequence snippet
The sequence surrounding the alteration (20 bp up- and downstream). The altered bases are highlighted in blue.
wild-type and mutated AA sequence
Complete AA sequences, the asterisk (*) indicates STOP.
speed
This is the time MutationTaster needed for analysis & prediction - your browser might need some extra time to display the results, especially if you include images.
InsDel too long
At present, MutationTaster handles only InsDels up to 12 bases.
Your mutation of interest seems to span an exon/intron boundary.
This kind of mutation can only be analysed in gDNA mode.
No transcripts for this gene found!
You might have misspelled the gene symbol or used a protein name, which is not always the correct gene symbol (e.g. protein p53 is gene TP53). Also, in some (rare) cases an NCBI gene could not be mapped to an Ensembl gene. As some external data is based on NCBI while other data is based on Ensembl, MT needs both to make a prediction. Moreover, we filter out protein-coding transcripts (Ensembl biotype protein_coding) without a correct start codon (ATG) and correct stop codon (TGA, TAA, TAG). This might lead to the phenomenon that MutationTaster complains about "no suitable transcripts" or "no transcripts for this gene found" although Ensembl lists one or several. Transcripts of mitochondrial genes are not tested for integrity due to differences in the mitochondrial genetic code.
No internal Ensembl transcript ID found. / No Ensembl gene ID found for transcript. / No stable ID for this gene.
Our database doesn't know the transcript you specified. This might happen if you refer to a newer or older release than the one we use. The release MT uses is mentioned on the query interface.
Ensembl gene XXX not found in ENSEMBL
Our database doesn't know the gene you specified. This might happen if you refer to a newer or older release than the one we use. The release MT uses is mentioned on the query interface.
No NCBI gene ID found. / No NCBI gene ID found for this transcript.
In some (rare) cases an Ensembl gene could not be mapped to an NCBI gene. As some external data is based on NCBI while other data is based on Ensembl, MT needs both to make a prediction.
Too many NCBI gene IDs found.
In some (rare) cases an Ensembl gene could not be mapped to a single NCBI gene. As some external data is based on NCBI while other data is based on Ensembl, MT needs both to make a prediction.
Only invalid NCBI gene IDs found.
In some (very rare) cases an Ensembl gene could not be mapped to a valid NCBI gene, i.e. the NCBI gene Ensembl refers to is 'discontinued' and was replaced by another gene. As some external data is based on NCBI while other data is based on Ensembl, MT needs both to make a prediction. Please contact us if you encounter such a case.
Gene XXX not found on any chromosome.
The gene under scrutiny has no valid positional data. This should not occur at all. Please contact us if you encounter such a case.
Gene XXX (Entrez gene YYY) and transcript ZZZ do not match!
The transcript you entered is not a product of the gene you entered. Please check your input.
Position is out of gene!
You entered a position that is located outside the gene. This may happen when you mapped a genomic position to a gene-specific position using an old genome build or, of course, through typos. Please check your input.
Could not retrieve a sequence or sequence is too short.
MT was not able to get the gene sequence from Ensembl. This might be due to network problems, so you should repeat the analysis after some time. Should this not work, please contact us.
No start ATG exon found.
The transcript is not properly annotated: there is no start position of the coding sequence in the database. Please select another transcript of the same gene.
No stop exon found.
The transcript is not properly annotated: there is no stop position of the coding sequence in the database. Please select another transcript of the same gene.
Chosen transcript ENSTXXX has no correct start ATG annotated.
Protein-coding transcripts (Ensembl biotype protein_coding) are tested for transcript integrity, i.e. for presence of a correct start codon (ATG) and correct stop codon (TGA, TAA, TAG). If one is missing, an error message is issued because analysis in corrupt transcripts might lead to a wrong prediction.
Sequence XXX is not unique in your gene!
Please use a longer snippet.
Sequence was not found in your gene.
Please check your input: is there a typo in your snippet? Or do you use a snippet created from the wrong strand? MT always refers to the strand the gene is located on.
Snippet not properly formatted.
Please check your input: snippets must be specified as ACGTACGT[OLDBASES/NEWBASES]ACGTACGT.
2013: SIFT, PROVEAN, PolyPhen-2