User's Guide

What is the Automated Splice Site Analyses Server?

What is the basis for identification of the binding sites?

How do I obtain long term access to this resource?

Which reference sequence of Human Genome is this based on?

Why is a list of various accession numbers displayed when I submit a HUGO designated gene name?

What does "Submit own sequence" mean?

What does "Submit mRNA Accession #" mean?

What does "Designated Gene Name" mean?

What does "Mutation / Variant" mean?

What does "Window Range" mean?

What does "Analyze following Sites:" mean?

What does "Mutation Coordinate is specified relative to the beginning of either the:" mean?

What does "Translate all forward frames in the lister generated map" mean?

What does "mRNA Accession No." mean?

Why is Gene Name required in "Submit mRNA Accession #" option?

What does "Direction" mean?

What does "Sequence" mean?

How much time does it take to analyze one mutation?

Why does it take such a long time to analyze one small mutation?

What do the sub divisions "acceptor, donor, etc.." in the results page mean?

Explain different headings in the results tables

Any dependencies?

Why am I supposed to submit at least 110 bases in the "Submit Own Sequence" option?

Examples of HUGO Designation:

How do you compute the fold change in binding affinity?

Architecture and Program Flow Chart of ASSEDA

The Logic and Formulation of Exon Definition for Splice and Splicing Regulatory Sites with Negative Information Content


What is the Automated Splice Site Analyses Server?

A system that evaluates changes in splice site strength using information theory-based models.

What is the basis for identification of the binding sites?

Shannon Information theory. Read about the application of information theory to molecular biology at Dr. Tom Schneider's page.

How do I obtain long term access to this resource?

Please contact us for licensing or materials transfer agreements!

Which reference sequence of Human Genome is this based on?

Presently the system is based on the April 2003 reference sequence. It will soon be extended to all drafts available in the UCSC genome browser, so that users can choose the draft they are interested in.

Why is a list of various accession numbers displayed when I submit a HUGO designated gene name?

Multiple accession numbers may be attached to the same functional gene name. In such cases, a list of the mRNA accession numbers is displayed, allowing the user to choose one of them.

We recommend choosing the mRNA accession number that spans the largest range of base pairs.

What does "Submit own sequence" mean?

If a particular accession number is missing from the April 2003 UCSC genome assembly, you can use the "Submit own sequence" option, which lets you submit your own sequence and make the desired mutation(s). We hope to update the server soon so that it is not limited to the April 2003 draft.

What does "Submit mRNA Accession #" mean?

Some accession numbers still do not have an associated gene name. In such cases, the user can choose this option, which allows mutational analyses without requiring an associated HUGO designated gene name.
To see why the "Gene Name" field is still requested, follow this link.

What does "Designated Gene Name" mean?

Designated Gene Name is a HUGO designated gene name that is present in the UCSC genome browser. To look up the name of a gene, use the UCSC Genome Browser or the Genew database search engine. If you cannot find the gene name, choose the "Submit mRNA Accession #" option, where the gene name is requested for naming purposes only.

What does "Mutation / Variant" mean?

This is the field where the user submits the mutation / variant. The mutation must conform strictly to the HUGO Designated Mutation Nomenclature. Multiple mutations / variants can be analyzed together by submitting them separated by a '+'.

What does "Window Range" mean?

The Window Range is the number of bases before and after the mutated base within which the information content of sites is calculated. Sites falling outside the window are ignored. In the case of haplotypes, all sites falling between the mutated bases are also considered. The window range is limited to 1000 bases to reduce the overhead of scanning all the base pairs. The default value is 54, which is twice the range of the acceptor Ri(b,l) matrix.
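The window described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `window_bounds` and the use of plain integer coordinates are hypothetical, and the server's actual implementation is not shown in this guide.

```python
DEFAULT_WINDOW = 54   # default value stated in this guide
MAX_WINDOW = 1000     # upper limit stated in this guide


def window_bounds(mutation_positions, window=DEFAULT_WINDOW):
    """Return the inclusive (start, end) coordinates of the region to scan.

    For a haplotype (several mutated bases), the region runs from `window`
    bases before the first mutated base to `window` bases after the last,
    so every site between the mutated bases is covered.
    """
    if not mutation_positions:
        raise ValueError("at least one mutation position is required")
    if not 0 < window <= MAX_WINDOW:
        raise ValueError(f"window must be between 1 and {MAX_WINDOW}")
    return min(mutation_positions) - window, max(mutation_positions) + window


print(window_bounds([1000]))            # single mutation, default window
print(window_bounds([1000, 1200], 10))  # haplotype with a narrower window
```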

What does "Analyze following Sites:" mean?

A variety of information weight matrices (Ri(b,l) matrices) are available, each of which recognizes a certain kind of site. The user can choose one or more of them. The acceptor and donor Ri(b,l) matrices are scanned by default. More Ri(b,l) matrices will be added to the list in the near future.

The binding site selection method has been redesigned, using checkboxes instead of a drop down menu. Over time, the number of models developed for ASSEDA has increased, making the old selection method cumbersome. The new method allows for easier selection of specific models, and can easily be expanded without adding clutter. Simply check the models before submitting your mutation. Additional models of RNA binding proteins involved in splicing are planned to be added in the near future.

List of available binding sites: Donors and acceptors (human and mouse), branch point, SF2/ASF (SRSF1), SC35 (SRSF2), SRp40 (SRSF5), SRp55 (SRSF6), hnRNPA1, hnRNPH1.
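To illustrate how an information weight matrix is applied, the sketch below sums the per-position weights (in bits) of the bases in each candidate site, which is how an individual information value Ri is obtained from an Ri(b,l) matrix. The two-column matrix `TOY_WEIGHTS` and the function names are invented for illustration; they are not one of the ASSEDA models, and the real scanning is done by the Delila tools.

```python
# Hypothetical 2-position weight matrix: one dict of base -> bits per position.
TOY_WEIGHTS = [
    {"a": -1.0, "c": -1.0, "g": 2.0, "t": -1.0},
    {"a": -1.0, "c": -1.0, "g": -1.0, "t": 2.0},
]


def individual_information(site, weights):
    """Sum the weight of the base found at each position of the site."""
    if len(site) != len(weights):
        raise ValueError("site length must equal matrix length")
    return sum(column[base] for base, column in zip(site.lower(), weights))


def scan(sequence, weights, threshold=0.0):
    """Return (offset, Ri) for every window whose Ri exceeds the threshold."""
    width = len(weights)
    hits = []
    for start in range(len(sequence) - width + 1):
        ri = individual_information(sequence[start:start + width], weights)
        if ri > threshold:
            hits.append((start, ri))
    return hits


print(scan("aagtcc", TOY_WEIGHTS))
```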

What does "Mutation Coordinate is specified relative to the beginning of either the:" mean?

The CDS (CoDing Segment) describes the gene's open reading frame (ORF), the portion of the sequence that codes for a protein product.
Most authors who report mutations treat the initial start codon as position 1, whereas some publications treat the first position of the gene as position 1. This option lets the user set the numbering convention that matches their own:
  • Open Reading Frame (Initial CDS position in NCBI mRNA Accession): The initial start codon is considered as position 1.

  • First Position of the NCBI mRNA Accession: The first position of the mRNA Accession is considered as position 1.
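The relationship between the two numbering conventions can be sketched as a simple offset. The function names and the `cds_start` parameter (the 1-based mRNA position of the first base of the start codon) are assumptions made for illustration, and the value 473 is invented; only positions at or after the start codon are handled here.

```python
def cds_to_mrna(cds_position, cds_start):
    """Convert a position counted from the start codon to an mRNA position."""
    if cds_position < 1 or cds_start < 1:
        raise ValueError("positions are 1-based and must be positive")
    return cds_start + cds_position - 1


def mrna_to_cds(mrna_position, cds_start):
    """Convert an mRNA position to a position counted from the start codon."""
    if mrna_position < cds_start:
        raise ValueError("position lies upstream of the start codon")
    return mrna_position - cds_start + 1


# Hypothetical accession whose start codon begins at mRNA position 473.
print(cds_to_mrna(100, cds_start=473))
print(mrna_to_cds(572, cds_start=473))
```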

What does "Translate all forward frames in the lister generated map" mean?

Every region of DNA has six possible reading frames, three in each direction. The visualization map of binding sites shows forward frames only. When the user selects this option, the map displays all three forward frames together with the amino acids they encode. This lets the user see whether the mutation shifts the reading frame.
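As a rough illustration of what translating the three forward frames involves, the sketch below uses the standard genetic code. The function name `translate_forward_frames` and the use of "X" for unrecognized codons are assumptions; the map itself is produced by the lister program, whose output format is not reproduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_forward_frames(sequence):
    """Translate the three forward reading frames of a DNA sequence."""
    seq = sequence.lower()
    frames = []
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


print(translate_forward_frames("atggcctaa"))
```

Comparing the output for the wild-type and mutant sequences shows whether an insertion or deletion has moved downstream codons into a different frame.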

What does "mRNA Accession No." mean?

mRNA Accession Number is the accession number associated with the gene name. The user must enter an accession number that is present in the April 2003 draft of the UCSC genome assembly, as this system is based on that draft. The mRNA Accession Number of a gene can be found through the Genew database search engine or the UCSC Genome Browser. The accession number should not be a RefSeq accession number.

If you find that a particular accession number is missing in the April 2003 draft of the UCSC genome assembly, then you can make use of the option "Submit own sequence". This option enables you to submit your own sequence and make desired mutation(s). Hopefully, the server will be updated very soon, so that it is not limited to the April 2003 draft alone.

Why is Gene Name required in "Submit mRNA Accession #" option?

This is simply to provide the gene name in the results pages. The user can enter any gene name, but it will not be tested or verified. It is used to generate comprehensive information results only.

What does "Direction" mean?

Direction is the strand of the sequence pasted in the sequence text box. The user can specify either '+' or '-' strand.

What does "Sequence" mean?

This is the text box where the user can paste in their own sequence. The sequence is expected to contain only the characters a, g, c, or t. Any other characters are removed from the sequence.
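The filtering described above might look like the following sketch. The function name `clean_sequence` is hypothetical, and folding uppercase letters to lowercase before filtering is an assumption; this guide does not say how the server treats uppercase input.

```python
def clean_sequence(raw):
    """Keep only a, c, g and t; drop whitespace, digits and other symbols."""
    # Assumption: uppercase bases are accepted and folded to lowercase.
    return "".join(ch for ch in raw.lower() if ch in "acgt")


print(clean_sequence("ACG tn\n12gt"))
```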

How much time does it take to analyze one mutation?

Depending on the option chosen (submitting your own sequence, a designated gene name, or an mRNA accession number), analyzing one mutation takes approximately 30 to 60 seconds under normal load. Expect a longer delay when the load is high.

Why does it take such a long time to analyze one small mutation?

The submitted mutation(s) / variant(s) are parsed, and the base pairs where the changes occur are identified. All base pairs within the window range of those positions are then extracted from the library file for that chromosome, which consists of millions of bases. Locating and extracting specific parts of a chromosome inevitably takes time.
And remember: "It's always worth waiting!"

What do the sub divisions "acceptor, donor, etc.." in the results page mean?

Each site scanned with an information weight matrix (Ri(b,l) matrix) is categorized as decreased, increased, or no change, depending on how its information content changes. The total sites subdivision contains all recognized sites whose information content is greater than the threshold set by the user. These categories are displayed under the subheading of their respective information weight matrix name.
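A minimal sketch of this categorization is shown below. The function name `categorize` and the `tolerance` parameter (used to treat tiny floating-point differences as no change) are assumptions for illustration and are not documented server behavior.

```python
def categorize(initial_ri, final_ri, tolerance=0.0):
    """Label a site by the direction of its information content change."""
    delta = final_ri - initial_ri
    if delta > tolerance:
        return "increased"
    if delta < -tolerance:
        return "decreased"
    return "no change"


# Hypothetical (initial, final) Ri values in bits for three sites.
for initial, final in [(5.0, 2.5), (1.0, 4.0), (3.0, 3.0)]:
    print(initial, final, categorize(initial, final))
```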

Explain different headings in the tables.

Genomic Coordinate: The genomic coordinate number of the base where the information content is measured.
Position Relative to Natural Site: The relative distance of the base from the closest natural site.
Closest Natural Site: The genomic coordinate number of the closest natural site. This link, when clicked, pops up a window containing information content information of all the natural sites of that particular mRNA accession.
Initial(Ri): Initial information content measured at the base before the mutation is made.
Final(Ri): Final information content measured at the base after the mutation is made.
Δ Ri: Final(Ri) - Initial(Ri); change of information content obtained at the site due to a mutation or variant.
Fold change: A single bit difference in Ri value corresponds to at least a two-fold difference in binding site strength. Fold change indicates the change in binding affinity of two sites.
    Fold Change = 2^(ΔRi), where ΔRi = difference between the respective individual information contents of the two sites (wild type, mutant type).
% Binding (Final/Initial): Indicates the change of binding energy, calculated as a percentage.
Initial(Z): Z score for this evaluation, assuming that individual information values form a Gaussian distribution.
Final(Z): Z score after mutation.
Δ Z: Change in Z score obtained at the site due to a mutation or variant.

Any dependencies?

This system uses the Delila system tools for the identification of potential sites.

Why am I supposed to submit at least 110 bases in the "Submit Own Sequence" option?

The submitted sequence is scanned with the weight matrices selected by the user. The acceptor and donor weight matrices are used by default. Each matrix scans twice its own length on either side of the base(s) where the change is made. Since the acceptor weight matrix is the longest (27 bases), twice its length on both sides of the changed base comes to 2 × 27 × 2 = 108 bases, or about 110.
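The arithmetic behind the 110-base minimum can be written out as a short sketch. The names below are hypothetical, and whether the server counts the mutated base itself within this total is not stated in this guide.

```python
ACCEPTOR_MATRIX_LENGTH = 27   # longest matrix, as stated in this guide
MINIMUM_SUBMITTED_BASES = 110  # requirement stated in this guide


def scanned_bases(matrix_length):
    """Bases scanned around a change: twice the matrix length on each side."""
    flank = 2 * matrix_length
    return 2 * flank


def is_long_enough(sequence, minimum=MINIMUM_SUBMITTED_BASES):
    """Check a submitted sequence against the stated minimum length."""
    return len(sequence) >= minimum


print(scanned_bases(ACCEPTOR_MATRIX_LENGTH))
print(is_long_enough("acgt" * 30))
```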

Examples of HUGO Designation:

Below are examples of Delila conversions for mutations indicated in HUGO nomenclature.
     Gene: PAH
     Accession: U49897
     Chromosome: 12
     Strand: Reverse Strand

 

HUGO Designation                            Delila instruction
IVS2nt1g>a + Ivs2nt4a>t                     c103239515t.t103239512a
IVS2nt1g>a                                  c103239515t
100A>G                                      t103244229c
IVS5nt5delg                                 d103193316,103193316
c.509delg                                   d103243820,103243820
100_150del                                  d103244179,103244229
100insagct                                  i103244228,103244229agct
IVS2nt1insgcta                              i103239514,103239515tagc
IVS11+3-+6delGAGT                           d103170365,103170368
504_506delGATinsA                           i103243823,103243827t
IVS2_IVS3del or 88+?_923+? or EX3_5del      Not Supported*

The Delila instructions are generated internally and are used to extract the sequence from the reference genome; the wild-type and mutant sequences are generated as well.
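The substitution rows in the examples above follow a visible pattern: because PAH lies on the reverse strand, the bases in the instruction are the complements of those in the HUGO designation, and several substitutions are joined with a period. The sketch below reproduces only that pattern. The function name is hypothetical, the genomic coordinate is assumed to be already known, and the forward-strand behavior (no complementing) is an assumption, since the examples cover a reverse-strand gene only.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def substitution_instruction(ref, alt, genomic_coordinate, strand):
    """Format a single-base substitution in the style of the examples above."""
    ref, alt = ref.lower(), alt.lower()
    if strand == "-":
        # Reverse-strand gene: the genome carries the complementary bases.
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    elif strand != "+":
        raise ValueError("strand must be '+' or '-'")
    return f"{ref}{genomic_coordinate}{alt}"


# IVS2nt1g>a and IVS2nt4a>t in PAH (reverse strand), coordinates from the table.
parts = [
    substitution_instruction("g", "a", 103239515, "-"),
    substitution_instruction("a", "t", 103239512, "-"),
]
print(".".join(parts))
```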

How do you compute the fold change in binding affinity?

The fold change in binding affinity of two sites (wild-type, mutant) is 2^(ΔRi), where ΔRi is the difference between their respective individual information contents.
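As a worked illustration of this formula, the sketch below raises 2 to the power of the change in information content. The function name `fold_change` and the example Ri values are hypothetical.

```python
def fold_change(initial_ri, final_ri):
    """Return 2 raised to the change in information content (in bits)."""
    delta_ri = final_ri - initial_ri
    return 2 ** delta_ri


# A one-bit loss halves the binding affinity; a three-bit gain raises it 8-fold.
print(fold_change(5.0, 4.0))
print(fold_change(3.0, 6.0))
```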

Architecture and Program Flow Chart of ASSEDA

The architecture diagram can be found here, and the program flow can be found here.

The Logic and Formulation of Exon Definition for Splice and Splicing Regulatory Sites with Negative Information Content

There have been recent changes to the Exon Definition formulation with regard to the impact of negative values. This update does not affect individual Ri values, but may affect previous computations of Ri,total involving sites that were abolished. The Logic and Formulation of Exon Definition for Splice and Splicing Regulatory Factors is described in detail here.