Sherbrooke Alternative Protein Feature IdentificatoR (SAPFIR) seeks to understand how alternative splicing and alternative transcription initiation and termination change the localization or function of a gene's products by determining which localization signals, functional features and other important protein features are encoded in the mature mRNA.

Single Gene Annotation

The Single Gene Annotation function of the SAPFIR tool visualizes the position of functional features within a gene.

The search parameters include:

A single gene, specified either by its HGNC (human) or MGI (mouse) gene symbol or by its ENSEMBL Gene ID.
Please note that human and mouse gene symbols do not follow the same standard, e.g., RBFOX2 (human) vs Rbfox2 (mouse), and are not always the same, e.g., QKI (human) vs Qk (mouse). Gene symbols also get updated from time to time, so the ENSEMBL Gene ID is preferred.
The species, choosing from human or mouse.
The prediction tool(s) used by InterProScan to predict the features.
Please note that each tool is designed to predict a different set of features of the protein sequence. A detailed description of the tools can be found on the InterPro website. Choosing multiple tools may produce redundant results if the tools are designed to predict similar features.
The CDS length ratio threshold.
This is used to exclude transcripts with a short CDS. The default value of 0.25 is recommended.
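
To illustrate how such a threshold might act, the sketch below assumes that the ratio is a transcript's CDS length divided by the longest CDS length among the transcripts of the same gene, and that transcripts at or above the threshold are kept. This definition, the function name and the transcript IDs are assumptions made for illustration; they are not taken from the SAPFIR implementation.

```python
def filter_by_cds_ratio(cds_lengths, threshold=0.25):
    """Keep transcripts whose CDS length ratio reaches the threshold.

    cds_lengths: dict mapping transcript ID -> CDS length in nucleotides
    (0 for non-coding transcripts).
    Assumption: ratio = CDS length / longest CDS length in the gene.
    """
    longest = max(cds_lengths.values(), default=0)
    if longest == 0:
        return []  # the gene has no coding transcript
    return [
        transcript_id
        for transcript_id, length in cds_lengths.items()
        if length / longest >= threshold
    ]


# Hypothetical transcripts of a single gene
example = {"tx1": 1200, "tx2": 1100, "tx3": 900, "tx4": 150, "tx5": 0}
kept = filter_by_cds_ratio(example)
print(kept)
```

Under these assumptions, the short transcript tx4 and the non-coding transcript tx5 are excluded from the analysis.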

The result page consists of two downloadable tables and a graph.

The first table lists the features predicted by InterProScan. The table contains the following columns:

Major isoforms according to the APPRIS database are marked by * (if they are tagged as "PRINCIPAL:1" in the APPRIS database) or by ** (if they are tagged as "PRINCIPAL:2" or higher, or tagged as "ALTERNATIVE") in the table above. Transcripts without * or ** are minor isoforms. Please visit the APPRIS web site for more information concerning their scoring system. In summary, when the process to select the major isoform identifies only one peptide candidate, all transcripts coding for this peptide are tagged as "PRINCIPAL:1". Multiple transcripts of a single gene can have this tag if they have identical CDS and differ only in their untranslated regions. However, when the process identifies multiple candidates, they are tagged as "PRINCIPAL:2" to "PRINCIPAL:5", "ALTERNATIVE:1" or "ALTERNATIVE:2". Untagged isoforms are considered minor isoforms.
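
The marking rule described above can be sketched as a small function. The function name and the handling of malformed tags are assumptions for illustration; only the mapping from tags to * and ** comes from the description above.

```python
def appris_marker(tag):
    """Return '*', '**' or '' for an APPRIS tag such as 'PRINCIPAL:1'.

    tag may be None or '' for untagged (minor) isoforms.
    """
    if not tag:
        return ""
    label, _, level = tag.partition(":")
    label = label.upper()
    if label == "PRINCIPAL" and level.isdigit():
        # PRINCIPAL:1 gets one star, PRINCIPAL:2 and higher get two
        return "*" if int(level) == 1 else "**"
    if label == "ALTERNATIVE":
        return "**"
    return ""  # anything else is treated as a minor isoform (assumption)


for tag in ["PRINCIPAL:1", "PRINCIPAL:3", "ALTERNATIVE:2", None]:
    print(tag, repr(appris_marker(tag)))
```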

Fig.1 - Example of the first table.

The second table indicates whether the features are predicted present in all transcripts. It contains the following columns:

Fig.2 - Example of the second table.

The motivation behind this choice is that a gene often has many non-coding transcripts and short transcripts (transcripts 4 and 5 in the following illustration) according to Ensembl annotations, which makes the majority, if not all, of the predicted features alternative. Thus, limiting the transcripts to the coding ones or to those with a relatively long CDS produces more meaningful results.

Another consideration is the difference between alternative splicing, alternative transcription start sites (ATSS), and alternative transcription termination sites (ATTS). Although ATSS and ATTS involve different mechanisms and regulation than alternative splicing, they all contribute to the diversity of transcripts and proteins produced by a single gene. Indeed, many differential splicing tools report changes in ATSS and ATTS. The third standard, Overlap CDS, makes the result more relevant for users particularly interested in splicing.

In this case, the selection of transcripts varies for each feature. We define overlapping to occur when the extremities of the feature in question are within the extremities of the CDS of a (coding) transcript. Hence, in the following illustration, transcripts 1 and 2 overlap with feature1, while transcripts 1, 2 and 3 overlap with feature4. Thus, feature1 is constitutive and feature4 is alternative.

Fig.3 - Whether a feature is alternative depends on the transcripts in consideration.
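
The overlap rule and the resulting classification can be sketched as follows. The data layout, function names and coordinates are hypothetical and chosen only to mirror the situation described above; the sketch assumes a feature is constitutive when every overlapping transcript carries it.

```python
def overlaps_cds(feature, cds):
    """True when both ends of the feature lie within the ends of the CDS.

    feature and cds are (start, end) genomic coordinates with start <= end.
    """
    return cds[0] <= feature[0] and feature[1] <= cds[1]


def classify_feature(name, feature, transcripts):
    """Classify a feature using only the transcripts whose CDS overlaps it.

    transcripts: dict transcript ID -> {"cds": (start, end), "features": set}
    """
    considered = [
        t for t in transcripts.values() if overlaps_cds(feature, t["cds"])
    ]
    if not considered:
        return "no overlapping transcript"
    if all(name in t["features"] for t in considered):
        return "constitutive"
    return "alternative"


# Hypothetical coordinates
transcripts = {
    "tx1": {"cds": (100, 1000), "features": {"feature1", "feature4"}},
    "tx2": {"cds": (100, 1000), "features": {"feature1"}},
    "tx3": {"cds": (500, 1000), "features": {"feature4"}},
}
features = {"feature1": (150, 300), "feature4": (600, 800)}

results = {
    name: classify_feature(name, span, transcripts)
    for name, span in features.items()
}
print(results)
```

Here feature1 is only compared against tx1 and tx2, both of which carry it, whereas feature4 is compared against all three transcripts and is missing from tx2.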

Finally, the graph represents the gene structure and the predicted features:

Fig.4 - Example of the final plot.

Enrichment Analysis

The goal of the Enrichment Analysis is to help understand how changes in splicing profile affect protein function in the cell. This function compares the frequency of InterProScan-predicted protein features found in two lists of genomic regions, referred to as "target" and "background". A typical input is a list of alternatively spliced exons or junctions identified by an RNA-seq experiment as the "target", and a list of expressed but not alternatively spliced exons or junctions from the same experiment as the "background", although generally any regions meaningful to the user can be used.
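
This text does not state which statistical test SAPFIR applies. As one common way to compare feature frequencies between two lists, the sketch below computes a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) from four counts. The function name, the choice of test and the example counts are all assumptions for illustration.

```python
from math import comb


def enrichment_p_value(target_hits, target_total,
                       background_hits, background_total):
    """One-sided Fisher's exact test for enrichment in the target list.

    Returns the probability of observing at least target_hits regions
    carrying the feature in the target list if the feature were spread
    at random over target and background regions.
    """
    total = target_total + background_total
    hits = target_hits + background_hits
    upper = min(hits, target_total)
    # Hypergeometric upper tail: sum of P(X = k) for k >= target_hits
    numerator = sum(
        comb(hits, k) * comb(total - hits, target_total - k)
        for k in range(target_hits, upper + 1)
    )
    return numerator / comb(total, target_total)


# Hypothetical counts: the feature is found in 8 of 20 target regions
# and in 10 of 100 background regions
p = enrichment_p_value(8, 20, 10, 100)
print(p)
```

A small p-value suggests that the feature is more frequent in the target regions than expected by chance; when many features are tested at once, a multiple-testing correction would normally be applied.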

The expected input includes:

The result page consists of a summary of the enrichment analysis, a table of comparison and enrichment, and the functional feature annotation for the target and background lists.

This result page will be available for 48 hours after completion.

Contact Us

SAPFIR is developed by the research groups of Michelle Scott, Ph.D. and Sherif Abou Elela, Ph.D.

SAPFIR is managed by Delong Zhou. Comments, questions or suggestions can be communicated via e-mail: delong(dot)zhou(at)usherbrooke(dot)ca; please include "SAPFIR" in the subject.