This website accompanies the paper 'Massively Integrated Coexpression Analysis Reveals Transcriptional Regulation, Evolution and Cellular Implications of the Noncanonical Translatome'.
All relevant supplementary data can be found at figshare.
To calculate coexpression at the translatome scale in S. cerevisiae, we integrated a large collection of RNA-seq samples, applied the centered log-ratio (clr) transformation, and quantified coexpression with the proportionality metric ρ. We addressed data sparsity and reliability by discarding observations with low raw counts and requiring a minimum number of samples per ORF pair, resulting in an 11,630 by 11,630 coexpression matrix. We then normalized the coexpression data using spatial quantile normalization (SpQN) to correct for expression-level biases, and created a network representation by keeping only the top 0.2% of ρ values across all ORF pairs. We investigated canonical ORFs (cORFs), which have been studied in depth and are mostly known to be involved in a variety of biological processes, and noncanonical ORFs (nORFs), which are usually excluded from genome annotations because of their short length, lack of evolutionary conservation, and perceived irrelevance to cellular physiology. In total, we studied 5,803 cORFs and 5,827 nORFs.
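As a rough illustration of the steps above, the following sketch applies the clr transformation to each sample, computes ρ between pairs of ORFs, and keeps only the top-scoring fraction of pairs as network edges. It is a minimal toy example and not the pipeline used in the paper: the ORF names, the counts, and the helper names (`clr`, `rho`, `top_fraction_edges`) are hypothetical, ρ is written in its common form 1 − var(x − y) / (var(x) + var(y)) on clr-transformed values, and the low-count filtering, minimum sample threshold, and SpQN normalization are omitted.

```python
import math
from statistics import variance


def clr(sample):
    """Centered log-ratio transform of one sample (a list of positive counts)."""
    logs = [math.log(v) for v in sample]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def rho(x, y):
    """Proportionality rho between two clr-transformed profiles across samples."""
    diff = [a - b for a, b in zip(x, y)]
    return 1.0 - variance(diff) / (variance(x) + variance(y))


def top_fraction_edges(scores, fraction):
    """Return the highest-scoring fraction of ORF pairs (at least one pair)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Toy data (assumed): rows are samples, columns are ORFs.
# orfB is always exactly twice orfA, so the two are perfectly proportional.
orfs = ["orfA", "orfB", "orfC"]
counts = [
    [10, 20, 5],
    [20, 40, 3],
    [40, 80, 9],
    [80, 160, 4],
]

# clr is applied within each sample, across all ORFs.
clr_rows = [clr(row) for row in counts]
profiles = {name: [row[i] for row in clr_rows] for i, name in enumerate(orfs)}

# rho for every unordered pair of ORFs.
scores = {
    (a, b): rho(profiles[a], profiles[b])
    for i, a in enumerate(orfs)
    for b in orfs[i + 1:]
}

# The paper keeps the top 0.2% of pairs; with three toy pairs we keep about a third.
edges = top_fraction_edges(scores, 0.34)
print(edges)
```

In this toy data the pair (orfA, orfB) has ρ equal to 1 up to floating-point error, so it is the one edge that survives the cutoff. On real data the cutoff would be `fraction=0.002`, and zero counts would need to be handled before taking logarithms.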
The RNA-seq data used (SRA sample accessions) are listed in Supplementary Data 1.
To determine the transcriptional associations of nORFs with specific cellular processes, we performed gene set enrichment analyses (GSEA) on the coexpression partners of each cORF and nORF. In this method, we took an ordered list of genes, sorted by their coexpression level, and assessed whether higher-ranked genes were preferentially annotated with specific Gene Ontology (GO) terms. This approach allowed us to identify potential functional associations of nORFs and cORFs with various cellular processes based on the patterns of coexpression.
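The idea of testing whether higher-ranked genes are preferentially annotated with a term can be sketched with a running-sum enrichment score. The example below uses the classic unweighted form of the statistic and is only an illustration: the function name `enrichment_score`, the gene names, and the gene sets are hypothetical, the significance testing step is omitted, and the analysis in the paper may differ in weighting and implementation.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from most to least coexpressed with the query ORF.
    gene_set: genes annotated with the term of interest (for example, a GO term).
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up at annotated genes, step down at all others.
        running += 1.0 / n_hits if is_hit else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical coexpression partners of one ORF, most coexpressed first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]

top_heavy = enrichment_score(ranked, {"g1", "g2"})     # annotated genes at the top
bottom_heavy = enrichment_score(ranked, {"g5", "g6"})  # annotated genes at the bottom
print(top_heavy, bottom_heavy)
```

A score near 1 means the annotated genes cluster among the most strongly coexpressed partners, while a score near -1 means they cluster at the bottom of the ranking.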
See the Methods section of our paper for details.
Main Navigation: At the top, you'll find a navigation bar titled 'ORF Information App.'
This contains two main tabs: 'Search' and 'About.'
Accessing the Search Tab: Click on the 'Search' tab to begin exploring ORF data.
Entering ORF Name: In the sidebar, there's a text input field labeled 'Enter your ORF name here.' Type the name of the ORF you're interested in (e.g., orf14870, YBR196C, PGI1).
Selecting Result Type: Below the text input, choose the type of results you want to view: 'Coexpression' or 'Sequence.'
Submit Your Query: After entering the ORF name and selecting the result type, click the 'Submit' button to proceed.
Sequence Information: If you chose 'Sequence' in the result type:
A section will appear displaying the sequence information of the ORF, including the CDS (coding sequence), the amino acid sequence, and the genomic coordinates. You can copy the sequences to the clipboard for further use.
Coexpression Information: If 'Coexpression' is selected:
The app will display coexpression information related to the ORF.
This includes gene set enrichment analysis (GSEA) results, which correspond to Supplementary Data 5 in Rich et al., and a visualization of the coexpression network, which corresponds to Supplementary Data 4 in Rich et al. Both can be used to study the ORF's relationship with other genes and its potential cellular roles.
GSEA results are displayed in a table, which can be sorted by clicking on the column headers. The table contains the following columns:
Gene Set Enrichment Analysis Example
Coexpression Data
You can use the sliders and filters to adjust the thresholds and refine the coexpression network.