scEnrichment
scEnrichment identifies cell types based on DEGs provided by the user. This is done by measuring the overlap between the user's DEGs and the reference DEG genesets in the DISCO atlas.
scEnrichment takes two inputs from the user, the second of which is optional:
DEG list
LogFC of each DEG (optional)
The user provides the DEGs and, optionally, their direction of regulation.
We perform geneset enrichment by comparing the DEGs in the user's input with every available geneset in the DISCO atlas. For each comparison, an odds ratio table is generated to determine whether the input DEGs match a set of DEGs already found in the atlas.
Three genesets are relevant when deriving the odds ratio table. First, the 'Atlas Geneset' is the collection of genesets in a specific atlas (in this case, Adipose), each containing the DEGs from a comparison either between cell types or across phenotypes (e.g. disease states). Second, the 'Derived Geneset' is a single geneset taken from the Atlas Geneset. Last, the 'Input DEG' is the DEG geneset provided by the user.
The Input DEG geneset is first checked against the adipose Atlas Geneset, and any DEGs in the Input DEG that are absent from the adipose Atlas Geneset are removed (DEG 6 in this case). An odds ratio table is then constructed and an odds ratio (OR) is calculated, with an OR greater than 1 indicating a positive association between the user input and the particular Derived Geneset. Fisher's exact test is used to determine whether the OR is significantly different from 1. This process is repeated for every geneset within the atlas and then across every atlas. The 20 most highly enriched genesets are returned and visualized as an output table and a bar graph.
Two example files are provided, using DEGs from publications, to demonstrate how scEnrichment identifies cells across cell types (Example 1) and across phenotypes (Example 2), both with (Example 1) and without (Example 2) logFC values. In both examples, the single-cell expression data are not publicly available and are therefore not incorporated in our atlases. Regardless, scEnrichment still successfully identifies the cell types present.
If logFC values are provided, a 'Weighted Fisher's Exact test' is used instead, which we previously developed for our . A slightly different odds ratio table is constructed that weights each DEG by its provided logFC, with absent DEGs given a value of 1.
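One way to picture a weighted odds ratio table is to sum gene weights in each cell instead of counting genes. The sketch below is only an illustration under stated assumptions and is not the published Weighted Fisher's Exact test. It assumes the weight of a user DEG is the absolute value of its logFC and that genes absent from the user input get a weight of 1. The function name `weighted_table` is hypothetical, and no p-value is computed.

```python
def weighted_table(logfc_by_gene, derived_geneset, universe):
    """Weighted 2x2 table: each cell sums gene weights rather than counting genes.

    Assumptions (not confirmed by the documentation): weight = |logFC| for
    user DEGs, and weight = 1 for genes absent from the user input.
    """
    derived = set(derived_geneset) & universe
    # Keep only user DEGs that exist in the atlas, weighted by |logFC|
    kept = {gene: abs(lfc) for gene, lfc in logfc_by_gene.items() if gene in universe}
    a = sum(w for gene, w in kept.items() if gene in derived)       # input and derived
    b = sum(w for gene, w in kept.items() if gene not in derived)   # input only
    c = float(len(derived - kept.keys()))                           # derived only, weight 1
    d = float(len(universe - kept.keys() - derived))                # neither, weight 1
    return a, b, c, d


# Illustrative call with made-up gene names and logFC values
universe = {f"g{i}" for i in range(1, 11)}
a, b, c, d = weighted_table(
    {"g1": 2.0, "g2": -3.0, "g3": 1.0, "g5": 0.5, "x6": 4.0},
    {"g1", "g2", "g3", "g4"},
    universe,
)
weighted_or = (a * d) / (b * c)
```

With these weights, strongly regulated DEGs that overlap the derived geneset raise the odds ratio more than weakly regulated ones, which is the intuition behind weighting by logFC.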