scEnrichment

Last updated 6 months ago

scEnrichment identifies cell types based on differentially expressed genes (DEGs) provided by the user. This is done by measuring the level of overlap between the user DEGs and the DEG genesets generated with the DISCO Atlases.

Input

scEnrichment accepts two inputs from the user:

  1. DEG list

  2. logFC of each DEG (optional)

NOTE

  • At least 5 DEGs must be provided.

  • Only positively regulated DEGs should be provided.

  • DEGs do not need to be ordered.

  • It is recommended to first filter DEGs by your p-value threshold, then sort them by absolute logFC, and submit the DEGs with the highest values as input.
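
The recommendations above can be sketched as a small pre-processing step. This is only an illustration, not part of scEnrichment itself: the `Deg` record, the `select_input_degs` helper, the 0.05 cutoff, the `top_n` limit, and the gene names are all hypothetical, and "positively regulated" is assumed to mean logFC > 0.

```python
from typing import NamedTuple


class Deg(NamedTuple):
    gene: str       # gene symbol
    log_fc: float   # log fold change from your own DE analysis
    p_value: float  # p-value from your own DE analysis


def select_input_degs(degs, p_cutoff=0.05, top_n=100):
    """Filter by p-value, keep upregulated DEGs, return the top_n by |logFC|."""
    # Step 1: filter on the p-value tolerance and keep positive logFC only.
    kept = [d for d in degs if d.p_value <= p_cutoff and d.log_fc > 0]
    # Step 2: sort by absolute logFC, largest first.
    kept.sort(key=lambda d: abs(d.log_fc), reverse=True)
    selected = kept[:top_n]
    # scEnrichment requires at least 5 DEGs.
    if len(selected) < 5:
        raise ValueError("scEnrichment needs at least 5 DEGs")
    return selected


# Hypothetical DE results, for illustration only.
results = [
    Deg("GENE_A", 3.1, 0.001),
    Deg("GENE_B", 0.8, 0.020),
    Deg("GENE_C", -2.4, 0.001),  # downregulated: dropped
    Deg("GENE_D", 1.9, 0.300),   # fails the p-value cutoff: dropped
    Deg("GENE_E", 2.2, 0.004),
    Deg("GENE_F", 1.1, 0.010),
    Deg("GENE_G", 4.0, 0.0005),
    Deg("GENE_H", 1.5, 0.030),
]
selected = select_input_degs(results, p_cutoff=0.05, top_n=5)
gene_list = [d.gene for d in selected]          # DEG list input
logfc_list = [d.log_fc for d in selected]       # optional logFC input
```

The resulting `gene_list` (and, optionally, `logfc_list`) is what would be pasted into the tool.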

Workflow

Obtain User Input

The user provides the DEGs and their direction of regulation.

Odds Ratio Table Generation

We perform geneset enrichment by comparing the DEGs in the user's input to every available geneset in the DISCO atlases, generating an odds ratio table for each to determine whether the input DEGs match a set of DEGs already found in an atlas.

Three genesets are relevant when deriving the odds ratio table. First, the 'Atlas Geneset', a collection of genesets from a specific atlas (in this case, Adipose) containing DEGs from comparisons either between cell types or across phenotypes (e.g. disease states). Second, the 'Derived Geneset', a single geneset taken from the Atlas Geneset. Lastly, the user's own DEG geneset ('Input DEG').

The Input DEG geneset is first referenced against the adipose atlas genesets, and any DEGs in the Input DEG geneset that are absent from the adipose atlas genesets are removed (DEG 6 in this case). An odds ratio table is then constructed and an odds ratio (OR) is calculated, with a higher odds ratio (>1) indicating a stronger association between the user input and the particular derived geneset. Fisher's exact test is used to determine whether the odds ratio is significantly different from 1. This process is repeated for every geneset within the atlas and across every atlas. The top 20 most highly enriched genesets are returned and visualized as an output table and bar graph.
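
The odds ratio step can be sketched as follows. This is a minimal illustration under stated assumptions, not DISCO's actual code: the function names and gene names are hypothetical, the gene universe is assumed to be the union of genes in the atlas genesets, and the test is assumed to be one-sided (enrichment), which the text above does not specify.

```python
from math import comb


def odds_ratio_table(input_degs, derived_geneset, atlas_genes):
    """Return the 2x2 table (a, b, c, d) for one derived geneset."""
    universe = set(atlas_genes)
    derived = set(derived_geneset) & universe
    degs = set(input_degs) & universe       # drop input DEGs absent from the atlas
    a = len(degs & derived)                 # in input, in derived geneset
    b = len(degs - derived)                 # in input, not in derived geneset
    c = len(derived - degs)                 # not in input, in derived geneset
    d = len(universe - degs - derived)      # in neither
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """OR = (a*d)/(b*c); infinite when b*c is zero and a*d is positive."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value: P(overlap >= a) under the hypergeometric null."""
    n = a + b + c + d
    row1 = a + b                            # input DEGs kept
    col1 = a + c                            # derived geneset size
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(n, row1)


# Hypothetical toy atlas: DEG6 is absent from it and is removed from the input.
atlas_genes = ["DEG1", "DEG2", "DEG3", "DEG4", "DEG5", "G1", "G2", "G3", "G4", "G5"]
derived_geneset = ["DEG1", "DEG2", "DEG3", "G1"]
input_degs = ["DEG1", "DEG2", "DEG3", "DEG4", "DEG5", "DEG6"]

table = odds_ratio_table(input_degs, derived_geneset, atlas_genes)
or_value = odds_ratio(*table)
p_value = fisher_exact_greater(*table)
```

In this toy case the table is (3, 2, 1, 4), giving an odds ratio of 6. Repeating the calculation for every derived geneset and ranking the results would correspond to the top-20 output described above.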

Example

Two example files are provided, using DEGs from publications, to demonstrate the effectiveness of scEnrichment in identifying cells across cell types (Example 1) and across phenotypes (Example 2), both with (Example 1) and without (Example 2) logFC values. In both examples, the single-cell expression data is not publicly available and is hence not incorporated in our atlases. Regardless, scEnrichment still successfully identifies the cell types present!

If logFC values are provided, a 'Weighted Fisher's exact test' is used instead, which we previously developed for our EWAS toolkit. A slightly different odds ratio table is constructed, taking into account the weight of each DEG based on its provided logFC, with absent DEGs given a value of 1.
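
One way to picture a weighted odds ratio table is to let each gene contribute its weight to a cell instead of a count of 1. The sketch below is purely illustrative and is not the EWAS toolkit or DISCO implementation: the `weighted_table` helper and gene names are hypothetical, and using the logFC directly as the weight (with weight 1 for every gene that has no logFC) is an assumption, since the exact weighting scheme is not described here.

```python
def weighted_table(input_logfc, derived_geneset, atlas_genes):
    """Return a weighted 2x2 table (a, b, c, d) where cells sum gene weights."""
    universe = set(atlas_genes)
    derived = set(derived_geneset) & universe
    degs = set(input_logfc) & universe      # drop input DEGs absent from the atlas

    # Assumption: weight = provided logFC; genes without a logFC get weight 1.
    weight = {g: 1.0 for g in universe}
    for gene in degs:
        weight[gene] = float(input_logfc[gene])

    a = sum(weight[g] for g in degs & derived)
    b = sum(weight[g] for g in degs - derived)
    c = sum(weight[g] for g in derived - degs)
    d = sum(weight[g] for g in universe - degs - derived)
    return a, b, c, d


# Hypothetical toy data; DEG6 is absent from the atlas and is ignored.
atlas_genes = ["DEG1", "DEG2", "DEG3", "DEG4", "DEG5", "G1", "G2", "G3", "G4", "G5"]
derived_geneset = ["DEG1", "DEG2", "DEG3", "G1"]
input_logfc = {"DEG1": 3.0, "DEG2": 2.0, "DEG3": 1.0,
               "DEG4": 1.5, "DEG5": 1.0, "DEG6": 4.0}

a, b, c, d = weighted_table(input_logfc, derived_geneset, atlas_genes)
weighted_or = (a * d) / (b * c)
```

With these toy weights, the overlapping DEGs with large logFC values push the weighted odds ratio above the unweighted value for the same gene sets.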

DISCO Atlases
EWAS toolkit