Repository
We manually curated and harmonized the metadata across datasets. To do so, we used the Experimental Factor Ontology (EFO) and a controlled vocabulary to organize all entries in the repository, which supports consistent and efficient content indexing and data retrieval.
Samples in DISCO are classified into two groups: Tissue and Cell Line. Tissue samples are derived from human tissues, while Cell Line samples include commercialized cell lines or organoids. Each sample type has unique metadata fields, while also sharing some common descriptors.
Common Metadata Fields
Sample ID
Original sample ID in GEO, ArrayExpress and other databases
Project ID
Original project ID in GEO, ArrayExpress and other databases
Sample type
In the DISCO (Database of Immune Cell Atlas) platform, each sample is assigned to a specific category to facilitate systematic classification and analysis. Detailed information about the sample types is provided on this page.
Platform
The platform used for generating the library (e.g., 10x 3', 10x 5')
RNA source
The RNA source, which can be either the whole cell or the nucleus
Number of cells
The number of cells in the processed data
Median UMI
The median UMI count across all cells
Other metadata
Any supplementary information, beyond our predefined metadata fields, that may be relevant to a given study
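As a rough illustration, the common fields above could be held in a simple record. This is only a sketch, not DISCO's actual schema or API: the class name `CommonMetadata`, the attribute names, the `"cell"`/`"nucleus"` encoding of the RNA source, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CommonMetadata:
    """Hypothetical container for the common metadata fields (illustrative only)."""

    sample_id: str        # original sample ID, e.g. from GEO or ArrayExpress
    project_id: str       # original project ID
    sample_type: str      # e.g. "control"
    platform: str         # e.g. "10x 3'" or "10x 5'"
    rna_source: str       # assumed encoding: "cell" (whole cell) or "nucleus"
    number_of_cells: int  # cells in the processed data
    median_umi: float     # median UMI count across all cells
    other_metadata: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Basic sanity checks; the allowed values are assumptions of this sketch.
        if self.rna_source not in {"cell", "nucleus"}:
            raise ValueError(f"unexpected RNA source: {self.rna_source!r}")
        if self.number_of_cells < 0:
            raise ValueError("number_of_cells must be non-negative")


# Made-up example values, not a real DISCO entry.
example = CommonMetadata(
    sample_id="SAMPLE_001",
    project_id="PROJECT_001",
    sample_type="control",
    platform="10x 3'",
    rna_source="cell",
    number_of_cells=5000,
    median_umi=2500.0,
)
```

Free-form details that do not fit a predefined field would go into `other_metadata` as key/value pairs.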
Metadata for Tissue Sample
Field
Tissue
Source tissue
Anatomical site
More detailed information on the tissue
Disease
The disease status of the sample
Age group
The fetal age or approximate age group of the donor
Cell sorting
The sorting strategy used
Genotype
The genotype information, such as CNV and SNV
Treatment
Information about the treatment applied to the sample
Time point
The time point of the treatment or other relevant events
Subject ID
The donor ID, typically used when there are multiple samples from a single donor
Age
The chronological age of the donor
Gender
The gender of the donor
Race
The racial/ethnic information of the sample
Infection
The viral or bacterial infection status of the sample
Batch
The batch information, typically used when there are multiple batches in the study
Disease stage
The cancer stage, which describes how large the tumor is and whether it has spread
Disease grade
The cancer grade, which describes how closely the cancer cells resemble normal cells
Disease subtype
Subtype of disease, such as triple negative breast cancer
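Because Subject ID links multiple samples from a single donor, one plausible use is to group samples by donor before analysis. The sketch below assumes each sample is a plain dictionary with hypothetical keys `sample_id` and `subject_id`; the helper name `group_by_subject` and the example records are invented for illustration and are not part of DISCO.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_subject(samples: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each subject ID to the IDs of the samples that share it.

    Samples without a subject ID are skipped, since the field is optional.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample in samples:
        subject = sample.get("subject_id")
        if subject:
            groups[subject].append(sample["sample_id"])
    return dict(groups)


# Made-up records: two samples from one donor, one from another, one unlabeled.
samples = [
    {"sample_id": "S1", "subject_id": "donor_A", "tissue": "lung"},
    {"sample_id": "S2", "subject_id": "donor_A", "tissue": "blood"},
    {"sample_id": "S3", "subject_id": "donor_B", "tissue": "lung"},
    {"sample_id": "S4", "tissue": "liver"},
]
donor_groups = group_by_subject(samples)
```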
Sample Type
control
Sample without reported disease
disease tissue (non-cancer)
Tissue from a non-cancerous disease
blood cancer
Sample from a blood cancer, such as AML (acute myeloid leukemia)
adjacent normal
Normal tissue adjacent to a tumor
tumor tissue
Primary tumor tissue sample
other tissue from tumor patient
Sample from a tumor patient, but not from a primary or metastatic tumor
metastatic tumor
Metastatic tumor tissue
allograft
Tissue that is transplanted from one person to another
experimental treatment
Sample undergoing experimental treatment
Metadata for Cell Line Data
Cell line studies, which often involve treatments or induced differentiation, are now accompanied by expanded metadata that includes information on the source cells, induced cell types, and treatment protocols.
Field
Source cell line
The source cell line
Source tissue
The tissue of origin of the source cell line
Source disease
The disease status of the source cell line
Source cell type
The cell type of the source cell line
Induced cell tissue
The induced cell type or organoid
Treatment
Information about the treatment applied to the sample
Collect time
The sample collection time after treatment
Genotype
The genotype information, such as CNV and SNV
Sample Type
cell line
Cultures of cells that can be propagated repeatedly
organoid
Organ-like structure grown in vitro in three dimensions
primary cell culture
Cultures established by cultivation of cells which have been released from organ fragments
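The two Sample Type lists act as a controlled vocabulary, so a sample's group (Tissue or Cell Line) can be derived from its sample type. The following is a minimal sketch under that assumption: the labels are copied from the lists on this page, while the constant names, the function `sample_group`, and the exact-match (case-sensitive) comparison are choices made for this example and not DISCO's implementation.

```python
# Sample type labels as listed on this page.
TISSUE_SAMPLE_TYPES = frozenset({
    "control",
    "disease tissue (non-cancer)",
    "blood cancer",
    "adjacent normal",
    "tumor tissue",
    "other tissue from tumor patient",
    "metastatic tumor",
    "allograft",
    "experimental treatment",
})

CELL_LINE_SAMPLE_TYPES = frozenset({
    "cell line",
    "organoid",
    "primary cell culture",
})


def sample_group(sample_type: str) -> str:
    """Return "Tissue" or "Cell Line" for a known sample type label.

    Raises ValueError for labels outside the controlled vocabulary.
    """
    if sample_type in TISSUE_SAMPLE_TYPES:
        return "Tissue"
    if sample_type in CELL_LINE_SAMPLE_TYPES:
        return "Cell Line"
    raise ValueError(f"unknown sample type: {sample_type!r}")
```

For example, `sample_group("organoid")` returns `"Cell Line"`, while an unlisted label such as `"xenograft"` raises `ValueError`, which makes vocabulary drift easy to spot during curation.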