GUDMAP Data Overview
This is a working draft. Please send questions and suggestions to email@example.com to help us improve this content.
What is a GUDMAP record?
In the GUDMAP Data Browser, a ‘unit’ of data is referred to as a record. Searches return related records; click the “View” icon to open a record’s page.
Assay Types and their Metadata Models
In GUDMAP, the metadata for each type of data is organized to best represent what users want to know about the data and how useful it is for research. Below is a diagram that represents our different data types, or assay types:
For each assay type, different types of records are assigned to provide metadata, data files and other supplementary information that can help users discover the data, download data and reproduce experiments and studies as desired.
The following diagrams represent the “metadata model”, or a visual representation of how the individual metadata records are related, for each assay type.
Legend for Model Colors
|Elements related directly to the primary assay type of this model.|
|Other assay types that are being linked to the primary assay type.*|
* Keep in mind these related assay types have their own metadata models. (For example, in the Sequencing model, there is a link to Specimen, in yellow, which is a different assay type).
Anatomy of a record
In GUDMAP, data is usually represented with a top-level “base” record. This generally provides that data’s name or title, identifiers, description and metadata.
Other records are then linked to the base record. These can include the data download files, analysis files and other aspects of the data that make sense for that assay type.
This web of linked records plays a significant role in how users can discover data with the faceting sidebar.
The following sections describe the major types of records available in the GUDMAP repository.
For example, in the protocols model above, there is a top-level Protocol record that includes most of the protocol information.
Then records for Author, Subject, Keyword and Figure are linked to the base Protocol record.
Note: This model is based on requirements for Nature protocols.
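The base-record-plus-linked-records pattern described for Protocol can be sketched in a few lines of Python. This is only an illustration, not GUDMAP's actual schema: the class names `ProtocolRecord` and `LinkedRecord`, their fields, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinkedRecord:
    """A record attached to a base record (hypothetical structure)."""
    record_type: str  # e.g. "Author", "Subject", "Keyword", "Figure"
    value: str


@dataclass
class ProtocolRecord:
    """Hypothetical stand-in for a top-level (base) Protocol record."""
    title: str
    description: str = ""
    linked: List[LinkedRecord] = field(default_factory=list)

    def linked_of_type(self, record_type: str) -> List[LinkedRecord]:
        """Return the linked records of one type, in insertion order."""
        return [r for r in self.linked if r.record_type == record_type]


# Sample values are made up for illustration.
protocol = ProtocolRecord(title="Example protocol")
protocol.linked.append(LinkedRecord("Author", "A. Researcher"))
protocol.linked.append(LinkedRecord("Keyword", "kidney"))
protocol.linked.append(LinkedRecord("Keyword", "imaging"))

keywords = [r.value for r in protocol.linked_of_type("Keyword")]
print(keywords)
```

Keeping the linked records in one list on the base record mirrors the idea that the base record is the entry point and everything else hangs off it.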
Here is where we can see how one assay type is linked to other assay types. The squares in yellow indicate “cross-references” where sequencing data is linked with Specimen data which in turn links to anatomical sources, cell types, mouse strain and/or alleles.
Sequencing Data: Studies, Experiments & Replicates
The GUDMAP Hub has modeled our sequencing data to reflect the realities of actual research to help provide the most accurate linking between datasets. Here’s a brief overview to help you understand “where” you are when navigating between them.
A Study describes high-level objectives and the overall design of one or more experiments. A Study may also include study-level analysis files.
An Experiment describes the protocols, procedures, and experiment settings applied to one or more Replicates.
A Replicate contains information about bio-samples as well as its biological and technical replicate numbers.
For single cell RNA-Seq, Replicates may also include:
- A Single Cell Metrics record summarizing the statistics of a replicate
- Replicate-specific files, such as sequencing files and analysis files, which can be uploaded under File and attached to the proper replicate.
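As a rough mental model, the Study, Experiment and Replicate levels nest inside one another. The sketch below shows that nesting with plain Python dataclasses. The field names (`biological_replicate`, `analysis_files`, and so on) and the sample file name are assumptions for illustration and do not reflect GUDMAP's real column names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replicate:
    """One replicate; field names are hypothetical."""
    biological_replicate: int
    technical_replicate: int
    files: List[str] = field(default_factory=list)  # sequencing/analysis files


@dataclass
class Experiment:
    """Protocols and settings applied to one or more replicates."""
    name: str
    replicates: List[Replicate] = field(default_factory=list)


@dataclass
class Study:
    """High-level objectives and design covering one or more experiments."""
    title: str
    experiments: List[Experiment] = field(default_factory=list)
    analysis_files: List[str] = field(default_factory=list)  # study-level files

    def all_replicates(self) -> List[Replicate]:
        """Flatten the replicates of every experiment in this study."""
        return [rep for exp in self.experiments for rep in exp.replicates]


# Sample data is made up for illustration.
study = Study(
    title="Example study",
    experiments=[
        Experiment("Experiment A", [Replicate(1, 1, ["a_r1.fastq.gz"]), Replicate(2, 1)]),
        Experiment("Experiment B", [Replicate(1, 1)]),
    ],
)
replicate_count = len(study.all_replicates())
print(replicate_count)
```

When navigating the browser, a Replicate page corresponds to the innermost level here, and following its links upward leads to the Experiment and then the Study.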
Here you can see Specimen files are linked to other assays (in yellow) such as Imaging data, Specimen Expression Scored, Replicates (from sequencing data), etc.
The squares in green represent linked records of data for that Specimen. Not all specimen records will have such data but they are provided where applicable.
- Imaging data is always linked to a Specimen record.
- Video is also accepted and is linked to a Specimen record, subject to certain requirements (shown in blue text):
- Formats: .avi and .mp4
- Resolution: 1080 or 720
- Aspect ratio: 16:9
- We add a GUDMAP attribution in the bottom left corner and generate YouTube videos.
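Before submitting a video, the listed requirements could be checked locally with something like the sketch below. It is not a GUDMAP tool: the function name `video_problems` is hypothetical, and it assumes that "resolution" refers to the frame height in pixels.

```python
from pathlib import Path
from typing import List

ALLOWED_FORMATS = {".avi", ".mp4"}
ALLOWED_HEIGHTS = {1080, 720}  # assumption: "resolution" means frame height in pixels


def video_problems(filename: str, width: int, height: int) -> List[str]:
    """Return a list of unmet requirements; an empty list means the video passes."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_FORMATS:
        problems.append("format must be .avi or .mp4")
    if height not in ALLOWED_HEIGHTS:
        problems.append("resolution must be 1080 or 720")
    # 16:9 check done with integers to avoid floating-point comparison.
    if width * 9 != height * 16:
        problems.append("aspect ratio must be 16:9")
    return problems


print(video_problems("clip.mp4", 1920, 1080))
print(video_problems("clip.avi", 1440, 1080))
```

The width and height would come from whatever tool you use to inspect the file; the sketch only covers the comparison against the stated requirements.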
- The base record is Antibody. Linked records include:
- Antibody Tests, which in turn are linked to Test methods.
- Antibody media files (Formats: .jpg, .png, and .pdf)
Cell Line and Mouse Strains
As the diagram indicates, there are three separate assay types for this model:
- Parental Cell Line:
- Includes a Pluripotency Validation Asset
- Reporter Cell Line, includes the following linked records:
- Reporter Validation Asset
- Reporter Vector Map
- Targeting Validation Asset
- Mouse Strain: This is a single record capturing information about available mouse strains.
A data collection is an arbitrary grouping of records that may be of different types. The most common use case is to link all the source data for a publication in one location that is citable with a permanent DOI URL. For more information, see What is a Data Collection?.
A collection links to a publication record, Specimen, Sequencing and/or Gene records.