Data Repository
Data Release 4.0 - April 2011
Sage Bionetworks has established a catalogue of datasets for use in integrative genomics analysis and in building predictive computational disease models. The goal is to collate, curate and host these datasets for use by the entire research community. Later this year the Repository will be hosted within a new interactive software platform designed to facilitate the active sharing and evolution of datasets, disease models and computational tools.
Datasets are listed in the repository under one of three categories: (A) data currently available for download, (B) pending datasets that will soon be made available, and (C) other notable datasets that have been identified and/or are currently being generated. Data in category A is publicly available. To access these datasets, register on the Repository Access Page and then confirm agreement with the user license. Licenses vary by data package and include requirements to properly cite and acknowledge the data and model sources in all publications resulting from use of these datasets.
Datasets in the Sage Bionetworks Repository are selected based on their utility for modeling techniques. Global coherent datasets (GCDs) are the most powerful for this purpose and contain three layers of information: genome-wide DNA variation, genome-wide intermediate traits, and phenotypes. Intermediate traits are typically gene expression profiles, but may also include proteomic, metabolomic, RNA-seq and other molecular data. Although current Sage Bionetworks efforts are focused on providing access to global coherent datasets, we will also offer, as they become available, other useful genomic datasets that contain phenotypic traits in conjunction with either DNA variation or intermediate trait data. Each dataset is distributed as a data package under the parameters described here.
For more information you can also:
- Read about Sage Bionetworks' Strategy for community-based integrative genomics modeling.
- Download the terms of use for Sage Bionetworks Repository data, tools and models as well as the guidelines for the data curation process.
Contribute to Community-Based Data Acquisition
Tell us about a dataset you have generated or one you are interested in accessing through this repository:
