Sage Available and Transition Datasets
Sage Available Datasets are complete, global coherent datasets (GCD) freely available from the Sage Commons Repository. The following table provides specifications of the datasets currently available and a PMID reference link or a text description. Researchers need only enter their name, organizational affiliation and email address on the download page. The repository packages include a readme file with descriptions and references and a (large!) compressed file with datasets and analyses. Researchers should send feedback on the program as well as questions, comments, and suggestions to repdata@sagebase.org.
Please note that, to ensure proper acknowledgement, the datasets are provided under a Creative Commons CCBY3 license (http://creativecommons.org/licenses/by-sa/3.0/) while the code and software are licensed under an Apache Software Foundation Version 2 License (http://www.apache.org/licenses/LICENSE-2.0.html). License text is included in each of the data packages. Downloading these data or software packages confirms your agreement with the terms of these licenses.
The Sage Transition Dataset table has the specifications of global coherent datasets that are in the process of being made publicly available from the Sage Commons repository. Interested researchers should check the Repository Download Page as partial datasets may be available. For more information on the status of the datasets contact repdata@sagebase.org.
Sage Available Datasets

| Name | Tumor/Tissue Type | Species | Disease | Approx. Num. Individuals | Investigator | Institution | Reference PMID/Description |
|---|---|---|---|---|---|---|---|
| Human_Cancer_Glioblastoma_TCGA | Glioblastoma | Human | Cancer | 465 | TCGA | TCGA | 18772890 |
| Mouse_Metabolic_Liver_UCLA | Liver | Mouse | Metabolic | 111 | Jake Lusis | UCLA | 12646919 |
| Human_Cancer_TCGA | Multiple | Human | Cancer | >2,400 | TCGA | TCGA | TCGA |
| Mouse_CVD_Adipose_Liver_Brain_Muscle_UCLA | Adipose, Liver, Brain, Muscle | Mouse | CVD | 334 | Jake Lusis | UCLA | description |
| Human_Cancer_HCC_HKU | HCC | Human | Cancer | 250 | John Luk | HKU | description |
| Human_CVD_Liver_Vanderbilt/Pittsburg/StJudes | Liver | Human | CVD | 517 | Guengrich/Strom/Schuetz | Vanderbilt/Pittsburg/StJudes | 18462017 |
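The Reference column in the table above holds one of three kinds of value: a numeric PMID, the literal `TCGA`, or the word `description`, which points to the Descriptions section further down. The following is a minimal sketch of how a script might resolve those values. It assumes the public PubMed URL pattern; the function name `resolve_reference` is hypothetical and is not part of any Sage tool.

```python
# Assumption: PubMed records are reachable at this public URL pattern.
PUBMED_BASE = "https://pubmed.ncbi.nlm.nih.gov/"


def resolve_reference(value: str) -> str:
    """Turn a 'Reference PMID/Description' cell into something a reader can follow."""
    value = value.strip()
    if value.isdigit():
        # Purely numeric cells are treated as PubMed identifiers.
        return f"{PUBMED_BASE}{value}/"
    if value.lower() == "description":
        # The table defers to the free-text Descriptions section.
        return "see Descriptions section"
    # Anything else (for example "TCGA") is passed through unchanged.
    return value


for cell in ["18772890", "12646919", "TCGA", "description"]:
    print(cell, "->", resolve_reference(cell))
```

Passing unrecognized values through unchanged keeps the sketch from guessing at references the table does not spell out.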
Sage Transition Datasets

| Name | Tumor/Tissue Type | Species | Disease | Approx. Num. Individuals | Investigator | Institution | Reference PMID/Description |
|---|---|---|---|---|---|---|---|
| Human_Cancer_Breast_BCCA | Breast | Human | Cancer | 1,500 | Aparicio/Caldas | BCCA, Cambridge | description |
| Human_Neuro-degenerative_Brain:Prefrontal cortex_Visual Cortex_Cerbellum_HBTRC | Brain: Prefrontal Cortex, Visual Cortex, Cerebellum | Human | Neuro-degenerative | 700 | Francine Benes | HBTRC | description |
| Human_Cancer_AML(pediatric)_FHCRC | AML (pediatric) | Human | Cancer | 200 | Soheil Meshinchi | FHCRC | description |
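The dataset names in both tables appear to follow a `Species_Disease_Tissue(s)_Source` pattern, with underscores as separators and zero or more tissue fields in the middle. That convention is inferred from the table rows, not documented by Sage, so the sketch below should be read as an assumption. `DatasetName` and `parse_dataset_name` are hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple


class DatasetName(NamedTuple):
    species: str
    disease: str
    tissues: List[str]  # may be empty, e.g. for Human_Cancer_TCGA
    source: str         # investigator site or consortium suffix


def parse_dataset_name(name: str) -> DatasetName:
    """Split a dataset name on underscores, assuming the inferred field order."""
    parts = [p.strip() for p in name.split("_")]
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 underscore-separated fields: {name!r}")
    # First field: species; second: disease; last: source; the rest: tissues.
    return DatasetName(parts[0], parts[1], parts[2:-1], parts[-1])


for name in [
    "Human_Cancer_Glioblastoma_TCGA",
    "Human_Cancer_TCGA",
    "Mouse_CVD_Adipose_Liver_Brain_Muscle_UCLA",
    "Human_Cancer_AML(pediatric)_FHCRC",
]:
    print(parse_dataset_name(name))
```

Names are kept exactly as they appear in the tables, including the source suffixes, so the parsed fields can be matched back to the repository packages.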
Descriptions:
Mouse_CVD_Adipose_Liver_Brain_Muscle_UCLA
C57BL/6J and C3H/HeJ inbred mouse strains exhibit dramatically different phenotypes on the Apoe null background. To identify the genes that contribute to these differences, we constructed an F2 intercross between the B6.Apoe-/- and C3H.Apoe-/- strains consisting of 334 animals. The mice were fed a chow diet until 8 weeks of age, then fed a high-fat (42% fat) "western" diet for 16 weeks to exacerbate the phenotypes, and were euthanized at 24 weeks of age via cervical dislocation. Prior to death, mice were fasted for 4 hours in the morning, anesthetized using isoflurane, and weighed. Blood was collected by retro-orbital bleed; plasma was frozen at -80°C. We measured plasma cholesterol, HDL, LDL, triglycerides, free fatty acids, glucose, insulin, leptin, adiponectin and PON1 activity levels. Liver, brain, skeletal muscle (hamstring) and adipose (gonadal fat pad) were flash frozen in liquid nitrogen. RNA was isolated from the tissues using the Trizol method and used in microarray analysis on a custom 60-mer Agilent chip. Hepatic cholesterol, triglyceride and free fatty acid levels were also measured. Hearts and aortae were extracted, perfused and fixed for atherosclerotic lesion analysis. The aortic arch was serially sectioned through to the aortic sinus, with every fifth 10 µm section stained with hematoxylin and Oil Red O, which specifically stains lipids. Slides were examined by light microscopy. The fatty streak lesion area was quantified using an ocular with a grid; forty sections per mouse were quantified and averaged. Vascular calcification and aneurysm formation were also measured in a semi-quantitative manner based on presence or absence and on size or severity. DNA was isolated from kidney using a phenol-chloroform extraction method. The mice were genotyped at 1500 SNPs using the ParAllele molecular inversion probe technology; 1353 SNPs passed quality control, for a final marker density of 1.5 cM.
References:
1. Ghazalpour, A., et al., Integrating genetic and network analysis to
characterize genes related to mouse weight. PLoS Genet, 2006. 2(8):
p. e130.
2. Itoh, Y., et al., Dosage compensation is less effective in birds than in
mammals. J Biol, 2007. 6(1): p. 2.
3. Lum, P.Y., et al., Elucidating the murine brain transcriptional network in
a segregating mouse population to identify core functional modules for
obesity and diabetes. J Neurochem, 2006. 97 Suppl 1: p. 50-62.
4. Meng, H., et al., Identification of Abcc6 as the major causal gene for
dystrophic cardiac calcification in mice through integrative genomics.
Proc Natl Acad Sci U S A, 2007. 104(11): p. 4530-5.
5. Schadt, E.E., et al., Mapping the genetic architecture of gene expression in
human liver. PLoS Biol, 2008. 6(5): p. e107.
6. van Nas, A., et al., Elucidating the role of gonadal hormones in sexually
dimorphic gene coexpression networks. Endocrinology, 2009. 150(3):
p. 1235-49.
7. Wang, S.S., et al., Identification of pathways for atherosclerosis in mice:
integration of quantitative trait locus analysis and global gene
expression data. Circ Res, 2007. 101(3): p. e11-30.
8. Yang, X., et al., Tissue-specific expression and regulation of sexually
dimorphic genes in mice. Genome Res, 2006. 16(8): p. 995-1004.
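The lesion quantification described for Mouse_CVD_Adipose_Liver_Brain_Muscle_UCLA (every fifth 10 µm section stained, forty sections per mouse averaged) can be sketched as follows. The per-section areas are invented placeholder values in arbitrary grid units, since the description does not state units, and the function names are hypothetical.

```python
from statistics import mean

SECTION_THICKNESS_UM = 10   # from the description: 10 µm sections
STAIN_EVERY_NTH = 5         # from the description: every fifth section stained
SECTIONS_PER_MOUSE = 40     # from the description: forty sections quantified


def mean_lesion_area(section_areas):
    """Average the per-section lesion areas for one mouse."""
    if len(section_areas) != SECTIONS_PER_MOUSE:
        raise ValueError(f"expected {SECTIONS_PER_MOUSE} sections, got {len(section_areas)}")
    return mean(section_areas)


def sampled_span_um(n_sections=SECTIONS_PER_MOUSE):
    """Distance between the first and last stained section, in micrometres."""
    spacing = STAIN_EVERY_NTH * SECTION_THICKNESS_UM  # 50 µm between stained sections
    return (n_sections - 1) * spacing


# Invented example data: 40 areas in arbitrary grid units (not real measurements).
areas = [1000.0 + 10 * i for i in range(SECTIONS_PER_MOUSE)]
print("mean lesion area:", mean_lesion_area(areas))
print("sampled span (µm):", sampled_span_um())
```

Under these assumptions, the forty stained sections are spaced 50 µm apart and cover a little under 2 mm of tissue.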
Human_Cancer_HCC_HKU
The HKU Hepatocarcinoma study (HKU-HCC) aimed to characterize the process of tumorigenesis in hepatocellular carcinoma (HCC) using genotyping, gene expression profiling and clinical endpoints in adjacent normal (AN) and tumor (TU) samples representing, respectively, the pre-cancer state and the results of tumor evolution. The HKU-HCC-100 is a subset of 100 matched pairs of TU and AN liver tissue samples collected from Asian subjects undergoing surgical resection for treatment of HCC. These 200 samples represent a subset of the 250 matched TU and AN pairs screened. DNA was isolated from all AN and TU tissues and genotyped on the Illumina 650Y SNP genotyping array, representing 655,352 tag SNP markers. Copy number aberration markers (sCNV markers) were then imputed for 32,711 locations in the genome from this high-density SNP panel. RNA samples were profiled on a custom Affymetrix microarray comprising oligonucleotide probes targeting transcripts that represent 37,585 known and predicted genes, including high-confidence non-coding RNA sequences.
References:
1. Zhang, B. and Horvath, S., A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol, 2005. 4: Article 17.
2. Langfelder, P., Zhang, B. and Horvath, S., Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut library for R. 2007.
Human_Cancer_Breast_BCCA
METABRIC database: 1600 breast cancers (all subtypes), generated jointly by Drs. Sam Aparicio at BCCA and Carlos Caldas at CRC Cambridge, UK, as part of the METABRIC project. Data types include DNA variation (Affymetrix SNP6.0), gene expression (Illumina Infinium II Bead arrays) and clinical outcome (5-year minimum outcomes information).
Human_Neuro-degenerative_Brain:Prefrontal cortex_Visual Cortex_Cerbellum_HBTRC
The ~700 individuals in this dataset comprise approximately 400 Alzheimer's disease (AD) cases, 220 Huntington's disease cases and 400 controls matched for age, gender, and post mortem interval (PMI). These individuals were genotyped at around 1M SNPs. Three brain regions (cerebellum, visual cortex, and dorsolateral prefrontal cortex) from the same individuals were profiled using a high-density microarray covering mRNAs, splice variants, miRNAs and known ncRNAs. Available clinical outcomes include age at onset, age at death, Braak scores, Vonsattel scores, and regional brain enlargement/atrophy.
Human_Cancer_AML(pediatric)_FHCRC
The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) Initiative is an effort to apply the tools of modern genetics to the discovery of valid therapeutic targets in childhood cancers so that new, more effective treatments can be rapidly developed. The initial initiative included childhood ALL and neuroblastoma. More recently, NCI/CTEP expanded the original TARGET initiative to include additional diseases, one of which was childhood AML. TARGET AML proposes to use whole-genome approaches on multiple platforms to generate data that identify patients at high risk of failure. The patient population includes a total of 200 patients with no known high-risk factors who achieved an initial remission, approximately 80 of whom went on to have a leukemic relapse. For studies using genomic DNA (SNP/CGH, sequencing, methylation), matching tumor and germline DNA will be used to distinguish disease-associated events from germline polymorphisms more accurately. As part of this project, we are obtaining SNP genotyping data from matching tumor and germline DNA for copy number and LOH determination, expression data from Gene ST human exon arrays, and methylation status using the Illumina methylation arrays. We are also seeking additional funds to perform miRNA sequencing in the same cohort of patients. Furthermore, a contractor selected by NCI/CTEP will perform transcriptome sequencing or exome sequencing in this cohort of patients.
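To illustrate why matched tumor and germline DNA helps with LOH determination, the sketch below flags SNPs that are heterozygous in the germline sample but homozygous in the tumor. This is a deliberately simplified illustration, not the TARGET AML pipeline: real analyses typically work from allele intensities and account for tumor purity. The SNP identifiers, genotype encoding (`"AB"`, `"AA"`, `"BB"`) and function names are all hypothetical.

```python
def is_heterozygous(genotype: str) -> bool:
    """True for a two-allele call whose alleles differ, e.g. 'AB'."""
    return len(genotype) == 2 and genotype[0] != genotype[1]


def call_loh(germline: dict, tumor: dict) -> list:
    """Return SNP ids that are heterozygous in germline and homozygous in tumor.

    SNPs missing from the tumor sample are skipped, and germline-homozygous
    SNPs are uninformative for LOH and are never reported.
    """
    loh_snps = []
    for snp_id, germline_gt in germline.items():
        tumor_gt = tumor.get(snp_id)
        if tumor_gt is None:
            continue
        if is_heterozygous(germline_gt) and not is_heterozygous(tumor_gt):
            loh_snps.append(snp_id)
    return loh_snps


# Invented example genotypes for one patient (not real data).
germline = {"snp1": "AB", "snp2": "AA", "snp3": "AB", "snp4": "AB"}
tumor = {"snp1": "AA", "snp2": "AA", "snp3": "AB"}
print(call_loh(germline, tumor))
```

Without the germline sample, a homozygous tumor call at `snp1` could not be told apart from an inherited homozygous genotype such as the one at `snp2`.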
