Sage Available and Transition Datasets

Sage Available Datasets are complete, global coherent datasets (GCD) freely available from the Sage Commons Repository. The following table provides specifications of the datasets currently available and a PMID reference link or a text description. Researchers need only enter their name, organizational affiliation and email address on the download page. The repository packages include a readme file with descriptions and references and a (large!) compressed file with datasets and analyses. Researchers should send feedback on the program as well as questions, comments, and suggestions to repdata@sagebase.org.

Please note that, to ensure proper acknowledgement, the datasets are provided under a Creative Commons CCBY3 license (http://creativecommons.org/licenses/by-sa/3.0/) while the code and software are licensed under an Apache Software Foundation Version 2 License (http://www.apache.org/licenses/LICENSE-2.0.html). License text is included in each of the data packages. Downloading these data or software packages confirms your agreement with the terms of these licenses.

Go to Repository Download Page



The Sage Transition Datasets table lists the specifications of global coherent datasets that are in the process of being made publicly available from the Sage Commons Repository. Interested researchers should check the Repository Download Page, as partial datasets may be available. For more information on the status of the datasets, contact repdata@sagebase.org.

Sage Available Datasets

Name | Tumor/Tissue Type | Species | Disease | Approx. Num. Individuals | Investigator | Institution | Reference (PMID/Description)
Human_Cancer_Glioblastoma_TCGA | Glioblastoma | Human | Cancer | 465 | TCGA | TCGA | 18772890
Mouse_Metabolic_Liver_UCLA | Liver | Mouse | Metabolic | 111 | Jake Lusis | UCLA | 12646919
Human_Cancer_TCGA | Multiple | Human | Cancer | >2,400 | TCGA | TCGA | TCGA
Mouse_CVD_Adipose_Liver_Brain_Muscle_UCLA | Adipose, Liver, Brain, Muscle | Mouse | CVD | 334 | Jake Lusis | UCLA | description
Human_Cancer_HCC_HKU | HCC | Human | Cancer | 250 | John Luk | HKU | description
Human_CVD_Liver_Vanderbilt/Pittsburg/StJudes | Liver | Human | CVD | 517 | Guengrich/Strom/Schuetz | Vanderbilt, Pittsburg, StJudes | 18462017

Sage Transition Datasets

Name | Tumor/Tissue Type | Species | Disease | Approx. Num. Individuals | Investigator | Institution | Reference (PMID/Description)
Human_Cancer_Breast_BCCA | Breast | Human | Cancer | 1,500 | Aparicio/Caldas | BCCA, Cambridge | description
Human_Neuro-degenerative_Brain:Prefrontal cortex_Visual Cortex_Cerbellum_HBTRC | Brain: Prefrontal cortex, Visual Cortex, Cerebellum | Human | Neuro-degenerative | 700 | Francine Benes | HBTRC | description
Human_Cancer_AML(pediatric)_FHCRC | AML(pediatric) | Human | Cancer | 200 | Soheil Meshinchi | FHCRC | description
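
The dataset names in both tables appear to follow a Species_Disease_Tissue(s)_Institution pattern. The following is a minimal Python sketch that splits a name into those fields. The pattern itself is inferred from the table rows rather than documented by Sage, and the `DatasetName` class and `parse_dataset_name` function are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetName:
    species: str
    disease: str
    tissue: str
    institution: str


def parse_dataset_name(name: str) -> DatasetName:
    """Split a dataset name on underscores.

    Assumption (inferred from the tables, not an official specification):
    the first field is the species, the second the disease area, the last
    the institution, and everything in between lists tissue types.
    """
    parts = name.split("_")
    if len(parts) < 3:
        raise ValueError(f"unexpected dataset name: {name!r}")
    species, disease, institution = parts[0], parts[1], parts[-1]
    # A name with no tissue field (e.g. Human_Cancer_TCGA) is listed
    # as "Multiple" in the table, so mirror that here.
    tissue = ", ".join(parts[2:-1]) or "Multiple"
    return DatasetName(species, disease, tissue, institution)


if __name__ == "__main__":
    for name in (
        "Human_Cancer_Glioblastoma_TCGA",
        "Mouse_CVD_Adipose_Liver_Brain_Muscle_UCLA",
        "Human_Cancer_TCGA",
    ):
        print(parse_dataset_name(name))
```

Names that contain slashes or colons (for example the Vanderbilt/Pittsburg/StJudes liver dataset) pass through unchanged within their field, since only underscores are treated as separators.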

 

Descriptions:

Mouse_CVD_Adipose_Liver_Brain_Muscle_UCLA

C57BL/6J and C3H/HeJ inbred mouse strains exhibit dramatically different phenotypes on the Apoe null background. To identify the genes that contribute to these differences, we constructed an F2 intercross between the B6.Apoe-/- and C3H.Apoe-/- strains consisting of 334 animals. The mice were fed a chow diet until 8 weeks of age, then fed a high-fat (42% fat) "western" diet for 16 weeks to exacerbate the phenotypes, and were euthanized at 24 weeks of age via cervical dislocation.

Prior to death, mice were fasted for 4 hours in the morning, anesthetized using isoflurane, and weighed. Blood was collected by retro-orbital bleed; plasma was frozen at -80°C. We measured plasma cholesterol, HDL, LDL, triglycerides, free fatty acids, glucose, insulin, leptin, adiponectin and PON1 activity levels. Liver, brain, skeletal muscle (hamstring) and adipose (gonadal fat pad) were flash frozen in liquid nitrogen. RNA was isolated from the tissues using the Trizol method and used for microarray analysis on a custom 60mer Agilent chip. Hepatic cholesterol, triglyceride and free fatty acid levels were also measured.

Hearts and aortae were extracted, perfused and fixed for atherosclerotic lesion analysis. The aortic arch was serially sectioned through to the aortic sinus, with every fifth 10 µm section stained with hematoxylin and Oil Red O, which specifically stains lipids. Slides were examined by light microscopy. The fatty streak lesion area was quantified using an ocular with a grid; forty sections per mouse were quantified and averaged. Vascular calcification and aneurysm formation were also measured in a semi-quantitative manner based on presence or absence and size or severity.

DNA was isolated from kidney using a phenol-chloroform extraction method. The mice were genotyped at 1500 SNPs using the ParAllele molecular inversion probe technology; 1353 SNPs passed quality control, for a final marker density of 1.5 cM.
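
As a quick sanity check, the figures quoted in this description can be combined with simple arithmetic. The Python sketch below uses only numbers stated above. The variable names are illustrative, and the tissue span assumes that the forty quantified sections are consecutive stained sections, which the description does not state.

```python
# Diet timeline: chow until 8 weeks of age, then 16 weeks of western diet.
chow_weeks = 8
western_diet_weeks = 16
age_at_euthanasia_weeks = chow_weeks + western_diet_weeks

# Genotyping quality control: 1353 of 1500 SNPs passed.
snps_genotyped = 1500
snps_passed_qc = 1353
qc_pass_rate = snps_passed_qc / snps_genotyped

# Lesion analysis: every fifth 10 µm section was stained.
section_thickness_um = 10
stain_every_nth = 5
sections_quantified = 40
stained_spacing_um = section_thickness_um * stain_every_nth
# Assumption: the forty quantified sections are consecutive stained sections.
tissue_span_um = sections_quantified * stained_spacing_um

print(f"Age at euthanasia: {age_at_euthanasia_weeks} weeks")
print(f"SNP QC pass rate: {qc_pass_rate:.1%}")
print(f"Spacing between stained sections: {stained_spacing_um} µm")
print(f"Tissue spanned by quantified sections: {tissue_span_um} µm")
```

The computed age matches the stated 24 weeks at euthanasia, roughly 90% of genotyped SNPs were retained, and under the stated assumption the quantified sections span about 2 mm of tissue.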

References:
1. Ghazalpour, A., et al., Integrating genetic and network analysis to
     characterize genes related to mouse weight. PLoS Genet, 2006. 2(8):
     p. e130.
2. Itoh, Y., et al., Dosage compensation is less effective in birds than in
     mammals. J Biol, 2007. 6(1): p. 2.
3. Lum, P.Y., et al., Elucidating the murine brain transcriptional network in
     a segregating mouse population to identify core functional modules for
     obesity and diabetes. J Neurochem, 2006. 97 Suppl 1: p. 50-62.
4. Meng, H., et al., Identification of Abcc6 as the major causal gene for
     dystrophic cardiac calcification in mice through integrative genomics.
     Proc Natl Acad Sci U S A, 2007. 104(11): p. 4530-5.
5. Schadt, E.E., et al., Mapping the genetic architecture of gene expression in
     human liver. PLoS Biol, 2008. 6(5): p. e107.
6. van Nas, A., et al., Elucidating the role of gonadal hormones in sexually
     dimorphic gene coexpression networks. Endocrinology, 2009. 150(3):
     p. 1235-49.
7. Wang, S.S., et al., Identification of pathways for atherosclerosis in mice:
     integration of quantitative trait locus analysis and global gene
     expression data. Circ Res, 2007. 101(3): p. e11-30.
8. Yang, X., et al., Tissue-specific expression and regulation of sexually
     dimorphic genes in mice. Genome Res, 2006. 16(8): p. 995-1004.


Human_Cancer_HCC_HKU

The HKU Hepatocarcinoma study (HKU-HCC) aimed to characterize the process of tumorigenesis in hepatocellular carcinoma (HCC) using genotyping, gene expression profiling and clinical endpoints in adjacent normal (AN) and tumor (TU) samples representing, respectively, the pre-cancer state and the results of tumor evolution. HKU-HCC-100 is a subset of 100 matched pairs of TU and AN liver tissue samples collected from Asian subjects undergoing surgical resection for treatment of HCC. These 200 samples are drawn from the 250 matched TU and AN pairs screened. DNA was isolated from all AN and TU tissues and genotyped on the Illumina 650Y SNP genotyping array, which represents 655,352 tag SNP markers. Copy number aberration markers (sCNV markers) were then imputed for 32,711 locations in the genome from this high-density SNP panel. RNA samples were profiled on a custom Affymetrix microarray composed of oligonucleotide probes targeting transcripts representing 37,585 known and predicted genes, including high-confidence non-coding RNA sequences.

References:
1. Zhang, B. and Horvath, S., A general framework for weighted gene
     co-expression network analysis. Stat Appl Genet Mol Biol, 2005. 4:
     Article 17.
2. Langfelder, P., Zhang, B., and Horvath, S., Defining clusters from a
     hierarchical cluster tree: the Dynamic Tree Cut library for R. 2007.


Human_Cancer_Breast_BCCA

The METABRIC database comprises 1600 breast cancers (all subtypes) and was generated jointly by Drs. Sam Aparicio at BCCA and Carlos Caldas at CRC Cambridge UK as part of the METABRIC project. Data types include DNA variation (Affymetrix SNP6.0), gene expression (Illumina Infinium II Bead arrays) and clinical outcome (5 year minimum outcomes information).


Human_Neuro-degenerative_Brain:Prefrontal cortex_Visual Cortex_Cerbellum_HBTRC

The ~700 individuals in this dataset comprise approximately 400 Alzheimer's disease (AD) cases, 220 Huntington's disease (HD) cases and 400 controls matched for age, gender, and post mortem interval (PMI). Approximately 1M SNPs were genotyped for these individuals. Three brain regions (cerebellum, visual cortex, and dorsolateral prefrontal cortex) from the same individuals were profiled using a high-density microarray covering mRNAs, splice variants, miRNAs and known ncRNAs. Available clinical outcomes include age at onset, age at death, Braak scores, Vonsattel scores, and regional brain enlargement/atrophy.


Human_Cancer_AML(pediatric)_FHCRC

The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) Initiative is an effort to apply the tools of modern genetics to the discovery of valid therapeutic targets in childhood cancers so that new, more effective treatments can be rapidly developed. The initial initiative included childhood ALL as well as neuroblastoma. More recently, NCI/CTEP expanded the original TARGET initiative to include additional diseases, one of which is childhood AML. TARGET AML proposes to use whole-genome approaches on multiple platforms to generate data in order to identify patients at high risk of failure. The patient population includes a total of 200 patients with no known high-risk factors who have achieved an initial remission; approximately 80 of these patients have gone on to have a leukemic relapse. For studies using genomic DNA (SNP/CGH, sequencing, methylation), matching tumor and germline DNA will be used to more accurately distinguish disease-associated events from germline polymorphisms. As part of this project, we are obtaining SNP genotyping data from matching tumor and germline DNA for copy number and LOH determination, expression data from Gene ST human exon arrays, and methylation status using the Illumina methylation arrays. We are also seeking additional funds to perform miRNA sequencing in the same cohort of patients. Furthermore, a contractor selected by NCI/CTEP will perform transcriptome sequencing or exome sequencing in this cohort of patients.



View Sage Datasets Requiring Release