CNIO researchers are analysing genotyping and copy-number data from more than 1000 bladder cancer cases and more than 1000 controls assayed with Illumina 1M arrays. The data, collected from 18 hospitals in five areas of Spain, were generated in collaboration with the Core Genotyping Facility of the National Cancer Institute (NCI, USA) coordinated by Stephen Chanock. Click here for a description of the project and some examples of analysis with lower density data.
Researchers from these groups are actively using the vader cluster for memory-intensive statistical computations, such as parallel survival analysis models and logistic regressions for 1 million markers and 2000 samples, genotype imputation, and copy number variant (CNV) detection algorithms based on hidden Markov models (among other methods).
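As an illustration of why per-marker regressions parallelise well, here is a minimal Python sketch, not the project's actual pipeline. Each marker is fitted independently, so markers can be spread across worker processes. The function names (`fit_logistic`, `scan_markers`), the 0/1/2 genotype coding, and the absence of covariates are all assumptions made for this example.

```python
import math
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def fit_logistic(genotypes, status, iterations=25):
    """Fit logit P(case) = b0 + b1 * genotype by Newton-Raphson.

    genotypes: allele counts per sample (assumed coding 0/1/2).
    status:    1 for case, 0 for control, in the same sample order.
    Returns (b0, b1); b1 is the per-allele log odds ratio.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(genotypes, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient, intercept
            g1 += (y - p) * x      # gradient, slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break                  # e.g. a monomorphic marker: slope not estimable
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def scan_markers(markers, status, workers=1):
    """Fit one model per marker; markers is a list of genotype lists."""
    fit = partial(fit_logistic, status=status)
    if workers <= 1:
        return [fit(m) for m in markers]
    # Markers are independent, so they can be fitted in separate processes.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fit, markers, chunksize=64))
```

On a cluster the same idea would typically be applied by splitting the marker set into chunks and submitting one job per chunk. When run as a script, calls with `workers > 1` should sit under an `if __name__ == "__main__":` guard.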
Software on vader for this project:
R: R is `GNU S', a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc.
PennCNV : PennCNV implements a hidden Markov model (HMM) that integrates multiple sources of information to infer CNV calls for individual genotyped samples. It differs from segmentation-based algorithms in that it considers the SNP allelic ratio distribution and other factors in addition to signal intensity. PennCNV can also optionally use family information to generate family-based CNV calls by several different algorithms. Furthermore, PennCNV can generate CNV calls given a specific set of candidate CNV regions, through a validation-calling algorithm.
BEAGLE : BEAGLE is a state-of-the-art software package for analysis of large-scale genetic data sets with hundreds of thousands of markers genotyped on thousands of samples. BEAGLE can :
IMPUTE : IMPUTE is a program for estimating ("imputing") unobserved genotypes in SNP association studies. The program is designed to work seamlessly with the output of the genotype calling program CHIAMO and the population genetic simulator HAPGEN, and it produces output that can be analyzed using the program SNPTEST.
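To give a feel for the HMM-based CNV detection mentioned in the PennCNV entry above, the following toy Python sketch decodes copy-number states from a sequence of signal-intensity values with the Viterbi algorithm. It is not PennCNV's model: it uses intensity alone, ignoring the allelic ratio and family information that PennCNV integrates. The three states, the emission means, the standard deviation and the transition probability are hypothetical values chosen for illustration.

```python
import math

STATES = ("loss", "normal", "gain")
# Hypothetical emission means for a log-intensity ratio (illustrative only).
MEANS = {"loss": -0.6, "normal": 0.0, "gain": 0.4}
SD = 0.2      # assumed shared standard deviation of the intensity noise
STAY = 0.98   # assumed probability of keeping the same state at the next marker


def log_emission(state, value):
    """Log density of a Gaussian centred on the state's mean."""
    z = (value - MEANS[state]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2.0 * math.pi))


def log_transition(prev, cur):
    """Log probability of moving from prev to cur between adjacent markers."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))


def viterbi(values):
    """Return the most likely state path for a list of intensity values."""
    if not values:
        return []
    # Uniform prior over states at the first marker.
    score = {s: math.log(1.0 / len(STATES)) + log_emission(s, values[0])
             for s in STATES}
    back = []
    for v in values[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (score[best_prev] + log_transition(best_prev, cur)
                              + log_emission(cur, v))
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With these assumed parameters, a run of several consecutive low-intensity markers is labelled as a loss, whereas a single outlying marker is absorbed into the surrounding normal state because two state changes cost more than one poorly fitting observation. That smoothing is the main practical difference from calling each marker independently.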