
SGI ia64 cluster


Vader is a cluster with 56 ia64 processors running SUSE Linux 10 SP2. The cluster runs both serial and parallel biocomputing jobs. Because it has 160 GB of RAM, we dedicate it mainly to research projects that require large amounts of memory, such as sequence analysis and genotyping analysis.


As an example, vader is used for the ENCODE Project, where its role is to aid the gene annotation pipeline by highlighting interesting annotations based on the Principal Variant Pipeline. To predict interesting changes in the structure and function of the annotated alternative splice isoforms, you first need the main functional isoform. Knowing which isoform has the principal biological function allows more reliable predictions of function (Firestar method) and of structure and conservation of exonic structure (CExonic method). This should help research groups by indicating which is the main functional isoform for a gene.

 

 

Another application of Vader is to provide compute support for Web Services provided by the INB (Instituto Nacional de Bioinformática). Some of these Web Services are used in genome analysis to retrieve biological annotations, such as the functional annotation of proteins and genes based on family identification and GO terms (FunCUT-SIAM method).


Vader is also used in projects related to genotyping (SNPs) and CNVs (copy number variants).

CNIO researchers are involved in the analysis of genotyping and copy-number data from more than 1000 bladder cancer cases and more than 1000 controls assayed with Illumina 1M arrays. The data, which come from 18 hospitals located in five areas in Spain, were generated in collaboration with the Core Genotyping Facility of the National Cancer Institute (NCI, USA), coordinated by Stephen Chanock. Click here for a description of the project and some examples of analysis with lower density data.

Researchers from these groups are actively using the vader cluster for memory-intensive statistical computations, such as parallel survival analysis models and logistic regressions for 1 million markers and 2000 samples, genotype imputation, and copy number variant (CNV) detection algorithms based on hidden Markov models (among other methods).
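To give an idea of what a per-marker logistic regression scan looks like, here is a minimal sketch in Python. It is not the code the research groups run on vader (the software actually used is listed below); the function names `fit_marker` and `scan`, the additive 0/1/2 genotype coding and the choice of Newton-Raphson are all assumptions made for illustration. Because each marker is fitted independently, the markers can be split across processors.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def fit_marker(genotypes, status, iterations=25):
    """Fit logit P(case) = b0 + b1 * genotype by Newton-Raphson.

    genotypes: minor-allele counts (0, 1 or 2) for one marker, one per sample.
    status:    1 for a case, 0 for a control, in the same sample order.
    Returns (b0, b1); b1 is the per-allele log odds ratio.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the negated Hessian
        for x, y in zip(genotypes, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:   # e.g. a monomorphic marker: nothing to estimate
            break
        # Newton step: beta += H^-1 * gradient (2x2 inverse written out).
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


def scan(markers, status, workers=1):
    """Fit every marker; `markers` is a list of per-marker genotype lists."""
    if workers == 1:
        return [fit_marker(g, status) for g in markers]
    # Markers are independent, so they can be spread over several processes.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fit_marker, markers,
                             [status] * len(markers), chunksize=1000))
```

With `workers` greater than 1, call `scan` from inside an `if __name__ == "__main__":` block so that worker processes can import the module safely. A real analysis would also add covariates and standard errors, which this sketch leaves out.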

Software on vader for this project:

R: R is `GNU S', a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc.

PennCNV : PennCNV implements a hidden Markov model (HMM) that integrates multiple sources of information to infer CNV calls for individual genotyped samples. It differs from segmentation-based algorithms in that it considers the SNP allelic ratio distribution and other factors in addition to signal intensity. PennCNV can also optionally use family information to generate family-based CNV calls by several different algorithms. Furthermore, PennCNV can generate CNV calls for a specific set of candidate CNV regions through a validation-calling algorithm.
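The general idea of HMM-based CNV calling can be illustrated with a toy example. The sketch below is not PennCNV's model: it uses only signal intensity (log R ratio), three hypothetical states, and made-up parameters (`MEANS`, `SD`, `STAY`), whereas PennCNV combines further sources of information as described above. It shows how the Viterbi algorithm assigns the most likely copy-number state to each marker along a chromosome.

```python
import math

STATES = ("deletion", "normal", "duplication")
# Assumed mean log R ratio per state and a shared standard deviation.
MEANS = {"deletion": -0.6, "normal": 0.0, "duplication": 0.4}
SD = 0.2
STAY = 0.98  # assumed probability of keeping the same state at the next marker


def log_emission(state, lrr):
    """Log density of a Gaussian centred on the state's mean intensity."""
    z = (lrr - MEANS[state]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2.0 * math.pi))


def log_transition(prev, cur):
    """Log probability of moving from prev to cur between adjacent markers."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))


def viterbi(lrr_values):
    """Return the most likely state for each marker, in marker order."""
    if not lrr_values:
        return []
    # Uniform prior over states at the first marker.
    score = {s: -math.log(len(STATES)) + log_emission(s, lrr_values[0])
             for s in STATES}
    back = []
    for lrr in lrr_values[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            pointers[cur] = best
            new_score[cur] = (score[best] + log_transition(best, cur)
                              + log_emission(cur, lrr))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With these parameters a run of several markers with low intensity is called as a deletion, while a single low marker surrounded by normal ones is treated as noise, because switching state twice costs more than one poorly fitting observation.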

BEAGLE : BEAGLE is a state-of-the-art software package for analysis of large-scale genetic data sets with hundreds of thousands of markers genotyped on thousands of samples. BEAGLE can:

  1. phase genotype data (i.e. infer haplotypes) for unrelated individuals, parent-offspring pairs, and parent-offspring trios.
  2. infer sporadic missing genotype data.
  3. impute ungenotyped markers that have been genotyped in a reference panel.
  4. perform single marker and haplotypic association analysis.

IMPUTE : IMPUTE is a program for estimating ("imputing") unobserved genotypes in SNP association studies. The program is designed to work seamlessly with the output of the genotype calling program CHIAMO and the population genetic simulator HAPGEN, and it produces output that can be analyzed using the program SNPTEST.

How to use vader :

See our wiki :
vader