ECCB

Sessions

Saturday
October 1, 2005

AREA 9: ALGORITHMS
Area chairs: Pedro Larrañaga and Martin Vingron

OP-37 Growing Bayesian Network Models of Gene Networks from Seed Genes
Jose M Peña (1), Johan Björkegren (2), Jesper Tegnér (1)
1) Linkoping University, 2) Karolinska Institutet

ABSTRACT: Motivation: For the last few years, Bayesian networks (BNs) have received increasing attention from the computational biology community as models of gene networks, though learning them from gene expression data is problematic: Most gene expression databases contain measurements for thousands of genes, but the existing algorithms for learning BNs from data do not scale to such highdimensional databases. This means that the user has to decide in advance which genes are included in the learning process, typically no more than a few hundreds, and which genes are excluded from it. This is not a trivial decision. We propose an alternative approach to overcome this problem.
Results: We propose a new algorithm for learning BN models of gene networks from gene expression data. Our algorithm receives a seed gene S and a positive integer R from the user, and returns a BN for those genes that depend on S such that less than R other genes mediate the dependency. Our algorithm grows the BN, which initially only contains S, by repeating the following step R+1 times and, then, pruning some genes: Find the parents and children of all the genes in the BN and add them to it. Intuitively, our algorithm provides the user with a window of radius R around S to look at the BN model of a gene network without having to exclude any gene in advance. We prove that our algorithm is correct under the faithfulness assumption. We evaluate our algorithm on simulated and biological data (Rosetta compendium) with satisfactory results.
CONTACT: jmp@ifm.liu.se

OP-38 Reconsidering Complete Search Algorithms for Protein Backbone NMR Assignment
Olga Vitek (1), Chris Bailey Kellogg (2), Bruce Craig (1), Jan Vitek (1)
1) Purdue University, 2) Dartmouth College

ABSTRACT: Motivation: Nuclear magnetic resonance (NMR) spectroscopy is widely used to determine and analyze protein structures. An essential step in NMR studies is determining the backbone resonance assignment, which maps individual atoms to experimentally measured resonance frequencies. Performing assignment is challenging due to the noise and ambiguity in NMR spectra. While automated procedures have been investigated, by-and-large they are still struggling to gain acceptance due to inherent limits in scalability and/or unacceptable levels of assignment error. To have confidence in the results, an algorithm should be complete, i.e., able to identify all solutions consistent with the data, including all arbitrary configurations of extra and missing peaks. The ensuing combinatorial explosion in the space of possible assignments has led to the perception that complete search is hopelessly inefficient and cannot scale to realistic data sets.
Results: This paper presents a complete branch-contract-and-bound search algorithm for backbone resonance assignment. The algorithm controls the search space by hierarchically agglomerating partial assignments and employing statistically sound pruning criteria. It considers all solutions consistent with the data, and uniformly treats all combinations of extra and missing data. We demonstrate our approach on experimental data from 5 proteins ranging in size from 70 to 154 residues. The algorithm assigns over 95% of the positions with over 98% accuracy. We also present results on simulated data from 259 proteins from the RefDB database, ranging in size from 25 to 257 residues. The median computation time for these cases is 1 minute, and the assignment accuracy is over 99%. These results demonstrate that complete search not only has the advantage of guaranteeing fair treatment of all feasible solutions, but is efficient enough to be employed effectively in practice.
Availability: The MBA2 software package is made available under an open-source software license. The data sets featured in the result section can also be obtained from the contact author.
CONTACT: ovitek@stat.purdue.edu

OP-39 RNA Secondary Structural Alignment with Conditional Random Fields
Kengo Sato (1), Yasubumi Sakakibara (1)
1) Keio University, Department of Biosciences and Informatics

ABSTRACT: Motivation: Computational identification of non-coding RNA regions on the genome has much attention to be investigated. However, it is essentially harder than gene-finding problems for protein-coding regions because non-coding RNA sequences do not have strong statistical signals. Since comparative sequence analysis is effective for non-coding RNA detection, efficient computational methods are expected for structural alignment of RNA sequences. Several methods have been proposed to accomplish the structural alignment tasks for RNA sequences, and we found that one of the most important points is to estimate an accurate score matrix for calculating structural alignments.
Results: We propose a novel approach for RNA structural alignment based on Conditional Random Fields (CRFs). Our approach has a specific feature compared with previous methods in the sense that the parameters for structural alignment are estimated such that the model can discriminate between correct alignments and incorrect alignments most likely, and has the generalization ability so that a satisfiable score matrix can be obtained even with a small number of sample data without overfitting. Experimental results clearly show that the parameter estimation with CRFs can outperform all the other existing methods for structural alignments of RNA sequences. Further, structural alignment search based on CRFs is more efficient for predicting non-coding RNA regions than the other scoring methods. These experimental results strongly support our discriminative method employing CRFs to estimate the score matrix parameters.
Availability: The source code of PHMMTS based on CRFs is available at http://phmmts.dna.bio.keio.ac.jp/ under the GNU public licence.
CONTACT: satoken@bio.keio.ac.jp

OP-40 Fast maximum-likelihood refinement of electron microscopy images
Sjors H.W. Scheres (1), Mikel Valle (1), Jose-Maria Carazo (1)
1) Centro Nacional de Biotecnologia - CSIC

ABSTRACT: Motivation: Maximum-likelihood image refinement is a promising candidate to improve attainable resolution limits in 3D-EM. However, its large CPU-requirements may prohibit its application to 3D-structure optimization.
Results: We speeded-up ML image refinement by reducing its search space over the alignment parameters. Application of this reduced-search approach to a cryo-EM dataset yielded practically identical results as the original approach, but in approximately one day instead of one week of CPU.
Availability: This work has been implemented in the publicdomain package Xmipp. Documentation and download instructions may be found at http://www.cnb.uam.es/~bioinfo
CONTACT: scheres@cnb.uam.es

OP-41 A quadratic programming approach for decomposing steady-state metabolic flux distributions onto elementary modes
Jean-Marc Schwartz (1), Minoru Kanehisa (1)
1) Bioinformatics Center, Kyoto University

ABSTRACT: Motivation: Steady-state flux distributions in metabolic networks can be expressed as non-negative combinations of elementary modes. However little understanding has been achieved so far in how individual elementary modes contribute to the reconstruction of actual physiological flux distributions. Results:We introduce an approach for decomposing steady-state flux distributions onto elementary modes based on quadratic programming. The decomposition is performed so as to favour modes that are closest to the actual state of the system, i.e. most relevant for biological interpretation. As an illustration, an application of this approach to a model of yeast glycolysis is presented.
Availability: Software is available upon request from the authors.
CONTACT: jean@kuicr.kyoto-u.ac.jp

OP-42 Web Servicing the Biological Office
Martin Szugat (1), Daniel Güttler (1), Katrin Fundel (1), Florian Sohler (1), Ralf Zimmer (1)
1) Department of Informatics, Ludwig-Maximilians-Universität München

ABSTRACT: Biologists routinely use standard Microsoft Office applications for standard analysis tasks. Despite ubiquitous internet resources, information needed for everyday work is often not directly and seamlessly available. Here we describe a very simple and easily extendable mechanism using Web Services to enrich standard MS Office applications with internet resources. The feature is similar in handling to spell-checker annotations. We demonstrate it with a thesaurus for biological objects, which maps names to database identifiers and vice versa via an appropriate synonym list. It is based on a Web Service providing synonyms and database identifiers for biological objects. The client application ProTag makes these features available in MS Office applications using Smart Tags and Add-Ins. This increases user efficiency and comfort and could also improve the currently awkward situation of naming biological objects. The features proposed here are directly available with only minor installation and learning efforts.
Availability: http://services.bio.ifi.lmu.de/prothesaurus/
CONTACT: Martin.Szugat@GMX.net

Back to Session Index