ECCB

Sessions

Thursday
September 29, 2005

AREA 4: GENES AND GENOMES
Sponsorized by: Silicon Graphics
Area chairs: Mar Alba and Pierre Rouzé

OP-15 Building Fragment Assembly String Graphs
Gene Myers, UC Berkeley

ABSTRACT:
We present a concept and formalism, the string graph, that represents all that is inferable about a DNA sequence from a collection of shotgun sequencing reads collected from it. We give time and space efficient algorithms for constructing a string graph given the collection of overlaps between the reads and in particular, present a novel linear expected time algorithm for transitive reduction in this context. The result demonstrates that the decomposition of reads into k-mers employed in the de Bruijn graph approach of Pevzner et al is not essential and in fact creates both efficiency problems and unecessary conceptual complexities. The current paper is the first in a series and presents the basic algorithm and preliminary results that demonstrate the efficiency and scalability of the method. The result is a step toward a next-generation whole genome shotgun assembler that will easily scale to mammalian genomes.
CONTACT: gene@eecs.berkeley.edu

OP-16 Finding Exact Optimal Motif in Matrix Representation by Partitioning
Henry Leung (1), Francis Chin (1)
1) The University of Hong Kong

ABSTRACT: Motivation: Finding common pattern, motif, in the promoter regions of coexpressed genes is an important problem in bioinformatics. A common representation of the motif is by probability matrix or PSSM (Position Specific Scoring Matrix). However, even for a motif of length 6 or 7, there is no algorithm that can guarantee finding the exact optimal matrix from an infinite number of possible matrices.
Results: This paper introduces the first algorithm called EOMM for finding the exact optimal matrix-represented motif, or simply optimal motif. Based on branch-and-bound search by partitioning the solution space recursively, EOMM can find the optimal motif of size of up to 8 or 9, and a motif of larger size with any desired accuracy on the principle that the smaller the error bound, the longer the running time. Experiments show that for some real and simulated data sets, EOMM finds the motif with very weak signals when the existing software, such as MEME, MITRA-PSSM, fails to do so.
CONTACT: cmleung2@cs.hku.hk

OP-17 Prediction of regulatory modules comprising microRNAs and target genes
Sungroh Yoon (1), Giovanni De Micheli (1)
1) Stanford University, nanni@stanford.edu

ABSTRACT: Motivation: MicroRNAs are small endogenous RNAs that can play important regulatory roles via the RNA-interference pathway by targeting mRNAs for cleavage or translational repression. We propose a computational method to predict miRNA regulatory modules or groups of miRNAs and target genes that are believed to participate cooperatively in posttranscriptional gene regulation.
Results: We tested our method with the human genes and miRNAs, predicting 431 miRNA regulatory modules. We analyze a module with genes: BTG2, WT1, PPM1D, PAK7 and RAB9B, and miRNAs: miR-15a and miR-16. Review of the literature and annotation with Gene Ontology terms reveal that the roles of these genes can indeed be closely related in specific biological processes, such as gene regulation involved in breast, renal, and prostate cancers. Furthermore, it has been reported that miR-15a and miR-16 are deleted together in certain types of cancer, suggesting a possible connection between these miRNAs and cancers. Given that most known functionalities of miRNAs are related to negative gene regulation, extending our approach and exploiting the insight thus obtained may provide clues to achieving practical accuracy in the reverse-engineering of gene regulatory networks.
Availability: A list of predicted modules is available upon request.
CONTACT: sryoon@stanford.edu

OP-18 Computational Discovery of Transcriptional Regulatory Rules
Tho Hoan Pham (1), Jose Carlos Clemente (1), Kenji Satou (1) (2), Tu Bao Ho (1) (2)
1) Japan Advanced Institute of Science and Technology, 2) Institute for Bioinformatics Research and Development (BIRD)

ABSTRACT: Motivation: Even in a simple organism like yeast S. cerevisiae, Transcription is an extremely complex process. The expression of sets of genes can be turned on or off by binding of specific transcription factors to the promoter regions of genes. Experimental and computational approaches have been proposed to establish mappings of DNA-binding locations of transcription factors. However, while location data obtained from experimental methods is noisy due to imperfections in the measuring methods, computational approaches suffer from over-prediction problems due to the short length of the sequence motifs bound by the transcription factors. Also, these interactions are usually environment-dependent: many regulators only bind to the promoter region of genes under specific environmental conditions. Even more, the presence of regulators at a promoter region indicates binding but not necessarily function: the regulator may act positively, negatively or not act at all. Therefore, identifying true and functional interactions between transcription factors and genes in specific environment conditions and describing the relationship between them are still open problems.
Results: We developed a method that combines expression data with genomic location information to discover (1) relevant transcription factors from the set of potential transcription factors of a target gene; and (2) the relationship between the expression behavior of a target gene and that of its relevant transcription factors. Our method is based on rule induction, a machine learning technique that can efficiently deal with noisy domains. When applied to genomic location data with a confidence criterion relaxed to p-value=0.005, and three different expression datasets of yeast S. cerevisiae, we obtained a set of regulatory rules describing the relationship between the expression behavior of a specific target gene and that of its relevant transcription factors. The resulting rules provide strong evidence of true positive gene-regulator interactions, as well as of protein-protein interactions that could serve to identify transcription complexes.
Availability: Supplementary files are available from http://www.jaist.ac.jp/˜h-pham/regulatory-rules
CONTACT: h-pham@jaist.ac.jp

Back to Session Index