Mining Structured Data in Bioinformatics
Prof. Stefan Kramer
This tutorial covers data mining on structured data. The topic is particularly
relevant for data mining applications in bioinformatics, since most
biological data is not kept in databases consisting of a single, flat
table. Instead, we frequently deal with databases of structured
and linked objects: the objects in bioinformatics
databases often have a rich internal structure and are connected by
relations. (Consider, for instance, databases of proteins, small molecules,
and metabolic and regulatory networks, as well as text databases.) The tutorial
will give an overview of data mining techniques for sequences, trees,
graphs and relational databases. We will present techniques for both descriptive
and predictive data mining in this context. In descriptive mining, we
are looking for local patterns to characterize the data. In predictive
mining, we are looking for models that can be used to make predictions
for new, unseen cases. Along these two dimensions (type of data and descriptive
versus predictive mining), the tutorial is organized as follows: the first two
parts are devoted to descriptive mining of itemsets, strings and
sequences, trees, graphs and relational databases. The third part
deals with predictive mining based on propositionalization (i.e.,
feature construction using patterns), instance-based learning and kernel
methods for graph and relational databases.
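As a rough illustration of propositionalization, the sketch below first finds
frequent substrings in a set of sequences (a descriptive step) and then uses
them as binary features for a flat table (the input to a predictive step). It
is a minimal sketch under stated assumptions, not the method presented in the
tutorial: the function names, the toy sequences, the fixed pattern length k
and the support threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def frequent_substrings(sequences, k, min_support):
    """Return all length-k substrings found in at least min_support sequences.

    Hypothetical helper: a very simple stand-in for a pattern miner.
    """
    support = Counter()
    for seq in sequences:
        # Use a set so each substring is counted once per sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    return sorted(p for p, s in support.items() if s >= min_support)


def propositionalize(sequences, patterns):
    """Turn each sequence into a binary vector: 1 if the pattern occurs in it."""
    return [[int(p in seq) for p in patterns] for seq in sequences]


# Toy data, invented for this example only.
sequences = ["ACGTAC", "CGTACG", "TTACGA"]
patterns = frequent_substrings(sequences, k=3, min_support=2)
features = propositionalize(sequences, patterns)

for seq, row in zip(sequences, features):
    print(seq, row)
```

The resulting rows form a single flat table, so any standard attribute-value
learner could be applied to them afterwards. The same idea carries over to
trees and graphs if the substring test is replaced by a subtree or subgraph
occurrence test.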
Synopsis