Bioinformatics
Bioinformatics is a rapidly expanding field, driven by the growing
availability of large-scale genomic data and by research questions from
molecular biology and genetics. It integrates mathematical, statistical
and computational methods to analyze biological, biochemical and
biophysical data. Our research explores various aspects of
bioinformatics using tools from mathematics and computer science.
On one hand, we are interested in data-driven problems in
bioinformatics. In particular, we apply the tools and techniques of
mathematical modeling and computer science to describe and abstract the
problem at hand. This process is guided by the properties of real
biological systems, by experimental knowledge, and by an elementary
analysis of the experimental data. We aim to uncover the simple essence
hidden in complex biological phenomena. Ultimately, doing this well
depends on understanding the problems under investigation, since that
understanding is the standard against which every algorithmic and
modeling decision must be evaluated.
As the second step, the constructed mathematical model is carefully
analyzed to find any special structure or properties. An efficient
algorithm is then designed to explore the solution space. After
benchmarking the algorithm on simulated examples, we use it to find
biologically meaningful solutions for real experimental data. Based on
the results obtained, and on suggestions and comments from biologists,
we revisit the mathematical model and revise it by incorporating more
concrete considerations from the biological point of view. In this way,
we aim to find the best integration of biological problems and
mathematical methods.
On the other hand, biological problems also drive research on
mathematical methods. For example, they create new requirements and
pose new challenges for mathematical modeling and algorithm design. We
focus on this feedback process to develop efficient and effective
techniques. We believe that closer interaction between problems and
methodology will lead to better results in both theory and practice.
Protein Structure Comparison:
One of these topics is the structure comparison problem, which reveals
the relationships or similarities between structures. We proposed a
novel method to compare protein structures accurately and efficiently.
Besides producing high-quality structure alignments, the method can
reveal divergent evolution, identify circular permutations and detect
active sites.
Specifically, we define structure alignment as a multi-objective
optimization problem, i.e., maximizing the number of aligned atoms while
minimizing their root mean square deviation (RMSD). By controlling a
single distance-related parameter, we can in principle obtain a variety
of optimal alignments corresponding to different matching patterns,
ranging from a large matched portion to a small one. The number of
variables in our algorithm grows almost linearly with the number of
atoms in the protein pair.
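The two objectives can be written down explicitly. The formulation below is a generic sketch and not necessarily SAMO's own; all symbols are introduced here for illustration: $A=\{a_1,\dots,a_n\}$ and $B=\{b_1,\dots,b_m\}$ are the atom coordinates of the two proteins, $M$ is a one-to-one set of aligned index pairs $(i,j)$, $(R,t)$ is a rigid-body rotation and translation, and $d_0$ is the single distance parameter.

$$
\max_{M,R,t}\ |M|
\qquad\text{and}\qquad
\min_{M,R,t}\ \mathrm{RMSD}(M,R,t)=\sqrt{\frac{1}{|M|}\sum_{(i,j)\in M}\lVert R\,a_i+t-b_j\rVert^2}
$$

$$
\text{subject to}\quad \lVert R\,a_i+t-b_j\rVert\le d_0\quad\text{for all }(i,j)\in M .
$$

Under this reading, raising $d_0$ typically admits more, looser pairs (a larger matched portion at a higher RMSD), while lowering it keeps only a small, tightly superimposed core.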
In addition to its solid theoretical basis, the approach showed
significant improvements over existing methods in numerical
experiments, in terms of both alignment quality and efficiency. In
particular, these experiments show that our method can identify
divergent evolution, circular permutations and active sites (or
structural motifs). The software SAMO is available at
http://www.aporc.org/doc/wiki/Samo.
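The trade-off controlled by a single distance parameter can be illustrated with a small sketch. It is not SAMO's algorithm. It assumes the two structures are already superimposed, so no rotation search is done. It pairs atoms greedily in order of increasing distance, keeping only pairs within a cutoff d0, which is a simple heuristic and not a maximum matching. The helper names (`match_within_cutoff`, `rmsd`, `sweep`) and the toy coordinates are hypothetical.

```python
import math
from itertools import product


def match_within_cutoff(a, b, d0):
    """Greedily pair atoms of a and b whose distance is at most d0.

    Hypothetical helper: a and b are lists of (x, y, z) tuples assumed to be
    already superimposed. Pairs are accepted in order of increasing distance
    and each atom is used at most once, so the result is a heuristic
    one-to-one matching, not a guaranteed maximum matching.
    """
    candidates = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in product(enumerate(a), enumerate(b))
        if math.dist(p, q) <= d0
    )
    used_a, used_b, pairs = set(), set(), []
    for d, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j, d))
    return pairs


def rmsd(pairs):
    """Root mean square deviation over matched pairs (0.0 if none matched)."""
    if not pairs:
        return 0.0
    return math.sqrt(sum(d * d for _, _, d in pairs) / len(pairs))


def sweep(a, b, cutoffs):
    """Return (d0, number of matched pairs, RMSD) for each cutoff d0."""
    results = []
    for d0 in cutoffs:
        pairs = match_within_cutoff(a, b, d0)
        results.append((d0, len(pairs), rmsd(pairs)))
    return results


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    b = [(0.0, 0.1, 0.0), (1.0, 0.0, 0.5), (2.0, 1.5, 0.0), (3.0, 0.0, 3.0)]
    for d0, n, r in sweep(a, b, [0.6, 1.6, 3.5]):
        print(f"d0={d0}: {n} pairs, RMSD={r:.3f}")
    # d0=0.6: 2 pairs, RMSD=0.361
    # d0=1.6: 3 pairs, RMSD=0.915
    # d0=3.5: 4 pairs, RMSD=1.696
```

As the cutoff grows, more atoms are matched but the RMSD rises, which mirrors the "large matching portion to small matching portion" spectrum described above.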
Automatic Protein Structure Classification:
We also study the automatic protein structure classification problem
within a machine learning framework. Specifically, novel patterns based
on a convex hull representation are first extracted from a protein
structure to capture the relationship between structure and function.
A classification system is then built by training machine learning
methods, such as neural networks, hidden Markov models and support
vector machines, and using them for prediction. The classification
experiments focus on the CATH protein structure classification scheme.
The results indicate that the proposed supervised classification scheme
provides useful information on structural relationships.
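The train-then-predict step can be sketched for one of the listed model families, a linear support vector machine. The sketch fits it with the Pegasos stochastic sub-gradient method, which is swapped in here for illustration and may not match the training procedure actually used. The feature vectors are made-up two-dimensional numbers standing in for convex-hull descriptors, and the function names are hypothetical.

```python
import random


def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def train_linear_svm(xs, ys, lam=0.01, steps=1000, seed=0):
    """Fit a linear SVM with the Pegasos stochastic sub-gradient method.

    xs: list of feature vectors; ys: labels in {+1, -1}.
    A constant 1.0 is appended to every vector so the last weight acts as a
    bias term (it is regularized too, a common simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    for t in range(1, steps + 1):
        x, y = rng.choice(data)          # sample one training example
        eta = 1.0 / (lam * t)            # Pegasos step size
        margin = y * _dot(w, x)          # margin under the current weights
        w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: L2 regularizer
        if margin < 1.0:                 # hinge loss active: move toward y*x
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on the side of the learned hyperplane."""
    return 1 if _dot(w, list(x) + [1.0]) >= 0.0 else -1


if __name__ == "__main__":
    # Hypothetical descriptors for two structural classes (toy numbers).
    train_x = [(2.0, 2.0), (3.0, 1.0), (1.0, 3.0),
               (-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0)]
    train_y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(train_x, train_y)
    print(predict(w, (2.5, 2.5)), predict(w, (-2.0, -3.0)))  # 1 -1
```

A multi-class scheme such as CATH's top-level classes would need several such binary classifiers (for example one-vs-rest) or a natively multi-class model.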
(1). Protein structure analysis and function prediction
(2). Identifying the structural motifs of proteins
(3). Identifying proteins by mass spectrometry
(4). Inferring protein-protein interaction networks with multi-domain interactions