research

 

 

Bioinformatics

  • Research Philosophy

Bioinformatics is a rapidly expanding field, driven by the growing availability of massive genomic data and by research questions in molecular biology and genetics. It integrates mathematical, statistical, and computational methods to analyze biological, biochemical, and biophysical data. Our research explores various aspects of bioinformatics with the tools of mathematics and computer science.

On the one hand, we are interested in data-driven problems in bioinformatics. In particular, we apply the tools and techniques of mathematical modeling and computer science to describe and abstract the problem at hand. This process is guided by established biological and experimental knowledge and by a preliminary analysis of the experimental data. We aim to discover the simple essence hidden in complex biological phenomena. Ultimately, doing this well depends on understanding the problem under investigation, since that understanding is the standard by which each algorithmic and modeling decision must be evaluated. As a second step, the constructed mathematical model is carefully analyzed to identify any special structure or properties. An efficient algorithm is then designed to explore the solution space. After benchmarking the algorithm on simulated examples, we use it to find biologically meaningful solutions for real experimental data. Based on the results and on suggestions and comments from biologists, we revisit the mathematical model and revise it by incorporating more concrete biological considerations. In this way, we expect to find the best integration of biological problems and mathematical methods.

On the other hand, biological problems also stimulate research on mathematical methods. For example, they generate new needs and pose new challenges for mathematical modeling and algorithm design. We focus on this feedback process to develop efficient and effective techniques. We believe that better interaction between problem and methodology will lead to better results in both theory and practice.

 

  • Current Research

Protein Structure Comparison: One of our topics is the structure comparison problem, which reveals the relationship or similarity between structures. We proposed a novel method to compare protein structures accurately and efficiently. Besides producing high-quality structure alignments, the method can reveal divergent evolution, identify circular permutations, and detect active sites. Specifically, we formulate structure alignment as a multi-objective optimization problem: maximizing the number of aligned atoms while minimizing their root mean square distance. By controlling a single distance-related parameter, we can in theory obtain a variety of optimal alignments corresponding to different optimal matching patterns, from a large matching portion to a small one. The number of variables in our algorithm grows almost linearly with the number of atoms in the protein pair. In addition to a solid theoretical foundation, numerical experiments demonstrated significant improvement of our approach over existing methods in terms of quality and efficiency. In particular, we show that divergent evolution, circular permutations, and active sites (or structural motifs) can be identified by our method. The software SAMO is available at http://www.aporc.org/doc/wiki/Samo.
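The trade-off between the two objectives can be illustrated with a small sketch. This is not the SAMO algorithm: it assumes the two structures are already superposed in a common frame, and it uses a simple greedy pairing in place of a real optimization. The function name `align_at_cutoff`, the `cutoff` parameter, and the toy coordinates are all hypothetical, chosen only to show how one distance-related parameter moves the result between a small, tight match and a large, loose one.

```python
import math


def align_at_cutoff(coords_a, coords_b, cutoff):
    """Greedily pair atoms of A and B that lie within `cutoff` of each other.

    Assumption: both coordinate lists are already superposed.
    Returns (pairs, rmsd), where pairs is a list of (i, j) index pairs.
    """
    # Collect every candidate pair whose distance does not exceed the cutoff.
    candidates = []
    for i, a in enumerate(coords_a):
        for j, b in enumerate(coords_b):
            d = math.dist(a, b)
            if d <= cutoff:
                candidates.append((d, i, j))

    # Accept the closest pairs first, using each atom at most once.
    candidates.sort()
    used_a, used_b = set(), set()
    pairs, sq_sum = [], 0.0
    for d, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
            sq_sum += d * d

    rmsd = math.sqrt(sq_sum / len(pairs)) if pairs else 0.0
    return pairs, rmsd


# Toy coordinates (hypothetical): B drifts away from A along the chain.
coords_a = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
coords_b = [(0, 0.1, 0), (1, 0.2, 0), (2, 1.0, 0), (3, 2.5, 0)]

for cutoff in (0.5, 1.5, 3.0):
    pairs, rmsd = align_at_cutoff(coords_a, coords_b, cutoff)
    print(f"cutoff={cutoff}: aligned={len(pairs)}, rmsd={rmsd:.3f}")
```

In this toy case, raising the cutoff increases the number of aligned atoms and also the root mean square distance, which is the tension the multi-objective formulation is meant to capture.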


Automatic Protein Structure Classification: We also work on the automatic protein structure classification problem within a machine learning framework. Specifically, novel patterns based on a convex hull representation are first extracted from a protein structure to capture the relationship between structure and function. A classification system is then built by training machine learning methods such as neural networks, hidden Markov models, and support vector machines, and using them for prediction. The CATH protein structure classification scheme is the focus of the classification experiments. The results indicate that the proposed supervised classification scheme can provide useful information on structure relationships.
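The two-stage idea (extract hull-based features, then train a supervised classifier) can be sketched in a toy form. Everything specific here is an assumption made for illustration: the actual patterns are not described on this page, so the sketch uses two hypothetical features of a 2D point set (the isoperimetric ratio of its convex hull and the fraction of points on the hull), and it substitutes a nearest-centroid classifier for the neural networks, hidden Markov models, and support vector machines mentioned above. The labels "compact" and "elongated" are invented and are not CATH classes.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_features(points):
    """Hypothetical features: (isoperimetric ratio, fraction of points on hull)."""
    hull = convex_hull(points)
    edges = list(zip(hull, hull[1:] + hull[:1]))
    # Shoelace formula for the area, summed edge lengths for the perimeter.
    area = abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges)) / 2.0
    perimeter = sum(math.dist(a, b) for a, b in edges)
    ratio = 4.0 * math.pi * area / perimeter ** 2 if perimeter > 0 else 0.0
    return (ratio, len(hull) / len(set(points)))


def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns label -> mean vector."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for k, value in enumerate(vec):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(s / counts[lab] for s in acc) for lab, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], vec))


# Toy training set (hypothetical shapes and labels).
training = [
    (hull_features([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]), "compact"),
    (hull_features([(0, 0), (3, 0), (3, 3), (0, 3), (1, 1), (2, 2)]), "compact"),
    (hull_features([(0, 0), (6, 0), (6, 0.5), (0, 0.5), (3, 0.25)]), "elongated"),
    (hull_features([(0, 0), (8, 0), (8, 1), (0, 1), (4, 0.5), (2, 0.5)]), "elongated"),
]
centroids = train_centroids(training)
query = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
print(predict(centroids, hull_features(query)))
```

A real pipeline would work on 3D atomic coordinates and a much richer feature set; the sketch only shows how hull-derived descriptors can feed a supervised classifier.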

 

  • Future Research

(1). Protein structure analysis and function prediction
(2). Identifying the structural motifs of proteins
(3). Identifying proteins by mass spectrometry
(4). Inferring protein-protein interaction networks with multi-domain interactions