Seminar on
Applied Mathematics
PROGRAM
Wednesday, 22.08.2007, at 14:15, Room 2, MI SANU:
Zoran Obradovic, Director of the Center for Information Science and Technology, Temple University
DATA MINING APPROACH TO FUNCTIONAL CHARACTERIZATION OF
PROTEIN DISORDER
Abstract:
About 10 years ago we developed a prediction-based method to show that
thousands of proteins lack fixed structure, that is, are disordered (or
unfolded) under physiological conditions. This revolutionary discovery was
followed by many efforts to develop improved protein disorder predictors,
including three community-wide biennial competitions (CASP 5, 6, 7). In this
talk we will first briefly present our most recent predictor, rated the
best model in the disorder category at the seventh Critical Assessment of
Structure Prediction experiment (CASP7, Nov. 26-30, 2006).
Next, we will describe how we used this predictor to develop a data mining
method that provided a leap forward in answering the challenging question of
how protein disorder relates to protein function. In our approach, a
statistical evaluation is used to rank the significance of correlations
between intrinsic disorder and the functions annotated in the Swiss-Prot
database of about 200,000 sequences. Our analysis takes into account
protein sequence data redundancy and the relationship between protein length
and protein structure. Overall, out of the 710 Swiss-Prot functional
keywords that were each associated with at least 20 proteins, 238 were found
to be strongly positively correlated with predicted long intrinsically
disordered regions, whereas 302 were strongly negatively correlated with
such regions. The results of this large-scale, data-mining-based analysis
agree well with smaller, extremely costly studies based on manual curation
by biomedical experts.
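The ranking idea described above can be illustrated with a minimal sketch. It is not the procedure used in the reported work: it assumes a simple length-matched resampling test, and the record layout (`length`, `disordered`, `keywords`), the bin width, and the function names are all hypothetical. Sequence redundancy is assumed to have been removed beforehand.

```python
import random
from collections import defaultdict


def length_bin(length, width=100):
    """Coarse length bin used to match random proteins to keyword proteins.

    The bin width of 100 residues is an arbitrary, illustrative choice.
    """
    return length // width


def keyword_disorder_test(proteins, keyword, n_draws=1000, seed=0):
    """Return (observed_fraction, p_high, p_low) for one functional keyword.

    proteins: list of dicts with hypothetical keys
        'length'     -> int, sequence length
        'disordered' -> bool, a long disordered region is predicted
        'keywords'   -> set of functional keyword strings
    p_high is the share of length-matched random samples whose disordered
    fraction is at least the observed one (small: positive association);
    p_low is the share that is at most the observed one (small: negative).
    """
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    for p in proteins:
        by_bin[length_bin(p["length"])].append(p)

    tagged = [p for p in proteins if keyword in p["keywords"]]
    if not tagged:
        raise ValueError("no protein carries this keyword")
    observed = sum(p["disordered"] for p in tagged) / len(tagged)

    higher = lower = 0
    for _ in range(n_draws):
        # For each tagged protein, draw one protein from the same length bin.
        sample = [rng.choice(by_bin[length_bin(p["length"])]) for p in tagged]
        frac = sum(q["disordered"] for q in sample) / len(sample)
        higher += frac >= observed
        lower += frac <= observed
    return observed, higher / n_draws, lower / n_draws
```

Keywords could then be ranked by `p_high` (or `p_low`) to list those most strongly associated with predicted disorder (or order). Drawing random proteins from the same length bin is one simple way to keep protein length from driving the result.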
Finally, we will discuss our machine learning approach to the problem of
identifying biomedical publications that contain relevant experimental
evidence related to protein disorder in MEDLINE, a major biomedical
repository collecting millions of articles from different journals. This
learning task is made difficult by the richness of biomedical terminology,
the diversity of expressions for experimental evidence, and the small number
of annotated articles that can be used as labeled examples for training. We
will describe our novel substring construction algorithm, which derives
attributes from semantically related terms with shared stems or morphemes.
In this data-driven approach we used the Wilcoxon rank-sum test for feature
selection, followed by support vector machine (SVM) and Naive Bayes
classifiers. In addition to re-ranking retrieved literature by its relevance
to protein disorder, we have also successfully applied the new algorithm to
five post-translational modification datasets, where curators confirmed that
the selected substrings are consistent with their much more costly manual
annotation.
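The feature selection step can be sketched as follows. This is only an illustration under stated assumptions, not the implementation from the reported work: the substring construction algorithm itself is not reproduced, features are assumed to arrive as per-article substring counts, and the function names are hypothetical. The rank-sum statistic uses the normal approximation without tie correction and assumes both classes are non-empty.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(pos, neg):
    """Wilcoxon rank-sum z statistic (normal approximation, no tie correction)."""
    n1, n2 = len(pos), len(neg)
    ranks = average_ranks(list(pos) + list(neg))
    w = sum(ranks[:n1])                       # rank sum of the positive class
    mean = n1 * (n1 + n2 + 1) / 2             # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def select_features(rows, labels, top_k):
    """Keep the top_k features with the largest |z|.

    rows:   list of dicts mapping a substring feature to its count in an article
    labels: list of bools, True for articles annotated as relevant
    """
    features = sorted({f for row in rows for f in row})
    scored = []
    for f in features:
        pos = [row.get(f, 0) for row, y in zip(rows, labels) if y]
        neg = [row.get(f, 0) for row, y in zip(rows, labels) if not y]
        scored.append((abs(rank_sum_z(pos, neg)), f))
    scored.sort(key=lambda t: (-t[0], t[1]))  # strongest first, ties by name
    return [f for _, f in scored[:top_k]]
```

The selected features would then be passed to a classifier such as an SVM or Naive Bayes model. A rank-based test is a reasonable fit here because substring counts are skewed and only a small number of labeled articles is available.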
The reported results were obtained through a collaboration with A.K.
Dunker, B. Han, Z. Hu, K. Peng, P. Radivojac, V. Uversky, S. Vucetic, X.
Xie, and C.H. Wu, and were published in 2006 and 2007 in BMC Bioinformatics
7(1):208, Bioinformatics 22(23):2876-82, and J. Proteome Research
May;6(5):1882-1932.
The speaker will give one more lecture, on Friday the 24th at 12:00, at ETF, Room 61:
DATA MINING SUPPORT FOR AEROSOL OPTICAL DEPTH RETRIEVAL AND ANALYSIS
Abstract:
Aerosol Optical Depth (AOD) indicates the amount of depletion that a beam of
radiation undergoes as it passes through the atmosphere. One of the biggest
challenges of current climate research is to characterize and quantify the
effect of AOD on the Earth's radiation budget. We will describe a novel data
mining method for improving the accuracy of AOD prediction (so-called
retrieval), based on training neural networks that take advantage of
high-resolution satellite observations and collocated high-quality
ground-based measurements. The experimental results, obtained using
thousands of observations over the entire globe, suggest that ensembles of
neural networks are more accurate than the operational MODIS AOD retrieval
algorithm. Our study of the differences between the neural networks and the
MODIS algorithm over the continental United States also revealed information
that can help improve the quality of the MODIS algorithm.
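Why an ensemble can beat its individual members can be shown with a small sketch. It does not use real MODIS or ground-based data and does not train neural networks: the "members" below are synthetic stand-ins (ground truth plus independent Gaussian noise), and all names and numbers are assumptions chosen for illustration.

```python
import random


def mse(pred, truth):
    """Mean squared error between predictions and reference values."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def ensemble_mean(member_preds):
    """Average the members' predictions point by point."""
    return [sum(vals) / len(vals) for vals in zip(*member_preds)]


rng = random.Random(0)

# Synthetic reference values standing in for ground-based AOD measurements.
truth = [rng.uniform(0.0, 1.0) for _ in range(200)]

# Ten stand-ins for separately trained networks: truth plus independent noise.
members = [[t + rng.gauss(0.0, 0.1) for t in truth] for _ in range(10)]

member_mse = [mse(m, truth) for m in members]
ensemble_mse = mse(ensemble_mean(members), truth)

print("mean member MSE:", sum(member_mse) / len(member_mse))
print("ensemble MSE:   ", ensemble_mse)
```

Because squared error is convex, the error of the averaged prediction never exceeds the average error of the members. The gain is largest when the members' errors are weakly correlated, which this toy setup assumes by construction.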
The reported results were obtained through a collaboration with Bo Han,
Zhanqing Li, Wen Mi, Vladan Radosavljevic, and Slobodan Vucetic, funded by
NSF research grant IIS-0612149.
Zoran Obradovic is Director of the Center for Information Science and
Technology and a Professor of Computer and Information Sciences at Temple
University. His research interests focus on developing data mining and
statistical learning technology for efficient knowledge discovery in
large databases. Funded by NSF, NIH, NIJ, DOE, and industry, he has
contributed to about 190 refereed articles on these and related topics and
to several academic and commercial software systems. For more details see
www.ist.temple.edu/~zoran.
SEMINAR CHAIRS
Vera Kovačević-Vujčić
Milan Dražić