Seminar on Applied Mathematics
Vir V. Phoha, Louisiana Tech University, USA
Dynamic Fusion for Rare Event Detection in Computer Networks
Abstract. The talk consists of two parts. The first part gives a brief overview of the Center for Secure Cyberspace (CSC), its infrastructure support for research, and some ongoing research projects. The second part describes a cascading algorithm for detecting rare events that fuses decisions dynamically. An abstract of the dynamic cascading algorithm follows. Fusion of multiple classifier systems can result in enhanced accuracy and performance. Traditional classifiers generally do not work well on data spaces with class skew, that is, where instances of one class overwhelm the instances of another class. To overcome the class skew problem in the detection of rare events, we present "K-Means+ID3," a method that cascades k-Means clustering and ID3 decision tree learning to classify anomalous and normal activities in computer network systems. The k-Means clustering method first partitions the training instances into k clusters using Euclidean distance similarity. On each cluster, representing a density region of normal or anomaly instances, we build an ID3 decision tree. The decision tree on each cluster refines the decision boundaries by learning the subgroups within the cluster. To obtain a final classification decision, the decisions of the k-Means and ID3 methods are combined using two rules: 1) the Nearest-neighbor rule and 2) the Nearest-consensus rule. Results show that the detection accuracy of the K-Means+ID3 method is as high as 96.24 percent at a false-positive rate of 0.03 percent on network anomaly data.
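The cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's implementation, and several details are assumptions the abstract does not state: the function names (`kmeans`, `build_tree`, `fit`, `classify`) and the toy data are hypothetical; the tree splits numeric features on thresholds chosen by information gain, in the spirit of ID3; the "k-Means decision" is taken to be the majority label of a cluster; the Nearest-neighbor rule is read as "use the tree of the nearest cluster"; and the Nearest-consensus rule is read as "use the nearest cluster whose tree agrees with that cluster's majority label".

```python
import math
from collections import Counter

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    # Assumption: deterministic start, the first k points are the initial centroids.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: dist(p, centroids[c]))].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels):
    """ID3-style tree: a leaf is a label, a node is (feature, threshold, left, right)."""
    if len(set(labels)) == 1:
        return labels[0]
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            # Lowest weighted entropy after the split = highest information gain.
            rem = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if best is None or rem < best[0]:
                best = (rem, f, t)
    if best is None:  # identical rows with mixed labels: fall back to the majority
        return Counter(labels).most_common(1)[0][0]
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li]),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri]))

def tree_predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def fit(points, labels, k):
    """Cluster with k-means, then train one tree per non-empty cluster."""
    centroids = kmeans(points, k)
    members = [[] for _ in range(k)]
    for i, p in enumerate(points):
        members[min(range(k), key=lambda c: dist(p, centroids[c]))].append(i)
    models = []
    for j, idx in enumerate(members):
        if idx:
            ls = [labels[i] for i in idx]
            models.append({"centroid": centroids[j],
                           "majority": Counter(ls).most_common(1)[0][0],
                           "tree": build_tree([points[i] for i in idx], ls)})
    return models

def classify(models, x, rule="nearest-neighbor"):
    ranked = sorted(models, key=lambda m: dist(x, m["centroid"]))
    if rule == "nearest-consensus":
        for m in ranked:  # nearest cluster where tree and cluster majority agree
            decision = tree_predict(m["tree"], x)
            if decision == m["majority"]:
                return decision
    return tree_predict(ranked[0]["tree"], x)  # nearest-neighbor rule (and fallback)

# Hypothetical toy data: each dense region contains one instance of the other class.
points = [(0, 0), (10, 10), (1, 0), (0, 1), (1, 1), (2, 2),
          (11, 10), (10, 11), (11, 11), (9, 9)]
labels = ["normal", "anomaly", "normal", "normal", "normal", "anomaly",
          "anomaly", "anomaly", "anomaly", "normal"]
models = fit(points, labels, k=2)
print(classify(models, (0.5, 0.5)), classify(models, (2.2, 2.0)))
```

In this toy setting the per-cluster tree separates the single anomaly at (2, 2) from the surrounding normal instances, which k-means alone would not do because both fall in the same cluster.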
Zoran Obradovic, Temple University and MI SANU
Sequence Alignment and Structural Disorder: A Substitution Matrix for an Extended Alphabet
Abstract. In protein sequence alignment algorithms, a substitution matrix of 20x20 alignment parameters is used to describe the rates of amino acid substitutions over time. The development and evaluation of most substitution matrices, including the BLOSUM family, was based almost entirely on fully structured proteins. Structurally disordered proteins (i.e., proteins that lack structure, either in part or as a whole), which have been shown to be very common in nature, have a significantly different amino acid composition than ordered (i.e., structured) proteins. Furthermore, in proteins containing both structured and unstructured regions, the sequence evolution rate is higher in the unstructured regions than in the structured ones. These results cast doubt on the appropriateness of the BLOSUM substitution matrices for the alignment of structurally disordered proteins. To address this problem, we take the concept of structural disorder into account by extending the alphabet for sequence representation to 2x20=40 symbols: 20 for amino acids in disordered regions and 20 for amino acids in ordered regions. A 40x40 substitution matrix is required for the alignment of sequences represented in the extended alphabet. Such an expanded matrix contains 20x20 submatrices that correspond to matching ordered-ordered, ordered-disordered, and disordered-disordered pairs of residues. In this talk we will describe an iterative procedure that we used to estimate such a 40x40 substitution matrix. The iterative procedure converged, and the results were stable with respect to the choice of sequences in the dataset. In the resulting 40x40 matrix we found substantial differences between the 20x20 submatrices corresponding to ordered-ordered, ordered-disordered, and disordered-disordered region matching. These differences provide evidence that, for the alignment of protein sequences that contain disordered segments, the discovered substitution matrix is more appropriate than the BLOSUM substitution matrices.
At the same time, the new substitution matrix is applicable to the sequence alignment of fully ordered proteins, as its ordered-ordered submatrix is very similar to a BLOSUM matrix.
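The extended alphabet can be made concrete with a short Python sketch. It only illustrates the indexing idea and is not the estimation procedure from the talk. The layout (indices 0 to 19 for ordered residues, 20 to 39 for disordered ones), the function names, and the values in `toy_matrix` are all hypothetical placeholders; the real 40x40 scores are the ones estimated by the authors and are not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

def encode(sequence, disordered):
    """Map residues to the 40-symbol alphabet.

    Assumed layout: index 0-19 for a residue in an ordered region,
    20-39 for the same residue in a disordered region.
    """
    if len(sequence) != len(disordered):
        raise ValueError("sequence and disorder mask must have equal length")
    return [AMINO_ACIDS.index(aa) + (20 if d else 0)
            for aa, d in zip(sequence, disordered)]

def block(i, j):
    """Name the 20x20 submatrix that entry (i, j) of the 40x40 matrix falls in."""
    states = sorted("disordered" if s >= 20 else "ordered" for s in (i, j))
    if states[0] == states[1]:
        return states[0] + "-" + states[1]
    return "ordered-disordered"

def toy_matrix():
    """Placeholder 40x40 scores (NOT real data): +2 for the same residue,
    -1 for a different residue, and 1 less when the order states differ."""
    return [[(2 if i % 20 == j % 20 else -1) - (0 if (i >= 20) == (j >= 20) else 1)
             for j in range(40)] for i in range(40)]

def score_ungapped(a, b, matrix):
    """Sum of substitution scores for two equal-length encoded sequences."""
    return sum(matrix[i][j] for i, j in zip(a, b))

a = encode("ACD", [False, True, False])   # C lies in a disordered region
b = encode("ACE", [False, False, False])  # fully ordered
print(a, b, block(a[1], b[1]), score_ungapped(a, b, toy_matrix()))
```

With this layout, aligning two fully ordered sequences only ever touches the ordered-ordered submatrix, which is why a matrix whose ordered-ordered block resembles BLOSUM remains usable for fully ordered proteins.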