Applying the Hidden Markov Model Methodology for Unsupervised Learning of Temporal Data
Speaker: Gautam Biswas
email:
biswas@vuse.vanderbilt.edu
http://www.vuse.vanderbilt.edu/~biswas

The Hidden Markov Modeling (HMM) methodology has been successfully employed in the modeling and analysis of temporal processes. In particular, a number of impressive applications have been made in the field of speech recognition and synthesis. Given large speech corpora of word pronunciations, one or more HMMs are employed to represent words and phrases. The speech recognition task involves matching actual pronunciations against the collection of HMMs, much like using finite state automata to track a sequence of symbols. For these applications, language experts use their knowledge to handcraft the HMM structures. The HMM learning problem then reduces to HMM parameter estimation. A well-known HMM parameter estimation procedure is the Baum-Welch procedure, an expectation-maximization method that is often used in conjunction with the Viterbi parameter initialization method. In our work, we have extended the application of the HMM methodology to the modeling of dynamic processes in situations where sufficient domain knowledge may not be available to define the model structure. Dynamic processes are characterized by time-varying features, i.e., the variable values describing system behavior can change significantly over time. Our goal is to develop models that help in understanding the underlying phenomena governing dynamic system behavior, and to use this information for decision making and problem solving tasks. HMMs provide a compact discrete-event representation of temporally evolving behavior, and the states of an HMM effectively model the set of potentially valid states in a dynamic process. Data collected from real-world systems may include diverse phenomena; therefore, model building has to be preceded by partitioning the data objects into homogeneous groups.
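
The matching step described above, scoring an observed sequence against a collection of HMMs and keeping the best-scoring model, can be sketched with the standard forward algorithm. This is only a minimal illustration, not the speaker's implementation: the function name `forward_log_likelihood`, the two toy models, and the observation sequence are all assumptions made up for the example.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | HMM) for a discrete HMM, via the scaled forward algorithm."""
    n_states = len(start)
    # alpha[s] is proportional to P(observations so far, current state = s).
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

# Two hypothetical two-state models over the symbols {0, 1} (illustration only).
models = {
    "a": {"start": [0.5, 0.5],
          "trans": [[0.9, 0.1], [0.1, 0.9]],
          "emit": [[0.9, 0.1], [0.8, 0.2]]},   # mostly emits symbol 0
    "b": {"start": [0.5, 0.5],
          "trans": [[0.9, 0.1], [0.1, 0.9]],
          "emit": [[0.1, 0.9], [0.2, 0.8]]},   # mostly emits symbol 1
}

observed = [0, 0, 1, 0, 0]
scores = {name: forward_log_likelihood(observed, **m) for name, m in models.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

The same per-model log-likelihood is the quantity that Baum-Welch increases during parameter estimation; the sketch assumes a non-empty sequence of integer symbol indices.
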
The focus of this talk is on developing unsupervised classification, or clustering, techniques to automatically partition temporal data into homogeneous groups, and to construct an HMM for each group. We propose a Bayesian methodology with the HMM representation to drive the clustering process. Our proposed methodology improves upon existing HMM clustering methods in two ways: (i) an explicit HMM model size selection procedure is incorporated into the clustering process, i.e., the sizes of the individual HMMs are dynamically determined for each cluster, which improves the interpretability of the cluster models and the quality of the final clustering partition; and (ii) a partition selection method ensures an objective, data-driven selection of the number of clusters in the partition. The result is a simplified heuristic sequential search control algorithm that is computationally feasible. Our experiments with artificially generated data have shown that the HMM model size selection algorithm is effective in rediscovering the structure of the generating HMMs, that HMM clustering with model size selection significantly outperforms HMM clustering with uniform HMM model sizes in rediscovering clustering partition structures, and that the algorithm is not sensitive to data skewness. In addition, we present empirical results on the stability of the clustering algorithm and on data sufficiency issues related to HMM learning. More recently, we have begun experimenting with real-world data sets. The talk will conclude with a presentation of preliminary results on the modeling of real data, their applications, and directions for future research in this area.
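
The abstract does not say which Bayesian criterion drives the model size selection. As a hedged illustration only, the sketch below uses the Bayesian Information Criterion (BIC), a common approximation, to pick the number of HMM states. The function names, the parameter-count formula for a discrete HMM, and the log-likelihood values are assumptions invented for the example; they are not results from the talk.

```python
import math

def hmm_free_parameters(n_states, n_symbols):
    """Free parameters of a discrete HMM: start, transition and emission rows."""
    start = n_states - 1                   # one distribution over states
    trans = n_states * (n_states - 1)      # one distribution per state
    emit = n_states * (n_symbols - 1)      # one distribution per state
    return start + trans + emit

def bic_score(log_lik, n_params, n_obs):
    """BIC in 'higher is better' form: fit minus a complexity penalty."""
    return log_lik - 0.5 * n_params * math.log(n_obs)

def select_model_size(log_liks, n_symbols, n_obs):
    """Return the state count whose fitted model has the best BIC score."""
    return max(
        log_liks,
        key=lambda k: bic_score(log_liks[k], hmm_free_parameters(k, n_symbols), n_obs),
    )

# Hypothetical training log-likelihoods for HMMs with 1 to 4 states
# (made-up numbers, for illustration only).
hypothetical_log_liks = {1: -1100.0, 2: -1000.0, 3: -995.0, 4: -993.0}
chosen = select_model_size(hypothetical_log_liks, n_symbols=3, n_obs=1000)
print(chosen)
```

In this toy setting the fit barely improves beyond two states, so the penalty term outweighs the gain and the smaller model is preferred.
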


Advances in Applied Statistical Pattern Recognition
Speaker: Andrew Webb
Defence Research Agency, Malvern, UK
email:
webb@signal.dera.gov.uk

Research into techniques for combining outputs from different sensors, opinions from different experts, and sensor and collateral information has received considerable attention in recent years. A consistent approach to the 'fusion' of such disparate sources of data is provided by the Bayesian paradigm. Ambiguities in sensor outputs, sensor failure, noisy and correlated data, mixed numeric and symbolic data, and prior knowledge are handled in a consistent manner. A major difficulty of the Bayesian approach is that it is computationally costly for large problems. This talk will review two approaches to 'data fusion' that incorporate Bayesian principles without inheriting the full computational cost of the Bayesian approach, namely Bayesian networks and self-organising systems. The particle filter and stochastic vector quantisation will be introduced. Applications in engine monitoring, clutter suppression, tracking, and monitoring semiconductor growth, amongst others, will be described.


Model Selection - Computational Methods and Applications
Speaker: Stephen Roberts
www.robots.ox.ac.uk/~sjrob/

Data-driven machine learning, and indeed all computational data analysis, is performed under a model. It is clear that the choice of model, both in terms of its architecture and, importantly, its complexity, affects the analysis results. How then can models be evaluated and the appropriate model complexity inferred? Although many partially successful heuristic approaches exist, the overarching framework of Bayesian learning offers an elegant and principled methodology in which many alternative methods may be seen as special or limiting cases. This talk will review the principles behind model selection paradigms and offer a series of case studies.


Latent Semantic Pursuit: Information Retrieval via Projection Pursuit
Speaker: Christian Posse
Principal Research and Product Development Manager
KangarooNet Inc.
255 Shoreline Drive, Suite 103
Redwood Shores, CA 94065, USA
email:
christian@kangaroonet.com

A common approach to information retrieval consists of literally matching terms in documents with those of a query. This strategy may retrieve irrelevant or inaccurate documents because there are many ways of expressing a specific concept (synonymy) and because many words have multiple meanings (polysemy). A remedy consists of retrieving information based on the meaning of a document (and a query). A successful example of this approach is Latent Semantic Indexing (LSI). LSI uses Singular Value Decomposition (SVD) to extract the latent or underlying concepts in word usage that are obscured by variability in word choice.
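
The contrast between literal term matching and LSI can be sketched on a toy term-document matrix. This is a minimal illustration under stated assumptions, not the speaker's system: the vocabulary, documents, and helper names are invented, and the top-k left singular vectors are obtained by power iteration on the term-term matrix as a stand-in for a full SVD routine.

```python
import math

def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def top_eigenvector(m, iterations=500):
    """Power iteration: dominant eigenpair of a symmetric PSD matrix."""
    v = [1.0 / math.sqrt(len(m))] * len(m)
    for _ in range(iterations):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))
    return eigenvalue, v

def latent_basis(term_doc, k):
    """Top-k left singular vectors of term_doc, via deflation on A * A^T."""
    n = len(term_doc)
    gram = [[sum(a * b for a, b in zip(term_doc[i], term_doc[j])) for j in range(n)]
            for i in range(n)]
    basis = []
    for _ in range(k):
        lam, u = top_eigenvector(gram)
        basis.append(u)
        # Remove the component just found so the next pass finds the next one.
        gram = [[gram[i][j] - lam * u[i] * u[j] for j in range(n)] for i in range(n)]
    return basis

def project(basis, vec):
    return [sum(ui * x for ui, x in zip(u, vec)) for u in basis]

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 0.0 if na == 0.0 or nb == 0.0 else sum(x * y for x, y in zip(a, b)) / (na * nb)

# Rows: car, automobile, engine, flower, petal. Columns: four toy documents.
term_doc = [
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
]
query = [1, 0, 0, 0, 0]        # "car"
doc_auto = [0, 1, 1, 0, 0]     # "automobile engine": shares no term with the query
doc_plant = [0, 0, 0, 1, 1]    # "flower petal"

basis = latent_basis(term_doc, k=2)
literal_sim = cosine(query, doc_auto)
latent_sim = cosine(project(basis, query), project(basis, doc_auto))
unrelated_sim = cosine(project(basis, query), project(basis, doc_plant))
print(literal_sim, latent_sim, unrelated_sim)
```

Literal matching gives the "automobile engine" document a similarity of zero to the query "car", while in the two-dimensional latent space the two nearly coincide because both load on the same vehicle concept.
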

In this talk we introduce Latent Semantic Pursuit (LSP). LSP produces latent concepts via Projection Pursuit, which has better feature extraction capabilities than SVD. LSP improves on LSI in two directions: it provides better recall/precision performance and a significant storage reduction (which implies significantly lower query time).


Recursive Training of Neural Networks for Pattern Recognition
Speaker: Mayer Aladjem
email:
aladjem@ee.bgu.ac.il
www.ee.bgu.ac.il/~aladjem

The training of neural networks for pattern recognition is carried out by minimizing an error function which allows the outputs of the network to represent classification functions. We are interested in error functions which are highly nonlinear with respect to the adjustable weights of the networks. Such objectives appear in multi-layer networks for classification and in networks for non-parametric linear discriminant analysis. In these cases the training must be carried out by an iterative optimization algorithm. The primary goal is to find the global minimum of the error function. With naive use of a training algorithm (a local minimizer of the error function), the minimum found may be merely a local one, and the solution depends strongly on the starting point of the local optimizer. This talk will discuss our recursive method for searching for several small local minima of the error function. It is not a global minimization method, but rather a tool for escaping from a minimum already found and directing the local optimizer to a new solution. The results and analysis of experiments with linear and non-linear classification functions, together with comparative studies of other methods for the minimization of error functions, will be presented.
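
The dependence on the starting point can be seen on a one-dimensional toy error function. The sketch below is not the recursive method of the talk, whose details the abstract does not give; it is plain gradient descent restarted from several points, with the function `error`, the step size, and the starting points all chosen as assumptions for illustration.

```python
def error(w):
    """Toy non-convex error function with one local and one global minimum."""
    return w ** 4 - 3.0 * w ** 2 + w

def error_gradient(w):
    return 4.0 * w ** 3 - 6.0 * w + 1.0

def descend(w, learning_rate=0.01, steps=2000):
    """Plain gradient descent: a local minimizer of the error function."""
    for _ in range(steps):
        w -= learning_rate * error_gradient(w)
    return w

# Different starting points lead the same local optimizer to different minima.
starts = [-2.0, 0.5, 2.0]
solutions = [descend(w0) for w0 in starts]
best_w = min(solutions, key=error)

for w0, w in zip(starts, solutions):
    print(f"start {w0:+.1f} -> w = {w:+.3f}, error = {error(w):+.3f}")
print("best of the restarts:", round(best_w, 3))
```

Starting at 0.5 or 2.0 leaves the optimizer in the shallower minimum on the right; only the run started at -2.0 reaches the deeper minimum on the left, which is why a way of leaving a minimum already found is useful.
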


Copyright © 2001, International Computing Sciences Conventions
last update: May 11, 2001