Abstract
This session will present issues and advances in the use of computational models to predict the biological activity of chemicals from their physico-chemical features. We examine how the molecular features of chemicals are derived using new computational techniques, and present advances in the evaluation of biological endpoints. A new approach is presented for developing an expert system for the prediction of carcinogenicity using short-term follow-up data. A simulation-based scheme for assessing the "goodness" of different modeling approaches is described. New structure-activity relationship (SAR) modeling approaches based on model ensembles are presented and evaluated.
Organizers
Vincent C. Arena
University of Pittsburgh
Department of Biostatistics
318 Parran Hall
Pittsburgh, PA 15261
USA
Phone: [412]624-5383
Fax: [412]624-2183
Email: arena@pitt.edu
Sati Mazumdar
University of Pittsburgh
Department of Biostatistics
306 Parran Hall
Pittsburgh, PA 15261
USA
Phone: [412]624-3028
Fax: [412]624-2183
Email: maz1@pitt.edu
Chairperson
Sati Mazumdar
University of Pittsburgh
Introduction and Overview
Sati Mazumdar
University of Pittsburgh
Computational models: How do we know if they are any good?
Vincent C. Arena
Wei Li
Sati Mazumdar
Nancy B. Sussman
presenter
Vincent C. Arena
University of Pittsburgh
Abstract
Computational models are used in human health risk assessment to predict the biological activity of potentially health-threatening agents. Structure-activity relationships (SAR) that relate chemical structure to biological activity are examples of such models. Once a model is developed, it is used to predict the activity of new chemicals. The success or failure of an SAR model lies in its ability to correctly predict the true activity of a chemical. The validation of an SAR model is traditionally based on a single empirical data set, so the prediction measures may be germane only to that data set and may not represent the true performance of the SAR modeling approach. This paper presents a simulation-based scheme to address this problem. Data are simulated to mimic chemicals with a set of physico-chemical features and a biological status (active or inactive), spanning a broad range of associations between covariates and outcome, with varying signal strengths, that might be found in real-world data. A validation procedure is then used to estimate the prediction measures of an SAR model. Repeating this process yields the empirical distributions of the prediction measures, which provide information about the "goodness" of the model. The prediction measures used in this paper are sensitivity, specificity, and accuracy. The simulation-based scheme allows various forms for the underlying relationship between physico-chemical features and biological activity. Thus, the prediction performance of a modeling approach is evaluated under a broad range of data and not just that found in a single empirical data set. We illustrate this scheme with flow diagrams for several SAR modeling approaches (e.g., traditional, bagging, and hierarchical Bayes modeling) and different validation procedures (e.g., train-and-test, k-fold cross-validation).
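The simulate, validate, and repeat loop described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the logistic data-generating model, the nearest-centroid classifier standing in for an SAR model, the 50/50 train-and-test split, and all function and parameter names are assumptions introduced here.

```python
import math
import random
import statistics

def simulate_chemicals(n, n_features, signal, rng):
    """Simulate n 'chemicals' as (features, activity) pairs.

    Assumption: activity follows a logistic model on the first feature only;
    `signal` sets the strength of the feature-activity association.
    """
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        p_active = 1.0 / (1.0 + math.exp(-signal * x[0]))
        data.append((x, 1 if rng.random() < p_active else 0))
    return data

def fit_centroids(train):
    """Stand-in 'SAR model': the mean feature vector of each class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        if rows:
            centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def prediction_measures(model, test):
    """Return (sensitivity, specificity, accuracy) on a test set."""
    tp = tn = fp = fn = 0
    for x, y in test:
        y_hat = predict(model, x)
        if y == 1:
            tp += y_hat == 1
            fn += y_hat == 0
        else:
            tn += y_hat == 0
            fp += y_hat == 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec, (tp + tn) / len(test)

def run_simulation(n_reps=200, n=200, n_features=5, signal=2.0, seed=0):
    """Repeat simulate -> train-and-test -> measure; return one tuple per repetition."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_reps):
        data = simulate_chemicals(n, n_features, signal, rng)
        train, test = data[: n // 2], data[n // 2 :]
        results.append(prediction_measures(fit_centroids(train), test))
    return results
```

The list returned by `run_simulation` is the empirical distribution of the three prediction measures for one signal strength; rerunning it over a grid of `signal` values, or swapping in another classifier or validation procedure, would give the kind of comparison the abstract describes.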
Chemistry in Silico as a tool for hazard assessment and drug discovery: Prospects and problems
Subhash C. Basak
Denise Mills
Brian D. Gute
Gregory D. Grunwald
presenter
Subhash C. Basak
University of Minnesota Duluth
Natural Resources Research Institute
Department of Biochemistry and Molecular Biology
School of Medicine
University of Minnesota - Duluth
5013 Miller Trunk Highway
Duluth, MN 55811
Phone: [218]720-4230
Fax: [218]720-4328
Email: sbasak@wyle.nrri.umn.edu
Abstract
Both rational drug design and hazard assessment of environmental pollutants require many physicochemical, biomedicinal and toxicological properties of large numbers of chemicals as input to the various decision-making steps. Unfortunately, most of the candidate chemicals have little or none of the laboratory data required for their proper evaluation. Modern combinatorial chemistry is quickly producing large libraries of real or virtual chemicals for which almost no property is known except their molecular structure. How does the drug designer or risk assessor operate in such a data-poor situation?
One solution to this quagmire has been to use properties calculated from the molecular structure of chemicals in place of experimental data. Molecular descriptors can be calculated using different formalisms for representing chemical structure and diverse methods for quantifying them. Such in silico techniques lead to the development of quantitative structure-property/activity relationship (QSPR/QSAR) models, which give reasonable estimates of the properties of chemicals.
In this presentation, problems related to the representation and characterization of molecular structure using some well known methods will be discussed along with a critical evaluation of their utility and limitations.
Computational Predictive System for Rodent Organ-Specific Carcinogenicity
J.F. Young
W. Tong
H. Fang
R.D. Beger
J.J. Chen
M.A. Cheeseman
R.L. Kodell
presenter
Ralph L. Kodell
Division of Biometry and Risk Assessment, HFT-20
National Center for Toxicological Research
3900 NCTR Road
Jefferson, AR 72079 USA
Email: rkodell@nctr.fda.gov
Phone: 870-543-7008
Fax: 870-543-7662
Abstract
A new approach is proposed for developing an expert system for prediction of organ-specific rodent carcinogenicity by applying structure-activity relationships (SAR) in conjunction with data on short-term toxicity tests (STT) and nuclear magnetic resonance (13C-NMR) spectroscopy. The system is developed using the set of 1298 chemicals contained in the Carcinogenic Potency Database (CPDB), maintained on the Internet by the National Institute of Environmental Health Sciences. For each chemical in this database, additional input will include standard SAR features, 2D and 3D structural data, short-term test data, and NMR data. Because both short-term toxicity tests and rodent carcinogenicity bioassays are subject to experimental variation and subjective interpretation, it is recognized that the maximum achievable concordance level of any predictive system may be somewhat below 100%. For this reason, great emphasis is placed on having the best toxicological data possible for both input and output. Chemicals will be divided into three groups: Group 1 will include all chemicals showing carcinogenicity in a specific organ (e.g., liver); Group 2 will include all chemicals showing carcinogenicity in some organ, but not the specific organ of interest; Group 3 will include all chemicals showing no carcinogenicity in any organ. At least two predictive models will be developed. One model, the organ-specific model, will seek to distinguish chemicals in Group 1 from chemicals in Groups 2 and 3. A second model, which will follow the usual approach to predicting carcinogenicity, will seek to distinguish chemicals in Groups 1 and 2 from chemicals in Group 3. Comparisons between the two models will demonstrate to what extent an organ-specific model is more predictive than a model that predicts "generic" carcinogenicity.
Once this initial analysis is complete, the database will be re-divided to study another type of organ-specific carcinogenicity (e.g., kidney) and additional models will be developed.
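The three-group division and the two resulting binary targets can be illustrated with a short sketch. The chemical names, organ names, and function names below are hypothetical placeholders and are not taken from the CPDB or from the authors' system.

```python
def assign_group(tumor_organs, target_organ):
    """Return 1, 2, or 3 following the grouping described in the abstract.

    tumor_organs: set of organs in which the chemical was carcinogenic.
    target_organ: the specific organ of interest (e.g., "liver").
    """
    if target_organ in tumor_organs:
        return 1  # carcinogenic in the organ of interest
    if tumor_organs:
        return 2  # carcinogenic, but only in other organs
    return 3      # not carcinogenic in any organ

def organ_specific_label(group):
    """Target for the organ-specific model: Group 1 vs. Groups 2 and 3."""
    return 1 if group == 1 else 0

def generic_label(group):
    """Target for the 'generic' model: Groups 1 and 2 vs. Group 3."""
    return 1 if group in (1, 2) else 0

def label_chemicals(chemicals, target_organ):
    """Map each chemical name to (group, organ-specific label, generic label)."""
    labels = {}
    for name, tumor_organs in chemicals.items():
        group = assign_group(tumor_organs, target_organ)
        labels[name] = (group, organ_specific_label(group), generic_label(group))
    return labels

# Hypothetical toy records, for illustration only.
toy_chemicals = {
    "chem_A": {"liver", "lung"},
    "chem_B": {"kidney"},
    "chem_C": set(),
}
liver_labels = label_chemicals(toy_chemicals, "liver")
kidney_labels = label_chemicals(toy_chemicals, "kidney")
```

Calling `label_chemicals` again with a different `target_organ`, as in the last line, corresponds to re-dividing the database to study another organ.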
Decision Tree (CART) SAR Models for Developmental Toxicity Based on the FDA/TERIS Database
Nancy B. Sussman
Shui Yu
Vincent C. Arena
Sati Mazumdar
B. Padmakumari Thampatty
presenter
Sati Mazumdar
University of Pittsburgh
Abstract
Humans are exposed to thousands of environmental chemicals for which no toxicity information is available. Developmental effects are among the health risks of greatest concern with regard to these untested chemicals. Structure-activity relationships (SARs) are computational models that could be used to predict the biological activity of potential developmental toxicants. However, at this time, no adequate SAR models of developmental toxicity are available for risk assessment. Thus there is a need for developmental SAR model research. In the present study, a developmental database was compiled by combining toxicity information from the Teratogen Information System (TERIS) and the Food and Drug Administration (FDA) guidelines. We implemented a nonparametric decision tree modeling procedure using Classification and Regression Tree (CART) and a model ensemble approach termed bagging. We then assessed the empirical distributions of the prediction measures of the bagging approach compared to those of a single model. CART developmental SAR models exhibited modest prediction accuracy. Bagging tended to enhance the prediction, particularly with regard to sensitivity, but caused some reduction in specificity. Also, the model ensemble approach reduced the variability of prediction measures compared to the single model approach. Further research with CART and bagging has the potential to derive developmental SAR models that would be useful in health risk assessment.
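As a rough illustration of the bagging idea mentioned in this abstract, the sketch below fits one small tree per bootstrap resample and predicts by majority vote. It is not the authors' CART procedure: a one-split tree (decision stump) stands in for a full classification tree, and all names are assumptions made for this example.

```python
import random

def fit_stump(data):
    """Fit a one-split tree by exhaustive search for the lowest training error.

    data: list of (features, label) pairs with labels 0 or 1.
    Returns (feature index, threshold, label predicted above the threshold).
    """
    best = None
    n_features = len(data[0][0])
    for j in range(n_features):
        for threshold in sorted({x[j] for x, _ in data}):
            for above_label in (0, 1):
                errors = sum(
                    1
                    for x, y in data
                    if (above_label if x[j] > threshold else 1 - above_label) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, j, threshold, above_label)
    _, j, threshold, above_label = best
    return j, threshold, above_label

def stump_predict(stump, x):
    """Predict 0 or 1 for one feature vector using a fitted stump."""
    j, threshold, above_label = stump
    return above_label if x[j] > threshold else 1 - above_label

def fit_bagged(data, n_models, rng):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]
        models.append(fit_stump(sample))
    return models

def bagged_predict(models, x):
    """Majority vote over the ensemble; ties go to class 0."""
    votes = sum(stump_predict(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0
```

Comparing `stump_predict` on a single fitted model with `bagged_predict` on the ensemble, over repeated train-and-test splits, would give the single-model versus ensemble distributions of sensitivity, specificity, and accuracy that the abstract compares.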
Co-chair
Marjan Družovec
University of Maribor
Faculty of Mechanical Engineering
Smetanova 17, Maribor, Slovenia
marjan.druzovec@uni-mb.si
phone +386 2 2207 582
fax: +386 2 2207 990
Purpose of the Session
Discovering knowledge using data mining approaches in a given area requires a skilled miner with good domain knowledge. In this session, we will discuss problems of discovering knowledge and applying it in the domain of medicine, where the results are very promising and important.
Co-chair
Boštjan Brumen
University of Maribor
Faculty of Electrical Engineering and Computer Science
Smetanova 17, Maribor, Slovenia
brumen@uni-mb.si
phone +386 2 2355 129
fax: +386 2 2355 134
Purpose of the Session
Knowledge discovery in databases (KDD) is a complex process of extracting and representing useful information from data. Data mining constitutes a very important subtask in the overall KDD process. Mining experiments may be tried in many variations, using many approaches and many techniques to solve a single problem. For this reason, no universally best approach to data mining can be described; making good decisions depends on both the scientific methods used and the background knowledge of the people involved.
During our session, we will concentrate on algorithms and different techniques for data mining, focusing mainly on the basic problems of data mining algorithms.
The list of papers
Purpose
Explore possible applications of the positive use of human perception
Abstract
Many computer systems developed for people require us, in the last phase, to evaluate the performance of the system or to judge whether the system is really valid. These evaluations are often done through human perception. We do it by seeing the output, by observing the behavior, or by just feeling the system in some way. If so, using our perceptual evaluation in the middle of constructing such a system might accelerate the learning speed and even improve the performance. In this session, we collect possible theories and applications, as long as they concern the positive use of human perception.
This session includes