Gathering in an informal setting, workshop participants had the opportunity to meet and discuss selected technical topics in an atmosphere that fostered the active exchange of ideas among researchers and practitioners. To encourage interaction, the workshop was kept small.
The topics of the workshop were computational methods for intelligent data analysis aimed at narrowing the gap between data gathering and data comprehension, as well as their applications in medicine and pharmacology.
This paper outlines the methodologies that can be used to perform distributed intelligent data analysis in a telemedicine system for the management of diabetic patients. We present a decision-support system architecture based on two modules, a Patient Unit and a Medical Unit, connected by telecommunication services. We outline how the two modules can cooperatively interpret the data by combining temporal abstraction techniques with time series analysis.
We present Ptah, a system for supporting medical doctors in making decisions related to the therapy of nosocomial (hospital-acquired) infections. The system is based on a chronologically organized database of infections and therapies. It facilitates four types of analyses related to the effectiveness of antibiotics and resistance of bacteria to antibiotics. The underlying methods construct time series of resistance vectors from the database, and present their results graphically.
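The construction of time series of resistance vectors can be illustrated with a minimal sketch. The abstract does not define the vectors precisely, so this example assumes that a resistance vector holds, for one time period, the fraction of tested isolates found resistant to each antibiotic. The record layout, the period labels `T1` and `T2`, the function name `resistance_series`, and the sample data are all hypothetical and are not taken from Ptah.

```python
from collections import defaultdict

def resistance_series(records, antibiotics):
    """Build, per period, a vector with the resistant fraction for each antibiotic.

    records: iterable of (period, antibiotic, resistant) tuples (assumed layout).
    Returns a chronologically sorted list of (period, vector) pairs.
    """
    # period -> antibiotic -> [resistant count, tested count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for period, antibiotic, resistant in records:
        cell = counts[period][antibiotic]
        if resistant:
            cell[0] += 1
        cell[1] += 1

    series = []
    for period in sorted(counts):
        vector = []
        for antibiotic in antibiotics:
            resistant, tested = counts[period][antibiotic]
            # None marks an antibiotic that was not tested in this period.
            vector.append(resistant / tested if tested else None)
        series.append((period, vector))
    return series

# Hypothetical test results: (period, antibiotic, isolate was resistant?)
records = [
    ("T1", "ampicillin", True),
    ("T1", "ampicillin", False),
    ("T1", "gentamicin", False),
    ("T2", "ampicillin", True),
    ("T2", "ampicillin", True),
    ("T2", "gentamicin", False),
    ("T2", "gentamicin", True),
]
series = resistance_series(records, ["ampicillin", "gentamicin"])
for period, vector in series:
    print(period, vector)
```

Each entry of `series` could then be plotted against its period to show how resistance changes over time, which is one plausible reading of the graphical presentation mentioned in the abstract.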
Diagnosis is often considered a classification task. This assumes that sufficient relevant information (symptom values) is already available. Related CBR approaches match the new ``problem'' (e.g. described by a symptom vector) against similar cases from the case base. The ``solutions'' (diagnoses) contained in the retrieved cases are proposed as possible solutions to the new case. This approach is mainly directed at ``standard'' solutions.
More subtle diagnosis support is needed for non-standard, ``unexpected'' problems. Experience with such cases is an important skill of experts. Cases of this kind are usually not described by the common predefined categories, and the diagnosis process is more complicated. Nevertheless, it can be guided by previous cases to get hints concerning feasible tests and therapies, together with expected results. Previous cases are used for argumentation in complicated situations.
We investigate cases from urology, using Case Retrieval Nets as a flexible technique.
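The basic case-matching step described above, in which a new symptom vector is compared with stored cases and their diagnoses are proposed, can be sketched as follows. This is a plain similarity ranking, not a Case Retrieval Net. The similarity measure (fraction of agreeing symptoms), the case base, and the diagnosis labels `D1` to `D3` are hypothetical choices made only for illustration.

```python
def similarity(a, b):
    """Fraction of symptoms on which two vectors agree (simple matching)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def retrieve(case_base, query, k=2):
    """Return the k stored cases most similar to the query, best first."""
    ranked = sorted(
        case_base,
        key=lambda case: similarity(case["symptoms"], query),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical case base: binary symptom vectors with a recorded diagnosis.
case_base = [
    {"symptoms": (1, 0, 1, 1), "diagnosis": "D1"},
    {"symptoms": (0, 1, 0, 0), "diagnosis": "D2"},
    {"symptoms": (1, 0, 1, 0), "diagnosis": "D3"},
    {"symptoms": (0, 1, 1, 0), "diagnosis": "D2"},
]

query = (1, 0, 1, 0)
# The diagnoses of the retrieved cases are proposed for the new case.
proposals = [case["diagnosis"] for case in retrieve(case_base, query)]
print(proposals)
```

A ranking of this kind only covers the ``standard'' situation; it gives no help when the relevant symptoms are not yet known, which is the gap the abstract addresses.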
Most pruning methods for decision trees minimize a classification error rate. In uncertain domains, some sub-trees that do not reduce the error rate can still be relevant, either for pointing out populations of specific interest or for giving a representation of a large data file. We propose a pruning method in which we build a new attribute binding the root of a tree to its leaves, each value of this attribute corresponding to a branch leading to a leaf. This attribute permits computation of the global quality of a tree. The best sub-tree for pruning is the one that yields the highest-quality pruned tree. This pruning method is not tied to the use of the pruned tree as a classifier. The graphical representation of the global quality index as a function of the number of pruned sub-trees allows the selection of a few trees from the list of nested pruned trees derived from the entire (unpruned) tree. We give typical examples in medicine, highlighting routine use of induction in this domain even if the targeted diagnosis cannot be reached for many cases from the findings under investigation.
Diterpenes are organic compounds of low molecular weight that are based on a skeleton of 20 carbon atoms. They are of significant chemical and commercial interest because of their use as lead compounds in the search for new pharmaceutical effectors. The structure elucidation of diterpenes based on 13C NMR spectra is usually done manually by human experts with specialized background knowledge on peak patterns and chemical structures. Given a database of peak patterns for diterpenes with known structure, we applied machine learning to discover correlations between peak patterns and chemical structure. Three machine learning approaches were used: neural networks, nearest neighbor pattern classification, and decision-tree induction. Simple pre-processing of the raw 13C NMR spectra according to expert background knowledge yielded noticeable performance improvements. All three approaches achieve very high classification accuracy, but decision-tree induction has the advantage of explicitly stating the knowledge discovered.
This paper presents some rule induction methods specific to medical domains. As an example, the application of the inductive learning system ILLM to a breast cancer domain is described. The learning data comprised 699 cases of fine-needle aspiration biopsy collected in the Wisconsin Breast Cancer Database. The unique characteristics of ILLM were used to construct rules with increased sensitivity and improved interobserver reproducibility, in the hope that these properties might significantly influence diagnostic reliability in practical applications. The paper offers a complete description of the generated rules, which might be used by physicians and computer-based systems provided that attribute coding is done in accordance with the learning cases.
In this paper, we present an approach to evaluate, filter, and sort findings of Data Mining methods by applying multiple, subjective interestingness measures. A theory of interestingness for Knowledge Discovery in Databases (KDD) is proposed which is based on a language-oriented KDD model. Interestingness is seen as a relation between user and information. We operationalize interestingness by decomposing it into several facets (e.g. novelty) for which measures are definable. User models hold the prior knowledge, goals, and long-term interests necessary to evaluate interestingness subjectively. We introduce the Data Mining Assistant, which helps to narrow the gap between Data Mining and Knowledge Discovery by evaluating, sorting, and translating Data Mining results. In the evaluation phase, interestingness ratings of the system were compared with those of users. By adjusting the parameters of our interestingness measure, we aim to adapt the interestingness measure of the system to that of users. The theoretical concepts are illustrated and evaluated with an example from a medical domain.
This work presents a method for the classification of EEG spectra by means of Kohonen's self-organizing map \cite{kn:Kohonen82} \cite{kn:Kohonen95}.
We use EEG data recorded from 19 electrodes (channels) and sampled at 128 Hz. Data vectors with a duration of one second each are extracted every half second, so that consecutive vectors overlap by half a second. Before the map was trained, the sample vectors were compressed by either the fast Fourier transform or the wavelet transform.
Data preprocessed by the Fourier transform result in short-time power spectra. These spectra are filtered by Butterworth filters matched to the EEG frequency bands of the delta, theta, alpha, beta and gamma rhythms. Data preprocessed by the wavelet transform result in wavelet components that are combined and averaged.
The pre-processed vectors form "clusters" on the trained self-organizing map that are related to specific EEG-patterns.
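The windowing scheme described above (one-second vectors taken every half second from a signal sampled at 128 Hz) can be sketched for a single channel as follows. The function name `sliding_windows` and the dummy signal are hypothetical; the transform and filtering steps are left out because the abstract does not give their parameters.

```python
def sliding_windows(signal, sample_rate_hz=128, window_s=1.0, step_s=0.5):
    """Split a single-channel signal into fixed-length, overlapping windows."""
    window_len = int(sample_rate_hz * window_s)  # 128 samples per window
    step_len = int(sample_rate_hz * step_s)      # advance by 64 samples
    windows = []
    # Stop once a full window no longer fits into the remaining signal.
    for start in range(0, len(signal) - window_len + 1, step_len):
        windows.append(signal[start:start + window_len])
    return windows

# Three seconds of a dummy signal at 128 Hz (384 samples).
signal = list(range(384))
windows = sliding_windows(signal)

print(len(windows))     # number of extracted vectors
print(len(windows[0]))  # samples per vector
```

With a step of half the window length, the second half of each vector is identical to the first half of the next one, which is the half-second overlap mentioned in the text.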
Temporal abstraction, the derivation of abstractions from time-stamped data, is a central process in medical knowledge-based systems. Important types of temporal abstractions include periodic occurrences, trends, and other temporal patterns. The paper discusses the derivation of periodic occurrences at a theoretical, domain-independent level, and in the context of a specific temporal ontology.
Machine learning methods may be used to induce diagnostic rules from patient records with known diagnoses. In a medical application it is crucial that a machine learning system is capable of detecting regularities in the data by appropriately dealing with imperfect data, i.e., data that contains various kinds of errors, either random or systematic. The paper presents a compression-based method that is capable of detecting data which is suspected to contain errors and is therefore unsuited for the extraction of regularities genuine to this dataset. This noise elimination method is applied to a problem of early diagnosis of rheumatic diseases which is known to be a difficult problem, due both to its nature and to the imperfections in the dataset. The method is evaluated by applying the noise elimination algorithm in conjunction with the CN2 rule induction algorithm, and by comparing their performance to earlier results obtained by CN2 in this diagnostic domain.
The paper presents an application of decision tree induction to the problem of predicting the outcome six months after severe head injury. Machine learning techniques and tools have already been applied in a variety of medical domains to help solve diagnostic and prognostic problems. These tools enable the induction of diagnostic and prognostic knowledge, for example in the form of rules or decision trees, from training data. Patient records with corresponding diagnoses and prognoses are provided as input.
The study shows that induced decision trees are useful for analysing the importance of clinical parameters, and of their combinations, for the prediction of outcome after severe head injury. Among the parameters studied, the brainstem syndrome (BSS) turns out to be the most important. Since this syndrome is subjectively evaluated, an experiment was made in which BSS was replaced by the basic attributes from which BSS is estimated. It was shown that BSS can be replaced by the motor reaction to pain stimuli, which, under the conditions of this study, has similar predictive power. Because only a small number of patient records were available for this study, the induced decision trees cannot yet be considered a reliable prognostic tool. Nevertheless, meaningful regularities have been discovered that help in the analysis of this difficult prognostic task.
Intensive care depends on sophisticated life support technology; the effective management of device-supported patients is complex, involving the interpretation of several time-dependent variables. The ASSOCIATE system analyses historical data for summarisation and patient state assessment and processes raw ICU data in real time for intelligent alarming. It uses a temporal expert system based on associational reasoning and applies three consecutive processes: 'filtering', which removes noise; 'segmentation', which generates temporal intervals from the filtered data, each characterised by a common direction of change (i.e., increasing, decreasing or steady); and 'interpretation', which performs summarisation and patient state assessment from a historical point of view and intelligent alarming from a real-time point of view.
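The first two of the three processes can be illustrated with a minimal sketch. It assumes a sliding median as the noise filter and a fixed tolerance for deciding whether a step counts as steady; neither choice is stated in the abstract, and the function names and sample values are hypothetical. The interpretation step is omitted because it depends on the expert system's knowledge base.

```python
from statistics import median

def median_filter(values, width=3):
    """Suppress isolated spikes with a sliding median (shorter window at edges)."""
    half = width // 2
    return [median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]

def direction(delta, tolerance):
    """Label a single step as increasing, decreasing or steady."""
    if delta > tolerance:
        return "increasing"
    if delta < -tolerance:
        return "decreasing"
    return "steady"

def segment(values, tolerance=0.5):
    """Merge consecutive steps with the same direction into intervals.

    Returns a list of (start_index, end_index, label) tuples.
    """
    segments = []
    for i in range(1, len(values)):
        label = direction(values[i] - values[i - 1], tolerance)
        if segments and segments[-1][2] == label:
            start = segments[-1][0]
            segments[-1] = (start, i, label)   # extend the current interval
        else:
            segments.append((i - 1, i, label)) # open a new interval
    return segments

# Hypothetical monitor readings with one artefact spike (120).
raw = [80, 80, 80, 120, 80, 80, 82, 84, 86, 88, 88, 88, 88, 86, 84, 82]
filtered = median_filter(raw)
intervals = segment(filtered)
for start, end, label in intervals:
    print(start, end, label)
```

On this sample the spike disappears after filtering, and the remaining signal falls into a steady, an increasing, a steady and a decreasing interval, which is the kind of input an interpretation stage could summarise or alarm on.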
We have applied fourteen classifiers to the problem of coronary artery disease progression. The classifiers were taken from different paradigms of machine learning (symbolic, statistical and neural) in order to cover the different approaches. The unsolved problem of coronary artery disease progression consists of predicting the change in stenosis (narrowing of the coronary artery) on the basis of clinical, laboratory and epidemiological attributes. A total of 263 patients belonging to two classes (stenosis changed vs. non-changed) were described with 25 attributes. The overall results are not promising and suggest that the attributes used are not sufficiently relevant to enable a prediction of coronary artery disease progression. It should also be pointed out that the simplest classifiers (naive Bayes, linear discriminant analysis) generally yield the best results. This phenomenon seems to be typical for medical data and is consistent with previous experience.
The many advances that have been achieved in the field of machine learning require additional application-oriented research before they can be successfully applied to real-world problems. For the domain of clinical studies in the pharmaceutical industry, we have therefore developed a new learning procedure on the basis of explanation-based abstraction. The background knowledge which is usually needed for automated learning is never directly available in a formalized manner. As an alternative to a complete formalization, we show how a partial de-automation of a learning algorithm and interactive components are key success factors for utilizing machine learning in industrial practice.
Developing practical tools to aid in understanding physiological systems is a formidable undertaking. This paper presents a method that uses a property structure for the domain being investigated. Furthermore, it employs realistic models to present examples of the behavior of the system. From these examples, the principles that relate the properties are inferred through the use of machine learning. To allow prediction of property values in quantitative domains, methods based on interval logic and fuzzy logic are proposed for qualitative model interpretation. The principles are expressed as qualitative rules that derive the values of the properties. The structured approach and the qualitative representation of principles provide a simplified means to reason about the roles of properties and the meaning of principles of the physiological systems being investigated.