Nature Inspired Metaheuristics for Classification Rule Induction in Data Mining with an Application in Power Plant Risk Management

Author
P. Peherstorfer
Master's Thesis
MT0507 (November, 2005)
Supervised by
o. Univ.-Prof. Dr. Michael Schrefl
Prof. Dr. Markus Stumptner
Carried out at
Universität Linz, Institut für Wirtschaftsinformatik - Data & Knowledge Engineering
University of South Australia, Department for Computer and Information Science

Preface

As a result of the ubiquitous use of information technology and the continuing computerization of almost every area of human endeavour, vast amounts of data are accumulating in ever-growing databases and data warehouses. Since the dawn of computerization, many companies and institutions have invested enormous sums in collecting data, but only recently have they started to take advantage of the valuable and potentially useful information hidden within it. Automated systems for uncovering patterns and trends in large data repositories are needed because human analysts are inadequate for, and too costly at, searching such quantities of data for complex dependencies. The automated process of extracting implicit, previously unknown, and potentially useful information from data is termed Data Mining.

A subset of Data Mining applications aims at predicting future scenarios from data describing what happened in the past, by estimating the classification of unseen examples. Discovering the patterns and hypotheses in a dataset that allow such predictions requires elaborate search strategies. The objective of this thesis is to familiarize the reader with the different aspects of such search strategies and to investigate one specific family of search methods in more depth, namely nature inspired search algorithms. In particular, two search methods are scrutinized: Genetic Algorithms and Ant Colony Algorithms.

The thesis is organized in layers, as illustrated in Figure 1. The first two chapters are theoretical and provide a basic introduction to the three principal parts, namely Metaheuristics, Data Mining, and Metaheuristics in Data Mining. On this basis, the practical part is developed, consisting of two standard implementations of the investigated algorithms and one improved, hybridized variant. The closing chapter presents a case study applying the improved implementation to risk management in a power plant.