Data analysis using fuzzy expressions - creating comprehensible computational models from data

M. Drobics. Data analysis using fuzzy expressions - creating comprehensible computational models from data. 6, 2005.

  • Mario Drobics
Abstract This work is concerned with the general problem of creating comprehensible computationalmodels from data. To derive such models from data, various different methods likestatistical regression, artificial neural nets, and inductive learning of decision rules/treesexist. Though all these methods differ in structural and algorithmic form and in how theysearch for a solution (some kind of optimization process), they all try to abstract generalconcepts from specific examples (inductive learning).When performing inductive learning using real world data, the problem arises thatthe available data might contain imprecise (due to measurement errors) or vague information(e.g. linguistic expression). As boolean logic can only distinguishes between trueor false, a number of problems arise when dealing with this kind of unpreciseness. Fuzzylogic has proven to overcome these shortcomings. Furthermore, fuzzy logic can not onlybe used to describe real world phenomenons using logical expressions, but to computevalues in the real domain, too.In this work we will discuss the theoretical foundations of fuzzy set theory which arenecessary to create computational models. We will furthermore discuss the problem ofdata preprocessing and data cleaning. Then we will present four novel methods to learncomputational models from a given set of samples. By using fuzzy logic to define theunderlying language and by integrating ordering based predicates we ensure, that thesemodels are not only computationally accurate, but also compact and easy to comprehend.Due to the generality of this approach these methods can be applied for classification, aswell as for regression learning problems. Furthermore, these methods can also be used toderive linguistic descriptions of (previously unknown) patterns hidden in the data. Wewill illustrate the generality of our approach by presenting different applications of theproposed methods for all three types of learning problems. Finally, we will compare theobtained results with those of other well known methods.The concepts and methods described in this work have been successfully applied in anumber of real world applications and are available within the machine learning frameworkfor Mathematica which has mainly been developed by the author.