Graphical Models for Data Mining (NNN.5321)
Project number:
nnn5321
Description of the research
Data mining can be defined as the non-trivial extraction of implicit, previously unknown, and potentially useful information from data. Commercial data mining is booming for several reasons: large amounts of data are produced and stored, computing power is becoming greater and cheaper, markets are very competitive, and many commercial packages and consultants are available. Graphical models, also known as probabilistic or (Bayesian) belief networks, represent the variables in a domain and the (probabilistic) relationships between them. They are increasingly seen as a convenient high-level language for structuring complex relationships, and thus as the solution for handling uncertainty when reasoning about complex databases. Building a graphical model from data largely corresponds to learning the structure of the model.
Despite their potential, there are only a limited number of concrete applications of graphical models to complex databases. From a technical point of view, there are three problems that might explain this, all of which we plan to work on.
- Exact computation in graphical models becomes intractable for any reasonable number of variables. One has to resort to approximations. We plan to use and extend recently developed variational methods, and to integrate them with structural learning.
- Current methods mainly focus on discrete variables, whereas many databases contain variables with all kinds of modalities. Here too, variational methods will help, since they are based on approximating distributions with similar functional forms for all kinds of modalities, which makes them relatively easy to combine.
- Elicitation of prior beliefs about plausible structures has hardly been studied, but can be very helpful in guiding the search for sensible structures. Here we will investigate whether maximum-entropy techniques can help to translate a limited set of prior beliefs into a prior distribution over structures.
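To make the last item above concrete, here is a minimal sketch of one way a few elicited beliefs could be turned into a prior over structures. It is an illustration under simplifying assumptions, not the method developed in the project: structures are treated as undirected edge sets, each belief is a stated probability (strictly between 0 and 1) that a particular edge is present, and no further constraints such as acyclicity are imposed. Under those assumptions, the maximum-entropy distribution that matches the stated edge probabilities treats edges as independent and assigns probability 0.5 to every edge the expert said nothing about. The function names, variable names, and numbers are hypothetical.

```python
import itertools
import math


def maxent_edge_prior(nodes, beliefs):
    """Return {edge: probability of presence} for every undirected edge.

    Assumption: the only constraints are marginal edge probabilities.
    The maximum-entropy distribution then factorises over edges, and
    edges without a stated belief get probability 0.5.
    Belief keys must be (a, b) tuples with a < b in sorted order.
    """
    prior = {}
    for a, b in itertools.combinations(sorted(nodes), 2):
        prior[(a, b)] = beliefs.get((a, b), 0.5)
    return prior


def log_prior(structure, prior):
    """Log prior probability of a structure given as a set of edges."""
    total = 0.0
    for edge, q in prior.items():
        total += math.log(q if edge in structure else 1.0 - q)
    return total


# Hypothetical marketing variables and expert beliefs.
nodes = ["age", "income", "purchase"]
beliefs = {("income", "purchase"): 0.9, ("age", "purchase"): 0.2}

prior = maxent_edge_prior(nodes, beliefs)
with_edge = log_prior({("income", "purchase")}, prior)
without_edge = log_prior(set(), prior)
```

A structure search could add `log_prior(structure, prior)` to a data-based score, so that structures the expert finds plausible are favoured when the data alone do not decide.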
The ultimate goal of this project is to show the usefulness of graphical models for data mining. The developed methods will be tested on real-world data and implemented in a prototype software package. Our main application area will be marketing and sales, because in this area the benefits of data mining are already widely appreciated. The involvement of the users in this project is essential. We will test the algorithms on data supplied by the participating companies, eliciting and incorporating their expert opinions about plausible structures. Even more importantly, we will give them hands-on experience in interactively mining their own data, collect their feedback, and improve the prototype software.
Results of the research
No results are known yet.
Users
Three companies are involved in this project.
Project leader
Dr. H.J. Kappen
Katholieke Universiteit Nijmegen, Medische Fysica en Biofysica
Postbus 9101, 6500 HB Nijmegen
Project status
Started: 01-05-2001
End date: 01-11-2004