Publications
Distributed Data Mining vs Sampling Techniques: a Comparison
Abstract - To address the problem of mining a huge volume of geographically distributed databases, we propose two approaches. The first is to download only a sample of each database. The second is to mine each distributed database remotely, download the resulting models to a central site, and then aggregate these models. In this paper, we present an overview of the most common sampling techniques. We then present a new distributed data-mining technique based on rule set models, in which aggregation relies on a confidence coefficient associated with each rule and on very small samples from each database. Finally, we compare the best sampling techniques that we found in the literature with our model aggregation approach. This work is sponsored by NSERC.
Bibtex:
@article{Aounallah491,
Last modified: 2004/08/31 by squirion
©2002-. Laboratoire de Vision et Systèmes Numériques. All rights reserved.