Distributed Data Mining vs Sampling Techniques: a Comparison


Mohamed Aounallah, Sébastien Quirion and Guy Mineau


Abstract - To address the problem of mining a huge volume of geographically distributed databases, we propose two approaches. The first is to download only a sample of each database. The second is to mine each distributed database remotely, download the resulting models to a central site, and then aggregate these models. In this paper, we present an overview of the most common sampling techniques. We then present a new distributed data mining technique based on rule set models, in which aggregation relies on a confidence coefficient associated with each rule and on very small samples from each database. Finally, we compare the best sampling techniques that we found in the literature with our model aggregation approach. This work is sponsored by NSERC.

Bibtex:

@article{Aounallah491,
    author    = { Mohamed Aounallah and Sébastien Quirion and Guy Mineau },
    title     = { Distributed Data Mining vs Sampling Techniques: a Comparison },
    pages     = { 454--460 },
    year      = { 2004 },
    month     = { May },
    journal   = { Advances in Artificial Intelligence: 17th Conference of the Canadian Society for Computational Studies of Intelligence },
    ISBN      = { 3-540-22004-6 },
    location  = { London, Ontario, Canada }
}

Last modification: 2004/08/31 by squirion