A Novel Mixed Values k-Prototypes Algorithm with Application to Health Care Databases Mining


Ahmed Najjar, Christian Gagné and Daniel Reinharz


Abstract - The growing availability of large datasets composed of heterogeneous objects stresses the importance of large-scale clustering of mixed complex items. Several algorithms have been developed for mixed datasets composed of numerical and categorical variables, a well-known one being k-prototypes. This algorithm is efficient for clustering large datasets given its linear complexity. However, many fields handle more complex data, for example variable-size sets of categorical values mixed with numerical and categorical values, which cannot be processed as is by the k-prototypes algorithm. We propose a variation of the k-prototypes clustering algorithm that can handle these complex entities by using a bag-of-words representation for the multivalued categorical variables. We evaluate our approach on a real-world application, the clustering of administrative health care databases in Quebec, with results illustrating the good performance of our method.
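
To make the idea in the abstract concrete, here is a minimal Python sketch of a k-prototypes-style dissimilarity extended with a bag-of-words term for a multivalued categorical variable. The abstract does not give the paper's exact formulation, so everything specific here is an assumption made for illustration: the squared Euclidean term for numerical values, simple matching for single-valued categorical values, cosine distance between count vectors for the multivalued variable, the weights `gamma` and `beta`, and all function and field names.

```python
from collections import Counter
import math


def bag_of_words(values, vocabulary):
    """Turn a variable-size collection of categorical values into a count vector."""
    counts = Counter(values)
    return [counts.get(term, 0) for term in vocabulary]


def cosine_distance(u, v):
    """1 - cosine similarity; two all-zero vectors are treated as identical."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0 if norm_u == norm_v else 1.0
    return 1.0 - dot / (norm_u * norm_v)


def mixed_dissimilarity(x, prototype, vocabulary, gamma=1.0, beta=1.0):
    """Assumed dissimilarity: numerical + gamma * categorical + beta * multivalued."""
    # Numerical part: squared Euclidean distance, as in standard k-prototypes.
    numeric = sum((a - b) ** 2 for a, b in zip(x["num"], prototype["num"]))
    # Categorical part: number of mismatching single-valued attributes.
    categorical = sum(a != b for a, b in zip(x["cat"], prototype["cat"]))
    # Multivalued part (assumption): cosine distance between bag-of-words vectors.
    multi = cosine_distance(bag_of_words(x["multi"], vocabulary), prototype["bow"])
    return numeric + gamma * categorical + beta * multi


def nearest_prototype(x, prototypes, vocabulary, gamma=1.0, beta=1.0):
    """Assignment step: index of the prototype with the smallest dissimilarity."""
    distances = [mixed_dissimilarity(x, p, vocabulary, gamma, beta) for p in prototypes]
    return distances.index(min(distances))
```

As a usage example with made-up data, a record such as `{"num": [1.0, 2.0], "cat": ["M"], "multi": ["A", "C", "A"]}` over the vocabulary `["A", "B", "C"]` is compared against prototypes that store a numerical mean, a categorical mode, and an averaged count vector under the hypothetical key `"bow"`. Each assignment costs time linear in the number of prototypes and the vocabulary size, which keeps the linear scaling in the number of records that the abstract attributes to k-prototypes.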


BibTeX:

@inproceedings{Najjar1079,
    author    = { Ahmed Najjar and Christian Gagné and Daniel Reinharz },
    title     = { A Novel Mixed Values k-Prototypes Algorithm with Application to Health Care Databases Mining },
    booktitle = { Proc. of the IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2014) },
    year      = { 2014 },
    month     = { December },
    location  = { Orlando, FL, USA }
}

Last modified: 2014/12/26 by cgagne

     
   
   
