Stream Clustering of Tweets

Sophie Baillargeon, Simon Hallé and Christian Gagné

Abstract - This paper proposes an approach to clustering social media posts. It aims to take full advantage of this recent source of newsworthy information and to facilitate the work of users who need to monitor public events in real time. The emphasis is on developing a stream clustering algorithm able to process incoming tweets. A first implementation of the algorithm, focusing on the tweets' text, was tuned and tested on a dataset of manually annotated messages. Results show that the algorithm produces a partition of tweets similar to the manual partition produced by human annotators. In future work, we plan to extend this algorithm with additional features and integrate the resulting analytical capabilities into a real-time social media monitoring platform called CrowdStack.



    author    = { Sophie Baillargeon and Simon Hallé and Christian Gagné },
    title     = { Stream Clustering of Tweets },
    booktitle = { First International Workshop on Social Network Analysis Surveillance Techniques (SNAST 2016) },
    year      = { 2016 },
    month     = { 8 }


