Time-slice Action Prediction (TAP) dataset

Abstract

We introduce the Time-slice Action Prediction (TAP) dataset, designed to evaluate the uncertainty that arises when predicting actions from time-slices.

We have extracted 2132 time-slices from 4 challenging datasets, i.e., UT-Interaction (segmented sets 1 and 2), HMDB, TV Interaction, and Hollywood. Each time-slice contains one of seven interactions: handshake, high five, hug, kick, kiss, punch, and push. The dataset also contains 204 negative examples: time-slices extracted from videos that contain none of the listed interactions.

We grouped together videos from constrained and unconstrained datasets. Constrained, here, refers to restrictions on the setting and on how the activity is executed. UT-Interaction is our constrained dataset: it contains acted interactions, performed for research purposes, with a fixed background and a profile viewpoint. Unconstrained datasets, on the other hand, contain activities recorded in realistic settings, e.g., from TV shows, and are more challenging for activity recognition. HMDB, TV Interaction, and Hollywood are our unconstrained datasets. We selected videos from these datasets whose camera angle ranges from -45 to +45 degrees. All time-slices were annotated by multiple online annotators (using the CrowdFlower platform). Three annotators rated each time-slice on how likely a specific action is occurring. For each time-slice and for each action, the annotator was asked to pick one of 5 likelihoods: “Definitely Not Occurring (1)”, “Unlikely to Occur (2)”, “Neither Likely nor Unlikely (3)”, “Likely to Occur (4)”, “Definitely Occurring (5)”.
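As a minimal sketch of how the three 1–5 Likert ratings per action might be aggregated into a likelihood score with an accompanying uncertainty estimate, consider the following. The file layout, the `likelihood_score` function, and the example ratings are assumptions for illustration, not the dataset's actual annotation format:

```python
from statistics import mean, pstdev

# Hypothetical ratings for one time-slice: for each action, the three
# annotators' Likert ratings on the 1-5 scale described above.
ratings = {
    "handshake": [4, 5, 4],
    "punch": [1, 2, 1],
}

def likelihood_score(likert_ratings):
    """Map 1-5 Likert ratings to a [0, 1] likelihood score (mean rating
    rescaled so that 1 -> 0.0 and 5 -> 1.0), plus the population standard
    deviation of the ratings as a simple annotator-disagreement measure."""
    score = (mean(likert_ratings) - 1) / 4
    return score, pstdev(likert_ratings)

for action, r in ratings.items():
    score, disagreement = likelihood_score(r)
    print(f"{action}: score={score:.2f}, disagreement={disagreement:.2f}")
```

Rescaling the mean rating keeps the ordinal information of the Likert scale, while the standard deviation separates confident labels (all annotators agree) from genuinely uncertain time-slices.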

Please cite the following publication if you use this dataset:

Download

To download the dataset, please click here

Contact Details

For further details please contact Maryam Ziaeefard