Data sets

The following data sets were produced in the Vici project:

The D-TUNA data set of Dutch referring expressions. An XML-annotated data set of 2400 Dutch referring expressions produced by sixty speakers, inspired by the English TUNA data set. The D-TUNA corpus contains both spoken and typed referring expressions. This data set is distributed by the TST-Centrale (Dutch HLT Agency) of the Nederlandse Taalunie (Dutch Language Union). For more information, contact Ruud Koolen.

The Tie corpus. An experimental data set that was collected to study the effect of gestures on speech. Thirty-eight pairs of participants took part in a director-matcher game in which the director watched video clips of a person tying different kinds of tie knots and then instructed the matcher to tie an actual tie in the same manner. Half of the directors were unable to see the matcher because an opaque screen had been placed between the participants, and all directors had to sit on their hands for half of the experiment. For more information, contact Marieke Hoetjes.

The Greebles data set. An experimental data set that was collected to study reduction in speech and gesture in repeated references. Forty-eight pairs of participants took part in a director-matcher task in which the director had to describe abstract objects (so-called "Greebles" from Michael Tarr’s lab) in such a way that the matcher could select the correct object from a range of similar objects. Several objects had to be described more than once, leading to repeated referring expressions. The initial and repeated references were annotated for their semantic content (in XML) and for the gestures that were produced. For more information, contact Marieke Hoetjes or Ruud Koolen.

More data sets are currently being collected.

© 2010 - Lennard van de Laar - Dualler