SSHOC Speech-to-text Workshop - Linking Social Survey and Linguistic Infrastructures through speech interviews

Date: April 16, 2021
Place: Online
Organizer: Social Sciences & Humanities Open Cloud

Survey infrastructures systematically interview tens of thousands of individuals across Europe each year. Respondents are selected at random from all walks of life, and the hour-long interviews provide a range of data that is valuable to researchers and, subsequently, to policy makers.

While complex life histories or events may be coded into the structured taxonomies required for cutting-edge sociological research, a large proportion of the information conveyed in an interview is lost. A respondent's tone of voice, linguistic fluidity, and depth of vocabulary, for example, can provide insights into cognitive function, socio-economic status, or verbal reasoning skills.

Making use of this lost data requires the integration of social survey and linguistic infrastructures. Such integration underpins the EOSC vision. The work within SSHOC on analysing voice-recorded interviews therefore seeks to provide both a proof of concept and a framework for future research that explores this approach.

 

  • Judith Koops from the Generations and Gender Programme will provide an overview of the project. She will focus on the advantages of collaboration between the different infrastructures and on the new insights generated over the course of the project.

  • Joris Mulder from the LISS panel will demonstrate the tools used for collecting audio data through existing survey software in online interviews. He will evaluate the challenges encountered in this project and explain how they were solved.

  • Henk van den Heuvel from the Speech and Tech team will then describe the tools used for the analysis of Oral History data, which could be adapted for the analysis of survey interviews. In particular, he will address the so-called Transcription Chain, which is based on automatic speech-to-text conversion. After manual correction, the resulting text can be processed by NLP tools, for example to gain more insight into its linguistic structure, to detect topics, or to summarise the text.

  • Giovanni Borghesan from the European Values Study will lead the interactive session, in which participants will discuss potential applications for the tools, the use of the data for new avenues of scientific research, and ways to improve the collection, processing and archiving of audio data.





