RI: SMALL: TOWARDS A RESOURCE SUITE FOR CONVERSATIONAL SPEECH-TO-SPEECH TRANSLATION RESEARCH

PI: Nigel Ward

Co-PI: Olac Fuentes

Sponsor: NATIONAL SCIENCE FOUNDATION

Computer Science

Amount awarded: $600,000

Machine translation research has made astounding progress, from text-to-text, to speech-to-text, and most recently, to speech-to-speech. However, the latter works well only for certain use cases. This project will enable the development of systems that better support people not only talking at, but also talking with, speakers of different languages. Specifically, it will focus on the aspects of communication beyond words, including pitch and other prosodic features. The ultimate outcome will be speech-to-speech translation systems that support deeper communication among people, both across national boundaries and within our language-diverse nation, empowering individuals and strengthening social cohesion. Further, by increasing knowledge of how prosody supports effective communication in dialog, the project will ultimately enable language teachers and others to help people communicate better even when unaided by technology. In addition, better representations and methods for modeling prosody and the pragmatic aspects of language will enable artificial intelligence systems (smart speakers, smartphones, smart cars, robots, and so on) to better support their users, in more contexts and in more languages.

More technically, this project will address issues in prosody, which are a major barrier to widening the utility of speech-to-speech translation. Accordingly, the aims of the proposed project are to advance knowledge of how prosody relates across languages and to advance the ability to model these relationships with machine learning. The driving goal will be the construction of models that take as input an utterance in one language and predict appropriate prosody for its translation in a second language, initially for Spanish and English. This will be accomplished by applying various machine-learning techniques, including the exploitation of explainable features designed to match human perceptions and of features extracted by pretrained models.

The work will start with stand-alone prosody modeling and then progress to applying prosodic knowledge or modeling techniques to improve the quality of fully functional systems, likely including both cascaded and end-to-end systems. In support of this modeling work, there will be a thrust to create a new corpus of pragmatically faithful paired utterances across two languages, and a thrust to develop automatic metrics for judging the pragmatic fidelity of model output to the source-language input. These resources will enable the creation of a shared task to be offered as a challenge for the research community. In addition, the project will include smaller efforts in language description, in the form of case studies of specific differences in prosody between Spanish and English, and in studying user perceptions and evaluations of translation systems with different levels of prosodic competence.

Posting date: Wed, 05/01/2024

Award start date: Mon, 04/01/2024

Award end date: Wed, 03/31/2027