DOI: 10.1145/1639642.1639704
Poster

VocaliD: personalizing text-to-speech synthesis for individuals with severe speech impairment

Published: 25 October 2009

Abstract

Speech synthesis options on assistive communication devices are limited and do not reflect the user's vocal quality or personality. Previous work suggests that speakers with severe speech impairment can control prosodic aspects of their voice and often retain the ability to produce sustained vowel-like utterances. This project leverages these residual phonatory abilities to build an adaptive text-to-speech synthesizer that is intelligible yet conveys the user's vocal identity. Using voice transformation techniques, our VocaliD system combines the source (excitation) characteristics of the disordered speaker with the filter (vocal-tract) characteristics of an age-matched healthy speaker to produce a personalized voice. In usability testing, listeners transcribed morphed samples with 94% accuracy and matched morphed samples from the same speaker with 79.5% accuracy.
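The source-filter idea in the abstract can be illustrated with classic linear-predictive (LPC) cross-synthesis: estimate the excitation (residual) of one signal by inverse filtering, then pass it through the all-pole vocal-tract filter estimated from another signal. The poster does not describe VocaliD's actual voice transformation pipeline, so this is a minimal sketch of a swapped-in, textbook technique, not the authors' method. All function names (`lpc`, `inverse_filter`, `all_pole_filter`, `cross_synthesize`), the 8 kHz sample rate, the LPC order of 10, and the synthetic "impaired speaker" and "healthy donor" frames are hypothetical assumptions.

```python
"""Minimal LPC source-filter cross-synthesis sketch (hypothetical, not VocaliD's code).

The residual (excitation, "source") of one signal is re-filtered through the
all-pole envelope ("filter") estimated from another signal.
"""
import math
import random
from typing import List


def autocorrelation(x: List[float], order: int) -> List[float]:
    """Biased autocorrelation r[0..order] of a single analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def lpc(x: List[float], order: int) -> List[float]:
    """LPC polynomial A(z) = 1 + a1 z^-1 + ... + ap z^-p via Levinson-Durbin."""
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: keep what we have
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # updated prediction-error energy
    return a


def inverse_filter(x: List[float], a: List[float]) -> List[float]:
    """Analysis filter A(z): e[n] = sum_j a[j] * x[n - j], i.e. the source residual."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0) for n in range(len(x))]


def all_pole_filter(e: List[float], a: List[float]) -> List[float]:
    """Synthesis filter 1/A(z): y[n] = e[n] - sum_{j>=1} a[j] * y[n - j]."""
    y: List[float] = []
    for n, en in enumerate(e):
        y.append(en - sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0))
    return y


def cross_synthesize(source: List[float], target: List[float], order: int = 10) -> List[float]:
    """Keep the excitation of `source` and impose the spectral envelope of `target`."""
    residual = inverse_filter(source, lpc(source, order))
    return all_pole_filter(residual, lpc(target, order))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 400
    # Hypothetical "impaired speaker" frame: 100 Hz pulse train at 8 kHz
    # (period 80 samples), a sustained vowel-like excitation with a one-pole tilt.
    pulses = [1.0 if i % 80 == 0 else 0.0 for i in range(n)]
    source = all_pole_filter(pulses, [1.0, -0.9])
    # Hypothetical "healthy donor" frame: noise shaped by a resonance near 500 Hz.
    theta = 2 * math.pi * 500 / 8000
    donor_a = [1.0, -2 * 0.95 * math.cos(theta), 0.95 ** 2]
    donor = all_pole_filter([rng.gauss(0.0, 1.0) for _ in range(n)], donor_a)
    morphed = cross_synthesize(source, donor)
    print(len(morphed), [round(v, 3) for v in morphed[:5]])
```

Because `inverse_filter` and `all_pole_filter` are exact inverses for the same coefficients, cross-synthesizing a frame with itself returns the original frame; the "morph" comes only from swapping in the other signal's envelope. A practical system would typically work on short, overlapping, windowed frames and normalize gain between speakers; those steps are omitted here for brevity.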

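Cited By

  • (2024) Trident of Poseidon: A Generalized Approach for Detecting Deepfake Voices. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, 2222-2235. DOI: 10.1145/3658644.3690311. Online publication date: 2-Dec-2024.
  • (2024) Natural Language Processing for Smart Healthcare. IEEE Reviews in Biomedical Engineering, 17, 4-18. DOI: 10.1109/RBME.2022.3210270. Online publication date: 2024.
  • (2024) The development of synthetic child speech in three South African languages. Augmentative and Alternative Communication, 1-12. DOI: 10.1080/07434618.2024.2374312. Online publication date: 11-Jul-2024.
  • (2023) Enhance Communication for Autistic People Using a Combination of AAC Mobile Applications and Custom Voice Generator System. 2023 International Conference on Digital Age & Technological Advances for Sustainable Development (ICDATA), 139-144. DOI: 10.1109/ICDATA58816.2023.00033. Online publication date: 3-May-2023.
  • (2022) A Situational Analysis of Current Speech-Synthesis Systems for Child Voices: A Scoping Review of Qualitative and Quantitative Evidence. Applied Sciences, 12(11), 5623. DOI: 10.3390/app12115623. Online publication date: 1-Jun-2022.
  • (2022) An assistive interface protocol for communication between visually and hearing-speech impaired persons in internet platform. Disability and Rehabilitation: Assistive Technology, 19(1), 233-246. DOI: 10.1080/17483107.2022.2078898. Online publication date: 26-May-2022.
  • (2019) Voice Banking to Support People Who Use Speech-Generating Devices: New Zealand Voice Donors' Perspectives. Perspectives of the ASHA Special Interest Groups, 1-8. DOI: 10.1044/2019_PERS-SIG2-2018-0011. Online publication date: 4-Jul-2019.
  • (2017) An exploration of the accentuation effect: errors in memory for voice fundamental frequency (F0) and speech rate. Language, Cognition and Neuroscience, 33(1), 98-110. DOI: 10.1080/23273798.2017.1368676. Online publication date: 30-Aug-2017.
  • (2016) Clear Speech: Technologies that Enable the Expression and Reception of Language. Synthesis Lectures on Assistive, Rehabilitative, and Health-Preserving Technologies, 5(1), 1-103. DOI: 10.2200/S00672ED1V01Y201509ARH008. Online publication date: 7-Mar-2016.
  • (2016) Don't Say Yes, Say Yes. Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, 3643-3646. DOI: 10.1145/2851581.2890245. Online publication date: 7-May-2016.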

Published In

ACM Conferences
Assets '09: Proceedings of the 11th international ACM SIGACCESS conference on Computers and accessibility
October 2009
290 pages
ISBN: 9781605585581
DOI: 10.1145/1639642

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. assistive communication
2. dysarthria
3. speech generation devices
4. synthesis
5. text-to-speech

Qualifiers

• Poster

Conference

ASSETS '09

Acceptance Rates

Overall acceptance rate: 436 of 1,556 submissions (28%)
