Voice comparison

  • 12-11-2009 10:29am
    #1
    Registered Users, Registered Users 2 Posts: 121 ✭✭


    Hi.
    I'm trying to implement voice comparison in one of the following languages: C++, Java or C#. I am writing an application that will compare a non-Chinese speaker's pronunciation with a native speaker's.

    Does anybody know of any library or toolkit that could help me with this task? (I have found a lot of sites about voice recognition, but none about comparison.)

    Thank you all.


Comments

  • Closed Accounts Posts: 8,015 ✭✭✭CreepingDeath


    What do you mean by comparing? What output are you expecting?

    I imagine the most straightforward approach is to record both sound samples and display a spectral analysis of each in the form of a graph.
    In fact, you'll have 2 graphs, one for each sample.

    Then maybe try to overlay them.

    Spectrum analysis with Java
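
    As a rough illustration of the spectral-analysis step, here is a minimal C++ sketch. It assumes the recording has already been decoded into mono samples stored as doubles (audio capture and file loading are not shown). The names magnitudeSpectrum and spectrogram, and the frame length and hop size you pass in, are hypothetical choices for illustration. Each frame is multiplied by a Hann window and transformed with a naive O(N^2) DFT, which is easy to read but slow; a proper FFT library would normally be used instead. The result, one magnitude spectrum per frame, is the kind of data you would plot as a graph for each sample.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude spectrum of one frame via a naive DFT (O(N^2), for clarity only).
// A Hann window is applied first to reduce spectral leakage.
std::vector<double> magnitudeSpectrum(const std::vector<double>& frame) {
    const std::size_t n = frame.size();
    assert(n > 1);
    const double pi = std::acos(-1.0);
    std::vector<double> mags(n / 2 + 1);  // bins 0..N/2 (spectrum of real input is symmetric)
    for (std::size_t k = 0; k < mags.size(); ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            double w = 0.5 - 0.5 * std::cos(2.0 * pi * t / (n - 1));  // Hann window
            double angle = -2.0 * pi * k * t / n;
            sum += frame[t] * w * std::polar(1.0, angle);
        }
        mags[k] = std::abs(sum);
    }
    return mags;
}

// Split a signal into overlapping frames and return one magnitude spectrum
// per frame (a spectrogram), indexed as result[frame][bin].
std::vector<std::vector<double>> spectrogram(const std::vector<double>& signal,
                                             std::size_t frameLen, std::size_t hop) {
    assert(frameLen > 1 && hop > 0);
    std::vector<std::vector<double>> frames;
    for (std::size_t start = 0; start + frameLen <= signal.size(); start += hop) {
        std::vector<double> frame(signal.begin() + static_cast<std::ptrdiff_t>(start),
                                  signal.begin() + static_cast<std::ptrdiff_t>(start + frameLen));
        frames.push_back(magnitudeSpectrum(frame));
    }
    return frames;
}
```

    For 16 kHz audio, frames of 400 samples (25 ms) with a hop of 160 samples (10 ms) are a common starting point in speech processing, but these are tuning choices rather than fixed values.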

    You could probably work out a percentage correlation figure of one graph against the other.
    So each attempt the non-native speaker makes would be given a percentage score of, say, 70% or 78%,
    allowing them to repeat their attempts and get feedback on how well they are doing.
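
    One way to turn a "percentage correlation" into a number is the Pearson correlation coefficient between the two graphs, treated as equal-length sequences of values (for example, one band of the spectrum over time, after both recordings have been brought to the same length). The C++ sketch below assumes exactly that. Mapping negative correlation to 0% and a correlation of 1 to 100% in percentageScore is a hypothetical scoring rule, not an established standard.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between two equal-length sequences, in [-1, 1].
double pearson(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size() && !a.empty());
    const double n = static_cast<double>(a.size());
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) { meanA += a[i]; meanB += b[i]; }
    meanA /= n;
    meanB /= n;
    double cov = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double da = a[i] - meanA, db = b[i] - meanB;
        cov += da * db;
        varA += da * da;
        varB += db * db;
    }
    if (varA == 0.0 || varB == 0.0) return 0.0;  // a flat sequence has no defined correlation
    return cov / std::sqrt(varA * varB);
}

// Hypothetical scoring rule: negative correlation counts as 0%,
// perfect positive correlation as 100%.
double percentageScore(const std::vector<double>& learner, const std::vector<double>& native) {
    double r = pearson(learner, native);
    return r < 0.0 ? 0.0 : 100.0 * r;
}
```

    Correlation ignores overall loudness and offset, which is convenient when the learner speaks more quietly than the native recording, but it only makes sense once the two sequences line up in time.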


  • Registered Users, Registered Users 2 Posts: 121 ✭✭poncho000


    Yes, I think you've got the main point. I will pre-record a number of phrases, and these will be played to the learner. The learner will then be asked to repeat each phrase into a microphone, and their pronunciation will be compared with the pre-recorded one.

    Your suggestion is good and I will look into it further. I've done some searching, and it does seem that it will involve plotting some graphs and comparing them. But I'm really not familiar with this area, so I'm wondering which libraries or toolkits I should be looking at.


  • Closed Accounts Posts: 8,015 ✭✭✭CreepingDeath


    One tricky problem, I imagine, is taking the speed of the speaker into account.

    You might have a slow talker who stretches out their words, making a comparison of the graphs trickier, unless you can somehow "normalise" the graphs, i.e. scale the width and height to a common size.
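
    Here is a minimal C++ sketch of that normalisation, assuming each graph is stored as a vector of doubles. Linear interpolation stretches or squeezes the curve to a common number of points (the width), and dividing by the peak magnitude scales it to a common height. The function names resampleLinear and normalisePeak are hypothetical. A uniform stretch like this only corrects for overall speaking rate; if the learner slows down on some words but not others, a non-uniform alignment such as dynamic time warping is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Resample a curve (e.g. one spectral band's energy over time) to targetLen
// points by linear interpolation: this scales the "width" to a common size.
std::vector<double> resampleLinear(const std::vector<double>& x, std::size_t targetLen) {
    assert(!x.empty() && targetLen >= 2);
    std::vector<double> out(targetLen);
    if (x.size() == 1) {
        std::fill(out.begin(), out.end(), x[0]);
        return out;
    }
    const double scale = static_cast<double>(x.size() - 1) / (targetLen - 1);
    for (std::size_t i = 0; i < targetLen; ++i) {
        double pos = i * scale;                              // position in the input curve
        std::size_t lo = static_cast<std::size_t>(pos);
        std::size_t hi = std::min(lo + 1, x.size() - 1);     // clamp at the last sample
        double frac = pos - lo;
        out[i] = x[lo] * (1.0 - frac) + x[hi] * frac;
    }
    return out;
}

// Scale values so the largest magnitude becomes 1: this scales the "height".
std::vector<double> normalisePeak(std::vector<double> x) {
    double peak = 0.0;
    for (double v : x) peak = std::max(peak, std::fabs(v));
    if (peak > 0.0)
        for (double& v : x) v /= peak;
    return x;
}
```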


  • Registered Users, Registered Users 2 Posts: 47 bundaegi


    Hi,

    Just a few comments on this problem.

    There has been quite a lot of work done previously in this area. Pronunciation comparison is typically viewed as a sub-topic within Computer Aided Language Learning (CALL), so some searches using those keywords may be helpful.

    As some of the previous comments have suggested, accurately comparing the pronunciations of two speakers is quite a difficult task. The classic approach is to:
    1) Use a speech recognition system, usually HMM-based, to segment the non-native speaker's speech into phones (small units of speech). This segmentation identifies the start and end times of the individual phones, so differences in duration can be accounted for.
    2) The spectrum and prosody of these phones can then be compared to those produced by a native speaker (or ideally models of phones built from a large number of native speakers, to model an 'average' native pronunciation).
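
    Step 1 requires a trained HMM recogniser and is beyond a short example, but step 2 can be sketched once a segmentation exists. The C++ below assumes both utterances have already been segmented into the same phone sequence, with each phone given as a start and end frame index into a matrix of per-frame spectral features. The PhoneSegment and PhoneComparison types and the comparePhones function are hypothetical. For each phone it reports the Euclidean distance between the mean feature vectors (a crude spectral comparison) and the ratio of durations (one prosodic cue). This is a simplified stand-in, not the scoring method of the Neumeyer paper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

using Frames = std::vector<std::vector<double>>;  // [frame][feature]

// Hypothetical output of step 1: one phone with its frame range (end exclusive).
struct PhoneSegment {
    std::string label;
    std::size_t start;
    std::size_t end;
};

struct PhoneComparison {
    std::string label;
    double spectralDistance;  // Euclidean distance between mean feature vectors
    double durationRatio;     // learner duration / native duration
};

// Average feature vector over the frames of one segment.
std::vector<double> meanVector(const Frames& f, const PhoneSegment& s) {
    assert(s.start < s.end && s.end <= f.size());
    std::vector<double> mean(f[s.start].size(), 0.0);
    for (std::size_t t = s.start; t < s.end; ++t)
        for (std::size_t d = 0; d < mean.size(); ++d) mean[d] += f[t][d];
    for (double& m : mean) m /= static_cast<double>(s.end - s.start);
    return mean;
}

// Compare phone by phone, assuming both utterances were segmented into the
// same phone sequence (same labels in the same order).
std::vector<PhoneComparison> comparePhones(const Frames& learner, const std::vector<PhoneSegment>& ls,
                                           const Frames& native, const std::vector<PhoneSegment>& ns) {
    assert(ls.size() == ns.size());
    std::vector<PhoneComparison> out;
    for (std::size_t i = 0; i < ls.size(); ++i) {
        assert(ls[i].label == ns[i].label);
        std::vector<double> a = meanVector(learner, ls[i]);
        std::vector<double> b = meanVector(native, ns[i]);
        assert(a.size() == b.size());
        double sq = 0.0;
        for (std::size_t d = 0; d < a.size(); ++d) sq += (a[d] - b[d]) * (a[d] - b[d]);
        double ratio = static_cast<double>(ls[i].end - ls[i].start) /
                       static_cast<double>(ns[i].end - ns[i].start);
        out.push_back({ls[i].label, std::sqrt(sq), ratio});
    }
    return out;
}
```

    Using models averaged over many native speakers, as suggested above, would replace the single native Frames matrix with per-phone reference statistics.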

    More details can be found in this paper. [Neumeyer, L. Automatic Scoring of Pronunciation Quality, Speech Communication, vol. 30 (2-3), February 2000, pp. 83-93.]

    Alternatively, a simpler approach would involve omitting the speech recognition part and using a technique known as dynamic time warping to match the durations of the two utterances.
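
    Here is a minimal C++ sketch of classic dynamic time warping over two sequences of per-frame feature vectors (for example, spectrogram frames). The frame-to-frame distance is Euclidean, and each cell of the cost table adds the local distance to the cheapest of its three predecessors, so one frame of the native recording can be matched to several frames of a slower learner recording. The function names are hypothetical, and the unconstrained O(N*M) table is the simplest variant. Practical systems often restrict the path to a band around the diagonal (e.g. a Sakoe-Chiba band) and divide the total cost by the path length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Frames = std::vector<std::vector<double>>;  // [frame][feature]

// Euclidean distance between two feature vectors of equal length.
double frameDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sq = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) sq += (a[d] - b[d]) * (a[d] - b[d]);
    return std::sqrt(sq);
}

// cost[i][j] = d(x[i-1], y[j-1]) + min(cost[i-1][j], cost[i][j-1], cost[i-1][j-1]).
// Returns the total cost of the best alignment; lower means more similar.
double dtwDistance(const Frames& x, const Frames& y) {
    assert(!x.empty() && !y.empty());
    const double inf = std::numeric_limits<double>::infinity();
    const std::size_t n = x.size(), m = y.size();
    std::vector<std::vector<double>> cost(n + 1, std::vector<double>(m + 1, inf));
    cost[0][0] = 0.0;
    for (std::size_t i = 1; i <= n; ++i)
        for (std::size_t j = 1; j <= m; ++j)
            cost[i][j] = frameDistance(x[i - 1], y[j - 1]) +
                         std::min({cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]});
    return cost[n][m];
}
```

    For example, a "fast" sequence {0, 1, 2} and a "slow" one {0, 0, 1, 1, 2, 2} (one feature per frame) have a DTW distance of 0, because every frame of the slow version can be matched to the corresponding frame of the fast one.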

    As regards a library or toolkit to help with this task, you will probably need a signal processing library to provide Fourier transforms, and other related functions. There are many of these available for each of the languages mentioned, e.g. GENIAL for C++.
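
    To show the core function such a library provides, here is a textbook recursive radix-2 Cooley-Tukey FFT in C++ using std::complex. It computes the discrete Fourier transform in O(N log N) time, but it only accepts power-of-two lengths and allocates at every level of recursion. A tuned library (GENIAL, or one of the many others available) would be preferable in practice. This sketch does not reflect any particular library's API.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Recursive radix-2 Cooley-Tukey FFT. The input length must be a power of two.
std::vector<cd> fft(const std::vector<cd>& x) {
    const std::size_t n = x.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    if (n == 1) return x;
    // Split into even- and odd-indexed samples and transform each half.
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = x[2 * i];
        odd[i] = x[2 * i + 1];
    }
    std::vector<cd> e = fft(even), o = fft(odd);
    // Combine the halves with the twiddle factors exp(-2*pi*i*k/n).
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        cd twiddle = std::polar(1.0, -2.0 * pi * k / n) * o[k];
        out[k] = e[k] + twiddle;
        out[k + n / 2] = e[k] - twiddle;
    }
    return out;
}
```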

