Date of Award
12-2012
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Legacy Department
Computer Engineering
Committee Chair/Advisor
Gowdy, John N.
Committee Member
Schalkoff, Robert
Committee Member
Birchfield, Stanley
Committee Member
Park, Chanseok
Abstract
This dissertation determines which vowel phonemes work best for speaker identification and speaker verification, and develops new algorithms to improve speaker identification accuracy. Results from the first part of our research indicate that the vowels /i/, /E/ and /u/ had the highest recognition scores for both the Gaussian mixture model (GMM) and vector quantization (VQ) methods (at most one classification error). For VQ, /i/, /I/, /e/, /E/ and /@/ had no classification errors. Speakers of /E/, /o/ and /u/ were verified well by both the GMM and VQ methods in our experiments. For VQ, the verification results are consistent with the identification results: the same five phonemes performed best and had fewer than one verification error.
After determining several ideal vowel phonemes, we developed new algorithms for improved speaker identification accuracy. We used phoneme weighting methods (which performed classification based on the ideal phonemes found in the previous experiments) and other weighting methods based on energy. The energy weighting methods performed better than the phoneme weighting methods in our experiments. The first energy weighting method ignores speech frames that have relatively small magnitude. The second method instead emphasizes speech frames that have relatively large magnitude. The third method and the adjusted third method combine the previous two methods. Relative to a baseline system (which used Mel-frequency cepstral coefficients (MFCCs) as features and VQ as the classifier), the error reduction rate was 7.9% after applying the first method and 28.9% after applying the second method and the adjusted third method.
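The energy weighting idea can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract gives no thresholds or weighting formulas, so the `low` threshold, the linear weights, the method names, and all function names below are assumptions made for illustration. The adjusted third method is not described in enough detail to sketch. Each frame's VQ distortion (squared distance to the nearest codeword) is weighted by a function of the frame's magnitude before averaging.

```python
import numpy as np


def frame_weights(magnitudes, method, low=0.1):
    """Return one non-negative weight per frame.

    `low` is a hypothetical threshold, expressed as a fraction of the
    largest frame magnitude; the abstract does not state a value.
    Assumes at least one frame has non-zero magnitude.
    """
    m = np.asarray(magnitudes, dtype=float)
    keep = m >= low * m.max()          # frames that are not "relatively small"
    if method == "ignore_small":       # first method: drop quiet frames
        return keep.astype(float)
    if method == "emphasize_large":    # second method: louder frames count more
        return m / m.max()
    if method == "combined":           # third method: drop quiet, emphasize loud
        return keep * (m / m.max())
    raise ValueError(f"unknown method: {method}")


def weighted_vq_distortion(frames, codebook, weights):
    """Weighted mean of each frame's squared distance to its nearest codeword."""
    diffs = frames[:, None, :] - codebook[None, :, :]
    nearest = (diffs ** 2).sum(axis=2).min(axis=1)
    return float((weights * nearest).sum() / weights.sum())


def identify(frames, magnitudes, codebooks, method="combined"):
    """Pick the speaker whose codebook gives the lowest weighted distortion."""
    w = frame_weights(magnitudes, method)
    scores = {name: weighted_vq_distortion(frames, cb, w)
              for name, cb in codebooks.items()}
    return min(scores, key=scores.get)
```

Here `frames` would be an array of per-frame feature vectors (for example MFCCs), `magnitudes` a per-frame magnitude or energy measure, and `codebooks` a dictionary mapping each enrolled speaker to a VQ codebook array. With all weights equal to one, `weighted_vq_distortion` reduces to the plain average distortion of an unweighted VQ classifier.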
Recommended Citation
Fang, Eric, "Phoneme Weighting and Energy-Based Weighting for Speaker Recognition" (2012). All Dissertations. 1063.
https://open.clemson.edu/all_dissertations/1063