Automatic speech annotation based on enhanced wavelet Packets Best Tree Encoding (EWPBTE) feature

dc.AffiliationOctober University for Modern Sciences and Arts (MSA)
dc.contributor.authorMohamed M.H.
dc.contributor.authorHassan A.M.A.
dc.contributor.authorHassan N.M.H.
dc.contributor.otherDepartment of Electronics and Communications Engineering
dc.contributor.otherOctober University for Modern Sciences and Arts
dc.contributor.other6 October City
dc.contributor.otherEgypt; Faculty of Engineering-Fayoum University
dc.contributor.otherEgypt
dc.date.accessioned2020-01-09T20:41:31Z
dc.date.available2020-01-09T20:41:31Z
dc.date.issued2016
dc.descriptionScopus
dc.description.abstractThis paper introduces a fully automated Arabic phone recognition system based on the 15-point Enhanced Wavelet Packets Best Tree Encoding (EWPBTE) speech feature. WPBTE is enhanced by adding an energy component, implemented in Matlab, which improves recognizer accuracy by about 65 percentage points (from 21.07% to 86.23%); this is the main contribution of the paper. EWPBTE is used to find phoneme boundaries along the speech utterance. Hidden Markov Models (HMMs) and Gaussian mixtures are used to build the statistical models, which are implemented with the HMM Tool Kit (HTK). The system identifies spoken phones at a recognition rate of 57.01% with Mel Frequency Cepstral Coefficients (MFCC), 21.07% with WPBTE, and 86.23% with EWPBTE. The proposed EWPBTE vector has 15 components, compared with 39 for MFCC, which makes it a promising feature vector for further research and development. © 2016 IEEE.en_US
dc.identifier.doihttps://doi.org/10.1109/ICEEOT.2016.7755165
dc.identifier.isbn9.78E+12
dc.identifier.otherhttps://doi.org/10.1109/ICEEOT.2016.7755165
dc.identifier.urihttps://ieeexplore.ieee.org/document/7755165
dc.language.isoEnglishen_US
dc.publisherInstitute of Electrical and Electronics Engineers Inc.en_US
dc.relation.ispartofseriesInternational Conference on Electrical, Electronics, and Optimization Techniques, ICEEOT 2016
dc.subjectAccuracyen_US
dc.subjectComponentsen_US
dc.subjectGaussian Mixtureen_US
dc.subjectPhoneen_US
dc.subjectRecognition Rateen_US
dc.subjectCharacter recognitionen_US
dc.subjectEncoding (symbols)en_US
dc.subjectForestryen_US
dc.subjectHidden Markov modelsen_US
dc.subjectMarkov processesen_US
dc.subjectMATLABen_US
dc.subjectTelephone setsen_US
dc.subjectTrellis codesen_US
dc.subjectDevelopment phaseen_US
dc.subjectEnergy componentsen_US
dc.subjectGaussian mixturesen_US
dc.subjectMel-frequency cepstral coefficientsen_US
dc.subjectPhone recognitionen_US
dc.subjectSpeech recognitionen_US
dc.titleAutomatic speech annotation based on enhanced wavelet Packets Best Tree Encoding (EWPBTE) featureen_US
dc.typeConference Paperen_US
dcterms.isReferencedByZheng, J., Stolcke, A., Improved discriminative training using phone lattices (2013) Proceedings of InterSpeech, pp. 215-222. , Lisbon, Portugal, September; Gody, A.M., Seoud, R.A.A., Hassan, M., Automatic speech annotation using HMM based on best tree encoding (BTE) feature (2013) The Eleventh Conference on Language Engineering, pp. 214-225. , Cairo, Egypt; Zweig, G., (1998) Speech Recognition with Dynamic Bayesian Networks, , PhD thesis, University of California, Berkeley; Young, S., Odell, J.J., Woodland, P.C., Tree-based state tying for high accuracy acoustic modeling (2009) ARPA Human Language Technology Workshop, pp. 304-312; Demuynck, K., Laureys, T., A comparison of different approaches to automatic speech segmentation (2012) Lecture Notes in Computer Science, 2448, pp. 385-406. , Springer Berlin/Heidelberg; Šaric, Z.M., Turajlic, S.R., A new approach to speech segmentation based on the maximum likelihood (2014) Journal of Circuits, Systems, and Signal Processing, Birkhäuser Boston, 14 (5), pp. 615-663. , September; Chin-Teng, D.-J., Rui-Cheng, G.-D., Noisy speech segmentation/enhancement with multiband analysis and neural fuzzy networks (2012) Lecture Notes in Computer Science, 2275, pp. 81-94. , Springer Berlin/Heidelberg; Zhu, Q., Chen, Y., Morgan, N., On using MLP features in LVCSR (2012) Proceedings of ICSLP, pp. 324-336. , Jeju, Korea; Zweig, G., Padmanabhan, M., Boosting Gaussian mixtures in an LVCSR system (2011) Proceedings of ICASSP, pp. 216-230. , Istanbul; Gody, A.M., Voiced/unvoiced and silent classification using HMM classifier based on wavelet packets WPBTE features (2012) The 8th Conference on Language Engineering, pp. 564-573. , Cairo, Egypt; Zhang, B., Matsoukas, S., Schwartz, R., Discriminatively trained region dependent feature transforms for speech recognition (2012) Proceedings of ICASSP, pp. 340-352. , Toulouse; Gales, M.J.F., Young, S.J., The application of hidden Markov models in speech recognition (2011) Foundations and Trends in Signal Processing (3), pp. 195-204; Mporas, I., Ganchev, T., Fakotakis, N., Phonetic segmentation using multiple speech features (2011) International Journal of Speech Technology, 11 (2), pp. 73-85. , Springer Netherlands, June; Zen, H., Tokuda, K., Kitamura, T., A Viterbi algorithm for a trajectory model derived from HMM with explicit relationship between static and dynamic features (2013) Proceedings of ICASSP, pp. 315-330. , Montreal, Canada; Gody, A.M., Wavelet packets best tree 4-points encoded (WPBTE) features (2012) The 8th Conference on Language Engineering, pp. 345-356. , Cairo, Egypt; Yu, K., Gales, M.J.F., Bayesian adaptive inference and adaptive training (2009) IEEE Transactions Speech and Audio Processing, 15 (6), pp. 1932-1943. , August; Yu, K., Gales, M.J.F., Woodland, P.C., Unsupervised training with directed manual transcription for recognising Mandarin broadcast audio (2012) Proceedings of InterSpeech, pp. 132-145. , Antwerp; Yu, K., Gales, M.J.F., Discriminative cluster adaptive training (2012) IEEE Transactions on Speech and Audio Processing, 14 (5), pp. 1694-1703; Chen, Y., Wang, Q., A speaker based unsupervised speech segmentation algorithm used in conversational speech (2012) Lecture Notes in Computer Science, 4798, pp. 396-402. , Springer Berlin/Heidelberg; Young, S., Gales, M., Liu, X.A., Woodland, P., (2008) The HTK Book, , Version 3.41, Cambridge University Engineering Department
dcterms.sourceScopus
