Automatic speech annotation based on enhanced wavelet Packets Best Tree Encoding (EWPBTE) feature

Mohamed M.H.; Hassan A.M.A.; Hassan N.M.H.

doi:https://doi.org/10.1109/ICEEOT.2016.7755165

dc.Affiliation	October University for Modern Sciences and Arts (MSA)
dc.contributor.author	Mohamed M.H.
dc.contributor.author	Hassan A.M.A.
dc.contributor.author	Hassan N.M.H.
dc.contributor.other	Department of Electronics and Communications Engineering, October University for Modern Sciences and Arts, 6 October City, Egypt
dc.contributor.other	Faculty of Engineering, Fayoum University, Egypt
dc.date.accessioned	2020-01-09T20:41:31Z
dc.date.available	2020-01-09T20:41:31Z
dc.date.issued	2016
dc.description	Scopus
dc.description.abstract	This paper introduces a fully automated Arabic phone recognition system based on the 15-point Enhanced Wavelet Packets Best Tree Encoding (EWPBTE) speech feature. WPBTE is enhanced by adding an energy component to the feature vector; the enhancement is implemented in MATLAB and raises recognizer accuracy by about 65 percentage points, which is the main contribution of this paper. EWPBTE is used to find phoneme boundaries along a speech utterance. Hidden Markov Models (HMMs) and Gaussian mixtures are used to build the statistical models in this research, and the models are implemented with the HMM Tool Kit (HTK). The system identifies spoken phones at a recognition rate of 57.01% with Mel Frequency Cepstral Coefficients (MFCC), 21.07% with WPBTE, and 86.23% with EWPBTE. The proposed EWPBTE vector has 15 components, compared with 39 for MFCC, which makes it a promising feature vector for further research and development. © 2016 IEEE.	en_US
dc.identifier.doi	https://doi.org/10.1109/ICEEOT.2016.7755165
dc.identifier.isbn	9.78E+12
dc.identifier.other	https://doi.org/10.1109/ICEEOT.2016.7755165
dc.identifier.uri	https://ieeexplore.ieee.org/document/7755165
dc.language.iso	English	en_US
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	en_US
dc.relation.ispartofseries	International Conference on Electrical, Electronics, and Optimization Techniques, ICEEOT 2016
dc.subject	Accuracy	en_US
dc.subject	Components	en_US
dc.subject	Gaussian Mixture	en_US
dc.subject	Phone	en_US
dc.subject	Recognition Rate	en_US
dc.subject	Character recognition	en_US
dc.subject	Encoding (symbols)	en_US
dc.subject	Forestry	en_US
dc.subject	Hidden Markov models	en_US
dc.subject	Markov processes	en_US
dc.subject	MATLAB	en_US
dc.subject	Telephone sets	en_US
dc.subject	Trellis codes	en_US
dc.subject	Development phase	en_US
dc.subject	Energy components	en_US
dc.subject	Gaussian mixtures	en_US
dc.subject	Mel-frequency cepstral coefficients	en_US
dc.subject	Phone recognition	en_US
dc.subject	Speech recognition	en_US
dc.title	Automatic speech annotation based on enhanced wavelet Packets Best Tree Encoding (EWPBTE) feature	en_US
dc.type	Conference Paper	en_US
dcterms.source	Scopus

Collections

Faculty Of Engineering Research Paper