Automatic speech annotation based on Enhanced Wavelet Packets Best Tree Encoding (EWPBTE) feature
dc.Affiliation | October University for Modern Sciences and Arts (MSA) | |
dc.contributor.author | Mohamed M.H. | |
dc.contributor.author | Hassan A.M.A. | |
dc.contributor.author | Hassan N.M.H. | |
dc.contributor.other | Department of Electronics and Communications Engineering, October University for Modern Sciences and Arts, 6 October City, Egypt | |
dc.contributor.other | Faculty of Engineering, Fayoum University, Egypt | |
dc.date.accessioned | 2020-01-09T20:41:31Z | |
dc.date.available | 2020-01-09T20:41:31Z | |
dc.date.issued | 2016 | |
dc.description | Scopus | |
dc.description.abstract | This paper introduces a fully automated Arabic phone recognition system based on the 15-point Enhanced Wavelet Packets Best Tree Encoding (EWPBTE) speech feature. WPBTE is enhanced by adding an energy component, implemented in MATLAB; this raises recognizer accuracy by about 65 percentage points, which is the main contribution of this paper. EWPBTE is used to find phoneme boundaries along a speech utterance. Hidden Markov Models (HMM) and Gaussian mixtures are used to build the statistical models, and the HMM Tool Kit (HTK) software is used to implement them. The system identifies spoken phones at a recognition rate of 57.01% with Mel Frequency Cepstral Coefficients (MFCC), 21.07% with WPBTE, and 86.23% with EWPBTE. The proposed EWPBTE vector has 15 components, compared with 39 for MFCC, which makes it a promising feature vector for further research and development. © 2016 IEEE. | en_US |
dc.identifier.doi | https://doi.org/10.1109/ICEEOT.2016.7755165 | |
dc.identifier.isbn | 9.78E+12 | |
dc.identifier.other | https://doi.org/10.1109/ICEEOT.2016.7755165 | |
dc.identifier.uri | https://ieeexplore.ieee.org/document/7755165 | |
dc.language.iso | English | en_US |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US |
dc.relation.ispartofseries | International Conference on Electrical, Electronics, and Optimization Techniques, ICEEOT 2016 | |
dc.subject | Accuracy | en_US |
dc.subject | Components | en_US |
dc.subject | Gaussian Mixture | en_US |
dc.subject | Phone | en_US |
dc.subject | Recognition Rate | en_US |
dc.subject | Character recognition | en_US |
dc.subject | Encoding (symbols) | en_US |
dc.subject | Forestry | en_US |
dc.subject | Hidden Markov models | en_US |
dc.subject | Markov processes | en_US |
dc.subject | MATLAB | en_US |
dc.subject | Telephone sets | en_US |
dc.subject | Trellis codes | en_US |
dc.subject | Development phase | en_US |
dc.subject | Energy components | en_US |
dc.subject | Gaussian mixtures | en_US |
dc.subject | Mel-frequency cepstral coefficients | en_US |
dc.subject | Phone recognition | en_US |
dc.subject | Speech recognition | en_US |
dc.title | Automatic speech annotation based on Enhanced Wavelet Packets Best Tree Encoding (EWPBTE) feature | en_US |
dc.type | Conference Paper | en_US |
dcterms.source | Scopus |