The Use of MSVM and HMM for Sentence Alignment
Date
2012-06
Type
Article
Publisher
KOREA INFORMATION PROCESSING SOC
Series Info
JOURNAL OF INFORMATION PROCESSING SYSTEMS;Volume: 8 Issue: 2 Pages: 301-314
Abstract
In this paper, two new approaches to aligning English-Arabic sentences in
bilingual parallel corpora, based on the Multi-Class Support Vector Machine (MSVM) and
the Hidden Markov Model (HMM) classifiers, are presented. A feature vector is extracted
from the text pair under consideration. This vector contains text features such as
length, punctuation score, and cognate score values. A set of manually prepared training
data was used to train the Multi-Class Support Vector Machine and the Hidden Markov
Model. Another set of data was used for testing. The results of the MSVM and HMM
outperform the results of the length-based approach. Moreover, these new approaches
are valid for any language pair and are quite flexible, since the feature vector may
contain fewer, more, or different features than the ones used in the current research,
such as a lexical matching feature or Hanzi characters in Japanese-Chinese texts.
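
The feature vector described in the abstract can be illustrated with a minimal Python sketch. The abstract names the three features (length, punctuation score, cognate score) but does not define them, so every formula below is an assumption made for illustration and not the authors' definition: length is taken as the ratio of the shorter to the longer character count, the punctuation score as the share of matching punctuation marks, and the cognate score as the share of identical tokens such as numerals or acronyms. All function names are hypothetical.

```python
import string
from collections import Counter

# Assumption: map common Arabic punctuation to ASCII so marks can be compared.
ARABIC_TO_ASCII = str.maketrans({"،": ",", "؛": ";", "؟": "?"})


def length_feature(src, tgt):
    """Ratio of the shorter to the longer character length, in [0, 1]."""
    longer = max(len(src), len(tgt))
    if longer == 0:
        return 1.0
    return min(len(src), len(tgt)) / longer


def punct_counts(text):
    """Count punctuation marks after normalising Arabic marks to ASCII."""
    text = text.translate(ARABIC_TO_ASCII)
    return Counter(ch for ch in text if ch in string.punctuation)


def punctuation_score(src, tgt):
    """Share of punctuation marks that the two texts have in common."""
    a, b = punct_counts(src), punct_counts(tgt)
    total = max(sum(a.values()), sum(b.values()))
    if total == 0:
        return 1.0
    return sum((a & b).values()) / total


def tokens(text):
    """Whitespace tokens with surrounding punctuation stripped."""
    text = text.translate(ARABIC_TO_ASCII)
    stripped = (t.strip(string.punctuation) for t in text.split())
    return [t for t in stripped if t]


def cognate_score(src, tgt):
    """Share of identical tokens (e.g. numerals, acronyms) in both texts."""
    a, b = Counter(tokens(src)), Counter(tokens(tgt))
    denom = min(sum(a.values()), sum(b.values()))
    if denom == 0:
        return 0.0
    return sum((a & b).values()) / denom


def feature_vector(src, tgt):
    """Three-element vector: [length, punctuation score, cognate score]."""
    return [
        length_feature(src, tgt),
        punctuation_score(src, tgt),
        cognate_score(src, tgt),
    ]
```

In a setup like the one the abstract describes, a vector of this kind would be computed for each candidate text pair and passed to a trained classifier (MSVM or HMM), which is not implemented here. Because the vector is a plain list, a further feature such as a lexical matching score could be appended without changing the rest of the sketch.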
Description
Accession Number: WOS:000420351000006
Keywords
Hidden Markov Model, Multi-Class Support Vector Machine, Machine Translation, Parallel Corpora, English/Arabic Parallel Corpus, Sentence Alignment