An unsupervised linguistic‑based model for automatic glossary term extraction from a single PDF textbook

dc.Affiliation: October University for Modern Sciences and Arts (MSA)
dc.contributor.author: Soliman, Ashraf
dc.date.accessioned: 2023-05-10T06:41:25Z
dc.date.available: 2023-05-10T06:41:25Z
dc.date.issued: 2023-05
dc.description.abstract: Term extraction from textbooks is the cornerstone of many different intelligent natural language processing systems, especially those that support learners and educators in the education system. This paper proposes a novel unsupervised domain-independent model that automatically extracts relevant and domain-related key terms from a single PDF textbook, without relying on a statistical technique or external knowledge base. It relies only on basic linguistic techniques of natural language processing: pattern recognition, sentence tokenization, part-of-speech tagging, and chunking. The model takes a PDF textbook as input and produces a list of key terms as output. Furthermore, the model proposes a novel classification of sentences, from which the concept of defining sentences is derived. The defining sentences are the main textual units that the model revolves around to identify the key terms. The architecture of the proposed work consists of 21 processes distributed across three phases. The first phase consists of five processes for extracting text from a PDF textbook and cleaning it for the next phases. The second phase consists of eight processes for identifying the defining sentences and extracting them from all the textbook's sentences. The last phase consists of eight processes for identifying and extracting the key terms from every defining sentence. The proposed work was evaluated in two experiments that used two PDF textbooks from different fields. The experimental evaluation showed that the results were promising. [en_US]
dc.description.uri: https://www.scimagojr.com/journalsearch.php?q=144955&tip=sid&clean=0
dc.identifier.doi: https://doi.org/10.1007/s10639-023-11818-1
dc.identifier.other: https://doi.org/10.1007/s10639-023-11818-1
dc.identifier.uri: http://repository.msa.edu.eg/xmlui/handle/123456789/5571
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: Education and Information Technologies;
dc.subject: Term extraction [en_US]
dc.subject: Linguistic techniques [en_US]
dc.subject: Natural language processing [en_US]
dc.subject: Unsupervised machine learning [en_US]
dc.subject: PDF textbooks [en_US]
dc.subject: Intelligent Tutoring [en_US]
dc.title: An unsupervised linguistic‑based model for automatic glossary term extraction from a single PDF textbook [en_US]
dc.type: Article [en_US]
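
The abstract describes a pipeline that tokenizes text into sentences, picks out "defining sentences", and extracts a key term from each one. The sketch below illustrates that general idea only; it is not the paper's 21-process model. It assumes the text has already been extracted from the PDF, it replaces part-of-speech tagging and chunking with simple regular expressions, and the cue phrases in `DEFINITION_CUES` as well as all function names are hypothetical choices made for illustration.

```python
import re

# Hypothetical cue phrases that often introduce a definition. The abstract
# does not list the paper's actual patterns, so these are assumptions.
DEFINITION_CUES = re.compile(
    r"^(?P<term>[A-Z][\w\- ]{1,60}?)\s+"
    r"(?:is defined as|is called|refers to|is an?)\s+"
)


def split_sentences(text):
    """Naive sentence tokenizer: split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def strip_determiner(phrase):
    """Drop a leading 'The', 'A' or 'An' from a candidate term."""
    return re.sub(r"^(?:The|A|An)\s+", "", phrase, flags=re.IGNORECASE)


def extract_terms(text):
    """Return lowercased candidate terms from sentences that match a cue."""
    terms = []
    for sentence in split_sentences(text):
        match = DEFINITION_CUES.match(sentence)
        if match:
            term = strip_determiner(match.group("term")).strip().lower()
            if term and term not in terms:
                terms.append(term)
    return terms


sample = (
    "A stack is a collection that follows last-in, first-out order. "
    "We used it in the previous chapter. "
    "Recursion refers to a function calling itself."
)
print(extract_terms(sample))
```

A pattern-only approach like this will produce false positives (for example, "This is a ..." yields "this"), which is one reason a fuller pipeline would add part-of-speech tagging and chunking to confirm that the candidate is a noun phrase.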

Files

Original bundle

Name: s10639-023-11818-1.pdf
Size: 1.59 MB
Format: Adobe Portable Document Format