DRTX: A Duplicate Resolution Tool for XML Repositories
dc.Affiliation | October University for Modern Sciences and Arts (MSA) | |
dc.contributor.author | Mohamed Abd El-ghfar, Randa | |
dc.contributor.author | Hamed El-Bastawissy, Ali | |
dc.date.accessioned | 2020-02-03T07:06:14Z | |
dc.date.available | 2020-02-03T07:06:14Z | |
dc.date.issued | 2012 | |
dc.description | MSA Google Scholar | en_US |
dc.description.abstract | Detecting duplicates in XML is not trivial because of structural diversity and object dependency. This paper presents a duplicate detection and resolution tool for XML (DRTX), an efficient XML duplicate detector and resolver that applies two well-known duplicate detection techniques, normal edit distance (NED) and the token-based Damerau-Levenshtein distance algorithm (TBED), then compares the results and suggests the better similarity for each of them. DRTX is not only a duplicate detection and resolution system; it also provides two extra services. The first is an XML file merger, which merges XML documents and thereby addresses the structural heterogeneity problem. The second is a dirty XML generator, which inserts known duplicate problems into a clean XML file so that the two algorithms can be applied to that file to explore how accurately the system detects these problems. To minimize the number of pairwise element comparisons, a set of filters is used to increase the efficiency of DRTX while its effectiveness remains adjustable. Experimental results show that neither algorithm is better than the other overall, but each has its own use: NED is better at lower similarity threshold values, while TBED is better at higher ones. | en_US |
dc.description.sponsorship | IJCSNS International Journal of Computer Science and Network Security | en_US |
dc.description.uri | https://www.scimagojr.com/journalsearch.php?q=21100985663&tip=sid&clean=0 | |
dc.identifier.issn | 1738-7906 | |
dc.identifier.uri | https://cutt.ly/BrI2KmJ | |
dc.language.iso | en | en_US |
dc.publisher | International Journal of Computer Science and Network Security | en_US |
dc.relation.ispartofseries | International Journal of Computer Science and Network Security;VOL : 12 ISU : 7 | |
dc.subject | Duplicates detection | en_US |
dc.subject | XML | en_US |
dc.subject | similarity | en_US |
dc.subject | Data cleaning | en_US |
dc.subject | efficiency and effectiveness of detection Algorithms | en_US |
dc.title | DRTX: A Duplicate Resolution Tool for XML Repositories | en_US |
dc.type | Article | en_US |
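The two similarity measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the normalization by the longer sequence, the whitespace tokenization, and the use of the optimal string alignment variant of Damerau-Levenshtein are all assumptions made for illustration, since the record does not specify them.

```python
def levenshtein(a, b):
    """Edit distance over two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def osa_distance(a, b):
    """Damerau-Levenshtein distance (optimal string alignment variant):
    Levenshtein plus transposition of two adjacent items."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def ned_similarity(s, t):
    """Assumed NED similarity: character-level edit distance,
    normalized by the longer string."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


def tbed_similarity(s, t):
    """Assumed TBED similarity: Damerau-Levenshtein over whitespace
    tokens, normalized by the longer token list."""
    a, b = s.lower().split(), t.lower().split()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


def is_duplicate(s, t, similarity, threshold):
    """Flag a pair as a duplicate when its similarity reaches the threshold."""
    return similarity(s, t) >= threshold
```

Under these assumptions, swapping two adjacent tokens counts as a single token-level edit, whereas the same change costs several character-level edits. For example, is_duplicate("Ali Hamed", "Hamed Ali", tbed_similarity, 0.5) evaluates to True.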
Files
Original bundle
- Name: avatar_scholar_256.png
- Size: 6.31 KB
- Format: Portable Network Graphics
- Description: Faculty Of Computer Science Research Paper
License bundle
- Name: license.txt
- Size: 51 B
- Format: Item-specific license agreed upon to submission