DRTX: A Duplicate Resolution Tool for XML Repositories

dc.Affiliation: October University for Modern Sciences and Arts (MSA)
dc.contributor.author: Mohamed Abd El-ghfar, Randa
dc.contributor.author: Hamed El-Bastawissy, Ali
dc.date.accessioned: 2020-02-03T07:06:14Z
dc.date.available: 2020-02-03T07:06:14Z
dc.date.issued: 2012
dc.description: MSA Google Scholar [en_US]
dc.description.abstract: Detecting duplicates in XML is not trivial due to structural diversity and object dependency. This paper proposes DRTX, an efficient duplicate detection and resolution tool for XML that applies two well-known duplicate detection techniques, normal edit distance (NED) and a token-based Damerau-Levenshtein distance algorithm (TBED), then compares their results and suggests the better similarity for each. Beyond duplicate detection and resolution, DRTX provides two extra services: first, an XML file merger that merges XML documents and thus addresses the structural heterogeneity problem; second, a dirty XML generator that inserts known duplicate problems into a clean XML file, so that the algorithms can be applied to that file to measure how accurately the system detects these problems. To minimize the number of pairwise element comparisons, a set of filters is used to increase the efficiency of DRTX while keeping its effectiveness adjustable. Experimental results show that neither algorithm is better than the other in general; each has its own use: NED performs better at lower similarity thresholds, while TBED performs better at higher ones. [en_US]
dc.description.sponsorship: IJCSNS International Journal of Computer Science and Network Security [en_US]
dc.description.uri: https://www.scimagojr.com/journalsearch.php?q=21100985663&tip=sid&clean=0
dc.identifier.issn: 1738-7906
dc.identifier.uri: https://cutt.ly/BrI2KmJ
dc.language.iso: en [en_US]
dc.publisher: International Journal of Computer Science and Network Security [en_US]
dc.relation.ispartofseries: International Journal of Computer Science and Network Security; Vol. 12, Issue 7
dc.subject: October University for University for Duplicates detection [en_US]
dc.subject: XML [en_US]
dc.subject: similarity [en_US]
dc.subject: Data cleaning [en_US]
dc.subject: efficiency and effectiveness of detection Algorithms [en_US]
dc.title: DRTX: A Duplicate Resolution Tool for XML Repositories [en_US]
dc.type: Article [en_US]
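The abstract names two similarity measures, a normal edit distance (NED) and a token-based Damerau-Levenshtein distance (TBED), together with filters that prune pairwise comparisons. This record does not give the paper's exact definitions, so the Python below is only a sketch under stated assumptions. NED is read as Levenshtein distance normalized by the longer string. TBED is read as the optimal-string-alignment form of Damerau-Levenshtein computed over word tokens. The filter is a simple length bound. All function names (`ned_similarity`, `tbed_similarity`, `may_pass`, `element_texts`, `find_duplicates`) and the sample XML are hypothetical.

```python
"""Hypothetical sketch of the two similarity measures named in the DRTX abstract.

Assumptions (not from the paper): NED = edit distance normalized by the longer
string; TBED = optimal-string-alignment Damerau-Levenshtein over word tokens.
"""
import xml.etree.ElementTree as ET
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def osa_distance(a: Sequence, b: Sequence, transpositions: bool = True) -> int:
    """Edit distance over any sequence; with transpositions=True this is the
    optimal-string-alignment variant of Damerau-Levenshtein."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[n][m]


def _similarity(dist: int, la: int, lb: int) -> float:
    """Map a distance to [0, 1] by normalizing with the longer length."""
    longest = max(la, lb)
    return 1.0 if longest == 0 else 1.0 - dist / longest


def ned_similarity(a: str, b: str) -> float:
    """Character-level Levenshtein similarity (no transpositions)."""
    return _similarity(osa_distance(a, b, transpositions=False), len(a), len(b))


def tbed_similarity(a: str, b: str) -> float:
    """Token-level Damerau-Levenshtein similarity; tokens are lowercase words."""
    ta, tb = a.lower().split(), b.lower().split()
    return _similarity(osa_distance(ta, tb), len(ta), len(tb))


def may_pass(la: int, lb: int, threshold: float) -> bool:
    """Length filter: distance >= |la - lb|, so similarity <= min/max.
    Pairs that cannot reach the threshold are skipped before the DP runs."""
    longest = max(la, lb)
    return longest == 0 or min(la, lb) / longest >= threshold


def element_texts(xml_text: str, tag: str) -> List[str]:
    """Whitespace-normalized text of every element with the given tag."""
    root = ET.fromstring(xml_text)
    return [" ".join("".join(el.itertext()).split()) for el in root.iter(tag)]


Pair = Tuple[int, int, Optional[float], Optional[float]]


def find_duplicates(values: List[str], threshold: float) -> List[Pair]:
    """Return (i, j, ned, tbed) for pairs where either measure reaches the
    threshold; a measure is None when the length filter pruned it."""
    hits: List[Pair] = []
    for i, j in combinations(range(len(values)), 2):
        a, b = values[i], values[j]
        ned = ned_similarity(a, b) if may_pass(len(a), len(b), threshold) else None
        na, nb = len(a.split()), len(b.split())
        tbed = tbed_similarity(a, b) if may_pass(na, nb, threshold) else None
        # A pruned measure (None) is known to be below the threshold.
        if (ned or 0.0) >= threshold or (tbed or 0.0) >= threshold:
            hits.append((i, j, ned, tbed))
    return hits


xml_doc = """<books>
  <book><title>Data Cleaning for XML</title></book>
  <book><title>Data Claening for XML</title></book>
  <book><title>XML for Data Cleaning</title></book>
  <book><title>Network Security</title></book>
</books>"""
titles = element_texts(xml_doc, "title")
hits = find_duplicates(titles, threshold=0.7)
for i, j, ned, tbed in hits:
    print(f"{titles[i]!r} ~ {titles[j]!r}: NED={ned}, TBED={tbed}")
```

The length filter is safe for both measures. Insertions and deletions change a sequence's length by one, while substitutions and transpositions leave it unchanged, so the distance can never be smaller than the length difference. Pruned pairs therefore never include one that would have reached the threshold. Whether DRTX uses this particular filter is not stated in the record.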

Files

Original bundle

Name: avatar_scholar_256.png
Size: 6.31 KB
Format: Portable Network Graphics
Description: Faculty of Computer Science Research Paper

License bundle

Name: license.txt
Size: 51 B
Format: Item-specific license agreed upon to submission
Description: