Browsing by Author "El-Bastawissy, Ali"
Now showing 1 - 10 of 10
Item: AEDA: Arabic edit distance algorithm - Towards a new approach for Arabic name matching (IEEE, 2011). Authors: H Abdel Ghafour, Hesham; El-Bastawissy, Ali; A Heggazy, Abdel Fattah.
String matching algorithms play a vital role in many applications such as search engines, object identification, handwritten recognition, name searching in large databases, data cleansing, and automatic spell checking. Many algorithms have been developed to measure string similarity, but most of them were designed mainly to handle Latin-based languages. In this paper, we propose a new algorithm for Arabic string matching that takes into consideration the unique features of the Arabic language and the different similarity levels of Arabic letters, such as phonetic similarity and character-form similarity, in addition to keyboard distance.
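As a rough illustration of the similarity-weighted distance the AEDA item describes, the sketch below implements a standard dynamic-programming edit distance whose substitution cost is reduced for "similar" Arabic letters. The letter groups and weights are assumptions chosen for the example; they are not the groups, weights, or algorithm published in the paper.

```python
# Illustrative sketch of a similarity-weighted edit distance for Arabic strings.
# The similarity groups and weights below are assumptions for demonstration only;
# they are NOT the groups or weights defined by the AEDA paper.

# Hypothetical similarity groups: letters in the same group get a reduced
# substitution cost instead of the full cost of 1.0.
PHONETIC_GROUPS = [set("تط"), set("سص"), set("ذزظ")]   # sound-alike letters
FORM_GROUPS = [set("بتثن"), set("جحخ"), set("دذ")]      # similar written forms

def substitution_cost(a: str, b: str) -> float:
    """Return a reduced cost when two letters are 'similar', else 1.0."""
    if a == b:
        return 0.0
    for group in PHONETIC_GROUPS:
        if a in group and b in group:
            return 0.25          # assumed phonetic-similarity weight
    for group in FORM_GROUPS:
        if a in group and b in group:
            return 0.5           # assumed form-similarity weight
    return 1.0

def weighted_edit_distance(s: str, t: str) -> float:
    """Dynamic-programming edit distance with weighted substitutions."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = float(i)
    for j in range(1, cols):
        d[0][j] = float(j)
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + 1.0,                                        # deletion
                d[i][j - 1] + 1.0,                                        # insertion
                d[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),  # substitution
            )
    return d[-1][-1]

if __name__ == "__main__":
    # Two spellings of the same name should come out closer than unrelated names.
    print(weighted_edit_distance("مصطفى", "مصطفي"))
    print(weighted_edit_distance("مصطفى", "محمود"))
```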
Item: Comparative Study of Record Linkage Approaches for Big Data (Walailak University, 2020-06). Authors: Abd El-Ghafar, Randa M; El-Bastawissy, Ali; Nasr, Eman; Gheith, Mervat H.
Record linkage is a challenging task for Big Data. This paper presents a comparative study of record linkage approaches for Big Data. We compare approaches along three dimensions: record linkage phases, dataset properties, and the parallel processing approach used for Big Data. As far as we know, the current state of the art only compares individual record linkage approaches; we found only one comparative study that covers the whole record linkage framework, and it addresses relational databases. Our focus on the dimensions of parallel processing approaches for Big Data and dataset properties is novel. Our research revealed five findings. First, data exploration is almost a non-existent phase, despite the importance of exploring the dataset being examined. Second, techniques that handle the data standardization and preparation phase of the first dimension are not extensively covered in the literature, which can directly affect the quality of the results. Third, record linkage over unstructured data is not yet explored in the literature. Fourth, MapReduce has been used in about 50% of the selected studies to handle the parallel processing of Big Data, but due to its limitations, more recent and efficient approaches have been used, namely Apache Spark and Apache Flink. Apache Spark has only recently been adopted to resolve duplicates, owing to its support for in-memory computation, which makes the whole linkage process more efficient. Although the comparative study includes many recent studies based on Apache Spark, it is not yet well explored in the literature and more research needs to be conducted. In addition, Apache Flink is still rarely used to solve the record linkage problem for Big Data. Fifth, pruning techniques, which eliminate unnecessary comparisons, are not adequately applied in the covered studies, despite their effect on reducing the search space and making the record linkage process more efficient.

Item: Directional Skyline Queries (SPRINGER, 2012). Authors: El-Dawy, Eman; MO Mokhtar, Hoda; El-Bastawissy, Ali.
Continuous monitoring of queries over moving objects has become an important topic, as it supports a wide range of useful mobile applications. A continuous skyline query involves both static and dynamic dimensions. In the dynamic dimension, a data object not only has a distance from the query object but also a direction with respect to the query object's motion. In this paper, we propose a direction-oriented continuous skyline query algorithm to compute the skyline objects with respect to the current position of the user. The goal of the proposed algorithm is to help the user retrieve the best objects that satisfy his/her constraints and fall either in any direction around the query object or along the object's direction of motion. We also create a pre-computed skyline data set that facilitates skyline updates and enhances query running time and performance. Finally, we present experimental results to demonstrate the performance and efficiency of our proposed algorithms.

Item: A Flexible Tool for Web Service Selection in Service Oriented Architecture (International Journal of Advanced Computer Science and Applications, 2011). Authors: El-Bastawissy, Ali; Nagy, Walaa; M. O. Mokhtar, Hoda.
Web services are emerging technologies that enable application-to-application communication and the reuse of services over the Web. The Semantic Web improves the quality of existing tasks, including Web service discovery, invocation, composition, monitoring, and recovery, by describing Web service capabilities and content in a computer-interpretable language. To provide most of the requested Web services, a Web service matchmaker is usually required. Web service matchmaking is the process of finding an appropriate provider for a requester through a middle agent. To provide the right service for the right user request, Quality of Service (QoS)-based Web service selection is widely used. Employing QoS in Web service selection helps to satisfy user requirements by discovering the best service(s) in terms of the required QoS. Inspired by Internet search engines such as Yahoo and Google, in this paper we provide a QoS-based service selection algorithm that is able to identify the best candidate semantic Web service(s) given the description of the requested service(s) and the QoS criteria of the user requirements. In addition, we propose a ranking method for those services. We also show how we employ data warehousing techniques to model the service selection problem. The proposed algorithm integrates a traditional matchmaking mechanism with data warehousing techniques; this integration enables us to employ the historical preferences of the user to provide better selection in future searches. The main result of the paper is a generic framework that is implemented to demonstrate the feasibility of the proposed algorithm for QoS-based Web applications. Our experimental results show that the algorithm performs well and increases system reliability.
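The Web service selection item above ranks candidate services by how well their QoS values match the user's requirements. The sketch below is only a generic illustration of QoS-based scoring and ranking; the QoS attributes, normalization, and preference weights are assumptions for the example and are not the paper's matchmaking or ranking method.

```python
# Illustrative sketch of QoS-based ranking of candidate Web services.
# Attribute names, weights, and the scoring scheme are assumptions for this
# example only; the paper's actual matchmaking and ranking method may differ.
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    response_time_ms: float   # lower is better
    availability: float       # 0..1, higher is better
    cost: float               # lower is better

def normalize(value: float, worst: float, best: float) -> float:
    """Map a raw QoS value to [0, 1], where 1 means 'best observed'."""
    if worst == best:
        return 1.0
    return (value - worst) / (best - worst)

def rank_services(candidates, weights):
    """Score each candidate as a weighted sum of normalized QoS attributes."""
    rt = [s.response_time_ms for s in candidates]
    av = [s.availability for s in candidates]
    co = [s.cost for s in candidates]
    scored = []
    for s in candidates:
        score = (
            weights["response_time"] * normalize(s.response_time_ms, max(rt), min(rt))
            + weights["availability"] * normalize(s.availability, min(av), max(av))
            + weights["cost"] * normalize(s.cost, max(co), min(co))
        )
        scored.append((score, s.name))
    return sorted(scored, reverse=True)

if __name__ == "__main__":
    services = [
        Service("WeatherA", 120, 0.999, 0.02),
        Service("WeatherB", 80, 0.990, 0.05),
        Service("WeatherC", 200, 0.950, 0.01),
    ]
    # Hypothetical user preference: availability matters most.
    prefs = {"response_time": 0.3, "availability": 0.5, "cost": 0.2}
    for score, name in rank_services(services, prefs):
        print(f"{name}: {score:.3f}")
```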
Item: ICCPN: Interval-based Conditional Colored Petri Net (IEEE, 2010). Authors: MA Helal, Iman; El-Bastawissy, Ali; Hegazy, Osman.
Nowadays, rules form part of almost any software system, including real-time applications and games, and an event can trigger many different rules according to the conditions controlling those rules. Although rules are a core part of many kinds of systems, maintaining and updating them without affecting the whole application is not easy. Hence, many systems have presented rules as a layer separate from the application, such as SAMOS, Sentinel, Snoop, SnoopIB, and CCPN. CCPN is a model used in the Amplified CDBB-500 architecture, a system that supports an active database within its architecture. In this paper, we propose some extensions to CCPN to present rules as a layer separate from the application, to support time-based events, and to add other important features that were agreed upon and implemented in other systems such as Snoop and SnoopIB.

Item: Inconsistency Resolution In The Virtual Database Environment Using Fuzzy Logic (Mohamed, 2016). Authors: Abdelrahman, Amr; El-Bastawissy, Ali; Kholief, Mohamed.
Data integration from different data sources may result in data inconsistencies due to different representations of the same objects at the data sources. Many researchers have tried to solve this problem manually or using source features, but none of them took the user's preferences for source features into account. This paper proposes using fuzzy logic with multiple constraints, in accordance with user preferences, to resolve inconsistencies. The approach uses a token-based cleaner, a content-based inconsistency detection algorithm, to detect inconsistencies, and then uses fuzzy logic to resolve them. An experiment was conducted using our fuzzy algorithm on a training dataset that reflects our designated point of view. The results indicate that multiple-constraint decision making is a suitable technique for resolving inconsistencies.
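The fuzzy-logic item above resolves conflicting values from different sources by weighing source features against the user's preferences. The sketch below only illustrates that general idea with a simple weighted scoring of candidate values; the source features, membership function, and preference weights are assumptions for the example, not the paper's fuzzy model or its token-based cleaner.

```python
# Illustrative sketch: choosing among inconsistent values for the same attribute
# by scoring the sources that supplied them. The features (freshness, reliability,
# completeness), the membership function, and the preference weights are all
# assumptions for this example; they are not the fuzzy model used in the paper.

def freshness_membership(age_days: float) -> float:
    """Fuzzy 'fresh' membership: 1.0 for new data, decaying linearly to 0 at one year."""
    return max(0.0, 1.0 - age_days / 365.0)

def resolve(candidates, preferences):
    """candidates: list of dicts, each holding a value and its source's feature scores."""
    best_value, best_score = None, -1.0
    for c in candidates:
        score = (
            preferences["freshness"] * freshness_membership(c["age_days"])
            + preferences["reliability"] * c["reliability"]     # already in [0, 1]
            + preferences["completeness"] * c["completeness"]   # already in [0, 1]
        )
        if score > best_score:
            best_value, best_score = c["value"], score
    return best_value, best_score

if __name__ == "__main__":
    # The same customer's phone number reported differently by two sources.
    conflicting = [
        {"value": "+20-100-111-2222", "age_days": 30,  "reliability": 0.7, "completeness": 0.9},
        {"value": "+20-100-333-4444", "age_days": 400, "reliability": 0.9, "completeness": 0.8},
    ]
    # Hypothetical user preference: freshness matters most.
    prefs = {"freshness": 0.5, "reliability": 0.3, "completeness": 0.2}
    value, score = resolve(conflicting, prefs)
    print(f"chosen value: {value} (score {score:.2f})")
```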
Item: Lake Data Warehouse Architecture for Big Data Solutions (SAI, 2020). Authors: Saddad, Emad; Mokhtar, Hoda M. O.; El-Bastawissy, Ali; Hazman, Maryam.
The traditional Data Warehouse is a multidimensional repository of nonvolatile, subject-oriented, integrated, time-variant, and non-operational data gathered from multiple heterogeneous data sources. We need to adapt the traditional Data Warehouse architecture to deal with the new challenges imposed by the abundance of data and current big data characteristics, including volume, value, variety, validity, volatility, visualization, variability, and venue. The new architecture also needs to handle existing drawbacks, including availability, scalability, and consequently query performance. This paper introduces a novel Data Warehouse architecture, named Lake Data Warehouse Architecture, to provide the traditional Data Warehouse with the capabilities to overcome these challenges. Lake Data Warehouse Architecture merges the traditional Data Warehouse architecture with big data technologies, such as the Hadoop framework and Apache Spark, to provide a hybrid solution in a complementary way. The main advantage of the proposed architecture is that it integrates the current features of traditional Data Warehouses with big data features acquired by integrating the traditional Data Warehouse with the Hadoop and Spark ecosystems. Furthermore, it is tailored to handle a tremendous volume of data while maintaining availability, reliability, and scalability.

Item: Multi-level continuous skyline queries (MCSQ) (IEEE, 2011). Authors: El-Dawy, Eman; MO Mokhtar, Hoda; El-Bastawissy, Ali.
Most of the current work on skyline queries has mainly dealt with querying static query points over static data sets. With the advances in wireless communication, mobile computing, and positioning technologies, it has become possible to obtain and manage (model, index, query, etc.) the trajectories of moving objects in real life, and consequently the need for continuous skyline query processing has become more and more pressing. In this paper, we address the problem of efficiently maintaining continuous skyline queries that contain both static and dynamic attributes. We present a Multi-level Continuous Skyline Query (MCSQ) algorithm, which creates a pre-computed skyline data set, facilitates skyline updates, and enhances query running time and performance. In brief, our algorithm proceeds as follows. First, we distinguish the data points that are permanently in the skyline and use them to derive a search bound. Second, we establish a pre-computed data set for the dynamic skyline that depends on the number of skyline levels (M) and is later used to update the first (initial) skyline points. Finally, every time the skyline needs to be updated, we use the pre-computed skyline data sets to update the previous skyline set and consequently update the first skyline. We conclude with experimental results that demonstrate the performance and efficiency of our algorithm.
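The MCSQ item above builds on the basic skyline (dominance) computation over objects with both static and dynamic attributes. The sketch below shows only that building block: a straightforward skyline over (price, distance-to-query) pairs, naively recomputed as the query point moves. It is not the MCSQ algorithm; the multi-level pre-computation and the search bound described in the abstract are not reproduced here.

```python
# Illustrative sketch of the skyline building block underlying continuous skyline
# queries: objects have a static attribute (price) and a dynamic one (distance to
# the moving query point). This is NOT the MCSQ algorithm itself; it simply
# recomputes the skyline from scratch for each query position.
import math

def dominates(a, b):
    """a dominates b if it is no worse in every attribute and strictly better in one
    (both attributes are 'smaller is better' here)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def continuous_skyline(objects, query_positions):
    """Naively recompute the skyline of (price, distance) for each query position."""
    results = []
    for qx, qy in query_positions:
        attrs = {
            obj["name"]: (obj["price"], math.hypot(obj["x"] - qx, obj["y"] - qy))
            for obj in objects
        }
        sky = [
            name for name, a in attrs.items()
            if not any(dominates(b, a) for other, b in attrs.items() if other != name)
        ]
        results.append(sky)
    return results

if __name__ == "__main__":
    hotels = [
        {"name": "H1", "price": 80,  "x": 0.0, "y": 0.0},
        {"name": "H2", "price": 120, "x": 1.0, "y": 1.0},
        {"name": "H3", "price": 60,  "x": 5.0, "y": 5.0},
    ]
    # The query object moves along a simple trajectory; the skyline follows it.
    for snapshot in continuous_skyline(hotels, [(0.0, 0.0), (4.0, 4.0)]):
        print(snapshot)
```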
Item: Perspectives of an Enterprise Integration Plug-in System Architecture for Networked Manufacturing Systems (EOS ASSOC, 2019-04). Authors: Hamed, Mohamed; El-Bastawissy, Ali; Mansour, Hesham; Ghannam, Adel N.
This paper is part of a research project funded by the Science and Technology Development Fund (STDF) to create a product serving the digital transformation of Egyptian manufacturing companies. It allows a manufacturing company to be a member of a distributed manufacturing network, and the resulting system can be plugged into any ERP system. In this work, the limitation of a centralized integration entity in achieving loose coupling of distributed systems is overcome. The SOA framework and remote method invocation (RMI) are applied using SOAP-XML technology, and enterprise integration patterns (EIP) were used in the architecture design.

Item: Towards an Alternative Data Warehouses Architecture (Trends in Innovative Computing, 2014). Authors: Saddad, Emad; El-Bastawissy, Ali; Hegazy, Osman; Hazman, Maryam.
Data warehouses (DWs) are centralized data repositories that integrate data from various transactional, legacy, or external systems, applications, and sources. A DW provides an environment separate from the operational systems and is designed entirely for decision support, analytical reporting, ad-hoc queries, and data mining. Recently, the structure and the volume of data stored on computer systems have been growing at an accelerated rate. In current DW architectures based on n-ary relational DBMSs, growing data volumes, high disk space consumption, slow query response times, and complex database administration are common problems. Furthermore, a number of factors make developing and maintaining a data warehouse system a painful process: setting up a data warehouse can take a long time, over-provisioning can lead to high costs, organizations may lack the expertise needed to set up and maintain a data warehouse, and system crashes, downtime, or system overload can have serious consequences for an organization. Also, DWs depend on a static set of external data sources that may be incomplete, may not use the same definitions, and are not always available. The lack of a proper data model and an adequate architecture specifically targeted at these environments is the root cause of all these problems. This paper therefore explains why we need an alternative DW architecture that keeps the benefits of the existing traditional DW architecture while solving its problems: dealing with modern environments such as cloud computing, handling next-generation (NoSQL) databases efficiently, and coping with the scalability demands of Web applications and big data.
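Relating to the enterprise-integration item above (Perspectives of an Enterprise Integration Plug-in System Architecture), which applies SOA-style remote invocation over SOAP-XML: the snippet below is only a generic illustration of posting a SOAP 1.1 request with Python's standard library. The endpoint URL, namespace, and operation name are placeholders invented for the example; the abstract does not describe the project's actual service interface.

```python
# Generic illustration of a SOAP 1.1 request/response round trip using only the
# Python standard library. The endpoint, namespace, and operation below are
# PLACEHOLDERS for the example; the project's real service contract is not
# described in the abstract and is not reproduced here.
import urllib.request
import xml.etree.ElementTree as ET

ENDPOINT = "http://erp.example.com/integration"    # placeholder endpoint
NAMESPACE = "http://example.com/manufacturing"     # placeholder namespace

def build_envelope(order_id: str) -> bytes:
    """Build a minimal SOAP 1.1 envelope for a hypothetical GetOrderStatus call."""
    return f"""<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetOrderStatus xmlns="{NAMESPACE}">
      <OrderId>{order_id}</OrderId>
    </GetOrderStatus>
  </soap:Body>
</soap:Envelope>""".encode("utf-8")

def call_get_order_status(order_id: str) -> str:
    """POST the envelope and return the raw XML response as text."""
    request = urllib.request.Request(
        ENDPOINT,
        data=build_envelope(order_id),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f"{NAMESPACE}/GetOrderStatus",  # placeholder action
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")

if __name__ == "__main__":
    xml_text = call_get_order_status("PO-1001")
    # Parse the response; element names depend on the real service contract.
    root = ET.fromstring(xml_text)
    print(ET.tostring(root, encoding="unicode"))
```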