Incorporating Connectivity in k-Nearest Neighbors Regression
Date
2023-07
Type
Article
Series Info
1st International Conference of Intelligent Methods, Systems and Applications, IMSA 2023; Pages 551 - 556, 2023
Abstract
The standard k-nearest neighbors approach to regression encounters a number of problems when applied to datasets with varying density distributions. This paper proposes a connectivity-based ensemble regressor built around the traditional kNN regressor (kNNR). In each cross-validation round, the pipeline starts by clustering the input data using any partitioning algorithm. A random sample of edges is then selected from each partition, favoring edges with small distances. The selected edges are transformed into a dataset in which each feature value represents the amount of increase or decrease in the corresponding dimension relative to the source node's values, and the label of each feature vector is the difference between the labels of the source and destination nodes. A regressor is then built for each cluster from the transformed dataset. To predict a label for an unseen object, the nearest centroid is identified and the k nearest neighbors within the corresponding cluster are selected as source nodes. A vector representing the difference between the unseen object and each source node is computed and fed to the regressor of that cluster; the output is the predicted label difference, which is added to the label of the source node. The diversity between this decision model and the traditional kNNR motivates including kNNR in the ensemble; the k nearest neighbors used by kNNR are likewise drawn from the nearest cluster. The weighted average of the labels predicted by the base models serves as the final output label. The sample size, the number of neighbors, and the number of clusters can all be fine-tuned via cross-validation. The ensemble was evaluated, and the results show a significant increase in effectiveness compared to its base regressors and several related algorithms.
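The sketch below illustrates the pipeline described in the abstract under stated assumptions: KMeans as the partitioning algorithm, Ridge as the per-cluster difference regressor, inverse-distance weights for edge sampling, and a fixed ensemble weight w_diff for the weighted average. The class name and all parameter names are hypothetical and not taken from the paper.

```python
# Minimal sketch of a connectivity-based kNN regression ensemble.
# Assumptions (not the authors' exact choices): KMeans clustering,
# Ridge difference regressor, 1/d edge-sampling weights, fixed w_diff.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

class ConnectivityKNNEnsemble:
    def __init__(self, n_clusters=5, k=5, sample_size=200, w_diff=0.5, seed=0):
        self.n_clusters, self.k = n_clusters, k
        self.sample_size, self.w_diff = sample_size, w_diff
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y, float)
        self.km = KMeans(n_clusters=self.n_clusters, n_init=10).fit(self.X)
        self.models, self.members = [], []
        for c in range(self.n_clusters):
            idx = np.where(self.km.labels_ == c)[0]
            self.members.append(idx)
            # Sample candidate edges inside the cluster, favoring short ones.
            src = self.rng.choice(idx, self.sample_size)
            dst = self.rng.choice(idx, self.sample_size)
            d = np.linalg.norm(self.X[src] - self.X[dst], axis=1)
            p = 1.0 / (d + 1e-9)
            p /= p.sum()                      # shorter edges get higher weight
            keep = self.rng.choice(len(src), self.sample_size, p=p)
            s, t = src[keep], dst[keep]
            # Difference ("connectivity") dataset: feature deltas -> label delta.
            dX, dy = self.X[t] - self.X[s], self.y[t] - self.y[s]
            self.models.append(Ridge().fit(dX, dy))
        return self

    def predict(self, Xq):
        Xq = np.atleast_2d(np.asarray(Xq, float))
        out = np.empty(len(Xq))
        for i, x in enumerate(Xq):
            # Route the query to its nearest cluster centroid.
            c = int(np.argmin(np.linalg.norm(self.km.cluster_centers_ - x, axis=1)))
            idx = self.members[c]
            d = np.linalg.norm(self.X[idx] - x, axis=1)
            nn = idx[np.argsort(d)[: self.k]]
            # Base model 1: predicted label difference added to each source label.
            pred_diff = (self.y[nn] + self.models[c].predict(x - self.X[nn])).mean()
            # Base model 2: plain kNN regressor restricted to the same cluster.
            pred_knn = self.y[nn].mean()
            out[i] = self.w_diff * pred_diff + (1 - self.w_diff) * pred_knn
        return out
```

In line with the abstract, n_clusters, k, sample_size, and the ensemble weight would in practice be tuned via cross-validation rather than fixed as above.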
Keywords
Clustering; Nearest Neighbors; Regression