METHODS OF METRIC AND NON-METRIC MULTIDIMENSIONAL SCALING FOR ARBITRARY DATA MATRICES IN R
DOI: 10.31673/2412-4338.2024.039411
Abstract
This article examines two approaches to multidimensional scaling (MDS), metric and non-metric, as implemented in the R programming language for analyzing arbitrary data matrices. Multidimensional scaling is a technique for visualizing and interpreting complex multidimensional data. Given the rapid growth in information volumes, efficient methods for analyzing big data have become increasingly relevant. The aim of this study is to compare the effectiveness and accuracy of metric and non-metric MDS methods, identify their advantages and disadvantages, and provide practical recommendations for their use in fields such as sociology, marketing, political science, and psychology.
The article provides an overview of the theoretical foundations of multidimensional scaling, describes the algorithms underlying MDS implementation in R, and analyzes the specifics of applying each method. The metric MDS approach assumes a linear relationship between the distances in the input data and the distances in the resulting configuration, which yields precise results for structured, quantitatively measured data. The non-metric approach is more flexible: it requires only that the rank order of the dissimilarities be preserved, so it can handle data in which the distances between objects are not easily quantifiable.
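The contrast between the two approaches can be illustrated with a minimal sketch. It is not the article's own code: it assumes the built-in `eurodist` dataset as input, `cmdscale()` from base R for classical metric scaling, and `isoMDS()` from the MASS package for Kruskal's non-metric scaling.

```r
library(MASS)  # provides isoMDS(); cmdscale() ships with base R (stats)

# Assumption: eurodist (road distances between 21 European cities)
# stands in for an arbitrary dissimilarity matrix.
d <- eurodist

# Metric (classical) MDS: configuration distances approximate the
# input distances themselves.
metric_fit <- cmdscale(d, k = 2)

# Non-metric MDS: only the rank order of the dissimilarities is fitted.
nonmetric_fit <- isoMDS(d, k = 2, trace = FALSE)

head(metric_fit)            # 2-D coordinates from metric MDS
head(nonmetric_fit$points)  # 2-D coordinates from non-metric MDS
nonmetric_fit$stress        # Kruskal stress, reported in percent
```

Either coordinate matrix can be passed to `plot()` and labelled with `text()` to obtain the usual two-dimensional map of the objects.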
To evaluate the effectiveness of both methods, examples using real and simulated data were developed, demonstrating how each approach behaves in different situations. Metric scaling performed better when working with data that adheres to linear assumptions, while non-metric MDS proved to be more adaptive for data with non-linear relationships. Experimental results show that both approaches can be useful for data with different structures and sizes, but the choice of method depends on the specific requirements of the analysis.
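A comparison of this kind could be set up on simulated data roughly as follows. This is a hedged sketch, not the experiment reported in the article: the sample size, the cubic distortion, and the helper `recovery()` are all illustrative assumptions.

```r
library(MASS)  # isoMDS() for non-metric scaling

set.seed(1)
X <- matrix(rnorm(60), ncol = 2)  # 30 hypothetical points in the plane
d_true <- dist(X)                 # true Euclidean distances
d_warp <- d_true^3                # monotone but non-linear distortion

# Hypothetical helper: correlation between the distances in a fitted
# configuration and a reference set of distances.
recovery <- function(config, reference) {
  cor(as.vector(dist(config)), as.vector(reference))
}

metric_cfg    <- cmdscale(d_warp, k = 2)
nonmetric_cfg <- isoMDS(d_warp, k = 2, trace = FALSE)$points

scores <- c(metric    = recovery(metric_cfg, d_true),
            nonmetric = recovery(nonmetric_cfg, d_true))
print(scores)
```

Replacing `d_warp` with `d_true` gives the linear case, so the same script can be used to see how each method behaves when the linear assumption holds and when it does not.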
A significant outcome of this study is the development of a set of recommendations for choosing an MDS method depending on the type of data being analyzed. For example, for well-structured data, such as geographic or demographic data, the metric approach is preferable, whereas for more complex and unstructured data, as in psychological studies, non-metric scaling is more suitable.
The article also provides R code examples for implementing both approaches, which can be used for further research and practical work with multidimensional scaling. The presented examples demonstrate how both methods can be integrated into the data analysis process to uncover hidden patterns and build visual models based on multidimensional data. Additionally, the authors highlight prospects for future research in applying multidimensional scaling to big data analysis and developing new methodologies for processing heterogeneous information arrays.
Thus, this article makes a significant contribution to the development of modern approaches to data analysis in programming languages and opens new perspectives for the application of multidimensional scaling in various scientific and business fields. The proposed recommendations for choosing a scaling method can be useful for researchers and practitioners working with large data volumes and aiming to employ the latest methods for their analysis.
Keywords: multidimensional scaling, metric MDS, non-metric MDS, data analysis, R, big data, visualization, Euclidean metric, statistical processing.