How do you deal with non-Euclidean or nonlinear data dissimilarity in multidimensional scaling?
Multidimensional scaling (MDS) is a data visualization technique that represents the similarities or distances between objects as points in a low-dimensional space, such as a two-dimensional plot. However, not all dissimilarities are linear or Euclidean. A dissimilarity is non-Euclidean when no configuration of points in a Euclidean space reproduces it exactly, as is often the case with geodesic or angular distances. A dissimilarity is nonlinear when it is not proportional to the underlying differences you want to display, for example when it grows exponentially or logarithmically with them. How do you deal with these kinds of dissimilarities in MDS? Here are some tips and tricks to help you.
The first step is to choose a dissimilarity measure that reflects the nature of your data and your research question. Common choices include correlation-based and cosine dissimilarities (usually one minus the corresponding similarity), city-block distance, and Hamming distance. Some measures are linear in the differences between data values: city-block distance, for example, doubles when every coordinate difference doubles. Others respond nonlinearly: cosine dissimilarity depends only on the direction of the data vectors and ignores their magnitude. Some measures are Euclidean, meaning that they can be reproduced exactly as straight-line distances between points, while others, such as city-block or geodesic distances, generally cannot. You should select a dissimilarity measure that matches the characteristics and assumptions of your data.
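To make the differences between these measures concrete, here is a minimal sketch that computes three of them with NumPy. The function name `pairwise_dissimilarity` and the toy matrix `X` are assumptions made for illustration, not part of any library.

```python
import numpy as np

def pairwise_dissimilarity(X, kind="cityblock"):
    """Return an (n, n) symmetric dissimilarity matrix for the rows of X."""
    X = np.asarray(X, dtype=float)
    if kind == "cityblock":
        # Sum of absolute coordinate differences (L1 distance).
        return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    if kind == "cosine":
        # One minus cosine similarity: depends on direction, not on length.
        norms = np.linalg.norm(X, axis=1)
        sim = (X @ X.T) / np.outer(norms, norms)
        D = 1.0 - np.clip(sim, -1.0, 1.0)
        np.fill_diagonal(D, 0.0)
        return D
    if kind == "hamming":
        # Fraction of positions at which two rows differ.
        return (X[:, None, :] != X[None, :, :]).mean(axis=2)
    raise ValueError(f"unknown kind: {kind}")

# Toy data (hypothetical): row 1 is row 0 doubled, row 2 is orthogonal to both.
X = np.array([[1.0, 0.0, 2.0],
              [2.0, 0.0, 4.0],
              [0.0, 3.0, 0.0]])

D_city = pairwise_dissimilarity(X, "cityblock")
D_cos = pairwise_dissimilarity(X, "cosine")
D_ham = pairwise_dissimilarity(X, "hamming")
```

Rows 0 and 1 point in the same direction, so their cosine dissimilarity is zero even though their city-block distance is not. The choice of measure decides which of these two views the MDS plot will show.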
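The second step is to apply a transformation or normalization to your data or dissimilarities, if needed. This can reduce the effects of outliers, skewness, or heterogeneity in your data, and make the dissimilarities more suitable for MDS. For example, you can apply a logarithmic or power transformation to data values that are strongly skewed or span several orders of magnitude. If your variables are measured in different units, you can standardize them before computing dissimilarities so that no single variable dominates. You can also rescale dissimilarity matrices to a common range when you want to compare or combine them. Keep in mind that nonmetric MDS uses only the rank order of the dissimilarities, so a monotone transformation of the dissimilarities themselves does not change its result; for nonmetric MDS, transformations matter mainly when applied to the data before the dissimilarities are computed.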
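As a small illustration of this step, the sketch below log-transforms a skewed variable before computing distances and rescales a dissimilarity matrix to the range 0 to 1. The helper `rescale_dissimilarities` and the data values are hypothetical.

```python
import numpy as np

def rescale_dissimilarities(D):
    """Divide by the largest entry so that all values lie in [0, 1]."""
    D = np.asarray(D, dtype=float)
    return D / D.max()

# One variable spanning several orders of magnitude (made-up values).
X = np.array([[1.0], [10.0], [100.0], [1000.0]])

# Absolute differences on the raw scale are dominated by the largest value.
D_raw = np.abs(X - X.T)

# After a log10 transform, neighbouring objects are equally far apart.
X_log = np.log10(X)
D_log = np.abs(X_log - X_log.T)

D_raw_scaled = rescale_dissimilarities(D_raw)
D_log_scaled = rescale_dissimilarities(D_log)
```

On the raw scale the first three objects are crowded together relative to the fourth; after the log transform the spacing is even, which is usually easier to read in an MDS plot.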
The third step is to choose an appropriate MDS method that can handle your data dissimilarities. There are two main types of MDS methods: metric and nonmetric. Metric MDS methods treat the dissimilarities as quantitative and try to reproduce their values, usually up to a linear transformation, as distances in the low-dimensional space. Classical (Torgerson) MDS, the best-known metric variant, additionally assumes that the dissimilarities are Euclidean distances. Nonmetric MDS methods make neither assumption and only try to preserve the rank order of the dissimilarities in the low-dimensional space. Nonmetric MDS methods are therefore more flexible and usually handle nonlinear or non-Euclidean dissimilarities better than metric MDS methods, at the cost of ignoring the magnitudes of the dissimilarities.
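One practical way to check whether dissimilarities are Euclidean is to run classical MDS and inspect the eigenvalues of the double-centered matrix: clearly negative eigenvalues mean that no Euclidean configuration reproduces the dissimilarities exactly. The following is a minimal NumPy sketch of that check, not a full MDS implementation; the function name `classical_mds` and the four-point example are assumptions chosen for illustration.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS; returns coordinates and all eigenvalues."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared dissimilarities.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    # Sort eigenvalues from largest to smallest.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues cannot be used as coordinates, so clip at zero.
    top = np.clip(eigvals[:n_components], 0.0, None)
    coords = eigvecs[:, :n_components] * np.sqrt(top)
    return coords, eigvals

# Four points equally spaced on the unit circle.
angles = np.arange(4) * np.pi / 2
pts = np.column_stack([np.cos(angles), np.sin(angles)])

# Straight-line (chord) distances: Euclidean by construction.
D_chord = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

# Arc-length (geodesic) distances along the circle: not Euclidean.
diff = np.abs(angles[:, None] - angles[None, :])
D_arc = np.minimum(diff, 2 * np.pi - diff)

coords_chord, ev_chord = classical_mds(D_chord)
coords_arc, ev_arc = classical_mds(D_arc)

print("smallest eigenvalue, chord distances:", ev_chord.min())
print("smallest eigenvalue, arc distances:  ", ev_arc.min())
```

The chord distances give eigenvalues that are non-negative up to rounding error, while the arc-length distances produce a clearly negative one. When negative eigenvalues are large relative to the positive ones, nonmetric MDS is often the safer choice.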
The fourth step is to evaluate the quality and interpretation of the MDS solution. You can use various criteria and techniques to assess how well the MDS solution represents the data dissimilarities, and how meaningful and informative the MDS plot is. For example, you can use the stress value, which measures the discrepancy between the dissimilarities and the distances in the low-dimensional space, to judge the goodness of fit of the MDS solution; lower stress means a better fit. You can also use a scree plot, which in MDS shows the stress against the number of dimensions, and look for the elbow beyond which extra dimensions reduce stress only slightly. Note that the axes of an MDS plot have no built-in meaning, because the solution is determined only up to rotation and reflection. To interpret directions in the plot, you can fit external variables as vectors onto the configuration, in the manner of a biplot, and see which variables align with which directions.
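The stress value described above can be computed in a few lines. This sketch uses Kruskal's stress-1 formula with the raw dissimilarities standing in for the disparities, which is an assumption appropriate for metric MDS; nonmetric MDS would first replace the dissimilarities with monotonically transformed values. The function name `kruskal_stress1` and the square example are illustrative.

```python
import numpy as np

def kruskal_stress1(D, coords):
    """Stress-1 between dissimilarities D and distances among rows of coords."""
    D = np.asarray(D, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Use each pair once (upper triangle, excluding the diagonal).
    iu = np.triu_indices(D.shape[0], k=1)
    delta = D[iu]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)[iu]
    return np.sqrt(np.sum((dist - delta) ** 2) / np.sum(dist ** 2))

# Corners of a unit square and their exact Euclidean distances.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(square[:, None, :] - square[None, :, :], axis=2)

stress_2d = kruskal_stress1(D, square)         # exact configuration
stress_1d = kruskal_stress1(D, square[:, :1])  # keep only the first axis
```

The two-dimensional configuration reproduces the distances exactly, so its stress is zero; dropping one axis collapses pairs of corners onto each other and the stress rises. Computing the stress for one, two, three, and more dimensions gives the values for a scree plot.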
The fifth step is to explore the MDS plot and see what patterns you can discover in it. Several techniques can make the plot more informative. For example, you can use color, shape, size, or labels to distinguish different groups or categories of objects. You can also use clustering, overlaying, or brushing to identify and highlight clusters or regions of interest. For three-dimensional solutions, zooming, rotating, or animating the plot lets you view the configuration from different angles. When reading the plot, focus on the relative positions of the points, since only the distances between them carry information.
The sixth step is to compare different MDS solutions, for example from different dissimilarity measures, methods, or random starts, and see where they agree or differ. Because MDS solutions are determined only up to translation, rotation, reflection, and (for some methods) scale, you should not compare raw coordinates directly. Instead, you can use Procrustes analysis, which translates, rotates, reflects, and scales one configuration to match another as closely as possible and then measures the remaining difference, to see how consistent or robust the MDS solutions are. You can also draw a Shepard diagram for each solution, which plots the distances in the low-dimensional space against the original dissimilarities, to see how linear or nonlinear the relationship between them is and whether one solution fits noticeably better than another.
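The Procrustes comparison can be sketched with NumPy alone. The version below centers both configurations, scales them to unit size, and uses a singular value decomposition to find the best rotation or reflection; the value it returns is 0 for configurations with identical shape and approaches 1 as they become unrelated. The function name `procrustes_disparity` and the example points are assumptions for illustration.

```python
import numpy as np

def procrustes_disparity(A, B):
    """Residual sum of squares after optimally aligning B to A (0 = same shape)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Remove translation.
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    # Remove overall scale (unit Frobenius norm).
    A0 /= np.linalg.norm(A0)
    B0 /= np.linalg.norm(B0)
    # The singular values of A0^T B0 give the fit achieved by the best
    # rotation/reflection combined with the best rescaling.
    s = np.linalg.svd(A0.T @ B0, compute_uv=False)
    return 1.0 - s.sum() ** 2

# A made-up configuration of five points.
A = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [2.0, 3.0]])

# The same shape, rotated by 40 degrees, scaled by 3, and shifted.
theta = np.deg2rad(40.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = 3.0 * A @ R + np.array([5.0, -2.0])

# A genuinely different shape: the last point is moved.
C = A.copy()
C[4] = [3.0, -1.0]

same_shape = procrustes_disparity(A, B)
different_shape = procrustes_disparity(A, C)
```

Here `same_shape` is zero up to rounding error, because `B` differs from `A` only by transformations that MDS cannot distinguish, while `different_shape` is positive. Running this on solutions from different random starts is a quick way to check whether an MDS result is stable.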