7: Multidimensional Data - Analysis of Structure
The phrases “Data Mining”, “Big Data”, “Machine Learning”, and “Pattern Recognition” often refer to the whole set of statistical methods, analytical and visual, which help to understand the structure of data. Data might be of any kind, but it is usually multidimensional, which is best represented as a table with multiple columns, a.k.a. variables (which might be of different types: measurement, ranked, or categorical), and rows, a.k.a. objects. So a more traditional name for these methods is “multivariate data analysis” or “multivariate statistics”.
Data mining is often based on the idea of classification: arranging objects into non-intersecting, frequently hierarchical groups. We use classification all the time (but sometimes do not realize it). When we open the door and enter a room, the first thing we do is recognize (classify) what is inside. Our brain has an outstanding power of classification, but computers and software are advancing speedily and becoming more brain-like. This is why data mining is related to artificial intelligence. There are even methods called “neural networks”!
In this chapter, along with other data, we will frequently use the embedded iris data taken from the works of Ronald Fisher\(^{[1]}\). It contains four characters measured on three species of irises (Figure \(\PageIndex{1}\)); the fifth column is the species name.
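To make the layout of this table concrete, here is a minimal sketch for inspecting it. It assumes the examples are run in R, where the data is available as the built-in `iris` data frame from the standard `datasets` package; nothing here is specific to this chapter beyond that assumption.

```r
# Assumption: R is the working environment, and `iris` is the
# built-in data frame shipped with the standard `datasets` package.

# Overall structure: objects are rows, variables are columns
str(iris)

# The first few objects (flowers)
head(iris)

# Type of each variable: four measurement columns plus one categorical column
sapply(iris, class)

# Number of objects per species
table(iris$Species)
```

The four measurement columns are sepal and petal lengths and widths, and `Species` is stored as a factor, which is R's type for categorical variables.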
References
1. Fisher R.A. 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics. 7(2): 179–188.