How to define machine learning to cover clustering, classification, and regression? What unites these problems?
1 Answer
In another answer, I report three definitions of machine learning (ML) and explain that ML can be divided into multiple sub-tasks or sub-categories. However, it may not always be clear why classification, regression, or clustering count as machine learning tasks, or why they can be solved with ML algorithms/programs. So let me explain why, based on Tom Mitchell's definition of an ML algorithm, which I report below for completeness.
A computer program is said to learn from experience $E$ with respect to some class of tasks $T$ and performance measure $P$, if its performance at tasks in $T$, as measured by $P$, improves with experience $E$.
So, according to this definition, for a computer program to be a "machine learner", we need to identify $E$, $T$, and $P$, and show that the computer program improves with $E$ at performing the task(s) $T$, according to $P$.
For the case of classification or regression (these are the tasks, where "task" is just a synonym for "problem"), let's suppose that we are given the labelled training dataset $D = \{(x_1, y_1), \dots, (x_N, y_N) \}$ of $N$ tuples $(x_i, y_i)$, where $x_i$ is the input to the program (or model) and $y_i$ is the output that the program should produce (the label, aka class in the discrete case, hence the name classification). In the case of classification, $y_i$ is an element of a discrete set, e.g. $\{0, 1\}$ in the case of binary classification, while, in the case of regression, $y_i \in \mathbb{R}$ is a real number.
So, in this case, the experience $E$ is $D$ (the labelled training dataset). The task $T$ is classification or regression, depending on whether $y_i$ belongs to a discrete or continuous space. The performance measure $P$ can be e.g. the cross-entropy (typically used for classification) or the mean squared error (typically used for regression); both are losses, so a lower value means better performance. (I will not recall the definitions of these performance measures here: any ML textbook covers the details.)
So, we have identified $E$, $T$ and $P$. Now, we also need to argue why a program's performance measured by $P$ would improve with $E$. Let's suppose that you are using a neural network trained with gradient descent to solve these tasks. After one epoch/iteration of gradient descent, the training loss will typically be smaller (assuming a suitable learning rate), so the performance of the program, as measured by $P$, will be higher. So, this would indeed be a machine learner, according to Mitchell's definition of ML.
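To make this concrete, here is a minimal sketch in Python. It is my own illustration, not part of Mitchell's definition: I use a one-variable linear model instead of a neural network to keep it short, and the toy dataset `D`, the learning rate, and the function names `mse` and `train` are all assumptions made for the example.

```python
def mse(w, b, data):
    """Performance measure P: mean squared error of y = w*x + b on data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(data, epochs=50, lr=0.05):
    """Full-batch gradient descent; returns the parameters and the loss history."""
    w, b = 0.0, 0.0
    n = len(data)
    losses = [mse(w, b, data)]  # loss before any training
    for _ in range(epochs):
        # Gradients of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
        losses.append(mse(w, b, data))  # loss after this epoch
    return w, b, losses


# Experience E: a toy labelled dataset (assumed for illustration), y = 2x + 1.
D = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

w, b, losses = train(D)
print(f"loss before training: {losses[0]:.4f}")
print(f"loss after training:  {losses[-1]:.4f}")
```

Here $E$ is `D`, $T$ is regression, and $P$ is the value returned by `mse`. With this learning rate the recorded training loss shrinks after every update, which is the "improves with experience" condition. The sketch only looks at the training loss; it says nothing about performance on unseen data.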
In the case of clustering (the task $T$), the only difference is that the dataset $D$ (i.e. the experience $E$) is unlabelled: we are given only the inputs, without labels. The goal, as you know, is to group these inputs according to some notion of similarity, and your performance measure $P$ is based on that notion of similarity.
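As a rough sketch of the clustering case, here is a tiny k-means implementation in Python. The choice of k-means is mine (the argument above does not depend on a specific algorithm), and the toy points, the initial centroids, and the names `sq_dist` and `kmeans` are assumptions made for the example. I take $P$ to be the within-cluster sum of squared distances, which is one possible way to turn "similarity" into a number (lower is better).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=5):
    """Plain k-means; returns labels, centroids, and the value of P per iteration."""
    history = []
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        centroids = new_centroids
        # P: within-cluster sum of squared distances (lower is better).
        history.append(sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels)))
    return labels, centroids, history


# Experience E: unlabelled toy points (assumed for illustration).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Deliberately poor start: both initial centroids lie in the same group.
labels, centroids, history = kmeans(points, centroids=[points[0], points[1]])
print(labels)
print(history)
```

The recorded value of $P$ never increases from one iteration to the next, and the final labels separate the two visually obvious groups, even though no labels were ever provided.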
The other definitions of ML that I report in the other answer are also consistent with Mitchell's definition. More precisely, most of these definitions are based on the idea that an ML algorithm is an algorithm that "finds patterns in data". To solve the classification, regression and clustering tasks/problems, an ML algorithm/program needs to find patterns in the data (either explicitly, as in the case of clustering, or indirectly, as in the case of classification), in order for the program's performance to improve.
Moreover, given that you brought this up in the now-deleted comments, I don't think that the definition of machine learning necessarily implies prediction (i.e. the use of a model to forecast something about future data), but, yes, ML is often used for prediction.
It's also important to note that, when someone says "machine learning" without further details or information (like you did in your post/question), one may refer to
- the field/area of study that studies and applies ML algorithms, or
- an ML algorithm (and sometimes model).
Mitchell's definition is the definition of what it means for an algorithm/program to be called an ML algorithm/program. However, my definition of ML as a field immediately follows from the definition of an ML algorithm.
Finally, we could say that a problem that can be formulated in such a way that an ML algorithm can be applied is an ML problem. Note that Mitchell's definition does not mention "ML problem" anywhere, but just "task" (i.e. problem), so this is not a circular definition. So, we could say that classification, regression, and clustering are "ML problems" because they can be solved with ML algorithms. More generally, all three of these problems can be thought of as "function approximation" problems, i.e., in all these cases, the solutions are functions.
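One way to write down the function-approximation view is the following. The notation is mine and introduced only for illustration: $\mathcal{X}$ is the input space and $K$ is the number of classes or clusters.

$$
\begin{aligned}
\text{classification:} \quad & f \colon \mathcal{X} \to \{1, \dots, K\} \\
\text{regression:} \quad & f \colon \mathcal{X} \to \mathbb{R} \\
\text{clustering:} \quad & f \colon \mathcal{X} \to \{1, \dots, K\}
\end{aligned}
$$

Classification and clustering have the same signature under this view. The difference is in the experience $E$: in classification, $f$ is fitted to the given labels $y_i$, while in clustering, there are no labels and the cluster indices are arbitrary names for the groups that the algorithm finds.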
 
    
