This is a 'reversal' of the usual clustering approach. Normally you cluster objects, and you use their features to define similarity (as proximity in 'feature-space'). So you start off with a set of objects, and you end up with k groups of similar objects, for a specific value of k.
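For concreteness, here's a minimal sketch of that usual direction (Python with scikit-learn; the random data and k=3 are just illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))  # 100 objects, each described by 8 features

k = 3
# Cluster the objects by proximity in feature-space.
object_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
# object_labels[i] is the cluster assigned to object i
```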
In ML you might want to train a classifier based on features, but not all features are equally discriminative. So you turn the process on its head: you transpose the object/feature matrix, so that you're now looking at features and the objects they occur in, rather than objects and the features they possess.
You can now cluster this transposed matrix, grouping features together by how similar they are. You don't really want features that are shared by too many objects, since those have little value in distinguishing between objects. You decide how many features you want (your value of k), then start clustering, and you end up with k clusters of features, each similar to the others in the same cluster.
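In code, the reversal is literally just a transpose before clustering. A sketch under the same illustrative assumptions (random data, k=3):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))  # 100 objects x 8 features

k = 3
# Each row of X.T is now a feature, described by the values it takes
# across all 100 objects, so KMeans groups similar features together.
feature_km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X.T)
feature_labels = feature_km.labels_  # feature_labels[j] is the cluster of feature j
```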
From each cluster you can then take either the centroid or some amalgamation of the cluster's elements, leaving you with k good-quality features. Essentially you have reduced the dimensionality of your feature space, which makes it easier to identify patterns with ML.
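One way to do the representative-picking step is to keep, per cluster, the feature whose profile lies closest to the cluster centroid; averaging the cluster's members instead gives the amalgamation option. A sketch (the closest-to-centroid rule is just one reasonable choice, and the data is again an illustrative assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))  # 100 objects x 8 features

k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X.T)

# Option 1: per cluster, keep the single original feature closest to the centroid.
representatives = []
for c in range(k):
    members = np.flatnonzero(km.labels_ == c)  # indices of features in cluster c
    dists = np.linalg.norm(X.T[members] - km.cluster_centers_[c], axis=1)
    representatives.append(members[np.argmin(dists)])
X_selected = X[:, representatives]  # 100 objects x k selected features

# Option 2: amalgamate each cluster's features, e.g. by averaging them.
X_amalgam = np.column_stack([X[:, km.labels_ == c].mean(axis=1) for c in range(k)])
```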
This is similar to using PCA to reduce dimensionality, but a bit more flexible, I guess, since you have more choice about which clustering algorithm to use.
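For comparison, PCA hands you k derived components (linear combinations of all the original features) rather than a subset of the features themselves. A sketch on the same illustrative data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))  # 100 objects x 8 features

# Each output column is a linear combination of all 8 original features,
# not a selection from among them.
X_pca = PCA(n_components=3).fit_transform(X)  # 100 objects x 3 components
```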