Parametric models that encode knowledge about input features in vectors are not unique to machine learning (ML). They have long been used in traditional statistical inference and regression, for example in standard linear and logistic regression. On the other hand, many other popular ML models are non-parametric, such as decision trees, k-NN, kernel SVMs, and Gaussian processes, and they too represent input features as potentially high-dimensional vectors.
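To make the parametric vs. non-parametric distinction concrete, here is a minimal sketch in plain Python. The toy data and the helper names (`fit_linear`, `fit_knn`, and so on) are my own illustration, not taken from any library: the linear model compresses the data into a fixed-size parameter pair, while the nearest-neighbour "model" is just the stored training set.

```python
# Toy 1-D data lying exactly on y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


def fit_linear(xs, ys):
    """Parametric: compress the data into a fixed-size parameter pair (w, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var              # ordinary least-squares slope
    b = mean_y - w * mean_x    # ordinary least-squares intercept
    return w, b


def predict_linear(params, x):
    w, b = params
    return w * x + b


def fit_knn(xs, ys):
    """Non-parametric: the 'model' is the stored training set itself."""
    return list(zip(xs, ys))


def predict_knn(memory, x, k=1):
    # Average the targets of the k stored points closest to x.
    nearest = sorted(memory, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


params = fit_linear(xs, ys)   # two numbers, however many points we fit
memory = fit_knn(xs, ys)      # grows with the number of training points

print(params)                        # (2.0, 1.0)
print(predict_linear(params, 10.0))  # 21.0, extrapolates along the fitted line
print(predict_knn(memory, 10.0))     # 7.0, copies the nearest stored example
```

In both cases the inputs are plain feature vectors (here one-dimensional). What differs is where the learned knowledge lives: in a fixed set of parameters, or in the retained data.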
Each knowledge representation formalism has its own strengths and weaknesses, though some people have recently argued that in theory they may ultimately be isomorphic, as discussed in a recent interesting post. Broadly, there are two main schools of knowledge representation: symbolic computationalism and distributed connectionism. Both have been around for many years and are not a hot topic in contemporary ML or DL, although there is an active ML field called representation learning. That field is not about representing target knowledge. It is about learning latent representations and transformations of input features end to end for downstream tasks, in place of traditional manual feature engineering. By comparison, the parametric, distributed knowledge representation of a connectionist neural network is extremely implicit, fully connected, integrated, and irreducible, which is why many people describe it as a black box. For further reading, see Rumelhart, Hinton, and Williams (1986), "Learning internal representations by error propagation", from which the following passage is taken:
Although our learning results do not guarantee that we can find a solution for all solvable problems, our analyses and results have shown that as a practical matter, the error propagation scheme leads to solutions in virtually every case. In short, we believe that we have answered Minsky and Papert's challenge and have found a learning result sufficiently powerful to demonstrate that their pessimism about learning in multilayer machines was misplaced. One way to view the procedure we have been describing is as a parallel computer that, having been shown the appropriate input/output exemplars specifying some function, programs itself to compute that function in general. Parallel computers are notoriously difficult to program. Here we have a mechanism whereby we do not actually have to know how to write the program in order to get the system to do it. Parker (1985) has emphasized this point.
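As a rough illustration of what "programs itself" from input/output exemplars means, here is a small error-propagation sketch on XOR in plain Python. Everything specific in it is my own assumption rather than the paper's setup: the three hidden units, the learning rate, the epoch count, the random seed, and the use of sigmoid units with a squared-error loss. Whether a given run reaches a clean XOR solution depends on those choices, so the sketch only compares the error before and after training.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)

# The four XOR input/output exemplars.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
        ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_HIDDEN = 3         # assumed hidden-layer size
LEARNING_RATE = 0.5  # assumed step size
EPOCHS = 5000        # assumed number of full-batch updates


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Each hidden unit holds [weight for x1, weight for x2, bias].
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(N_HIDDEN)]
# The output unit holds one weight per hidden unit, then a bias.
w_out = [random.uniform(-1, 1) for _ in range(N_HIDDEN + 1)]


def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    y = sigmoid(sum(w_out[j] * h[j] for j in range(N_HIDDEN)) + w_out[N_HIDDEN])
    return h, y


def mean_squared_error():
    return sum((forward(x)[1] - t) ** 2 for x, t in DATA) / len(DATA)


initial_error = mean_squared_error()

for _ in range(EPOCHS):
    # Accumulate gradients of 0.5 * (y - t)^2 over all four exemplars.
    g_hidden = [[0.0, 0.0, 0.0] for _ in range(N_HIDDEN)]
    g_out = [0.0] * (N_HIDDEN + 1)
    for x, t in DATA:
        h, y = forward(x)
        delta_out = (y - t) * y * (1.0 - y)  # error signal at the output unit
        for j in range(N_HIDDEN):
            g_out[j] += delta_out * h[j]
            # Propagate the error signal back through hidden unit j.
            delta_h = delta_out * w_out[j] * h[j] * (1.0 - h[j])
            g_hidden[j][0] += delta_h * x[0]
            g_hidden[j][1] += delta_h * x[1]
            g_hidden[j][2] += delta_h
        g_out[N_HIDDEN] += delta_out
    # Gradient-descent update of every weight.
    for j in range(N_HIDDEN):
        for i in range(3):
            w_hidden[j][i] -= LEARNING_RATE * g_hidden[j][i]
    for j in range(N_HIDDEN + 1):
        w_out[j] -= LEARNING_RATE * g_out[j]

final_error = mean_squared_error()
print("error before training:", initial_error)
print("error after training: ", final_error)
```

Nobody writes the XOR rule into the network here. The only things supplied are the exemplars and the update rule, and whatever the network ends up computing is spread across `w_hidden` and `w_out`, which is the implicit, distributed representation discussed above.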