I've recently read much about feature engineering in continuous (uncountable) feature spaces. Now I am interested what methods exist in the setting of large discrete state spaces. For example consider a board game with grid as a basic layout. Each position on a grid can contain exactly one of multiple elements and the agent makes decisions according to the current board position. If the grid is large enough, say 30x30, and there are only two different elements we could model the states as a linear model with $2*30*30 = 1800$ variables (using dummy variables) and this model can't even distinguish relationships between positions. For this we would need to use $\binom{90}{2}$ or even $\binom{90}{k}$, $k = 2, 3, 4$ more features.
How would one approach this problem? Are the methods for feature selection for linear approximations, which even automatically find/learn non-linear combinations? What was the approach to solving these problems when NN where not around?