
I converted my machine learning model (a random forest regressor) into C code using the emlearn library, but the resulting .c file is 3.8 MB, which is far too large for a microcontroller. I need the code to be in the KB range. What should I do?

2 Answers


This verges on a "programming bug" question, which is off-topic on this site, but there is a salient point worth making here. Without knowing many details, I'll take a guess.

The model is likely too large for an embedded system. When training forests, we often grow each tree until its leaves are pure (node purity), which can produce very large trees.

Try regularizing your model (e.g. limiting the maximum depth) and see whether that brings the model size, and hence the executable size, down.
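A minimal sketch of the effect, assuming scikit-learn and synthetic data from `make_regression` (not the asker's data). It uses the total node count of the forest as a rough proxy for the size of the generated C code, on the assumption that each split or leaf ends up as roughly one branch or table entry in the output. The `max_depth=8` value is only illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def total_nodes(forest):
    """Total number of nodes (splits + leaves) across all trees in the forest."""
    return sum(tree.tree_.node_count for tree in forest.estimators_)


# Synthetic regression data; stands in for the real training set.
X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)

# scikit-learn defaults: 100 trees, each grown until leaves are pure (max_depth=None).
full = RandomForestRegressor(random_state=0).fit(X, y)

# Same number of trees, but each tree's depth is capped (illustrative value).
shallow = RandomForestRegressor(max_depth=8, random_state=0).fit(X, y)

print("unlimited depth:", total_nodes(full))
print("max_depth=8:    ", total_nodes(shallow))
```

With a continuous target, an unrestricted tree tends to end up with close to one leaf per distinct training sample, whereas a tree of depth 8 can have at most 2^9 - 1 = 511 nodes, so the capped forest is bounded regardless of dataset size. Other regularizers such as `min_samples_leaf` work in the same direction.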

foreverska

The emlearn documentation provides some information on how to optimize tree-based models. The first thing you should do is reduce the number of trees and the depth of the trees. The documentation also links to example code for this kind of hyper-parameter optimization. It is often possible to go from 100 trees (the scikit-learn default) to 10 with little or no loss in predictive performance. Combined with depth regularization, a 100x reduction in model size is often achievable.
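As a rough illustration of that kind of search (a sketch assuming scikit-learn and synthetic data, not the example code linked from the emlearn documentation), one can fit a small grid of `n_estimators` × `max_depth` values, score each on held-out data, and keep the smallest model whose R² is within a tolerance of the best. The grid values and `TOLERANCE = 0.02` are hypothetical choices.

```python
from itertools import product

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the real dataset.
X, y = make_regression(n_samples=1500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = []  # (n_estimators, max_depth, total_nodes, held-out R^2)
for n_trees, depth in product([5, 10, 30, 100], [4, 6, 8, None]):
    model = RandomForestRegressor(n_estimators=n_trees, max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    nodes = sum(t.tree_.node_count for t in model.estimators_)  # size proxy
    score = model.score(X_test, y_test)  # R^2 on held-out data
    results.append((n_trees, depth, nodes, score))

best_score = max(r[3] for r in results)
TOLERANCE = 0.02  # hypothetical: accept up to 0.02 R^2 loss relative to the best model
candidates = [r for r in results if r[3] >= best_score - TOLERANCE]

# Smallest model (by node count) that is still "good enough".
n_trees, depth, nodes, score = min(candidates, key=lambda r: r[2])
print(f"chosen: n_estimators={n_trees}, max_depth={depth}, nodes={nodes}, R^2={score:.3f}")
```

The chosen configuration would then be retrained and converted with emlearn as before. In practice, cross-validation is a more robust way to compare configurations than a single train/test split.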

PS: You should also measure the compiled size, not the size of the source text. Once compiled, the model will generally be considerably smaller than the .c file.

Jon Nordby