I have a dataset in which each row is a user along with a list of their interests (IAB categories). It looks like this:

user_id | gender | list of interests
--------+--------+--------------------------------
user 1  | male   | games, productivity
user 2  | female | games, lifestyle, design
user 3  | male   | travel, games, messaging
user 4  | male   | messaging, blogging, lifestyle
...

Since the number of unique interests is small (~500) and the number of rows is large (~67M), what feature engineering practices should I follow to help an ML model achieve better accuracy?

P.S.: A simple model with one-hot/count vectorization of the interests yields an accuracy of ~52%.
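
To make the baseline concrete, here is a minimal sketch of what a one-hot (multi-hot) encoding of the interest lists might look like. It uses only the four sample rows from the table above, and the helper names build_vocabulary and multi_hot are hypothetical, chosen for illustration; the actual pipeline may differ.

```python
# Hypothetical sample mirroring the table above; not the real dataset.
users = [
    ("user 1", "male", ["games", "productivity"]),
    ("user 2", "female", ["games", "lifestyle", "design"]),
    ("user 3", "male", ["travel", "games", "messaging"]),
    ("user 4", "male", ["messaging", "blogging", "lifestyle"]),
]


def build_vocabulary(rows):
    """Map each distinct interest to a column index (sorted for determinism)."""
    interests = sorted({tag for _, _, tags in rows for tag in tags})
    return {interest: idx for idx, interest in enumerate(interests)}


def multi_hot(tags, vocab):
    """Return a dense 0/1 vector with a 1 for every interest the user has."""
    vector = [0] * len(vocab)
    for tag in tags:
        vector[vocab[tag]] = 1
    return vector


vocab = build_vocabulary(users)
encoded = [multi_hot(tags, vocab) for _, _, tags in users]

for (user_id, _, _), row in zip(users, encoded):
    print(user_id, row)
```

Dense Python lists are used here only for readability. With ~67M rows and ~500 columns, a sparse representation that stores just the indices of the non-zero entries would presumably be needed in practice.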

theodre7