Best feature engineering approach for interest-based age classification

Asked Mar 11 '23 at 17:44

Active Mar 11 '23 at 18:08

Viewed 38 times

I have a dataset in which each row is a user with a list of their interests (IAB categories). It looks like this:

user_id | gender | list of interests
--------+--------+--------------------------------
user 1  | male   | games, productivity
user 2  | female | games, lifestyle, design
user 3  | male   | travel, games, messaging
user 4  | male   | messaging, blogging, lifestyle
...

Since the number of unique interests is small (~500) and the number of rows is large (~67M), what feature engineering practices should I follow to get an ML model to achieve better accuracy?

P.S.: A simple model with one-hot/count vectorization yields an accuracy of ~52%.
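For reference, here is a minimal sketch of what the one-hot/count vectorization baseline might look like, using only the toy rows from the table above. The helper names `build_vocabulary` and `count_vectorize` are hypothetical and chosen for illustration; this is not necessarily the exact pipeline behind the ~52% figure.

```python
from collections import Counter


def build_vocabulary(interest_lists):
    """Map each distinct interest to a column index, in sorted order."""
    unique = sorted({i for interests in interest_lists for i in interests})
    return {interest: idx for idx, interest in enumerate(unique)}


def count_vectorize(interests, vocabulary):
    """Return a dense count vector for one user.

    If each interest appears at most once per user, this is a multi-hot vector.
    """
    vector = [0] * len(vocabulary)
    for interest, count in Counter(interests).items():
        if interest in vocabulary:  # skip interests missing from the vocabulary
            vector[vocabulary[interest]] = count
    return vector


# Toy rows copied from the table in the question (interest lists only).
users = [
    ["games", "productivity"],
    ["games", "lifestyle", "design"],
    ["travel", "games", "messaging"],
    ["messaging", "blogging", "lifestyle"],
]

vocabulary = build_vocabulary(users)
features = [count_vectorize(u, vocabulary) for u in users]
```

With ~67M rows and ~500 columns a dense list-of-lists like this would likely be too large to hold in memory, so in practice a sparse matrix representation would probably be the more realistic choice.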

edited Mar 11 '23 at 18:08

asked Mar 11 '23 at 17:44

theodre7


0 Answers