I have a dataset with a lot of binary categorical features and a single continuous target value. I would like to cluster the samples, but I am not sure which method to use.

In the past, I have used DBSCAN for something similar and it worked well, but that dataset also had lots of continuous features.

Do you have any tips and suggestions?

Would you suggest applying matrix factorization first and then clustering?

Oliver Mason
user199590

1 Answer

Any clustering algorithm that lets you choose the distance metric should work. The main issue is the similarity or distance metric that determines how similar (or different) two elements are. This is often something like Euclidean distance, but that won't work well with binary data.

I would suggest using the Jaccard index or the Dice coefficient. Both measure how much the active (1-valued) features of two binary vectors overlap, so one minus either value can serve as the distance when clustering such data.
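As a rough illustration of the suggestion above, here is a minimal sketch that computes pairwise Jaccard distances with SciPy and passes them to DBSCAN (the algorithm mentioned in the question) as a precomputed matrix. The toy matrix `X` and the values of `eps` and `min_samples` are made up for the example and would need tuning on real data.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import DBSCAN

# Toy binary feature matrix (rows = samples, columns = binary features).
# This data is invented purely for illustration.
X = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
], dtype=bool)

# Pairwise Jaccard distance (1 - Jaccard index), expanded to a square matrix.
# Swapping in metric="dice" would give the Dice dissimilarity instead.
D = squareform(pdist(X, metric="jaccard"))

# DBSCAN accepts a precomputed distance matrix; eps and min_samples are
# assumed values chosen for this toy data, not recommendations.
labels = DBSCAN(eps=0.55, min_samples=2, metric="precomputed").fit_predict(D)
print(labels)
```

The same distance matrix `D` can be fed to other algorithms that accept precomputed distances, such as agglomerative clustering. The continuous target is left out here and can be compared against the resulting clusters afterwards.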

Oliver Mason