
I'm new to machine learning and I've been working through a dataset of ~3000 records with ~100 features. I've been hand-rolling Python and R scripts to analyse the data. For example, plotting the distribution of each feature to see how normal it is, identifying outliers, etc. Another example is plotting heatmaps of the features against themselves to identify strong correlations.
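For reference, the hand-rolled checks described above can be sketched in a few lines of pandas; the data, column names, and thresholds here are synthetic stand-ins, not the actual dataset:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (~3000 rows, a handful of features).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(3000, 5)),
                  columns=[f"feat_{i}" for i in range(5)])
df["feat_4"] = np.exp(df["feat_4"])  # make one feature clearly non-normal

# How far is each feature from normal? Skewness near 0 suggests symmetry.
skew = df.skew()
print(skew.sort_values(ascending=False).head())

# Flag outliers with a simple z-score rule (a threshold of 3 is a common convention).
z = (df - df.mean()) / df.std()
print((z.abs() > 3).sum())

# Strong pairwise correlations (the matrix underlying the heatmap).
corr = df.corr()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
print(upper.stack().loc[lambda s: s.abs() > 0.8])
```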

Whilst this has been a useful learning exercise, going forward I suspect there are tools that automate a lot of this data analysis for you, produce the useful plots, and possibly give recommendations on transforms, etc. I've had a search around but don't appear to be finding anything; I guess I'm not using the right terminology to find what I'm looking for.

If any useful open source tools for this kind of thing spring to mind that would be very helpful.

Cosmicnet

2 Answers


This is a great question, because there are indeed many tools out there to make this part of the process faster. I usually stick to the following two:

You can also search for alternatives. Hopefully the community can help complement my answer.

Echo9k

Everything you mention belongs to classical statistical analysis: it is suited to normally distributed data and is done with least-squares methods, whereas maximum-likelihood estimation can be applied to non-normal data (e.g. for rare-events analysis). Machine learning in general does not require normality of the data distribution, but some preprocessing is still needed, at the developer's discretion; see the options in sklearn's preprocessing module.
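As a minimal sketch of that last point, sklearn's preprocessing module includes transformers such as `PowerTransformer` that map a skewed feature toward a Gaussian shape; the lognormal data below is synthetic, chosen just to make the effect visible:

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# A strongly right-skewed (lognormal) feature, as one might see in practice.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=(3000, 1))

# Yeo-Johnson also handles zero/negative values; Box-Cox requires strictly positive input.
pt = PowerTransformer(method="yeo-johnson")
x_t = pt.fit_transform(x)  # fit the power parameter, then transform

# The transformed feature should be far less skewed than the original.
print(skew(x).item(), skew(x_t).item())
```

By default `PowerTransformer` also standardizes the output to zero mean and unit variance, which is usually what a downstream estimator wants.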

JeeyCi