Questions tagged [scikit-learn]

For questions related to the Python package scikit-learn (also known as sklearn).

32 questions
6
votes
2 answers

Can ML be used to curve-fit data based on a dataset of example fits?

Say I have $x, y$ data connected by a function with some additional parameters $(a, b, c)$: $$ y = f(x; a, b, c) $$ Now, given a set of data points $(x, y)$, I want to determine $a, b, c$. If I know the model for $f$, this is a simple curve-fitting problem.…
argentum2f
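Since the excerpt frames this as learning a fitting procedure from examples, here is a minimal sketch of one common approach: simulate many (parameters, curve) pairs for a hypothetical model form, then train a multi-output regressor to map a sampled curve back to $(a, b, c)$. The model form, the $x$ grid, and the parameter ranges below are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical model: y = a * exp(-b * x) + c, sampled on a fixed x grid.
x_grid = np.linspace(0, 10, 50)

def simulate(a, b, c):
    return a * np.exp(-b * x_grid) + c

rng = np.random.default_rng(0)
params = rng.uniform([0.5, 0.1, -1.0], [5.0, 2.0, 1.0], size=(2000, 3))
X = np.array([simulate(a, b, c) for a, b, c in params])  # curves as features
Y = params                                               # targets: (a, b, c)

reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, Y)

# Predict parameters for a new, noisy curve.
true = (2.0, 0.7, 0.3)
y_new = simulate(*true) + rng.normal(0, 0.05, size=x_grid.size)
print(reg.predict(y_new.reshape(1, -1)))   # should be close to (2.0, 0.7, 0.3)
```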
5
votes
2 answers

Why isn't my decision tree classifier able to solve the XOR problem properly?

I was trying to solve the XOR problem with a decision tree, and the dataset looks like the one in the attached image. I plotted the fitted tree and got this result. As I understand it, the tree should have depth 2 and four leaves. The first comparison is puzzling, because it is close to…
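For reference, a small sketch of the two regimes: on the exact four-point XOR truth table a DecisionTreeClassifier does learn the expected depth-2, four-leaf tree, while on continuous XOR-like data the greedy splitter sees almost no impurity gain for the first split and typically grows a much larger tree. The synthetic data below is an assumption, not the asker's dataset.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Exact XOR truth table: the fitted tree has depth 2 and four leaves.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())        # expected: 2 4
print(export_text(clf, feature_names=["x1", "x2"]))

# Continuous XOR-like data: the first split has almost zero impurity gain,
# so the greedy splitter picks an essentially arbitrary threshold and the
# default (unlimited-depth) tree ends up far larger than the ideal one.
rng = np.random.default_rng(0)
Xn = rng.uniform(-1, 1, size=(400, 2))
yn = (Xn[:, 0] * Xn[:, 1] > 0).astype(int)
deep = DecisionTreeClassifier(random_state=0).fit(Xn, yn)
print(deep.get_depth(), deep.get_n_leaves())      # much deeper than 2
```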
4
votes
0 answers

When computing the ROC-AUC score for multi-class classification problems, when should we use One-vs-Rest and One-vs-One?

The scikit-learn documentation for the roc_auc_score method states that the parameter multi_class can take the value 'ovr' (which stands for One-vs-Rest) or 'ovo' (which stands for One-vs-One). These values are only applicable to multi-class…
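A minimal sketch of the two settings (the lowercase strings 'ovr' and 'ovo' are what roc_auc_score actually accepts); the iris-based example is only illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)

# One-vs-Rest: averages the AUC of each class against all the others;
# sensitive to class imbalance even with the default average='macro'.
print(roc_auc_score(y_te, proba, multi_class='ovr'))

# One-vs-One: averages AUC over all pairs of classes; insensitive to class
# imbalance with average='macro', but costs one comparison per class pair.
print(roc_auc_score(y_te, proba, multi_class='ovo'))
```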
2
votes
2 answers

Integrating machine learning models on microcontrollers

I converted my machine learning model (a random forest regressor) into C code using the 'emlearn' library, but the resulting .c file is 3.8 MB, which is too large for a microcontroller. I need the generated C code to be in the KB range; what should I do?
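A hedged sketch of the usual remedy: the generated C code grows with the total number of tree nodes, so constrain the forest before converting. The hyperparameter values below are illustrative assumptions, not a recommendation.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=2000, n_features=10, random_state=0)

small = RandomForestRegressor(
    n_estimators=10,      # fewer trees
    max_depth=8,          # shallower trees
    min_samples_leaf=5,   # fewer leaves per tree
    random_state=0,
).fit(X, y)

# Rough proxy for the size of the generated code: total node count.
print(sum(t.tree_.node_count for t in small.estimators_))

# The constrained model can then be converted with emlearn as before
# (see the emlearn docs for the exact conversion and quantisation options).
```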
2
votes
0 answers

How does matrix factorization help with recommendations when it converges to the initial user-item matrix?

We can say that matrix factorization of a matrix $R$, in general, is finding two matrices $P$ and $Q$ such that $R \approx PQ^{T}$, with some constraints on $P$ and $Q$. Looking at some matrix factorization algorithms on the internet like…
KindNewbie
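As a toy illustration of why a low-rank factorization does not simply reproduce $R$, here is a sketch using sklearn's NMF on an assumed 4x4 rating matrix. Note that plain NMF treats zeros as observed values, whereas dedicated recommenders mask missing entries in the loss.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy user-item rating matrix (rows: users, columns: items).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

# Rank-2 factorization R ≈ P @ Q.T: because the rank is much smaller than
# the matrix, the reconstruction cannot simply memorise R, and the zero
# cells are filled with values implied by the latent factors.
nmf = NMF(n_components=2, init='random', max_iter=1000, random_state=0)
P = nmf.fit_transform(R)      # user factors, shape (4, 2)
Q = nmf.components_.T         # item factors, shape (4, 2)
print(np.round(P @ Q.T, 2))
```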
2
votes
0 answers

Suitable deep learning algorithms for spatial / geometric data

I have the task of classifying spatial data from a geographic information system (GIS). More precisely, I need a way to filter out unnecessary line segments from a CAD system before loading them into the GIS (see the attached picture; colors are for illustrative…
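One hedged sketch of a classical (non-deep) baseline, assuming each segment can be labelled keep/discard from previously cleaned drawings: derive simple geometric features per segment and train a tree ensemble. The coordinates and labels below are random placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical layout: each row is a line segment (x1, y1, x2, y2) plus a
# manual keep/discard label taken from previously cleaned drawings.
rng = np.random.default_rng(0)
segments = rng.uniform(0, 100, size=(500, 4))
labels = rng.integers(0, 2, size=500)        # placeholder labels

def segment_features(s):
    x1, y1, x2, y2 = s.T
    length = np.hypot(x2 - x1, y2 - y1)
    angle = np.arctan2(y2 - y1, x2 - x1)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return np.column_stack([length, np.sin(angle), np.cos(angle), cx, cy])

X = segment_features(segments)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=5).mean())
```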
2
votes
1 answer

Is it compulsory to normalize the dataset if doing so can negatively impact binary logistic regression performance?

I am using a raw dataset with 4 feature variables (total cholesterol, systolic blood pressure, diastolic blood pressure, and cigarette count) to do a binomial classification (predicting stroke likelihood) using the logistic regression algorithm. I made sure…
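Normalization is not compulsory for logistic regression, but it usually helps the solver converge and keeps regularization from favouring large-scale features. A quick way to decide is to compare pipelines with and without scaling; breast_cancer is used below only as a stand-in binary medical dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)       # stand-in binary dataset

raw = LogisticRegression(max_iter=10000)
scaled = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))

# Compare cross-validated performance with and without standardisation;
# scaling mainly affects solver convergence and how regularisation is
# shared across features, so results can go either way on a given dataset.
print(cross_val_score(raw, X, y, cv=5, scoring='roc_auc').mean())
print(cross_val_score(scaled, X, y, cv=5, scoring='roc_auc').mean())
```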
1
vote
1 answer

How to train a scikit-learn model for classifying movements using accelerometer data?

I am working on a motion classification task using accelerometer data collected at 25 Hz during different exercises. The goal is to classify movements such as pull-ups, push-ups, and dips. Each batch of data consists of 50 samples (2 seconds), where each…
Gripen
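A minimal sketch of the usual feature-based pipeline, assuming windows of shape (50, 3), i.e. 2 s at 25 Hz over three axes, with one exercise label per window. The arrays below are random placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical input: (n_windows, 50, 3) windows with one label per window.
rng = np.random.default_rng(0)
windows = rng.normal(size=(300, 50, 3))
labels = rng.integers(0, 3, size=300)        # 0=pull-up, 1=push-up, 2=dip

def window_features(w):
    # Summary statistics per axis flatten each (50, 3) window to one row.
    return np.concatenate([w.mean(axis=0), w.std(axis=0),
                           w.min(axis=0), w.max(axis=0)])

X = np.array([window_features(w) for w in windows])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=5).mean())
```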
1
vote
1 answer

Why isn't class_weight='balanced' impacting my F1 score for an imbalanced dataset (SVM)?

I'm using MNIST to test how class imbalance can impact an SVM model. I have a training set with 50 examples of '0'. I then increase the number of '1' training examples (starting from 1 example of '1' up to 999 examples of '1' in the training…
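A small sketch reproducing the setup on the sklearn digits data (as a stand-in for MNIST): class_weight='balanced' rescales the penalty C per class, so if the minority class is already separated correctly with the default weights, the F1 score barely changes.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Build an imbalanced two-class problem from the digits data (0 vs 1).
X, y = load_digits(return_X_y=True)
mask = (y == 0) | (y == 1)
X, y = X[mask], y[mask]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

# Keep all '0' examples but only a handful of '1' examples for training.
keep = np.concatenate([np.where(y_tr == 0)[0], np.where(y_tr == 1)[0][:10]])
X_tr, y_tr = X_tr[keep], y_tr[keep]

for cw in (None, 'balanced'):
    pred = SVC(class_weight=cw).fit(X_tr, y_tr).predict(X_te)
    # If the classes are separable even with the default weights, the
    # rescaled C has little effect and F1 stays roughly the same.
    print(cw, f1_score(y_te, pred))
```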
1
vote
0 answers

Using ML to uncover procedural logic

The game Elite Dangerous has a procedurally generated galaxy of some 400 billion star systems. Each star system in the game can be uniquely identified by a 64-bit number (id64), which is used as a seed for building the system but can also be decoded…
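If this were attempted with scikit-learn, one hedged starting point is to expose the raw bits of id64 as separate features, so a tree model can latch onto bit fields without the encoding being known in advance. Everything below (ids, labels, the target property) is a placeholder for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical setup: a table of known systems with their id64 and some
# property to predict. Each of the 64 bits becomes one binary feature.
rng = np.random.default_rng(0)
id64 = rng.integers(0, 2**63, size=5000, dtype=np.uint64)
labels = rng.integers(0, 2, size=5000)       # placeholder target property

shifts = np.arange(64, dtype=np.uint64)
bits = ((id64[:, None] >> shifts) & np.uint64(1)).astype(np.uint8)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, bits, labels, cv=3).mean())  # ≈0.5 on random labels
```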
1
vote
1 answer

Unexpected behaviour when using class weights in the loss

I’m working on a classification problem (500 classes). My NN has 3 fully connected layers, followed by an LSTM layer. I use nn.CrossEntropyLoss() as my loss function. To tackle the problem of class imbalance, I use sklearn’s class_weight while…
helloworld
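For reference, a minimal sketch of the standard pattern for feeding sklearn's balanced class weights into nn.CrossEntropyLoss; the labels here are random and imbalanced purely for illustration.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.utils.class_weight import compute_class_weight

# Training labels (random here, with a deliberate imbalance over 5 classes).
rng = np.random.default_rng(0)
y_train = rng.choice(5, size=1000, p=[0.6, 0.2, 0.1, 0.05, 0.05])

# 'balanced' gives each class c the weight n_samples / (n_classes * count(c)),
# so rare classes get proportionally larger weights.
weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
criterion = nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))

# The weighted loss rescales each sample's contribution by its class weight;
# with the default reduction='mean' it also divides by the weights used.
logits = torch.randn(8, 5)
targets = torch.tensor(rng.choice(5, size=8))
print(criterion(logits, targets))
```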
1
vote
1 answer

Why does sklearn perceptron converge for linearly inseparable data points?

I learned that the perceptron algorithm only converges if the dataset is linearly separable. I am implementing this algorithm using scikit-learn. The blue and orange points are from the training set, while the red and green ones are from the test set.…
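A short sketch of why fit() still returns on inseparable data: sklearn's Perceptron stops after max_iter epochs, or earlier when the loss stops improving by more than tol, rather than running until it makes no mistakes. The overlapping blobs below are synthetic.

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Linearly inseparable data: two heavily overlapping Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(0.5, 1.0, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

# fit() terminates via max_iter / tol, not via "zero mistakes", so it
# always returns a (possibly imperfect) linear classifier.
clf = Perceptron(max_iter=1000, tol=1e-3, random_state=0).fit(X, y)
print(clf.n_iter_, clf.score(X, y))   # finite epoch count, accuracy < 1.0
```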
1
vote
1 answer

How can I interpret the value returned by score(X) method of sklearn.neighbors.KernelDensity?

For sklearn.neighbors.KernelDensity, the sklearn KDE documentation describes the related scoring methods as computing the log-likelihood of each sample under the model. For the 'gaussian' kernel, I have implemented hyper-parameter tuning for the…
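A quick check of what score(X) returns for a Gaussian-kernel KDE: it is the sum of score_samples(X), i.e. the total log-likelihood of X under the fitted density, where higher is better and values are log-densities (so they can be positive or negative depending on bandwidth and data scale). Synthetic data is used for illustration.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))

kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X)

# score_samples(X) gives the log-density of each sample; score(X) is
# their sum, i.e. the total log-likelihood of X under the fitted model.
print(np.allclose(kde.score(X), kde.score_samples(X).sum()))   # True
print(kde.score(X))
```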
1
vote
1 answer

Interpretation of feature selection based on the model

The description of feature selection based on a random forest uses trees without pruning. Do I need to use tree pruning? The thing is, if I don't prune the trees, the forest will overfit. The picture below shows the feature importances based on 500…
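A hedged sketch of a common sanity check: keep the unpruned forest for impurity-based importances, but compare them against permutation importance on held-out data, which is less prone to inflating noisy features. The data below is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Impurity-based importances come from the training data and can inflate
# noisy or high-cardinality features; permutation importance on a held-out
# set measures the actual drop in test performance per feature.
forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
print(np.round(forest.feature_importances_, 3))
perm = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)
print(np.round(perm.importances_mean, 3))
```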
1
vote
0 answers

How can I split the data into training and validation sets such that entries with a certain value are kept together?

I have the following kind of data frame (this is just an example):

A 1 Normal
A 2 Normal
A 3 Stress
B 1 Normal
B 2 Stress
B 3 Stress
C 1 Normal
C 2 Normal
C 3 Normal

I want to do 5-fold cross-validation, splitting the data using skf =…
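A minimal sketch using a group-aware splitter, which guarantees that all rows sharing the first-column value land in the same fold; with more groups in the real data, n_splits can be set to 5, and StratifiedGroupKFold is an alternative when class stratification also matters.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# The example table: group letter, measurement id (implicit), condition label.
groups = np.array(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'])
y = np.array(['Normal', 'Normal', 'Stress',
              'Normal', 'Stress', 'Stress',
              'Normal', 'Normal', 'Normal'])
X = np.arange(len(y)).reshape(-1, 1)   # placeholder features

# GroupKFold never puts rows from the same group into both train and test.
cv = GroupKFold(n_splits=3)            # only 3 groups in this toy example
for train_idx, test_idx in cv.split(X, y, groups=groups):
    print(sorted(set(groups[test_idx])), list(y[test_idx]))
```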