I am trying to build a prediction system where the input data is arranged in multiple columns. The columns are:

  • weather,
  • service type (bronze, silver, gold),
  • size (xs, s, m, l, xl, xxl),
  • time,
  • availability,
  • pin code, and
  • the result (target).

Each field is stored in its own column using a specific code. I have read this, this, this, this, and this.

They are helpful but do not give me a clear picture. I would like to achieve a multi-vs-one prediction, that is, several input columns predicting a single target. Most of the schemes I found are one-vs-one, where the input is a single 1*1 entity.

Here is the sample code that I was working with:

import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

regressionModel = linear_model.LinearRegression()
# 3. Processing is not necessary for the current concept
# modifiedDFSet is my DataFrame holding the coded columns and the target 'Code'
y = pd.DataFrame(modifiedDFSet['Code'])
print(y.shape)
drop2 = ['Code']
X = modifiedDFSet.drop(columns=drop2)
print(X.shape)
# 4. Data scaling and imputation are not necessary. Training and test data are prepared using train_test_split
train_data, test_data = train_test_split(X, test_size=0.20, random_state=42)
# 5. The regression model
# h = .02  # step size in the mesh
# logreg = linear_model.LinearRegression()
# create an instance of the regression model and fit the data
regressionModel.fit(X, y)
d_predictions = regressionModel.predict(y)  # this call raises the dimension error

X.shape and y.shape yield (500, 6) and (500, 1), respectively. This causes a dimension error when computing d_predictions, which I take to mean that the regression model does not accept multi-column inputs.

My hypothesis is that I can create a scoring scheme that takes the importance of each column into account and reduces the columns to a single score, so that the end result is a one-vs-one regression problem. I am looking for some direction on this hypothesis. Is it correct, wrong, or halfway?

nbro

1 Answer

I think the model will have no problem taking a multi-column input. In fact, from your code, this is exactly how you trained it. It expects an input of size [k, 6], where k >= 1.

Instead, you are feeding it [k, 1]-sized data, which are the dimensions of y. So if you run it like this, it should work:

regressionModel.predict(X)
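
As a rough illustration of the point above, here is a minimal sketch on synthetic data. The column names, the random values, and `true_weights` are all made up for the example and merely stand in for the coded columns from the question. It fits on a (400, 6) training set and predicts on the held-out (100, 6) feature rows rather than on y. It also prints the shape of the fitted coefficients, which suggests the model already learns one weight per input column, so a hand-built scoring scheme may not be needed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the data in the question: 500 rows, 6 coded columns.
rng = np.random.default_rng(42)
X = pd.DataFrame(
    rng.integers(0, 6, size=(500, 6)),
    columns=[f"feature_{i}" for i in range(6)],  # hypothetical column names
)

# Hypothetical target: a noisy linear combination of the six columns.
true_weights = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.5])
y = pd.DataFrame({"Code": X.to_numpy() @ true_weights + rng.normal(0, 0.1, 500)})

# Split features and target together so the rows stay aligned.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# predict() expects feature rows with 6 columns, not the target.
predictions = model.predict(X_test)
print(predictions.shape)   # shape of the predictions for the test rows
print(model.coef_.shape)   # one learned weight per input column
```

Passing `y_test` to `predict` instead of `X_test` would reproduce the dimension error from the question, because the model was fitted on six columns and `y_test` has only one.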

hellmean