Multi-vs-one prediction using regression

Question

I am trying to build a prediction system whose input data is arranged in multiple columns. The data contains:

weather,
service type (bronze, silver, gold),
size (xs, s, m, l, xl, xxl),
time,
availability,
pin code, and
the result (target).

Each of the data types is arranged in its own column with a specific code. I have read this, this, this, this, and this.

They are helpful but do not give me a clear picture. I would like to achieve a multi-vs-one prediction. Most of the schemes available are one-vs-one where the data is a 1*1 entity.

Here is a sample code that I was working with:

regressionModel = linear_model.LinearRegression()
""" 3. Processing is not necessary for current concept """
y = pd.DataFrame(modifiedDFSet['Code'])
print(y.shape)
drop2 = ['Code']
X = modifiedDFSet.drop(drop2)
print(X.shape)
""" 4. Data Scaling, Data Imputation is not necessary. Training and Test data is prepared using train-test-split """
train_data, test_data = train_test_split(X, test_size=0.20, random_state=42)
""" 5. the Regression Model """
# h = .02  # step size in the mesh
# logreg = linear_model.LinearRegression()
# we create an instance of Neighbours Classifier and fit the data.
regressionModel.fit(X, y)
d_predictions = regressionModel.predict(y)

X.shape and y.shape yield (500, 6) and (500, 1), respectively, which obviously causes a dimension error when computing d_predictions. I took this to mean that the regression model does not accept multi-column input.

I have a hypothesis that I can create a scoring scheme that will take into account the importance of each of the columns and create a scheme that creates a score and the end result would be a one-vs-one regression problem. Looking for some direction with respect to my hypothesis. Is it correct, wrong or halfway?

score 1 · Answer 1 · answered Mar 10 '18 at 10:50

I think the model will have no problem taking multi-column input. In fact, from your code, this is exactly how you trained it. It expects an input of shape [k, 6], where k >= 1.

Instead, you are feeding it data of shape [k, 1], which are the dimensions of y. So if you run it like this, it should work:

regressionModel.predict(X)
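To make the shapes concrete, here is a minimal sketch of the full flow. It is not your actual pipeline: the DataFrame is synthetic random data standing in for modifiedDFSet, and the feature column names are hypothetical. It only assumes six coded feature columns plus a 'Code' target, as in the question. Note that it drops the target with drop(columns=...) and predicts on the held-out feature rows, not on y.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for modifiedDFSet (assumption): six coded feature
# columns plus the 'Code' target. The feature names are hypothetical.
rng = np.random.default_rng(42)
feature_names = ["weather", "service_type", "size", "time", "availability", "pin_code"]
df = pd.DataFrame(rng.integers(0, 6, size=(500, 6)), columns=feature_names)
df["Code"] = rng.integers(0, 10, size=500)

# Separate the target from the features; columns= drops by column label.
y = df["Code"]
X = df.drop(columns=["Code"])

# Split features and target together so the rows stay aligned.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

# Fit on the multi-column input, then predict on feature rows of the same width.
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print(X_train.shape, X_test.shape)  # feature matrices, 6 columns each
print(predictions.shape)            # one prediction per test row
print(model.coef_.shape)            # one learned coefficient per feature column
```

The model learns one coefficient per input column, so the fitted regression already weights each column's contribution on its own.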
