
I tried executing this code on Windows and it ran flawlessly, but on Ubuntu, as soon as I run it, the system freezes for 3-4 minutes before the result appears, and then Ubuntu stays laggy until I restart it.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from xgboost import XGBClassifier
import xgboost as xgb
from sklearn.metrics import accuracy_score

dataset_len = 40000000
dlen = int(dataset_len/2)
X_11 = pd.Series(np.random.normal(2,2,dlen))
X_12 = pd.Series(np.random.normal(9,2,dlen))
X_1 = pd.concat([X_11, X_12]).reset_index(drop=True)
X_21 = pd.Series(np.random.normal(1,3,dlen))
X_22 = pd.Series(np.random.normal(7,3,dlen))
X_2 = pd.concat([X_21, X_22]).reset_index(drop=True)
X_31 = pd.Series(np.random.normal(3,1,dlen))
X_32 = pd.Series(np.random.normal(3,4,dlen))
X_3 = pd.concat([X_31, X_32]).reset_index(drop=True)
X_41 = pd.Series(np.random.normal(1,1,dlen))
X_42 = pd.Series(np.random.normal(5,2,dlen))
X_4 = pd.concat([X_41, X_42]).reset_index(drop=True)
Y = pd.Series(np.repeat([0,1],dlen))
df = pd.concat([X_1, X_2, X_3, X_4, Y], axis=1)
df.columns = ['X1', 'X2', 'X3', 'X_4', 'Y']
df.head()

vidarlo
New

1 Answer


The output of free -m after running the code shows that you've filled your swap space. This is bad: your system is effectively out of memory at that point. It can't write any more data to swap, so it has to start killing processes to free memory.
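If you would rather check this from inside Python than from a terminal, here is a minimal sketch that reads the same counters free -m reports. It assumes a Linux system where /proc/meminfo is available; the function names and the SAMPLE text (used only as a fallback, with made-up illustrative numbers) are my own, not part of any library.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are in kB; a few (e.g. HugePages counters) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_used_mib(info):
    """Swap currently in use, in MiB, from SwapTotal and SwapFree (both kB)."""
    return (info.get("SwapTotal", 0) - info.get("SwapFree", 0)) / 1024


# Illustrative fallback only: invented numbers showing a completely full swap.
SAMPLE = """MemTotal:        8000000 kB
MemAvailable:     120000 kB
SwapTotal:       2097152 kB
SwapFree:              0 kB
"""

meminfo_path = Path("/proc/meminfo")
text = meminfo_path.read_text() if meminfo_path.exists() else SAMPLE
info = parse_meminfo(text)
total_mib = info.get("SwapTotal", 0) / 1024
print(f"swap used: {swap_used_mib(info):.0f} MiB of {total_mib:.0f} MiB")
```

If the used figure is close to the total while your script is running, you are in the situation described above.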

Windows has a dynamic swap size: the pagefile expands as needed. Linux takes a more static approach, where swap is a fixed size and pre-allocated. This probably explains the behaviour you see, because Ubuntu starts killing off processes once you're out of memory.

You can increase the available swap space to make things a bit better, but ultimately you need more RAM to run memory-intensive calculations.
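To get a feel for how much memory the code in the question asks for, here is a rough back-of-the-envelope sketch. It assumes 8 bytes per value (float64 features, int64 labels) and that every named variable in the question keeps its own copy of the data; the function name is hypothetical, and the figure ignores temporary copies and anything XGBoost allocates later, so treat it as a lower bound rather than a measurement.

```python
BYTES_PER_VALUE = 8  # assumption: float64 / int64 everywhere


def estimate_bytes(dataset_len):
    """Rough size of the objects the question's code keeps alive, in bytes."""
    dlen = dataset_len // 2
    return {
        # X_11, X_12, ..., X_42: eight half-length Series
        "halves": 8 * dlen * BYTES_PER_VALUE,
        # X_1 .. X_4: four full-length concatenated Series
        "columns": 4 * dataset_len * BYTES_PER_VALUE,
        # Y: one full-length label Series
        "labels": dataset_len * BYTES_PER_VALUE,
        # df: five full-length columns
        "frame": 5 * dataset_len * BYTES_PER_VALUE,
    }


parts = estimate_bytes(40_000_000)
total = sum(parts.values())
for name, size in parts.items():
    print(f"{name:8s} {size / 1024**3:6.2f} GiB")
print(f"{'total':8s} {total / 1024**3:6.2f} GiB")
```

Under these assumptions the intermediates roughly triple the footprint of the final DataFrame, so deleting them once df is built (for example with del) or generating the data as float32 should reduce the pressure on RAM and swap.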

vidarlo