What I did
Train a random forest with sklearn.ensemble.RandomForestClassifier.
Environment
$ ipython --version
6.1.0
$ jupyter --version
4.3.0
$ python --version
Python 3.6.2 :: Anaconda custom (64-bit)
import sklearn
print(sklearn.__version__)
Output
0.21.2
Investigation
Feature labels
1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins
10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13) Proline
Training a RandomForest
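wine.data has no header row, and its first column is the class label (1, 2 or 3). Because of this, `df_wine.iloc[:, 1:]` keeps the 13 features in the order of the list above, and feature n ends up in column n - 1 of X. The plain-Python sketch below illustrates that index shift. `FEATURE_NAMES` and `split_row` are hypothetical helpers, and the sample row contains made-up values, not a record from the dataset.

```python
# Feature names in wine.data column order (columns 1-13), copied from the list above
FEATURE_NAMES = [
    "Alcohol", "Malic acid", "Ash", "Alcalinity of ash", "Magnesium",
    "Total phenols", "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins",
    "Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline",
]


def split_row(row):
    """Split one parsed wine.data row into (class_label, features).

    Hypothetical helper mirroring df_wine.iloc[:, 0] and df_wine.iloc[:, 1:]
    for a single row.
    """
    return int(row[0]), [float(v) for v in row[1:]]


# Hypothetical row with made-up values; only the column layout matters here
row = "2,12.0,1.5,2.3,18.0,100,2.5,2.0,0.3,1.5,4.0,1.0,3.0,600".split(",")
label, features = split_row(row)
for j, (name, value) in enumerate(zip(FEATURE_NAMES, features)):
    # j is the 0-based column index in X, i.e. the list number minus 1
    print("X column %2d -> %s = %s" % (j, name, value))
```

Read this way, "label: 9" in the output below refers to 10) Color intensity.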
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Load the UCI Wine data (no header row; column 0 is the class label)
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', header=None)
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values
# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# 10,000 trees; n_jobs=-1 trains them on all available CPU cores
forest = RandomForestClassifier(n_estimators=10000, random_state=0, n_jobs=-1)
forest.fit(X_train, y_train)
# Get the feature importances
importances = forest.feature_importances_
Display the results
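# Column indices sorted by importance, in descending order
indices = np.argsort(importances)[::-1]
for f in range(X_train.shape[1]):
    # "label" is the 0-based column index in X (the list number above minus 1)
    print("%2d) label: %s importance: %f" % (f + 1, indices[f], importances[indices[f]]))
Output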
1) label: 9 importance: 0.182483
2) label: 12 importance: 0.158610
3) label: 6 importance: 0.150948
4) label: 11 importance: 0.131987
5) label: 0 importance: 0.106589
6) label: 10 importance: 0.078243
7) label: 5 importance: 0.060718
8) label: 3 importance: 0.032033
9) label: 1 importance: 0.025400
10) label: 8 importance: 0.022351
11) label: 4 importance: 0.022078
12) label: 7 importance: 0.014645
13) label: 2 importance: 0.013916
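The output above shows only column indices. A readable ranking can be produced by pairing each importance with its feature name and sorting in descending order, which is what `np.argsort(importances)[::-1]` does. The sketch below uses plain Python so it runs without NumPy. `rank_features` is a hypothetical helper, and `IMPORTANCES` holds the printed values from the output above, rearranged by column index. scikit-learn normalizes impurity-based importances, so these values should sum to 1 up to print rounding.

```python
# Importances copied from the output above, indexed by X column (0-12)
IMPORTANCES = [
    0.106589, 0.025400, 0.013916, 0.032033, 0.022078, 0.060718, 0.150948,
    0.014645, 0.022351, 0.182483, 0.078243, 0.131987, 0.158610,
]

# Feature names in X column order, copied from the feature list
FEATURE_NAMES = [
    "Alcohol", "Malic acid", "Ash", "Alcalinity of ash", "Magnesium",
    "Total phenols", "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins",
    "Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline",
]


def rank_features(importances, names):
    """Return (name, importance) pairs sorted from most to least important.

    Hypothetical helper; same ordering as np.argsort(importances)[::-1]
    except possibly for ties.
    """
    order = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return [(names[i], importances[i]) for i in order]


ranking = rank_features(IMPORTANCES, FEATURE_NAMES)
for rank, (name, score) in enumerate(ranking, start=1):
    print("%2d) %-30s %f" % (rank, name, score))

# Normalized importances sum to 1 (the printed values are rounded to 6 decimals)
print("sum = %.6f" % sum(IMPORTANCES))
```

With names attached, the top five are Color intensity, Proline, Flavanoids, OD280/OD315 of diluted wines and Alcohol.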