Trying out sklearn.ensemble.RandomForestClassifier

Python

Published: 2019-08-14

What I did

This post trains a random forest on the UCI Wine data set using sklearn.ensemble.RandomForestClassifier and looks at the resulting feature importances.

Environment

$ ipython --version
6.1.0
$ jupyter --version
4.3.0
$ python --version
Python 3.6.2 :: Anaconda custom (64-bit)

import sklearn
print(sklearn.__version__)

Output

0.21.2

Investigation

Feature labels

1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins
10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13)Proline

Training with RandomForest

import pandas as pd
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', header=None)

from sklearn.ensemble import RandomForestClassifier
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=10000, random_state=0, n_jobs=-1)
forest.fit(X_train, y_train)

# Get the feature importances
importances = forest.feature_importances_
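To make the training step easier to reproduce without a network connection, here is a minimal sketch of the same flow. It assumes that sklearn.datasets.load_wine, scikit-learn's bundled copy of the UCI wine data, can stand in for the CSV download (its class labels are 0 to 2 rather than 1 to 3). It also uses n_estimators=100 instead of 10000 and drops n_jobs, purely to keep the run short. The accuracy check is an addition and was not part of the original experiment.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: the bundled data set replaces the CSV fetched from the UCI server.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Assumption: 100 trees instead of the post's 10000, only to keep this sketch fast.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# One importance value per feature; the values are normalized to sum to 1.
importances = forest.feature_importances_
print("number of features:", importances.shape[0])
print("sum of importances:", importances.sum())

# Mean accuracy on the held-out 30% of the samples.
accuracy = forest.score(X_test, y_test)
print("test accuracy:", accuracy)
```

Because the number of trees differs, the individual importance values will not match the figures in this post exactly.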

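Display the results

import numpy as np

# Sort the feature indices by importance, highest first
indices = np.argsort(importances)[::-1]
for f in range(X_train.shape[1]):
    # "label" is the 0-based feature index (feature number minus 1)
    print("%2d) label: %s importance: %f" % (f + 1, indices[f], importances[indices[f]]))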

Output

 1) label: 9 importance: 0.182483
 2) label: 12 importance: 0.158610
 3) label: 6 importance: 0.150948
 4) label: 11 importance: 0.131987
 5) label: 0 importance: 0.106589
 6) label: 10 importance: 0.078243
 7) label: 5 importance: 0.060718
 8) label: 3 importance: 0.032033
 9) label: 1 importance: 0.025400
10) label: 8 importance: 0.022351
11) label: 4 importance: 0.022078
12) label: 7 importance: 0.014645
13) label: 2 importance: 0.013916
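The "label" column above is a 0-based column index, while the feature list earlier in the post is numbered from 1, so label 9 corresponds to feature 10 (Color intensity). The sketch below shows one way to print names instead of indices. It uses only the values already printed above, re-ordered by column index. The names feature_names and ranked are introduced here for illustration.

```python
# Feature names in the order of the feature list (index 0 = feature 1).
feature_names = [
    "Alcohol", "Malic acid", "Ash", "Alcalinity of ash", "Magnesium",
    "Total phenols", "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins",
    "Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline",
]

# Importances copied from the output above, arranged by column index 0..12.
importances = [
    0.106589, 0.025400, 0.013916, 0.032033, 0.022078, 0.060718, 0.150948,
    0.014645, 0.022351, 0.182483, 0.078243, 0.131987, 0.158610,
]

# Pair each name with its importance and sort by importance, highest first.
ranked = sorted(
    zip(feature_names, importances), key=lambda pair: pair[1], reverse=True
)

for rank, (name, importance) in enumerate(ranked, start=1):
    print("%2d) %-30s %f" % (rank, name, importance))
```

With a fitted model, the same pairing should work by passing forest.feature_importances_ in place of the hard-coded list.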

References

UCI Machine Learning Repository: Wine Data Set
3.2.4.3.1. sklearn.ensemble.RandomForestClassifier — scikit-learn 0.21.3 documentation
Python機械学習プログラミング