What I did
I train a random forest with sklearn.ensemble.RandomForestClassifier and look at the feature importances it reports.
Environment
$ ipython --version
6.1.0
$ jupyter --version
4.3.0
$ python --version
Python 3.6.2 :: Anaconda custom (64-bit)
import sklearn
print(sklearn.__version__)
Output
0.21.2
Investigation
Feature labels
1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins
10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13)Proline
Training with RandomForest
import pandas as pd
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', header=None)
from sklearn.ensemble import RandomForestClassifier
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
forest = RandomForestClassifier(n_estimators=10000, random_state=0, n_jobs=-1)
forest.fit(X_train, y_train)
# Get the feature importances
importances = forest.feature_importances_
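As a side note, here is a minimal sketch of what feature_importances_ looks like and how the held-out split could be used. It rests on a few assumptions that are not part of the steps above: the wine data bundled with scikit-learn (sklearn.datasets.load_wine) stands in for the UCI CSV so that nothing is downloaded, the forest uses only 100 trees to keep it quick, and the per_tree_std variable is a name I made up for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: the copy of the wine data bundled with scikit-learn stands in
# for the UCI CSV, so the sketch runs without network access.
wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)

# Far fewer trees than the 10000 used above, only to keep the sketch fast.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# One importance value per feature; the values are normalized to sum to 1.
importances = forest.feature_importances_

# Spread of each feature's importance across the individual trees
# (per_tree_std is a hypothetical name used only in this sketch).
per_tree_std = np.std(
    [tree.feature_importances_ for tree in forest.estimators_], axis=0)

print("sum of importances:", importances.sum())
print("test accuracy:", forest.score(X_test, y_test))
```

Because the values are normalized, each one can be read as a share of the total importance rather than as an absolute score. The numbers from this smaller forest will not necessarily match the output shown in this post.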
Displaying the results
import numpy as np
# Sort the feature indices by importance, highest first
indices = np.argsort(importances)[::-1]
for f in range(X_train.shape[1]):
    print("%2d) label: %s importance: %f" % (f + 1, indices[f], importances[indices[f]]))
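Output
The label is the 0-based column index into X, so label 9 corresponds to feature 10 (Color intensity) in the feature list.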
1) label: 9 importance: 0.182483
2) label: 12 importance: 0.158610
3) label: 6 importance: 0.150948
4) label: 11 importance: 0.131987
5) label: 0 importance: 0.106589
6) label: 10 importance: 0.078243
7) label: 5 importance: 0.060718
8) label: 3 importance: 0.032033
9) label: 1 importance: 0.025400
10) label: 8 importance: 0.022351
11) label: 4 importance: 0.022078
12) label: 7 importance: 0.014645
13) label: 2 importance: 0.013916
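The index-only labels are a little hard to read, so here is a rough sketch of pairing them with the feature names. The names are copied from the feature list earlier in this post and the importance values from the output above; the variable names (feature_names, printed, ranked) are just ones I picked for illustration, and I am assuming the columns of X are in the same order as that list.

```python
import pandas as pd

# Feature names in column order, copied from the feature list in this post.
feature_names = [
    "Alcohol", "Malic acid", "Ash", "Alcalinity of ash", "Magnesium",
    "Total phenols", "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins",
    "Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline",
]

# Importances keyed by 0-based column index, copied from the output above.
printed = {
    9: 0.182483, 12: 0.158610, 6: 0.150948, 11: 0.131987, 0: 0.106589,
    10: 0.078243, 5: 0.060718, 3: 0.032033, 1: 0.025400, 8: 0.022351,
    4: 0.022078, 7: 0.014645, 2: 0.013916,
}

# Look up each index in the name list and sort by importance, highest first.
ranked = pd.Series(
    {feature_names[i]: value for i, value in printed.items()}
).sort_values(ascending=False)

print(ranked)
```

With a fitted model, the same Series could presumably be built directly from forest.feature_importances_ instead of the hard-coded values.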