Conceptually, in sklearn you can configure RandomForestClassifier with the following hyper-parameters so that the random forest reduces to a single decision tree.
- n_estimators=1: the obvious setting, since a Decision Tree is a forest with exactly one tree.
- max_features=None: the default for RandomForestClassifier is "auto" (called "sqrt" in newer scikit-learn releases), which means max_features=sqrt(n_features). A Decision Tree considers all features at every split, so we set max_features=None to match it.
- bootstrap=False: a Random Forest normally trains each tree on a bootstrap sample of the training set (drawn with replacement), while a Decision Tree uses all samples. Setting bootstrap=False makes the single tree train on the full training set.
In summary, the following Random Forest should be equivalent to a single Decision Tree.
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=1, max_features=None, bootstrap=False)
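To sanity-check the settings above, here is a minimal sketch that fits this forest next to a plain DecisionTreeClassifier. The synthetic dataset from make_classification and the random_state values are my own assumptions for illustration, not part of the original setup. Note that the two models may still break ties between equally good splits differently, so the sketch compares structure and training accuracy rather than claiming identical predictions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, assumed here only for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Random Forest reduced to a single tree, as described above.
clf = RandomForestClassifier(
    n_estimators=1, max_features=None, bootstrap=False, random_state=0
).fit(X, y)

# Plain Decision Tree with default settings for comparison.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# The forest holds exactly one fitted DecisionTreeClassifier.
inner = clf.estimators_[0]
print(type(inner).__name__, len(clf.estimators_))

# With max_features=None the inner tree looks at every feature per split.
print(inner.max_features_, X.shape[1])

# Neither model has a depth limit and both see all samples,
# so the training accuracies can be compared directly.
print(clf.score(X, y), tree.score(X, y))
```

The inner estimator is exposed through the fitted forest's estimators_ attribute, which is a convenient way to confirm what the forest actually built.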
When to use it?
It can be useful when we run GridSearchCV over hyper-parameters for RandomForestClassifier and want to know whether a simple Decision Tree is already enough to reach the best score for whatever metric we are optimizing. It is still not straightforward, though: when several hyper-parameter combinations tie for the best score, GridSearchCV returns only the first of them, because the best candidate is picked with np.argmin(). The order of the grid therefore decides which of the tied candidates is reported.
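As a rough sketch of that idea, the grid below puts the single-tree configuration next to a couple of ordinary forest configurations. The dataset, the grid values, and cv=3 are assumptions chosen to keep the example small. The last lines compare best_index_ with np.argmin() over the rank_test_score column of cv_results_, which is how the selection appears to work in the scikit-learn versions I have looked at.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, assumed here only for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# A list of dicts keeps the single-tree settings together as one candidate
# instead of crossing them with the regular forest settings.
param_grid = [
    {"n_estimators": [1], "max_features": [None], "bootstrap": [False]},
    {"n_estimators": [10, 25], "max_features": ["sqrt"], "bootstrap": [True]},
]

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

# Candidate that was selected as best.
print(search.best_params_)

# Rank 1 marks the best mean test score; tied candidates share a rank,
# and argmin returns the first of them.
ranks = search.cv_results_["rank_test_score"]
print(search.best_index_ == int(np.argmin(ranks)))
```

If the single-tree candidate comes out on top, or ties with the larger forests, that is a hint that the simpler model may be sufficient. Checking cv_results_ for ties is worthwhile because best_params_ shows only one winner.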