25.4 Viewing Random Forests

The last part of the course will introduce you to the random forest package, some nifty things you can do with it, and some visuals. IMPORTANT NOTE: this implementation of random forest can’t handle categorical predictors directly. You need to convert them to a model matrix first. This isn’t hard, but if you’re not aware of it you can get spurious results.
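To make the model matrix idea concrete, here is a minimal sketch in plain Python. It is an illustration only, not the package's own conversion: the function name `model_matrix`, the toy rows, and the choice of one 0/1 indicator column per category level (with no reference level dropped) are all assumptions made for the example.

```python
def model_matrix(rows, categorical):
    """Expand categorical columns into 0/1 indicator columns.

    rows: list of dicts mapping column name -> value (hypothetical toy format).
    categorical: names of the columns to expand.
    Returns (column_names, matrix), where matrix is a list of numeric rows.
    """
    # Collect the sorted levels of each categorical column.
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}

    # Build the output column names in the original column order.
    names = []
    for col in rows[0]:
        if col in levels:
            names.extend(f"{col}={lvl}" for lvl in levels[col])
        else:
            names.append(col)

    # Build one numeric row per input row.
    matrix = []
    for r in rows:
        out = []
        for col in rows[0]:
            if col in levels:
                # One indicator per level: 1.0 where the row has that level.
                out.extend(1.0 if r[col] == lvl else 0.0 for lvl in levels[col])
            else:
                out.append(float(r[col]))
        matrix.append(out)
    return names, matrix


# Made-up data for illustration only.
rows = [
    {"age": 31, "color": "red"},
    {"age": 45, "color": "blue"},
    {"age": 27, "color": "red"},
]
names, X = model_matrix(rows, ["color"])
```

After this step every column is numeric, so the forest never sees a raw category label.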

Variable importance.

A neat output of an RF model is a measure of proximity between rows. The proximity of two observations is the proportion of trees in which they end up in the same leaf node. This can be very useful when rows contain both continuous and categorical data, which is a difficult situation for most distance metrics.
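The proximity definition can be sketched directly, assuming we already know which leaf each row reaches in each tree. The `leaves` table below is made up, and `proximity_matrix` is a hypothetical helper rather than a function from any random forest package.

```python
def proximity_matrix(leaves):
    """Compute pairwise proximities from leaf assignments.

    leaves[i][t] is the id of the leaf that row i reaches in tree t.
    Returns an n x n list of lists with values in [0, 1].
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Count the trees in which rows i and j share a leaf.
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Made-up leaf ids: 3 rows, 4 trees.
leaves = [
    [1, 4, 2, 7],
    [1, 4, 3, 7],
    [5, 6, 3, 8],
]
prox = proximity_matrix(leaves)
```

Rows 0 and 1 share a leaf in three of the four trees, so their proximity is 0.75. Rows 0 and 2 never share a leaf, so theirs is 0.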

If we take 1 - proximity, we turn this into a distance matrix and can do things like multidimensional scaling.
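As a rough sketch of that last step, the code below converts a made-up proximity matrix to distances and runs classical (Torgerson) multidimensional scaling. To stay dependency-free it keeps only one dimension, found by power iteration. The function name `classical_mds_1d`, the `iters` parameter, and the toy matrix are assumptions for illustration, and in practice you would normally keep two or more dimensions using a library routine.

```python
import math


def classical_mds_1d(dist, iters=200):
    """One-dimensional classical MDS for a symmetric distance matrix."""
    n = len(dist)
    # Square the distances, then double-center: B = -1/2 * J * D^2 * J.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    # Power iteration for the leading eigenvector of B.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]

    # Rayleigh quotient gives the matching eigenvalue (v has unit length).
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * bv[i] for i in range(n))

    # Coordinates are the eigenvector scaled by sqrt(eigenvalue).
    return [x * math.sqrt(max(eigenvalue, 0.0)) for x in v]


# Made-up proximities: rows 0-1 are similar, rows 2-3 are similar.
prox = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
dist = [[1.0 - p for p in row] for row in prox]
coords = classical_mds_1d(dist)
```

With this toy input, rows 0 and 1 should land on one side of zero and rows 2 and 3 on the other. The sign of the axis itself is arbitrary.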