Machine learning is a branch of artificial intelligence that gives a system the ability to learn from past experience and improve. Decision trees and random forests are two supervised machine learning techniques. A decision tree is a simple, tree-shaped decision-making diagram. For a large, complex dataset, however, a single decision tree is often not enough to make reliable predictions. A random forest, on the other hand, is a collection of decision trees, and its output is obtained by combining the outputs of all of its trees.

What is a Decision Tree?

Places like Starbucks test people's decision-making abilities 😉. For one cup of coffee, we need to make seven or eight decisions: small or large, sugar-free, strong or mild, dark, low fat, no fat, and so on. A decision tree works just like that.

A decision tree is a supervised machine learning algorithm used to solve both regression and classification problems. It builds the model in the form of a tree structure with decision nodes and leaf nodes. A decision node tests a feature and has two or more branches, one for each outcome of the test. A leaf node represents a decision, that is, the final prediction. The topmost decision node is the root node. Decision trees can handle both categorical and continuous data.
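To make the node vocabulary concrete, here is a minimal sketch of a tree as a data structure. It is an illustration under simplifying assumptions, not how any particular library stores its trees: every decision node makes a binary split on a numeric threshold, and the tree is built by hand rather than learned from data. The names `Leaf`, `DecisionNode`, `predict`, and the coffee features are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """A leaf node: holds the final decision (the prediction)."""
    prediction: str


@dataclass
class DecisionNode:
    """A decision node: tests one feature and branches on the result."""
    feature: str      # name of the feature to test
    threshold: float  # go left if value <= threshold, otherwise right
    left: Union["DecisionNode", Leaf]
    right: Union["DecisionNode", Leaf]


def predict(node, sample):
    """Walk from the root down to a leaf and return its prediction."""
    while isinstance(node, DecisionNode):
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


# Hypothetical hand-built tree (assumption: not learned from data).
# The root tests "caffeine_mg"; its right branch tests "sugar_g".
root = DecisionNode(
    feature="caffeine_mg", threshold=50.0,
    left=Leaf("decaf"),
    right=DecisionNode(
        feature="sugar_g", threshold=5.0,
        left=Leaf("strong black"),
        right=Leaf("sweet latte"),
    ),
)

print(predict(root, {"caffeine_mg": 120.0, "sugar_g": 2.0}))  # strong black
```

Prediction is just a walk from the root to a leaf, which is part of why a single tree is cheap to evaluate and easy to read.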

Advantages

  • Simple and easy to understand.
  • Needs relatively little computation.
  • Can handle both categorical and continuous data.
  • Easy visualization.
  • Gives a clear idea of which features are important for classification.

Disadvantages

  • Prone to overfitting: a deep tree has high variance, while a very shallow tree can underfit (high bias).
  • Often gives less accurate predictions than an ensemble of trees.

What is a Random Forest?

Suppose you want to go on vacation with your family, and you want to pick a place that everyone will enjoy. To find candidates, you check online, ask your friends, or read travel blogs, and you make a list of all the recommended places. Then you ask your family members to vote, and the place with the most votes becomes your final choice.

This whole process consists of two steps. First, make the list of all the recommended places. Second, hold a vote to select the best place. Gathering recommendations and then voting for the best one is how the random forest algorithm works.

Random forest is an ensemble method that builds many decision trees on random samples of the data. The group of trees is called the forest. Each tree is trained on its own independent random sample and is grown using an attribute selection measure such as information gain, gain ratio, or the Gini index. For classification problems, each tree votes and the most popular class is the final result. For regression problems, the average of all the trees' predictions is the final result.
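The sampling and aggregation steps can be sketched in a few lines of plain Python. This is a rough illustration, not a full random forest: it assumes the trees have already been trained, so the per-tree predictions are hard-coded, and the helper names `bootstrap_sample`, `majority_vote`, and `average` are made up for this example. Sampling with replacement (bootstrapping) is the usual way the random samples are drawn.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Classification: return the class predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Regression: return the mean of the trees' predictions."""
    return mean(predictions)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = [("sunny", 30), ("rainy", 18), ("cloudy", 22), ("sunny", 28)]

# Each tree in the forest would be trained on its own random sample.
samples = [bootstrap_sample(rows, rng) for _ in range(3)]

# Hypothetical outputs from already-trained trees (assumed values).
class_votes = ["beach", "mountains", "beach", "city", "beach"]
value_votes = [3.0, 4.0, 5.0]

print(majority_vote(class_votes))  # beach
print(average(value_votes))        # 4.0
```

Because a bootstrap sample is drawn with replacement, some rows appear more than once and others are left out, which is what makes the trees differ from each other.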

Advantages

  • Builds a robust model.
  • Much less prone to overfitting than a single decision tree.
  • Can be used for both classification and regression problems.
  • Usually gives more accurate predictions than a single tree.
  • Often more powerful than many other non-linear models.

Disadvantages

  • Time-consuming. Because every tree in the forest has to be evaluated, it is slower at generating predictions.
  • Complex to interpret.

Decision Tree vs Random Forest

Decision trees are simple but suffer from some serious problems: overfitting, and errors due to variance or bias. A random forest is a collection of decision trees whose outputs are aggregated into a single result. Using multiple trees reduces the chance of overfitting, but it also makes the model harder to understand: a decision tree is easy to read, whereas a random forest is more complicated to interpret.
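A toy simulation gives some intuition for why combining many trees steadies the result. This is only a sketch under a strong assumption: each "tree" is modeled as the true value plus independent random noise. Real trees in a forest are correlated, so the improvement in practice is smaller than what this shows. The constants and the function `noisy_tree_prediction` are hypothetical.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable
TRUE_VALUE = 10.0        # the quantity every "tree" tries to predict
N_TREES = 25             # assumed forest size for this toy example
N_TRIALS = 2000          # how many times the experiment is repeated


def noisy_tree_prediction():
    """Stand-in for one tree: the truth plus independent random noise."""
    return TRUE_VALUE + rng.gauss(0.0, 1.0)


# Predictions from a single tree, repeated many times.
single = [noisy_tree_prediction() for _ in range(N_TRIALS)]

# Predictions from a forest: the average of N_TREES noisy trees.
forest = [
    mean(noisy_tree_prediction() for _ in range(N_TREES))
    for _ in range(N_TRIALS)
]

single_spread = pstdev(single)
forest_spread = pstdev(forest)
print(f"spread of one tree:        {single_spread:.2f}")
print(f"spread of 25-tree average: {forest_spread:.2f}")
```

With independent noise, the spread of the average shrinks roughly with the square root of the number of trees, so the averaged prediction varies much less from run to run than any single tree does.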

A single decision tree is usually less accurate but fast to build. Adding more trees gives a more robust model and helps prevent overfitting. In a forest, however, every tree has to be generated, processed, and analyzed, so the process is slower and, on large datasets, can sometimes take hours or even days.

Decision Tree                          | Random Forest
A tree-like decision-making diagram.   | A group of decision trees combined to give one output.
Possibility of overfitting.            | Reduces overfitting.
Gives less accurate results.           | Gives more accurate results.
Simple and easy to interpret.          | Hard to interpret.
Less computation.                      | More computation.
Simple to visualize.                   | Complex to visualize.
Fast to process.                       | Slow to process.