46 K. Torkkola et al.

Each tree in the Random Forest is grown according to the following parameters:

1. A number m is specified that is much smaller than the total number of input variables M (typically m is proportional to √M).
2. Each tree is grown to maximum depth (until pure nodes are reached) using a bootstrap sample of the training set.
3. At each node, m out of the M variables are selected at random.
4. The split used is the best possible split on these m variables only.

Note that bootstrap sampling is applied for each tree to be constructed: a different sample set of training data is drawn with replacement. The size of the sample set is the same as the size of the original dataset. This means that some individual samples will be duplicated, while typically about a third of the data is left out of the sample ("out-of-bag"). This out-of-bag data provides an unbiased estimate of the performance of the tree.

Also note that the sampled variable set does not remain constant while a tree is grown. Each new node in a tree is constructed based on a different random sample of m variables. The best split among these m variables is chosen for the current node, in contrast to typical decision tree construction, which selects the best split among all possible variables. This keeps the errors made by the individual trees of the forest largely uncorrelated.

Once the forest is grown, a new sensor reading vector is classified by every tree of the forest. Majority voting among the trees then produces the final classification decision.

We will be using RF throughout our experimentation because of its simplicity and excellent performance. In general, RF is resistant to irrelevant variables, it can handle massive numbers of variables and observations, and it can handle mixed-type data and missing data.
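The sampling and voting steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and it does not grow any trees. It only shows the bootstrap draw with its out-of-bag remainder, the per-node random choice of m variables, and the majority vote. The function names, the dataset size, the variable count, and the class labels are all hypothetical, and m is set to exactly √M as one possible choice of "proportional to √M".

```python
import math
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n sample indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


def candidate_features(M, m, rng):
    """Pick m distinct variable indices out of M; redrawn at every node."""
    return rng.sample(range(M), m)


def majority_vote(predictions):
    """Return the most common class label among the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)            # fixed seed, only for repeatability
n_samples, M = 10_000, 25         # hypothetical dataset size and variable count
m = max(1, round(math.sqrt(M)))   # assumption: m taken as exactly sqrt(M)

# One bootstrap sample per tree: same size as the dataset, with duplicates.
in_bag, oob = bootstrap_indices(n_samples, rng)
oob_fraction = len(oob) / n_samples   # approaches 1/e (about 0.37) for large n

# A fresh subset of m variables is considered at each node of each tree.
features_at_node = candidate_features(M, m, rng)

# Hypothetical per-tree outputs for one sensor reading vector.
decision = majority_vote(["lane_change", "straight", "lane_change"])
```

In a full implementation, each tree would search for its best split only among `features_at_node`, and the indices in `oob` would be used to estimate that tree's error on data it never saw during training.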
Our data is definitely of mixed type: some variables are continuous and some are discrete, although we do not have missing data, since the source is the simulator.

Random Forests for Driving .