Description
- Implement a binary decision tree with no pruning using the ID3 (Iterative Dichotomiser 3) algorithm[1].
Format for calling the program, with example output:
$ python ID3.py ./path/to/train-file ./path/to/test-file vanilla 80
Train set accuracy: 0.9123
Test set accuracy: 0.8123
The fourth argument (80) is the training set percentage. The above example command means we use only the first 80% of the training data from train-file. (We use all of the test data from test-file.)
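As a refresher on how ID3 picks splits, the sketch below shows the entropy and information-gain computations at the core of the algorithm. The helper names (`entropy`, `information_gain`) are illustrative, not required by the assignment:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Gain of splitting on a feature: H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(feature_values):
        subset = [lab for lab, f in zip(labels, feature_values) if f == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain
```

At each node, ID3 evaluates `information_gain` for every remaining feature and splits on the one with the highest gain.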
- Implement a binary decision tree with a given maximum depth. Format for calling the program, with example output:
$ python ID3.py ./path/to/train-file ./path/to/test-file depth 50 40 14
Train set accuracy: 0.9123
Validation set accuracy: 0.8523
Test set accuracy: 0.8123
The fourth argument (50) is the training set percentage and the fifth argument (40) is the validation set percentage. The sixth argument (14) is the value of maximum depth.
So, for example, the above command would get a training set from the first 50% of train-file and get a validation set from the last 40% of train-file (the two numbers need not add up to 100% because we sometimes use less training data). Finally, we set the maximum depth of the decision tree as 14. As before, we get the full test set from test-file.
Note: you have to print the validation set accuracy for this case.
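The "first X% for training, last Y% for validation" split described above could be implemented as follows (a minimal sketch assuming the examples are already loaded as a list; `split_data` is an illustrative name):

```python
def split_data(rows, train_pct, valid_pct):
    """Take the first train_pct% of rows for training and the
    last valid_pct% for validation; the middle may be unused."""
    n = len(rows)
    train = rows[: int(n * train_pct / 100)]
    valid = rows[n - int(n * valid_pct / 100):] if valid_pct else []
    return train, valid
```

For example, with 10 rows and `split_data(rows, 50, 40)`, training gets rows 0-4 and validation gets rows 6-9, leaving row 5 unused, which is why the percentages need not sum to 100.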
- Implement a binary decision tree with a given minimum sample split size. Format for calling the program, with example output:
$ python ID3.py ./path/to/train-file ./path/to/test-file min_split 50 40 2
Train set accuracy: 0.9123
Validation set accuracy: 0.8523
Test set accuracy: 0.8123
The fourth argument (50) is the training set percentage, the fifth argument (40) is the validation set percentage, and the sixth argument (2) is the minimum number of samples required to split a node.
The above example command would get a training set from the first 50% of train-file and a validation set from the last 40% of train-file (the two numbers need not add up to 100% because we sometimes use less training data). Finally, we set the minimum number of samples to split on to 2. As before, we get the full test set from test-file.
Note: you have to print the validation set accuracy for this case.
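Both the depth and min_split variants only change when the recursion stops growing the tree. One way to express the combined stopping condition (a sketch; the function name and signature are illustrative):

```python
def should_stop(labels, depth, max_depth=None, min_split=2):
    """Stop recursion when the node is pure, the depth limit is
    reached, or there are too few samples to justify a split."""
    if len(set(labels)) <= 1:                       # pure node
        return True
    if max_depth is not None and depth >= max_depth:  # depth cap
        return True
    if len(labels) < min_split:                     # too few samples
        return True
    return False
```

When this returns True, the node becomes a leaf labeled with the majority class of the samples that reached it.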
- Implement a binary decision tree with post-pruning using reduced error pruning.
Format for calling the program, with example output:
$ python ID3.py ./path/to/train-file ./path/to/test-file prune 50 40
Train set accuracy: 0.9123
Test set accuracy: 0.8123
The fourth argument (50) is the training set percentage and the fifth argument (40) is the validation set percentage.
So, for example, the above command would get a training set from the first 50% of train-file and get a validation set from the last 40% of train-file. As before, we get the full test set from test-file.
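Reduced error pruning works bottom-up: a subtree is replaced by a leaf with its majority training label whenever that does not hurt accuracy on the validation examples reaching it. A minimal sketch, assuming a tree of dicts with `"feature"`, `"true"`, `"false"`, and `"majority"` keys (all illustrative; leaves are bare labels):

```python
def predict(node, row):
    """Walk from the root to a leaf; leaves are bare class labels."""
    while isinstance(node, dict):
        node = node["true"] if row[node["feature"]] else node["false"]
    return node

def reduced_error_prune(node, vrows, vlabels):
    """Prune children first, then test replacing this subtree with a
    leaf holding its majority training label, using only the
    validation examples routed to this node."""
    if not isinstance(node, dict):
        return node
    # Partition the validation examples between the two branches.
    tr = [(r, y) for r, y in zip(vrows, vlabels) if r[node["feature"]]]
    fr = [(r, y) for r, y in zip(vrows, vlabels) if not r[node["feature"]]]
    node["true"] = reduced_error_prune(
        node["true"], [r for r, _ in tr], [y for _, y in tr])
    node["false"] = reduced_error_prune(
        node["false"], [r for r, _ in fr], [y for _, y in fr])
    if not vlabels:            # no validation data reaches this node
        return node
    kept = sum(predict(node, r) == y for r, y in zip(vrows, vlabels))
    pruned = sum(node["majority"] == y for y in vlabels)
    # Prune when the leaf does at least as well as the subtree.
    return node["majority"] if pruned >= kept else node
```

Because replacing a subtree only changes predictions for the validation examples routed into it, this local comparison matches comparing whole-tree validation accuracy before and after pruning that node.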