JIST2015 Data Challenge
- Apply Filters for Attributes
  - Convert all properties to nominal attributes, selecting them with this regex pattern:
    - String regex = "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
  - Replace every {0} or {1} with {0,1} in train.arff and test.arff so the two files are compatible (Weka rejects a test set whose attribute declarations differ from the training set's)
  - Normalize all numeric attributes to the [0,1] interval
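A minimal plain-Java sketch of the three preparation steps above. It assumes the regex is applied to attribute strings (the notes do not say whether names or values) and that the {0}/{1} mismatch sits in @attribute declarations. It does not call Weka; in the Explorer the normalization step corresponds to the Normalize filter. The class name, method names, and the attribute name isCapital are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

// Plain-Java sketch of the attribute-preparation steps; it does not use the Weka API.
final class AttributePrep {
    // URL pattern copied from the notes; anchored at the start only (^, no $).
    static final Pattern URL = Pattern.compile(
            "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    // True if the string begins with a URL, i.e. a candidate for a nominal attribute.
    static boolean looksLikeUrl(String s) {
        return URL.matcher(s).find();
    }

    // Rewrites {0} or {1} to {0,1} in an @attribute declaration; other lines are returned unchanged.
    static String harmonizeHeaderLine(String line) {
        if (!line.trim().toLowerCase(Locale.ROOT).startsWith("@attribute")) {
            return line;
        }
        return line.replace("{0}", "{0,1}").replace("{1}", "{0,1}");
    }

    // Min-max scaling into [0,1] with a given min and max (for example, taken from the training set).
    static double[] minMaxScale(double[] values, double min, double max) {
        double range = max - min;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant column has no spread; this sketch maps it to 0.
            scaled[i] = range == 0 ? 0.0 : (values[i] - min) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUrl("http://dbpedia.org/resource/Berlin")); // true
        System.out.println(looksLikeUrl("Berlin"));                             // false
        System.out.println(harmonizeHeaderLine("@attribute isCapital {1}"));    // @attribute isCapital {0,1}
        double[] scaled = minMaxScale(new double[] {2.0, 4.0, 6.0}, 2.0, 6.0);
        System.out.println(Arrays.toString(scaled));                            // [0.0, 0.5, 1.0]
    }
}
```

Scaling the test data with the training set's minimum and maximum keeps both files on the same scale; test values outside the training range can then fall slightly outside [0,1].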
- Save Predictions to a File
  - [Visualize classifier errors] -> save as an .arff file
  - Remove the ARFF header and import the remaining data into a .csv file
  - The final two columns show the predicted class and the actual class
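Once the exported rows are in a .csv file, the last two columns can be compared directly. The sketch below assumes the predicted class comes before the actual class, as in the notes, and that no field contains a quoted comma; PredictionReader, its methods, and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: compares the last two columns of rows exported via "Visualize classifier errors".
final class PredictionReader {
    // Returns {predicted, actual} for each non-empty row, taken from its last two fields.
    static List<String[]> parse(List<String> rows) {
        List<String[]> pairs = new ArrayList<>();
        for (String row : rows) {
            if (row.trim().isEmpty()) {
                continue;
            }
            // Naive split: assumes no quoted field contains a comma.
            String[] fields = row.split(",", -1);
            if (fields.length < 2) {
                continue;
            }
            pairs.add(new String[] {fields[fields.length - 2].trim(), fields[fields.length - 1].trim()});
        }
        return pairs;
    }

    // Number of rows whose predicted class equals the actual class.
    static int countCorrect(List<String[]> pairs) {
        int correct = 0;
        for (String[] p : pairs) {
            if (p[0].equals(p[1])) {
                correct++;
            }
        }
        return correct;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("0.3,5.1,yes,yes", "0.7,2.0,no,yes", "1.2,3.3,no,no");
        List<String[]> pairs = parse(rows);
        System.out.println(countCorrect(pairs) + " of " + pairs.size() + " correct"); // 2 of 3 correct
        for (String[] p : pairs) {
            if (!p[0].equals(p[1])) {
                // Prints: error: predicted no, actual yes
                System.out.println("error: predicted " + p[0] + ", actual " + p[1]);
            }
        }
    }
}
```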
- Parameter Selection
  - Use CVParameterSelection as the classifier
  - Choose the base classifier whose parameters are to be optimized
  - Add each parameter to be tuned to CVParameters
    - e.g., I 10 250 25 for Random Forest tests 10, 20, ..., 250 (25 evenly spaced values, a step of 10) for the [number of trees] parameter
  - The output reports the best options found, e.g. "Classifier Options: -I 90 -K 0 -S 1"
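The arithmetic behind the I 10 250 25 example can be checked with a short sketch. It assumes evenly spaced candidates with increment (upper - lower) / (steps - 1), which reproduces the 10, 20, ..., 250 sequence in the notes; CvGrid is a hypothetical helper, not part of Weka.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: expands a CVParameters-style entry "name lower upper steps" into candidate values.
final class CvGrid {
    static List<Double> expand(String spec) {
        String[] parts = spec.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected: name lower upper steps");
        }
        double lower = Double.parseDouble(parts[1]);
        double upper = Double.parseDouble(parts[2]);
        int steps = Integer.parseInt(parts[3]);
        List<Double> values = new ArrayList<>();
        if (steps < 2) {
            // With fewer than two steps this sketch returns only the lower bound.
            values.add(lower);
            return values;
        }
        double increment = (upper - lower) / (steps - 1);
        for (int i = 0; i < steps; i++) {
            // Index-based to avoid accumulating rounding error from repeated addition.
            values.add(lower + i * increment);
        }
        return values;
    }

    public static void main(String[] args) {
        List<Double> v = expand("I 10 250 25");
        // Prints: 25 values: 10.0, 20.0 ... 250.0
        System.out.println(v.size() + " values: " + v.get(0) + ", " + v.get(1) + " ... " + v.get(v.size() - 1));
    }
}
```

Every candidate is evaluated by cross-validation, so the grid size multiplies training time: with 10-fold cross-validation, 25 values mean about 250 Random Forest fits for this one parameter.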
- Feature Selection
  - Use fast feature (attribute) selection by ranking; a search-based method is not feasible because the number of attributes is large
  - GainRatioAttributeEval with Ranker
  - Run experiments with the Ranker threshold set to 0.1, 0.2, 0.3, and 0.4 (attributes scoring below the threshold are discarded) to see which works best -> 0.2
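To make the ranking step concrete, here is a sketch of the gain ratio score, GainRatio(C, A) = (H(C) - H(C | A)) / H(A), computed for nominal attributes, followed by a threshold filter that keeps attributes scoring at or above the threshold, best first. It is a plain-Java illustration, not Weka's GainRatioAttributeEval or Ranker; keeping a tie at the threshold and returning 0 for a single-valued attribute are choices of this sketch. The attribute names and scores in main are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of gain-ratio ranking with a threshold; not Weka's implementation.
final class GainRatioRanker {
    private static final double LN2 = Math.log(2);

    // Shannon entropy (bits) of the value distribution in xs.
    static double entropy(List<String> xs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String x : xs) {
            counts.merge(x, 1, Integer::sum);
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / xs.size();
            h -= p * Math.log(p) / LN2;
        }
        return h;
    }

    // H(class | attribute): class entropy within each attribute value, weighted by its frequency.
    static double conditionalEntropy(List<String> attr, List<String> cls) {
        Map<String, List<String>> groups = new HashMap<>();
        for (int i = 0; i < attr.size(); i++) {
            groups.computeIfAbsent(attr.get(i), k -> new ArrayList<>()).add(cls.get(i));
        }
        double h = 0.0;
        for (List<String> g : groups.values()) {
            h += (double) g.size() / attr.size() * entropy(g);
        }
        return h;
    }

    // GainRatio = (H(class) - H(class | attr)) / H(attr); 0 for a single-valued attribute.
    static double gainRatio(List<String> attr, List<String> cls) {
        if (new HashSet<>(attr).size() < 2) {
            return 0.0;
        }
        return (entropy(cls) - conditionalEntropy(attr, cls)) / entropy(attr);
    }

    // Attribute names scoring at or above the threshold, highest score first.
    static List<String> rank(Map<String, Double> scores, double threshold) {
        List<String> kept = new ArrayList<>();
        scores.entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .forEach(e -> kept.add(e.getKey()));
        return kept;
    }

    public static void main(String[] args) {
        List<String> cls = Arrays.asList("yes", "yes", "no", "no");
        // A perfectly predictive attribute; prints 1.000
        System.out.println(String.format("%.3f", gainRatio(Arrays.asList("a", "a", "b", "b"), cls)));

        Map<String, Double> scores = new LinkedHashMap<>(); // made-up scores
        scores.put("attrA", 0.35);
        scores.put("attrB", 0.12);
        scores.put("attrC", 0.27);
        for (double t : new double[] {0.1, 0.2, 0.3, 0.4}) {
            // 0.1 -> [attrA, attrC, attrB], 0.2 -> [attrA, attrC], 0.3 -> [attrA], 0.4 -> []
            System.out.println(t + " -> " + rank(scores, t));
        }
    }
}
```

Only the score-and-cut step is shown; picking 0.2 in the notes came from comparing classifier results at each threshold, which this sketch does not do.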
- Consider the index (can it improve performance?)
  - Used the formula 1 - index/size
  - Answer:
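The notes do not say what the index refers to. Assuming it is a 0-based position (for example, a row's position in the data file) and size is the number of rows, the feature can be computed as below; IndexFeature is a hypothetical name. The cast matters: in Java, 1 - index / size with int operands evaluates to 1 for every index smaller than size.

```java
import java.util.Arrays;

// Sketch of the index feature 1 - index/size, assuming a 0-based index and size = number of rows.
final class IndexFeature {
    static double indexScore(int index, int size) {
        if (size <= 0 || index < 0 || index >= size) {
            throw new IllegalArgumentException("expected 0 <= index < size");
        }
        // Cast before dividing: with int operands, 1 - index / size is 1 for every index < size.
        return 1.0 - (double) index / size;
    }

    // Feature value for every row position 0..size-1 (first row 1.0, last row 1/size).
    static double[] column(int size) {
        double[] values = new double[size];
        for (int i = 0; i < size; i++) {
            values[i] = indexScore(i, size);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(column(4))); // [1.0, 0.75, 0.5, 0.25]
    }
}
```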