On Imbalanced and High Dimensional Low Sample Size Classification
In this talk, we will address some challenges and issues related to classifying imbalanced, high-dimensional, low-sample-size data sets. We will discuss both binary classifiers and multi-category classification methods. Two popular methods, Support Vector Machine (SVM) and Distance Weighted Discrimination (DWD), will be used as examples. Novel classification methods that possess the merits of both methods are proposed. We show that the new classifier inherits the merit of DWD and hence overcomes the data-piling and overfitting issues of SVM. On the other hand, the new method is not subject to the imbalanced-data issue, insensitivity to which is a main advantage of SVM over DWD. Several theoretical properties, including Fisher consistency and asymptotic normality of the DWSVM solution, are developed. We use some simulated examples to show that the new method can compete with DWD and SVM on both classification performance and interpretability.