The resources are divided into several repositories.
- ocal-evaluation: Contains the scripts to run the experiments and to analyze the results. The package README is a step-by-step guide to reproducing the experiments described in the companion paper. Running the benchmark is compute-intensive and takes many CPU hours. Therefore, we also provide the results for download (1.2 GB).
- OneClassActiveLearning.jl: A Julia package that implements various query strategies and an active-learning cycle.
- SVDD.jl: A Julia package for Support Vector Data Description.
The code is licensed under an MIT License and the result data under a Creative Commons Attribution 4.0 International License.
Active learning is a family of methods that improve classification quality through user feedback. An important subcategory is active learning for one-class classifiers, i.e., methods specialized to imbalanced class distributions. In this research project, we review, categorize, and benchmark active learning for one-class classification. The goal is to make it easier to assess novel research contributions in this field and to facilitate the selection of a suitable active-learning method.
Active learning for one-class classification can be separated into three building blocks:
- Learning Scenario: the assumptions under which different methods are compared.
- Base Learner: a one-class classifier.
- Query Strategy: the algorithm that selects observations for user feedback.
The companion paper details these building blocks.
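To make the interplay of the three building blocks concrete, here is a minimal sketch of a generic pool-based active-learning cycle. This is an illustration only, not the API of OneClassActiveLearning.jl; all component names (`fit`, `score`, `query`, `oracle`) are hypothetical placeholders standing in for the base learner, the evaluation, the query strategy, and the user.

```python
import numpy as np

def active_learning_loop(X, oracle, fit, score, query, n_queries=5):
    """Generic pool-based active-learning cycle (illustrative sketch)."""
    labels = {}    # user feedback gathered so far: index -> label
    history = []   # classification quality after each iteration
    for _ in range(n_queries):
        model = fit(X, labels)                    # train the one-class base learner
        history.append(score(model, X, labels))   # record quality for the progress curve
        idx = query(model, X, labels)             # query strategy selects an observation
        labels[idx] = oracle(idx)                 # the user provides feedback
    return history, labels

# Toy instantiation; every component below is a hypothetical stand-in.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
oracle = lambda i: int(np.linalg.norm(X[i]) < 1.5)       # 1 = inlier, 0 = outlier
fit = lambda X, labels: dict(labels)                     # stand-in "model"
score = lambda model, X, labels: len(labels) / len(X)    # placeholder quality metric
query = lambda model, X, labels: min(set(range(len(X))) - set(labels))
history, labels = active_learning_loop(X, oracle, fit, score, query)
```

In a real benchmark, `fit` would train a one-class classifier such as SVDD, `score` would evaluate it on a held-out test split, and `query` would implement one of the query strategies compared in the paper.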
Active learning strategies can be compared with different objectives in mind. The first objective is an overall comparison of different learning scenarios, base learners, and query strategies. The evaluation singles out areas in the experimental space where active learning strategies perform well. For example, active learning results in different classification quality depending on the assumptions about how data from the minority class is distributed, and on the selection of training and test splits.
The second objective is the selection of a query strategy for a specific application, in which the data set and learning scenario are given. In this case, active-learning progress curves are helpful to visualize the classification quality of different query strategies. Further, summary statistics make it possible to compare characteristics of progress curves. For example, one can compare two query strategies based on their ramp-up performance, i.e., the quality increase over the first active-learning iterations.
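Two such summary statistics can be sketched as follows. This is an illustrative example, not the statistics implemented in the packages: `ramp_up` measures the quality increase over the first k iterations of a progress curve, and `curve_auc` is the trapezoidal area under the curve, normalized so that a constant quality of 1.0 yields an AUC of 1.0.

```python
def ramp_up(curve, k=5):
    """Quality increase over the first k active-learning iterations."""
    return curve[min(k, len(curve) - 1)] - curve[0]

def curve_auc(curve):
    """Normalized trapezoidal area under a progress curve.

    Assumes quality values (e.g., accuracy) already lie in [0, 1],
    so the result is in [0, 1] as well."""
    trapezoids = ((a + b) / 2 for a, b in zip(curve[:-1], curve[1:]))
    return sum(trapezoids) / (len(curve) - 1)

# A hypothetical progress curve: quality after each feedback iteration.
curve = [0.60, 0.70, 0.78, 0.80, 0.81, 0.82]
```

A strategy with a steep `ramp_up` is preferable when the feedback budget is small, while `curve_auc` summarizes quality over the whole active-learning run.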
For further results and details on the figures, we refer to the companion paper.
We welcome contributions to the packages and bug reports on GitHub.
For questions and comments, please contact email@example.com or firstname.lastname@example.org.