An Overview and a Benchmark of Active Learning for Outlier Detection with One-Class Classifiers

Holger Trittenbach, Adrian Englhardt, Klemens Böhm

This is the companion website for the publication

Holger Trittenbach, Adrian Englhardt, Klemens Böhm, “An overview and a benchmark of active learning for outlier detection with one-class classifiers”, DOI: 10.1016/j.eswa.2020.114372, Expert Systems with Applications, 2021.

This website provides code and descriptions to reproduce the experiments and analyses. The description covers the full experimental pipeline, from preprocessing the raw data to generating the plots and tables shown in the paper. Citing this work:

@article{trittenbach2021overview,
    title = {An overview and a benchmark of active learning for outlier detection with one-class classifiers},
    journal = {Expert Systems with Applications},
    volume = {168},
    pages = {114372},
    year = {2021},
    issn = {0957-4174},
    doi = {10.1016/j.eswa.2020.114372},
    url = {https://www.sciencedirect.com/science/article/pii/S0957417420310496},
    author = {Holger Trittenbach and Adrian Englhardt and Klemens Böhm}
}

Resources

The resources are divided into several repositories.

The code is licensed under a MIT License and the result data under a Creative Commons Attribution 4.0 International License.

Overview

Active learning refers to methods that improve classification quality through user feedback. An important subcategory is active learning for one-class classifiers, i.e., methods tailored to imbalanced class distributions. In this research project, we review, categorize and benchmark active learning for one-class classification. The goal is to make it easier to assess novel research contributions in this field, and to facilitate the selection of a suitable active-learning method.

Active learning for one-class classification can be separated into three building blocks: the learning scenario, the base learner, and the query strategy.

The companion paper details these building blocks.
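To make the interplay of these building blocks concrete, the following is a minimal sketch of an active learning loop in Python. It is not the implementation from the companion repositories; it assumes scikit-learn's OneClassSVM as a stand-in base learner and an illustrative query strategy that selects the observation closest to the current decision boundary. Since OneClassSVM itself does not use the collected labels, the sketch only illustrates the structure of the loop.

import numpy as np
from sklearn.svm import OneClassSVM

def query_closest_to_boundary(clf, X, candidates):
    # Illustrative query strategy: pick the unlabeled observation
    # closest to the current decision boundary.
    scores = np.abs(clf.decision_function(X[candidates]))
    return candidates[int(np.argmin(scores))]

def active_learning_loop(X, oracle, budget=10):
    # X:      data matrix of shape (n_samples, n_features)
    # oracle: callable idx -> "inlier" or "outlier", standing in for user feedback
    # budget: number of feedback iterations
    clf = OneClassSVM(nu=0.1, gamma="scale")   # base learner
    feedback = {}                              # labels collected from the user
    candidates = list(range(len(X)))           # unlabeled pool (learning scenario)

    for _ in range(budget):
        clf.fit(X)                             # refit; a semi-supervised one-class
                                               # learner would also use `feedback` here
        idx = query_closest_to_boundary(clf, X, np.array(candidates))  # query strategy
        feedback[idx] = oracle(idx)            # ask the user for a label
        candidates.remove(idx)

    return clf, feedback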

Evaluation Examples

Active learning strategies can be compared with different objectives in mind. The first objective is an overall comparison across learning scenarios, base learners and query strategies. The evaluation singles out areas of the experimental space where active learning strategies perform well. For example, active learning results in different classification quality depending on the assumptions about how data from the minority class is distributed, and on the selection of training and test splits.

The second objective is the selection of a query strategy for a specific application, in which the data set and the learning scenario are given. In this case, active-learning progress curves are helpful to visualize the classification quality of different query strategies. Further, summary statistics allow one to compare characteristics of progress curves. For example, one can compare two query strategies based on their ramp-up performance, i.e., the quality increase in the first active learning iterations.
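As a sketch of such a summary statistic, the following Python snippet computes ramp-up as the quality increase after the first k feedback iterations. The progress-curve values are placeholders for illustration, not results from the paper.

import numpy as np

def ramp_up(progress_curve, k=5):
    # progress_curve: classification quality (e.g., MCC) per iteration,
    #                 starting with the quality of the initial model
    curve = np.asarray(progress_curve, dtype=float)
    k = min(k, len(curve) - 1)
    return curve[k] - curve[0]

# Placeholder progress curves for two hypothetical query strategies.
strategy_a = [0.55, 0.62, 0.70, 0.74, 0.75, 0.76]
strategy_b = [0.55, 0.57, 0.60, 0.66, 0.72, 0.77]

print("ramp-up A:", ramp_up(strategy_a))   # ~0.21
print("ramp-up B:", ramp_up(strategy_b))   # ~0.22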

For further results and details on the figures, we refer to the companion paper.

Contact

We welcome contributions to the packages and bug reports on GitHub.

For questions and comments, please contact holger.trittenbach@kit.edu and adrian.englhardt@kit.edu.