Exploring the Unknown - Query Synthesis in One-Class Active Learning

Adrian Englhardt, Klemens Böhm

This is the companion website for the manuscript

Adrian Englhardt, Klemens Böhm, “Exploring the Unknown - Query Synthesis in One-Class Active Learning”. In: Proceedings of the 2020 SIAM International Conference on Data Mining (SDM), May 7-9, Cincinnati, Ohio, USA. [PDF]

This website provides code and descriptions to reproduce the experiments and analyses. The description covers the full experimental pipeline, from preprocessing the raw data to generating the plots and tables shown in the paper.

Citing this work:

@inproceedings{englhardt2020des,
  title={Exploring the Unknown - Query Synthesis in One-Class Active Learning},
  author={Englhardt, Adrian and B{\"o}hm, Klemens},
  booktitle={Proceedings of the 2020 SIAM International Conference on Data Mining},
  year={2020},
  organization={SIAM}
}

Resources

The resources are divided into several repositories.

The code is licensed under an MIT License and the result data under a Creative Commons Attribution 4.0 International License.

Overview

The quality and quantity of training data have crucial implications for the usefulness of a classifier. When data collection is limited or expensive, the collected data set is often small and biased, i.e., it does not represent the real data distribution well. In this case, data synthesis helps to extend the sample. Since the data domain is often unknown and/or high-dimensional, extending the data by hand is challenging; we therefore propose to use active learning. In our article, we focus on expanding the classifier knowledge beyond an initial sample of inliers that form a single connected inlier region in an unbounded space. We call this the domain expansion problem. To this end, we develop a novel Domain Expansion Strategy (DES) that performs query synthesis in one-class active learning. DES generates synthetic queries, which are then labeled, and the resulting feedback is used to update the classifier.
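To make the loop of synthesizing a query, obtaining a label, and updating the classifier more concrete, here is a minimal Python sketch. It is not the DES algorithm from the paper and does not use SVDDneg. It assumes a toy circular model with a fixed center, a random query direction in place of an information-gain criterion, and a hypothetical `oracle` function standing in for the annotator. All names and parameters (`run_query_synthesis`, `step`, the radius-3 ground truth) are illustrative assumptions.

```python
import math
import random


def oracle(point):
    # Hypothetical ground truth standing in for a human annotator:
    # a point is an inlier if it lies within distance 3 of the origin.
    return math.hypot(point[0], point[1]) <= 3.0


def run_query_synthesis(initial_inliers, n_queries=15, step=0.5, seed=0):
    rng = random.Random(seed)
    # Toy one-class model (assumption, not SVDDneg): a circle with a fixed
    # center at the mean of the initial sample and a radius that covers
    # every inlier known so far.
    cx = sum(p[0] for p in initial_inliers) / len(initial_inliers)
    cy = sum(p[1] for p in initial_inliers) / len(initial_inliers)
    inliers = list(initial_inliers)
    outliers = []
    radius = max(math.hypot(p[0] - cx, p[1] - cy) for p in inliers)
    history = [radius]
    for _ in range(n_queries):
        # Synthesize a query just outside the current boundary, in a
        # random direction (a placeholder for a real selection criterion).
        angle = rng.uniform(0.0, 2.0 * math.pi)
        dist = radius + step
        query = (cx + dist * math.cos(angle), cy + dist * math.sin(angle))
        # Ask for a label and feed it back into the model.
        if oracle(query):
            inliers.append(query)
            radius = dist  # expand the boundary to cover the new inlier
        else:
            outliers.append(query)
            step /= 2.0  # be more cautious after hitting an outlier
        history.append(radius)
    return (cx, cy), radius, inliers, outliers, history


# Initial sample: 20 inliers that cover only a small part of the valid space.
sample_rng = random.Random(42)
initial = [(sample_rng.uniform(-1, 1), sample_rng.uniform(-1, 1)) for _ in range(20)]
center, radius, inliers, outliers, history = run_query_synthesis(initial)
print(f"radius after {len(history) - 1} queries: {history[0]:.2f} -> {radius:.2f}")
```

The sketch only shows the control flow. A circle with a fixed center can overshoot the true region in some directions, which is one reason a real strategy needs a more flexible classifier and a principled way to choose where to query.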

Example

This figure gives an example of expanding the classifier knowledge with DES. The black line is the decision boundary fitted by a one-class classifier (SVDDneg). The dashed line is the target boundary that we want to learn. The initial sample does not cover the full valid space and misses large areas to the right. The heatmap indicates the expected information gain a query would yield: the darker, the higher the gain. Over the course of 15 queries, DES expands the classifier knowledge and approximates the real decision boundary quite well.