The resources are divided into several repositories.
- des-evaluation: Contains the scripts to run the experiments and to analyze the results. The package readme is a step-by-step guide to reproducing the experiments described in the companion paper. Running the benchmark is compute-intensive and takes many CPU hours. Therefore, we also provide the results for download (866 MB).
- OneClassActiveLearning.jl: A Julia package that implements various query synthesis strategies and the active learning cycle.
- SVDD.jl: A Julia package for Support Vector Data Description.
The code is licensed under an MIT License and the result data under a Creative Commons Attribution 4.0 International License.
The quality and quantity of training data have crucial implications for the usefulness of a classifier. When data collection is limited or expensive, the collected data set is often small and biased, i.e., it does not represent the real data distribution well. In this case, data synthesis helps to extend the sample. Since the data domain is often unknown and/or high-dimensional, extending the data by hand is challenging, so we propose to use active learning. In our article, we focus on expanding the classifier knowledge beyond an initial sample of inliers that form a single connected inlier region in an unbounded space; we call this the domain expansion problem. To this end, we develop a novel Domain Expansion Strategy (DES) that performs query synthesis in one-class active learning. DES generates synthetic queries, which are then labeled, and the labels are fed back into the classifier.
This figure gives an example of expanding the classifier knowledge with DES. The black line is the decision boundary fitted by a one-class classifier (SVDDneg). The dashed line is the target boundary that we want to learn. The initial sample does not cover the full valid space and misses large areas to the right. The heatmap indicates the expected information gain a query would yield: the darker, the higher the gain. Over the course of 15 queries, DES expands the classifier knowledge and approximates the target boundary quite well.
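
To make the cycle described above more concrete (synthesize a query, obtain a label, feed it back into the classifier), here is a minimal Python sketch. It is not DES and does not use SVDDneg or the Julia packages listed above. Everything in it is an assumption made for illustration: the `oracle` (a disk of radius 2 standing in for the target boundary), the toy nearest-neighbor one-class model in `is_inlier`, the `eps` parameter, and the random-direction query heuristic in `synthesize_query`, which does not estimate information gain.

```python
import math
import random


def oracle(point):
    """Hypothetical labeler: the true inlier region is a disk of radius 2."""
    return math.hypot(point[0], point[1]) <= 2.0


def is_inlier(point, inliers, eps):
    """Toy one-class model: accept points within eps of any labeled inlier."""
    return any(math.dist(point, q) <= eps for q in inliers)


def synthesize_query(inliers, eps, rng):
    """Propose a point just outside the region the toy model currently accepts."""
    while True:
        base = rng.choice(inliers)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        step = rng.uniform(eps, 2.0 * eps)
        candidate = (base[0] + step * math.cos(angle),
                     base[1] + step * math.sin(angle))
        # Keep only candidates the model rejects, so each query probes new space.
        if not is_inlier(candidate, inliers, eps):
            return candidate


def run_cycle(initial_inliers, budget, eps=0.3, seed=0):
    """Run `budget` rounds of: synthesize a query, label it, update the model."""
    rng = random.Random(seed)
    inliers = list(initial_inliers)
    outliers = []
    for _ in range(budget):
        query = synthesize_query(inliers, eps, rng)
        if oracle(query):
            inliers.append(query)   # feedback expands the accepted region
        else:
            outliers.append(query)  # recorded, but unused by this toy model
    return inliers, outliers


# Biased initial sample: inliers drawn only from a small box left of the center.
rng = random.Random(42)
initial = [(rng.uniform(-1.0, 0.0), rng.uniform(-0.5, 0.5)) for _ in range(10)]
inliers, outliers = run_cycle(initial, budget=15)
print(len(inliers) - len(initial), "queries labeled inlier;",
      len(outliers), "labeled outlier")
```

The sketch only mirrors the shape of the loop. A strategy such as DES would replace the random proposal step with one that targets regions of high expected information gain, and a classifier such as SVDDneg would also use the outlier labels when refitting the boundary.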