Understanding the Effects of Temporal Energy-Data Aggregation on Clustering Quality

Holger Trittenbach, Jakob Bach, and Klemens Böhm

This is the companion website for the publication

Holger Trittenbach, Jakob Bach, and Klemens Böhm. "Understanding the Effects of Temporal Energy-Data Aggregation on Clustering Quality." it-Information Technology (2019)

This website provides the code and descriptions needed to reproduce our experiments and analyses. To cite this work:

  @article{trittenbach2019understanding,
    title={Understanding the Effects of Temporal Energy-Data Aggregation on Clustering Quality},
    author={Trittenbach, Holger and Bach, Jakob and B{\"o}hm, Klemens},
    journal={it-Information Technology},
    year={2019},
    publisher={De Gruyter Oldenbourg}
  }


The resources are available in two code repositories and one data repository.



The raw data and result files are available for download as clustagg-data.zip (17 GB packed, 25 GB unpacked). For more information on the energy data, please see HIPE Data. The archive contains two directories:

The data directory has several subdirectories with specific naming conventions:

The experiment run directories contain several files:

The CSV file titles are in German; their English translations are:

| German | English |
| --- | --- |
| Frequenz | Frequency |
| Leistungsfaktor2 | Power Factor |
| Mittelwert_Leiterspannungen | Voltage (average over three phases) |
| Mittelwert_Stromstaerken | Amperage (average over three phases) |
| PositiveWirkenergie_korrigiert | Positive Energy |
| Wirkleistung | Active Power |
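When loading the CSV files programmatically, the translations above can be applied with a small lookup. The following is an illustrative sketch (the function name and fallback behavior are our own, not part of the repository):

```python
# Mapping of German CSV titles to their English equivalents (from the table above).
TITLE_TRANSLATIONS = {
    "Frequenz": "Frequency",
    "Leistungsfaktor2": "Power Factor",
    "Mittelwert_Leiterspannungen": "Voltage (average over three phases)",
    "Mittelwert_Stromstaerken": "Amperage (average over three phases)",
    "PositiveWirkenergie_korrigiert": "Positive Energy",
    "Wirkleistung": "Active Power",
}

def translate_title(title: str) -> str:
    """Return the English title, falling back to the original if unknown."""
    return TITLE_TRANSLATIONS.get(title, title)
```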

Reproduce Experiments

This website is intended as a manual to reproduce our particular experiments, without explaining individual steps. For details, please see the README.md in ImpactAggregationClustering, which describes the relevant files, execution steps, parameters, and execution syntax. Please note that some experiments may take several hours, even with multiple CPU cores (all available cores will be used).

  1. Setup: Follow the setup instructions in ImpactAggregationClustering and FastTSDistances. Copy the raw CSV files of the electrical quantity of interest from data/SparkDatasets/<QUANTITY>/ to a working directory <WORK_DIR>. Do not copy the CSV files ending with _summer or _winter; copy only the consolidated versions from the processed subdirectory.
  2. RDS Preprocessing: Select the preprocessing script from preprocscripts/ that matches the selected <QUANTITY>. Then:
     1. Update the hard-coded path of the input file in the preprocessing script to <WORK_DIR>.
     2. Execute the preprocessing script. This reproduces the .rds files in the data directory.
     3. Delete the .rds files whose names do not end with _valid.
  3. Preprocessing Variable-Length Sequences: This step is only required to extract variable-length sequences, i.e., sequences defined by the start and end of machine activity. Depending on the selected <QUANTITY>, this requires the following additional preprocessing steps:
  4. Experiment: Execute batchscripts/SparkAggregationBasedScript.R, passing the experiment parameters. The parameters used in the publication's experiments are documented in results/params.txt.
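The file-handling parts of the setup and RDS-preprocessing steps above could be scripted as follows. This is a sketch under assumptions: the directory layout (a processed subdirectory per quantity), the _summer/_winter exclusion, and the _valid suffix convention are taken from the steps above, but the helper names are ours and the repository scripts may organize this differently.

```python
from pathlib import Path
import shutil

def select_consolidated_csvs(quantity_dir: Path) -> list[Path]:
    """Pick only the consolidated CSVs from the processed subdirectory,
    skipping the _summer/_winter variants (setup step above)."""
    return [
        p for p in (quantity_dir / "processed").glob("*.csv")
        if not p.stem.endswith(("_summer", "_winter"))
    ]

def copy_to_workdir(quantity_dir: Path, work_dir: Path) -> None:
    """Copy the consolidated CSVs into the working directory."""
    work_dir.mkdir(parents=True, exist_ok=True)
    for csv in select_consolidated_csvs(quantity_dir):
        shutil.copy(csv, work_dir / csv.name)

def delete_invalid_rds(data_dir: Path) -> None:
    """Remove .rds files whose names do not end with _valid (preprocessing step above)."""
    for rds in data_dir.glob("*.rds"):
        if not rds.stem.endswith("_valid"):
            rds.unlink()
```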

Random data

To reproduce the random-data results of Section 5-B:

  1. Execute batchscripts/RandomDataGenerationScript.R to create random data.
  2. Execute batchscripts/SparkAggregationBasedScript.R for analysis.
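To illustrate what "random data" means in this context — independent series with no built-in cluster structure — here is a minimal Python sketch of one plausible generation scheme (Gaussian random walks). This is only an illustration; the actual scheme implemented in RandomDataGenerationScript.R may differ.

```python
import random

def generate_random_series(num_series: int, length: int, seed: int = 1) -> list[list[float]]:
    """Generate independent Gaussian random-walk series with no built-in
    cluster structure (an illustrative stand-in, not the repository's scheme)."""
    rng = random.Random(seed)
    series = []
    for _ in range(num_series):
        value, walk = 0.0, []
        for _ in range(length):
            value += rng.gauss(0.0, 1.0)  # accumulate standard-normal increments
            walk.append(value)
        series.append(walk)
    return series
```

Seeding makes the generation reproducible, which matters when the same random data must feed several aggregation and clustering configurations.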


For questions and comments, please contact holger.trittenbach@kit.edu or jakob.bach@kit.edu.