Supported Datasets

PaddleTS currently supports dozens of datasets including the four major tasks of time series prediction, time series anomaly detection, time series imputation, and time series classification.

Time Series Prediction

Time series forecasting is one of the most important tasks in time series. At present, we integrate data from multiple scenarios such as electricity, weather, disease, and exchange rate for time series forecasting.

1. ETT-small

  • Data source: https://github.com/zhouhaoyi/ETDataset

  • Data brief introduction: Power transformer data containing 7 variables, used to forecast electricity demand in various regions. Divided into minute-level and hour-level data

  • Dataset names: ETTh1, ETTh2, ETTm1, ETTm2.

2. Weather

  • Data source: https://www.bgc-jena.mpg.de/wetter

  • Data introduction: contains 22 variables sampled weather data every ten minutes, used to predict the weather.

  • Dataset name: Weather.

3. ILI

4. Traffic

  • Data source: https://pems.dot.ca.gov

  • Data brief introduction: Contains 862 sensors, hourly road occupancy ratio.

  • Dataset name: Traffic.

5. Exchange

6. ECL

6. M4

  • Data source: https://www.kaggle.com/datasets/yogesh94/m4-forecasting-competition-dataset

  • Data introduction: 100,000 pieces of univariate data including demography, finance, industry, macroeconomics, microeconomics, etc.

  • Dataset name: M4-Yearly-train, M4-Yearly-test, M4-Monthly-train, M4-Monthly-test, M4-Weekly-train, M4-Weekly-test, M4-Daily-train, M4-Daily-test, M4-Hourly-train, M4-Hourly-test.

7. WTH

Time Series Anomaly Detection

Detecting anomalies from monitoring data is vital to industrial maintenance. We provide widely-used anomaly detection benchmarks: SMD, MSL, SMAP, SWaT, PSM, covering service monitoring, space & earth exploration, and water treatment applications.

1. SMD

2. SMAP

3. MSL

  • Data source: https://s3-us-west-2.amazonaws.com/telemanom/data.zip

  • Data brief introduction: MSL (Mars Science Laboratory) is from NASA with 55 dimensions, which contain the telemetry anomaly data derived from the Incident Surprise Anomaly (ISA) reports of spacecraft monitoring systems.

  • Dataset names: msl_train, msl_test.

4. SWAT

5. PSM

6. NAB_TEMP

  • Data source: https://github.com/numenta/NAB

  • Data brief introduction: The Numenta Anomaly Benchmark (NAB) provides streaming data to research anomaly detection algorithms. NAB_TEMP is the temperature dataset.

  • Dataset names: NAB_TEMP

Time Series Classification

1. UEA

  • Data source: https://www.timeseriesclassification.com/index.php

  • Data brief introduction: UEA Time Series Classification dataset includes 10 multivariate datasets, covering the gesture, action and audio recognition, medical diagnosis by heartbeat monitoring and other practical tasks.

  • Dataset names: EthanolConcentration_Train, EthanolConcentration_Test, FaceDetection_Train, FaceDetection_Test, Handwriting_Train, Handwriting_Test, Heartbeat_Train, Heartbeat_Test, JapaneseVowels_Train, JapaneseVowels_Test, PEMSSF_Train, PEMSSF_Test, SelfRegulationSCP1_Train, SelfRegulationSCP1_Test, SelfRegulationSCP2_Train, SelfRegulationSCP2_Test, SpokenArabicDigits_Train, SpokenArabicDigits_Test, UWaveGestureLibrary_Train, UWaveGestureLibrary_Test.

2. BasicMotions

  • Data source: https://timeseriesclassification.com/description.php?Dataset=BasicMotions

  • Data brief introduction: The data was generated as part of a student project where four students performed four activities whilst wearing a smart watch. here are classes: walking, resting, running and badminton. Participants were required to record motion a total of five times, and the data is sampled once every tenth of a second, for a ten second period.

  • Dataset names: BasicMotions_Train, BasicMotions_Test.