
Preprocessing

time_scaling

time_scaling lets you adjust the sampling period of your dataset in Tangent to better fit your analysis needs.

Why Timescale?

  • Aggregating Data: Useful for converting fine-grained data (e.g., IoT data recorded every second) to a coarser level (e.g., every minute or every 10 minutes).

  • Scaling for Analysis: Convert data like hourly sales figures to daily or weekly numbers, making it easier to analyze for specific use cases like forecasting demand.
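The aggregation described above can be sketched in plain Python. This is only an illustration: the function name `aggregate_mean` is hypothetical, and averaging each bucket is an assumption, since this page does not state which aggregation Tangent applies when it timescales data.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_mean(samples, period):
    """Group (timestamp, value) pairs into buckets of length `period`
    and average each bucket. Mean aggregation is an assumption made
    for illustration, not Tangent's documented behaviour."""
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for ts, value in samples:
        # Floor each timestamp to the start of the bucket it falls in.
        index = (ts - epoch) // period
        buckets[epoch + index * period].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Two minutes of per-second readings collapsed to one value per minute.
start = datetime(2024, 1, 1, 0, 0, 0)
per_second = [(start + timedelta(seconds=i), float(i)) for i in range(120)]
per_minute = aggregate_mean(per_second, timedelta(minutes=1))
```

Here `per_minute` holds two rows, one per minute, each carrying the average of the 60 readings in that minute.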

How to configure timescaling

To timescale your data:

  1. Set the base unit: Choose the time unit (second, minute, hour, or day).

  2. Set the number of base units: Enter an integer greater than 0 to define the new sampling period.

Important rules

  • Minimum timescale: The new timescale must be equal to or larger than the original sampling period.

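  • Valid timescales: The new timescale must be a divisor or multiple of a standard time unit. For example, relative to a day (24 hours):

    • allowed: Scaling from 1 hour to 8 hours (24 is divisible by 8) or 48 hours (a multiple of 24).

    • not allowed: Scaling to 7 hours or 49 hours (neither a divisor nor a multiple of 24).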
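One possible reading of these rules, consistent with the hourly examples, is sketched below. The function `is_valid_timescale`, the `UNITS_PER_PARENT` table, and the unit names other than "Day" are assumptions made for illustration. In particular, the sketch assumes the "standard time unit" is the next larger unit (60 seconds per minute, 60 minutes per hour, 24 hours per day) and that any whole number of days is accepted. Tangent's actual validation may differ.

```python
# Number of base units in the next larger standard unit.
# Assumption: inferred from the hourly examples, not taken from the Tangent API.
UNITS_PER_PARENT = {"Second": 60, "Minute": 60, "Hour": 24, "Day": None}


def is_valid_timescale(base_unit, value, original_value=1):
    """Check a new sampling period of `value` base units against the two rules.

    `original_value` is the original sampling period in the same base unit.
    """
    if base_unit not in UNITS_PER_PARENT:
        raise ValueError(f"unknown base unit: {base_unit}")
    # The number of base units must be an integer greater than 0.
    if not isinstance(value, int) or value <= 0:
        return False
    # Rule 1: the new period may not be finer than the original one.
    if value < original_value:
        return False
    parent = UNITS_PER_PARENT[base_unit]
    if parent is None:
        # Assumption: days have no larger unit here, so any positive integer passes.
        return True
    # Rule 2: the period must divide the larger unit or be a multiple of it.
    return parent % value == 0 or value % parent == 0
```

With this reading, `is_valid_timescale("Hour", 8)` and `is_valid_timescale("Hour", 48)` pass, while 7 and 49 hours are rejected.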

JSON
"timeScale": { "baseUnit": "Day",  "value": 2}

imputation

Imputation allows you to manage and fill missing data within your datasets, ensuring the continuity and quality of your analysis. Missing data can negatively impact the performance of predictive models, making it crucial to address gaps effectively.

Why Imputation?

Handling Incomplete Data

  • Purpose: Imputation is essential for replacing missing values in your dataset, which can occur due to various reasons such as data collection errors or sensor malfunctions. By filling these gaps, you can maintain the integrity of your dataset, ensuring that it remains suitable for analysis.

  • Application: For example, in a dataset with missing hourly temperature readings, imputation allows you to fill these gaps, thereby preserving the dataset's usefulness for accurate forecasting.

Improving Model Accuracy

  • Purpose: Missing data can distort the analysis and reduce the accuracy of your models. Imputation helps to prevent these issues by providing estimates for missing values, thus allowing the model to work as if the data were complete.

  • Application: If you are predicting energy consumption and some data points are missing due to sensor downtime, imputing these values helps maintain the model's accuracy and reliability.

How to configure imputation

Select the imputation method that best suits your data and analysis needs, and define the maximum number of consecutive missing data points to impute (maxLength).

  • Last Observation Carried Forward (LOCF): Replaces missing values with the last observed non-missing value.

JSON
"imputation": {"type": "LOCF",  "maxLength": 1}

  • Linear Imputation: Fills gaps by linearly interpolating between the data points on either side of the missing values.

JSON
"imputation": {"type": "Linear",  "maxLength": 1}

  • No Imputation: Leaves missing values as they are, without any replacements.

JSON
"imputation": {"type": "None",  "maxLength": 1}
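To make the two filling methods concrete, here is a minimal Python sketch. The function names and the handling of edge cases are assumptions, not Tangent's implementation: gaps longer than `max_length` are left untouched in full, LOCF needs an earlier observation, linear imputation needs an observation on both sides, and samples are assumed to be evenly spaced.

```python
def _missing_runs(values):
    """Yield (start, end) index pairs for each run of None; end is exclusive."""
    i, n = 0, len(values)
    while i < n:
        if values[i] is None:
            j = i
            while j < n and values[j] is None:
                j += 1
            yield i, j
            i = j
        else:
            i += 1


def impute(values, method, max_length):
    """Return a copy of `values` with short gaps filled by LOCF or linear interpolation.

    Any other `method` leaves the data unchanged.
    """
    result = list(values)
    n = len(result)
    for start, end in _missing_runs(values):
        # Assumption: a gap longer than max_length is skipped entirely.
        if end - start > max_length:
            continue
        if method == "LOCF" and start > 0:
            # Repeat the last observed value across the gap.
            for k in range(start, end):
                result[k] = values[start - 1]
        elif method == "Linear" and start > 0 and end < n:
            # Step evenly from the left neighbour to the right neighbour.
            left, right = values[start - 1], values[end]
            step = (right - left) / (end - start + 1)
            for k in range(start, end):
                result[k] = left + step * (k - start + 1)
    return result


series = [1.0, None, 3.0, None, None, 6.0]
locf_short = impute(series, "LOCF", 1)
linear_short = impute(series, "Linear", 1)
linear_long = impute(series, "Linear", 2)
```

With `max_length` set to 1, only the single-point gap is filled and the two-point gap stays missing. Raising it to 2 fills both gaps.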

Special Cases of Missing Data

  • Regularly Missing Data:

    • Occurs when data is consistently missing at specific times due to predictable patterns, such as:

      • Trading Days: No data on weekends in financial markets.

      • Store Hours: No sales data outside business hours.

      • Energy Production: No solar energy data during nighttime.

    • Handling: Use imputation with a maximum gap length at least as long as the typical missing period, so that these regular gaps are filled.

[Figure: regularly missing data]
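One way to pick a maximum gap length for regularly missing data is to measure how long the recurring gaps are. The helper below is a hypothetical sketch that counts runs of consecutive missing samples, represented here as `None`. The example series is made up for illustration.

```python
def missing_run_lengths(values):
    """Return the length of each run of consecutive None values, in order."""
    runs = []
    current = 0
    for v in values:
        if v is None:
            current += 1
        elif current:
            # A run of missing values just ended; record its length.
            runs.append(current)
            current = 0
    if current:
        # The series ended inside a run of missing values.
        runs.append(current)
    return runs


# Illustrative hourly series: 14 daytime readings, then 10 missing night hours, twice.
hourly = ([4.2] * 14 + [None] * 10) * 2
runs = missing_run_lengths(hourly)
longest_gap = max(runs)
```

In this made-up series every night leaves a gap of 10 samples, so a maximum gap length of at least 10 would be needed to cover it.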

  • Outliers:

    • Unusual data points caused by events like sensor failures or power outages.

    • Handling: Omit these outliers to prevent model distortion. The resulting gaps can then be imputed like other missing data.

[Figure: outlier data]
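Omitting outliers so that they can be imputed later might look like the sketch below. This page does not say how outliers should be detected, so the rule used here (distance from the median measured in median absolute deviations, with a hypothetical `threshold` of 5) is purely an assumption. Any detection method that turns outliers into missing values would fit the same workflow.

```python
from statistics import median


def mask_outliers(values, threshold=5.0):
    """Replace values far from the median with None.

    Distance is measured in median absolute deviations (MAD). Both the rule
    and the default threshold are assumptions made for illustration.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    center = median(observed)
    mad = median(abs(v - center) for v in observed)
    if mad == 0:
        # No spread to measure against, so nothing is flagged.
        return list(values)
    return [
        None if v is not None and abs(v - center) / mad > threshold else v
        for v in values
    ]


readings = [10.0, 11.0, 10.5, 250.0, 9.5, 10.0]
cleaned = mask_outliers(readings)
```

The spike of 250.0 becomes a missing value in `cleaned`, and the resulting gap can then be filled by the configured imputation like any other.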
