Dataset Management

Once the Python client has been installed, imported and initialized, its functionality is ready to use.

The first step to using TIM is to start working with datasets. This section explains how to use the TIM Python client to upload a dataset to the TIM repository.

upload_dataset - upload a dataset

upload_dataset(self, dataset: pandas.core.frame.DataFrame, configuration: tim.data_sources.dataset.types.UploadCSVConfiguration = {}, handle_status_poll: Union[Callable[[tim.data_sources.dataset.types.DatasetStatusResponse], NoneType], NoneType] = None) -> tim.data_sources.dataset.types.UploadDatasetResponse

The upload_dataset method serves to upload a dataset to the TIM repository. This method is called on the authenticated instance of Tim created as described in Authentication ("client"), like in the following statement:

client.upload_dataset(dataset = <dataset>, configuration = <configuration>, handle_status_poll = <callback function>)

using keyword arguments, or in the following statement:

client.upload_dataset(<dataset>, <configuration>, <callback function>)

using positional arguments, where <dataset> and <configuration> are replaced by the DataFrame and Dictionary representing them, respectively, and <callback function> is replaced by an optional callback function for status polling.

The arguments are:

  • dataset: a DataFrame containing the dataset, which consists of time-series data
  • configuration: a Dictionary containing metadata of the dataset. This argument is optional; the available keys are:
    • timestampFormat: a string describing the format of the timestamps,
    • timestampColumn: a string containing the name of the timestamp column, or an integer containing the index of the timestamp column,
    • decimalSeparator: the decimal separator used,
    • name: the desired name for the dataset in the TIM repository,
    • description: the desired description for the dataset in the TIM repository,
    • samplingPeriod: the sampling period of the data,
  • handle_status_poll: a callback function handling polling for the status and progress of the dataset upload.
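
As a minimal sketch of how the arguments above might be prepared: the helper make_configuration and the callback print_status below are hypothetical names, not part of the client, and the example values are made up. The helper only checks that a configuration uses the keys listed above; it does not validate the values, since their accepted formats are not described here.

```python
# Keys accepted in the configuration Dictionary, as listed above.
ALLOWED_KEYS = {
    "timestampFormat",
    "timestampColumn",
    "decimalSeparator",
    "name",
    "description",
    "samplingPeriod",
}


def make_configuration(**options):
    """Return a configuration dict, rejecting keys that are not documented."""
    unknown = set(options) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown configuration keys: {sorted(unknown)}")
    return dict(options)


def print_status(status):
    """Hypothetical status callback: print whatever the client passes in."""
    print("upload status:", status)


# Example values are illustrative only.
configuration = make_configuration(
    timestampColumn="timestamp",
    decimalSeparator=".",
    name="example-dataset",
    description="Illustrative upload",
)

# With an authenticated client and a DataFrame named dataset, the call
# would then look like:
# response = client.upload_dataset(dataset, configuration, print_status)
```

Catching a misspelled key before the upload avoids wondering why a setting appears to have had no effect.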

This method returns the following data:

  • id: the ID of the uploaded dataset;
  • metadata: a Dictionary returned if the upload was successful, containing the following keys:
    • id: the ID of the uploaded dataset,
    • name: the name of the uploaded dataset,
    • description: the description of the uploaded dataset,
    • isFavorite: a flag indicating whether this dataset is a favorite,
    • estimatedSamplingPeriod: the estimated sampling period of this dataset,
    • createdAt: the time of creation of this dataset,
    • createdBy: the id of the user who created/uploaded this dataset,
    • updatedAt: the time of the last update of this dataset (if applicable),
    • updatedBy: the id of the user who last updated this dataset (if applicable),
    • latestVersion: a Dictionary containing the following keys:
      • id: the ID of the latest version of the dataset,
      • status: the status of the latest version of the dataset; possible values are "Failed", "Finished" and "FinishedWithWarning",
      • numberOfObservations: the number of observations in the latest version of the dataset,
      • numberOfVariables: the number of variables in the latest version of the dataset,
      • firstTimestamp: the timestamp of the first observation in the dataset,
      • lastTimestamp: the timestamp of the last observation in the dataset,
    • workspace: a Dictionary containing the following keys:
      • id: the ID of the workspace in which the dataset resides,
      • name: the name of the workspace in which the dataset resides;
  • logs: a list of Dictionaries, each of which contains the following keys:
    • message: the log message,
    • messageType: the type of the message, possible values are "Info", "Debug" and "Warning",
    • createdAt: the time of creation of the log,
    • origin: the origin of the log, in this case this will be "Upload".

delete_dataset - delete a dataset

delete_dataset(self, dataset_id: str) -> tim.types.ExecuteResponse

The delete_dataset method deletes a dataset from the TIM repository. This method is called on the authenticated instance of Tim created as described in Authentication ("client"), like in the following statement:

client.delete_dataset(dataset_id = <dataset ID>)

using keyword arguments, or in the following statement:

client.delete_dataset(<dataset ID>)

using positional arguments, where <dataset ID> is replaced by the ID of the dataset to delete.

The argument is:

  • dataset_id: the ID of the dataset to delete.

This method returns a Dictionary with the following keys:

  • message: a message indicating what has happened (the dataset has successfully been deleted),
  • code: a code providing more information on this message; if the deletion was successful, this code will be "DM09038".

If an error is encountered, a similar Dictionary will be returned with the keys message and code containing additional information about the error.
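
Because success and failure return Dictionaries of the same shape, the code is the simplest thing to test. A minimal sketch, assuming the returned value can be read like a Dictionary; deletion_succeeded is a hypothetical helper:

```python
# Code reported on a successful deletion, as documented above.
DELETE_SUCCESS_CODE = "DM09038"


def deletion_succeeded(result):
    """Return True if a delete_dataset result carries the success code."""
    return result.get("code") == DELETE_SUCCESS_CODE


# With an authenticated client, usage would look like:
# result = client.delete_dataset("<dataset ID>")
# if not deletion_succeeded(result):
#     print("Deletion failed:", result.get("message"))
```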

get_datasets - retrieve a list of available datasets

get_datasets(self, offset: Optional[int] = None, limit: Optional[int] = None, workspace_id: Optional[str] = None, sort: Optional[tim.types.SortDirection] = None) -> List[tim.data_sources.dataset.types.Dataset]

The get_datasets method retrieves a list of available datasets from the TIM repository. This method is called on the authenticated instance of Tim created as described in Authentication ("client"), like in the following statement:

client.get_datasets(offset = <offset>, limit = <limit>, workspace_id = <workspace ID>, sort = <sorting order>)

using keyword arguments, or in the following statement:

client.get_datasets(<offset>, <limit>, <workspace ID>, <sorting order>)

using positional arguments, where <offset>, <limit>, <workspace ID> and <sorting order> are replaced by the relevant values.

The arguments are:

  • offset: the number of datasets to skip from the beginning of the list (used for pagination); optional, with a default value of 0,
  • limit: the maximum number of datasets to return; optional, with a default value of 10000,
  • workspace_id: a filter on the ID of the workspace in which a dataset resides; optional, with a default value of None,
  • sort: the order in which to sort results; possible values are "+createdAt" and "-createdAt", where "+" and "-" indicate ascending and descending order, respectively. This argument is optional, with a default value of "-createdAt" (most recently created datasets are returned first).
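
The offset and limit arguments can be combined to walk through a long list page by page. The following is a sketch, not part of the client: fetch_all_datasets is a hypothetical helper, and it assumes that a page shorter than the requested limit means the end of the list has been reached.

```python
def fetch_all_datasets(client, page_size=100, workspace_id=None):
    """Collect every dataset by requesting successive pages."""
    datasets = []
    offset = 0
    while True:
        page = client.get_datasets(
            offset=offset, limit=page_size, workspace_id=workspace_id
        )
        datasets.extend(page)
        # Assumption: a short (or empty) page means there is nothing left.
        if len(page) < page_size:
            return datasets
        offset += page_size
```

When the total is an exact multiple of page_size, the loop makes one extra request that returns an empty page and then stops.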

This method returns a list of Dictionaries, each of which includes the following data:

  • id: the ID of the dataset,
  • name: the name of the dataset,
  • description: the description of the dataset,
  • isFavorite: a flag indicating whether this dataset is a favorite,
  • estimatedSamplingPeriod: the estimated sampling period of this dataset,
  • createdAt: the time of creation of this dataset,
  • createdBy: the id of the user who created/uploaded this dataset,
  • updatedAt: the time of the last update of this dataset (if applicable),
  • updatedBy: the id of the user who last updated this dataset (if applicable),
  • latestVersion: a Dictionary containing the following keys:
    • id: the ID of the latest version of the dataset,
    • status: the status of the latest version of the dataset; possible values are "Failed", "Finished" and "FinishedWithWarning",
    • numberOfObservations: the number of observations in the latest version of the dataset,
    • numberOfVariables: the number of variables in the latest version of the dataset,
    • firstTimestamp: the timestamp of the first observation in the dataset,
    • lastTimestamp: the timestamp of the last observation in the dataset,
  • workspace: a Dictionary containing the following keys:
    • id: the ID of the workspace in which the dataset resides,
    • name: the name of the workspace in which the dataset resides.
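
As an illustration of working with the nested keys above, the sketch below groups dataset names by workspace and skips datasets whose latest version failed. The helper name group_usable_by_workspace is hypothetical, and the input is assumed to be the list of Dictionaries described above.

```python
from collections import defaultdict


def group_usable_by_workspace(datasets):
    """Map each workspace name to the names of its non-failed datasets."""
    grouped = defaultdict(list)
    for dataset in datasets:
        # Skip datasets whose latest version did not upload successfully.
        if dataset["latestVersion"]["status"] == "Failed":
            continue
        grouped[dataset["workspace"]["name"]].append(dataset["name"])
    return dict(grouped)


# With an authenticated client, usage would look like:
# overview = group_usable_by_workspace(client.get_datasets())
```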

get_dataset_versions - retrieve the list of versions of a dataset

get_dataset_versions(self, id: str, offset: Optional[int] = None, limit: Optional[int] = None) -> List[tim.data_sources.dataset.types.DatasetListVersion]

The get_dataset_versions method retrieves the list of available dataset versions related to a specific dataset from the TIM repository. This method is called on the authenticated instance of Tim created as described in Authentication ("client"), like in the following statement:

client.get_dataset_versions(id = <dataset ID>, offset = <offset>, limit = <limit>)

using keyword arguments, or in the following statement:

client.get_dataset_versions(<dataset ID>, <offset>, <limit>)

using positional arguments, where <dataset ID>, <offset> and <limit> are replaced by the relevant values.

The arguments are:

  • id: the ID of the dataset from which to retrieve the versions,
  • offset: the number of dataset versions to skip from the beginning of the list (used for pagination); optional, with a default value of 0,
  • limit: the maximum number of dataset versions to return; optional, with a default value of 10000.

This method returns a list of Dictionaries, each of which includes the following data:

  • id: the ID of the dataset version,
  • status: the status of the dataset version, possible values are "Failed", "Finished", "FinishedWithWarning" and "Registered",
  • createdAt: the time of creation of this dataset version.
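
A common use of this list is to pick the most recent version that finished uploading. The sketch below rests on an assumption that is not stated above: that createdAt values are strings in one consistent ISO 8601 format, so that comparing them as text matches chronological order. The helper latest_usable_version is a hypothetical name.

```python
# Statuses treated as usable; "Failed" and "Registered" are excluded.
USABLE_STATUSES = {"Finished", "FinishedWithWarning"}


def latest_usable_version(versions):
    """Return the most recently created usable version, or None if there is none."""
    usable = [v for v in versions if v["status"] in USABLE_STATUSES]
    if not usable:
        return None
    # Assumption: createdAt strings sort chronologically when compared as text.
    return max(usable, key=lambda v: v["createdAt"])


# With an authenticated client, usage would look like:
# version = latest_usable_version(client.get_dataset_versions("<dataset ID>"))
```

If the timestamps turn out to use a different format, the key function is the only part that needs to change.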