
Data update pipeline

This is the first part of a series of orchestration solution templates. The objective is to guide the user through loading an SQL table into TIM Studio and creating a Data Factory pipeline that automates dataset updates.

Preparing dataset in SQL table

As a first step, we need to prepare our data in an SQL table. Each row of the dataset is a new observation with a unique datetime, and each column is either a datetime, a target, or a predictor. The datetime column has to be in datetime format (datetime/datetime2 for MSSQL). Learn more about dataset properties. The image below shows an example of an MSSQL table in a valid format; a minimal sketch of creating such a table from code follows the image.

image.png
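For illustration, the snippet below sketches how such a table might be created and populated from Python using pyodbc. The server, database, table, and column names are purely illustrative assumptions; adapt them to your environment.

    import pyodbc

    # Illustrative connection string -- replace server, database, and credentials.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=my-sql-server.example.com;DATABASE=tim_data;"
        "UID=tim_user;PWD=<password>"
    )
    cur = conn.cursor()

    # One datetime column (datetime2), one target, any number of predictors.
    cur.execute("""
        CREATE TABLE consumption_data (
            observation_time datetime2 NOT NULL PRIMARY KEY,  -- unique datetime per row
            consumption      float     NOT NULL,              -- target
            temperature      float     NULL                   -- predictor
        )
    """)

    # Each inserted row is one observation.
    cur.execute(
        "INSERT INTO consumption_data (observation_time, consumption, temperature) VALUES (?, ?, ?)",
        ("2020-01-01 00:00:00", 512.3, 4.1),
    )
    conn.commit()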

Now we want to load the dataset into TIM Studio. Log in to the TIM Studio web application. Once you are logged in, create your anomaly/forecasting workspace. After that, go to the Datasets screen and create a new SQL dataset. After a while, you should see your dataset in the list of loaded datasets.

image.png

You can try updating the dataset manually via the “Update” column on the right side. This action updates the existing dataset with new or updated rows from the SQL table.

Creating Data Factory pipeline

The goal now is to create a pipeline that invokes the dataset update automatically. The following is a step-by-step guide on how to do this using Azure Data Factory.

  1. Log in to Azure Data Factory. You should see a welcome screen similar to the one in the image below. In the left pane, go to the “Author” tab.

    image.png

  2. Create a new pipeline by following the image below.

    image.png

    After that, you should see an empty pipeline. To simplify navigation on the screen, we split it visually into the “Factory Resources pane”, “Activity pane”, “Lower pane”, “Upper pane 1”, “Upper pane 2”, and “Workspace”; see the image below.

    image.png

  3. In the “Lower pane”, go to the “General” tab and type the name of the pipeline, let's say "data_update".

    image.png

  4. Then, go to the “Parameters” tab and add five parameters:

    • username – TIM Studio username
    • password – TIM Studio password
    • url – URL of TIM Studio (https://timstudio.tangent.works)
    • dataset_id – ID of the dataset, set the default to 0
    • workspace_id – ID of the workspace, set the default to 0

    Set the default values as shown in the image below; we will populate them when running the pipeline.

    image.png

  5. Similarly, go to the "Variables" tab and create a variable named token.

    image.png

  6. Now, go to the “Activity pane” -> “General” and find “Web”. Drag & drop this activity into the “Workspace”. Select the activity by clicking on it. In the "Lower pane", go to the "General" tab and fill in the name of the activity, e.g. "get_auth_token". This activity will be responsible for user authentication.

    image.png

  7. By replicating the previous step, add two more activities, "Set variable" and "Web". These activities will be responsible for storing the token variable and sending the data update request, respectively. Join the activities with green arrows as illustrated in the image below. The arrows indicate the order in which the activities of the pipeline will run, and the green color indicates that an activity executes only if the previous activity succeeds.

    image.png

  8. Select the first "Web" activity. In the “Lower pane”, go to the “Settings” tab. Here we have to specify the API request for user authentication, using the username and password parameters of the pipeline defined in step 4. An equivalent request made outside Data Factory is sketched after the image below.

    URL: @{pipeline().parameters.url}/api/auth/
    Method: POST
    Body: {"username":"@{pipeline().parameters.username}","password":"@{pipeline().parameters.password}"}
    

    Enter each of the values by clicking on the corresponding input field and selecting "Add dynamic content".

    image.png
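    For reference, the snippet below sketches the same authentication request made with Python's requests library. The credential values are illustrative; the token field in the response follows from the expression used in step 9.

    import requests

    # Illustrative values mirroring the pipeline parameters from step 4.
    url = "https://timstudio.tangent.works"
    username = "my_user"
    password = "my_password"

    # Equivalent of the "get_auth_token" Web activity: POST {url}/api/auth/
    resp = requests.post(
        f"{url}/api/auth/",
        json={"username": username, "password": password},
    )
    resp.raise_for_status()
    token = resp.json()["token"]  # corresponds to activity('get_auth_token').output.token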

  9. Select the "Set variable" activity. In the “Lower pane”, go to the “Settings” tab. Select the token variable defined in step 5 and set the value to:

    value: @{activity('get_auth_token').output.token}
    

    This expression takes the token returned by the "get_auth_token" Web activity and stores it in the token variable we created in step 5.

    image.png

  10. Select the second "Web" activity. In the “Lower pane”, go to the “Settings” tab. Here we have to specify the API request for updating the dataset. Fill in the input fields as follows:

    URL: @{pipeline().parameters.url}/api/datasets/sql/@{pipeline().parameters.dataset_id}/?no-data
    Method: PATCH
    Headers:
    Authorization: Bearer @{variables('token')}
    Body: {"workspace":@{string(pipeline().parameters.workspace_id)}}
    

    The URL contains the ID of the dataset we want to update, and the workspace ID is sent in the body of the request. An equivalent request is sketched after the image below.

    image.png
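    Continuing the earlier sketch, the dataset update request can be reproduced outside Data Factory as shown below. The dataset_id and workspace_id values are illustrative placeholders for the pipeline parameters.

    # Illustrative IDs standing in for the dataset_id and workspace_id parameters.
    dataset_id = 42
    workspace_id = 7

    # Equivalent of the second Web activity: PATCH {url}/api/datasets/sql/{id}/?no-data
    resp = requests.patch(
        f"{url}/api/datasets/sql/{dataset_id}/?no-data",
        headers={"Authorization": f"Bearer {token}"},
        json={"workspace": workspace_id},
    )
    resp.raise_for_status()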

  11. To test the pipeline, click on “Debug” in the “Upper pane 1”. In the popup, we have to specify the input parameters for the pipeline run. Change the default values of the “username”, “password”, and “url” parameters if needed.

    image.png

    The “dataset_id” can be found in TIM Studio in the list of datasets under the “ID” column. The “workspace_id” can be found in the same window, in the URL of the browser.

    image.png

    After filling in the parameters, click on “Finish”. You should see the output of the debug run in the “Lower pane” -> “Output”. All three activities should end up with the status "Succeeded".

    image.png

    To verify that the dataset was updated, go to the Datasets screen in TIM Studio. You should see the current datetime in the “Last Updated” column of the dataset.

    image.png

  12. Now we can run the pipeline and simulate production mode. Navigate to the "Upper pane 1" and select "Add trigger" -> "Trigger now". Fill in the parameters as you did when debugging the pipeline in the previous step.

    In the left pane, go to the "Monitor" tab to see the result of the pipeline run. After a while, the "Succeeded" status should appear. The same run can also be started from code; see the sketch after the image below.

    image.png
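    If you prefer to start the run programmatically rather than from the UI, the sketch below assumes the azure-identity and azure-mgmt-datafactory Python packages. The subscription, resource group, and factory names are illustrative assumptions.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    # Illustrative names -- substitute your own subscription, resource group, and factory.
    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    run = client.pipelines.create_run(
        resource_group_name="my-resource-group",
        factory_name="my-data-factory",
        pipeline_name="data_update",
        parameters={
            "username": "my_user",
            "password": "my_password",
            "url": "https://timstudio.tangent.works",
            "dataset_id": 42,
            "workspace_id": 7,
        },
    )

    # Poll the run status; the same information appears under the "Monitor" tab.
    status = client.pipeline_runs.get("my-resource-group", "my-data-factory", run.run_id).status
    print(status)  # e.g. "InProgress" or "Succeeded"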

  13. In the last step, go back to the "Author" tab and save the pipeline by clicking on the “Publish all” button in the “Upper pane 2”.

Download the pipeline

The pipeline can be downloaded here. See Importing pipelines for importing the pipeline to your Azure Data Factory subscription.

Next steps