Subscribe to the Teradata Blog

Get the latest industry news, technology trends, and data science insights each week.

How to Connect Teradata Vantage to Azure Blob Storage to Query JSON Files

How to Connect Teradata Vantage to Azure Blob Storage to Query JSON Files
Many Teradata customers are interested in integrating Teradata Vantage with Microsoft Azure First Party Services. This Getting Started Guide will help you to connect Teradata Vantage using Native Object Store capability with Azure Blob Storage.*

Disclaimer: This guide includes content from both Microsoft and Teradata product documentation.
 

Overview

We will use Teradata Vantage using the Native Object Store (NOS) feature to query JSON files on Microsoft Azure Blob Storage, where you can keep vast amounts of semi-structured or unstructured data. If you already have JSON files on your blob storage account, then you can skip to Connect Teradata Vantage using Native Object Store to Azure Blob Storage section.

There are many ways to create and process JSON data. In this getting started guide, we will use a Raspberry Pi online simulator to send JSON strings to an Azure IoT hub device, then route messages to the Azure Event hub, use an Azure Stream Analytic job to process,  and then send it to our Azure Blob Storage container as JSON files. Finally, we will use the NOS feature in Vantage to connect and query the JSON files.

This is a diagram of the workflow.
Picture1.png

Prerequisites

You are expected to be familiar with Azure services and Teradata Vantage with Native Object Store (NOS).
You will need the following accounts, objects, and systems. Links have been included with setup instructions.

Create an Azure Blob Storage account and container

We need an Azure Blob Storage account to store our JSON files.

1. Logon to the Azure portal and create a storage account.

Provide your Subscription, Resource group, Storage account name and Location. Leave the remaining fields set to their default values. Click Review+create to validate and click Create.

We suggest that you use the same location for all services you create. This will avoid confusing errors for this example.
2. Once the deployment is completed, click on Go to resource to create a container for your JSON files.

3. Click on Containers icon and then click on +Container and provide a new container name. Leave Public access level as “Private” for this example.
Picture1-(1).png
4. Click on Access keys and copy Storage account name and key1, which we will use later in the Connect Teradata Vantage Native Object Store to Azure Blob Storage – Create Authorization Object section.
Picture1-(2).png

Create an Azure Event Hub namespace and hub

We need to create Event Hub namespace and event hub endpoint for Iot Hub device to send messages.

1. Using the portal, create an event hubs namespace and event hub. Provide a Namespace name, choose Standard tier, create a new or use an existing Resource Group and Location. Leave the remaining fields set to their default values. Click Create. You may have to wait a few minutes for the system to fully provision the resources.

Open the Resource Group to see your Event Hub namespace.
Picture1-(3).pngWe do not cover the Throughput Units and Enable Auto-Inflate properties in this guide. Please see the Azure documentation for details.

Next, select your Event Hub namespace and create an event hub by clicking +Event Hub. Provide a Event Hub name. Leave remaining fields set to their default values and click Create.
Picture1-(4).png
The Event Hub capture features supports only the Avro format to Azure Blob Storage. This guide uses Stream Analytics to move data to accommodate the JSON format.

After the event hub is created, you should see it in the list of event hubs below.
Picture1-(5).png

Create Azure Stream Analytic job

We want to move Event Hub messages to Azure Blob Storage. We have chosen to process data from our event hub using Stream Analytics

1. Click on the created event hub (for example, eventhub1) and click on Process data.
Picture1-(6).png
2. Click on Explore to display the Query dialog. If there are messages in your Event Hub, you can process your messages by clicking Create.

3. Click Deploy query to create a Stream Analytic job to move Event Hub messages to Azure Blob Storage.
Picture1-(7).png4. Enter a Job name in the New Stream Analytic job pane and leave remaining fields set with their default values.

5. Click Create.

6. Click on the Outputs UI and click +Add to define Blob Storage as an output.
Picture1-(8).png7. Provide the Output alias, Storage account and container information in the Blob Storage/Data Lake Storage Gen2 pane. The Path pattern (for example, <directory>/{date}/{time}) is optional.

8. Ensure that Event serialization format is set to JSON. Leave remaining fields set to their default values and click Save.
Picture1-(9).png

You can find more information on Path pattern property at Azure Stream Analytic custom blob output partitioning.
 
8. Exit the Output pane by clicking in the upper right corner.
 
9. Edit and save query with new Output alias (for example, output1).
 
10. Exit the Query pane by clicking in the upper right corner.
 
11. Click Start job with the Now option.
Picture1-(10).png

Create Azure IoT Hub to route messages

We need to create an IoT Hub instance, register an IoT hub device, and setup a route/custom endpoint to get messages from our Raspberry Pi online simulator to Azure IoT hub and finally to Event Hub.

1. In the portal, create an IoT Hub, register a new device and define message routing.

2. Click on Create a resource at the top left side and search for IoT Hub. Click Create.

3. In the Basic tab, create a new or use an existing Resource Group, provide an Iot Hub Name, and in the choose B1: Basic tier in the Size and Scale tab. Leave remaining fields set to default values and click Review+create to validate. Click Create.

For more information see choosing the right IoT Hub tier. Configuring the Number of IoT Hub units property and a more general conversation on performance considerations is not covered here.

The online simulator requires a device identify in the registry to connect to a hub.

4. Click Go to resource or in your IoT hub pane, open IoT Devices, then select +New to add a device in your IoT hub. Provide a Device ID name and click Save.
Picture1-(11).png

5. Click Refresh in the IoT devices dialog to display your device and click on Device. Copy the Primary Connection String which we will use later to connect our Raspberry Pi online simulator to our IoT hub device (for example, rasppi).
Picture1-(12).png

We need to configure a route and endpoint for IoT Hub device to send messages to Event Hub.

6. In your IoT Hub pane, click Message routing under Messaging to define a route and custom endpoint.
Picture1-(13).png

Routes is the first tab on the Message Routing pane.

7. Click +Add to add a new route. You see the following screen. Enter a Name for your route and choose an endpoint. For the endpoint, you can select one from the dropdown list, or add a new one. In this example, click on +Add endpoint and choose Event Hub.
Picture1-(14).png
8. In the Add an event hub endpoint pane, provide an Endpoint name, an existing Event hub namespace, and an Event hub instance. Click Create.
Picture1-(15).png
9. Click Save to save routing rule. You should see the new routing rule displayed.
Picture1-(16).png
10. In the Custom endpoints tab, click Refresh to see your custom endpoint rule displayed under Event Hubs. Ensure that the Status is Healthy.
Picture1-(17).png

Configure Raspberry Pi online simulator for new device

We will use the Raspberry Pi online simulator as our source for JSON strings. Data will be sent to the IoT hub registed device, which we created in previous section.

This is an example of a JSON string that the simulator will generate. (Line wrapping was added for clarity).
{"messageId":2,
"deviceId":"Raspberry Pi Web Client",
"temperature":29.288325949023434,
"humidity":77.5147906}


We need to add the IoT hub device Primary Connection String for the online simulator to connect.

1. Click on START RASPBERRY PI SIMULATOR.

2. Edit line 15 and replace '[Your IoT hub device connection string]'; with your Primary Connection String.
Picture1-(18).png

3. Run the online simulator for a few minutes. You should see JSON files in your blob storage container.

4. Click Stop to stop the simulator.

If desired, you can view your blob storage data.

A. Logon to Azure portal and click Resource groups.

B. Find and click your resource group with storage account.

C. Click container name.
Picture1-(19).png
D. Click on the JSON file and Edit to view data.
Picture1-(20).png
Alternatively, you can use Azure Storage Explorer, although configuring the Azure Storage Explorer is not discussed in this guide.

Connect Teradata Vantage Native Object Store to Azure Blob Storage

Native Object Store is a new capability included with Teradata Vantage 17.0 that makes it easy to explore data sets located on external objects stores, like Azure Blob Storage, using standard SQL.

Once configured, an object on Azure Blob will look just like a table in Vantage.

Detailed information can be found in the Native Object Store – Teradata Vantage Advance SQL Engine 17.0 (Orange Book).

Vantage needs an authorization object to gain access to a Azure Blob Storage account. Authorization objects provide more security than just relying on Azure Blob Storage authorization.

The CREATE AUTHORIZATION DDL statement requires the storage account name to be specified in the USER field and the Azure access key we saved earlier in the PASSWORD field.

You must create the authorization object in the same database as the foreign table that will reference it.

1. Create an authorization object using syntax similar to the following.
CREATE AUTHORIZATION DefAuth_AZ
AS DEFINER TRUSTED
USER 'mystorageacctrs' /* storage account name */
PASSWORD '********';  /* storage account key */

If you delete and recreate your storage account with the same name, the access keys will change. Therefore, you must drop and recreate your authorization objects.

A foreign table allows external data to be easily referenced within the Vantage Advanced SQL Engine and makes the data available in a structured, relational format. Below is a example of the syntax to create a simple foreign table.

The storage LOCATION information includes three parts, separated by slashes: the AZ prefix, the storage account name including the “blob.core.windows.net” suffix, and the container name.

If external security authentication is used, then the EXTERNAL SECURITY DEFINER TRUSTED DefAuth_AZ syntax in the DDL statement.

2. Create a foreign table using syntax similar to the following.
 
CREATE MULTISET FOREIGN TABLE json ,FALLBACK ,EXTERNAL SECURITY DEFINER TRUSTED DefAuth_AZ,
     MAP = TD_MAP1
     (
      Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
      Payload JSON(8388096) INLINE LENGTH 32000 CHARACTER SET UNICODE)
USING
(
      LOCATION  ('/AZ/mystorageacctrs.blob.core.windows.net/json')
      MANIFEST  ('FALSE')
      PATHPATTERN  ('$Var1/$Var2/$Var3/$Var4/$Var5/$Var6/$Var7/$Var8/$Var9/$Var10/$Var11/$Var12/$Var13/$Var14/$Var15/$Var16/$Var17/$Var18/$Var19/$Var20')
      ROWFORMAT  ('{"record_delimiter":"\n", "character_set":"UTF8"}')
      STOREDAS  ('TEXTFILE')
)
NO PRIMARY INDEX ;

Now that the foreign table is created, you can access the object on Azure Blob Storage.

JSON is essentially a list of keys and values. You can obtain a list of the keys (attributes) with the JSON_KEYS table operator with the following statement.
 
SELECT DISTINCT * FROM JSON_KEYS (ON (SELECT payload FROM json)) AS j

Picture1-(21).png
You can obtain a list of the actual values by selecting a small number of rows without any filtering.
 
SELECT Payload.* FROM json

Picture1-(22).pngYou can obtain the name-value pairs that includes the fields and values by omitting the asterisk from the Payload keyword.
SELECT Payload FROM json

Picture1-(23).pngIt is best practice to create a view on top of a table to provide a layer between users and tools and the underlying table.

Below is an example of a view on the JSON foreign table.
CREATE VIEW json_perm_cols_v AS (
SELECT
CAST(payload.messageId AS INT) MessageID,
          CAST(payload.deviceId AS VARCHAR(25)) DeviceID,
                CAST(payload.temperature AS FLOAT) Temperature,
          CAST(payload.humidity AS FLOAT) Humidity
       FROM json
       );


At this point, the view may be used by users and tools.

Native Object Store does not automatically make a persistent copy of the external object that it reads through the foreign table.

You can copy the data in the external object into a table in Vantage. Below are a few simple COPY options.

The CREATE TABLE AS…WITH DATA statement creates a No Primary Index (NoPI) relational table as the target table by default. If you do not take further action, this NoPI table will have just the two columns: Location and Payload.

CREATE TABLE json_perm AS (select * from json) WITH DATA

It is better to create columns for each payload. The example below does this.
 
CREATE TABLE json_perm_cols AS
(SELECT
       CAST(payload.messageId AS INT) MessageID,
       CAST(payload.deviceId AS VARCHAR(25)) DeviceID,
       CAST(payload.temperature AS FLOAT) Temperature,
       CAST(payload.humidity AS FLOAT) Humidity
       FROM json
       )
WITH DATA
NO PRIMARY INDEX

Another option is using an INSERT…SELECT statement. This approach does require that the permanent table be created beforehand. The example below first creates the table and then performs the insert.
 
CREATE SET TABLE json_perm_empty ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      MessageId INTEGER,
      DeviceId VARCHAR(25) CHARACTER SET LATIN NOT CASESPECIFIC,
      Temperature FLOAT,
      Humidity FLOAT)
PRIMARY INDEX ( MessageId );
 
INSERT into.json_perm_empty
SELECT
    payload.messageId,
    payload.deviceId,
    payload.temperature,
    payload.humidity
FROM json

 

 

*Although this approach has been implemented and tested internally, it is offered on an as-is basis. Microsoft does not provide validation of Teradata Vantage using Native Object Store capability with Azure services.
 
Portrait of Kevin Bogusch

(Author):
Kevin Bogusch

Kevin Bogusch is the Cloud Product Marketing Manager at Teradata. Kevin has spent his 25-year career in IT from programming to system and database administration, from teaching to sales and solution architecting. He has a BS in Computer Engineering, an MS in Computer Information Systems, and an MBA in Operations Management. He is a certified Teradata Vantage Master and certified AWS Professional Solutions Architect.
 
  View all posts by Kevin Bogusch
Portrait of Rupal Shah

(Author):
Rupal Shah

Rupal Shah is a member of Teradata Partners Technical Consultant team. Prior to consulting on the Microsoft partnership, he was a technical consultant for the IBM Cognos and Oracle Hyperion partnerships. Along with his extensive experience working with business intelligence and ‘in-database’ solutions, Rupal has worked with various Teradata application organizations for whom he provided database consulting. He received his B.A. in Math and Computer Science from the University of California at San Diego, and he is currently based in San Diego. View all posts by Rupal Shah

Turn your complex data and analytics into answers with Teradata Vantage.

Contact us