How-to Sync Data From AWS S3 to iomete

This is an end-to-end guide that explains how to move files from your AWS S3 to iomete and show it in the built-in BI dashboard.

Intro

This is an end-to-end guide about how to move files from your AWS S3 to iomete and show it in the BI dashboard. In this example, we use a JSON file, but for other file types (such as CSV, Parquet, and ORC) please check out: https://docs.iomete.com/external-data-sources/reading-raw-files


Your files in AWS S3

Let's say you have a dedicated bucket where you have files you want to move to iomete

📘

Note: This bucket will be different in your case. This is just an example bucket for demonstration purpose

We want to query/migrate countries.json file in iomete platform

32863286

The file (countries.json) we want to move to iomete


📘

You could download thecountries.json file for yourself with this command:

wget https://iomete-public.s3.eu-central-1.amazonaws.com/datasets/countries.json


Create a storage integration in iomete

  1. Choose AWS External Storage

  1. Specify a name and enter your AWS S3 Location to create integration between to
23302330

Create AWS External Stage Storage Integration

  1. Once it is created copy policies created to be added to your S3 Bucket permissions

  1. Go to your AWS S3 Bucket and add generated JSON policy to your S3 Bucket's Permission

Create warehouse

Create a new warehouse instance and specify the storage integration you created in the previous step.

Moving Data

In the SQL Editor, you should be able to query the file and migrate to iomete using the following methods
Querying JSON file data without moving to iomete

Once you decided that you want to move data to iomete you could use the following commands

Non-partitioned Table

  • Option 1. Create a table from select
-- Create table directly from the query
CREATE TABLE countries
AS SELECT  * FROM json.`s3a://my-staging-area-for-iomete/countries.json`

-- To inspect the table use the following query
DESC TABLE EXTENDED countries;
  • Option 2. Insert into to existing table
-- just append data
INSERT INTO countries
SELECT  * FROM json.`s3a://my-staging-area-for-iomete/countries.json`

-- first clean an existing data and then insert new data
INSERT OVERWRITE TABLE countries
SELECT  * FROM json.`s3a://my-staging-area-for-iomete/countries.json`
  • Option 3. Merge with existing data
MERGE INTO countries
    USING (SELECT  * FROM json.`s3a://my-staging-area-for-iomete/countries.json`) updates
ON countries.id = updates.id
WHEN MATCHED THEN
    UPDATE SET *
WHEN NOT MATCHED
    THEN INSERT *

Partitioning data to speed up queries or DML that have predicates involving the partition columns. Here we use country_code as a partition column

Partitioned Table

  • Option 1. Create a partitioned table from select
-- Create a partitioned table directly from the query
CREATE TABLE countries_partitioned
  PARTITIONED BY (country_code)
AS SELECT * FROM json.`s3a://my-staging-area-for-iomete/countries.json` 
ORDER BY country_code;
         
-- To inspect the table use the following query
DESC TABLE EXTENDED countries_partitioned;
  • Option 2. Insert into to existing table
-- just append data
INSERT INTO countries_partitioned
  SELECT  * FROM json.`s3a://my-staging-area-for-iomete/countries.json`
  ORDER BY country_code;

-- or you can use the follwing command to overwerite data
INSERT OVERWRITE TABLE countries_partitioned
  SELECT  * FROM json.`s3a://my-staging-area-for-iomete/countries.json`
  ORDER BY country_code;
  • Option 3. Merge with existing data
MERGE INTO countries_partitioned
  USING (SELECT  * FROM json.`s3a://my-staging-area-for-iomete/countries.json`) updates
ON countries_partitioned.id = updates.id
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED
  THEN INSERT *

Visualize Data

First, let's create a view with clean column names to be used in BI dashboarding:

CREATE OR REPLACE VIEW countries_view 
  AS SELECT 
      country_code, 
      region, 
      `SP.POP.TOTL` AS sp_pop_totl 
     FROM countries;

Open BI Application


Add new dataset

From the menu choose Data -> Dataset and click + Dataset button on the right top corner


Create a new chart

Click on the newly created dataset countries_view which opens chart view. Choose the visualization type and corresponding settings:


Save this chart to the dashboard too and navigate to the dashboard. And, here is the dashboard of the Countries that we just created

Congratulations 🎉🎉🎉


Did this page help you?