# druid-to-redshift.yaml
id: druid-to-redshift
namespace: company.team
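# Prerequisites: the Kestra secrets AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
# REDSHIFT_USER and REDSHIFT_PASSWORD referenced below must be configured, and
# a local Druid instance with the example `wikipedia` dataset must be reachable
# at http://localhost:8888.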
tasks:
  - id: query_druid
    type: io.kestra.plugin.jdbc.druid.Query
    url: jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/;transparent_reconnection=true
    sql: |
      SELECT __time AS edit_time, channel, page, "user", delta, added, deleted
      FROM wikipedia
    fetchType: STORE
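    # fetchType STORE writes the full query result to Kestra's internal storage
    # as an Ion file; downstream tasks reference it via outputs.query_druid.uri.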
  - id: write_to_csv
    type: io.kestra.plugin.serdes.csv.IonToCsv
    from: "{{ outputs.query_druid.uri }}"
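    # Converts the stored Ion result into a CSV file, exposed again as a URI in
    # this task's outputs for the S3 upload below.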
  - id: upload
    type: io.kestra.plugin.aws.s3.Upload
    accessKeyId: "{{ secret('AWS_ACCESS_KEY_ID') }}"
    secretKeyId: "{{ secret('AWS_SECRET_ACCESS_KEY') }}"
    region: eu-central-1
    from: "{{ outputs.write_to_csv.uri }}"
    bucket: <bucket-name>
    key: wikipedia/input/wikipedia.csv
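    # <bucket-name> is a placeholder for your S3 bucket; adjust the region if
    # needed. The AWS credentials come from Kestra secrets, not hard-coded keys.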
  - id: create_table
    type: io.kestra.plugin.jdbc.redshift.Query
    url: jdbc:redshift://123456789.eu-central-1.redshift.amazonaws.com:5439/dev
    username: "{{ secret('REDSHIFT_USER') }}"
    password: "{{ secret('REDSHIFT_PASSWORD') }}"
    sql: |
      CREATE TABLE IF NOT EXISTS wikipedia
      (
        edit_time varchar(100),
        channel varchar(100),
        page varchar(1000),
        wiki_user varchar(100),
        edit_delta integer,
        added integer,
        deleted integer
      );
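    # wiki_user and edit_delta rename the Druid fields user and delta,
    # presumably to avoid Redshift reserved words; the COPY below has no column
    # list, so columns are loaded by position and the names need not match the
    # CSV header.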
  - id: insert_into_redshift
    type: io.kestra.plugin.jdbc.redshift.Query
    url: jdbc:redshift://123456789.eu-central-1.redshift.amazonaws.com:5439/dev
    username: "{{ secret('REDSHIFT_USER') }}"
    password: "{{ secret('REDSHIFT_PASSWORD') }}"
    sql: |
      COPY wikipedia
      FROM 's3://<bucket-name>/wikipedia/input/wikipedia.csv'
      CREDENTIALS
      'aws_access_key_id=<access-key>;aws_secret_access_key=<secret-key>'
      IGNOREHEADER 1
      CSV;
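    # Replace <bucket-name>, <access-key> and <secret-key> before running. As a
    # rough alternative sketch (assuming an IAM role with S3 read access is
    # attached to the cluster; the ARN below is a placeholder), Redshift COPY
    # also accepts role-based authentication instead of inline keys:
    #
    #   COPY wikipedia
    #   FROM 's3://<bucket-name>/wikipedia/input/wikipedia.csv'
    #   IAM_ROLE 'arn:aws:iam::<account-id>:role/<redshift-s3-role>'
    #   IGNOREHEADER 1
    #   CSV;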
extend:
  title: Query data from Druid and load it to Redshift
  description: >-
    This flow will:

    1. Query data from a Druid server

    2. Convert the result to a CSV file

    3. Upload the CSV file to S3

    4. Create a table in Redshift if it does not exist

    5. Insert data from the CSV file into the Redshift table


    To set up Apache Druid locally, follow the instructions in the [official
    documentation](https://druid.apache.org/docs/latest/tutorials/). This flow
    uses Druid's example `wikipedia` dataset.
  tags:
    - S3
    - Ingest
    - SQL
    - AWS
  ee: false
  demo: false
  meta_description: This flow queries data from a Druid server, converts the
    result to a CSV file, uploads the CSV file to S3, creates a table in
    Redshift if it does not exist, and inserts data from the CSV file into the
    Redshift table.