-
Notifications
You must be signed in to change notification settings - Fork 1
/
couchbase-to-pandas.yaml
70 lines (53 loc) · 1.94 KB
/
couchbase-to-pandas.yaml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
id: couchbase-to-pandas
namespace: company.team
tasks:
- id: query_couchbase
type: io.kestra.plugin.couchbase.Query
connectionString: couchbase://10.57.233.41
username: admin
password: admin_password
query: |
SELECT id, country, name, type, iata, icao
FROM `travel-sample`.`inventory`.`airline`
fetchType: FETCH
- id: pandas
type: io.kestra.plugin.scripts.python.Script
beforeCommands:
- pip install pandas > /dev/null
outputFiles:
- final.csv
warningOnStdErr: false
script: >
import pandas as pd
data = {{ outputs.query_couchbase.rows }}
df = pd.DataFrame(data)
agg_df = df.groupby('country')['country'].count().reset_index(name="count")
print(agg_df.head())
agg_df.to_csv("final.csv", index=False)
extend:
title: Fetch data from Couchbase and transform it with Pandas in Python
description: >-
This flow extracts data from a Couchbase database and then performs
transformation (aggregation) on the fetched results using Pandas in a Python
script.
The sample bucket `travel-sample` is used in this blueprint.
You can setup Couchbase locally on Docker using:
```bash
docker run -d --name db -p 8091-8096:8091-8096 -p 11210-11211:11210-11211
couchbase
```
Once the container is up and running, go to http://localhost:8091/ in your
browser and select "Setup New Cluster". Set up the new cluster by providing
the appropriate Cluster Name, Admin Username, and Password.
Once the cluster is created, load data from `Sample Buckets` present under
the `Settings` tab. The data from the sample bucket `travel-sample` is used
in this blueprint.
tags:
- Ingest
- SQL
- Python
ee: false
demo: false
meta_description: "This flow extracts data from a Couchbase database and then
performs transformation (aggregation) on the fetched results using Pandas in
a Python script. "