Commit

Tutorial update (#538)
* Update tutorial

* Fixes

* Update index.md

* (docs) final fixes
anna-geller authored Sep 26, 2023
1 parent c2a8b0e commit 43f8993
Showing 23 changed files with 9,193 additions and 4,941 deletions.
16 changes: 8 additions & 8 deletions content/docs/01.getting-started.md
@@ -43,10 +43,10 @@ If you want to extend your Docker Compose file, modify container networking, or
Navigate to **Flows** in the left menu, then click the "Create" button and paste the following configuration to create your first flow:

```yaml
id: getting-started
id: getting_started
namespace: dev
tasks:
  - id: hello
  - id: hello_world
    type: io.kestra.core.tasks.log.Log
    message: Hello World!
```
@@ -64,11 +64,11 @@ Click on **Save** and start your first execution via the **Create Execution** bu
## Next Steps
Congrats on your first flow! :clap:
Congrats! You've just installed Kestra and executed your first flow! :clap:
Next, we suggest following the documentation in this order:
- Check the [tutorial](./02.tutorial/index.md)
Next, you can follow the documentation in this order:
- Check the [tutorial](./02.tutorial/index.md)
- Learn core [concepts](./03.concepts/index.md)
- Read the [Developer Guide](./05.developer-guide/index.md) for more advanced concepts
- Check the available [Plugins](../plugins/index.md) to integrate with external systems and automate your tasks
- [Deploy](./09.administrator-guide/index.md) your Kestra instance to remote development and production environments.
- Read the [Developer Guide](./05.developer-guide/index.md) for an in-depth explanation of all key concepts
- Check the available [Plugins](../plugins/index.md) to integrate with external systems and start orchestrating your applications, microservices and processes
- [Deploy](./09.administrator-guide/index.md) Kestra to remote development and production environments.
159 changes: 126 additions & 33 deletions content/docs/02.tutorial/01.fundamentals.md
@@ -2,79 +2,172 @@
title: Fundamentals
---

This section will guide you through the fundamentals of Kestra.
Let's start by building a "Hello world" example.

::alert{type="info"}
To install Kestra, check the [Getting Started](../01.getting-started.md).
To install Kestra, check the [Getting Started](../01.getting-started.md) page.
::

## Flows

In Kestra, we define flows using the declarative YAML language.
Flows are defined in a declarative YAML syntax to keep the orchestration code portable and language-agnostic.

We identify them by an `id` and a `namespace`. The `id` is a unique identifier inside the [namespace](../03.concepts/01.flows.md#namespace), which is used to group flows. Flows can also have a [description](../05.developer-guide/01.flow.md#document-your-flow) and labels.
Each flow consists of three **required** components: `id`, `namespace` and `tasks`:
1. `id` represents the name of the flow
2. `namespace` can be used to separate development and production environments
3. `tasks` is a list of tasks that will be executed in the order they are defined.

Here are those three components in a YAML file:

```yaml
id: kestra-tutorial
namespace: io.kestra.tutorial
labels:
  env: PRD
id: getting_started
namespace: dev
tasks:
  - id: hello_world
    type: io.kestra.core.tasks.log.Log
    message: Hello World!
```
The `id` of a flow must be **unique within a namespace**. For example:
- ✅ you can have a flow named `getting_started` in the `dev` namespace and another flow named `getting_started` in the `prod` namespace.
- ❌ you cannot have two flows named `getting_started` in the `dev` namespace at the same time.

The combination of `id` and `namespace` serves as a **unique identifier** for a flow.
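
As a small illustration of the uniqueness rule, the two definitions below could coexist because they live in different namespaces. This is only a sketch: the `prod` namespace and the log messages are assumptions made for this example, and in practice each flow would be saved separately.

```yaml
# Flow 1: `getting_started` in the `dev` namespace
id: getting_started
namespace: dev
tasks:
  - id: hello_world
    type: io.kestra.core.tasks.log.Log
    message: Hello from dev!
---
# Flow 2: same `id`, but in the (assumed) `prod` namespace, so there is no conflict
id: getting_started
namespace: prod
tasks:
  - id: hello_world
    type: io.kestra.core.tasks.log.Log
    message: Hello from prod!
```

A third flow with `id: getting_started` in the `dev` namespace could not be added while Flow 1 exists, because the `id` and `namespace` pair would no longer be unique.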


### Namespaces

[Namespaces](../03.concepts/01.flows.md#namespace) are used to group flows and provide structure. Keep in mind, however, that allocation of a flow to a namespace is immutable. Once a flow is created, you cannot change its namespace. If you need to change the namespace of a flow, create a new flow with the desired namespace and delete the old flow.


### Labels

To add another layer of organization, you can use [labels](../03.concepts/01.flows.md#labels), allowing you to group flows using key-value pairs.


### Description(s)

You can optionally add a [description](../05.developer-guide/01.flow.md#document-your-flow) property to keep your flows documented. The `description` is a string that supports **markdown** syntax. That markdown description will be rendered and displayed in the UI.

::alert{type="info"}
Descriptions are not limited to flows. You can also add a `description` property to `tasks` and `triggers` to keep all components of your workflow documented.
::

Here is the same flow as before, but this time with **labels** and **descriptions**:

```yaml
id: getting_started
namespace: dev
description: |
  # Kestra Tutorial
  As you notice, we can use markdown here.
  # Getting Started
  Let's `write` some **markdown** - [first flow](https://t.ly/Vemr0) 🚀

labels:
  owner: rick.astley
  project: never-gonna-give-you-up

tasks:
  - id: hello_world
    type: io.kestra.core.tasks.log.Log
    message: Hello World!
    description: |
      ## About this task
      This task will print "Hello World!" to the logs.
```
Discover more about flows in the [Flows](../05.developer-guide/01.flow.md) section.
Learn more about flows in the [Flows section](../05.developer-guide/01.flow.md) in the Developer Guide.
---
## Tasks
We use Tasks to write flows. We define a Task by an `id`, a `type`, and some properties related to its type. Each Task is a step in your Flow that will execute a specific action. For example, you can use a Task to run a Python script.
Tasks are atomic actions in your flows. You can design your tasks to be small and fine-grained, such as fetching data from a REST API or running a self-contained Python script. However, tasks can also represent large and complex processes, such as triggering containerized processes or long-running batch jobs (e.g. using dbt, Spark, AWS Batch, Azure Batch, etc.) and waiting for their completion.
### The order of task execution
Tasks are defined in the form of a **list**. By default, all tasks in the list will be executed **sequentially**: the second task will start as soon as the first one finishes successfully.
Kestra provides additional **customization** that allows you to run tasks **in parallel**, iterate (_sequentially or in parallel_) over a list of items, or **allow failure** of specific tasks. The tasks that provide this are called [`Flowable` tasks](05.flowable.md) because they define the flow logic.
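
For instance, a flow that runs two log tasks in parallel and then a final task could look roughly like the following. It is a minimal sketch that assumes the `io.kestra.core.tasks.flows.Parallel` task from the core `flows` category; the flow id, task ids and messages are made up for illustration.

```yaml
id: parallel_demo   # hypothetical flow id
namespace: dev
tasks:
  # Flowable task: its child tasks are started side by side
  - id: run_in_parallel
    type: io.kestra.core.tasks.flows.Parallel
    tasks:
      - id: first
        type: io.kestra.core.tasks.log.Log
        message: Running alongside the second task
      - id: second
        type: io.kestra.core.tasks.log.Log
        message: Running alongside the first task
  # Regular task: listed after the parallel group, so it follows the default sequential order
  - id: last
    type: io.kestra.core.tasks.log.Log
    message: Runs after the parallel group is done
```

Only the children of `run_in_parallel` run concurrently; the top-level list still executes in the order it is written.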

A task in Kestra must have an `id` and a `type`. Other properties depend on the task type. You can think of a task as a step in a flow that should execute a specific action, such as running a Python or Node.js script in a Docker container, or loading data from a database.

```yaml
tasks:
  - id: python
    type: io.kestra.plugin.scripts.python.Script
    docker:
      image: python:slim
    script: |
      print("Hello World!")
```

### Autocompletion

Kestra supports [hundreds of tasks](../../plugins/index.md) integrating with various external systems. Use the shortcut `CTRL + SPACE` on Windows/Linux or `fn + control + SPACE` on Mac to trigger **autocompletion** listing available tasks or properties of a given task.

::alert{type="info"}
At the moment of writing this guide, [Kestra has over 300 tasks](../../plugins/index.md), which can be challenging to remember. That's why we provide an auto-completion feature to help you find the Task you need. Use the shortcut `CTRL or ⌘ + SPACE` to activate it. If you want to **comment out** some part of your code, you can use the `CTRL or ⌘ + K + C` shortcut, and to uncomment it, use `CTRL or ⌘ + K + U`. To remember it, `C` stands for `comment` and `U` stands for `uncomment`.
If you want to **comment out** some part of your code, use the `CTRL or ⌘ + K + C` shortcut, and to uncomment it, use `CTRL or ⌘ + K + U`. To remember it, `C` stands for `comment` and `U` stands for `uncomment`. All available keyboard shortcuts are listed upon right-clicking anywhere in the code editor.
::

![Autocompletion](/docs/tutorial/fundamentals/autocomplete.gif)
![Autocompletion](https://kestra.io/autocompletion.gif)

---

## Supported task types

Let's look at supported task types.

### Core

The **Core tasks** from the `io.kestra.core.tasks.flows` category are commonly used to control the flow logic. You can use them to declare which processes should run **in parallel** or **sequentially**. You can also specify **conditional branching**, **iterate** over a list of items, **pause** an execution, or allow certain tasks to fail without failing the Execution.

### Scripts

## Create your first Flow
**Script tasks** are used to run scripts in Docker containers or local processes. You can use them to run Python, Node.js, R, Julia, or any other script. You can also use them to execute a series of commands in Shell or PowerShell. Check the [Script tasks](../05.developer-guide/03.scripts.md) page for more details.

Now, let's create our first Flow. On the left side of the screen, click on the **Flows** menu.
Then, click on the **Create** button.
### Internal Storage

![Access flow creation](/docs/tutorial/fundamentals/create-button.png)
Tasks from the `io.kestra.core.tasks.storages` category, along with [Outputs](../05.developer-guide/05.outputs.md), are used to interact with **internal storage**. Kestra uses internal storage to **pass data between tasks**. You can think of internal storage as an S3 bucket. In fact, you can use your private S3 bucket as internal storage. This storage layer helps avoid a proliferation of connectors. For example, you can use the Postgres plugin to extract data from a Postgres database and load it to internal storage. Other tasks can then read that data from internal storage and load it to other systems such as Snowflake, BigQuery or Redshift, or process it using any other plugin, without requiring point-to-point connections between each of them.
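
A rough sketch of that hand-off is shown here. It assumes that the `Download` task exposes the stored file through a `uri` output (check the plugin documentation for the exact output names); the flow id and task ids are illustrative only.

```yaml
id: internal_storage_demo   # hypothetical flow id
namespace: dev
tasks:
  # Fetches the file and stores it in Kestra's internal storage
  - id: download
    type: io.kestra.plugin.fs.http.Download
    uri: https://dummyjson.com/products
  # Any later task can refer to the stored file via the first task's outputs
  - id: log_uri
    type: io.kestra.core.tasks.log.Log
    message: "File stored at {{ outputs.download.uri }}"
```

The second task never talks to the HTTP endpoint itself. It only needs the internal storage reference produced by the first task, which is what keeps connectors decoupled from each other.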

Use the following Flow in the Editor, then click the **Save** button.
This Flow will download a CSV file from the French Open Data Portal.
### State Store

Internal storage is mainly used to pass data within a single flow execution. If you need to pass data between different flow executions, you can use the **State Store**. The tasks `Set`, `Get` and `Delete` from the `io.kestra.core.tasks.states` category allow you to persist files between executions (even across namespaces). For example, if you are using [dbt](https://www.getdbt.com/), you can leverage the State Store to persist the `manifest.json` file between executions and implement the slim CI pattern.

### ⚡️ Plugins

Apart from **core tasks**, the [plugins library](../../plugins/index.md) provides a wide range of integrations. Kestra has built-in plugins for data ingestion, data transformation, interacting with databases, object stores, or message queues, and the list keeps growing with every new release. On top of that, you can also [create your own plugins](../10.plugin-developer-guide) to integrate with any system or programming language.

---

## Create and run your first flow

Now, let's create and run your first flow. On the left side of the screen, click on the **Flows** menu. Then, click on the **Create** button.

![Create flow](/docs/tutorial/fundamentals/create_button.png)

Paste the following code to the Flow editor:

```yaml
id: kestra-tutorial
namespace: io.kestra.tutorial
labels:
  env: PRD
description: |
  # Kestra Tutorial
  As you notice, we can use markdown here.
id: getting_started
namespace: dev
tasks:
  - id: download
    type: io.kestra.plugin.fs.http.Download
    uri: "https://gist.githubusercontent.com/tchiotludo/2b7f28f4f507074e60150aedb028e074/raw/6b6348c4f912e79e3ffccaf944fd019bf51cba30/conso-elec-gaz-annuelle-par-naf-agregee-region.csv"
  - id: api
    type: io.kestra.plugin.fs.http.Request
    uri: https://dummyjson.com/products
```

After saving it, you will see a **New Execution** button. Click on it and watch your first Flow running.
Then, hit the **Save** button.

![Create flow](/docs/tutorial/fundamentals/save_button.png)

![New execution](/docs/tutorial/fundamentals/new-execution.png)
This flow has a single task that will fetch data from the [dummyjson](https://dummyjson.com/) API. Let's run it!

![New execution](/docs/tutorial/fundamentals/new_execution.png)

::next-link
[The next step is to add Inputs to your flow](./02.inputs.md)
[Next, let's parametrize this flow using `inputs`.](./02.inputs.md)
::


85 changes: 46 additions & 39 deletions content/docs/02.tutorial/02.inputs.md
@@ -2,68 +2,75 @@
title: Inputs
---

Flows can have Input. They are parameters provided when starting a flow. They help make your flow more dynamic and reusable.
Inputs allow you to make your flows more dynamic and reusable. Instead of hardcoding values in your flow, you can pass them in as inputs, which makes your workflows more adaptable to change.

## How to retrieve inputs

Inputs can be accessed in any task using a special `{{ inputs.input_name }}` [variable](../05.developer-guide/03.variables/01.index.md).

---

## Defining inputs

We defined Inputs in the `inputs` section of the flow file. They must have a `name` and a `type`.
You can also set a `defaults` value.
Like tasks, `inputs` are defined as a list, where each item is a set of key-value pairs. Each input must have a `name` and a `type`. You can also set `defaults` for each input. Setting default values for an input is always recommended, especially if you want to run your flow on a schedule.

To reference an input value in your flow, use the `{{ inputs.input_name }}` syntax.

```yaml
id: inputs_demo
namespace: dev

inputs:
  - name: isTutorial
    type: BOOLEAN
    defaults: true
  - name: user
    type: STRING
    defaults: Rick Astley

tasks:
  - id: hello
    type: io.kestra.core.tasks.log.Log
    message: Hey there, {{ inputs.user }}
```
| Type | Description |
|---------|------------------------------------------------------------------------|
| STRING | No control is done on this input type (no parsing), can be any string. |
| INT | Must be a valid integer (without any decimals). |
| BOOLEAN | Must be a valid true or false as string. |
Try running the above flow with different values for the `user` input. You can do this by clicking on the `New execution` button in the UI and then typing the desired value in the menu.

Discover more types on the [input documentation](../05.developer-guide/04.inputs.md ).
![Inputs](/docs/tutorial/inputs.png)

## Accessing inputs
::alert{type="info"}
There are two reasons why this property is named `defaults` (plural) rather than `default`. First, `default` is a reserved keyword in Java, so it couldn't be used. Second, this property allows you to set a default value for a JSON object, which can itself be an array defining multiple default values.
::

Kestra includes a templating engine to access variables in your flow. Use the `{{ variable }}` syntax to access them.
Find more on the [variable documentation.](../05.developer-guide/03.variables/01.index.md)
Here are the most common input types:

Inputs can be accessed in the flow using the `{{ inputs.name }}` syntax.
| Type | Description |
|---------|-------------------------------------------------------------------------------------------------------|
| STRING | It can be any string value. Strings are not parsed, they are passed as-is to any task that uses them. |
| INT | It can be any valid integer number (without decimals). |
| BOOLEAN | It must be either `true` or `false`. |

Check the [inputs documentation](../05.developer-guide/04.inputs.md) for a full list of supported input types.
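
To make the table more concrete, here is a hedged sketch of a flow that declares one input of each of the three types above. The flow id, the input names `retries` and `verbose`, and their default values are assumptions chosen for illustration.

```yaml
id: input_types_demo   # hypothetical flow id
namespace: dev

inputs:
  - name: user         # STRING: passed as-is, no parsing
    type: STRING
    defaults: Rick Astley
  - name: retries      # INT: must be a whole number
    type: INT
    defaults: 3
  - name: verbose      # BOOLEAN: must be true or false
    type: BOOLEAN
    defaults: true

tasks:
  - id: show_inputs
    type: io.kestra.core.tasks.log.Log
    message: "user={{ inputs.user }}, retries={{ inputs.retries }}, verbose={{ inputs.verbose }}"
```

Because every input has a default, the flow can be started without typing any values, which matters for scheduled runs.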

## Add inputs to your flow
---

In our example, we will provide the URL of the CSV file we want to download in Input. But we will set a default value if the user doesn't provide it.
## Parametrize your flow

```yaml
inputs:
  - name: url
    type: STRING
    defaults: "https://gist.githubusercontent.com/tchiotludo/2b7f28f4f507074e60150aedb028e074/raw/6b6348c4f912e79e3ffccaf944fd019bf51cba30/conso-elec-gaz-annuelle-par-naf-agregee-region.csv"
```
In our example, we will provide the URL of the API as an input. This way, we can easily change the URL when calling the flow without having to modify the flow itself.

::collapse{title="Click here to see the full flow"}
```yaml
id: kestra-tutorial
namespace: io.kestra.tutorial
labels:
  env: PRD
description: |
  # Kestra Tutorial
  As you notice, we can use markdown here.
id: getting_started
namespace: dev
inputs:
  - name: url
  - name: api_url
    type: STRING
    defaults: "https://gist.githubusercontent.com/tchiotludo/2b7f28f4f507074e60150aedb028e074/raw/6b6348c4f912e79e3ffccaf944fd019bf51cba30/conso-elec-gaz-annuelle-par-naf-agregee-region.csv"
    defaults: https://dummyjson.com/products
tasks:
  - id: download
    type: io.kestra.plugin.fs.http.Download
    uri: "{{ inputs.url }}"
  - id: api
    type: io.kestra.plugin.fs.http.Request
    uri: "{{ inputs.api_url }}"
```
::


::next-link
[Follow the next step to see what's your task outputs](./03.outputs.md)
[Next, let's look at outputs](./03.outputs.md)
::