Add caching page (#559)
* Add caching page

* Update 14.caching.md
anna-geller authored Oct 4, 2023
1 parent 00260fb commit 59b01a1
Showing 1 changed file with 103 additions and 0 deletions.
103 changes: 103 additions & 0 deletions content/docs/05.developer-guide/14.caching.md
@@ -0,0 +1,103 @@
---
title: Caching
---

## Cache files in a `WorkingDirectory` task

The file caching functionality on the `WorkingDirectory` task allows you to cache a subset of files to speed up your workflow execution. This is especially useful when you work with sizeable package dependencies that don't change often.

::alert{type="info"}
Kestra can only cache files installed or created as part of the [script](03.scripts.md) tasks if the script uses a `PROCESS` runner. If the script uses a `DOCKER` runner, the files will not be cached and the `WorkingDirectory` task will [throw an error](https://github.com/kestra-io/kestra/issues/2233): `Unable to execute WorkingDirectory post actions`.
::

### Use cases for file caching

File caching is useful when you need to install `pip` or `npm` packages before running your [script](03.scripts.md). You can cache the `node_modules` or Python `venv` folder to avoid reinstalling the dependencies on each run.

To do that, add a `cache` property to your `WorkingDirectory` task. The `cache` property accepts a list of glob `patterns` that select the files to cache. The cache is automatically invalidated after the time-to-live set in the `ttl` property, which accepts an ISO 8601 duration such as `PT1H` (one hour).

```yaml
id: caching_files
namespace: dev

tasks:
  - id: working_dir
    type: io.kestra.core.tasks.flows.WorkingDirectory
    cache:
      patterns:
        - some_directory/**
      ttl: PT1H
```

### How does it work under the hood

Kestra packages the files that need to be cached and stores them in the internal storage. When the task is executed again, the cached files are retrieved, initializing the working directory with their contents.
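
The package-and-restore cycle described above can be pictured with a small, self-contained Python sketch. It is an illustration only, not Kestra's implementation: the function names `pack_cache` and `restore_cache`, the zip archive format, and the use of `fnmatch` as a simplified stand-in for glob matching are all assumptions made for this example.

```python
import zipfile
from fnmatch import fnmatchcase
from pathlib import Path


def pack_cache(working_dir: Path, patterns: list[str], archive: Path) -> list[str]:
    """Zip every file under working_dir whose relative path matches a pattern.

    Hypothetical helper: fnmatch lets `*` cross `/`, so `node_modules/**`
    matches every file below node_modules. Real glob engines are stricter.
    """
    cached: list[str] = []
    with zipfile.ZipFile(archive, "w") as zf:
        for path in sorted(working_dir.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(working_dir).as_posix()
            if any(fnmatchcase(rel, pattern) for pattern in patterns):
                zf.write(path, rel)  # store under the relative path
                cached.append(rel)
    return cached


def restore_cache(archive: Path, working_dir: Path) -> None:
    """Unpack a previously created archive into a fresh working directory."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(working_dir)
```

Calling `pack_cache(work, ["node_modules/**"], archive)` at the end of one run and `restore_cache(archive, new_work)` at the start of the next mirrors the behavior described: only files matching the patterns travel between runs, and everything else in the working directory is left behind.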

### Node.js example

Here's an example of a flow that installs the `colors` package before running a Node.js script. The `node_modules` folder is cached for one hour.

```yaml
id: node_cached_dependencies
namespace: dev

tasks:
  - id: working_dir
    type: io.kestra.core.tasks.flows.WorkingDirectory
    cache:
      patterns:
        - node_modules/**
      ttl: PT1H
    tasks:
      - id: node_script
        type: io.kestra.plugin.scripts.node.Script
        beforeCommands:
          - npm install colors
        script: |
          const colors = require("colors");
          console.log(colors.red("Hello"));
```

### Python example

Here's an example of a flow that installs the `pandas` package before running a Python script. The `venv` folder is cached for one day.

```yaml
id: python_cached_dependencies
namespace: dev

tasks:
  - id: working_dir
    type: io.kestra.core.tasks.flows.WorkingDirectory
    tasks:
      - id: python_script
        type: io.kestra.plugin.scripts.python.Script
        runner: PROCESS
        warningOnStdErr: false
        beforeCommands:
          - python -m venv venv
          - source venv/bin/activate
          - pip install pandas
        script: |
          import pandas as pd
          print(pd.__version__)
    cache:
      patterns:
        - venv/**
      ttl: PT24H
```

### How to invalidate the cache

Here's how that works:
- After the first run, the files are cached.
- The next time the task is executed:
  - If the `ttl` has not passed, the files are retrieved from the cache.
  - If the `ttl` has passed, the cache is invalidated and no files are retrieved from it. Because the cache is no longer present, the install command in the `beforeCommands` property (for example, `npm install`) takes longer to execute.
- If you edit the task and change the `ttl` to:
  - a longer duration, e.g. `PT5H`, the files will be cached for five hours using the new `ttl` duration
  - a shorter duration, e.g. `PT5M`, the cache will be invalidated after five minutes using the new `ttl` duration.

The `ttl` is evaluated at runtime. If the time elapsed since the last run of the task exceeds the most recently set `ttl`, the cache is invalidated and the files are no longer retrieved from it.
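
The runtime check can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kestra's code: `parse_ttl` and `cache_is_valid` are hypothetical names, and the parser handles only time-based ISO 8601 durations of the kind used on this page (`PT1H`, `PT5M`, `PT24H`).

```python
import re
from datetime import datetime, timedelta

# Time-only ISO 8601 durations: PT<hours>H<minutes>M<seconds>S, each part optional.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_ttl(ttl: str) -> timedelta:
    """Convert a duration such as PT1H or PT5M into a timedelta."""
    match = _DURATION.match(ttl)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {ttl!r}")
    hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def cache_is_valid(last_run: datetime, now: datetime, ttl: str) -> bool:
    """Return True while less than ttl has elapsed since the last task run.

    The ttl passed in is whatever the flow currently declares, so editing
    the flow changes the outcome for an already existing cache.
    """
    return now - last_run < parse_ttl(ttl)
```

With a last run at 12:00 and the current time at 12:30, `cache_is_valid(..., "PT1H")` is true, while the same timestamps with the `ttl` edited down to `PT5M` give false, which matches the behavior listed above for a shortened `ttl`.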
