-
Notifications
You must be signed in to change notification settings - Fork 100
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Add caching page * Update 14.caching.md
- Loading branch information
1 parent
00260fb
commit 59b01a1
Showing
1 changed file
with
103 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
--- | ||
title: Caching | ||
--- | ||
|
||
## Cache files in a `WorkingDirectory` task | ||
|
||
The file caching functionality on the `WorkingDirectory` task allows you to cache a subset of files to speed up your workflow execution. This is especially useful when you work with sizeable package dependencies that don't change often. | ||
|
||
::alert{type="info"} | ||
Kestra can only cache files installed or created as part of the [script](03.scripts.md) tasks if the script uses a `PROCESS` runner. If the script uses a `DOCKER` runner, the files will not be cached and the `WorkingDirectory` task will [throw an error](https://github.com/kestra-io/kestra/issues/2233): `Unable to execute WorkingDirectory post actions`. | ||
:: | ||
|
||
### Use cases for file caching | ||
|
||
The file caching is useful if you want to install some `pip` or `npm` packages before running your [script](03.scripts.md). You can cache the `node_modules` or Python `venv` folder to avoid re-installing the dependencies on each run. | ||
|
||
To do that, add a `cache` to your `WorkingDirectory` task. The `cache` property accepts a list of glob `patterns` to match files to cache. The cache will be automatically invalidated after a specified time-to-live using the `ttl` property accepting a duration. | ||
|
||
```yaml | ||
id: caching_files | ||
namespace: dev | ||
|
||
tasks: | ||
- id: working_dir | ||
type: io.kestra.core.tasks.flows.WorkingDirectory | ||
cache: | ||
patterns: | ||
- some_directory/** | ||
ttl: PT1H | ||
``` | ||
### How does it work under the hood | ||
Kestra packages the files that need to be cached and stores them in the internal storage. When the task is executed again, the cached files are retrieved, initializing the working directory with their contents. | ||
### Node.js example | ||
Here's an example of a flow that installs the `colors` package before running a Node.js script. The `node_modules` folder is cached for one hour. | ||
|
||
```yaml | ||
id: node_cached_dependencies | ||
namespace: dev | ||
tasks: | ||
- id: working_dir | ||
type: io.kestra.core.tasks.flows.WorkingDirectory | ||
cache: | ||
patterns: | ||
- node_modules/** | ||
ttl: PT1H | ||
tasks: | ||
- id: node_script | ||
type: io.kestra.plugin.scripts.node.Script | ||
beforeCommands: | ||
- npm install colors | ||
script: | | ||
const colors = require("colors"); | ||
console.log(colors.red("Hello")); | ||
``` | ||
|
||
### Python example | ||
|
||
Here's an example of a flow that installs the `pandas` package before running a Python script. The `venv` folder is cached for one day. | ||
|
||
```yaml | ||
id: python_cached_dependencies | ||
namespace: dev | ||
tasks: | ||
- id: working_dir | ||
type: io.kestra.core.tasks.flows.WorkingDirectory | ||
tasks: | ||
- id: python_script | ||
type: io.kestra.plugin.scripts.python.Script | ||
runner: PROCESS | ||
warningOnStdErr: false | ||
beforeCommands: | ||
- python -m venv venv | ||
- source venv/bin/activate | ||
- pip install pandas | ||
script: | | ||
import pandas as pd | ||
print(pd.__version__) | ||
cache: | ||
patterns: | ||
- venv/** | ||
ttl: PT24H | ||
``` | ||
|
||
### How to invalidate the cache | ||
|
||
Here's how that works: | ||
- After the first run, the files are cached | ||
- The next time the task is executed: | ||
- if the `ttl` didn't pass, the files are retrieved from cache | ||
- If the `ttl` passed, the cache is invalidated and no files will be retrieved from cache; because cache is no longer present, the `npm install` command from the `beforeCommands` property will take a bit longer to execute | ||
- If you edit the task and change the `ttl` to: | ||
- a longer duration e.g. `PT5H` — the files will be cached for five hours using the new `ttl` duration | ||
- a shorter duration e.g. `PT5M` — the cache will be invalidated after five minutes using the new `ttl` duration. | ||
|
||
The `ttl` is evaluated at runtime. If the most recently set `ttl` duration has passed as compared to the last task run execution date, the cache is invalidated and the files are no longer retrieved from cache. | ||
|