Add EMR plugin to trigger a Spark job and wait for its completion #563
Labels: area/plugin, enhancement, kind/customer-request
Feature description
Specs
CreateCluster
type: io.kestra.plugin.aws.emr.cluster.CreateCluster
description: Create an EMR cluster
clusterName - STRING - required
releaseLabel - STRING, default to latest
  description: specifies the EMR release version label
applications - ARRAY ["Spark"]
enableDebugging - BOOLEAN
logUri - STRING
  description: a URI in S3 for log files is required when debugging is enabled
masterInstanceType - STRING
slaveInstanceType - STRING
instanceCount - INTEGER
steps - ARRAY - list of jobs to run
keepJobFlowAliveWhenNoSteps - BOOLEAN
vpcARN - STRING
  description: VPC ARN
subnetARN - STRING
  description: Subnet ARN
emrRole - STRING
  description: IAM service role
emrEc2Role - STRING
  description: IAM service role
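To make the spec above more concrete, here is a rough sketch of how a flow might use the proposed task once it exists. The property names come from the list above; the flow id, namespace, bucket, subnet, release label, instance types and role names are placeholder assumptions, and the shape of the steps entries is not defined in this issue, so steps is left out.

```yaml
# Hypothetical flow: the CreateCluster task is only proposed in this issue.
id: emr_create_cluster        # placeholder flow id
namespace: company.team       # placeholder namespace

tasks:
  - id: create_cluster
    type: io.kestra.plugin.aws.emr.cluster.CreateCluster
    clusterName: spark-cluster            # required
    releaseLabel: emr-7.0.0               # placeholder; spec says it defaults to latest
    applications:
      - Spark
    enableDebugging: true
    logUri: s3://my-bucket/emr-logs/      # placeholder; needed when debugging is enabled
    masterInstanceType: m5.xlarge         # placeholder
    slaveInstanceType: m5.xlarge          # placeholder
    instanceCount: 3
    keepJobFlowAliveWhenNoSteps: true     # create the cluster without running any step
    subnetARN: "<subnet-arn>"             # placeholder
    emrRole: EMR_DefaultRole              # placeholder IAM service role
    emrEc2Role: EMR_EC2_DefaultRole       # placeholder IAM service role
```

With keepJobFlowAliveWhenNoSteps set to true and no steps, this would only create the cluster; adding a steps list would cover the "create cluster and run job directly" case.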
(these below are probably needed to authenticate the Java client)
References & Example
AWS documentation example:
👉 Note for @mgabelle
It seems that RunJobFlowRequest is the request class to look for.
Let's implement the steps property so that Kestra users can either create a cluster and run jobs directly, or simply create a cluster (KeepJobFlowAliveWhenNoSteps=true, see note below).
DeleteCluster
type: io.kestra.plugin.aws.emr.cluster.DeleteCluster
description: Delete an EMR cluster
id: cluster id to delete
It seems that TerminateJobFlowsRequest is the request to use (first list the clusters, then call TerminateJobFlows). See this example
AddJobFlowsSteps
type: io.kestra.plugin.aws.emr.cluster.AddJobFlowsSteps
description: Adds new steps to a running cluster