Best of Argo Workflows

Install Argo Workflows on your K8s cluster

kubectl create namespace argo # If it does not exist
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.7.3/install.yaml # Latest release

Create playground namespace

kubectl create namespace playground # Workflows as well as other resources are created in this namespace

Create required resources

kubectl apply -f roles/ # Service account and RBAC used by the Workflow CRDs
# CAUTION: Make sure you have already created the secret `s3-credentials` referenced in the configmap
kubectl apply -f configmap/ # S3 credentials and Spark application configurations
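
The configmap expects the secret `s3-credentials` to already exist. A minimal sketch of such a secret follows; the key names (`accesskey`, `secretkey`) are assumptions for illustration, so match whatever keys your configmap actually references.

# Hypothetical sketch of the referenced secret; key names are assumptions
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  namespace: playground
type: Opaque
stringData:
  accesskey: <your-s3-access-key>
  secretkey: <your-s3-secret-key>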

Submit a pipeline

Example

# Make sure you have argo CLI already installed (see: https://github.com/argoproj/argo-workflows/releases/tag/v3.7.3)
argo submit -n playground pipelines/artifact.yaml --watch
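
For orientation, an artifact-passing workflow broadly follows the shape below. This is a sketch, not the actual contents of pipelines/artifact.yaml; the template and artifact names are made up, and the output artifact is stored via the default artifact repository configured in the configmap.

# Hypothetical shape of an artifact-passing workflow (not the repo's manifest)
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-example-
  namespace: playground
spec:
  entrypoint: generate
  templates:
    - name: generate
      container:
        image: alpine:3.20
        command: [sh, -c]
        args: ["echo hello > /tmp/message.txt"]
      outputs:
        artifacts:
          - name: message          # Uploaded to S3 via the default artifact repository
            path: /tmp/message.txt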

Check the status of your pipeline

# You can also use --watch with argo submit to track the workflow progress (see above example)
argo get @latest -n playground

Run a Spark application, track its progress, and delete it

(1) Deploy the Kubeflow Spark operator into your K8s cluster

# Runs your Spark jobs in a namespace called playground.
# This also creates a service account in the playground namespace; the
# driver pod uses it to talk to the K8s API server (e.g., to create executor pods).
cd helm/charts
helm upgrade --install -n spark-operator spark-operator ./spark-operator -f ./spark-operator/values.yaml --create-namespace --set "spark.jobNamespaces={playground}"
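
For reference, the kind of SparkApplication resource a workflow step can create looks roughly like this. The image, main application file, and service account name are assumptions; the Helm chart typically creates a service account for the driver in each job namespace.

# Hypothetical minimal SparkApplication; image, file, and SA name are assumptions
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: playground
spec:
  type: Python
  mode: cluster
  image: spark:3.5.3
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.5.3"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-operator-spark  # Assumed name of the chart-created SA
  executor:
    instances: 1
    cores: 1
    memory: 512m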

(2) Run the workflow

argo submit -n playground pipelines/k8s-orchestration.yaml --watch

Successful workflow execution

Trigger an Argo Workflow using Argo Events

Suppose you want to trigger one of your workflows using a webhook - you can use Argo Events to achieve this.

(1) Install Argo Events on your K8s cluster

kubectl create namespace argo-events
kubectl apply -n argo-events -f https://github.com/argoproj/argo-events/releases/download/v1.9.8/install.yaml
kubectl apply -n argo-events -f https://github.com/argoproj/argo-events/releases/download/v1.9.8/install-validating-webhook.yaml

(2) Create EventBus pods

kubectl apply -n argo-events -f https://raw.githubusercontent.com/argoproj/argo-events/v1.9.8/examples/eventbus/native.yaml
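
The applied manifest creates the default EventBus backed by native NATS; it is roughly shaped like this:

# Approximate shape of the native EventBus example
apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata:
  name: default
  namespace: argo-events
spec:
  nats:
    native: {}  # Operator-managed NATS; defaults to 3 replicas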

(3) Set up the event source for the webhook — this is one of the possible event source types supported by Argo Events.

kubectl apply -f events/source/webhook.yaml
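
A webhook event source of this kind is roughly shaped as follows. The endpoint matches the curl path used in step (7), while the port (12000) and names are assumptions; defer to events/source/webhook.yaml as the source of truth.

# Hypothetical sketch of a webhook EventSource; port and names are assumptions
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: webhook
  namespace: argo-events
spec:
  service:
    ports:
      - port: 12000
        targetPort: 12000
  webhook:
    example:                                # Event name referenced by the sensor
      endpoint: /trigger/workflow/example   # Path hit by curl in step (7)
      method: POST
      port: "12000"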

(4) Create a service account with the necessary RBAC permissions to allow the sensor to trigger an Argo Workflow in the playground namespace. In this setup, I created a ClusterRole, which enables the workflow to be created in any namespace. Alternatively, you can create a namespace-scoped Role in playground and bind it to a service account in the argo-events namespace to achieve the same result with more restricted permissions.

kubectl apply -f events/roles
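
A sketch of the more restrictive alternative described above, assuming a hypothetical sensor service account named operate-workflow-sa in argo-events:

# Hypothetical namespace-scoped alternative to the ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workflow-creator
  namespace: playground
rules:
  - apiGroups: ["argoproj.io"]
    resources: ["workflows"]
    verbs: ["create", "get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: workflow-creator-binding
  namespace: playground
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: workflow-creator
subjects:
  - kind: ServiceAccount
    name: operate-workflow-sa  # Assumed sensor service account
    namespace: argo-events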

(5) Create the Argo Events sensor

Adjust the input parameter values for the Spark application in the workflow manifest embedded in the Events sensor (events/sensor/spark-sensor.yaml):

arguments:
  parameters:
    - name: s3-bucket
      value: <bucket>
    - name: s3-script-prefix
      value: <key in s3 for the main spark application>
    - name: s3-endpoint
      value: "<endpoint>"
    - name: s3-region
      value: "<region>"
    - name: s3-file-prefix # You can leave this as is
      value: '{{steps.generate-salary.outputs.parameters.s3-file-prefix}}'

Note

Some other steps in the workflow also write artifacts into S3; alter their keys to suit your needs.

Now you can create the Events sensor

kubectl apply -f events/sensor/spark-sensor.yaml
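
For orientation, the sensor ties the webhook dependency to an Argo Workflow trigger roughly as outlined below. The names are assumptions and the full workflow spec (with the parameters above) is elided; events/sensor/spark-sensor.yaml remains the source of truth.

# Hypothetical outline of the sensor; names are assumptions
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: spark-sensor
  namespace: argo-events
spec:
  template:
    serviceAccountName: operate-workflow-sa  # SA with the RBAC from step (4)
  dependencies:
    - name: webhook-dep
      eventSourceName: webhook   # Must match the EventSource name
      eventName: example         # Must match the event name in the EventSource
  triggers:
    - template:
        name: submit-workflow
        argoWorkflow:
          operation: submit
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: k8s-orchestrate-
                namespace: playground
              spec: {}  # Full workflow spec with the parameters above goes here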

(6) Define an ingress resource to expose the Argo Events webhook event source service externally through an NGINX Ingress Controller.

(6.1) Install the ingress-nginx controller on your K8s cluster

cd helm/charts
helm upgrade --install -n ingress-nginx ingress-nginx ./ingress-nginx -f ./ingress-nginx/values.yaml --create-namespace

(6.2) Create the ingress resource

kubectl apply -f events/ingress/ingress.yaml
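
The applied ingress is roughly shaped as follows; the backend service name and port are assumptions (Argo Events exposes the event source through a generated service), so defer to events/ingress/ingress.yaml.

# Hypothetical sketch of the ingress; backend service name/port are assumptions
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webhook-event-source
  namespace: argo-events
spec:
  ingressClassName: nginx
  rules:
    - host: demo-argo-workflows.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: webhook-eventsource-svc  # Assumed generated service name
                port:
                  number: 12000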

(6.3) Get the IP address of the ingress load balancer

kubectl get ingress webhook-event-source -n argo-events -o json | jq -r '.status.loadBalancer.ingress[0].ip'

(6.4) For Linux users, add the following line to /etc/hosts, replacing load_balancer_ip with the IP from step (6.3)

load_balancer_ip   demo-argo-workflows.com

(7) Trigger the Workflow

curl -d '{}' -H "Content-Type: application/json" -X POST http://demo-argo-workflows.com:80/trigger/workflow/example

(8) Verify that an Argo Workflow was triggered

kubectl -n playground get workflows | grep k8s-orchestrate

Successful workflow trigger

(9) Access the Argo Workflows UI (optional)

(9.1) Extract the bearer token

argo auth token

(9.2) Run the port-forward command to access the Argo Workflows Web UI from localhost. Alternatively, you can use an ingress resource to expose the service externally (see the sketch below).

kubectl port-forward svc/argo-server -n argo 2746:2746
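
If you prefer the ingress alternative mentioned in (9.2), a sketch could look like the following. The host is an assumption, and the annotation is needed because argo-server serves HTTPS by default.

# Hypothetical ingress for argo-server; host is an assumption
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argo-server
  namespace: argo
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx
  rules:
    - host: argo-ui.demo-argo-workflows.com  # Assumed host; add to /etc/hosts as in (6.4)
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: argo-server
                port:
                  number: 2746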

(9.3) Open a browser at https://localhost:2746/ and paste the token (the whole text, including the Bearer prefix) into the token field. You can now view the submitted workflows.

Argo Workflows UI

(10) Delete all terminated pods in playground namespace (optional)

kubectl delete pods -n playground --field-selector=status.phase==Succeeded
kubectl delete pods -n playground --field-selector=status.phase==Failed
