Use cases for Argo Workflows range from MLOps to the CI processes associated with GitOps (which you would likely run with the related tool Argo CD) to any data-intensive processing you need to carry out in advance of an application’s deployment, during runtime, or for cleanup after a given resource has been torn down.
Argo Workflows is implemented as a K8s operator (a Kubernetes-native application) that is deployed in your K8s cluster. The operator extends the native behavior of your cluster by watching the Kubernetes API server, which persists cluster state in etcd (the central datastore associated with a cluster), for Argo-specific resources. These are custom resources, whose kinds are registered through Custom Resource Definitions (CRDs), and they define the workflow process to be carried out.
When the operator finds a new workflow definition, it creates the required K8s objects and executes the workflow as it has been defined. For example, it may run all of the involved steps in parallel or, if the workflow definition says so, complete certain steps first because other steps in the workflow depend on them.
The order in which steps are carried out can be controlled in several ways. A step can declare dependencies on other steps, in which case all of its dependencies will have completed before it starts. Steps can also be made to run sequentially. Finally, steps that are independent of each other can run in parallel to speed up the execution of the larger workflow.
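As a rough illustration of these ordering options, the sketch below writes a Workflow manifest (the manifest format is introduced in the following paragraphs) that uses a DAG template. The file name, the task names, the `echo` template, and the container image are all hypothetical choices made for this example; only the `dag`, `tasks`, and `dependencies` fields come from the Argo Workflows spec.

```shell
# Write a hypothetical DAG-style Workflow manifest to a local file.
# The quoted 'EOF' stops the shell from touching the {{...}} placeholder.
cat > dag-sketch.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-sketch-       # hypothetical name prefix
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      tasks:
      - name: prepare             # no prerequisites, so it starts first
        template: echo
        arguments:
          parameters: [{name: message, value: prepare}]
      - name: left                # waits for prepare, runs alongside right
        dependencies: [prepare]
        template: echo
        arguments:
          parameters: [{name: message, value: left}]
      - name: right               # waits for prepare, runs alongside left
        dependencies: [prepare]
        template: echo
        arguments:
          parameters: [{name: message, value: right}]
      - name: combine             # waits for both branches to finish
        dependencies: [left, right]
        template: echo
        arguments:
          parameters: [{name: message, value: combine}]
  - name: echo                    # one reusable template shared by every task
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.19          # assumption: any small image with echo works
      command: [echo, "{{inputs.parameters.message}}"]
EOF

# Because the manifest uses generateName, it would be submitted with
# `argo submit dag-sketch.yaml --watch` (or `kubectl create -f`), not `kubectl apply`.
```

In this sketch, `left` and `right` have no dependency on each other, so they are free to run in parallel once `prepare` completes, while `combine` runs last.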
Custom resources are defined in YAML manifests much like native K8s resources, but they are different in that they extend the capabilities of K8s. Below, we have an example of a custom resource of the kind Workflow. Native K8s objects such as Pods, ReplicaSets, Deployments, and others are defined with this same kind field, but in this case, we have a custom resource that is specific to Argo Workflows called, appropriately enough, a Workflow.
Once the Argo operator has been installed in your cluster, which we’ll look at below, this manifest can be applied to your cluster exactly as a native K8s object’s manifest would be. You simply run the following with kubectl:
kubectl apply -f <workflow-definition-filename.yaml>
- name: whalesay
args: ["hello world"]
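For reference, a complete manifest consistent with the description in this article might look like the sketch below, which writes it to a local file. It is modeled on the upstream hello-world example rather than taken from this article: the file name, the fixed `name`, the image, and the `cowsay` command are assumptions.

```shell
# Write a minimal single-step Workflow manifest to a local file.
cat > hello-world.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Workflow                  # custom resource kind provided by Argo Workflows
metadata:
  name: hello-world             # assumption: a fixed name, so `kubectl apply` accepts it
spec:
  entrypoint: whalesay          # the first template to run
  templates:
  - name: whalesay
    container:
      image: docker/whalesay    # assumption: image and command follow the upstream example
      command: [cowsay]
      args: ["hello world"]
EOF

# With the operator installed, the manifest could then be applied with:
#   kubectl apply -f hello-world.yaml
```

The upstream example uses `generateName: hello-world-` instead of a fixed name, which gives every run a unique suffix. Manifests written that way are rejected by `kubectl apply` and need `kubectl create -f` or `argo submit` instead.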
Each step included in the workflow is containerized, allowing Argo to run each step in a K8s Pod. A step's Pod definition can be complex and involve several kinds of containers. For example, sidecar containers can run alongside the main containerized step, and multiple containers can run within a single Pod to simplify passing output from one step to the next, as is often required when a given step depends on another.
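Separately from the hello-world workflow, the sidecar pattern mentioned here could be sketched as follows. The manifest is loosely based on the upstream sidecar example; the file name, template name, and both images are illustrative assumptions, while `sidecars` is the template field Argo Workflows provides for this purpose.

```shell
# Write a hypothetical Workflow whose single step runs next to an nginx sidecar.
cat > sidecar-sketch.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: sidecar-sketch-   # hypothetical name prefix
spec:
  entrypoint: fetch-page
  templates:
  - name: fetch-page
    container:                    # main container: the step itself
      image: curlimages/curl      # assumption: any image that ships sh and curl
      command: [sh, -c]
      args: ["until curl -s http://127.0.0.1/; do sleep 1; done"]
    sidecars:                     # extra containers started in the same Pod
    - name: nginx
      image: nginx                # assumption: illustrative image
      command: [nginx, -g, "daemon off;"]
EOF

# Submit with `argo submit sidecar-sketch.yaml --watch` once the operator is installed.
```

Both containers share the Pod's network namespace, which is why the main container can reach nginx on 127.0.0.1. Argo stops the sidecar once the main container completes.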
The workflow definition above contains a single step, which is defined by a template that runs a simple containerized process called “whalesay” provided by Docker.
Above the templates field is an entrypoint field naming the first template to be run within the workflow. In this case, there is only one template to be run, but we’ll look at combining templates into a more complex workflow below.
Now let’s run the above example workflow locally in minikube. To do so, we’ll first have to complete the following prerequisites:
brew install helm
brew install minikube
Start the minikube cluster:
minikube start
Install the Argo CLI:
brew install argo
First, we'll create a namespace for the Argo Workflows controller:
kubectl create namespace argo
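Next, we'll add the Argo Helm chart repository and install the chart with Helm, like so:
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
helm install my-argo-workflows -n argo argo/argo-workflows --version 0.22.4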
Next, we can review the created resources via the minikube dashboard by running the following:
minikube dashboard
Notice that the resources that make up the Argo operator have been created in the namespace “argo.”
With the operator installed, we can move on to running the example workflow included above by running:
argo submit https://raw.githubusercontent.com/argoproj/argo-workflows/master/examples/hello-world.yaml --watch
When we do, we’ll see the following output in the terminal, because we included the --watch flag.
ServiceAccount:      unset (will run with the default ServiceAccount)
Created:             Tue Dec 27 13:31:33 -0800 (11 seconds ago)
Started:             Tue Dec 27 13:31:33 -0800 (11 seconds ago)
Finished:            Tue Dec 27 13:31:43 -0800 (1 second ago)
Duration:            10 seconds
ResourcesDuration:   4s*(1 cpu),4s*(100Mi memory)

STEP                  TEMPLATE  PODNAME                               DURATION  MESSAGE
 ✔ hello-world-rfkcz  whalesay  hello-world-rfkcz-whalesay-102532164  6s
Finally, let’s go back to the minikube dashboard and see what resources were created.
Notice that, because we didn’t specify a namespace, the workflow ran in the default namespace. By default, Argo can carry out workflows in any namespace, not just the argo namespace to which it has been deployed.
And we can see that the workflow (as described above) has run in a K8s Pod that was created by Argo Workflows specifically for this workflow.
Argo Workflows is an extremely flexible workflow engine that runs directly in Kubernetes. It works by watching the Kubernetes API server (backed by etcd, the central K8s datastore) for custom resources of the kind Workflow, which are created and deployed much like native Kubernetes objects. Above, we walked through the process of installing the Argo operator and running a sample workflow in minikube.