MLStacks is a Python package that can be installed using pip. It is recommended that you install MLStacks in a virtual environment. You can install MLStacks using the following command:
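For example, from within an activated virtual environment:

```shell
pip install mlstacks
```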
MLStacks uses Terraform on the backend to manage infrastructure. You will need to have Terraform installed. Please visit the Terraform docs for installation instructions.
MLStacks also uses Helm to deploy Kubernetes resources. You will need to have Helm installed. Please visit the Helm docs for installation instructions.
If you're using a Mac, you will need to install jq in order for some of the Terraform deployment commands and scripts to work.
If you want to use the mlstacks breakdown command to get cost estimates for your MLOps stacks, you'll also need to have infracost installed and be logged in. Please visit the Infracost docs for installation instructions.
MLStacks currently supports the following stack providers:
AWS
GCP
K3D
If you wish to deploy using these providers you'll need to have accounts (for AWS and GCP) and the relevant CLIs installed and authenticated. You will also need to have the relevant permissions to deploy, manage and destroy resources in these accounts. Please refer to the documentation for those providers for more information.
This quickstart will guide you through deploying a simple stack on AWS using mlstacks. We'll be deploying a simple S3 bucket, which is about as quick and simple an example of how mlstacks works as you can get.
First, install the mlstacks CLI:
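For example, with pip:

```shell
pip install mlstacks
```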
You'll need an active AWS account to get started. You will also need sufficient permissions to be able to create and destroy resources.
If you don't have Terraform or Helm installed, you should also install them.
Then, create a file called quickstart_stack.yaml wherever you have access to the mlstacks tool. In this file, add the following:
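A minimal stack specification, sketched from the stack fields documented later in these docs (the stack name and region are placeholders you can change):

```yaml
spec_version: 1
spec_type: stack
name: quickstart_stack
provider: aws
default_region: us-east-1
components:
  - simple_component_s3.yaml
```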
This defines our stack using the mlstacks specification. We'll now define the component that we want to deploy in a separate file called simple_component_s3.yaml:
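A sketch of the component specification, using the component fields documented later in these docs (the bucket name is a placeholder and must be globally unique):

```yaml
spec_version: 1
spec_type: component
component_type: artifact_store
component_flavor: s3
name: s3_bucket
provider: aws
metadata:
  config:
    bucket_name: mlstacks-quickstart-bucket
```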
Now, we can deploy our stack using the mlstacks CLI:
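Assuming the CLI's -f file flag for passing the spec file, the command would look like:

```shell
mlstacks deploy -f quickstart_stack.yaml
```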
This will deploy our stack to AWS. It will also provision an S3 bucket (with a name beginning with zenml-mlstacks-remote-state by default) which will be used as a remote state store and backend for your Terraform assets; this happens before the deployment of your stack. You can now check your AWS console to see that the stack (and remote state bucket) has been deployed.
You can get the outputs of your stack using the mlstacks CLI:
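Again assuming the -f file flag:

```shell
mlstacks output -f quickstart_stack.yaml
```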
This will print out the outputs of your stack, which you can use in your pipelines.
Finally, we can destroy our stack (and the remote state S3 bucket) using the mlstacks CLI:
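For example:

```shell
mlstacks destroy -f quickstart_stack.yaml
```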
You can now try adding more components and deploying them to your cloud provider. You can also try deploying your stack to a different cloud provider.
Good luck! And if you have any questions, feel free to reach out to us on Slack.
This quickstart will guide you through deploying a simple stack on GCP using mlstacks. We'll be deploying a simple storage bucket, which is about as quick and simple an example of how mlstacks works as you can get.
First, install the mlstacks CLI:
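For example, with pip:

```shell
pip install mlstacks
```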
You'll need an active GCP account and project to get started. (If you don't have one, you can create one following these instructions.) You will also need sufficient permissions to be able to create and destroy resources.
If you don't have Terraform or Helm installed, you should also install them.
Then, create a file called quickstart_stack.yaml wherever you have access to the mlstacks tool. In this file, add the following:
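A minimal stack specification, sketched from the stack fields documented later in these docs (the stack name and region are placeholders you can change):

```yaml
spec_version: 1
spec_type: stack
name: quickstart_stack
provider: gcp
default_region: us-central1
components:
  - simple_component_gcs.yaml
```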
This defines our stack using the mlstacks specification. We'll now define the component that we want to deploy in a separate file called simple_component_gcs.yaml:
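A sketch of the component specification (the bucket name is a placeholder; the project_id is required for all GCP deployments, as noted elsewhere in these docs):

```yaml
spec_version: 1
spec_type: component
component_type: artifact_store
component_flavor: gcp
name: gcs_bucket
provider: gcp
metadata:
  config:
    bucket_name: mlstacks-quickstart-bucket
    project_id: my-gcp-project
```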
Now, we can deploy our stack using the mlstacks CLI:
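Assuming the CLI's -f file flag for passing the spec file:

```shell
mlstacks deploy -f quickstart_stack.yaml
```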
This will deploy our stack to GCP. It will also provision a GCS bucket (with a name beginning with zenml-mlstacks-remote-state by default) which will be used as a remote state store and backend for your Terraform assets; this happens before the deployment of your stack. You can now check your GCP console to see that the stack (and remote state bucket) has been deployed.
You can get the outputs of your stack using the mlstacks CLI:
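For example:

```shell
mlstacks output -f quickstart_stack.yaml
```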
This will print out the outputs of your stack, which you can use in your pipelines.
Finally, we can destroy our stack (and the remote state GCS bucket) using the mlstacks CLI:
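For example:

```shell
mlstacks destroy -f quickstart_stack.yaml
```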
You can now try adding more components and deploying them to your cloud provider. You can also try deploying your stack to a different cloud provider.
Good luck! And if you have any questions, feel free to reach out to us on Slack.
MLStacks is built around common concepts that are used to describe infrastructure for machine learning and MLOps. This section will introduce you to these concepts and how they are used in MLStacks.
A Stack is a collection of stack components, where each component represents the respective configuration regarding a particular function in your MLOps pipeline such as orchestration systems, artifact repositories, and model deployment platforms.
As a shorthand, you can think of a stack as a grouping of these components.
Components are the building-blocks of stacks. MLStacks currently supports the following stack components:
artifact_store: An artifact store is a component that can be used to store artifacts. (e.g. S3 buckets on AWS)
container_registry: A container registry is a component that can be used to store container images. (e.g. ECR on AWS)
experiment_tracker: An experiment tracker is a component that can be used to track experiments, including metrics, parameters, and artifacts. (e.g. MLFlow)
orchestrator: An orchestrator is a component that can be used to orchestrate machine learning pipelines. (e.g. Airflow)
mlops_platform: An MLOps platform is a component that can be used to deploy, monitor, and manage machine learning models. (e.g. ZenML)
model_deployer: A model deployer is a component that can be used to deploy machine learning models. (e.g. Seldon Core)
step_operator: A step operator is a component that can be used to execute steps that require custom hardware.
MLStacks is built around the concept of a stack specification. A stack specification is a YAML file that describes the stack and includes references to component specification files. A component specification is a YAML file that describes a component. (Currently all deployments of components (in various combinations) must be defined within the context of a stack.)
Once you write your stack specification, you can then use MLStacks' CLI to deploy your stack to your preferred cloud (or local K3d) provider. Terraform definitions are stored in your global configuration directory. MLStacks allows you to deploy or connect to a remote state store (e.g. S3, GCS, etc.) so that you can collaborate on your stacks and deployed infrastructure with your colleagues.
Your global configuration directory could be in a number of different places depending on your operating system, but read more about it in the Click docs to see which location applies to your situation. This is where the stack specs and the Terraform definition files are located.
MLStacks is a Python package that allows you to quickly spin up MLOps infrastructure using Terraform. It is designed to be used with ZenML, but can be used with any MLOps tool or platform.
Simply write stack and component YAML specification files and deploy them using the MLStacks CLI. MLStacks will take care of the rest. We currently support modular MLOps stacks on AWS, GCP and K3D (for local use).
When we first created ZenML as an extensible MLOps framework for creating portable, production-ready MLOps pipelines, we saw many of our users having to deal with the pain of deploying infrastructure from scratch to run these pipelines. The community consistently asked questions like:
How do I deploy tool X with tool Y?
Does a combination of tool X with Y make sense?
Isn't there an easy way to just try these stacks out to make an informed decision?
To address these questions, the ZenML team presents you a series of Terraform-based stacks to quickly provision popular combinations of MLOps tools. These stacks will be useful for you if:
You are at the start of your MLOps journey, and would like to explore different tools.
You are looking for guidelines for production-grade deployments.
You would like to run your MLOps pipelines on your chosen ZenML Stack.
🔥 Do you use these tools or do you want to add one to your MLOps stack? At ZenML, we are looking for design partnerships and collaboration to implement and develop these MLOps stacks in a real-world setting.
If you'd like to learn more, please join our Slack and leave us a message!
Try the Quickstart example below to get started with MLStacks.
Discover what you can configure with the different stacks in the Stacks documentation.
Learn about our CLI commands in the CLI documentation.
Thank you to the folks over at Fuzzy Labs for their support and contributions to this repository. Also many thanks to Ali Abbas Jaffri for several stimulating discussions around the architecture of this project.
We'd also like to acknowledge some of the cool inspirations for this project:
This quickstart will guide you through deploying a simple stack with mlstacks, using K3D to simulate a cloud provider locally. We'll be deploying a simple MinIO bucket, which is about as quick and simple an example of how mlstacks works as you can get.
First, install the mlstacks CLI:
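For example, with pip:

```shell
pip install mlstacks
```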
Then, create a file called quickstart_stack.yaml wherever you have access to the mlstacks tool. In this file, add the following:
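A minimal stack specification, sketched from the stack fields documented later in these docs (the stack name is a placeholder; no region is needed for a local K3D deployment):

```yaml
spec_version: 1
spec_type: stack
name: quickstart_stack
provider: k3d
components:
  - simple_component_minio.yaml
```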
This defines our stack using the mlstacks specification. We'll now define the component that we want to deploy in a separate file called simple_component_minio.yaml:
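A sketch of the component specification (the bucket name is a placeholder):

```yaml
spec_version: 1
spec_type: component
component_type: artifact_store
component_flavor: minio
name: minio_bucket
provider: k3d
metadata:
  config:
    bucket_name: mlstacks-quickstart-bucket
```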
Now, we can deploy our stack using the mlstacks CLI:
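Assuming the CLI's -f file flag for passing the spec file:

```shell
mlstacks deploy -f quickstart_stack.yaml
```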
This will deploy our stack to a local K3D cluster. You can now inspect your K3D cluster to see that the stack and the MinIO bucket have been deployed.
You can get the outputs of your stack using the mlstacks CLI:
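For example:

```shell
mlstacks output -f quickstart_stack.yaml
```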
This will print out the outputs of your stack, which you can use in your pipelines.
Finally, we can destroy our stack using the mlstacks CLI:
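For example:

```shell
mlstacks destroy -f quickstart_stack.yaml
```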
You can now try adding more components and deploying them to this K3D provider. You can also try deploying your stack to an actual cloud provider instead of this local environment.
All GCP deployments require the inclusion of a GCP Project ID in the metadata config for each component. This is because GCP resources are tied to a project and cannot be created without one.
The following components are coming soon on GCP:
Airflow Orchestrator on GCP
Feast Feature Store on GCP
Label Studio Annotator on GCP
Model Registry components on GCP
Image Builder components on GCP
You will need to have K3D installed. Please visit the K3D docs for installation instructions.
If you don't have Terraform or Helm installed, you should also install them.
Good luck! And if you have any questions, feel free to reach out to us on Slack.
The GCP Modular recipe is available in the mlstacks repository, where you can view the raw Terraform files.
A full list of supported components and flavors can be found in the Supported Components and Flavors section, as can a list of components that are coming soon.
Artifact Store: gcp
Container Registry: gcp
Experiment Tracker: mlflow
Orchestrator: kubeflow, kubernetes, skypilot, tekton, vertex
MLOps Platform: zenml
Model Deployer: seldon
Step Operator: vertex
The core of a stack is the stack.yaml file. This file contains all the information needed to deploy a stack. It contains the following fields:
Let's go through each of these fields in detail.
spec_version
This field defines the version of the mlstacks specification that this stack uses. This is currently 1 and is set as the default.
spec_type
This field defines the type of the specification. This is currently stack.
name
This field defines the name of the stack. This is used to identify the stack when deploying, destroying, or getting outputs from the stack.
provider
This field defines the provider that the stack will be deployed to. This is currently one of k3d, gcp, or aws.
default_region
This field defines the default region that the stack will be deployed to. If you specify a region that doesn't exist for your particular provider, the stack deployment will fail.
If you don't specify a region in your stack specification, mlstacks will use whatever is set as the default region for your provider. Note that this will differ between providers.
default_tags
This field defines the default tags that will be applied to all resources created by the stack. This is useful for identifying resources created by the stack.
This is an optional field.
components
This field defines the components that will be deployed by the stack. This is a list of component filenames.
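Putting all of these fields together, a complete stack specification might look like the following sketch (the name, region, tags, and component filenames are placeholders):

```yaml
spec_version: 1
spec_type: stack
name: my_mlops_stack
provider: gcp
default_region: us-central1
default_tags:
  environment: dev
  owner: ml-team
components:
  - artifact_store_gcs.yaml
  - orchestrator_vertex.yaml
```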
The core of a component is the component.yaml file. This file contains all the information needed to deploy a component. It contains the following fields:
Let's go through each of these fields in detail.
spec_version
This field defines the version of the mlstacks specification that this component uses. This is currently 1 and is set as the default.
spec_type
This field defines the type of the specification. This is currently component.
component_type
This field defines the type of the component. Available component types currently include:
artifact_store: An artifact store is a component that can be used to store artifacts.
container_registry: A container registry is a component that can be used to store container images.
experiment_tracker: An experiment tracker is a component that can be used to track experiments.
orchestrator: An orchestrator is a component that can be used to orchestrate pipelines.
mlops_platform: An MLOps platform is a component that can be used to orchestrate pipelines, track experiments, and manage the overall connection of MLOps components and tools together.
model_deployer: A model deployer is a component that can be used to deploy models.
step_operator: A step operator is a component that can be used to execute steps in a pipeline using custom hardware or platforms.
component_flavor
This field defines the flavor of the component. This is used to differentiate between different implementations of the same component type. For example, the artifact_store component type has the following flavors:
minio: A MinIO artifact store.
s3: An S3 artifact store.
gcp: A GCP/GCS artifact store.
name
This field defines the name of the component. This is used to identify the component when deploying, destroying, or getting outputs from the component.
provider
This field defines the provider that the component will be deployed to. This is currently one of k3d, gcp, or aws.
metadata
This field defines the metadata of the component. This is an (optional) dictionary with the following fields:
config
This field defines the configuration of the component. This is a dictionary to which you can pass arbitrary fields to configure the component. For an artifact store, for example (as shown in the quickstart examples), you can pass in a bucket_name field to configure the bucket name of the artifact store.
Config is usually optional (except in the case of GCP deployments, where you need to specify a project_id).
environment_variables
This field defines the environment variables of the component. This is a dictionary you can pass in arbitrary fields to configure the component. For example you might want certain environment variables to be set and defined ahead of the deployment of certain components. Environment variables are optional.
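Putting all of these fields together, a complete component specification might look like the following sketch (the names, bucket, project ID, and environment variable are placeholders):

```yaml
spec_version: 1
spec_type: component
component_type: artifact_store
component_flavor: gcp
name: my_artifact_store
provider: gcp
metadata:
  config:
    bucket_name: my-mlops-artifacts
    project_id: my-gcp-project
  environment_variables:
    MY_CUSTOM_VAR: some-value
```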
MLStacks doesn't support Azure Cloud yet. We are working on adding support for modular stacks on Azure. In the meantime, be sure to check out matcha from Fuzzy Labs, which caters to deployments using Azure. If you'd like to contribute, please join our Slack and leave us a message, or open an issue in the repository!
MLStacks is a CLI tool that allows you to deploy and manage your ML infrastructure using the MLStacks specification. You can install the CLI using the following command:
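For example, with pip:

```shell
pip install mlstacks
```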
You can deploy a stack using the mlstacks deploy command. This command takes a path to a stack specification file as an argument. For example, if you have a stack specification file called stack.yaml, you can deploy it using the following command:
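Assuming the CLI's -f file flag for passing the spec file:

```shell
mlstacks deploy -f stack.yaml
```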
If you want to drop into the internal Terraform log messages and prompts, turn on debug mode with the -d or --debug flag:
By default, MLStacks deploys a remote state bucket to the same cloud provider you're using for your stack. This remote state backend has a default name beginning with zenml-mlstacks-remote-state and is deployed before your stack.
If you'd like to connect to a pre-existing state bucket that you or a colleague have already created, you can do so by passing the bucket name to the mlstacks deploy command:
This will then connect to the remote state bucket and use that as the backend for your stack deployment.
Once you have a stack deployed, you can get the outputs of the stack using the mlstacks output command. This command takes a path to a stack specification file as an argument. For example, if you have a stack specification file called stack.yaml, you can get the outputs of the stack using the following command:
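Assuming the CLI's -f file flag:

```shell
mlstacks output -f stack.yaml
```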
This will print out the outputs of the stack, which you can use in your pipelines. If you just want a single output, you can add the -k or --key option and pass in the name of the output you want:
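For example (replace the placeholder with the name of the output you want):

```shell
mlstacks output -f stack.yaml -k <output-name>
```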
You can destroy a stack using the mlstacks destroy command. This command takes a path to a stack specification file as an argument. For example, if you have a stack specification file called stack.yaml, you can destroy it using the following command:
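Assuming the CLI's -f file flag:

```shell
mlstacks destroy -f stack.yaml
```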
If you want to drop into the internal Terraform log messages and prompts, turn on debug mode with the -d or --debug flag:
Once you have Infracost installed, you can get a cost estimate for your stack using the mlstacks breakdown command. This command takes a path to a stack specification file as an argument. For example, if you have a stack specification file called stack.yaml, you can get a cost estimate for it using the following command:
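Assuming the CLI's -f file flag:

```shell
mlstacks breakdown -f stack.yaml
```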
This will print out a cost estimate for your stack.
If you'd like to view the Terraform definitions that MLStacks generates for your stack, you can use the mlstacks source command. This command will print out the location of the Terraform definitions for your stack.
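For example:

```shell
mlstacks source
```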
If you want to clean up all the files and directories created by MLStacks, you can use the mlstacks clean command. This works at a global level (i.e. affecting all stacks), so you don't need to pass in a stack specification file.
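For example:

```shell
mlstacks clean
```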
To see which version of the mlstacks package you're using, use the following command:
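Assuming the conventional version flag:

```shell
mlstacks --version
```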
MLStacks integrates with Infracost to provide cost estimates for your stacks. You can install Infracost by following the instructions in their documentation. Note that you'll need to be logged in to use it with mlstacks.
The AWS Modular recipe is available in the mlstacks repository, where you can view the raw Terraform files.
A full list of supported components and flavors can be found in the Supported Components and Flavors section, as can a list of components that are coming soon.
Artifact Store: s3
Container Registry: aws
Experiment Tracker: mlflow
Orchestrator: kubeflow, kubernetes, sagemaker, skypilot, tekton
MLOps Platform: zenml
Model Deployer: seldon
Step Operator: sagemaker
The K3D Modular recipe is available in the mlstacks repository and you can view the raw Terraform files here.
A full list of supported components and flavors can be found in the Supported Components and Flavors section, as can a list of components that are coming soon.
Artifact Store: minio
Container Registry: default
Experiment Tracker: mlflow
Orchestrator: kubeflow, kubernetes, sagemaker, tekton
MLOps Platform: zenml
Model Deployer: seldon
The following components are coming soon on AWS:
Airflow Orchestrator on AWS
Feast Feature Store on AWS
Label Studio Annotator on AWS
Model Registry components on AWS
Image Builder components on AWS
These are some known problems that might arise when running mlstacks. Errors in mlstacks deployments are usually related to changes that you might have made independently of the original recipes, or they might relate to network or permissions issues.
Usually the quickest way to start afresh is to run mlstacks clean, but note that this will also delete deployments that you might have made using mlstacks.
You can also try to debug the problem by running the Terraform commands from within the mlstacks config directory where the Terraform definition files are stored.
You can also run the mlstacks commands with the --debug flag to get more information and decision points along the way.
These are issues that sometimes get raised in the underlying Terraform implementation:
Running a Kubernetes-based deployment for the first time might result in an error with one of the resources: the Istio Ingress Gateway. This is because of a limitation with the kubectl_manifest resource, which needs the cluster to be set up before it installs its own resources. 💡 Fix: Run terraform apply again in a few minutes and this should get resolved.
When executing Terraform commands, you might see an error like timeout while waiting for plugin to start. 💡 Fix: If you encounter this error with apply, plan, or destroy, run terraform init and then run your command again.
While running terraform init, you might see an error which says Failed to query available provider packages... No available releases match the given constraint. 💡 Fix: First of all, you should create an issue so that we can take a look. Meanwhile, if you know Terraform, make sure all the modules that are being used are on their latest version.
While running a Terraform command, the error context deadline exceeded might also appear. 💡 Fix: This problem can arise when system resources are strained. Try running the command again after some time.
In order to help us better understand how the community uses ZenML, the pip package reports anonymized usage statistics. You can always opt out by setting the MLSTACKS_ANALYTICS_OPT_IN environment variable to False:
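For example, in a shell:

```shell
export MLSTACKS_ANALYTICS_OPT_IN=False
```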
In addition to the community at large, MLStacks is created and maintained by a startup based in Munich, Germany called ZenML GmbH. We're a team of techies that love MLOps and want to build tools that fellow developers would love to use in their daily work. This is us if you want to put faces to the names!
However, in order to improve MLStacks and understand how it is being used, we use analytics to get an overview of how it is used 'in the wild'. This not only helps us find bugs but also helps us prioritize features and commands that might be useful in future releases. Without this information, all we would really have is pip download statistics and direct conversations with people, which, while valuable, are not enough to seriously improve the tool as a whole.
MLStacks uses Segment as the data aggregation library for all our analytics. The code is entirely visible and can be seen at client.py.
None of the data sent can identify you individually but allows us to understand how MLStacks is being used holistically.
MLStacks uses Terraform under the hood to deploy and destroy the infrastructure that you specify in your stack specification files. We specifically designed the interface to conceal the Terraform implementation details from you, but if you want to use Terraform directly, you can do so.
You can download our modular recipes by cloning our GitHub repository:
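For example:

```shell
git clone https://github.com/zenml-io/mlstacks.git
cd mlstacks
```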
The specific directory you want to look at is src/mlstacks/terraform.
If you want to use Terraform directly, you can simply navigate to the root of one of the xxx-modular directories and run (for example) terraform init to initialize the Terraform directory. You can then run terraform plan to see what Terraform will do, and terraform apply to apply the changes.
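A sketch of that workflow (the directory name here assumes the provider-specific recipe directories follow the xxx-modular naming mentioned above):

```shell
cd src/mlstacks/terraform/aws-modular   # or gcp-modular, k3d-modular
terraform init    # initialize providers and modules
terraform plan    # preview the changes Terraform would make
terraform apply   # apply the changes
```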
You are free to remix and use the Terraform modules and recipes as you see fit, but please note that this is not a core use case for MLStacks, and you might no longer be able to use the MLStacks CLI to manage your stacks.
It is not necessary to use the MLOps stacks recipes presented here alongside the ZenML framework, but it is highly recommended to do so. The ZenML framework is designed to be used with these recipes, and the recipes are designed to be used with ZenML.
The ZenML CLI has an integration with this package that makes it really simple to use and deploy these recipes. For more information, visit the ZenML documentation; a quick example is shown below.
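A hypothetical sketch of such a command (the exact flag names vary between ZenML versions, and the bucket, project, tag, and store name here are placeholders):

```shell
zenml artifact-store deploy -f gcp -p gcp -r us-east1 \
  -x bucket_name=my_bucket -x project_id=my-gcp-project \
  -t env=dev my_gcp_store
```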
This command will deploy a GCP artifact store to the us-east1 region with a specific bucket name, project ID, and tag, for example.
To learn more about ZenML and how it empowers you to develop a stack-agnostic MLOps solution, head over to the ZenML docs.
Importing mlstacks stacks into ZenML
The ZenML CLI also has a command to import stacks created with mlstacks into ZenML. All stacks created with mlstacks generate a .yaml file that can be imported into ZenML with the following command:
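A hypothetical sketch of the import command (the stack name and file path are placeholders, and the exact flags may differ between ZenML versions):

```shell
zenml stack import my_stack -f /path/to/stack.yaml
```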
The path of the stack file can be found by navigating to the directory containing all the Terraform source files. You can easily find this by running the following command:
This will print the path to the directory containing all the Terraform source files and will ask whether you want to open the directory in your default file explorer. You can then navigate to the .yaml file and use that path to import it into ZenML as described above.
MLStacks is a tool for deploying infrastructure to cloud providers. It is designed to make it easy to deploy and manage infrastructure for machine learning. It is built on top of Terraform, which means that it is cloud-agnostic and can be used to deploy infrastructure to any cloud provider that Terraform supports.
MLStacks is designed and developed by a team that lives and breathes MLOps. This means that it is designed to support the full range of infrastructure that you might need for your MLOps tooling. It is also designed to be modular, which means that you can easily mix and match different components to create the infrastructure that you need.
MLStacks is currently a project in beta. This means that it is still under active development and may have some rough edges. We are working hard to make it production-ready as soon as possible, but in the meantime, you may encounter some bugs or missing features.
In particular, not all cloud providers and stack components are supported out of the box with the modular recipes that come with MLStacks. If you want to deploy to a cloud provider or use a stack component that is not supported, you will need to write your own recipe. We are working hard to add support for more cloud providers and stack components, but in the meantime, you can use the existing recipes as a starting point for writing your own.
There are lots of ways to deploy infrastructure to cloud providers that span the full spectrum from manual to automated. MLStacks uses Terraform as its backend for configuring and deploying infrastructure, but there are other tools that can help with this like Pulumi or cloud-specific tools like AWS CloudFormation.
MLStacks is developed and maintained by the core ZenML team. It is designed to work (well) with ZenML, but it can also be used independently of ZenML.
Yes! You can use MLStacks to deploy infrastructure for any MLOps tooling you like and it is designed to offer a range of components and flavors to support the full variety of MLOps tools.
MLStacks is designed to be used in a team setting and to support the full range of infrastructure that you might need for your MLOps tooling. We also spin up a remote state backend with every deployment (unless you're connecting to one that already exists) so that other team members can collaborate on your stacks and deployed infrastructure. Please see the section of the docs on using remote state for more information.
If you want to don the chef's hat and create a new recipe to cover your specific use case, we have just the ingredients you need!
Learn more about the design principles behind a recipe, and more information on testing in the