Nodepool in Zuul
Warning
This is not authoritative documentation. These features are not currently available in Zuul. They may change significantly before final implementation, or may never be fully completed.
The following specification describes a plan to move Nodepool’s functionality into Zuul and end development of Nodepool as a separate application. This will allow for more node and image related features as well as simpler maintenance and deployment.
Introduction
Nodepool exists as a distinct application from Zuul largely due to historical circumstances: it was originally a process for launching nodes, attaching them to Jenkins, detaching them from Jenkins and deleting them. Once Zuul grew its own execution engine, Nodepool could have been adopted into Zuul at that point, but the existing loose API meant it was easy to maintain them separately and combining them wasn’t particularly advantageous.
However, now we find ourselves with a very robust framework in Zuul for dealing with ZooKeeper, multiple components, web services and REST APIs. All of these are lagging behind in Nodepool, and it is time to address that one way or another. We could of course upgrade Nodepool’s infrastructure to match Zuul’s, or even separate out these frameworks into third-party libraries. However, there are other reasons to consider tighter coupling between Zuul and Nodepool, and these tilt the scales in favor of moving Nodepool functionality into Zuul.
Designing Nodepool as part of Zuul would allow for more features related to Zuul’s multi-tenancy. Zuul is quite good at fault-tolerance as well as scaling, so designing Nodepool around that could allow for better cooperation between node launchers. Finally, as part of Zuul, Nodepool’s image lifecycle can be more easily integrated with Zuul-based workflow.
There are two Nodepool components: nodepool-builder and nodepool-launcher. We will address the functionality of each in the following sections on Image Management and Node Management.
This spec contemplates a new Zuul component to handle image and node management: zuul-launcher. Much of the Nodepool configuration will become Zuul configuration as well. That is detailed in its own section, but for now, it’s enough to know that the Zuul system as a whole will know what images and node labels are present in the configuration.
Image Management
Part of nodepool-builder’s functionality is important to have as a long-running daemon, and part of what it does would make more sense as a Zuul job. By moving the actual image build into a Zuul job, we can make the activity more visible to users of the system. It will be easier for users to test changes to image builds (inasmuch as they can propose a change and a check job can run on that change to see if the image builds successfully). Build history and logs will be visible in the usual way in the Zuul web interface.
A frequently requested feature is the ability to verify images before putting them into service. This is not practical with the current implementation of Nodepool because of the loose coupling with Zuul. However, once we are able to include Zuul jobs in the workflow of image builds, it is easier to incorporate Zuul jobs to validate those images as well. This spec includes a mechanism for that.
The parts of nodepool-builder that make sense as a long-running daemon are the parts dealing with image lifecycles. Uploading builds to cloud providers, keeping track of image builds and uploads, deciding when those images should enter or leave service, and deleting them are all better done with state management and long-running processes (we should know – early versions of Nodepool attempted to do all of that with Jenkins jobs, with limited success).
The sections below describe how we will implement image management in Zuul.
First, a reminder that using custom images is optional with Zuul. Many Zuul systems will be able to operate using only stock cloud provider images. One of the strengths of nodepool-builder is that it can build an image for Zuul without relying on any particular cloud provider images. A Zuul system whose operator wants to use custom images will need to bootstrap that process, and under the proposed system where images are built in Zuul jobs, that would need to be done using a stock cloud image. In other words, to bootstrap a system such as OpenDev from scratch, the operators would need to use a stock cloud image to run the job to build the custom image. Once a custom image is available, further image builds could be run on either the stock cloud image or the custom image. That decision is left to the operator and involves consideration of fault tolerance and disaster recovery scenarios.
To build a custom image, an operator will define a fairly typical Zuul job for each image they would like to produce. For example, a system may have one job to build a debian-stable image, a second job for debian-unstable, a third job for ubuntu-focal, a fourth job for ubuntu-jammy. Zuul’s job inheritance system could be very useful here to deal with many variations of a similar process.
Currently nodepool-builder will build an image under three circumstances: 1) the image (or the image in a particular format) is missing; 2) a user has directly requested a build; 3) on an automatic interval (typically daily). To map this into Zuul, we will use Zuul’s existing pipeline functionality, but we will add a new trigger for case #1. Case #2 can be handled by a manual Zuul enqueue command, and case #3 by a periodic pipeline trigger.
Since Zuul knows what images are configured and what their current states are, it will be able to emit trigger events when it detects that a new image (or image format) has been added to its configuration. In these cases, the zuul driver in Zuul will enqueue an image-build trigger event on startup or reconfiguration for every missing image. The event will include the image name. Pipelines will be configured to trigger on image-build events as well as on a timer trigger.
Jobs will include an extra attribute to indicate that they build a particular image. This serves two purposes: first, in the case of an image-build trigger event, it will act as a matcher so that only jobs matching the image that needs building are run. Second, it will allow Zuul to determine which formats are needed for that image (based on which providers are configured to use it) and include that information as job data.
The job will be responsible for building the image and uploading the result to some storage system. The URLs for each image format built should be returned to Zuul as artifacts.
Finally, the zuul driver reporter will accept parameters which will tell it to search the result data for these artifact URLs and update the internal image state accordingly.
An example configuration for a simple single-stage image build:
- pipeline:
    name: image
    trigger:
      zuul:
        events:
          - image-build
      timer:
        time: 0 0 * * *
    success:
      zuul:
        image-built: true
        image-validated: true

- job:
    name: build-debian-unstable-image
    image-build-name: debian-unstable
This job would run whenever Zuul determines it needs a new debian-unstable image, or daily at midnight. Once the job completes, because of the image-built: true report, it will look for artifact data like this:
artifacts:
  - name: raw image
    url: https://storage.example.com/new_image.raw
    metadata:
      type: zuul_image
      image_name: debian-unstable
      format: raw
  - name: qcow2 image
    url: https://storage.example.com/new_image.qcow2
    metadata:
      type: zuul_image
      image_name: debian-unstable
      format: qcow2
Zuul will update internal records in ZooKeeper for the image to record the storage URLs. The zuul-launcher process will then start background processes to download the images from the storage system and upload them to the configured providers (much as nodepool-builder does now with files on disk). As a special case, it may detect that the image files are stored in a location that a provider can access directly for import and may be able to import directly from the storage location rather than downloading locally first.
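As an illustration only, the ZooKeeper record for a debian-unstable build might hold data along the following lines; the exact schema and field names are not defined by this spec:
# Hypothetical contents of a record under /zuul/images/debian-unstable/
# (schema illustrative only)
image_name: debian-unstable
state: ready
artifacts:
  - url: https://storage.example.com/new_image.raw
    format: raw
  - url: https://storage.example.com/new_image.qcow2
    format: qcow2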
To handle image validation, a flag will be stored for each image upload indicating whether it has been validated. The example above specifies image-validated: true and therefore Zuul will put the image into service as soon as all image uploads are complete. However, if it were false, then Zuul would emit an image-validate event after each upload is complete. A second pipeline can be configured to perform image validation. It can run any number of jobs, and since Zuul has complete knowledge of image states, it will supply nodes using the new image upload (which is not yet in service for normal jobs). An example of this might look like:
- pipeline:
    name: image-validate
    trigger:
      zuul:
        events:
          - image-validate
    success:
      zuul:
        image-validated: true

- job:
    name: validate-debian-unstable-image
    image-build-name: debian-unstable
    nodeset:
      nodes:
        - name: node
          label: debian
The label should specify the same image that is being validated. Its node request will be made with extra specifications so that it is fulfilled with a node built from the image under test. This process may repeat for each of the providers using that image (normal pipeline queue deduplication rules may need a special case to allow this). Once the validation jobs pass, the entry in ZooKeeper will be updated and the image will go into regular service.
A more specific process definition follows:
After a buildset reports with image-built: true, Zuul will scan result data and for each artifact it finds, it will create an entry in ZooKeeper at /zuul/images/<image_name>/<sequence>. Zuul will know not to emit any more image-build events for that image at this point.
For every provider using that image, Zuul will create an entry in ZooKeeper at /zuul/image-uploads/<image_name>/<image_number>/provider/<provider_name>. It will set the remote image ID to null and the image-validated flag to whatever was specified in the reporter.
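For illustration, such an image-upload record might look roughly like the following; the field names are placeholders rather than a committed schema:
# Hypothetical contents of
# /zuul/image-uploads/debian-unstable/<image_number>/provider/rax-dfw
external_id: null        # remote image ID; filled in once the upload completes
image_validated: false   # copied from the reporter's image-validated setting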
Whenever zuul-launcher observes a new image-upload record without an ID, it will:
Lock the whole image
Lock each upload it can handle
Unlock the image while retaining the upload locks
Download the artifact (if needed) and upload images to the provider
If an upload requires validation, enqueue an image-validate zuul driver trigger event
Unlock the uploads
This locking sequence ensures that a single launcher can perform multiple uploads from a single artifact download when it has the opportunity.
Once more than two builds of an image are in service, the oldest is deleted. The image’s ZooKeeper record is set to the deleting state, and zuul-launcher will delete its uploads from the providers. The zuul driver then emits an image-delete event with item data for the image artifact, which can trigger an image-delete job to delete the artifact from cloud storage.
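A deletion pipeline could follow the same pattern as the build and validation pipelines above. The sketch below assumes the image-build-name attribute is also used to match delete jobs to their image; both the job name and that matching detail are illustrative rather than prescriptive:
- pipeline:
    name: image-delete
    trigger:
      zuul:
        events:
          - image-delete

- job:
    name: delete-debian-unstable-image
    image-build-name: debian-unstable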
All of these pipeline definitions should typically be in a single tenant (but need not be), but the images they build are potentially available to each tenant that includes the image definition configuration object (see the Configuration section below). Any repo in a tenant with an image build pipeline will be able to cause images to be built and uploaded to providers.
Snapshot Images
Nodepool does not currently support snapshot images, but the spec for the current version of Nodepool does contemplate the possibility of a snapshot based nodepool-builder process. Likewise, this spec does not require us to support snapshot image builds, but in case we want to add support in the future, we should have a plan for it.
The image build job in Zuul could, instead of running diskimage-builder, act on the remote node to prepare it for a snapshot. A special job attribute could indicate that it is a snapshot image job, and instead of having the zuul-launcher component delete the node at the end of the job, it could snapshot the node and record that information in ZooKeeper. Unlike an image-build job, an image-snapshot job would need to run in each provider (similar to how it is proposed that an image-validate job will run in each provider). An image-delete job would not be required.
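A snapshot job might be declared much like the build and validation jobs above. The image-snapshot-name attribute in this sketch is hypothetical and only illustrates the kind of special job attribute contemplated here:
- job:
    name: snapshot-debian-unstable-image
    # Hypothetical attribute marking this as a snapshot image job
    image-snapshot-name: debian-unstable
    nodeset:
      nodes:
        - name: node
          label: debian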
Node Management
The techniques we have developed for cooperative processing in Zuul can be applied to the node lifecycle. This is a good time to make a significant change to the nodepool protocol. We can achieve several long-standing goals:
Scaling and fault-tolerance: rather than having a 1:N relationship of provider:nodepool-launcher, we can have multiple zuul-launcher processes, each of which is capable of handling any number of providers.
More intentional request fulfillment: almost no intelligence goes into selecting which provider will fulfill a given node request; by assigning providers intentionally, we can more efficiently utilize providers.
Fulfilling node requests from multiple providers: by designing zuul-launcher for cooperative work, we can have nodesets that request nodes which are fulfilled by different providers. Generally we should favor the same provider for a set of nodes (since they may need to communicate over a LAN), but if that is not feasible, allowing multiple providers to fulfill a request will permit nodesets with diverse node types (e.g., VM + static, or VM + container).
Each zuul-launcher process will execute a number of processing loops in series: first a global request processing loop, and then a processing loop for each provider. Each one will involve obtaining a ZooKeeper lock so that only one zuul-launcher process will perform each function at a time.
Zuul-launcher will need to know about every connection in the system so that it may have a full copy of the configuration, but operators may wish to localize launchers to specific clouds. To support this, zuul-launcher will take an optional command-line argument to indicate on which connections it should operate.
Currently a node request as a whole may be declined by providers. We will make that more granular and store information about each node in the request (in other words, individual nodes may be declined by providers).
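As a rough illustration (the field names are placeholders, not part of this spec), the per-node information in a request might record which providers have declined each node and which provider it is currently assigned to. This also shows a single request being fulfilled by more than one provider, as described above:
# Hypothetical node request record with per-node provider information
nodes:
  - label: debian-unstable
    declined_by: [rax-dfw]
    assigned_provider: gce-uscentral1
  - label: big-static-node
    declined_by: []
    assigned_provider: static-provider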
All drivers for providers should implement the state machine interface. Any state machine information currently stored in memory in nodepool-launcher will need to move to ZooKeeper so that other launchers can resume state machine processing.
The individual provider loop will:
Lock a provider in ZooKeeper (/zuul/provider/<name>)
Iterate over every node assigned to that provider in a building state
Drive the state machine
If success, update request
If failure, determine if it’s a temporary or permanent failure and update the request accordingly
If quota available, unpause provider (if paused)
The global queue process will:
Lock the global queue
Iterate over every pending node request, and every node within that request
If all providers have failed the request, clear all temp failures
If all providers have permanently failed the request, return error
Identify providers capable of fulfilling the request
Assign nodes to any provider with sufficient quota
If no provider has sufficient quota, assign the node to the first (highest-priority) provider that can fulfill it later and pause that provider
Configuration
The configuration currently handled by Nodepool will be refactored and added to Zuul’s configuration syntax. It will be loaded directly from git repos like most Zuul configuration; however, it will be non-speculative (like pipelines and semaphores – changes must merge before they take effect).
Information about connecting to a cloud will be added to zuul.conf as a connection entry. The rate limit setting will be moved to the connection configuration. Providers will then reference these connections by name.
Because providers and images reference global (i.e., outside tenant scope) concepts, ZooKeeper paths for data related to those should include the canonical name of the repo where these objects are defined. For example, a debian-unstable image in the opendev/images repo should be stored at /zuul/zuul-images/opendev.org%2fopendev%2fimages/. This avoids collisions if different tenants contain different image objects with the same name.
The actual Zuul config objects will be tenant scoped. Image definitions which should be available to a tenant should be included in that tenant’s config. Again using the OpenDev example, the hypothetical opendev/images repository should be included in every OpenDev tenant so all of those images are available.
Within a tenant, image names must be unique (otherwise it is a tenant configuration error, similar to a job name collision).
The diskimage-builder related configuration items will no longer be necessary since they will be encoded in Zuul jobs. This will reduce the complexity of the configuration significantly.
The provider configuration will change as we take the opportunity to make it more “Zuul-like”. Instead of a top-level dictionary, we will use lists. We will standardize on attributes used across drivers where possible, as well as attributes which may be located at different levels of the configuration.
The goals of this reorganization are:
Allow projects to manage their own image lifecycle (if permitted by site administrators).
Manage access control to labels, images and flavors via standard Zuul mechanisms (whether an item appears within a tenant).
Reduce repetition and boilerplate for systems with many clouds, labels, or images.
The new configuration objects are:
- Image
This represents any kind of image (a Zuul image built by a job as described above, or a cloud image). By using one object to represent both, we open the possibility of having a label in one provider use a cloud image and in another provider use a Zuul image (because the label will reference the image by short name, which may resolve to a different image object in different tenants). A given image object will specify what type it is, and any relevant information about it (such as the username to use, etc.).
- Flavor
This is a new abstraction layer to reference instance types across different cloud providers. Much like labels today, these probably won’t have much information associated with them other than to reserve a name for other objects to reference. For example, a site could define a small and a large flavor. These would later be mapped to specific instance types on clouds.
- Label
Unlike the current Nodepool label definitions, these labels will also specify the image and flavor to use. These reference the two objects above, which means that labels themselves contain the high-level definition of what will be provided (e.g., a large ubuntu node) while the specific mapping of what large and ubuntu mean is left to the more specific configuration levels.
- Section
This looks a lot like the current provider configuration in Nodepool (but also a little bit like a pool). Several parts of the Nodepool configuration (such as separating out availability zones from providers into pools) were added as an afterthought, and we can take the opportunity to address that here. A section is part of a cloud. It might be a region (if a cloud has regions). It might be one or more availability zones within a region. A lot of the specifics about images, flavors, subnets, etc., will be specified here. Because a cloud may have many sections, we will implement inheritance among sections.
- Provider
This is mostly a mapping of labels to sections and is similar to a provider pool in the current Nodepool configuration. It exists as a separate object so that site administrators can restrict section definitions to central repos and allow tenant administrators to control their own images and labels by allowing certain projects to define providers. It mostly consists of a list of labels, but may also include images.
When launching a node, relevant attributes may come from several sources (the image, flavor, label, section, or provider). Not all attributes make sense in all locations, but where we can support them in multiple locations, the order of application (later items override earlier ones) will be:
image stanza
flavor stanza
label stanza
section stanza (top level)
image within section
flavor within section
provider stanza (top level)
label within provider
This reflects that the configuration is built upwards from general and simple objects toward more specific objects: image, flavor, label, section, provider. Generally speaking, inherited scalar values will override, dicts will merge, and lists will concatenate.
An example configuration follows. First, some configuration which may appear in a central project and shared among multiple tenants:
# Images, flavors, and labels are the building blocks of the
# configuration.

- image:
    name: centos-7
    type: zuul
    # Any other image-related info such as:
    # username: ...
    # python-path: ...
    # shell-type: ...
    # A default that can be overridden by a provider:
    # config-drive: true

- image:
    name: ubuntu
    type: cloud

- flavor:
    name: large

- label:
    name: centos-7
    min-ready: 1
    flavor: large
    image: centos-7

- label:
    name: ubuntu
    flavor: small
    image: ubuntu

# A section for each cloud+region+az

- section:
    name: rax-base
    abstract: true
    connection: rackspace
    boot-timeout: 120
    launch-timeout: 600
    key-name: infra-root-keys-2020-05-13
    # The launcher will apply the minimum of the quota reported by the
    # driver (if available) or the values here.
    quota:
      instances: 2000
    subnet: some-subnet
    tags:
      section-info: foo
    # We attach both kinds of images to providers in order to provide
    # image-specific info (like config-drive) or username.
    images:
      # This is a Zuul image
      - name: centos-7
        config-drive: true
      # This is a cloud image, so the specific cloud image name is required
      - name: ubuntu
        image-name: ibm-ubuntu-20-04-3-minimal-amd64-1
        # Other information may be provided
        # username ...
        # python-path: ...
        # shell-type: ...
    flavors:
      - name: small
        cloud-flavor: "Performance 8G"
      - name: large
        cloud-flavor: "Performance 16G"

- section:
    name: rax-dfw
    parent: rax-base
    region: 'DFW'
    availability-zones: ["a", "b"]

# A provider to indicate what labels are available to a tenant from
# a section.

- provider:
    name: rax-dfw-main
    section: rax-dfw
    labels:
      - name: centos-7
      - name: ubuntu
        key-name: infra-root-keys-2020-05-13
        tags:
          provider-info: bar
The following configuration might appear in a repo that is only used in a single tenant:
- image:
    name: devstack
    type: zuul

- label:
    name: devstack

- provider:
    name: rax-dfw-devstack
    section: rax-dfw
    # The images can be attached to the provider just as in a section.
    images:
      - name: devstack
        config-drive: true
    labels:
      - name: devstack
Here is a potential static node configuration:
- label:
    name: big-static-node

- section:
    name: static-nodes
    connection: null
    nodes:
      - name: static.example.com
        labels:
          - big-static-node
        host-key: ...
        username: zuul

- provider:
    name: static-provider
    section: static-nodes
    labels:
      - big-static-node
Each of the above stanzas may only appear once in a tenant for a given name (like pipelines or semaphores, they are singleton objects). If they appear in more than one branch of a project, the definitions must be identical; if they are not, or if they appear in more than one repo, the second definition is an error. These are meant to be used in unbranched repos. Whatever tenants they appear in will be permitted to access those respective resources.
The purpose of the provider stanza is to associate labels, images, and sections. Much of the configuration related to launching an instance (including the availability of zuul or cloud images) may be supplied in the provider stanza and will apply to any labels within. The section stanza also allows configuration of the same information except for the labels themselves. The section supplies default values and the provider can override them or add any missing values. Images are additive – any images that appear in a provider will augment those that appear in a section.
The result is a modular scheme for configuration, where a single section instance can be used to set as much information as possible that applies globally to a provider. A simple configuration may then have a single provider instance to attach labels to that section. A more complex installation may define a “standard” provider that is present in every tenant, and then tenant-specific providers as well. These providers will all attach to the same section.
References to sections, images and labels will be internally converted to canonical repo names to avoid ambiguity. Under the current Nodepool system, labels are truly a global object, but under this proposal, a label short name in one tenant may refer to a different object than the same short name in another. Therefore the node request will internally specify the canonical label name instead of the short name. Users will never use canonical names, only short names.
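Purely as an illustration of this internal conversion (the exact canonical form is not specified here), a short label name used in a nodeset might be resolved to a name qualified by the repo that defines the label:
# Hypothetical internal representation of a label reference
label: debian                                        # short name used in the tenant
canonical_label: opendev.org/opendev/images/debian   # qualified by the defining repo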
For static nodes, there is some repetition to labels: first labels must be associated with the individual nodes defined on the section, then the labels must appear again on a provider. This allows an operator to define a collection of static nodes centrally on a section, then include tenant-specific sets of labels in a provider. For the simple case where all static node labels in a section should be available in a provider, we could consider adding a flag to the provider to allow that (e.g., include-all-node-labels: true).
Static nodes themselves are configured on a section with a null connection (since there is no cloud provider associated with static nodes). In this case, the additional nodes section attribute becomes available.
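If such a flag were added, it might look like the following, reusing the static-nodes section from the example above; include-all-node-labels is only a possible future addition, not part of the current proposal:
- provider:
    name: static-provider
    section: static-nodes
    # Hypothetical flag: expose every label defined on the section's nodes
    include-all-node-labels: true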
Upgrade Process
Most users of diskimages will need to create new jobs to build these images. This proposal also includes significant changes to the node allocation system which come with operational risks.
To minimize disruption during the transition, we will support both systems in Zuul and allow for selection of one system or the other on a per-label and per-tenant basis.
By default, if a nodeset specifies a label that is not defined by a label object in the tenant, Zuul will use the old system and place a ZooKeeper request in /nodepool. If a matching label is available in the tenant, the request will use the new system and be sent to /zuul/node-requests. Once a tenant has completely converted, a configuration flag may be set in the tenant configuration to make Zuul treat nodesets that reference unknown labels as configuration errors. A later version of Zuul will remove the backwards compatibility and make this the standard behavior.
Because each of the systems will have unique metadata, they will not recognize each other’s nodes, and it will appear to each that another system is using part of their quota. Nodepool is already designed to handle this case (at least, to handle it as well as possible).
Library Requirements
The new zuul-launcher component will need most of Nodepool’s current dependencies, which will entail adding many third-party cloud provider interfaces. As of writing, this uses another 420M of disk space. Since our primary method of distribution at this point is container images, if the additional space is a concern, we could restrict the installation of these dependencies to only the zuul-launcher image.
Diskimage-Builder Testing
The diskimage-builder project team has come to rely on Nodepool in its testing process. It uses Nodepool to upload images to a devstack cloud, launch nodes from those images, and verify that they function. To aid in continuity of testing in the diskimage-builder project, we will extract the OpenStack image upload and node launching code into a simple Python script that can be used in diskimage-builder test jobs in place of Nodepool.
Work Items
In existing Nodepool, convert the following drivers to statemachine: gce, kubernetes, openshift, openshiftpods, openstack (openstack is the only one likely to require substantial effort; the others should be trivial)
Replace Nodepool with an image upload script in diskimage-builder test jobs
Add roles to zuul-jobs to build images using diskimage-builder
Implement node-related config items in Zuul config and Layout
Create zuul-launcher executable/component
Add image-name item data
Add image-build-name attribute to jobs
  - Including job matcher based on item image-name
  - Include image format information based on global config
Add zuul driver pipeline trigger/reporter
Add image lifecycle manager to zuul-launcher
  - Emit image-build events
  - Emit image-validate events
  - Emit image-delete events
Add Nodepool driver code to Zuul
Update zuul-launcher to perform image uploads and deletion
Implement node launch global request handler
Implement node launch provider handlers
Update Zuul nodepool interface to handle both Nodepool and zuul-launcher node request queues
Add tenant feature flag to switch between them
Release a minor version of Zuul with support for both
Remove Nodepool support from Zuul
Release a major version of Zuul with only zuul-launcher support
Retire Nodepool