ZooKeeper

Overview

Zuul has a microservices architecture with the goal of no single point of failure in mind.

Zuul is an event driven system with several event loops that interact with each other:

Driver event loop: Drivers like GitHub or Gerrit have their own event loops. They perform preprocessing of the received events and add events into the scheduler event loop.
Scheduler event loop: This event loop processes the pipelines and reconfigurations.

Each of these event loops persists data in ZooKeeper so that other components can share or resume processing.

A key aspect of scalability is maintaining an event queue per pipeline. This makes it easy to process several pipelines in parallel. A new driver event is first processed in the driver event queue. This adds a new event into the scheduler event queue. The scheduler event queue then checks which pipeline may be interested in this event according to the tenant configuration and layout. Based on this the event is dispatched to all matching pipeline queues.

In order to make reconfigurations efficient we store the parsed branch config in Zookeeper. This makes it possible to create the current layout without the need to ask the mergers multiple times for the configuration. This is used by zuul-web to keep an up-to-date layout for API requests.

We store the pipeline state in Zookeeper. This contains the complete information about queue items, jobs and builds, as well as a separate abbreviated state for quick access by zuul-web for the status page.

Driver Event Ingestion

There are three types of event receiving mechanisms in Zuul:

Active event gathering: The connection actively listens to events (Gerrit) or generates them itself (git, timer, zuul)
Passive event gathering: The events are sent to Zuul from outside (GitHub webhooks)
Internal event generation: The events are generated within Zuul itself and typically get injected directly into the scheduler event loop.

The active event gathering needs to be handled differently from passive event gathering.

Active Event Gathering

This is mainly done by the Gerrit driver. We actively maintain a connection to the target and receive events. We utilize a leader election to make sure there is exactly one instance receiving the events.

Passive Event Gathering

In case of passive event gathering the events are sent to Zuul typically via webhooks. These types of events are received in zuul-web which then stores them in Zookeeper. This type of event gathering is used by GitHub and other drivers. In this case we can have multiple instances but still receive only one event so that we don’t need to take special care of event deduplication or leader election. Multiple instances behind a load balancer are safe to use and recommended for such passive event gathering.

Configuration Storage

Zookeeper is not designed as a database with a large amount of data, so we should store as little as possible in zookeeper. Thus we only store the per project-branch unparsed config in zookeeper. From this, every part of Zuul, like the scheduler or zuul-web, can quickly recalculate the layout of each tenant and keep it up to date by watching for changes in the unparsed project-branch-config.

We store the actual config sharded in multiple nodes, and those nodes are stored under per project and branch znodes. This is needed because of the 1MB limit per znode in zookeeper. It further makes it less expensive to cache the global config in each component as this cache is updated incrementally.

Executor and Merger Queues

The executors and mergers each have an execution queue (and in the case of executors, optionally per-zone queues). This makes it easy for executors and mergers to simply pick the next job to run without needing to inspect the entire pipeline state. The scheduler is responsible for submitting job requests as the state changes.

Zookeeper Map

This is a reference for object layout in Zookeeper.

/zuul: All ephemeral data stored here. Remove the entire tree to “reset” the system.

/zuul/cache/connection/<connection>: The connection cache root. Each connection has a dedicated space for its caches. Two types of caches are currently implemented: change and branch.

/zuul/cache/connection/<connection>/branches: The connection branch cache root. Contains the cache itself and a lock.

/zuul/cache/connection/<connection>/branches/data Type: BranchCacheZKObject (sharded): The connection branch cache data. This is a single sharded JSON blob.

/zuul/cache/connection/<connection>/branches/lock Type: RWLock: The connection branch cache read/write lock.

/zuul/cache/connection/<connection>/cache

The connection change cache. Each node under this node is an entry in the change cache. The node ID is a sha256 of the cache key, the contents are the JSON serialization of the cache entry metadata. One of the included items is the data_uuid which is used to retrieve the actual change data.

When a cache entry is updated, a new data node is created without deleting the old data node. They are eventually garbage collected.

/zuul/cache/connection/<connection>/data

Data for the change cache. These nodes are identified by a UUID referenced from the cache entries.

These are sharded JSON blobs of the change data.

/zuul/cache/blob/data

Data for the blob store. These nodes are identified by a sha256sum of the secret content.

These are sharded blobs of data.

/zuul/cache/blob/lock: Side-channel lock directory for the blob store. The store locks by key id under this znode when writing.

/zuul/cleanup

This node holds locks for the cleanup routines to make sure that only one scheduler runs them at a time.

/zuul/cleanup/build_requests

/zuul/cleanup/connection

/zuul/cleanup/general

/zuul/cleanup/merge_requests

/zuul/cleanup/node_request

/zuul/cleanup/sempahores

/zuul/components

The component registry. Each Zuul process registers itself under the appropriate node in this hierarchy so the system has a holistic view of what’s running. The name of the node is based on the hostname but is a sequence node in order to handle multiple processes. The nodes are ephemeral so an outage is automatically detected.

The contents of each node contain information about the running process and may be updated periodically.

/zuul/components/executor

/zuul/components/fingergw

/zuul/components/merger

/zuul/components/scheduler

/zuul/components/web

/zuul/config/cache: The unparsed config cache. This contains the contents of every Zuul config file returned by the mergers for use in configuration. Organized by repo canonical name, branch, and filename. The files themeselves are sharded.

/zuul/config/lock: Locks for the unparsed config cache.

/zuul/events/connection/<connection>/events Type: ConnectionEventQueue: The connection event queue root. Each connection has an event queue where incoming events are recorded before being moved to the tenant event queue.

/zuul/events/connection/<connection>/events/queue: The actual event queue. Entries in the queue reference separate data nodes. These are sequence nodes to maintain the event order.

/zuul/events/connection/<connection>/events/data: Event data nodes referenced by queue items. These are sharded.

/zuul/events/connection/<connection>/events/election

An election to determine which scheduler processes the event queue and moves events to the tenant event queues.

Drivers may have additional elections as well. For example, Gerrit has an election for the watcher and poller.

/zuul/events/tenant/<tenant>: Tenant-specific event queues. Each queue described below has a data and queue subnode.

/zuul/events/tenant/<tenant>/management: The tenant-specific management event queue.

/zuul/events/tenant/<tenant>/trigger: The tenant-specific trigger event queue.

/zuul/events/tenant/<tenant>/pipelines: Holds a set of queues for each pipeline.

/zuul/events/tenant/<tenant>/pipelines/<pipeline>/management: The pipeline management event queue.

/zuul/events/tenant/<tenant>/pipelines/<pipeline>/result: The pipeline result event queue.

/zuul/events/tenant/<tenant>/pipelines/<pipeline>/trigger: The pipeline trigger event queue.

/zuul/executor/unzoned Type: JobRequestQueue

The unzoned executor build request queue. The generic description of a job request queue follows:

/zuul/executor/unzoned/requests/<request uuid>: Requests are added by UUID. Consumers watch the entire tree and order the requests by znode creation time.

/zuul/executor/unzoned/locks/<request uuid> Type: Lock: A consumer will create a lock under this node before processing a request. The znode containing the lock and the requent znode have the same UUID. This is a side-channel lock so that the lock can be held while the request itself is deleted.

/zuul/executor/unzoned/params/<request uuid>: Parameters can be quite large, so they are kept in a separate znode and only read when needed, and may be removed during request processing to save space in ZooKeeper. The data may be sharded.

/zuul/executor/unzoned/result-data/<request uuid>: When a job is complete, the results of the merge are written here. The results may be quite large, so they are sharded.

/zuul/executor/unzoned/results/<request uuid>: Since writing sharded data is not atomic, once the results are written to result-data, a small znode is written here to indicate the results are ready to read. The submitter can watch this znode to be notified that it is ready.

/zuul/executor/unzoned/waiters/<request uuid> (ephemeral): A submitter who requires the results of the job creates an ephemeral node here to indicate their interest in the results. This is used by the cleanup routines to ensure that they don’t prematurely delete the result data. Used for merge jobs

/zuul/executor/zones/<zone>: A zone-specific executor build request queue. The contents are the same as above.

/zuul/layout/<tenant>: The layout state for the tenant. Contains the cache and time data needed for a component to determine if its in-memory layout is out of date and update it if so.

/zuul/layout-data/<layout uuid>: Additional information about the layout. This is sharded data for each layout UUID.

/zuul/locks: Holds various types of locks so that multiple components can coordinate.

/zuul/locks/connection: Locks related to connections.

/zuul/locks/connection/<connection>: Locks related to a single connection.

/zuul/locks/connection/database/migration Type: Lock: Only one component should run a database migration; this lock ensures that.

/zuul/locks/events: Locks related to tenant event queues.

/zuul/locks/events/trigger/<tenant> Type: Lock: The scheduler locks the trigger event queue for each tenant before processing it. This lock is only needed when processing and removing items from the queue; no lock is required to add items.

/zuul/locks/events/management/<tenant> Type: Lock: The scheduler locks the management event queue for each tenant before processing it. This lock is only needed when processing and removing items from the queue; no lock is required to add items.

/zuul/locks/pipeline: Locks related to pipelines.

/zuul/locks/pipeline/<tenant>/<pipeline> Type: Lock: The scheduler obtains a lock before processing each pipeline.

/zuul/locks/tenant: Tenant configuration locks.

/zuul/locks/tenant/<tenant> Type: RWLock: A write lock is obtained at this location before creating a new tenant layout and storing its metadata in ZooKeeper. Components which later determine that they need to update their tenant configuration to match the state in ZooKeeper will obtain a read lock at this location to ensure the state isn’t mutated again while the components are updating their layout to match.

/zuul/ltime: An empty node which serves to coordinate logical timestamps across the cluster. Components may update this znode which will cause the latest ZooKeeper transaction ID to appear in the zstat for this znode. This is known as the ltime and can be used to communicate that any subsequent transactions have occurred after this ltime. This is frequently used for cache validation. Any cache which was updated after a specified ltime may be determined to be sufficiently up-to-date for use without invalidation.

/zuul/merger Type: JobRequestQueue: A JobRequestQueue for mergers. See zuul/executor/unzoned.

/zuul/nodepool Type: NodepoolEventElection: An election to decide which scheduler will monitor nodepool requests and generate node completion events as they are completed.

/zuul/results/management: Stores results from management events (such as an enqueue event).

/zuul/scheduler/timer-election Type: SessionAwareElection: An election to decide which scheduler will generate events for timer pipeline triggers.

/zuul/scheduler/stats-election Type: SchedulerStatsElection: An election to decide which scheduler will report system-wide stats (such as total node requests).

/zuul/global-semaphores/<semaphore> Type: SemaphoreHandler: Represents a global semaphore (shared by multiple tenants). Information about which builds hold the semaphore is stored in the znode data.

/zuul/semaphores/<tenant>/<semaphore> Type: SemaphoreHandler: Represents a semaphore. Information about which builds hold the semaphore is stored in the znode data.

/zuul/system Type: SystemConfigCache

System-wide configuration data.

/zuul/system/conf: The serialized version of the unparsed abide configuration as well as system attributes (such as the tenant list).

/zuul/system/conf-lock Type: WriteLock: A lock to be acquired before updating zuul/system/conf

/zuul/tenant/<tenant>: Tenant-specific information here.

/zuul/tenant/<tenant>/pipeline/<pipeline>: Pipeline state.

/zuul/tenant/<tenant>/pipeline/<pipeline>/dirty: A flag indicating that the pipeline state is “dirty”; i.e., it needs to have the pipeline processor run.

/zuul/tenant/<tenant>/pipeline/<pipeline>/queue: Holds queue objects.

/zuul/tenant/<tenant>/pipeline/<pipeline>/item/<item uuid>: Items belong to queues, but are held in their own hierarchy since they may shift to differrent queues during reconfiguration.

/zuul/tenant/<tenant>/pipeline/<pipeline>/item/<item uuid>/buildset/<buildset uuid>: There will only be one buildset under the buildset/ node. If we reset it, we will get a new uuid and delete the old one. Any external references to it will be automatically invalidated.

/zuul/tenant/<tenant>/pipeline/<pipeline>/item/<item uuid>/buildset/<buildset uuid>/repo_state: The global repo state for the buildset is kept in its own node since it can be large, and is also common for all jobs in this buildset.

/zuul/tenant/<tenant>/pipeline/<pipeline>/item/<item uuid>/buildset/<buildset uuid>/job/<job name>: The frozen job.

/zuul/tenant/<tenant>/pipeline/<pipeline>/item/<item uuid>/buildset/<buildset uuid>/job/<job name>/build/<build uuid>: Information about this build of the job. Similar to buildset, there should only be one entry, and using the UUID automatically invalidates any references.

/zuul/tenant/<tenant>/pipeline/<pipeline>/item/<item uuid>/buildset/<buildset uuid>/job/<job name>/build/<build uuid>/parameters: Parameters for the build; these can be large so they’re in their own znode and will be read only if needed.