Data Model

class changes.models.artifact.Artifact(**kwargs)[source]

The artifact produced by one job/step on a single machine. Sometimes this is a JSON dict referencing a file in S3, sometimes it is null, and sometimes it is an empty dict. It is basically any file left behind after a run for changes to pick up.


A list of every person who has written a revision parsed by changes. Keyed by email. Automatically updated when new authors are seen by changes in diffs etc.


Represents the work we do (e.g. running tests) for one diff or commit (an entry in the source table) in one particular project.

Each Build contains many Jobs (usually linked to a JobPlan).

class changes.models.buildseen.BuildSeen(**kwargs)[source]

Keeps track of when users have viewed builds in the UI. It is unclear whether this is currently exposed to users in the UI.

class changes.models.command.Command(**kwargs)[source]

The information of the script run on one node within a jobstep: the contents of the script are included, and later the command can be updated with status/return code.

changes-client has no real magic beyond running commands, so the list of commands it ran basically tells you everything that happened.

This appears to be used only by mesos/lxc builds (DefaultBuildStep).

class changes.models.comment.Comment(**kwargs)[source]

Comments on test runs in changes. You can go into the GUI and leave messages, and this table keeps track of those. There is a job_id but it is always null, despite the UI showing you the comment only on the job page.

Due to this, the UI will show an identical set of comments on every job page of a build.

class changes.models.event.Event(**kwargs)[source]

Indicates that something (specified by type and data) happened to some entity (specified by item_id). This allows us to record that we’ve performed some action with an external side-effect so that we can be sure we do it no more than once. It is also useful for displaying to users which actions have been performed when, and whether they were successful.
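A minimal sketch of the "no more than once" idea described above, using an in-memory set in place of the event table. The function name `record_once` and the example event type are hypothetical and are not taken from the changes codebase.

```python
# Hypothetical in-memory stand-in for the event table.
_recorded_events = set()


def record_once(event_type, item_id, action):
    """Run `action` only if (event_type, item_id) has not been recorded yet.

    Returns True when the action ran, False when it was skipped.
    """
    key = (event_type, item_id)
    if key in _recorded_events:
        return False
    # Record first, so that a failure inside `action` cannot lead to a
    # second attempt: the side effect happens no more than once.
    _recorded_events.add(key)
    action()
    return True
```

Recording before acting trades a possible missed action for the guarantee that the external side effect is never repeated.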

class changes.models.failurereason.FailureReason(**kwargs)[source]

Always associated with a single jobstep. A failurereason is not required to fail a build, but if a jobstep fails, it can record why here. The reason column can be one of: [test_failures, missing_test, missing_artifact, timeout, malformed_artifact, duplicate_test_name]

class changes.models.filecoverage.FileCoverage(**kwargs)[source]

Unique to file/job/project. Contains a data-blob-string, where each character is one of:

  • U Uncovered
  • C Covered
  • N No Info

Filled in when file coverage artifacts are collected (updated with additional lines for each new artifact in a job).
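As an illustration of the coverage string format, the sketch below counts line states and merges two strings. Both function names are hypothetical, and the merge rule (C wins over U, U wins over N) is an assumption; the text above only says that the data is updated as new artifacts arrive.

```python
from collections import Counter


def summarize_coverage(data):
    """Count covered, uncovered and no-info lines in a coverage string."""
    counts = Counter(data)
    unknown = set(counts) - {"C", "U", "N"}
    if unknown:
        raise ValueError("unexpected coverage characters: %r" % sorted(unknown))
    return {
        "covered": counts["C"],
        "uncovered": counts["U"],
        "no_info": counts["N"],
    }


def merge_coverage(old, new):
    """Merge two coverage strings line by line.

    Assumed rule (not confirmed by the source): C beats U, and U beats N.
    The shorter string is padded with N (no info).
    """
    rank = {"N": 0, "U": 1, "C": 2}
    length = max(len(old), len(new))
    old = old.ljust(length, "N")
    new = new.ljust(length, "N")
    return "".join(a if rank[a] >= rank[b] else b for a, b in zip(old, new))
```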

class changes.models.itemsequence.ItemSequence(**kwargs)[source]

Used to hold counters for autoincrement-style sequence number generation. In each row, value is the last sequence number returned for the corresponding parent.

The table is used via the next_item_value database function and not used in the python codebase.
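The counter semantics can be sketched in Python as follows. This is only an in-memory illustration, not the actual `next_item_value` database function; the class name is hypothetical, and starting each sequence at 1 is an assumption.

```python
from collections import defaultdict


class ItemSequences:
    """In-memory illustration of per-parent sequence counters."""

    def __init__(self):
        # parent_id -> last sequence number returned (0 means none yet)
        self._values = defaultdict(int)

    def next_item_value(self, parent_id):
        """Increment and return the counter for `parent_id`."""
        self._values[parent_id] += 1
        return self._values[parent_id]
```

In a real database the increment and read would need to happen atomically, which is presumably why the table is accessed through a database function rather than from Python.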

class changes.models.itemstat.ItemStat(**kwargs)[source]

Also a key/value table, tailored towards statistics generated by tests and code coverage. Examples: test_rerun_count, test_duration, lines_covered

class changes.models.job.Job(**kwargs)[source]

An instantiation of a plan for a particular build. We run the code specified by the appropriate plan. That code creates and farms out a bunch of jobsteps to do the actual work.

class changes.models.jobplan.JobPlan(**kwargs)[source]

A snapshot of a plan and its constituent steps, taken at job creation time. This exists so that running jobs are not impacted by configuration changes. Note that this table combines the data from the plan and step tables.

class changes.models.jobstep.JobStep(**kwargs)[source]

The most granular unit of work; run on a particular node, has a status and a result.

class changes.models.jobphase.JobPhase(**kwargs)[source]

A JobPhase is a grouping of one or more JobSteps performing the same basic task. The phases of a Job are intended to be executed sequentially, though that isn’t necessarily enforced.

One example of phase usage: a Job may have a test collection phase and a test execution phase, with a single JobStep collecting tests in the first phase and an arbitrary number of JobSteps executing shards of the collected tests in the second phase. By using two phases, the types of JobSteps can be tracked and managed independently.

Though JobPhases are typically created to group newly created JobSteps, they can also be constructed retroactively once a JobStep has finished based on phased artifacts. This is convenient but a little confusing, and perhaps should be handled by another mechanism.
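To make the collection/execution example concrete, here is a small sketch of how tests collected in a first phase might be split into shards for the JobSteps of a second phase. The function name and the round-robin strategy are assumptions for illustration; the source does not say how changes actually distributes tests.

```python
def shard_tests(tests, num_shards):
    """Split a list of collected tests into `num_shards` round-robin shards."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Shard i receives every num_shards-th test, starting at index i.
    return [tests[i::num_shards] for i in range(num_shards)]
```

Each resulting shard would correspond to one JobStep in the execution phase.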

class changes.models.latest_green_build.LatestGreenBuild(**kwargs)[source]

Represents the latest green build for a given branch of a given project.

A project has multiple latest_green_builds when it has multiple branches.

class changes.models.log.LogSource(**kwargs)[source]

We log the console output for each jobstep. logsource is an entity table for these “logfiles”. logchunk contains the actual text.

If we’re using artifact store to store/host the log file, in_artifact_store will be set to true. No logchunk entries will be associated with such logsources.

class changes.models.log.LogChunk(**kwargs)[source]

Chunks of text. Each row in logchunk is associated with a particular logsource entry, and has an offset and blob of text. By grabbing all logchunks for a given logsource id, you can combine them to get the full log.
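Reassembling a log from its chunks can be sketched as below. The chunks are modeled as plain `(offset, text)` tuples rather than database rows, and the function name is hypothetical.

```python
def assemble_log(chunks):
    """Combine (offset, text) chunks of one logsource into the full log."""
    # Chunks may arrive in any order, so sort by offset before joining.
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    return "".join(text for _, text in ordered)
```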

class changes.models.node.Cluster(**kwargs)[source]

A group of nodes. We refer to clusters in the step configurations (where should we run our tests?). Clusters are automatically added when we see them in jenkins results.

Apparently, clusters are only used in jenkins (not lxc, although nodes are used for both). A cluster does not correspond to one master.

class changes.models.node.ClusterNode(cluster=None, node=None, **kwargs)[source]

Which cluster does each node belong to? This is populated at the same time as cluster.

class changes.models.node.Node(*args, **kwargs)[source]

A machine that runs jobsteps.

This is populated by observing the machines picked by the jenkins masters (which themselves are configured by BuildStep params in the changes UI) when they’re asked to run a task, and is not configured manually. Node machines have tags (not stored in the changes db).

class changes.models.patch.Patch(**kwargs)[source]

A patch that can be applied to a revision. Refers to a parent revision on which the patch is based, and contains a diff text field with the contents of the patch (possibly in unified diff form; this needs double-checking).

Used by builds from phabricator diffs: see source for more details.

class changes.models.phabricatordiff.PhabricatorDiff(**kwargs)[source]

Whenever phabricator sends us a diff to do a build against (see source/patch for more info), we write an entry to this table with the details. revision_id and diff_id refer to the phabricator versions of this terminology: revision_id is the number in D145201 and diff_id represents a particular diff within that differential revision (the id in the revision update history table.)

This is 80% convenient logging. It also does light deduplication: we make sure to never kick off more than one build for a particular revision_id/diff_id from the api called by phabricator. Phabricator can occasionally fire a herald rule more than once, so it’s nice to have this.

class changes.models.plan.Plan(**kwargs)[source]

What work should we do for our new revision? A project may have multiple plans, e.g. whenever a diff comes in, test it on both mac and windows (each being its own plan.) In theory, a plan consists of a sequence of steps; in practice, a plan is just a wrapper around a single step.

class changes.models.project.Project(**kwargs)[source]

The way we organize changes. Each project is linked to one repository, and usually kicks off builds for it when new revisions come in (or just for some revisions based on filters). Projects use build plans (see plan) to describe the work to be done for a build.

class changes.models.project.ProjectOption(**kwargs)[source]

Key/value table storing configuration information for projects. Here is an incomplete list of possible keys:

  • build.branch-names
  • build.commit-trigger
  • build.expect-tests
  • build.file-whitelist
  • build.test-duration-warning
  • green-build.notify
  • green-build.project
  • history.test-retention-days
  • mail.notify
  • mail.notify-addresses
  • mail.notify-addresses-revisions
  • mail.notify-author
  • phabricator.diff-trigger
  • phabricator.notify
  • phabricator.coverage
  • project.notes
  • project.owners
  • snapshot.current

class changes.models.repository.Repository(**kwargs)[source]

Represents a VCS repository that Changes will watch for new commits.

class changes.models.revision.Revision(**kwargs)[source]

Represents a commit in a repository, including some metadata. Author and committer are stored as references to the author table. Ideally there will be one revision row for every commit in every repository tracked by changes, though this is not always true, and some code tries to degrade gracefully when this happens.

Revisions are keyed by (repository, sha). They do not have unique UUIDs.

class changes.models.snapshot.Snapshot(**kwargs)[source]

A snapshot is a set of LXC container images (up to one for each plan in a project).

Each project can have an arbitrary number of snapshots, but only up to one “current snapshot” is actually used by builds (stored as ProjectOption) at any time.

Snapshots are used in the Mesos and Jenkins-LXC environments. Snapshots are currently only used with changes-client.

When running a build, the images of the current snapshot are used for individual jobs that are part of a build. A snapshot image can be shared between multiple plans by setting snapshot_plan_id of a Plan. By default, there is a separate image for each plan of a build.

The status of a snapshot indicates whether it can be used for builds; it doesn’t indicate whether the snapshot is actually used for builds right now. A snapshot is active if and only if all the corresponding snapshot images are active.

A snapshot is generated by a slightly special snapshot build that uploads a snapshot at the end of the build.

class changes.models.source.Source(**kwargs)[source]

This is the object that actually represents the code we run builds against.

Essentially it’s a revision, with a UUID, and a possible patch_id. Rows with null patch_ids are just revisions, and rows with patch_ids apply the linked patch on top of the revision and run builds against the resulting code.

Why the indirection? This is how we handle phabricator diffs: when we want to create a build for a new diff, we add a row here with the diff’s parent revision sha (NOT the sha of the commit phabricator is trying to land, since that will change every time we update the diff) and a row to the patch table that contains the contents of the diff.

Side note: Whenever we create a source row from a phabricator diff, we log json text to the data field with information like the diff id.
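The two kinds of source rows can be illustrated with a small sketch. The dataclass below is a simplified stand-in, not the real SQLAlchemy model; the field name `revision_sha` and the helper `describe` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    """Simplified stand-in for a source row (not the real model)."""

    id: str
    revision_sha: str
    patch_id: Optional[str] = None  # None means "just a revision"

    def is_commit(self):
        """True when this source is a plain revision with no patch."""
        return self.patch_id is None


def describe(source):
    """Describe what code a build against `source` would run."""
    if source.is_commit():
        return "revision %s" % source.revision_sha
    return "patch %s applied on top of revision %s" % (
        source.patch_id,
        source.revision_sha,
    )
```

For a phabricator diff, `revision_sha` would be the diff’s parent revision, and `patch_id` would point at the patch row holding the diff contents.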

class changes.models.step.Step(**kwargs)[source]

A specific description of how to do some work for a build.

In theory, a plan can have multiple steps. In practice, every plan has only one step and plan is just a thin wrapper around step. Steps are not freeform, rather, each step is just configuration data for specific step implementations that are hard-coded in python.

class changes.models.task.Task(**kwargs)[source]

When we enqueue a task, we also write a db row to keep track of the task’s metadata (e.g. number of times retried). There is a slightly icky custom data column that each task type uses in its own way. This table represents a serialized version of the tracked_task you see in the changes python codebase.

Tasks can have parent tasks. Parent tasks have the option of waiting for their children to complete (in practice, that always happens.)

Example: sync_job with sync_jobstep children

Tasks can throw a NotFinished exception, which just means that we try running the task again after some interval (this has nothing to do with retrying tasks that error!). Examples: tasks with children will check to see if their children are finished; the sync_jobstep task will query jenkins to see if it’s finished.
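The NotFinished behavior can be sketched as a simple polling loop. The exception class here is a local stand-in for the one in the changes codebase, and `run_until_finished` with its `max_attempts` parameter is hypothetical; a real task queue would re-enqueue the task after a delay instead of looping in-process.

```python
class NotFinished(Exception):
    """Local stand-in: raised by a task that should be run again later."""


def run_until_finished(task, max_attempts=10):
    """Call `task` until it stops raising NotFinished.

    Returns the number of attempts that were needed. Any other exception
    propagates immediately, since NotFinished is not an error retry.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            task()
        except NotFinished:
            continue  # a real queue would wait for some interval here
        return attempt
    raise RuntimeError("task did not finish after %d attempts" % max_attempts)
```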

Tasks can fire signals, e.g. build xxx has finished. There’s a table that maps signal types to tasks that should be created. Signals/listeners are not tracked as children of other tasks.

See Tasks for more details on what the task_name can refer to.

class changes.models.test.TestCase(**kwargs)[source]

A single run of a single test, together with any captured output, retry-count and its return value.

Every test that gets run ever has a row in this table.

At the time this was written, it seems to have 400-500M rows

(how is this still surviving?)

NOTE: DO NOT MODIFY THIS TABLE! Running migration on this table has caused unavailability in the past. If you need to add a new column, consider doing that on a new table and linking it back to tests via the ID.

class changes.models.testartifact.TestArtifact(**kwargs)[source]

Represents any artifacts generated by a single run of a single test. Used e.g. in server-selenium to store screenshots and large log dumps for later debugging.

class changes.models.user.User(**kwargs)[source]

A table of the people who use changes.


Kicks off a newly created job within a build; enqueued for each job within a new build.

Tasks fire signals by spawning fire_signal tasks; these grab every associated listener and spawn a run_event_listener task for each.

Actually runs the listener.

See fire_signal, which doesn’t actually run it.
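The split between fire_signal and run_event_listener can be sketched as follows. The listener registry and the queue are in-memory stand-ins, and the signal name in the usage below is made up; only the two task names come from the text above.

```python
from collections import defaultdict

# Hypothetical stand-ins for the signal-to-listener table and the task queue.
listeners = defaultdict(list)
task_queue = []


def fire_signal(signal, payload):
    """Enqueue one run_event_listener task per listener; runs nothing itself."""
    for listener in listeners[signal]:
        task_queue.append((listener, payload))


def run_event_listener(listener, payload):
    """Actually invoke a single listener."""
    return listener(payload)
```

A short usage example: registering a listener, firing a signal, and then draining the queue.

```python
seen = []
listeners["build.finished"].append(seen.append)
fire_signal("build.finished", {"build_id": 1})
# Nothing has run yet; the listener only runs when its task is executed.
for listener, payload in task_queue:
    run_event_listener(listener, payload)
```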

Downloads an artifact from jenkins.

Updates jobphase and job statuses based on the status of the constituent jobsteps.

Polls a build for updates. May have sync_artifact children.