The purpose of this project is to provide data for developer productivity by collecting and reporting:
- What are the costliest tasks (time x frequency) of local developers, by project?
- What is the breakdown of these tasks in terms of cache effectiveness, network, or something else?
- Given a task, tell me the cache rate, mean/stddev/histogram, inputs, network, etc. How has this changed over time?
- What are the costliest errors (again, time x frequency) and build failures, segmented by environment and failure type?
- What is the impact of flaky test failures on local builds?
- What versions of Gradle BT are in use at Gradle, and with what frequency? Similarly, what versions of guava are in use across active projects at Gradle?
These applications collect and index Gradle Enterprise and GitHub events data into Google Cloud Storage and BigQuery.
This application is responsible for streaming build event data from a configured Gradle Enterprise export API endpoint into a specified Google Cloud Storage bucket. It pulls all builds down and all configured event types, avoiding as much data interaction (parsing, filtering) as possible.
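To verify that events are landing, you can list the raw bucket; objects are keyed by build start time, as in the examples later in this README. The bucket name comes from the GCS_RAW_BUCKET_NAME configuration above.

```shell
# Sketch: inspect the raw bucket the collector writes to.
# The date-keyed layout (year/month/day) matches the input paths used
# by the indexer examples later in this README.
gsutil ls "gs://build-events-raw/2019/01/01/" | head -n 5
```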
Reference Materials:
gcloud compute instances create build-event-collectorator1 \
--preemptible \
--image-family debian-9 \
--image-project debian-cloud \
--machine-type n1-highmem-8 \
--scopes "userinfo-email,cloud-platform" \
--metadata startup-script='#!/bin/sh
APP_NAME="build-event-collectorator"
APP_VERSION="0.5.0"
export GRADLE_ENTERPRISE_HOSTNAME="gradle-enterprise.mycompany.com"
export GRADLE_ENTERPRISE_USERNAME="my-username"
export GRADLE_ENTERPRISE_PASSWORD="my-password"
export GCS_RAW_BUCKET_NAME="build-events-raw"
gsutil cp "gs://gradle-build-analysis-apps/maven2/org/gradle/buildeng/analysis/${APP_NAME}/${APP_VERSION}/${APP_NAME}-${APP_VERSION}.zip" .
apt-get update && apt-get -y --force-yes install openjdk-8-jdk unzip
update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
echo "Running ${APP_NAME}-${APP_VERSION}..."
unzip "${APP_NAME}-${APP_VERSION}.zip"
sh "${APP_NAME}-${APP_VERSION}/bin/${APP_NAME}"
echo "Application exited"'
By default the collector processes only builds from the moment the app is started, but you can collect past builds by setting export BACKFILL_DAYS=<number> in the startup script. Similarly, you can set export LAST_BUILD_ID="jh4qknspatp2y" to start streaming from the build immediately after the given build ID.
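For example, the startup script above could be extended like this (the values are illustrative):

```shell
# Illustrative excerpt of the collector startup script with backfill enabled.
export BACKFILL_DAYS=7                  # re-process the last 7 days of builds
# export LAST_BUILD_ID="jh4qknspatp2y"  # ...or resume just after a known build
echo "Backfilling ${BACKFILL_DAYS:-0} days of builds"
```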
This app is not needed right now. It streamed build events to Google Cloud Pub/Sub to allow fan-out to many build event collectors. Currently the limiting factor is outbound network from the Gradle Enterprise server, so multiple downloaders do not make processing faster.
These applications are responsible for transforming raw data pulled from Google Cloud Storage, filtering and combining those events using Apache Beam. Each application consists of an EventModel that represents the schema of the BigQuery table to be generated, an EventJsonTransformer which filters and transforms data, and a BuildIndexer which writes to BigQuery.
You can develop your own indexer by creating a Model, an EventsJsonTransformer, and an Indexer. You will find several indexers for inspiration under build-event-indexerator/src/main/kotlin.
You can run any indexer locally using a task rule and the DirectRunner:
./gradlew :build-event-indexerator:indexBuildEvents --args="--runner=DirectRunner --project=build-analysis --input=gs://build-events-raw/2019/01/01/22*.txt --output=build-analysis:gradle_builds.builds"
Once you’re happy with your Apache Beam setup, create a Google Dataflow job to run over a larger input.
./gradlew :build-event-indexerator:indexTestEvents --args="--runner=DataflowRunner --project=build-analysis --input=gs://build-events-raw/2019/01/** --output=build-analysis:gradle_builds.test_executions --region=us-central1 --tempLocation=gs://gradle-dataflow-tmp/$(openssl rand -hex 8)"
Let’s break this down a bit:
- --runner=DataflowRunner tells Apache Beam that you want to use Google Dataflow, which uses Google Compute Engine under the hood.
- --project=build-analysis configures the Google Cloud project name. We use build-analysis.
- --input=gs://build-events-raw/2019/01/** will consume all files from the given Google Cloud Storage bucket for the month of January 2019. These build files are keyed by build start time.
- --output=build-analysis:gradle_builds.my_builds_table is the BigQuery table that will be created (if necessary) or appended to.
- --region=us-central1 is the Google Compute region to use for workers. Quotas are set by region, and we have requested increased capacity so that Dataflow can process TBs of build data before Eric dies of old age. This is optional and us-central1 is the default.
- --tempLocation=gs://gradle-dataflow-tmp/$(openssl rand -hex 8) is an existing GCS bucket plus a random key that Dataflow jobs can use to store temporary files.
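The random suffix in --tempLocation is just a throwaway key so that concurrent jobs do not collide; openssl rand -hex 8 produces 8 random bytes rendered as 16 hex characters:

```shell
# Generate a unique temp path for a Dataflow job run.
TMP_KEY=$(openssl rand -hex 8)   # 8 random bytes rendered as 16 hex chars
echo "gs://gradle-dataflow-tmp/${TMP_KEY}"
```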
Reporting-specific tables must be created in BigQuery in order to keep Data Studio costs and performance reasonable.
You need to use the BigQuery CLI or API to create new partitioned tables. Using time-partitioned tables allows Data Studio to avoid scanning all of the data when querying a subset of the time range.
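The bq query invocations below create the partitioned destination tables on the fly via --time_partitioning_field. If you prefer to create an empty partitioned table up front, a sketch (the table name and schema here are illustrative, not from this project):

```shell
# Sketch: explicitly create a day-partitioned reporting table.
# Table name and column schema are illustrative placeholders.
bq mk --table \
  --time_partitioning_type=DAY \
  --time_partitioning_field=date \
  "build-analysis:reports.my_dashboard" \
  date:DATE,project:STRING,total_build_time:INTEGER
```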
You can generate tables for various dashboards using the following queries:
bq query --location="US" --destination_table="build-analysis:reports.builds_dashboard" --time_partitioning_field="date" --use_legacy_sql="false" --replace --batch '
SELECT
DATE(buildTimestamp) AS date,
rootProjectName AS project,
buildId,
STARTS_WITH(buildAgentId, "tcagent") AS ci,
SUM(wallClockDuration) AS total_build_time
FROM
`gradle_builds.builds`
WHERE
rootProjectName IN ("gradle",
"dotcom",
"dotcom-docs",
"gradle-kotlin-dsl",
"ci-health",
"build-analysis",
"gradle-profiler",
"gradle-site-plugin",
"gradlehub")
AND buildTimestamp > "2019-01-01"
GROUP BY
1,
2,
3,
4;'
bq query --location="US" --destination_table="build-analysis:reports.failures_dashboard" --time_partitioning_field="timestamp" --use_legacy_sql="false" --replace --batch '
SELECT
buildId,
rootProjectName AS project,
buildTimestamp AS timestamp,
wallClockDuration AS build_duration,
STARTS_WITH(buildAgentId, "tcagent") AS ci,
failureData.category AS failure_category,
failed_task,
JSON_EXTRACT(env.value,
"$.name") AS os
FROM
`gradle_builds.builds` builds,
UNNEST(failureData.taskPaths) AS failed_task
CROSS JOIN
UNNEST(environmentParameters) AS env
WHERE
rootProjectName IN ("gradle",
"dotcom",
"dotcom-docs",
"gradle-kotlin-dsl",
"ci-health",
"build-analysis",
"gradle-profiler",
"gradle-site-plugin",
"gradlehub")
AND buildTimestamp > "2019-01-01"
AND BYTE_LENGTH(failureId) > 0
AND env.key = "Os"'
bq query --location="US" --destination_table="build-analysis:reports.tasks_dashboard" --time_partitioning_field="date" --use_legacy_sql="false" --replace --batch '
SELECT
DATE(buildTimestamp) AS date,
rootProjectName AS project,
CONCAT(tasks.buildPath, " > ", tasks.path) AS absolute_task_path,
tasks.className AS task_type,
tasks.outcome,
tasks.cacheable,
CASE
WHEN tasks.cacheable IS FALSE THEN "NOT_CACHEABLE"
WHEN tasks.cacheable IS TRUE
AND tasks.outcome IN ("from_cache") THEN "CACHE_HIT"
WHEN tasks.cacheable IS TRUE AND tasks.outcome IN ("success", "failed") THEN "CACHE_MISS"
WHEN tasks.cacheable IS TRUE
AND tasks.outcome IN ("up_to_date",
"skipped",
"no_source") THEN "UP_TO_DATE"
ELSE "UNKNOWN"
END AS cache_use,
STARTS_WITH(buildAgentId, "tcagent") AS ci,
SUM(tasks.wallClockDuration) AS total_time_ms,
AVG(tasks.wallClockDuration) AS avg_duration,
STDDEV(tasks.wallClockDuration) AS stddev_duration
FROM
`gradle_builds.task_executions`,
UNNEST(tasks) AS tasks
WHERE
rootProjectName IN ("gradle",
"dotcom",
"dotcom-docs",
"gradle-kotlin-dsl",
"ci-health",
"build-analysis",
"gradle-profiler",
"gradle-site-plugin",
"gradlehub")
AND buildTimestamp > "2019-01-01"
GROUP BY
1,
2,
3,
4,
5,
6,
7,
8;'
bq query --location="US" --destination_table="build-analysis:reports.tests_dashboard" --time_partitioning_field="date" --use_legacy_sql="false" --replace --batch '
SELECT
DATE(buildTimestamp) AS date,
rootProjectName as project,
CONCAT(t.className, ".", t.name) AS test_name,
t.taskId as task_path,
exec.failed AS failed,
STARTS_WITH(buildAgentId, "tcagent") AS ci,
SUM(exec.wallClockDuration) AS total_time_ms,
AVG(exec.wallClockDuration) AS avg_duration,
STDDEV(exec.wallClockDuration) stddev_duration
FROM
`gradle_builds.test_executions`,
UNNEST(tests) AS t,
UNNEST(t.executions) AS exec
WHERE
rootProjectName IN ("gradle",
"dotcom",
"dotcom-docs",
"gradle-kotlin-dsl",
"ci-health",
"build-analysis",
"gradle-profiler",
"gradle-site-plugin",
"gradlehub")
AND buildTimestamp > "2019-01-01"
AND t.suite = FALSE
GROUP BY
1,
2,
3,
4,
5,
6;'
bq query --location="US" --destination_table="build-analysis:reports.dependencies_dashboard" --use_legacy_sql="false" --replace --batch '
SELECT
DISTINCT(CONCAT(md.group, ":", md.module)) AS group_and_module,
rootProjectName AS project_name,
md.version,
COUNT(buildId) build_count
FROM
`gradle_builds.dependencies` AS d,
UNNEST(moduleDependencies) AS md
WHERE
rootProjectName IN ("gradle",
"dotcom",
"dotcom-docs",
"gradle-kotlin-dsl",
"ci-health",
"build-analysis",
"gradle-profiler",
"gradle-site-plugin",
"gradlehub")
AND buildTimestamp > "2019-01-01"
GROUP BY
1,
2,
3;'
You can query build data using:
- Google Cloud Project: your-google-cloud-project
- BigQuery Dataset: gradle_builds
Here are many of the BigQuery tables generated. All of them that have a timestamp field are partitioned by that field:
- builds
- build_cache_interactions
- build_failures
- dependency_resolutions
- exceptions
- network_activity
- task_executions
- test_executions
Schemas are generated from data classes under build-event-indexerator/src/main/kotlin/org/gradle/buildeng/analysis/model/ using BigQueryTableSchemaGenerator.
Some fields are JSON. See BigQuery JSON functions for reference.
You can use Google Cloud Scheduler or plain old cron to schedule ~daily data updates. See the Cloud Scheduler docs.
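As a sketch, a crontab entry for a daily refresh might look like this. The wrapper script is hypothetical; it would simply re-run the bq query commands shown above.

```shell
# Hypothetical crontab entry: refresh dashboard tables daily at 06:00 UTC.
# /opt/build-analysis/refresh-dashboards.sh is a made-up wrapper that
# re-runs the bq query commands shown earlier in this README.
0 6 * * * /opt/build-analysis/refresh-dashboards.sh >> /var/log/ba-refresh.log 2>&1
```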
SELECT
FORMAT_TIMESTAMP('%Y-%m-%d', buildTimestamp) AS day,
STARTS_WITH(buildAgentId, 'tcagent') AS isCI,
COUNT(buildId) AS count
FROM
`gradle_builds.builds`
WHERE
buildTimestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
AND BYTE_LENGTH(failureId) > 0
GROUP BY 1, 2
ORDER BY 1, 2;
SELECT
buildToolVersion,
COUNT(buildId) as count
FROM
`gradle_builds.builds`
WHERE
rootProjectName = 'gradle'
and buildTimestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY
1
ORDER BY
2 DESC;
SELECT
JSON_EXTRACT(env.value,
'$.version') as jdk_version,
COUNT(env.value) as count
FROM
`gradle_builds.builds`,
UNNEST(environmentParameters) AS env
WHERE
buildAgentId NOT LIKE 'tcagent%'
AND rootProjectName = 'gradle'
AND env.key LIKE 'Jvm'
AND buildTimestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY
1
ORDER BY
2 DESC;
SELECT
buildAgentId,
JSON_EXTRACT(env.value,
'$.daemon') AS daemon,
JSON_EXTRACT(env.value,
'$.taskOutputCache') AS build_cache,
COUNT(env.value) AS count
FROM
`gradle_builds.builds`,
UNNEST(environmentParameters) AS env
WHERE
buildAgentId NOT LIKE 'tcagent%'
AND env.key LIKE 'BuildModes'
and (JSON_EXTRACT(env.value,
'$.daemon') = 'false' OR JSON_EXTRACT(env.value,
'$.taskOutputCache') = 'false')
AND buildTimestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY 1, 2, 3
ORDER BY 4 DESC;
- Given a task, tell me the cache rate, mean/stddev/histogram, etc. How has this changed over time?
- Given a test, tell me the outcome history, duration, flakiness, etc.
- What are the costliest tests? Are there Test tasks that never fail? Could we run them less frequently?
- What are the costliest errors (again, time x frequency) and build failures, segmented by environment and failure type?
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-f