Add a way (annotations?) to mark a Tekton Resource to be skipped #979
Comments
i agree with the feature request but not sure if it's the right solution for the use case - if i install / enable chains on a cluster with a heavy load of tekton workloads, then i need to annotate all the older workloads, which is... weird and clunky? wdyt about a configuration option that tells chains not to sign older / already existing workloads on the cluster?
@concaf the trick would be: what happens if the chains controller is shut down for an hour or two (during an upgrade or something)? One would have to enable that field and then disable it later on. In addition, how do we "decide" what is "existing"? The ones that were created before the controller started, or the ones that were already done at that point? I do think the skip annotation approach does not necessarily prevent a "controller configuration" from being put in place later on.
This is relatively easy to script, and it has the additional benefit of making it easy to know that those resources were skipped on purpose (versus having to dig fairly deep to understand why some resources are signed and some are not).
An alternative is to introduce a new Chains configuration to ignore any resource created prior to a given date, e.g. ignore-older-resources: 2023-01-01T00:00:00Z. That could potentially give a hook for an installer to better match the user's expectations.
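For illustration, a minimal Go sketch of how such a cutoff could be honored by the controller; note that ignore-older-resources is only the key proposed in this comment, not an existing Chains option, and the function name is made up for the example.

```go
// Hypothetical sketch only: "ignore-older-resources" is the key proposed in
// this thread, not a real Chains configuration option.
package skipexample

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// shouldIgnore reports whether a resource was created before the configured
// cutoff, in which case the controller would skip signing it entirely.
func shouldIgnore(created metav1.Time, rawCutoff string) bool {
	if rawCutoff == "" {
		return false // no cutoff configured: behave as today and sign everything
	}
	cutoff, err := time.Parse(time.RFC3339, rawCutoff) // e.g. "2023-01-01T00:00:00Z"
	if err != nil {
		return false // unparseable configuration: fail open and keep signing
	}
	return created.Time.Before(cutoff)
}
```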
cc @tektoncd/chains-maintainers 🙏🏼
we could go via a timestamp approach as well, i.e. "don't sign anything that was created before 5 pm, 31 oct 2023"
i agree that it's easy to script, however it feels a bit incessant to me; building on your approach, there could also be a whitelisting flag that can help, which would be both explicit and obvious. if you set it, there can be prioritization of sorts - by default, chains watches and signs all workloads - then there should be a configuration that tells chains to only sign the workloads with the whitelist annotation / label. another reason why i prefer whitelisting is that it's more explicit than blacklisting - say that a user (who knows nothing about chains) creates a tekton workload, then chains is going to sign it implicitly - in order to deny chains, they need to add an annotation; while on the other hand, if whitelisting is "enabled", then only the workloads with that explicit annotation / label will be signed. wdyt?
In general I would not recommend allowing pipeline users to have a skip mode - you don't want nefarious users to be able to suppress provenance generation to mask what they're doing. Opt-in for testing things out / ramping up seems reasonable (like we do for
Are you seeing CPU load, memory load, both? Any metrics you can share? 👀 By default, Knative controllers are rate limited (default 10 qps / 100 burst). If this is taking up a crazy amount of resources then this may be a sign of a leak. My first thought would be to try and profile the controller and see if there's any low hanging fruit here.
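If profiling is needed, a generic way to pull heap and CPU profiles from a Go controller is the standard net/http/pprof endpoint; this is just the stock standard-library approach one could patch in temporarily, not something Chains necessarily exposes out of the box.

```go
// Generic Go profiling sketch, not Chains-specific wiring: temporarily expose
// pprof on localhost, then grab a heap profile with
//   go tool pprof http://localhost:6060/debug/pprof/heap
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on DefaultServeMux
)

func main() {
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```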
I'd be okay with this, though the same problem could still pop up in other scenarios. 🤷
That could be an option I guess, but as soon as the user would switch to
I don't have all the details, but it's happening on a cluster with about 4000 to 5000 (if not more) existing PipelineRuns when they switch chains on (and a bit less than 1000 PipelineRuns per hour). They had to bump their resource requests to at least 4G of memory for the controller not to be killed. However, the controller takes a lot of time to handle all the existing PipelineRuns, and thus the queue to sign any "live" / new PipelineRun (or TaskRun) keeps growing. There is probably some exploration to do around the HA setup (similar to Pipeline, with multiple replicas, …), and some fiddling with the controller's flags around the rate limits, etc.
thinking out loud here - another implicit way to accommodate this use case would be for the chains controller to process the current / active / incoming workloads first before moving to the older workloads on the cluster, which get queued up; chains can start processing older workloads when it gets some free cycles and a chance to breathe - wdyt?
I don't think there is any way to control this from
You can! Knative has slow/fast queues built in - active items come in on the fast queue, resyncs come in on the slow queue and are picked off as needed. What you could do is have a threshold for recently completed Runs, and if you're outside the threshold, kick the key out to the slow queue to be handled later.
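A rough sketch of what that could look like, assuming the vendored knative.dev/pkg controller.Impl exposes the two-lane Enqueue / EnqueueSlow methods; the threshold and the handler name are invented for the example.

```go
// Sketch only: routes already-old completed PipelineRuns to the slow lane so a
// backlog doesn't starve live work. Assumes controller.Impl has EnqueueSlow
// (the two-lane queue) in the knative.dev/pkg version in use.
package skipexample

import (
	"time"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
	"knative.dev/pkg/controller"
)

// enqueueByAge returns an event handler that sends recently completed runs to
// the fast queue and anything completed longer than `threshold` ago to the
// slow queue, to be picked off when there are spare cycles.
func enqueueByAge(impl *controller.Impl, threshold time.Duration) func(interface{}) {
	return func(obj interface{}) {
		pr, ok := obj.(*v1beta1.PipelineRun)
		if !ok {
			return
		}
		if ct := pr.Status.CompletionTime; ct != nil && time.Since(ct.Time) > threshold {
			impl.EnqueueSlow(obj) // backlog: handled later
			return
		}
		impl.Enqueue(obj) // live / recent work: fast queue
	}
}
```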
Ohh that's interesting 🙃
Feature request
Provide a way to mark a Tekton resource (TaskRun or PipelineRun) to be skipped by the controller.
One possible idea would be to add an extra value to the chains.tekton.dev/signed annotation, with a skip value, or an entirely new annotation (chains.tekton.dev/skip).

Use case

One of the use cases for this is enabling chains on a cluster that is very heavy on Tekton resources (like 5000 existing PipelineRuns, or more…). At start, the chains controller will load them all and try to sign them all (more or less at once). This will possibly generate a crazy amount of load, if not completely kill the controller. One idea to reduce this would be to allow a user to mark previous / existing resources to not be signed, making chains essentially a no-op (or very low load) when it starts. It could also, probably, be useful if one is experimenting with some Pipeline or Task and doesn't want chains to sign it.
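As a rough sketch of what the reconciler-side check could look like (the chains.tekton.dev/skip annotation is only the proposal in this issue, not something Chains supports today):

```go
// Sketch of the proposed behavior; the annotation below is the one suggested
// in this issue and does not exist in Chains today.
package skipexample

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const proposedSkipAnnotation = "chains.tekton.dev/skip" // proposed, not implemented

// markedAsSkipped reports whether a TaskRun / PipelineRun has opted out of
// signing, in which case the controller would return early without doing work.
func markedAsSkipped(meta metav1.ObjectMeta) bool {
	return meta.Annotations[proposedSkipAnnotation] == "true"
}
```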
cc @lcarva @concaf