As someone currently working to move a large enterprise to GH Actions (not quite, but “yaml-based pipelines tied to git”) - what would discipline look like? If you can describe it, I can probably make it happen at my org.
1. All github action logic should be written in a language that compiles to yaml, for example dhall (https://dhall-lang.org/). Yaml is an awful language for programmers, and it's a worse language for non-programmers. It's good for no one.
2. To the greatest extent possible, do not use any actions which install things.
For example, don't use 'actions/setup-node'. Use bazel, nix, direnv, some other tool to setup your environment. That tool can now also be used on your developer's machines to get the same versions of software as CI is using.
3. Actions should be as short and simple as possible.
In many cases, they will be as simple as effectively "actions/checkout@v4", "run: ./ci/build.sh", and that's it.
Escape from yaml as quickly as possible, put basic logic in bash, and then escape from bash as quickly as possible too into a real language.
4. Do not assume that things are sane or secure by default.
Ideally you don't accept PRs from untrusted users, but if you do, read all the docs very carefully about what actions can run where, etc. Github actions on untrusted repos are a nightmare footgun.
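To make point 3 concrete, here is a minimal sketch of what the "real language" end of that chain might look like: one entry point that the workflow calls and that a developer can run locally in exactly the same way. The step names and commands are hypothetical placeholders, not anything from a real project.

```python
import subprocess
import sys

# Hypothetical steps; swap in your project's real lint/build/test commands.
STEPS = [
    ("lint", [sys.executable, "-c", "print('lint ok')"]),
    ("test", [sys.executable, "-c", "print('tests ok')"]),
]


def run_steps(steps):
    """Run each step in order; return the first failing step's name, or None."""
    for name, argv in steps:
        print(f"==> {name}", flush=True)
        if subprocess.run(argv).returncode != 0:
            return name  # stop at the first failure, like a CI job would
    return None
```

The workflow file then only needs a checkout plus one `run:` line that invokes this script, and the same invocation works on a laptop.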
I agree with most of the points but I would condense #2 and #3 to "Move most things into scripts". Sometimes it's difficult to avoid complex workflows but generally it's a safer bet to have actual scripts you can re-use and use for other environments than GitHub. It's a bad idea to make yourself dependent entirely on one company's CI system, especially if it's free or an add-on feature.
However I'd balk at the suggestion to use Dhall (or any equally niche equivalent) based on a number of factors:
1) If you need this advice, you probably don't know Dhall nor does anyone else who has worked or will work on these files, so everyone has to learn a new language and they'll all be novices at using that language.
2) You're adding an additional dependency that needs to be installed, maintained and supported. You also need to teach everyone who might touch the YAML files about this dependency and how to use it and not to touch the output directly.
3) None of the advice on GitHub Workflows out there will apply directly to the code you have because it is written in YAML so even if Dhall will generate YAML for you, you will need to understand enough YAML to convert it to Dhall correctly. This also introduces a chance for errors because of the friction in translating from the language of the code you read to the language of the code you write.
4) You are relying on the Dhall code to correctly map to the YAML code you want to produce. Especially if you're inexperienced with the language (see above) this means you'll have to double check the output.
5) It's a niche language so it's neither clear that it's the right choice for the project/team nor that it will continue to be useful. This is an extremely high bar considering the effort involved in training everyone to use it and it's not clear at all that the trade-off is worth it outside niche scenarios (e.g. government software that will have to be maintained for decades). It's also likely not to be a transferable skill for most people involved.
The point about YAML being bad also becomes less of an issue if you don't have much code in your YAML because you've moved it into scripts.
The other problem with Github Actions that I always mention, and that muddies discussions of it, is that GHA is really several different things bundled together:
1. Event dispatching/triggers, the thing that spawns webhooks/events to do things
2. The orchestration implementation (steps/jobs, a DAG-like workflow execution engine)
3. The reusable Actions marketplace
4. The actual code that you are running as part of the build
5. The environment setup/secrets of GHA, in other words, the makeup of how variables and other configurations are injected into the environment.
The most maintainable setups only leverage 1 directly from GHA. 2-5 can be ignored or managed through containerized workflow in some actual build system like Bazel, Nix, etc.
You're also adding an extra build step that by its nature can't run in CI since it generates the CI pipelines. So now you need some way to keep your Dhall and YAML in sync. I suppose you could write one job in YAML that compiles the Dhall and fails the build if it's out of date, but it seems like a lot of extra work for minimal payoff.
Instead, if you want to stay away from YAML, I'd say just move as much of the build as possible into external scripts so that the YAML stays very simple.
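For anyone who does keep a generator, the "fail the build if it's out of date" job mentioned above could be roughly this. It's a sketch that only compares two strings; how you produce the regenerated text (for example by running a tool such as `dhall-to-yaml` and capturing its output) is an assumption left outside the code.

```python
import difflib


def drift(committed: str, regenerated: str) -> list[str]:
    """Return unified-diff lines between the committed YAML and a fresh
    regeneration. An empty list means the two are in sync."""
    return list(
        difflib.unified_diff(
            committed.splitlines(),
            regenerated.splitlines(),
            fromfile="committed",
            tofile="regenerated",
            lineterm="",
        )
    )
```

A CI job would print the diff and exit non-zero whenever the list is non-empty. That still leaves the bootstrapping problem: this one job has to live in hand-written YAML.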
Not too long ago, I went down a rabbit hole of specifying GHA yaml via dhall, and quickly hit some problems; the specific thing I was starting with was the part I was frustrated with, which was the "expressions" evaluation stuff.
However, I quickly ran into the whole "no recursive data structures in dhall" (at least, not how you would normally think about it), and of course, a standard representation of expressions is a recursively defined data type.
I do get why dhall did this, but it did mean that I quickly ran into super advanced stuff, and realized that I couldn't in good conscience use this as my team of mixed engineers would need to read/maintain it in the future, without any knowledge of how to do recursive definitions in dhall, and without the inclination to care either.
Basically, to do recursive definitions, you have to lambda encode your data types, work with them like that, and then finally "reify" them with, like, a concrete list type at the end, which means that all those lambdas evaluate away and you're just left with list data. This is neat and interesting and worthy of learning, but would be wildly overly-complicated for most eng teams I think.
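To give a feel for the lambda encoding, here is a rough sketch in Python rather than Dhall (so it skips the types, which are where most of the pain lives). The tiny expression language, with only literals and `&&`, is made up for illustration. Each expression is a function that takes one handler per constructor and folds itself with them.

```python
def lit(name):
    # A leaf: hand the name to whichever handler deals with literals.
    return lambda on_lit, on_and: on_lit(name)


def and_(left, right):
    # A node: fold both children first, then combine the results.
    return lambda on_lit, on_and: on_and(
        left(on_lit, on_and), right(on_lit, on_and)
    )


def render(expr):
    """Interpret the encoded tree as an expression string."""
    return expr(lambda name: name, lambda l, r: f"({l} && {r})")


def names(expr):
    """'Reify' the encoded tree into a plain list of the names it mentions."""
    return expr(lambda name: [name], lambda l, r: l + r)
```

With `tree = and_(lit("a"), and_(lit("b"), lit("c")))`, `render(tree)` gives `"(a && (b && c))"` and `names(tree)` gives `["a", "b", "c"]`. There is no recursive data type anywhere, only functions, and that is exactly the trick that is hard to ask a mixed team to read.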
> To the greatest extent possible, do not use any actions which install things.
Why not? I assume the concern is making sure development environments and production use the same configuration as CI. But that feels like somewhat of an orthogonal issue. For example, in Node.js, I can specify both the runtime and package manager versions using standard configuration. I think it's a bonus that how those specific versions get installed can be somewhat flexible.
> All github action logic should be written in a language that compiles to yaml
An imperative language that compiles to a declarative language that emulates imperative control flow and calls other programs written in imperative languages that can have side effects that change control flow? Please no.
0. Make 99% of your setups runnable locally, with Docker if need be. It's the fastest way to test something and nothing else comes close. #1 and #2 derive from #0. This is actually a principle for code, too: if you have stuff like Lambda, make sure you have a command line entry point as well, so you can test things locally.
1. Avoid YAML if you can. Either plain configuration files (generated if need be - don't be afraid to do this) or full blown programming languages with all the rigor required (linting/static analysis, tests, etc).
2. Move ALL logic outside of the pipeline tool. Your actions should be ./my-script.sh or ./my-tool.
Source: lots of years of experience in build engineering/release engineering/DevOps/...
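For the Lambda case in #0, a minimal sketch of "also give it a command line entry point" might look like this. The `handler(event, context)` shape follows the usual Python Lambda convention; the greeting logic and the `--event` flag are hypothetical.

```python
import argparse
import json


def handler(event, context):
    """Entry point the cloud runtime would call (placeholder logic)."""
    return {"greeting": f"hello {event.get('name', 'world')}"}


def main(argv=None):
    """Command-line entry point that feeds a JSON event to the same handler."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--event", default="{}", help="event payload as JSON")
    args = parser.parse_args(argv)
    result = handler(json.loads(args.event), context=None)
    print(json.dumps(result))
    return result
```

In a real file you would call `main()` under the usual `__main__` guard, so that the deployed code path and the local one are the same function.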
I understand the arguments for putting more things in scripts instead of GHA YAML. However, I also like that breaking things up into multiple YAML steps means I get better reporting via GitHub. Of course I could have multiple scripts that I run to get the same effect. But I wish there was a standard protocol for tools to report progress information to a CI environment. Something like the Test Anything Protocol[0], but targeted at CI/CD.
GitHub Actions workflow commands[1] are similar to what I'm thinking of, but not standardized.
It's frustrating that we're beholden to Github to add support for something like this to their platform, especially when the incentives are in the wrong direction: anything that's more generic and more portable reduces lock-in to Actions.
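In the meantime, a script can get most of the per-step reporting back by emitting workflow commands itself and falling back to plain text elsewhere. A minimal sketch, assuming the `::group::`, `::endgroup::` and `::error::` commands and the `GITHUB_ACTIONS` environment variable behave as documented; the helper names are my own:

```python
import os
from contextlib import contextmanager


def in_github_actions() -> bool:
    # GitHub-hosted and self-hosted runners set GITHUB_ACTIONS=true.
    return os.environ.get("GITHUB_ACTIONS") == "true"


@contextmanager
def step(title: str):
    """Collapsible log group on GitHub, a plain marker line anywhere else."""
    print(f"::group::{title}" if in_github_actions() else f"--- {title}", flush=True)
    try:
        yield
    finally:
        if in_github_actions():
            print("::endgroup::", flush=True)


def error(message: str) -> str:
    """Emit an error annotation on GitHub, a plain error line elsewhere."""
    line = f"::error::{message}" if in_github_actions() else f"ERROR: {message}"
    print(line, flush=True)
    return line
```

Usage is just `with step("unit tests"): ...` around each phase of the script. It's still GitHub-specific under the hood, but the lock-in is confined to one small helper.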
The golden rule is "will I need to make a dummy commit to test this?" and if yes, find a different way to do it. All good rules in sibling comments here derive from this rule.
You never want to need dummy commits to debug something in CI; it's awful. As a bonus, following this rule also means better access to debugging tools and local logs, fewer "works on CI but not here" issues, etc. Finally, if you ever want to move away from GitHub to somewhere else, it'll be easy.
For CI action: pre-build docker image with dependencies, then run your tests using this image as single GitHub action command.
If dependencies change, rebuild image.
Do not rely on gh caching, installs, multiple steps, etc.
Otherwise there will be a moment when tests pass locally, but not on gh, and debugging will be super hard. In this case you just debug in the same image.
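As a sketch of the "single command" idea: build the `docker run` invocation in one place so CI and a developer's shell use the identical command. The image tag and the `./ci/test.sh` path are hypothetical; the flags are standard `docker run` options.

```python
# Hypothetical tag for the prebuilt image that already contains all dependencies.
IMAGE = "example.invalid/ci-image:latest"


def docker_test_command(image, workdir, test_cmd):
    """Build the argv that runs the tests inside the prebuilt image."""
    return [
        "docker", "run", "--rm",    # throw the container away afterwards
        "-v", f"{workdir}:/work",   # mount the checkout into the container
        "-w", "/work",              # and run from inside it
        image,
        *test_cmd,
    ]
```

Pass the resulting list to `subprocess.run` from the CI step. When gh and local disagree, you start a shell in the same image and debug there.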
1. Distinct prod and non-prod environments. I think you should have distinct Lab and Production environments. It should be practical to commit something to your codebase, and then test it in Lab. Then, you deploy that to Production. The Github actions model confuses the concepts of (source control) and (deployment environment). So you easily end up with no lab environment, and people doing development work against production.
2. Distinguish programming language expression and DSLs. Github yaml reminds me of an older time where people built programming languages in XML. It is an interesting idea, but it does not work out. The value of a programming language: the more features it has, the better. The value of a DSL: the fewer features it has, the better.
3. Security. There is a growing set of github-action libraries. The Github ecosystem makes it easy to install runners on workstations to accept dispatch from github actions. This combination opens opportunities for remote attacks.
Writing any meaningful amount of logic or configuration in yaml will inevitably lead to the future super-sentient yaml-based AI torturing you for all eternity for having taken any part in cursing it to a yaml-based existence. The thought-experiment of "Roko's typed configuration language" is hopefully enough for you to realize how this blog post needs to be deleted from the internet for our own safety.
> Declarative languages are great for specifying these sorts of things
Yes, good declarative languages are. I'm a happy nix user. I like dhall. cel is a cool experiment. jsonnet has its place. I stan XML.
A language with byzantine type rules, like 'on: yes' parses the same as "true: true", but only in some languages (like ruby's built-in yaml parser for example), but not others, and only with some settings, is not it chief.
It isn't even one language, since most yaml parsers only have like 90% coverage of the spec, and it's a different percent, so the same yaml document often won't be parsed the same even by two libraries in the same programming language.
It's really like 20 subtly incompatible languages that are all called "yaml".
It is indefensible in any context.
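The `on: yes` thing comes down to which spec version a parser follows when it resolves plain scalars. Here is a toy sketch of just the boolean rule, using the spelling lists from the YAML 1.1 bool type and the YAML 1.2 core schema. It is not a YAML parser, and real libraries deviate from both lists, which is rather the point.

```python
import re

# Plain scalars treated as booleans under each version of the spec.
YAML_1_1_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
YAML_1_2_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")
TRUTHY = {"y", "yes", "true", "on"}


def resolve(scalar: str, version: str):
    """Return a bool if this spec version reads the scalar as one,
    otherwise return the string unchanged."""
    pattern = YAML_1_1_BOOL if version == "1.1" else YAML_1_2_BOOL
    if pattern.fullmatch(scalar):
        return scalar.lower() in TRUTHY
    return scalar
```

So `resolve("on", "1.1")` is `True` while `resolve("on", "1.2")` is still the string `"on"`: the same workflow key means different things depending on the parser.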
Github actions should have been in starlark, xml, or even lisp or lua.
This generation will shudder when they are asked to bring discipline to deployments built from github actions.