I think that modern CI systems are actually too simple. They all boil down to "get me a Linux box and run a shell script". You can do anything with that, and there are a million different ways to do everything you could possibly want. But it's easy to implement, and every feature request can be answered with "oh, well, just apt-get install foobarbaz3 and run quuxblob to do that."
A "too complex" system, would deeply integrate with every part of your application, the build system, the test runner, the dependencies would all be aware of CI and integrate. That system is what people actually want ("run these browser tests against my Go backend and Postgres database, and if they pass, send the exact binaries that passed the tests to production"), but have to cobble together with shell-scripts, third-party addons, blood, sweat, and tears.
I think we're still in the dark ages, which is where the pain comes from.
Pretty much.
Docker in my experience is the same way: people see Docker as the new hotness, then treat it like a Linux box with a shell script (though at least with the benefit that you can shoot it in the head).
One of the other teams had an issue with reproducibility on something they were doing, so I suggested that they use a multi-stage build in Docker and export the result as an artefact they could deploy. They looked at me like I'd grown a second head, yet they have been using Docker twice as long as me, though I've been using Linux for longer than all of them combined.
It's a strange way to solve problems all around when you think about what it's actually doing.
Also feels like people adopt tools and cobble shit together from Google/SO; what happened to RTFM?
If I'm going to use a technology I haven't used before, the first thing I do is go read the actual documentation. I won't understand it all on the first pass, but it gives me an "index"/outline I can use when I do run into problems. If I'm looking at adopting a technology, I google "the problem with foobar", not success stories; I want to know the warts, not the gloss.
It's the same with books; I'd say 3/4 of the devs I work with don't buy programming books, like, at all.
It's all cobbled-together knowledge from blog posts. That's fine, but a cohesive book with a good editor is nearly always going to give you a better understanding than piecemeal bits from around the net. That's not to say specific blog posts aren't useful, but the return on investment on a book is higher (for me; the youngsters might do better learning from TikTok, I don't know...).
I personally love it. RTFM is pretty much the basis of my career. I always at minimum skim the documentation (the entire doc) so I have an index of where to look. It's a great jumping-off point if you do need to google anything.
Books are the same. When learning a new language for example, I get a book that covers the language itself (best practices, common design patterns, etc), not how to write a for loop. It seems to be an incredibly effective way to learn. Most importantly, it cuts down on reading the same information regurgitated over and over across multiple blogs.
lol... yeah.... I've become "the expert" on so much shit just because of RTFM :D
It's amazing how much stuff is spelt out in manuals that nobody bothers to read.
The only issue is that so few people RTFM that some manuals are pure garbage to try and glean anything useful from. In those cases, the best route is often to just read the implementation (though that is tedious).
> Also feels like people adopt tools and cobble shit together from Google/SO; what happened to RTFM?
Sometimes it's easier to google because TFM is written by people who are intricately familiar with the tool and forget what it's like to be unfamiliar with it.
Look at git for example; the docs are atrocious. Here's the first line under DESCRIPTION for "man git-push":
>Updates remote refs using local refs, while sending objects necessary to complete the given refs.
Not a single bit of explanation as to what the fuck a "ref" is, much less the difference between remote and local refs. If you didn't have someone to explain it to you, this man page would be 100% useless.
I think that reading the man page for git-push is not reasonable if you don't understand git first.
That being the case, the first thing you need to read is the main git man page. At the bottom of it (sadly) you find references to gitrevisions and gitglossary man pages. Those should provide enough information and examples to understand what a ref is, yet probably even these could be better.
I'm in full agreement that this is terribly undiscoverable, but if you really want to RTFM, you mustn't stop at just the first page.
> treat it like a Linux box with a shell script (though at least with the benefit you can shoot it in the head)
To be fair, that by itself is a game-changer, even if it doesn't take full advantage of Docker.
Any CI books you can recommend? I have been completely treating it as a Linux box with a shell script and have cobbled together all of my shit from Google/SO.
This is something that I love about using Bazel. It allows you to do this. Bazel is aware of application-level concepts: libraries, binaries, and everything that glues them together. It has a simple way to describe a "test" (something that's run whose exit code determines pass/fail) and how to link/build infinitely complex programs. Do you build a game engine and need to take huge multi-GB asset folders and compile them into an efficient format for your game to ship with? You can use a genrule to represent this, and now you, your CI, and everyone on your team will always have up-to-date copies without needing to worry about "Bob, did you run the repack script again?"
It also provides a very simple contract to your CI runners. Everything has a "target" which is a name that identifies it.
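A rough sketch of what that genrule idea looks like in a BUILD file (the target names, the asset-packer tool, and the test sources here are hypothetical):

    # Pack huge raw asset folders into the engine's shipping format.
    # Bazel re-runs this only when the inputs or the packer change.
    genrule(
        name = "packed_assets",
        srcs = glob(["assets/**"]),
        outs = ["assets.pak"],
        cmd = "$(location //tools:asset_packer) --out $@ $(SRCS)",
        tools = ["//tools:asset_packer"],
    )

    # A test is just a runnable target whose exit code decides pass/fail;
    # it can depend on the packed assets like any other input.
    cc_test(
        name = "engine_test",
        srcs = ["engine_test.cc"],
        data = [":packed_assets"],
        deps = [":engine"],
    )

CI then only needs the target names, e.g. `bazel test //...`.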
A great talk about some things that are possible: https://youtu.be/muvU1DYrY0w?t=459
At a previous company I got our entire build/test CI (without code coverage) from ~15 minutes down to ~30-60 seconds, for ~40 *_binary and ~50 *_test targets (~100 to ~500 unit tests).
I agree with you in principle, but I have learned to accept that this only works for 80% of the functionality. Maybe this works for a simple Diablo or NodeJS project, but in any large production system there is a gray area of "messy shit" you need, and a CI system being able to cater to these problems is a good thing.
Dockerizing things is a step in the right direction, at least from the perspective of reproducibility, but what if you are targeting many different OSes / architectures? At QuasarDB we target Windows, Linux, FreeBSD, OSX, and all that on ARM architecture as well. Then we need to be able to set up and tear down whole clusters of instances, reproduce certain scenarios, and whatnot.
You can make this stuff easier by writing a lot of supporting code to manage this, including shell scripts, but to make it an integrated part of CI? I think not.
I'm curious, what's a Diablo project? I've never heard of such a technology, unless you're speaking of the game with the same name.
Did you possibly mean Django?
Argh it was indeed Django, I was on mobile and it must have been autocorrected.
While it comes up, I think it's more of a rare problem. So much stuff is "x86 linux" or in rare cases "ARM linux" that it doesn't often make sense to have a cross platform CI system.
Obviously a db is a counter example. So is node or a compiler.
But at least from my experience, a huge number of apps are simply REST/CRUD targeting a homogeneous architecture.
Unless we're talking proprietary software deployed to only one environment, or something really trivial, it's still totally worth testing other environments / architectures.
You'll find dependency compilation issues, path case issues, reserved name usage, assumptions about filesystem layout, etc. which break the code outside of Linux x86.
You're vehemently agreeing with the author, as far as I can see. The example you described is exactly what you could do/automate with the "10 years skip ahead" part at the end. You can already do it today locally with Bazel if you're lucky to have all your dependencies usable there.
It's weird that people keep building DSLs or YAML-based languages for build systems. It's not a new thing, either - I remember using whoops-we-made-it-Turing-complete Ant XML many years ago.
Build systems inevitably evolve into something Turing complete. It makes much more sense to implement build functionality as a library or set of libraries and piggyback off a well-designed scripting language.
> Build systems inevitably evolve into something Turing complete.
CI systems are also generally distributed. You want to build and test on all target environments before landing a change or cutting a release!
What Turing complete language cleanly models some bits of code running on one environment and then transitions to other code running on an entirely different environment?
Folks tend to go declarative to force environment-portable configuration. Arguably that's impossible and/or inflexible, but the pain that drives them there is real.
If there is a framework or library in a popular scripting language that does this well, I haven't seen it yet. A lot of the hate for Jenkinsfile (allegedly a Groovy-based framework!) is fallout from not abstracting the heterogeneous environment problem.
>What Turing complete language cleanly models some bits of code running on one environment and then transitions to other code running on an entirely different environment?
Any language that runs in both environments with an environment abstraction that spans both?
>Folks tend to go declarative to force environment-portable configuration.
Declarative is always better if you can get away with it. However, it inevitably hamstrings what you can do. In most declarative build systems, some dirty Turing-complete hack will inevitably need to be shoehorned in to get the system to do what it's supposed to. A lot of build systems have tried to pretend that this won't happen, but it always does eventually once a project grows complex enough.
> Any language that runs in both environments with an environment abstraction that spans both?
Do you have examples? This is harder to do than it would seem.
You would need an on demand environment setup (a virtualenv and a lockfile?) or a homogeneous environment and some sort of RPC mechanism (transmit a jar and execute). I expect either to be possible, though I expect the required verbosity and rigor to impede significant adoption.
Basically, I think folks are unrealistic about the ability to be pithy, readable, and robust at the same time.
I call this the fallacy of apparent simplicity. People think what they need to do is simple. They start cobbling together what they think will be a simple solution to a simple problem. They keep realizing they need more functionality, so they keep adding to their solution, until just "configuring" something requires an AI.
Scripting languages aren't used directly because people want a declarative format with runtime expansion and pattern matching. We still don't have a great language for that. We just end up embedding snippets in some data format.
Who are the "people" who really want that, are responsible for a CI build, and are not able to use a full programming language ?
I used Jenkins Pipeline for a while, with Groovy scripts. I wish it had been a type-checked language, to avoid failing a build after 5 minutes because of a typo, but it was working.
Then, somehow, the powers that be decided we had to rewrite everything as a declarative pipeline. I still fail to see the improvement; but doing "build X, build Y, then if Z build W" is now hard to do.
People used to hate on Gradle a lot, but it was way better than dealing with YAML IMO. Add in the ability to write build scripts in Kotlin and it was looking pretty good before I started doing less Java.
I think a CI system using JSON configured via TypeScript would be neat to see. Basically the same thing as Gradle via Kotlin, but for a modern container (ie: Docker) based CI system.
I can still go back to Gradle builds I wrote 7-8 years ago, check them out, run them, understand them, etc.. That's a good build system IMO. The only thing it could have done better was pull down an appropriate JDK, but I think that was more down to licensing / legal issues than technical and I bet they could do it today since the Intellij IDEs do that now.
I was waiting for Jai to see how the build scripts are basically written in... Jai itself.
It seems that zig [1] already does it. Hoping to try that someday...
You can activate typechecking in groovy with @CompileStatic. It's an all or nothing thing though (for the entire file).
Joe Beda (k8s/Heptio) made this same point in one of his TGI Kubernetes videos: https://youtu.be/M_rxPPLG8pU?t=2936
I agree 100%. Every time I see "nindent" in yaml code, a part of my soul turns to dust.
> Every time I see "nindent" in yaml code, a part of my soul turns to dust.
Yup. For this reason it's a real shame to me that Helm won and became the lingua franca of composable/configurable k8s manifests.
The one benefit of writing in static YAML instead of a dynamic <insert DSL / language> is that, regardless of primary programming language, everyone can contribute; more complex systems like ksonnet start exploding in first-use complexity.
I wouldn't say Helm has won, honestly. The kubectl tool integrated Kustomize, and it's sadly way too underutilized. I think it's just that the first wave of k8s tutorials that everyone learned from were all written when Helm was popular. But now, with some years of real use, people are less keen on Helm. There are tons of other good options for config management and templating--I expect to see this space keep changing and improving.
Some of us use Make + envsubst[0] (and more recently Make + Kustomize[1]) in defiance.
I haven't found time to take a look at Helm 3 yet though, it might be worth switching to.
[0]: https://www.vadosware.io/post/using-makefiles-and-envsubst-a...
[1]: https://www.vadosware.io/post/setting-up-mailtrain-on-k8s/#s...
Can just default to something like
"""
apiVersion: v1
appName: "blah"
""".FromYaml().Execute()
or something. I wish more people who for some reason are otherwise forced to use a textual templating system to output YAML would remember that every JSON object is a valid YAML value, so instead of fiddling with indentation you can just ".toJson" or "| json" or whatever your syntax is, and you'll get something less brittle.
(Or use a structural templating system like jsonnet or ytt)
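For example, in a Helm template the two styles look something like this (a sketch; .Values.config is assumed to be a plain map):

    # Brittle: render a YAML block and re-indent it to the right depth.
    data:
      {{- .Values.config | toYaml | nindent 2 }}

    # Less brittle: emit the same map as JSON, which is valid YAML.
    data: {{ .Values.config | toJson }}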
I'm so oppressed by YAML being chosen as the configuration language for mainstream CI systems. How do people manage to live with this? I always make mistakes - again and again. And I can never keep anything in my head. It's just not natural.
Why couldn't they choose a programming language? Restrict what can be done by all means, but something that has a proper parser, compiler errors, and IDE/editor hinting support would be great.
One can even choose an embedded language like Lua for restricted execution environments. Anything but YAML!
YAML is not a language, it's a data format. Why does nobody in the entire tech industry know the difference? I didn't even go to school and I figured it out.
Most software today that uses YAML for a configuration file is taking a data format (YAML), applying a very shitty parser to create a data structure, and then feeding that data structure to a function, which then determines what other functions to call. There's no grammar, no semantics, no lexer, no operators, and no types, save those inherent to the data format it was encoded in (YAML). Sometimes they'll look like they include expressions, but really they're just function arguments.
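A minimal sketch of that pattern in Python (PyYAML; the file name, the "steps" key, and run_step are hypothetical stand-ins for whatever a given tool actually does):

    import yaml  # PyYAML

    def run_step(step):
        # All the "semantics" live in host-language code like this,
        # dispatching on plain dicts and strings.
        print("would run:", step["run"])

    with open("ci.yml") as f:
        config = yaml.safe_load(f)  # data format -> data structure

    for step in config.get("steps", []):  # walk the structure...
        run_step(step)                    # ...and pick functions to call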
The gp isn't talking about YAML in itself, they're talking about YAML as it's used by mainstream CI systems. Github Actions, for example, encodes conditionals, functions, and a whole lot of other language elements in its YAML format. To say it's "just a data format" is like saying XSLT is "just a markup language" because it's written in XML.
All CI yaml configs look basically the same, so I believe this is missing the intended point.
But, TIL. Thanks
> YAML is not a language, it's a data format.
Yet Another Markup <<Language>> (which later supposedly became "YAML Ain't Markup Language", because every villain needs a better backstory).
VS Code with the Prettier extension gives you hinting, a parser, and it immediately shows when you have (not compiler, but) errors. If there is an extension for your CI system, try installing it too.
This is what Chef did, and I enjoyed it, but it seems that's not the way most systems went.
I guess YAML has the best solution to nested scopes, just indentation
"Bazel has remote execution and remote caching as built-in features... If I define a build... and then define a server-side Git push hook so the remote server triggers Bazel to build, run tests, and post the results somewhere, is that a CI system? I think it is! A crude one. But I think that qualifies as a CI system."
---
Absolutely.
The advisability of rolling your own CI aside, treating CI as "just another user" has real benefits, and this was a pleasant surprise for me when using Bazel. When you run the same build command (say, `bazel test //...`) across development and CI, then:
- you get to debug your build pipeline locally like code
- the CI DSL/YAML files mostly contain publishing and other CI-specific information (this feels right)
- the ability of a new user to pull the repo, build, and have everything just work, is constantly being validated by the CI. With a bespoke CI environment defined in a Docker image or YAML file this is harder.
- tangentially: the remote execution API [2] is beautiful in its simplicity; it's doing a simple core job.
[1] OTOH: unless you have a vendor-everything monorepo like Google, integrating with external libraries/package managers is unnatural; hermetic toolchains are tricky; naively-written rules end up calling system-provided utilities that differ by host, breaking reproducibility, etc. etc.
[2] https://github.com/bazelbuild/remote-apis/blob/master/build/...
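A minimal sketch of the "push hook triggers Bazel" idea from the quote at the top of this comment (paths and the results-posting step are hypothetical):

    #!/bin/sh
    # Server-side post-receive hook in the bare repo: check out each
    # pushed commit into a scratch worktree, run the tests, keep the log.
    while read oldrev newrev refname; do
        worktree=$(mktemp -d)
        git --work-tree="$worktree" checkout -f "$newrev"
        (cd "$worktree" && bazel test //...) \
            > "/var/log/ci/$newrev.log" 2>&1
        # TODO: post the log/status somewhere people can see it.
        rm -rf "$worktree"
    done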
How does Bazel deal with different platforms? For example, running tests on Windows, BSD, Android, Raspberry Pi, RISC-V, or even custom hardware?
Pretty well! You can set up a build cluster that provides workers for any of these different platforms. Each of these platforms is identified by a different set of label values. Then you can run Bazel on your personal system to 'access' any of those platforms to run your build actions or tests.
In other words: A 'bazel test' on a Linux box can trigger the execution of tests on a BSD box.
(Full transparency: I am the author of Buildbarn, one of the major build cluster implementations for Bazel.)
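For reference, pointing Bazel at a cluster like that is typically a couple of lines of .bazelrc (the flag names are Bazel's; the endpoint is hypothetical):

    # .bazelrc
    # Run build/test actions remotely and share the action cache.
    build --remote_executor=grpc://buildcluster.example.com:8980
    build --remote_cache=grpc://buildcluster.example.com:8980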
Bazel differentiates between the "host" environment (your dev box), the "execution" environment (where the compiler runs), and the "target" environment (e.g. RISC-V).
Edit: there's a confusing number of ways of specifying these things in your build, e.g. old crosstool files, platforms/constraints, toolchains. A stylized 20k foot view is:
Each build target specifies two different kinds of inputs: sources (code, libraries) and "tools" (compilers). A reproducible build requires fully-specifying not just the sources but all the tools you use to build them.
Obviously cross-compiling for RISC-V requires different compiler flags than x86_64. So instead of depending on "gcc" you'd depend on an abstract "toolchain" target which defines ways to invoke different version(s) of gcc based on your host, execution, and target platforms.
In practice, you wouldn't write toolchains yourself, you'd depend on existing implementations provided by library code, e.g. many many third party language rules here: https://github.com/jin/awesome-bazel#rules
And you _probably_ wouldn't depend on a specific toolchain in every single rule, you'd define a global one for your project.
"platforms" and "constraints" together let you define more fine-grained ways different environments differ (os, cpu, etc) to avoid enumerating the combinatoric explosion of build flavors across different dimensions.
HTH, caveat, I have not done cross-compilation in anger. Someone hopefully will correct me if my understanding is flawed.
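A rough illustration of the platforms/constraints piece (the constraint labels come from the standard @platforms repo; the target name is made up):

    # BUILD: describe an environment as a bundle of constraint values.
    platform(
        name = "linux_arm64",
        constraint_values = [
            "@platforms//os:linux",
            "@platforms//cpu:aarch64",
        ],
    )

You'd then build with something like `bazel build --platforms=//:linux_arm64`, and toolchain resolution picks a toolchain whose declared constraints match that platform.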
The reason this isn't a concern is because Bazel tries very hard to not let any system libraries or configurations interfere with the build, at all, ever. So it should rarely matter what platform you're running a build on, the goal should be the same output every time from every platform.
Linux is recommended, or a system that can run Docker and thus Linux. From there it depends on the test or build step. I haven't done much distributed Bazel building or test runs yet myself. I imagine you can speak to other OSes using qemu or network if speed isn't a concern. You can often build for other operating systems without natively using other operating systems using a cross-compiling toolchain.
That said Bazel is portable - it generally needs Java and Bash and is generally portable to platforms that have both, though I haven't checked recently. There are exceptions though, and it will run natively in Windows, just not as easily. https://docs.bazel.build/versions/master/windows.html It also works on Mac, but it's missing Linux disk sandboxing features and makes up for it using weird paths and so on.
> That said Bazel is portable - it generally needs Java and Bash and is generally portable to platforms that have both, though I haven't checked recently. There are exceptions though, and it will run natively in Windows, just not as easily. https://docs.bazel.build/versions/master/windows.html It also works on Mac, but it's missing Linux disk sandboxing features and makes up for it using weird paths and so on.
The good old: in theory it's portable, but in practice the target of that port better look 100% like Linux :-)
https://docs.bazel.build/versions/master/platforms.html is probably what you want.
So you can, conceivably, with Bazel running on your local x86 machine, run the build on an ARM (RPi) build farm, cross-compiling for RISC-V.
I presume that this specific toolchain isn't well supported today.
I had to integrate Azure Pipelines and wanted to shoot myself in the face. The idea that you are simply configuring a pipeline in YAML is just one big lie; it's code, in the world's shittiest programming language using YAML syntax - code that you have no local runtime for, so you have to submit it to the cloud like a set of punch cards, only to see 10 minutes later that it didn't work, and try again. Pipelines are code, pure and simple. The sooner we stop pretending they aren't, the better off we'll be.
Yeah, agreed. Feels like programming a PHP/ASP website directly on the server through FTP, like we did in the 90's
We got tired of using external tools that were not well-aligned with our build/deployment use cases - non-public network environments. GitHub Actions et al. cannot touch the target environments that we deploy our software to. Our customers are also extremely wary of anything cloud-based, so we had to find an approach that would work for everyone.
As a result, we have incorporated build & deployment logic into our software as a first-class feature. Our applications know how to go out to source control, grab a specified commit hash, rebuild themselves in a temporary path, and then copy these artifacts back to the working directory. After all of this is completed, our application restarts itself. Effectively, once our application is installed to some customer environment, it is like a self-replicating organism that never needs to be reinstalled from external binary artifacts. This has very important security consequences - we build on the same machine the code will execute on, so there are far fewer middle men who can inject malicious code. Our clients can record all network traffic flowing to the server our software runs on and definitively know 100% of the information which constitutes the latest build of their application.
Our entire solution operates as a single binary executable, so we can get away with some really crazy bullshit that most developers cannot these days. Putting your entire app into a single self-contained binary distribution that runs as a single process on a single machine has extremely understated upsides these days.
Sounds like Chrome, minus the build-on-the-customer's-machine part. Or like Homebrew, sort of. Also sounds like a malware dropper. That said, it makes sense. I would decouple the build-on-the-customer's machine part from the rest, having a CI system that has to run the same way on every customer's machine sounds like a bit of a nightmare for reproducibility if a specific machine has issues. I'd imagine you'd need to ship your own dependencies and set standards on what version of Linux, CPU arch and so on you'd support. And even then I'd feel safer running inside overlays like Docker allows for, or how Bazel sandboxes on Linux.
Also reminds me a bit of Istio or Open Policy Agent in that both are really apps that distribute certificates or policy data and thus auto-update themselves?
We use .NET Core + Self-Contained Deployments on Windows Server 2016+ only. This vastly narrows the scope of weird bullshit we have to worry about between environments.
The CI system running the same way on everyone's computer is analogous to MSBuild working the same way on everyone's computer. This is typically the case due to our platform constraints.
> GitHub Actions et al. cannot touch the target environments that we deploy our software to.
It can, with on-prem self-hosted runners: https://docs.github.com/en/actions/hosting-your-own-runners/...
I just had this same complaint about using Actions and was pointed to this document.
Yeah and that's a total shitshow too. You want to run that in k8s?
* First off it won't work with a musl image
* You need to request a new token every 2 hours
* It doesn't spawn pods per build like GitLab; it's literally a dumb runner. The jobs execute *IN* the runner container, so no isolation, and you need all the tools under the sun installed in the container (our runner image clocked in at 2 GB for a Java/Node stack)
* Get prepared for a lot of .NET errors on your Linux boxes (yes, not exactly a showstopper, but.. urgh).
I hated my time with GitHub Actions and will not miss it.
Yeah. I really hope they fix the dumb runner part: https://github.com/actions/runner/pull/660 I resorted to this one instead: https://github.com/summerwind/actions-runner-controller but it requires Kubernetes...
Forgive me, but it sounds dangerous and insecure to give your software the kind of access that would be required to do what you described. Even with safety measures and auditing in place, I'm not sure if I would feel comfortable doing this.
How is this any less secure than handing the customer a zip file containing arbitrary binary files and asking them to execute them with admin privileges?
I worked for a place which simply ran antivirus/malware scans on the vendor-supplied binaries. Way easier to review an antivirus scan giving an approval or rejection, compared to allowing code to download source code from a vendor server (which you hope is not compromised), which does not pass human review before being compiled and run. The latter is far more likely to result in infection, unless the source code in the former is verified somehow (signed commits from a whitelist of signatures, at the very least).
I wouldn't do that either, but it's even less secure than that because the software would have credentials to the source control system. It also means your source control system has to be public.
Not insecure for the customer, insecure for you, the software vendor.
"Build Systems Ă la Carte" is not so much ringing a bell as shattering it with the force of its dong.
https://www.microsoft.com/en-us/research/uploads/prod/2018/0...
To expand, the OP ends with an "ideal world" that sounds to me an awful lot like someone's put the full expressive power of Build Systems à la Carte into a programmable platform, accessible by API.
Nitpick: I think you meant to write "gong".
Call me crazy, but I don't think they did...
It genuinely hadn't crossed my mind that a CI system and a build system were different things - maybe because I usually work in dynamic rather than compiled languages?
I've used Jenkins, Circle CI, GitLab and GitHub Actions and I've always considered them to be a "remote code execution in response to triggers relating to my coding workflow" systems, which I think covers both build and CI.
Did your dynamic languages not have some sort of build tool? Heck, JavaScript has Gulp, Grunt, Webpack, and a million others. Ruby has Rake, Python has pipenv/poetry, I guess. Python is actually kind of an outlier; I guess you're expected to write Python scripts to manage your packaging and such.
Same boat. I am surprised that I had to scroll so far down to see this comment.