This is a huge achievement for Debian and the free software world.
It took a while, though, until this was understood. In 2007, when I pointed out on debian-devel that this was needed, I was told what a huge waste of time it would be. And indeed it took a huge amount of work by many people to get there, but it was well worth it.
There has been no bug or attack on Debian since 2007 that reproducible packages would have prevented.
"Well worth it" is not correct. It just raises the contribution barrier to Debian even higher. I have already heard a lot of people complaining that contributing to Debian is hard, and while in the past I defended it with "they need all those checks and balances to make sure packages play nicely with each other", this is a step that makes it hard for no reason and little benefit.
> as Ubuntu package will have same files structured exactly same way as Debian one.
As opposed to what? If Ubuntu uses the same source, of course they get the same binaries. And if Ubuntu applies patches, they'll get something different. And that's still true.
(Not OP, but...) I still fail to see the current value in confirming that a reproducing builder also included the same compromised dependency that I did when I built it. I understand that reproducible builds are guarding against dynamic attacks within build infrastructure. However I just don't see those happening. Compromised source dependencies are a 100x more common problem.
No, it wouldn't. The xzutils attacker compromised the source repository. The build pipeline portions were used to obscure the purpose of the exploit embedded in the source code repository.
You're wrong. It was both. The payload was embedded in the binary blob test file. The mechanism to pull it into the build was added to the release tarball only.
Here's the quote from the guy that discovered it in the initial public disclosure [1]:
After observing a few odd symptoms around liblzma (part of the xz package) on Debian sid installations over the last weeks (logins with ssh taking a lot of CPU, valgrind errors) I figured out the answer. The upstream xz repository and the xz tarballs have been backdoored. At first I thought this was a compromise of debian's package, but it turns out to be upstream. One portion of the backdoor is *solely in the distributed tarballs* and debian's import of the tarball ... it is also present in the tarballs for 5.6.0 and 5.6.1.
You're confusing a compromised build pipeline with a compromised source repo that only triggers in some build pipelines. You can do reproducible builds from compromised source tarballs. Nothing about reproducible builds necessarily requires source control. Yes, if some people who built from source control compared their builds to the builds from the tarballs, it could detect the xzutils compromise. However, I have yet to see a reproducible build project that includes such cross-build checks.
Nowadays you would work in git, and then you would be able to easily detect any discrepancy between the upstream tarball and the upstream source imported via git. But yes, better support for securing more of the process is needed.
In the xz-utils hack, the attacker slipped changes into the GitHub release tarball that were not present in the git commit history. The Debian maintainer built from the release tarball instead of just pulling from the git repo directly. He shouldn't have been doing that, but good luck convincing him not to use the workflow he's been using for the last X years (I tried). With repro builds we can clone the git repo directly and confirm we get the same build.
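As a toy illustration of that cross-check (the file name echoes the xz incident; everything else here is made up), diffing the release tree against the git tree makes a smuggled file stand out immediately:

```shell
# Simulate the check locally: a "git" tree and a "tarball" tree with one
# extra file smuggled in (a stand-in for xz's modified build-to-host.m4).
mkdir -p git-tree tarball-tree
echo 'int main(void){return 0;}' > git-tree/main.c
cp git-tree/main.c tarball-tree/
echo 'injected m4 payload' > tarball-tree/build-to-host.m4
# diff -r exits non-zero and names the file only present in the tarball
diff -r git-tree tarball-tree || echo "MISMATCH detected"
```

In practice you would extract the signed upstream tarball and `git archive` of the matching tag into the two trees, and normalize autotools-generated files before diffing.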
I agree that compromised source dependencies are the bigger problem, but that doesn't mean a compromised build infrastructure isn't one. Just this last week, we had two Linux kernel LPEs that could have been leveraged to implement just such an attack, for example.
Another thing to consider is that Debian has quite a few derivatives who may also rebuild packages from source, so you have a multiplier there.
I know why they are useful. I am arguing they are a waste of time for the effort involved.
Forcing devs to use hardware keys to sign commits/CI requests would be an actual security improvement, thwarting many supply chain attacks that only worked because the attacker got hold of developer credentials. Hardware keys at least have the option to make some operations require physically pressing the key, so there is a chance the developer will notice.
But thanks to reproducible builds, at least someone can... validate that the binary code of a vulnerable package can be reproduced. Very fucking useful.
I am not saying it is useless. I am saying it is one of the highest-hanging fruits on the security tree.
A hardware key does not help if the developer's machine is compromised, as there is no chance of knowing what is being signed anymore. Or do you think the hardware key will show all the source code on its little display before signing?
With reproducible builds, you do not need to trust that the system that built the binary was not compromised, because this would be detected immediately.
Source compromises are still an issue, but there is a much bigger chance that they are detected. Also, if there is a compromise, reproducible builds allow you to later track it to the source. For an infected binary it is much more difficult to understand how it got there and what else might be compromised.
The way YubiKeys work, at least, is that you can set them up so that pressing the key is required for signing, so at least the "silently steal creds and push malicious code" case (which is the vast majority of compromises) gets handled.
> Also if there is a compromise, reproducible builds allow you to later track it to the source.
They do not. Git log and build logs allow for that.
Reproducible builds only have value after the source. They protect against build servers being compromised (and then only if some other uncompromised environment is also running verification passes); if the bug is in the source, reproducible builds are exactly as valuable as writing the commit that was used for the build into the app's code/package metadata.
If your compiler (or other tool or automatic build environment) is compromised and inserts a backdoor in the binary during building, the fact that you need to hold a key while signing or not is completely irrelevant.
git log and build logs do not help you at all if you cannot even determine that the compromised binary has any relation to the build log or the source you may want to look at. This is what reproducible builds give you. You are right that they do not protect against compromised sources.
> I know why they are useful. I am arguing they are waste of time for effort involved.
Not being reproducible is a bug.
There is no reason for a build not to be reproducible, but somehow we let the built binaries become infested with timestamps, login names, and file system paths. We recently moved to reproducible builds at work and discovered that our login names and local home directory paths were being shipped in every release. No one was very happy about leaking PII like that.
You may not consider it worth the effort, but you aren't the one putting in the effort, so I'm not sure why that matters to you. It is very much worth the effort to those people doing the work. Debian is a do-ocracy, and so the people doing the work get to make the decisions.
Sure. The site appears to be a bunch of warm-fuzzies that could apply to almost any other measure you take; it's nothing specific to reproducible builds. As the original poster said, "There was no bug or attack on Debian since 2007 that reproducible packages would prevent". In fact, it could be argued that reproducible builds lead to a reduction in security, not an improvement: they give an attacker an exact fixed memory layout for all of the binaries, so if you develop something like a ROP exploit for a copy on your local system, you know that exploit will work on every other system as well, because the binary layout is identical. It's a perfect monoculture where everything is vulnerable to the same exploit. It seems to have been something created by geeks to impress other geeks, without any consideration for whether it has any actual benefit.
This comment is misinformed. Non-deterministic builds would also result in one tarball redistributed to all distro users. The ROP exploits don't work because of ASLR.
To move away from organizational dependence, there should be an installable project for debian where I can dedicate some configurable small percentage of my compute when idle to reproducibly building debian components to make a robust verification system, starting with the most critical code.
Obviously, it would be a ton of work to make such a system resistant to gaming by malicious actors (see GNU Guix for useful efforts), but it would provide valuable diversity in architecture and (political or other) control.
It would be even cooler if we could have independent projects that could run on various distros and OS, and build packages for any of them. Having packages for bsd verified on linux and vice-versa with statistical logging (this code has been verified x times on y OSes) would be reassuring.
I don't know of anything Ubuntu is doing that is significantly beyond what Debian is doing in this regard, nor that they have a distributed reproduction system set up.
Reproducible builds are not only applicable as a response to ‘attacks’, a subject you seem to be bikeshedding; they exist for other reasons too.
Anyone having to maintain a code base or a distributed fleet of devices will gain from this decision, immensely, as their operational periods come and go.
Reproducible builds are about longevity as much as they are about security.
Please don’t make bold claims about ‘no reason and little benefit’ while demonstrating ignorance of this hard fact: reproducible builds should have been the norm, in computing, from the get-go.
I don't think they do, actually. Longevity sounds good, but in reality anything that's old probably has critical security holes and so you shouldn't use it anyway.
I've long ago realized that archival needs to be a separate task left to archivists and archive systems. If you take it into account when designing a live system it's liable to seriously compromise your system design.
Say you're making a chat app - you wouldn't incorporate a delete feature, and you might be tempted to use some kind of blockchain to prove all messages were delivered without gaps. But if you ignore archival needs you design something similar to IRC which is much simpler.
> Anyone having to maintain a code base or a distributed fleet of devices will gain from this decision, immensely, as their operational periods come and go.
Just baking in the build ID and commit is enough. What do you think reproducible builds add?
> Please don’t make bold claims about ‘no reason and little benefit’ while demonstrating ignorance of this hard fact: reproducible builds should have been the norm, in computing, from the get-go.
So far not a single person in the thread has given me a concrete example (as in: existing project, existing problem, no other solution that can solve it). They just claim it's better based on their feelings. Come on, be the first one.
I already gave you an example; you dismissed it because you know better. But it is clear that you haven't thought this through from the perspective of systems designers who have to deploy a base OS, with expected lifetimes of years, across a large fleet of devices.
Think industrial applications, such as rail and heavy industry transportation. We use reproducible builds here as part of a wider safety-critical protocol which guarantees that what we are running is what we expect to run - nothing more, nothing less.
Reproducible builds are certifiable. They can be relied on in environments where certification costs millions and takes years.
The xz hack was still reproducible, because it was included in the distribution archive which did not match the upstream source -- even then, it was so obfuscated it likely would have gone unnoticed, but nevertheless it only lived in the uploaded tarball and not in the repo. Reproducibility is a good thing, but the next step is build provenance.
Still, lots of good non-security benefits to reproducible builds too.
The xz utils compromise is a very good example... of why reproducible packages don't actually solve anything security-wise!
The backdoor relied first on a difference between building a package in a packaging environment versus building the package on your own. And also, it relied on the very common practice of checking in unreviewable artifacts into the source tree (e.g., the configure script, malicious binary test artifacts).
Reproducible builds guarantee that two people can follow the same instructions and get the same, bit-identical outcome. It does nothing to guarantee that those instructions have not been compromised, and all of the great packaging security failures of my lifetime that I can think of have relied on those instructions being compromised (e.g., xz utils, Debian OpenSSL keygen issues).
An attack would be far easier without reproducible packages. One could upload a compromised binary to Debian by becoming a Debian developer, blackmail a Debian developer into doing so, or compromise the computer of a Debian developer or the distribution infrastructure.
At the time of the xz attack, the package was already reproducible.
I'll give an analogy to email and spam. A lot of effort has been spent making sure that if an email is from x@example.com, it actually came from x@example.com, giving us things like SPF, DKIM, and DMARC. And it turns out that the most eager adopters of the newest technology are... the spammers themselves! Because they don't need to lie about their email address; they can have that be completely honest, and instead resort to other tricks to mislead users as to who they are (e.g., the display name, which most email clients blindly trust and happily display).
Similarly for package managers, the biggest issues are typo-squatting or maintainer credentials compromise. And in neither case does the attacker have any incentive to take advantage of it in a way that breaks reproducibility--they can be completely honest about what they're doing. Now even if I were an attacker who had compromised a maintainer's machine, I'd still probably go for compromising the source rather than compromising the final artifact-generation process... simply because compromising the source makes the exploit live longer.
As xz shows, once you have a compromised maintainer, there's basically nothing you can do to fix it except by having someone else discover the compromise and locking out the maintainer.
Typo-squatting attacks are more of an issue for non-curated software collections, not so much for Debian. If you use npm or cargo or similar, then you have indeed far bigger worries. Compromising the source has the disadvantage for the attacker that it is much easier to detect. Again, if you always install the newest things from a non-curated collection, that may not matter much, but for something such as Debian, this increases the probability of detection a lot. One can argue that xz shows that it is possible to hide things in the source, but it also shows how much effort was needed to do this. (And the xz package was reproducible, so compromising the Debian system and uploading a binary would have carried a high risk of detection. That this was not done can therefore not serve as evidence that binary attacks are not an issue.)
> There was no bug or attack on Debian since 2007 that reproducible packages would prevent.
I'm reading this as a suggestion that the reproducible builds effort was an ineffective deterrent.
However, note that your observation could also be explained by the opposite: the reproducible builds effort was an effective deterrent, so nobody bothered with attempts.
> And it just ups the the contribution barrier to Debian higher
Until yesterday, the package just got flagged in the tracker, and you could either ignore it, or fix it yourself, or the kind people behind the reproducible builds effort supplied a patch themselves.
Now, you can no longer ignore it. But fixes are often trivial. Use a (stable) timestamp provided by the build, seed RNGs with some constant (instead of eg: time), etc. These are best practices anyway.
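Both fixes are usually one-liners; a toy sketch in Python of what a build script might do (the variable names are made up, but SOURCE_DATE_EPOCH is the real convention Debian's tooling exports):

```python
import hashlib
import os
import random

# Stable timestamp: honor SOURCE_DATE_EPOCH instead of time.time().
# Debian sets it to the date of the latest changelog entry.
build_time = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))

# Seeded RNG: derive any "random" build-time choices (temp names,
# shuffled orderings) from a constant, not from the wall clock.
rng = random.Random(42)
salt = rng.getrandbits(32)

# The resulting build stamp is now identical on every rebuild.
stamp = hashlib.sha256(f"{build_time}:{salt}".encode()).hexdigest()
print(stamp[:16])
```

Run it twice and you get the same stamp; swap `42` for `time.time()` and you have exactly the kind of nondeterminism the tracker flags.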
> However, note that your observation could also be explained by the opposite: the reproducible builds effort was an effective deterrent, so nobody bothered with attempts.
There was no attack that reproducible builds would help protect from before 2007 either.
> Until yesterday, the package just got flagged in the tracker, and you could either ignore it, or fix it yourself, or the kind people behind the reproducible builds effort supplied a patch themselves.
> Now, you can no longer ignore it. But fixes are often trivial. Use a (stable) timestamp provided by the build, seed RNGs with some constant (instead of eg: time), etc.
That's the entirety of the problem. App developers don't want to be package experts or build experts.
> These are best practices anyway.
They are not. They are best practices if you want reproducible builds. They are an entirely useless waste of time if you don't care.
> They are not. They are best practices if you want reproducible builds.
Or if you're writing a test suite, and you want failing test results to be actionable.
Or you have any other type of behavior that you'd like to reproduce somehow.
One of the first things app developers ask for in bug/issue templates are the steps to reproduce something. I wonder why you'd think that they would suddenly be opposed to the concept when thinking of a build process.
> Or if you're writing a test suite, and you want failing test results to be actionable.
The class of bugs would be extremely small, as the things that make a build hard to reproduce are 99% of the time irrelevant to runtime: a build time embedded in the binary, file metadata with a different timestamp, or maybe the linker putting things in a slightly different order.
> One of the first things app developers ask for in bug/issue templates are the steps to reproduce something. I wonder why you'd think that they would suddenly be opposed to the concept when thinking of a build process.
I think you will find the number of people who had problems reproducing a bug because of a non-100%-exact build is vanishingly small, possibly non-existent.
And that is because if you get a package version and want to reproduce a bug, you get the package, install it, and try to reproduce it. The package WILL be 100% the same as the one in the bug report, because you both downloaded the same artifact from the same mirror network. You don't need reproducibility to get the same binary to reproduce a bug.
https://wiki.debian.org/ReproducibleBuilds has some more info; some of it is outdated, but it also has a chart showing how many packages are built in the CI, and how many of those build reproducibly.
(Orange = FTBR = "failed to build reproducibly")
I'm not good at reading numbers from charts, but I'd guess it's a few percent (4-5ish?).
As pointed out in your link, NetBSD achieved this with some help from Debian. If I understand correctly, it's not that NetBSD tried harder; it's that their problem was easier: fewer packages which change less (they still use CVS, "stability" is an understatement!).
BTW, most Debian packages have reproducible builds. Those which have not (I'd say 5%) are shown in orange in the graph there: https://wiki.debian.org/ReproducibleBuilds
Also, the *BSD are structured somewhat differently to a Linux distro.
It's not like the Linux world, where you have distinct projects like the kernel, GNU, and OpenSSL, and then it's the distribution's job to assemble everything.
In the BSD projects, the scope is developing and distributing an entire base system, i.e., the kernel but also the libc, the shell/all posix utilities, and a few third parties like OpenSSH (which are usually "softforked").
Additional packages you could get from pkgin/pkgsrc (NetBSD), pkg-ng/ports (FreeBSD) or pkg_add (OpenBSD) are clearly distinct from the base system, installed in a dedicated subtree (/usr/pkg in NetBSD, /usr/local in OpenBSD/FreeBSD), and provided in a best-effort manner.
The reproducible build target was almost certainly only for the base system, which is a few percent of what Debian tries to achieve, and over which NetBSD has tighter control (developer + distributor instead of downstream assembler + distributor).
A reproducible base system is useful, but given how quickly you typically need to install packages from pkgsrc, it's not quite enough.
While we are bragging: stagex was the first to hit 100% full-source-bootstrapped, deterministic, and hermetic builds last year, and the first to make multiple signed reproductions by different maintainers on their own hardware mandatory for every release.
Debian has come a long way, but when Debian says reproducible, they mean they grab third-party binaries to build theirs. When we say reproducible, we mean 100% bootstrapped from source code all the way through the entire software supply chain.
Unfortunately, the term “reproducible” can be interpreted in many ways because there is no strict and complete definition. People and projects bend it to their liking.
I'm so happy to see this change. I got involved with reproducible builds in 2021 after reading in horror about the SolarWinds attack. [1]
I think Magnus Ihse Bursie said it best while working on reproducible builds of OpenJDK: "If you were to ask me, the fact that compilers and build tools ever started to produce non-deterministic output has been a bug from day one." [2]
I wonder why this is a thing nowadays. I use yocto for embedded devices and it was almost a no-brainer to implement reproducible builds. I can also easily enable Debian package management, so everything is already available.
Reproducible builds are an essential method in industrial computing - Debian isn’t at the forefront of this, it is merely adopting industry wide techniques also applied to other operating systems in use in long-term and safety-related applications.
Certainly, a lot of the hard work of the Yocto and Debian developers is already in your hands.
What is interesting is that this is now being applied in a more forward-focused policy by the Debian developers, that it will now be the norm rather than an option…
I am always surprised Debian are leading this and not the commercial vendors. You'd think big organisations paying for RHEL and Ubuntu would be beating down the door for verifiable binaries.
If a competitor can prove that their packages are bit-for-bit identical to what a big organization is shipping, that allows the competitor to benefit from the security assurances of the big org. This is great for software freedom, not so great for wannabe monopolists.
Unfortunately, many of these "protections" can't tell a bot from a human. Many clueless websites are just blocking huge swaths of legitimate readers and customers.
Why would you block access to a static page, even for bots? What's the point? I'm not a bot, and this is a very typical non-privacy setup (Firefox, Linux, VPN) for personal use.
It does work with my privacy/scraping setup (residential proxy, spoofed fingerprints, Qubes and so on), great job debian.
What people really don't understand about reproducible builds is that they're not a guarantee that there's no backdoor.
They're a guarantee that if there's a backdoor, it's reproducible 100% of the time.
This is a godsend for white hats fighting the good fight.
And, as a side note, it's strongarming vs the bad guys: "Would be too bad if we could reproduce your shiny exploit 100% of the time wouldn't it!?".
Note that we should go further (but it's a bit orthogonal to reproducible builds): builds of the final binary/package should happen after first discarding all files not necessary for the final build (like all test cases and all test assets). The build should literally happen in an environment that gets rid of those (after, of course, having tested in another environment that all test cases succeed). If I'm not mistaken, getting rid of test assets would have stopped Jia Tan's XZ backdoor attempt dead in its tracks, because IIRC the binary data for the backdoor was hidden in an asset only used by test cases.
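A sketch of what that pruning could look like (the paths and payload name are made up, echoing the xz test-asset trick):

```shell
# Copy the source tree into the build environment, dropping test assets
# so no opaque test blob can ever reach the shipped artifact.
mkdir -p src/tests
echo 'int main(void){return 0;}' > src/main.c
echo 'opaque binary blob' > src/tests/payload.xz   # stand-in for the xz asset
rm -rf build-src && cp -r src build-src && rm -rf build-src/tests
ls build-src   # only main.c survives into the build
```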
P.S.: as a bonus, they also allow you to detect bit-flips (I'm not saying there aren't other ways to detect bit-flips; what I'm saying is that if you have deterministic builds anyway and something doesn't reproduce correctly due to a flipped bit, it's going to be noticed).
This fights against "opensource-washing" which is the practice of large companies claiming to release open source code, but the compile takes so long (as well as being overly-convoluted) that most people and many distros can't afford to maintain the package.
It feels like AI and traditional software are converging in complexity.
Has anyone fought Microsoft Visual Studio successfully to produce reproducible builds of C++ programs? From what I have heard, it is one of the worst contexts to do it.
It's that RICH header that you need to exclude. I just tested my copy of MSVC 2019, and `/emittoolversioninfo:no` will exclude the RICH header from the binary. Supposedly also works in MSVC 2022.
The build timestamps in the PE header and export table are also a problem as well.
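Post-processing can also normalize those fields after linking; a toy Python sketch that zeroes the COFF TimeDateStamp (illustrative only; real tools also handle the export table, debug directory, and checksum):

```python
import struct

def zero_pe_timestamp(data: bytes) -> bytes:
    """Zero the TimeDateStamp field in a PE/COFF file header."""
    buf = bytearray(data)
    # The DOS header stores e_lfanew (the offset of "PE\0\0") at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    if buf[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    # COFF header after the signature: Machine(2), NumberOfSections(2),
    # then TimeDateStamp(4) at signature offset + 8.
    struct.pack_into("<I", buf, e_lfanew + 8, 0)
    return bytes(buf)
```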
"Optimize the code for 5 seconds", as many compilers, including VC++ on Windows, did, was probably one of the dumbest things ever invented. It meant that binaries became more optimized when built on faster computers.
As someone who recently spent a lot of time making a large C++ program entirely reproducible on 4 different OSes, one cannot overstate just how many tiny details matter here.
it's funny that as a non-native speaker, I have to check with Gemini about how "cannot overstate" is used
I also asked Gemini whether we express ourselves that way in my mother tongue (Mandarin), and yes, we do, but it came off as being too formal way of speaking. We don't normally use it (I'm not from China/Taiwan though)
That's cool but I'm honestly a bit disappointed in how apt refused to embrace/support both the container and AI/GPU aspects of computing. Are we going to see some changes there?
Those seem like unrelated things? I can imagine ways for apt to integrate with containers, but what would it possibly do for AI or GPU other than delivering packages like it already does?
Since there is no other way to reach you, please allow me to use this off-topic message to let you know that there is a response to your comments on the GnuPG discussion from two weeks ago.
So much time has been wasted on reproducible builds that could have been better spent on securing more important parts of Debian. In practice, minor changes like a build timestamp being different are not an issue.
Anyone can verify the actual code in the binary matches even if some bytes within the binary file itself are different. The verification routine doesn't have to be a basic bit for bit equality test.
The idea that Debian has a few million dollars to spare creates the assumption that even if they did... you would either not know how to fix the issues, or it would not be worth it.
Debian, like any other legacy distro, must become declarative, because the '80s model of manual deployment and the absurd pain of d-i and preseed must end.
In the end, Nix is just a thin veneer on this stuff.
Given how many quick & dirty sed patches or exec commands I've seen in the few Nix packages/modules I've read, I would not exactly bet my life on it being completely idempotent and reproducible.
It's the best option after illumos (OpenSolaris) IPS integrated with ZFS. It is far less powerful, not imposing ZFS (which is only well supported for root, swap, encryption, etc.), so ZFS is not integrated into the package system and bootloader management (BEs, Boot Environments).
It's not reproducible bit by bit, as it fetches the current version of everything, but it's still easy enough to reproduce, stable enough, and complete enough, while classic distros need a fresh install every major release or face issues, keeping a system in an unknown state for a long time until it explodes.
Debian has had a better "software supply chain" posture than any other player in the ecosystem since before the turn of the century. While we all face the risk of malware from upstream, Debian is the least at risk of being affected by it. See for example the stream of issues from npm et al. None of it has affected Debian.
It's npm that's affected; therefore it's not even considered when choosing a language/ecosystem for writing distro tools. You'll find no sane distro writing a package manager in JavaScript, precisely to avoid this joke of a supply chain.
I quite like the OpenBSD approach to Go and Rust projects in ports. They store all the dependencies and their hashes in the build recipe, not trusting the project ones. And they’re more readable.
Here is jujutsu's list of dependencies[0] and their hashes[1]. As an aside, that's why I don't like those package managers: something like Python's numpy or libcurl gets sliced into atomic portions.
“Hasn’t happened” is quite naive. It happens internally - putting unscrupulous code in a company’s distro before torching the place is a surprisingly regular occurrence in places which have long since adopted Debian as a platform host. IT departments around the globe will benefit from this immensely.
The one single failure point they prevent is infected build hosts.
That might be a reasonable benefit for a company building on public infrastructure, but for projects like Debian, which insist build hosts are basically offline (package in, package out, with no internet access during the build process), it is a very fringe benefit.
This question is meaningless. Attackers will pick the best attack if they have more at their disposal. The fact that they didn't push a commit shows it's better not to. So closing that attack is good.
There is meaning. The difference in detection time does have meaning. If the improvement in detection time were marginal, the time might have been better invested in a different project that would catch such things even faster than reproducible builds.
Why should it only be valuable if the effects were to be publicly known?
There are plenty of places in industrial computing where reproducible builds have prevented subterfuge within the organizations themselves. Injecting binaries to do inf-/exfiltration is a long-standing industrial espionage activity which is of immense value to all users of the operating system - not just the consumer users.
My magic beans have prevented thousands of tiger attacks in top secret underground moon bases, never you mind that there's no way for me to actually prove this.
There's a certain irony in pushing for verifiable builds with completely unverifiable claims.
I've worked at several of the biggest targets for espionage, industrial or otherwise, and to the best of my knowledge, the only thing that's ever been discovered by their reproducible build efforts has been failing hardware on build reproducers
You probably don’t have enough experience with professional enterprise IT departments. Rootfs audits are a thing made a lot easier, and more effective, with reproducible builds.
Maybe not by itself, but it does allow for the ecosystem to be audited, in a way that ultimately benefits the end-user. It really is an important part of a healthy supply chain.
No problem in Debian since the start of the effort would have been solved by reproducible builds.
This is a nice pat-yourself-on-the-back achievement for people who prefer security theatre and box-checking to doing something actually useful, and they wasted thousands of man-hours of the poor victims who had to implement it.
That's not what reproducible builds aim to prevent, and no one claims that. When upstream pushes bad code, that's on upstream.
The thing reproducible builds aim to prevent is Debian or individual developers and system administrators with access rights to binary uploads and signing keys to get forced to sign and upload binary packages by attackers - be these governments (with or without court orders) or criminal organizations.
As of now, say if I were an administrator of Debian's CI infrastructure, technically there would be nothing preventing me from running an "extra" job on the CI infrastructure building a package for openssh with a knock-knock backdoor, properly signing it and uploading it to the repository. For someone to spot the attack and differentiate it, they'd have to notice that there is a package in the repository that has no corresponding build logs or has issues otherwise.
But with reproducible builds, anyone can set up infrastructure to rebuild Debian packages from source automatically and if there is a mismatch with what is on Debian's repository, raise alarm bells.
Reproducible builds show that, within a specific configuration, the code produced the binary, regardless of who signed or published it.
Indeed, this could mitigate an attacker replacing the binary with something that's not produced from the code, but it does not mitigate the tool chain or code itself containing the exploit, creating a malicious binary.
Well - reproducible also means code guarantee. It may not improve an end-user experience directly, but you get an extra quality control step, as guarantee, here. I think reproducibility is great. If we can achieve that, it should be achieved. See also NixOS; it can guarantee that snapshot xyz works, not just for one user, but ALL users. I see it as hopping from guarantee to guarantee. That's actually a good thing in the long run. Just think differently here.
This is some of the best news I've heard recently when it comes to figuring out how to produce high quality Software Bills of Materials for the upcoming EU Cyber Resilience Act, for what it's worth. Reproducible packages are actually worth a great deal when you are selling products with digital elements. Much easier to scan through, audit, etc. with confidence.
If you find yourself holding opinions of the kind "if it can't be made perfect, it shouldn't be changed at all", you may want to consider that most things that work well today were incrementally improved.
Reproducible builds are not solving all issues, as you rightly observed, but they can be a stepping stone (or even a pre-condition) for further measures.
You could already do that since Debian cryptographically signs all its package indexes, and the indexes contain the hash of all packages. The additional guarantee that reproducible builds bring is that you can re-build the packages in your own controlled environment and verify that the resulting package is bit-for-bit identical to what Debian offers.
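At heart, that verification step is just a hash comparison between the archive's package and your own rebuild. A minimal illustrative sketch (the helper names here are made up, not any real Debian tool):

```python
import hashlib


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large .deb files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_rebuild(official_deb: str, rebuilt_deb: str) -> bool:
    """True if our local rebuild is bit-for-bit identical to the archive's package."""
    return sha256_of(official_deb) == sha256_of(rebuilt_deb)
```

The point of reproducibility is precisely that this trivial equality check becomes meaningful: any mismatch is a red flag rather than expected noise.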
This is a huge achievement for Debian and the free software world.
It took a while though until this was understood. In 2007 when pointing out on debian-devel that this is needed, I was still told what huge waste of time this would be. And indeed it took a huge amount of work by many people to get there, but it is well worth it.
There was no bug or attack on Debian since 2007 that reproducible packages would prevent.
"Well worth it" is not correct. And it just ups the the contribution barrier to Debian higher, I already heard a lot of people complaining that contributing to Debian is hard and while in past I defended it by "they need all the checks and bounds to make sure packages play with eachother nicely", this is just step that makes it hard for no reason and little benefit.
There was perhaps no detected bug or attack. There have most likely been bugs or attacks that reproducible builds would have prevented.
And you base that on what, exactly? It's "just" making sure the build process is always ordered.
If anything it will make an attacker's job easier, as an Ubuntu package will have the same files structured exactly the same way as the Debian one.
> as an Ubuntu package will have the same files structured exactly the same way as the Debian one.
As opposed to what? If Ubuntu uses the same source, of course they get the same binaries. And if Ubuntu applies patches, they'll get something different. And that's still true.
There have most likely been bugs or attacks that reproducible builds would have prevented.
Like what exactly?
> If you are wondering why we are doing this at all, then hopefully the Reproducible Builds website will explain why this is useful.
https://reproducible-builds.org/
Could you perhaps respond to the argumentation here?
(Not OP, but...) I still fail to see the current value in confirming that a reproducing builder also included the same compromised dependency that I did when I built it. I understand that reproducible builds are guarding against dynamic attacks within build infrastructure. However I just don't see those happening. Compromised source dependencies are a 100x more common problem.
https://en.wikipedia.org/wiki/XZ_Utils_backdoor
https://news.ycombinator.com/item?id=48083768
That's an example of an attack reproducible packages do not protect from - why are you linking it?
A distro automatically verifying that installed packages are reproducible would protect the user?
No, it wouldn't. The xzutils attacker compromised the source repository. The build pipeline portions were used to obscure the purpose of the exploit embedded in the source code repository.
You're wrong. It was both. The payload was embedded in the binary blob test file. The mechanism to pull it into the build was added to the release tarball only.
Here's the quote from the guy that discovered it in the initial public disclosure [1]:
> After observing a few odd symptoms around liblzma (part of the xz package) on Debian sid installations over the last weeks (logins with ssh taking a lot of CPU, valgrind errors) I figured out the answer. The upstream xz repository and the xz tarballs have been backdoored. At first I thought this was a compromise of debian's package, but it turns out to be upstream. One portion of the backdoor is *solely in the distributed tarballs* and debian's import of the tarball ... it is also present in the tarballs for 5.6.0 and 5.6.1.
[1]: https://www.openwall.com/lists/oss-security/2024/03/29/4
You're mistaking a compromised build pipeline versus a compromised source repo that only triggers in some build pipelines. You can do reproducible builds from compromised source tarballs. Nothing about reproducible builds necessarily requires source control. Yes, if some people who built from source control compared their builds to the builds from the tarballs it could detect the xzutils compromise. However I have yet to see a reproducible build project that includes such cross-build checks.
Nowadays you would work in git, and then you would be able to easily detect any discrepancy between the upstream tarball and the upstream source imported via git. But yes, better support for securing more of the process is needed.
> Yes, if some people who built from source control compared their builds to the builds from the tarballs it could detect the xzutils compromise.
Good. Then we are on the same page.
In the xz-utils hack the attacker slipped changes into the GitHub release tarball that were not present in the GitHub version / git commit history. The Debian maintainer built from the release tarball instead of just pulling from the git repo directly. He shouldn't have been doing that, but good luck convincing him not to use the workflow he's been using for the last X years (I tried). With repro builds we can clone the git directly and confirm we get the same build.
I agree that compromised source dependencies are the bigger problem, but that doesn't mean a compromised build infrastructure isn't. Just this last week, we had two Linux kernel LPEs that could have been leveraged to implement just such an attack, for example.
Another thing to consider is that Debian has quite a few derivatives who may also rebuild packages from source, so you have a multiplier there.
I know why they are useful. I am arguing they are a waste of time for the effort involved.
Forcing devs to use hardware keys to sign commits/CI requests would be an actual security improvement, thwarting many supply chain attacks that only worked because the attacker got hold of developer credentials. Hardware keys at least have the option to make some operations require physically pressing the key, so there is a chance the developer will notice.
But thanks to reproducible builds, at least someone can... validate that the binary code of a vulnerable package can be reproduced. Very fucking useful.
I am not saying it is useless. I am saying it is one of the highest-hanging fruits on the security tree.
A hardware key does not help if the developer's machine is compromised, as there is no chance to understand what is signed anymore - or do you think the hardware key will show all the source code on its little display before signing?
With reproducible builds, you do not need to trust that the system that built the binary was not compromised, because this would be detected immediately.
Source compromises are still an issue, but there is a much bigger chance that they are detected. Also, if there is a compromise, reproducible builds allow you to later track it to the source. For an infected binary it is much more difficult to understand how it got there and what else might be compromised.
The way at least a YubiKey works is that you can set it up so that pressing a key is required for signing, so at least the "silently steal creds and send malicious code" case (which is the vast majority of compromises) gets handled.
> Also if there is a compromise, reproducible builds allow you to later track it to the source.
They do not. Git log and build logs allow for that.
Reproducible builds only have value after the source. They protect build servers from being compromised (and then only if some other uncompromised environment is also running verification passes); if the bug is in the source, reproducible builds are exactly as valuable as writing the commit that was used for the build into the app's code/package metadata.
If your compiler (or other tool or automatic build environment) is compromised and inserts a backdoor in the binary during building, the fact that you need to hold a key while signing or not is completely irrelevant.
git log and build logs do not help you at all if you cannot even determine that the compromised binary has any relation to the build log or the source you may want to look at. This is what reproducible builds give you. You are right that it does not protect against compromised sources.
> I know why they are useful. I am arguing they are waste of time for effort involved.
Not being reproducible is a bug.
There is no reason for a build to not be reproducible, but somehow we let the built binaries become infested with timestamps, login names and file system paths. We recently moved to reproducible builds at work, and discovered that our login names and local home directory paths were being shipped in every release. No one was very happy about leaking PII like that.
You may not consider it worth the effort, but you aren't the one putting in the effort, so I'm not sure why that matters to you. It is very much worth the effort to those people doing the work. Debian is a do-ocracy, and so the people doing the work get to make the decisions.
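The leak described above (login names and home directory paths baked into release binaries) is easy to check for mechanically. A crude illustrative sketch - the helper name and the choice of strings to look for are made up for this example:

```python
def leaked_strings(path: str, needles: list[bytes]) -> list[bytes]:
    """Return which of the given byte strings (e.g. b'/home/alice' or a login
    name) appear verbatim inside a built artifact - the kind of PII leak a
    reproducibility effort tends to surface."""
    with open(path, "rb") as f:
        data = f.read()
    return [n for n in needles if n in data]
```

Running something like this over two builds made by different people is, in effect, a weaker manual version of what a bit-for-bit reproducibility check gives you automatically.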
Sure. The site appears to be a bunch of warm-fuzzies that could apply to almost any other measure you take, it's nothing specific to reproducible builds. As the original poster said "There was no bug or attack on Debian since 2007 that reproducible packages would prevent". In fact, it could be argued that reproducible builds lead to a reduction in security, not an improvement: They give an attacker an exact fixed memory layout for all of the binaries, so if you develop something like a ROP exploit for a copy on your local system you know that exploit will work on every other system as well because the binary layout is identical. It's a perfect monoculture where everything is vulnerable to the same exploit. It seems to have been something created by geeks to impress other geeks, without any considerations for whether it has any actual benefit or not.
This comment is misinformed. Non-deterministic builds would also result in one tarball redistributed to all distro users. The ROP exploits don't work because of ASLR.
Reproducible builds reduce the need for trusted parties.
Have many organizations produce the binaries independently and post the artifacts.
Once n of m parties agree on the artifact hash, take that as the trusted build.
If every party reaches a different hash then we cannot build consensus.
To move away from organizational dependence, there should be an installable project for debian where I can dedicate some configurable small percentage of my compute when idle to reproducibly building debian components to make a robust verification system, starting with the most critical code.
Obviously, it would be a ton of work to make such a system resistant to gaming by malicious actors (see GNU Guix for useful efforts), but it would provide valuable diversity in architecture and (political or other) control.
It would be even cooler if we could have independent projects that could run on various distros and OS, and build packages for any of them. Having packages for bsd verified on linux and vice-versa with statistical logging (this code has been verified x times on y OSes) would be reassuring.
I think that project is called Ubuntu.
Building Ubuntu does not produce identical binaries to Debian, so no, that's not what they're asking for
I don't know of anything Ubuntu is doing that is significantly beyond what Debian is doing in this regard, nor that they have a distributed reproduction system set up???
It makes shipping backdoors a whole lot harder, yes.
Hmm, it prevents Trojan binaries which is a small subset of backdoor IMHO.
Defense in depth obviously is a good thing
Unless someone spins entirely separate infrastructure dedicated just for verifying Debian packages, it doesn't.
Reproducible builds are applicable not only to respond to ‘attacks’, a subject you seem to be bikeshedding, but also for other reasons too.
Anyone having to maintain a code base or a distributed fleet of devices will gain from this decision, immensely, as their operational periods come and go.
Reproducible builds are about longevity as much as they are about security.
Please don’t make bold claims about ‘no reason and little benefit’ while demonstrating ignorance of this hard fact: reproducible builds should have been the norm, in computing, from the get-go.
I think longevity is harmed though. Your certs need to expire in a few years, and by then your toolchain may not be downloadable anymore.
Those problems need to be solved as well.
I don't think they do, actually. Longevity sounds good, but in reality anything that's old probably has critical security holes and so you shouldn't use it anyway.
A warning is sufficient. Old tech should continue to work, for preservation and archival reasons.
just archiving binary artifacts and source packages is enough
Reproducibility adds nothing here
It sure does: no need to keep the binaries around if they are reproducible.
I've long ago realized that archival needs to be a separate task left to archivists and archive systems. If you take it into account when designing a live system it's liable to seriously compromise your system design.
Say you're making a chat app - you wouldn't incorporate a delete feature, and you might be tempted to use some kind of blockchain to prove all messages were delivered without gaps. But if you ignore archival needs you design something similar to IRC which is much simpler.
You’re not thinking like an industrial user but rather as a consumer. Maybe you should extend your scope a little bit.
> your toolchain will not be downloadable.
Why not? Debian has a fantastic track record of providing old versions, for instance here's the build tools from Debian 2.0 from 1998:
https://mirrors.accretive-networks.net/debian-archive/debian...
> Anyone having to maintain a code base or a distributed fleet of devices will gain from this decision, immensely, as their operational periods come and go.
Just baking in a build ID and commit is enough. What do you think reproducible builds add?
> Please don’t make bold claims about ‘no reason and little benefit’ while demonstrating ignorance of this hard fact: reproducible builds should have been the norm, in computing, from the get-go.
So far not a single person in the thread has given me a concrete example (as in: existing project, existing problem, no other solution can solve it). Just claims that it's better, based on their feelings. Come on, be the first one.
I already gave you an example, you dismissed it because you know better, but it is clear that you haven’t thought this through from the perspective of systems designers who have to deploy a base OS, with expected lifetimes of years, across a large fleet of devices.
Think industrial applications, such as rail and heavy industry transportation. We use reproducible builds here as part of a wider safety-critical protocol which guarantees that what we are running is what we expect to run - nothing more, nothing less.
Reproducible builds are certifiable. They can be relied on in environments where certification costs millions and takes years.
Think outside your consumer box for a minute.
Is the "Jia Tan" XZ Utils compromise not a good example? That relied on code snuck into a release that was not in source.
(It was caught before being promoted into a stable Debian release, yes, but this sort of relied on a happy accident, too close for comfort)
The xz hack was still reproducible, because it was included in the distribution archive which did not match the upstream source -- even then, it was so obfuscated it likely would have gone unnoticed, but nevertheless it only lived in the uploaded tarball and not in the repo. Reproducibility is a good thing, but the next step is build provenance.
Still, lots of good non-security benefits to reproducible builds too.
The xz utils compromise is a very good example... of why reproducible packages doesn't actually solve anything security-wise!
The backdoor relied first on a difference between building a package in a packaging environment versus building the package on your own. And also, it relied on the very common practice of checking in unreviewable artifacts into the source tree (e.g., the configure script, malicious binary test artifacts).
Reproducible builds guarantee that two people can follow the same instructions and get the same, bit-identical outcome. It does nothing to guarantee that those instructions have not been compromised, and all of the great packaging security failures of my lifetime that I can think of have relied on those instructions being compromised (e.g., xz utils, Debian OpenSSL keygen issues).
No, reproducible builds, together with other checks, make such backdoors more difficult to sneak in.
An attack would be far easier without reproducible packages. One could upload a compromised binary to Debian by becoming a Debian developer, blackmail a Debian developer to do so, or compromise the computer of a Debian developer or the distribution.
At the time of xz attack, the package was already reproducible.
I'll give an analogy to email and spam. A lot of effort has been spent making sure that if an email is from x@example.com, it actually came from x@example.com, giving us things like SPF, DKIM, and DMARC. And it turns out that the most eager adopters of the newest technology are... the spammers themselves! Because they don't need to lie about their email address; they can have that be completely honest, and instead resort to other tricks to mislead users as to who they are (e.g., the display name, which most email clients blindly trust and happily display).
Similarly for package managers, the biggest issues are typo-squatting or maintainer credentials compromise. And in neither case does the attacker have any incentive to take advantage of it in a way that breaks reproducibility--they can be completely honest about what they're doing. Now even if I were an attacker who had compromised a maintainer's machine, I'd still probably go for compromising the source rather than compromising the final artifact-generation process... simply because compromising the source makes the exploit live longer.
As xz shows, once you have a compromised maintainer, there's basically nothing you can do to fix it except by having someone else discover the compromise and locking out the maintainer.
Typo-squatting attacks are more of an issue for non-curated software collections, not so much for Debian. If you use npm or cargo or similar, then you have indeed far bigger worries. Compromising the source has the disadvantage for the attacker that it is much easier to detect. Again, if you always install the newest things from a non-curated collection that may not matter much, but for something such as Debian, this increases the probability of detection a lot. One can argue that xz shows that it is possible to hide things in the source, but it also shows how much effort was needed to do this. (And the xz package was reproducible, so compromising the Debian system and uploading a binary would have had a high risk of detection. That this was not done can therefore not serve as evidence that binary attacks are not an issue.)
"mimimimi".
Those people do not care about quality in open source at all. For long-living software this is very important.
Of course, all those javascript and kubernetes packages which are irrelevant in a few years again, might complain, but let them complain.
> There was no bug or attack on Debian since 2007 that reproducible packages would prevent.
I'm reading this as a suggestion that the reproducible builds effort was an ineffective deterrent.
However, note that your observation could also be explained by the opposite: the reproducible builds effort was an effective deterrent, so nobody bothered with attempts.
> And it just ups the the contribution barrier to Debian higher
Until yesterday, the package just got flagged in the tracker, and you could either ignore it, or fix it yourself, or the kind people behind the reproducible builds effort supplied a patch themselves.
Now, you can no longer ignore it. But fixes are often trivial. Use a (stable) timestamp provided by the build, seed RNGs with some constant (instead of eg: time), etc. These are best practices anyway.
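The "stable timestamp provided by the build" fix usually means honouring the `SOURCE_DATE_EPOCH` environment variable from the reproducible-builds.org specification. A minimal sketch of the convention:

```python
import os
import time


def build_timestamp() -> int:
    """Honour SOURCE_DATE_EPOCH (the reproducible-builds.org convention):
    if the build environment pins a timestamp, embed that instead of 'now'."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    return int(epoch) if epoch is not None else int(time.time())
```

Build tools that adopt this one-line policy stop injecting wall-clock time into artifacts, which is one of the most common sources of unreproducibility.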
That’s a big logical fallacy, I’m not sure if that’s what you want to go with
> However, note that your observation could also be explained by the opposite: the reproducible builds effort was an effective deterrent, so nobody bothered with attempts.
There was no attack that reproducible builds would help protect from before 2007 either.
> Until yesterday, the package just got flagged in the tracker, and you could either ignore it, or fix it yourself, or the kind people behind the reproducible builds effort supplied a patch themselves.
> Now, you can no longer ignore it. But fixes are often trivial. Use a (stable) timestamp provided by the build, seed RNGs with some constant (instead of eg: time), etc.
that's the entirety of the problem. App developers don't want to be package experts or build experts.
> These are best practices anyway.
They are not. They are best practices if you want reproducible builds. They are an entirely useless waste of time if you don't care.
> that's the entirety of the problem. App developers don't want to be package experts or build experts.
App developers and Debian package maintainers are already separate groups.
> They are not. They are best practices if you want reproducible builds.
Or if you're writing a test suite, and you want failing test results to be actionable.
Or you have any other type of behavior that you'd like to reproduce somehow.
One of the first things app developers ask for in bug/issue templates are the steps to reproduce something. I wonder why you'd think that they would suddenly be opposed to the concept when thinking of a build process.
> Or if you're writing a test suite, and you want failing test results to be actionable.
The class of bugs would be extremely small, as what makes a build hard to reproduce is 99% of the time irrelevant to runtime: a build time embedded in the binary, file metadata having a different timestamp, or maybe the linker putting stuff in a slightly different order.
> One of the first things app developers ask for in bug/issue templates are the steps to reproduce something. I wonder why you'd think that they would suddenly be opposed to the concept when thinking of a build peocess.
I think you will find the number of people who had problems reproducing a bug because of a non-100%-exact build is vanishingly small, possibly non-existent.
And that is because if you get a package version and want to reproduce a bug, you get the package, install it and try to reproduce it. The package WILL be 100% the same as the one in the bug report, because you both downloaded the same artifact from the same mirror network. You don't need reproducibility to get the same binary to reproduce a bug.
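The nondeterminism sources listed above (timestamps, metadata, ordering) can also be neutralised at archive-creation time. A sketch using Python's `tarfile` module - the function name is illustrative, and the normalisation choices (zeroed mtime/uid, USTAR format, sorted entries) are one reasonable policy, not the only one:

```python
import tarfile


def deterministic_tar(out_path: str, files: list[str]) -> None:
    """Create a tar whose bytes depend only on the file contents: entries are
    added in sorted order and per-file metadata (mtime, owner) is normalised."""
    def scrub(info: tarfile.TarInfo) -> tarfile.TarInfo:
        info.mtime = 0           # drop wall-clock timestamps
        info.uid = info.gid = 0  # drop the builder's uid/gid
        info.uname = info.gname = ""
        return info

    with tarfile.open(out_path, "w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed ordering regardless of filesystem
            tar.add(name, filter=scrub)
```

Note the plain `"w"` mode: gzip compression would add its own embedded timestamp, which would have to be pinned separately.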
https://wiki.debian.org/ReproducibleBuilds has some more info; some of it is outdated, but it also has a chart showing how many packages are built in the CI, and how many of those build reproducibly.
(Orange = FTBR = "failed to build reproducibly")
I'm not good at reading numbers from charts, but I'd guess it's a few percent (4-5ish?).
all I get is this:
> Forbidden
> <p>You are not allowed to access this!</p>
(yes, with HTML tags on display) :)
EDIT: I also found a "I Challenge Thee" page in history. did I just get blocked by antibot measures? why???
Do you have JavaScript disabled? They put one of those anti-scraper things on it.
nope, it's enabled. I can pass Cloudflare, reCAPTCHA, whatever Microsoft is doing, and Anubis, but Debian caught me off-guard
Good thing. NetBSD has had fully reproducible builds since 2017. https://blog.netbsd.org/tnf/entry/netbsd_fully_reproducible_...
As pointed in your link, NetBSD achieved this with some help from Debian. If I understand correctly, it's not that NetBSD tried harder, it's that their problem was easier: fewer packages which change less (they still use CVS, "stability" is an understatement!).
BTW, most Debian packages have reproducible builds. Those which have not (I'd say 5%) are shown in orange in the graph there: https://wiki.debian.org/ReproducibleBuilds
Also, the *BSD are structured somewhat differently to a Linux distro.
It's not like the Linux world where you have distinct projects like the Kernel, GNU, OpenSSL, and then it's the distributions job to assemble everything.
In the BSD projects, the scope is developing and distributing an entire base system, i.e., the kernel but also the libc, the shell/all posix utilities, and a few third parties like OpenSSH (which are usually "softforked").
It's quite visible in the sources, it's a lot more than just a kernel: https://github.com/NetBSD/src
Additional packages you could get from pkgin/pkgsrc (NetBSD), pkg-ng/ports (FreeBSD) or pkg_add (OpenBSD) are clearly distinct from the base system, installed in a dedicated subtree (/usr/pkg in NetBSD, /usr/local in OpenBSD/FreeBSD), and provided in a best-effort manner.
The reproducible build target was almost certainly only for the base system, which is a few percent of what Debian tries to achieve, and on which NetBSD has a tighter control over (developer + distributor instead of downstream assembler+distributor).
A reproducible base system is useful, but given how quickly you typically need to install packages from pkgsrc, it's not quite enough.
> it's not that NetBSD tried harder, it's that their problem was easier: fewer packages which change less
Maybe that's trying harder on design rather than trying to remedy the consequences later.
While we are bragging, stagex was the first to hit 100% full source bootstrapped deterministic and hermetic builds last year and the first to make multiple signed reproductions by different maintainers on their own hardware mandatory for every release.
Debian has come a long way, but when Debian says reproducible they mean they grab third-party binaries to build theirs. When we say reproducible we mean 100% bootstrapped from source code all the way through the entire software supply chain.
We think that distinction matters.
https://stagex.tools
That distro has a smaller codebase than the Debian Installer.
This!
Unfortunately, the term “reproducible” can be interpreted in many ways because there is no strict and complete definition. People and projects bend it to their liking.
Your approach is correct.
https://www.bootstrappable.org/
Newcomers will always have it much easier. Also, I think Guix also reached this.
Also, stagex and others probably profited QUITE A LOT from the Debian efforts, because they started going upstream and talking to developers.
Arch Linux alone profited a decade before that from Debian maintainers and Debian people asking upstream to improve...
Stage -1: `hexdump`, `xxd`, or whatever you use to write files to your filesystem.
A great milestone, congrats Debian on taking a stance and holding high standards for yourself, especially in the current era.
I'm so happy to see this change. I got involved with reproducible builds in 2021 after reading in horror about the SolarWinds attack. [1]
I think Magnus Ihse Bursie said it best while working on reproducible builds of OpenJDK: "If you were to ask me, the fact that compilers and build tools ever started to produce non-deterministic output has been a bug from day one." [2]
[1] https://www.linux.com/news/preventing-supply-chain-attacks-l...
[2] https://github.com/openjdk/jdk/pull/9152#issue-1270543997
I wonder why this is a thing nowadays. I use yocto for embedded devices and it was almost a no-brainer to implement reproducible builds. I can also easily enable Debian package management, so everything is already available.
What do you mean why is it a thing nowadays?
Reproducible builds are an essential method in industrial computing - Debian isn’t at the forefront of this, it is merely adopting industry wide techniques also applied to other operating systems in use in long-term and safety-related applications.
Certainly, a lot of the hard work of the Yocto and Debian developers is already in your hands.
What is interesting is that this is now being applied in a more forward-focused policy by the Debian developers, that it will now be the norm rather than an option…
Did you actively verify that your builds were bit-reproducible?
amd64 forky
reproduced: 97.02% good: 17586 bad: 511 fail: 30 unknown: 0
This, statistics for other architectures, and the reasons for unreproducibility can be found at https://reproduce.debian.net.
I am always surprised Debian are leading this and not the commercial vendors. You'd think big organisations paying for RHEL and Ubuntu would be beating down the door for verifiable binaries.
If a competitor can prove that their packages are bit-for-bit identical to what a big organization is shipping, that allows the competitor to benefit from the security assurances of the big org. This is great for software freedom, not so great for wannabe monopolists.
Reproducible builds exist to reduce the need for trust, while commercial vendors are in the business of selling trust.
Forbidden
You don't have permission to access this resource. Apache Server at lists.debian.org Port 443
:/
I can see it just fine; maybe an overzealous firewall thinks you're a bot? At any rate, the Wayback Machine has it: https://web.archive.org/web/20260510074120/https://lists.deb...
Unfortunately, many of these "protections" can't tell a bot from a human. Many clueless websites are just blocking huge swaths of legitimate readers and customers.
Why would you block access to a static page, even to bots - what's the point? I'm not a bot, and this is a very typical non-privacy setup (Firefox, Linux, VPN) for personal usage.
It does work with my privacy/scraping setup (residential proxy, spoofed fingerprints, Qubes and so on). Great job, Debian.
What people really don't understand about reproducible builds is that they're not a guarantee that there's no backdoor.
They're a guarantee that if there's a backdoor, it's reproducible 100% of the time.
This is a godsend for white hats fighting the good fight.
And, as a side note, it's strongarming vs the bad guys: "Would be too bad if we could reproduce your shiny exploit 100% of the time wouldn't it!?".
Note that we should go further (though it's a bit orthogonal to reproducible builds): the final binary/package should be built after first discarding every file not necessary for the build (like all test cases and all test assets). The build should literally happen in an environment stripped of those files (after, of course, having verified in a separate environment that all test cases succeed). If I'm not mistaken, getting rid of test assets would have stopped Jia Tan's XZ backdoor attempt dead in its tracks, because IIRC part of the backdoor's binary payload was hidden in an asset only used by test cases.
P.S.: as a bonus, they also make bit-flips detectable (I'm not saying there aren't other ways to detect bit-flips; just that if you have deterministic builds anyway and something doesn't reproduce correctly due to a flipped bit, it's going to be noticed).
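As a rough illustration of the idea (this is a toy sketch, not an actual packaging tool; the helper name and the allowlist are hypothetical), staging the build from an explicit allowlist of inputs keeps test assets out of the build environment entirely:

```python
import shutil
import tempfile
from pathlib import Path

def stage_build_tree(source: Path, allowed: list[str]) -> Path:
    """Copy only the allowlisted files into a clean build directory,
    so test assets (and anything hidden inside them) never reach the
    environment that produces the final binary."""
    build_dir = Path(tempfile.mkdtemp(prefix="clean-build-"))
    for rel in allowed:
        src = source / rel
        dst = build_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    return build_dir
```

The tests still run beforehand against the full tree; only the final, shipped build happens in the stripped-down directory.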
This fights against "opensource-washing", the practice of large companies claiming to release open source code while the compile takes so long (and is so convoluted) that most people and many distros can't afford to maintain the package.
It feels like AI and traditional software are converging in complexity.
Has anyone fought Microsoft Visual Studio successfully to produce reproducible builds of C++ programs? From what I have heard, it is one of the worst contexts to do it.
Probably the easiest way is to use Bazel to leverage the effort that has gone in there.
Well, you can't build MSVS yourself, reproducibly or otherwise, so this is a less appealing endeavor I would think.
It's that RICH header that you need to exclude. I just tested my copy of MSVC 2019, and `/emittoolversioninfo:no` will exclude the RICH header from the binary. Supposedly also works in MSVC 2022.
The build timestamps in the PE header and export table are also a problem.
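To make the timestamp problem concrete, here is a small sketch (not any official tool; the helper name is made up) that reads the COFF TimeDateStamp field out of a PE image, following the published PE format offsets. A default MSVC build stamps the wall clock there, so two otherwise identical builds differ at that field; reportedly MSVC's undocumented `/Brepro` option replaces it with a deterministic value.

```python
import struct

def pe_timestamp(data: bytes) -> int:
    """Return the COFF TimeDateStamp of a PE image.

    On a default build this holds the wall-clock build time, which is
    one of the fields that breaks bit-for-bit reproducibility.
    """
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    assert data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"
    # The COFF header follows the signature; TimeDateStamp is its third
    # field (after Machine and NumberOfSections), 8 bytes past it.
    (timestamp,) = struct.unpack_from("<I", data, e_lfanew + 8)
    return timestamp
```

Diffing this field between two builds of the same source is a quick way to confirm the toolchain is stamping timestamps into the binary.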
... and most of this work is done by other distros and maintainers. Starting with binutils
"Optimize the code for 5 seconds", as many compilers (including VC++ on Windows) did, was probably one of the dumbest things ever invented. It meant that binaries became more optimized when built on faster computers.
A small step for debian,
giant leap for mankind.
As someone who recently spent a lot of time on making a large C++ program entirely reproducible on 4 different OS’es, one cannot understate just how many tiny details matter here.
"overstate"
Whoops, yes. Well I hope the point came across anyway.
it's funny that as a non-native speaker, I had to check with Gemini on how "cannot overstate" is used
I also asked Gemini whether we express ourselves that way in my mother tongue (Mandarin), and yes, we do, but it comes off as too formal a way of speaking, so we don't normally use it (I'm not from China/Taiwan though)
That's cool but I'm honestly a bit disappointed in how apt refused to embrace/support both the container and AI/GPU aspects of computing. Are we going to see some changes there?
Those seem like unrelated things? I can imagine ways for apt to integrate with containers, but what would it possibly do for AI or GPU other than delivering packages like it already does?
What exactly are you talking about? Those don't seem related.
Why the fuck does that site break the back button? DO NOT do that.
since there is no other way to reach you please allow me to use this off topic message to let you know that there is a response to your comments on the gnupg discussion from two weeks ago.
Debian must ship packages without the hard dependence on systemd.
So much time has been wasted on reproducible builds that could have been better spent on securing more important parts of Debian. In practice, minor differences like a build timestamp are not an issue.
It allows verifying that the binaries actually match the source, which is extremely valuable.
Bit for bit matching is not required for that.
Yes, making sure build timestamps are reproducible isn't a security win.
What is a win is that two independent parties can run the same build, and get the same binaries.
This is important because it removes trust from builders: anyone can verify their output.
It just so happens that unimportant things like build versions impede that.
Anyone can verify the actual code in the binary matches even if some bytes within the binary file itself are different. The verification routine doesn't have to be a basic bit for bit equality test.
For sure.
This has been the status quo in Debian for a while now. You can build, and use diffoscope to audit the differences.
It's a stronger security property to have bit-for-bit reproducibility, and it looks like Debian are ready to commit to it.
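A looser-than-bit-for-bit check like the one described above could be sketched as follows (a toy illustration, not how diffoscope works; the masked byte ranges would have to come from real knowledge of the binary format, and the function name is made up):

```python
import hashlib

def normalized_digest(data: bytes, skip_ranges: list[tuple[int, int]]) -> str:
    """Hash a binary while zeroing byte ranges known to be benign noise
    (e.g. an embedded build timestamp), so two builds of the same code
    compare equal even when those bytes differ."""
    buf = bytearray(data)
    for start, end in skip_ranges:
        buf[start:end] = bytes(end - start)  # zero out the masked range
    return hashlib.sha256(buf).hexdigest()
```

The catch is that every masked range is a place where a malicious difference could hide, which is why bit-for-bit equality is the stronger property.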
you are free to provide patches instead of bitching.
And Debian is able to offer me a few million dollars yearly to help fix their security situation.
setting aside the idea that debian has a few million dollars to spare... even if they did, you would either not know how to fix the issues, or it wouldn't be worth it.
Debian, like any other legacy distro, must become declarative, because the '80s model of manual deployment and the absurd pain of D/I and Preseed must end.
In the end, Nix is just a thin veneer on this stuff.
Given how many quick & dirty sed patches or exec commands I've seen in the few Nix packages/modules I've read, I would not exactly bet my life on it being completely idempotent & reproducible.
it's the best option after IllumOS (OpenSolaris) IPS integrated with ZFS. It's far less powerful since it doesn't impose ZFS (which is only well supported for root, swap, encryption, etc.), so it's not integrated into the package system and bootloader management (BEs, Boot Environments).
It's not reproducible bit by bit, since it fetches the current version of everything, but it's still easy enough to reproduce, stable enough and complete enough, while classic distros need a fresh install every major release or face issues, keeping the system in an unknown state for a long time until it explodes.
bootcrew have bootc Containerfiles for Debian, Ubuntu, Arch, and openSUSE:
https://github.com/bootcrew/mono
I've been 100% on NixOS for many years, but it's Debian that really drove this project.
They're still a pragmatic choice for many usecases.
zero improvement on end-user experience. does not solve supply chain issues; a debian package will reproducibly contain the malware from upstream.
Debian has had a better "software supply chain" posture than any other player in the ecosystem since before the turn of the century. While we all face the risk of malware from upstream, Debian is the least at risk of being affected by it. See for example the stream of issues from npm et al. None of it has affected Debian.
> for example the stream of issues from npm et al.
Curious, which distros were affected by npm supply chain attacks?
It's npm that's affected, so it's not even considered when choosing a language/ecosystem for writing distro tools. You'll find no sane distro writing a package manager in JavaScript, precisely to avoid this joke of a supply chain.
I quite like the OpenBSD approach to Go and Rust projects in ports. They store all the dependencies and their hashes in the build recipe, not trusting the project ones. And they’re more readable.
Here is jujutsu’s list of dependencies[0] and their hashes[1]. As an aside, that’s why I don’t like those package managers: something like Python’s numpy or libcurl gets sliced into atomic portions.
[0]: https://github.com/openbsd/ports/blob/master/devel/jujutsu/c...
[1]: https://github.com/openbsd/ports/blob/master/devel/jujutsu/d...
ECMA-262 doesn't require the use of NPM or NodeJS. (In fact, they are at odds, even 10+ years after modules were standardized in ES6.)
You do remember the xz-utils backdoor was found in Sid, right?
https://en.wikipedia.org/wiki/XZ_Utils_backdoor
It would have been found in a whole lot more places if it hadn't been for that meddling Microsoft employee.
It does not solve all supply chain issues, but it does solve some.
Not being able to see whether the source code shipped is the same as what was used to create the binary is scary.
Has there been a single publicly known attack that would have been prevented by this?
Several actually. Pypi is regularly targeted in this way.
Hasn't happened in Debian
“Hasn’t happened” is quite naive. It happens internally: putting unscrupulous code in a company’s distro before torching the place is a surprisingly regular occurrence in places that long ago adopted Debian as a platform host. IT departments around the globe will benefit from this immensely.
And reproducible builds do not prevent that.
The one single failure point they prevent is an infected build host.
That might be a reasonable benefit for a company building on public infrastructure, but for projects like Debian, which insist build hosts are basically offline (package in, package out, with no internet access during the build process), it is a very fringe benefit.
Nonsense, of course reproducible builds can be used by IT departments to catch nefarious behavior - they regularly do.
But how many of those attackers also had the ability to publish a GitHub commit, and didn't, to remain more stealthy?
This question is meaningless. Attackers will pick the best attack if they have more at their disposal. The fact that they didn't push a commit shows it's better not to. So closing that attack is good.
There is meaning: the difference in detection time matters. If the improvement in detection time was only marginal, the time might have been better invested in a different project that would catch such things even faster than reproducible builds.
Zero in Debian. They have enough other procedures to catch it.
Less diligent projects had it but there are easier ways to fix it
Why should it only be valuable if the effects were to be publicly known?
There are plenty of places in industrial computing where reproducible builds have prevented subterfuge within the organizations themselves. Injecting binaries to do inf-/exfiltration is a long-standing industrial espionage activity which is of immense value to all users of the operating system - not just the consumer users.
My magic beans have prevented thousands of tiger attacks in top secret underground moon bases, never you mind that there's no way for me to actually prove this.
There's a certain irony in pushing for verifiable builds with completely unverifiable claims.
I've worked at several of the biggest targets for espionage, industrial or otherwise, and to the best of my knowledge, the only thing that's ever been discovered by their reproducible build efforts has been failing hardware on build reproducers
You probably don’t have enough experience with professional enterprise IT departments. Rootfs audits are a thing made a lot easier, and more effective, with reproducible builds.
> zero improvement on end-user experience.
Maybe not by itself, but it does allow for the ecosystem to be audited, in a way that ultimately benefits the end-user. It really is an important part of a healthy supply chain.
no problem in Debian since the start of the effort would have been solved by reproducible builds
This is a nice pat-yourself-on-the-back achievement for people who prefer security theatre and box-checking to doing something actually useful, and it wasted thousands of man-hours of the poor victims who had to implement it
That's not what reproducible builds aim to prevent, and no one claims that. When upstream pushes bad code, that's on upstream.
The thing reproducible builds aim to prevent is Debian, or individual developers and system administrators with access rights to binary uploads and signing keys, being forced by attackers to sign and upload binary packages, be those attackers governments (with or without court orders) or criminal organizations.
As of now, if I were an administrator of Debian's CI infrastructure, technically nothing would prevent me from running an "extra" job on the CI infrastructure that builds an openssh package with a knock-knock backdoor, properly signs it, and uploads it to the repository. For someone to spot the attack, they'd have to notice that a package in the repository has no corresponding build logs or is suspicious in some other way.
But with reproducible builds, anyone can set up infrastructure to rebuild Debian packages from source automatically and if there is a mismatch with what is on Debian's repository, raise alarm bells.
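The rebuild-and-alarm loop could be as simple as this sketch (file layout and names are hypothetical, and a real rebuilder also has to recreate the documented build environment before comparing):

```python
import hashlib
from pathlib import Path

def find_mismatches(official_dir: Path, rebuilt_dir: Path) -> list[str]:
    """Compare every independently rebuilt package against the archive's
    copy. With bit-for-bit reproducibility, any difference at all is a
    reason to raise alarm bells."""
    sha256 = lambda p: hashlib.sha256(p.read_bytes()).hexdigest()
    alarms = []
    for rebuilt in sorted(rebuilt_dir.glob("*.deb")):
        official = official_dir / rebuilt.name
        if not official.exists() or sha256(official) != sha256(rebuilt):
            alarms.append(rebuilt.name)
    return alarms
```

The important property is that anyone can run this independently, so a compromised official builder cannot hide from all rebuilders at once.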
Reproducible builds show that, within a specific configuration, the code produced the binary, regardless of who signed or published it.
Indeed, this could mitigate an attacker replacing the binary with something that's not produced from the code, but it does not mitigate the tool chain or code itself containing the exploit, creating a malicious binary.
Well, reproducible also means a code guarantee. It may not improve the end-user experience directly, but you get an extra quality-control step, a guarantee, here. I think reproducibility is great: if it can be achieved, it should be achieved. See also NixOS; it can guarantee that snapshot xyz works, not just for one user but for ALL users. I see it as hopping from guarantee to guarantee, which is actually a good thing in the long run. Just think differently here.
This is some of the best news I've heard recently when it comes to figuring out how to produce high quality Software Bills of Materials for the upcoming EU Cyber Resilience Act, for what it's worth. Reproducible packages are actually worth a great deal when you are selling products with digital elements. Much easier to scan through, audit, etc. with confidence.
If you find yourself holding opinions of the kind "if it can't be made perfect, it shouldn't be changed at all", you may want to consider that most things that work well today were improved incrementally.
Reproducible builds do not solve all issues, as you rightly observed, but they can be a stepping stone (or even a precondition) for further measures.
> zero improvement on end-user experience
The end-user experience is that now you can host your Debian binaries in caches and CDNs without worrying about supply chain hackers.
You can verify that file hashes match the ones on Debian's website and sleep much better at night.
If you don't trust Debian's website then you can rebuild yourself and check if Debian has been compromised.
You could already do that since Debian cryptographically signs all its package indexes, and the indexes contain the hash of all packages. The additional guarantee that reproducible builds bring is that you can re-build the packages in your own controlled environment and verify that the resulting package is bit-for-bit identical to what Debian offers.
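For illustration, here is a minimal sketch of pulling a package's pinned hash out of a Packages-style index stanza so a rebuilt .deb can be checked against it. The field layout follows the real apt index format, but this is not apt's own verification code, and the values in the example are made up:

```python
def sha256_from_packages_index(index_text: str, package: str) -> str:
    """Look up the SHA256 field for one package in an apt Packages-style
    index (a sequence of blank-line-separated "Key: value" stanzas).
    The signed index pins this hash, so an independently rebuilt .deb
    can be compared against it byte for byte."""
    for stanza in index_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with a space; skip them here.
            if ": " in line and not line.startswith(" "):
                key, value = line.split(": ", 1)
                fields[key] = value
        if fields.get("Package") == package:
            return fields["SHA256"]
    raise KeyError(package)
```

Since the index itself is covered by Debian's release signature, trusting one signed file transitively pins every package's contents.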
Who is this mythical end user? Reproducible builds are good for everyone - not just the average joe.