Automating some parts of packaging does not remove the important part: reviewing the code, checking for past security issues, adding sandboxing, removing privacy breaches, testing that the package works well.
Packaging for Debian is not meant to be a five-minute effort like it is on some other distributions - by design.
It takes work to make sure users can trust some software.
I don't disagree - I'm not suggesting anywhere that this be used to directly upload to unstable. There's definitely a class of users who would like the balance between freshness and quality to be closer to freshness for some packages.
There's a big warning on https://janitor.debian.net/fresh about the packages not having had human review, and the last paragraph of the blog post mentions this as well.
That said, I do believe strongly that we can let automation take care of more of the boring bits of the packaging process even for unstable, and leave the interesting bits that automation can't help with to the humans.
I hope to live long enough to see TOTAL AUTOMATION (tm) !!!
Joking aside, this is truly impressive. Even if it is just a logical step, it's great to see that a group of independent people (i.e. not paid by a single company) can coordinate at that level... (I come from an age where swapping floppy disks to share code was acceptable)
IIRC before 2007 Debian policy was that packages had to be built from the upstream tarball even if a repository existed. Thus it is likely that the info about the repository was simply not tracked. It might still be the case for many packages.
It stemmed from the fact that a release tarball is often not exactly the same as what git produces for the corresponding tag. For example, releasing a Python package may require a build step, so installing from PyPI and installing from a git repo are sometimes not the same.
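A quick way to see the difference for yourself (project name and tag hypothetical):

    # tarball synthesized by git for the v1.0 tag
    git -C foo archive --format=tar.gz --prefix=foo-1.0/ v1.0 > foo-1.0-from-git.tar.gz
    # compare it against the published release tarball
    sha256sum foo-1.0-from-git.tar.gz foo-1.0.tar.gz
    # the hashes frequently differ: release tarballs often contain
    # generated files (configure scripts, built docs, etc.) not in git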
That's still the norm! You're supposed to use the upstream tarball if one exists, and all of the tooling expects a tarball of some sort.
In fact there's an active discussion right now on debian-devel about maybe changing the norms to prefer VCS archives to upstream release tarballs (and rerunning any of the code generation steps like autoconf as part of the Debian build): https://lists.debian.org/debian-devel/2021/08/msg00133.html
... Also, the tooling not only expects a tarball, it expects every tarball with the same name to be bit-identical. So if foo 1.0 is already in Debian, you'd better not generate a slightly different foo-1.0.orig.tar.gz when you do a packaging update but you're still packaging version 1.0 of the upstream sources.
In 2007 a tool called "pristine-tar" showed up to allow you to compute just enough information to regenerate the actual foo-1.0.orig.tar.gz from the foo 1.0 VCS tag and commit that delta onto a separate VCS branch. It's a pretty crucial but extremely finicky part of most workflows involving maintaining Debian packages via a VCS.
And then it turns out that the author of pristine-tar actually was somewhat trying to "point out absurdities in the way things are done" (https://joeyh.name/blog/entry/twenty_years_of_free_software_...) and people just happily adopted the tool and made things more absurd.
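For the curious, the day-to-day pristine-tar workflow is roughly this (package name hypothetical):

    # record a small binary delta against the upstream/1.0 tag,
    # stored on a dedicated pristine-tar branch
    pristine-tar commit ../foo_1.0.orig.tar.gz upstream/1.0

    # later (possibly years later), regenerate the byte-identical
    # tarball using only what's in the repository
    pristine-tar checkout ../foo_1.0.orig.tar.gz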
I'm not using Debian that much anymore, but I'd be happy to make any changes upstream to make it easier on Debian.
When possible I try to have reproducible builds (https://reproducible-builds.org/) that respect SOURCE_DATE_EPOCH extracted from the commit datetime. At least for IPython I always build twice to make sure the sha is identical.
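Concretely, the check is something like this (a sketch assuming the Python "build" package; the actual script differs):

    # pin the embedded timestamps to the commit date
    export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
    # build the same tree twice into separate directories
    python -m build --sdist --outdir dist1
    python -m build --sdist --outdir dist2
    # the two hashes should be identical
    sha256sum dist1/*.tar.gz dist2/*.tar.gz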
I'd also be happy to have a way to signal to Debian that there is a new release instead of having the QA watch service scan things regularly.
> I'd also be happy to have a way to signal to Debian that there is a new release instead of having the QA watch service scan things regularly.
That would be great! Unfortunately the current forging model is very centralized, centered around the UX of private companies with their internal teams and little (if any) outside cooperation. That's why we need decentralized forging systems: https://staticadventures.netlib.re/blog/decentralized-forge/
There's an ongoing 5000€ bounty for implementing ForgeFed federation (based on ActivityPub) on top of Gitea, in case you know people familiar with golang/ActivityPub who'd like to help.
Also worth noting, I feel like there is room for experimenting with anti-authoritarian forging where you don't need upstream permission to receive webhooks (or other update notifications). I've started experimenting with this but it's still pre-alpha quality: https://thunix.net/~southerntofu/forge/
> All of these CI/CD platforms consider that the repository itself should contain the tasks to be run, for example in a .gitlab-ci.yml file. This top-down deployment model is well suited to an organization controlling the whole of its software supply chain, but is a severe restriction on third-party involvement, which mostly hinders volunteer-run projects.
> The forge suite adopts the opposite approach, where anyone can receive updates from remote repositories and run the tasks they wish. This allows anyone within or without your projects to set up new test suites, benchmarks, and integrations. The applications are endless and should benefit your projects in many ways.
You can actually notify the Janitor when there are new commits (and possibly tags indicating releases) in an upstream repository. The URL to target in the webhook is simply "https://janitor.debian.net/", and supported webhooks are those from GitLab, GitHub, Launchpad and Gitea.
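For example, after adding https://janitor.debian.net/ as the webhook URL in a repository's settings, the forge will POST on each push. Simulating a GitHub-style push event by hand would look roughly like this (repository URL illustrative; the payload fields are the forge's format, not something Janitor-specific):

    curl -X POST https://janitor.debian.net/ \
      -H 'Content-Type: application/json' \
      -H 'X-GitHub-Event: push' \
      -d '{"ref": "refs/tags/v1.2.3",
           "repository": {"html_url": "https://github.com/example/project"}}'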
Note that vcswatch doesn't actually scan upstream repositories, just Debian packaging repositories.
For those interested in reproducible builds, the gitian [1] project is a fairly simple VM which sets up the necessary environment for doing this sort of thing.
The tooling and community around reproducible builds is growing all the time, and imo we should be insisting on it for things such as government apps.
Building from a tarball has many advantages over building from a VCS, especially a DVCS. The main one is that tarballs are serialized, hash-able, cache-able, mirror-able, checkable objects, unlike a tag, from which some serialized version must be synthesized (and which therefore has multiple interpretations), possibly requiring multiple round-trips to a remote resource.
In my Linux distribution, if something doesn't have a release tarball then I do something similar to "pristine-tar" to create the synthetic version that's stable over the decades (since I may need to rebuild the same software 10 years into the future). This is done through an HTTP-based service so I can fetch the result as if it were a tarball, and use all of the caching (by hash, see [0]) as a result.
Release tarballs are VASTLY preferred for this reason.
For languages that try to "help" by circumventing this (making reproducible builds into the future difficult), there are special helpers which go and fetch the resources (again, as serialized, checkable, hash-able objects) and make them available.
Some projects also attempt to download things from the Internet during the "build" phase of the project, so I created a tool to drop network access during that phase as well [1].
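(For anyone wanting a quick approximation of that idea without a dedicated tool, Linux network namespaces get you most of the way:)

    # run the build step with networking unshared; only a down
    # loopback interface exists inside the namespace
    # (-r maps the current user to root so no privileges are needed)
    unshare -r -n make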
Debian's view of the package source being a fork of the original source is definitely a bit odd. Most other packaging formats see the original tarball as its own thing and then provide a recipe to build from that tarball.
A lot of Debian packages come from small projects, often ones with a single developer. Before 2007, there weren't a lot of options for public source control -- GitHub didn't exist yet, SourceForge's CVS/SVN hosting was a pain to work with, and hosting it yourself was even more of a pain. Many of these small projects were probably only available as release tarballs, so that's what Debian built from.
I guess it depended on which part of the open source crowd you were/are involved with. Pretty much every project I cared about back then, as far back as the mid- to late-90's, had propped up a self-hosted VCS server. I used to run a personal svn server which wasn't that big a deal to set up and nearly zero effort to maintain. Where svn fell apart was the distributed part of the equation: it worked fine for the core developers who had commit access, not so great for anyone else who wanted to contribute back. Even today there are a number of projects that are still resistant to git (i.e. git, not GitHub) and clinging to svn.
I'm thinking of the projects that weren't even big enough to have multiple contributors. Nowadays, it's typical for most software authors to have their source code in version control and host it somewhere, regardless of whether there's anyone besides them committing to it. In the mid-2000s, you wouldn't typically go to that effort unless you had multiple active contributors. Single-developer projects, and projects with only occasional outside contributors, wouldn't typically be in public VCS.
I think what this graph shows is, contrary to what the text suggests, VCS use for the Debian packaging itself. There was no standardized way to declare that until late 2006, and it was officially documented only in 2008:
Debian packaging is its own form of source control: there's a debian/changelog file with versions, and every time you submit something new to the Debian archive, it has to be a new version with a new changelog entry.
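A changelog entry looks roughly like this (package name, version, and bug number invented):

    foo (1.2.3-2) unstable; urgency=medium

      * Apply upstream fix for crash on empty input. (Closes: #987654)

     -- Jane Developer <jane@debian.org>  Mon, 16 Aug 2021 12:34:56 +0200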
Debian also has a strong individual ownership culture of packages, and it was even stronger then. Each package has a listed maintainer (nowadays it's often a team, but there isn't an "everyone" team short of the Debian QA Group, which is who orphaned packages get reassigned to). While every Debian developer has technical access to upload every package, it's strongly socially frowned upon to upload someone else's package. So under that model, you don't really need the collaboration or concurrent-editing features of source control systems.
And yes, I remember that in about 2007 very few Debian packages had their packaging in any sort of standard version control tool (CVS, SVN, etc.) You'd get the sources via apt-get source, and then if you wanted to contribute, you could send in a patch by email to the bug tracker and the maintainer would incorporate it.
Of course there are a whole lot of other quality-of-life features that real version control systems get you, which is why people ended up adopting them. But it's not like there was a shared directory on a server somewhere that every Debian developer just edited in place or whatever.
> While every Debian developer has technical access to upload every package, it's strongly socially frowned upon to upload someone else's package.
I wonder how much malware is in there, that we haven't found, because of this. I'm willing to bet there is some in there, especially since at this point Debian is for sure targeted by professionals.
Well, first off, if you're providing all the software for medical devices or an airport, you probably don't care about everything in the OS. You care about the software going into your platform, and from a minimal-trusted-base / least-privilege standpoint, you probably want to minimize the amount of software you use to the extent possible, whether or not you review it. So it's unlikely they're auditing the entire distro.
Second, if these companies are in fact auditing the code (which is a lot of code!), as opposed to just selling insurance and hoping for the best, that means that
a) it's some employee's job to spend a portion of their time reviewing the code
b) when they find issues, they report those bugs somewhere, so that the bug gets fixed
Can you point to either a job posting that lists reviewing Debian packages as part of the job requirements (or equivalently, someone's résumé/LinkedIn that says they did this work), or a bug report from one of these auditors?
Some commercial distros and various internal-only distros have a legal team to do license review. That's just the first step.
You can easily find jobspecs for security analysts in tech companies, or system engineers hired to handle the software lifecycle.
I myself opened the kind of bug reports you are mentioning, and security advisories. I cannot name companies and colleagues, obviously, otherwise I would have done it already. The companies I work[ed] for rebuild entire ecosystems of packages and do legal review, but don't do security audits on things like games.
As a developer of an application, I have no special relationship with Debian. It’s nice they exist, and I do recognize the important role they’ve played in Linux, but still as a maintainer & author I don’t care about them and their practices and requirements any more than Homebrew, or a bunch of other Linux distros/packaging projects. Is what you wrote consistent with what I just wrote?
Debian is almost like a trusted retail shop. They mediate between you (the producer of the software) and me (the consumer of the software). The quality control they are responsible for does make them a trusted "shop" for the users - for example, one believes that using Debian packages protects against malware.
It's natural the user cares more about this system than the producer - for the producer it looks like hassle and a middleman.
(Sidenote: Sometimes I'm the producer side of debian packages too.)
> Sometimes I'm the producer side of debian packages too.
But my point is there's no such thing as creating a "debian package" from the producer side. We create software to be used on certain operating systems; perhaps only unix-like OSs. But we don't produce software "for debian".
I do not think that what you wrote and what I wrote intersect in any way, to the point where I'm wondering if you replied to the wrong comment. (So, yes, they're consistent.)
I'm discussing how Debian handles versioning for the packaging scripts/metadata, which are effectively a very small software project that depends on your project. Those practices are no more relevant to you than some third-party software project that compiles yours and uses it as a library, or some Ansible playbook that installs your project, or whatever. You might be interested in the existence of those projects and might want to address bugs they find, but whether they use version control is quite irrelevant to you.
I did mean to reply to your comment, but sorry if I've conflated packaging and application development. It was this sentence that confused me most:
> in about 2007 very few Debian packages had their packaging in any sort of standard version control tool (CVS, SVN, etc.) You'd get the sources via apt-get source, and then if you wanted to contribute, you could send in a patch by email to the bug tracker and the maintainer would incorporate it.
Here I think you're using "contribute" to refer to contributing to the application codebase, not the packaging codebase. But why would contributing to the application codebase have anything to do with Debian technology such as apt-get source and a debian bug tracker, and why would the maintainer pick up patches from such places? Surely, to contribute to the application you go to the application homepage (often Github nowadays) and open a PR?
No, I mean contributing to the packaging codebase. That graph is from https://trends.debian.net/#version-control-system , which has information about Debian packaging codebases ("source packages").
If anyone other than the package's maintainer / member of the maintaining team (whether a fellow Debian developer, an upstream author/contributor, or a user) has a contribution to make to the Debian package, they should contact the Debian maintainer. In 2007, for almost every package, the only way to do that would be to send the maintainer a patch file, generally through the Debian bug tracking system. (Nowadays, the majority of Debian packaging repos are on https://salsa.debian.org, a GitLab instance, and you can usually open a merge request there and maintainers will notice and accept them.)
The Debian packaging repo consists of, effectively, a build recipe and possibly pre/post install/remove scripts to run on the end user's system. (In practice there are common tools to do the build and to generate those scripts, so you'll find a lot of files to influence the behavior of such tools.) It can also consist of extra changes added by Debian, such as missing manpages or translations, or even patches to the code.
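For a typical package, that repo boils down to a debian/ directory along these lines (the postinst entry is a hypothetical example; most packages don't need one):

    debian/
      changelog     # version history; drives the package version
      control       # package metadata and dependencies
      copyright     # licensing information
      rules         # the build recipe (a makefile)
      patches/      # Debian-specific changes to the upstream source
      foo.postinst  # script run on the user's system after install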
If you have a change to make to a piece of software packaged in Debian, but it's unsuitable for the upstream repo, you can send it to the Debian maintainer and ask them to include it in the packaging repo.
(It also used to be a vague norm that you would always contact the Debian maintainer, who would forward contributions onto upstream if suitable. It's questionable whether that was the right model, and it's certainly not the norm today.)
> The QA watch service regularly polls all watch locations in the archive and makes the information available, so it’s possible to know which packages have changed without downloading each one of them.
Is there a way to notify the watch service of a new version, to avoid (or decrease the frequency of) regular polling?
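(For reference, a "watch location" is the package's debian/watch file; a minimal one looks something like this, with a hypothetical project URL:)

    version=4
    https://github.com/example/project/tags .*/v?(\d\S+)\.tar\.gz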
> Of course, since these packages are built automatically without human supervision it’s likely that some of them will have bugs in them that would otherwise have been caught by the maintainer.
Human supervision isn't enough to protect the supply chain, and I can't think of a time that it's actually stopped an attack at the packaging stage, but having some extra "friction" in the process seems like it should be a benefit. Ideally an attacker would have to get past both the upstream author and the Debian maintainer, rather than these being two separate single points of failure.
Fortunately the Debian project is improving the situation with regards to supply chain attacks by continuing to work on Reproducible Builds. I think the next step from there needs to be Binary Transparency, with the adoption of the sort of approach being trialled by Arch Linux:
Oh cool, I didn't know about this. It's always been something I've been concerned about, but wasn't sure how to address. I like their summary:
> Cryptographic signatures prove that a message originates from somebody with control over the private key, but it's impossible to prove that the private key didn't sign additional messages. In update security this means a signed update is a strong indicator the update is authentic, but you can't be sure this update is the same update everybody else got. Somebody in control of the update key could craft a malicious update, sign it and feed it specifically to you.
To alleviate that concern a bit, Debian's apt can be configured to fetch updates via Tor, so that it becomes (more) difficult to send you, specifically, a doctored update.
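(With the apt-transport-tor package installed, it's a one-line change to the sources list:)

    # /etc/apt/sources.list, with apt-transport-tor installed
    deb tor+https://deb.debian.org/debian unstable main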
My suggestion would be that enabling automatic package updates should require a public key that the author uses to sign their tarballs, or at least check in a signature to the repository if not using tarballs.
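In practice that's the standard detached-signature dance (filenames hypothetical):

    # author, at release time:
    gpg --armor --detach-sign foo-1.0.tar.gz    # writes foo-1.0.tar.gz.asc

    # packaging side, before accepting the new version:
    gpg --verify foo-1.0.tar.gz.asc foo-1.0.tar.gz

Debian's uscan can already do the verification half automatically when the packaging ships the upstream key in debian/upstream/signing-key.asc.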
The only repo I manage that's packaged in Debian is used by the packager as a base, but they work directly off a fork (https://github.com/kilobyte/3270font) of my repo (https://github.com/rbanffy/3270font) to generate packages. As far as I understand, Adam publishes many other packages and has an automated workflow for that. They end up directly in Sid.
It seems like a logical option to eventually offer if it doesn't already (which I presume it doesn't, or the post would have mentioned it) - [1] is a post about the bot cutting PRs when it finds other things it thinks it knows how to correct.
Ian Murdock founded Debian, but he shifted his efforts to Alpine Linux while working at Docker. I personally find Alpine and Arch packaging so much easier overall.