dotwaffle 19 hours ago

I have done extensive research on CDC, and it almost never works out, because most utilities don't create compressed archives in an "rsyncable" format (rsync does CDC). I actually saved a lot of storage using restic when I switched my backups of certain things so that files were stored in archives uncompressed and sorted in a stable order. I know syncthing eventually removed CDC and just went with constant-size blocks.

Bazel, on the other hand, is completely in control of this, and it makes perfect sense to do this at that point -- and it seems to be a relatively efficient implementation too, really nice to see!
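
To make the "rsyncable" point concrete, here is a rough Python sketch of content-defined chunking. It is not how restic, rsync, or Bazel actually implement it: the gear-style rolling hash, the size limits, and every name in it are assumptions picked for illustration. It compares how much of a file can be reused after ten bytes are inserted at the front, once with constant-size blocks and once with CDC.

```python
import hashlib

# All sizes here are illustrative assumptions, not any tool's defaults.
MIN_SIZE = 1024        # never cut a chunk shorter than this
MAX_SIZE = 16 * 1024   # always cut once a chunk reaches this
AVG_BITS = 12          # cut when the low 12 hash bits are zero

# One fixed pseudo-random 64-bit value per byte value (gear-style table).
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:8], "big")
        for b in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data):
    """Cut where a rolling hash of the most recent bytes hits a bit pattern."""
    cut_mask = (1 << AVG_BITS) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64  # older bytes shift out of the hash
        size = i - start + 1
        if size < MIN_SIZE:
            continue
        if (h & cut_mask) == 0 or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data, size=4096):
    """Constant-size blocks, cut purely by offset."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_fraction(old, new):
    """Fraction of the new version's bytes that sit in chunks the old one has."""
    known = {hashlib.sha256(c).digest() for c in old}
    reused = sum(len(c) for c in new if hashlib.sha256(c).digest() in known)
    return reused / sum(len(c) for c in new)


# 200,000 bytes of deterministic pseudo-random data standing in for a file.
base = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest()
                for i in range(6250))
edited = b"0123456789" + base  # ten bytes inserted at the very front

fixed_reuse = shared_fraction(fixed_chunks(base), fixed_chunks(edited))
cdc_reuse = shared_fraction(cdc_chunks(base), cdc_chunks(edited))
print(f"fixed-size blocks reused: {fixed_reuse:.0%}")
print(f"CDC chunks reused:        {cdc_reuse:.0%}")
```

The insertion shifts every fixed-size block, while the CDC boundaries depend only on nearby content and so line up again shortly after the edit. Compressing the whole archive first would scramble the bytes after the edit, which is why the uncompressed, stably sorted layout matters.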

a_t48 1 day ago

This is something I'm very interested in implementing for Docker builds. I've tested out CDC for the final image outputs: it results in smaller outputs, but requires tuning the tradeoff between saved space and request count when pulling. For the build cache it might be even more advantageous.
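
A rough Python sketch of that tuning tradeoff, under loud assumptions: the gear-style chunker, the min/max sizes derived from the target size, the synthetic "layer", and all names are made up for illustration and are not taken from any registry or pull client. It chunks two versions of a 256 KiB blob at three target chunk sizes and reports how many chunks a cold pull needs versus how many bytes an update needs.

```python
import hashlib

# One fixed pseudo-random 64-bit value per byte value (gear-style table).
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:8], "big")
        for b in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data, avg_bits):
    """Gear-style CDC; min and max sizes are derived from the target size."""
    target = 1 << avg_bits
    min_size, max_size = target // 4, target * 8  # assumed ratios
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        if size >= min_size and ((h & (target - 1)) == 0 or size >= max_size):
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def pull_cost(old, new, avg_bits):
    """Return (cold-pull requests, update requests, update bytes)."""
    have = {hashlib.sha256(c).digest() for c in cdc_chunks(old, avg_bits)}
    new_chunks = cdc_chunks(new, avg_bits)
    missing = [c for c in new_chunks
               if hashlib.sha256(c).digest() not in have]
    return len(new_chunks), len(missing), sum(len(c) for c in missing)


# Synthetic 256 KiB "layer"; v2 flips one byte every 32 KiB.
layer_v1 = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest()
                    for i in range(8192))
patched = bytearray(layer_v1)
for pos in range(16 * 1024, len(patched), 32 * 1024):
    patched[pos] ^= 0xFF
layer_v2 = bytes(patched)

results = {bits: pull_cost(layer_v1, layer_v2, bits) for bits in (10, 12, 14)}
for bits, (cold, update, nbytes) in results.items():
    print(f"target {1 << bits:>6} B: cold pull {cold:>4} requests, "
          f"update {update:>3} requests / {nbytes:>7} bytes")
```

Smaller targets shrink the bytes fetched on an update but multiply the requests for a cold pull, so the right setting depends on per-request overhead and how often images change.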

  • stabbles 16 hours ago

    Isn't that rather difficult given the `.tar.gz` layers?

    • tracnar 16 hours ago

      It also supports `.tar`, but that's probably not very commonly used.

    • auscompgeek 16 hours ago

      In theory eStargz layers should be amenable to CDC.

      • a_t48 12 hours ago

        It feels that way, but eStargz is still only addressable as a single layer, or as a byte range within one.

    • a_t48 12 hours ago

      I have a custom pull client/registry/builder that uses a different format, but can output standard OCI if needed.

londons_explore 1 day ago

Doesn't this mean that malicious inputs can deliberately cause super tiny or super huge chunks?

  • ramchip 1 day ago

    The same is true without CDC, and you can configure a maximum size.

  • rienbdj 20 hours ago

    Bazel caches tend to have a size limit.

    You need to trust your build execution machines anyway. They have your source code and you will be executing the artifacts that they produce!