Breaking Changes in Python 3.14
One of the reasons Linux is so successful is Torvalds's insistence on never breaking user space.
The Python people don't adhere to this principle, and maybe I need to give up on it; I'm just sick of this crap.
Those of us of a certain vintage will recall the tumult of Python 2 to 3. Suddenly, production code needed significant re-writing. We got an arguably better Python out of it, but oh! the pain.
In 3.14, the Python developers decided (among many other things) to make 'forkserver' the default start method on POSIX instead of 'fork' (this governs how child processes are launched by Process() - https://docs.python.org/3/library/multiprocessing.html). Why on earth break our code in such a wanton way? Why not leave the default alone - there was always the option to use 'forkserver' if one wanted it. Or maybe they could have created a new entry point with the new behaviour, Process_forkserver() or some such? Oh no! Just break it and make their customers patch furiously!
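For concreteness, you can see which default you've got straight from the library (a minimal sketch; the output depends on platform and Python version):

```python
import multiprocessing as mp

# On POSIX this printed 'fork' up through Python 3.13;
# on Python 3.14 it prints 'forkserver'.
print(mp.get_start_method())
```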
When we adopt a language, we like to think that what runs today will run tomorrow - C and bash programs that I wrote 30 years ago still run. Not with Python - if you use it, buckle up and make sure your regression tests are thorough; it'll be a rough ride.
Move slow and break things, perhaps?
Why do you consider using forkserver as the default a breaking change? Shouldn't this be transparent to most users?
fork() is very hard to get right, as it keeps a lot of state around from the parent process that you have to cleanly discard...
This is far from transparent. It's a massive change in behavior. Fork may be very hard to get right, but every Python developer using multiprocessing has already paid that cost - and expects it to keep working!
With fork, you could pass objects that couldn't be pickled (lambdas, local functions, file handles, database connections). With forkserver, everything must be pickleable. That alone breaks thousands of repos of code.
You can no longer rely on module-level global objects being inherited by the child process, which fundamentally changes how state moves between parent and workers.
It launches a server with some extra machinery at runtime - startup cost and hidden complexity just snuck into your app without you knowing.
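A minimal sketch of the pickling point (POSIX-only, since the 'fork' context isn't available everywhere):

```python
import multiprocessing as mp

if __name__ == "__main__":
    # Under 'fork' the child inherits the parent's memory wholesale,
    # so the target never has to be pickled - a lambda works fine.
    fork_ctx = mp.get_context("fork")
    p = fork_ctx.Process(target=lambda: print("hi from fork"))
    p.start()
    p.join()

    # Under 'forkserver' the Process object is pickled and shipped to
    # the server process, and lambdas aren't picklable.
    fs_ctx = mp.get_context("forkserver")
    p = fs_ctx.Process(target=lambda: print("hi from forkserver"))
    p.start()  # raises: cannot pickle the lambda
```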
forkserver may be technically a better choice. But that's irrelevant. Changing the default breaks existing code.
> That alone breaks thousands of repos of code.
1. Can you point at any?
2. The fork option didn't disappear. It takes a single line of code to reconfigure it, and code that depends on the fork behaviour is better off for making that fact explicit.
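The single line, for reference - set it once, early, before any Process is created:

```python
import multiprocessing as mp

if __name__ == "__main__":
    mp.set_start_method("fork")  # restore the pre-3.14 POSIX default
    # ... rest of the program runs unchanged ...
```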
Eh, not letting the language ever evolve is a sure way to death, ymmv.
Forkserver is probably a better default: inheriting file handles, globals, and sockets leads to a bunch of subtle bugs - I'm not sure that's even a good feature, also ymmv.
And fork() is still available, so if the change breaks things, the solution is to explicitly ask for fork() - and I'd say for most casual uses of multiprocessing, a user won't know one way or the other, which is what I meant by transparent.
> The Python people don't adhere to this principle
It's harder to "not break user space" when you don't have a "kernel space". No, the C API does not count.
> Those of us of a certain vintage will recall the tumult of Python 2 to 3. Suddenly, production code needed significant re-writing. We got an arguably better Python out of it, but oh! the pain.
I have been using Python since 2005. I recall code that required minor edits, many of which could be and were automated using CPython's own provided tools. I remember the joy of getting text files that could trivially handle a file encoding and universal newline support at the same time, and binary files that were actually binary, supplying a fundamentally distinct type of data. I remember no longer having to worry about getting `UnicodeDecodeError` from an encoding operation (https://stackoverflow.com/questions/9644099) or vice-versa (https://stackoverflow.com/questions/5096776) and no longer having to explain that nonsense to people. I remember the realization that beginners would no longer be taught to introduce arbitrary code execution exploits into their programs on the first day of lessons.
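For anyone who missed that era: in Python 2, mixing the two string types triggered an implicit ASCII coercion, which is why an encode could raise a decode error and vice versa. Python 3's split is clean (a quick illustration):

```python
text = "café"                  # str: abstract text
data = text.encode("utf-8")    # bytes: concrete encoded data

assert data == b"caf\xc3\xa9"
assert data.decode("utf-8") == text

text + data  # TypeError: can only concatenate str (not "bytes") to str
```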
We got an inarguably, massively better Python out of it.
> C and bash programs that I wrote 30 years ago still run.
Because they are very nearly self-contained. Your life will be happier if you think of non-essential parts of the Python standard library (like `sys` and `os`) as if they were third-party.
> the Python developers decided to make 'forkserver' the default start method on POSIX instead of 'fork'.... Why on earth break our code in such a wanton way?
Everyone always seems to take this sort of thing extremely personally for some reason.
The entire point of default arguments is to reflect what most users are most likely to want. Sometimes that changes. The same document you link clearly explains the reason for the change:
> Changed in version 3.14: On POSIX platforms the default start method was changed from fork to forkserver to retain the performance but avoid common multithreaded process incompatibilities. See gh-84559.
And there is a link to the bug tracker where the OP explains what's wrong with `fork` on POSIX:
> By default, multiprocessing uses fork() without exec() on POSIX. For a variety of reasons this can lead to inconsistent state in subprocesses: module-level globals are copied, which can mess up logging, threads don't survive fork(), etc.
and later:
> Given people's general experience, I would not say that "fork" works on Linux either. More like "99% of the time it works, 1% it randomly breaks in mysterious way".
Note that the discussion for this goes back to 2020.
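For those who haven't been bitten, the threads issue looks roughly like this (a POSIX-only sketch: the forked child inherits a copy of a held lock, but not the thread that would release it):

```python
import multiprocessing as mp
import threading
import time

lock = threading.Lock()

def background():
    with lock:
        time.sleep(5)  # hold the lock across the moment of fork()

def child():
    with lock:  # the child's copy of the lock stays locked forever
        print("never reached")

if __name__ == "__main__":
    threading.Thread(target=background, daemon=True).start()
    time.sleep(0.2)  # make sure the thread has taken the lock

    p = mp.get_context("fork").Process(target=child, daemon=True)
    p.start()
    p.join(timeout=2)
    print("child deadlocked:", p.is_alive())  # True
    p.terminate()
```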
> Not with Python - if you use it, buckle up and make sure your regression tests are thorough; it'll be a rough ride.
You have not even attempted to explain how this broke your program. For many other users, it demonstrably fixed issues.
In the time that you spent posting this, you could have trivially just added `multiprocessing.set_start_method("fork")` to your code; or taken more drastic action by, for example, vendoring the old standard library package (it's under a permissive license and is, as far as I can tell, pure Python - mainly layers of wrappers around `subprocess.Popen` and `os.fork` and such).
Aside from which, if you have a proper reproducer for something that works with 'fork' but not with 'forkserver', the proper place for your complaint is the issue tracker.
> buckle up and make sure your regression tests are thorough
Good advice in every programming language. Especially if you want to avoid NIH.
What really annoys me about Python is that a lot of the problems the language/infrastructure/community has can easily be traced back to well-understood problems that other communities solved decades ago. Some of these problems have been fixed with breaking changes; some others probably never will be.
Just a list of bad/wrong decisions IMHO:
- Reference counting instead of using a real garbage collector
- The pyproject.toml format is under-specified and comes decades too late, for a problem that Apache Maven solved well enough more than two decades ago
- The absolutely weak support for functional programming, which was then patched over later by list comprehensions and co.
- venv and other solutions to isolate dependencies for a project
Python is successful because of the community support and bindings to almost everything, and this sadly outweighs a lot of poor choices by the language designers and the implementation. I just always feel frustrated that, during the great breakup from 2 to 3, they didn't try to fix more of the known issues (which, again, other communities had solved decades before) instead of breaking the world and still managing to half-ass it.
> I just always feel frustrated that, during the great breakup from 2 to 3, they didn't try to fix more of the known issues
Absolutely agreed. I would rather they had left 2.x in someone else's hands entirely. Instead they were ultimately forced to produce 2.7, wheedle and bargain to not have to produce 2.8, and give 2.7 an extended support window instead; and they still got complaints and abuse from people upset about having to make changes that were generally quite easy to make.
I would still have completely switched to 3.x (at least when it appeared ready) simply based on it coming from the same team and advertising the main features and fixes that it ended up with.
> Reference counting instead of using a real garbage collector
This is not explicitly part of the design. You are free to use other implementations.
But before the `with` block was added (in 2.5), reference counting was useful for deterministic cleanup.
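A sketch of the contrast - the refcount-driven close is a CPython implementation detail, not a language guarantee (`example.txt` is a stand-in here):

```python
# Pre-2.5 idiom: lean on reference counting for cleanup. Works on
# CPython because the refcount hits zero at `del`, but PyPy and
# friends may close the file much later, whenever their GC runs.
f = open("example.txt")
data = f.read()
del f

# Since 2.5: deterministic cleanup on every implementation.
with open("example.txt") as f:
    data = f.read()
# the file is guaranteed closed here
```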
> The absolutely weak support for functional programming, which then was patched by list comprehensions and co later
I'll give you this one. GvR actually didn't like this stuff very much, which is how `reduce` lost its builtin status. The basics can take you really far, though. The important thing is that functions are first-class objects.
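The basics in question, for the record - `reduce` survives in `functools`, and comprehensions cover most of the old map/filter territory:

```python
from functools import reduce  # demoted from builtin status in Python 3

nums = [1, 2, 3, 4, 5]

total = reduce(lambda acc, n: acc + n, nums, 0)     # 15
even_squares = [n * n for n in nums if n % 2 == 0]  # [4, 16]

# Functions are first-class objects: store, pass, and return them.
def compose(f, g):
    return lambda x: f(g(x))

inc_then_double = compose(lambda x: 2 * x, lambda x: x + 1)
assert inc_then_double(3) == 8
```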
> venv and other solutions to isolate dependencies for a project
There is nothing fundamentally wrong with venvs. 99% of the complaints I hear about venvs are not actually the venv's fault. (They're usually pip's fault. Pip is bad.)
A venv is literally just a place to put packages (organized roughly the same way the system packages are) and a simple scheme for enabling Python to understand where that place is (so that `sys.path` can be modified at startup). Every other possibility has the same problem: either you have to learn one or more options for telling Python to use that package location, or it works by magic (which means eventually you have to understand the magic, and anyway "Explicit is better than implicit").
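You can watch the entire "magic" from inside the interpreter (assuming a venv created with `python -m venv .venv` and run via its own `python`):

```python
import sys
import sysconfig

print(sys.prefix)       # .../your-project/.venv - the venv itself
print(sys.base_prefix)  # the interpreter the venv was created from

# The "place to put packages": this directory is what ends up on
# sys.path, and it's where `pip install` drops things.
print(sysconfig.get_path("purelib"))
```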
> The pyproject.toml format is under-specified, comes decades too late for a problem that has been solved good enough by Apache Maven more than 2 decades ago
pom.xml is not at all "good enough". The pyproject.toml format is actually quite detailed (https://packaging.python.org/en/latest/specifications/pyproj...). It also has explicit design goals of a) allowing arbitrary tools to store their config and b) being human-readable and -writable.
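For scale, a complete, buildable `pyproject.toml` for a pure-Python package can be this small (names and versions are illustrative):

```toml
[build-system]
requires = ["flit_core>=3.4,<4"]
build-backend = "flit_core.buildapi"

[project]
name = "example-package"  # hypothetical project
version = "0.1.0"
description = "A pure-Python example"
dependencies = ["requests>=2.31"]

# design goal (a): arbitrary tools get their own namespaced tables
[tool.black]
line-length = 100
```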
More importantly, Python didn't have something like Maven in 2004, when Maven appeared, because Maven only has to worry about Java code; the Python packaging system has to worry about potentially every programming language, and very commonly C, C++, FORTRAN and now Rust. In 2000 (with Python 2.0) it got `distutils`. That was designed (and Setuptools picked up the torch) to use code in `setup.py` for everything, because nobody had any concept of a real "ecosystem" yet (see if you can find some archives of the Cheeseshop from its first year, if you dare), and it was simple and seemed Pythonic at the time.
And now we're in a thread where OP is upset that Python code doesn't keep working for 30 years, which seems like a pretty common take. It turns out that accommodating backwards compatibility gets in the way of actually fixing old problems quite a bit. There are still projects out there that are just Python code, could be perfectly well built using Flit, and yet are using (slow!) legacy setup.py-based build setups. Not because there's anything wrong with pyproject.toml for their purposes (there absolutely isn't), but because nobody is forcing them.
People don't even keep their setup.py contents up to date, let alone pay attention to any kind of deprecation warnings. Literally the only thing that gets anyone to do anything is to break their setup, and then for most of them their action is to complain and make the Setuptools team put things back how they were.
I'm not just talking about random developers on tiny forgettable projects, either. I'm talking about PSF-managed projects like Requests that:
* are pure Python code
* are among the most popular packages on PyPI (Requests is top ten)
* had their "source builds" break this year (https://github.com/psf/requests/issues/6775) because of a removed import in setup.py for something that nobody has actually used for years (a legacy testing setup that has been completely abandoned since the project uses tox)
* already had a `pyproject.toml` file but weren't actually using it to store even basic build configuration (still in `setup.cfg`, so they had all three files for really no reason).
(Thankfully, something is actually getting done in Requests. Supposedly. The work for this was apparently "already done" in May 2024 and is supposed to appear "in the next minor version release". The last release was a patch release this August. The main branch of the repo doesn't reflect this work.)
What's worse is that Python libraries like pandas do that too.
The more I use Python in a professional environment the less I think it is suited for professional environments.
If you want things to stay as they are, just pin your versions.
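For example (versions below are illustrative; generate real ones with `pip freeze`):

```
# requirements.txt - exact pins only
requests==2.32.3
pandas==2.2.2
```

Pin the interpreter too (e.g. a `.python-version` file if you use pyenv or uv), and nothing underneath you moves until you decide it should.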