I had multiple bigger production issues which had been caught by tests >1 month before they happened in production, but were written off as flaky tests (ironically this was also not related to any random test data but more to load/race-condition things, which failed when too many tests that created full separate tenants for isolation happened to run at the same time).
And in some CI environments flaky tests are too painful, so using "actual" random data isn't viable and a fixed seed has to be used on CI (that is, if you can, because too many libs/tools/etc. do not allow that), at least for "merge approval" runs. That many CI systems suck badly the moment your project and team size isn't around the size of a toy project doesn't help either.
Flaky tests are a very strong signal of a bug, somewhere. Problem is it's not always easy to tell if the bug's in the test or in the code under test. The developer who would rather re-run the test to make it pass than investigate probably thinks it's the test which is buggy.
Can't one get randomness and determinism at the same time? Randomly generate the data, but do so when building the test, not when running the test. This way something that fails will consistently fail, but you also have better chances of finding the missed edge cases that humans would overlook. Seeded randomness might also be great, as it is far cleaner to generate and expand/update/redo, but still deterministic when it comes time to debug an issue.
Most test frameworks I have seen that support non-determinism in some way print the random seed at the start of the run, and let you specify the seed when you run the tests yourself. It's a good practice for precisely the reasons you wrote.
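In Python that convention might look like this (`TEST_SEED` is an assumed variable name for this sketch, not any framework's standard):

```python
import os
import random

# Seed from the environment if given, otherwise pick one and print it,
# so any failure can be replayed with TEST_SEED=<value>.
seed = int(os.environ.get("TEST_SEED", random.randrange(2**32)))
print(f"TEST_SEED={seed}")
rng = random.Random(seed)

def test_add_commutes():
    x, y = rng.randint(-10**6, 10**6), rng.randint(-10**6, 10**6)
    # Put the seed in the assertion message too, for good measure.
    assert x + y == y + x, f"failed for x={x}, y={y} (TEST_SEED={seed})"
```

Pytest plugins like pytest-randomly do essentially this for you.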
There's another good reason that hasn't been detailed in the comments so far: expressing intent.
A test should communicate its reason for testing the subject, and when an input is generated or random, it clearly communicates that this test doesn't care about the specific _value_ of that input, it's focussed on something else.
This has other beneficial effects on test suites, especially as they change over the lifetime of their subjects:
* keeping test data isolated, avoiding coupling across tests
* avoiding magic strings
* and as mentioned in this thread, any "flakiness" is probably a signal of an edge-case that should be handled deterministically
and
* it's more fun [1]
Generate fuzz tests using random values with a fixed seed, sure, but using random values in tests that run on CI seems like a recipe for hard-to-reproduce flaky builds unless you have really good logging.
Arguably you should have a fixed start date for any given test, but time is quite hard to abstract out like that (there are enough time APIs that you'd really want OS support, but Linux for example doesn't support clock namespaces for the realtime clock, only a few monotonic clocks)
I mean, sure, that can happen, but that obviously depends on what the test is testing, it's not like it's bad in all cases to say "now plus 1 year". In the case in question it's really just "cookie is far enough in the future so it hasn't expired", so "expire X years in the future from now" is fine.
Now I feel bad for using (system foundation timestamp)+100 years as end of "forever" ownership relations in one of my systems. Looking now, it's only 89 years left. I think I should use nulls instead.
https://factorio.com/blog/post/fff-388 - they wanted to use a 64 bit int for the tick count, but Lua doesn't have one; so they used the one available and worked out when it would lose precision.
"More than 2 million years seems to be enough for us to not be around any more when the bug reports start appearing."
I once saw a pop-up in a game saying something along the lines of: wow, it's 10 years later and this game is still being played! Made me laugh out loud, nice little easter egg.
Sadly, I don't recall which game it was. Maybe SpaceChem?
I've got some tests in active code bases that are using the end of 32-bit Unix time as "we'll never get there". That's not because the devs were lazy, these tests date from when that was the best they could possibly do. They're on track to be cycled out well before then (hopefully this year), so, hopefully, they'll be right that their code "won't get there"... but then there's the testing and code that assumes this that I don't know about that may still be a problem.
"End of Unix time" is under 12 years now, so, a bit longer than the time frame of this test, but we're coming up on it.
I seem to recall much smugness on Slashdot around the "idiot winblows users limited by DOS y2k" and how the time_t was "so much better". Even then a few were prophesying that it would come bite us eventually ...
While there was a lot of FUD in the media, there were also a lot of scenarios that were actually possible but were averted due to a LOT of work and attention ahead of time. It should be looked at, IMO, as a success of communication, warnings, and a lot of effort that nothing of major significance happened.
"Tragically, we are failing to avoid serious impacts"
"We have now brought the planet into climatic conditions never witnessed by us or our prehistoric relatives within our genus, Homo"
"Despite six IPCC reports, 28 COP meetings, hundreds of other reports, and tens of thousands of scientific papers, the world has made only very minor headway on climate change"
"projections paint a bleak picture of the future, with many scientists envisioning widespread famines, conflicts, mass migration, and increasing extreme weather that will surpass anything witnessed thus far, posing catastrophic consequences for both humanity and the biosphere"
I don't mean to lessen the impact of that statement. I think climate change is a serious problem. But also most of the geologic time that genus Homo has existed, Earth has been in an ice age. Much of which we'd consider a "snowball Earth". The last warm interglacial period, the Eemian, was 120,000 years ago.
this is the same style comment as "no offense, but <offensive thing>"
if you didn't intend to lessen the impact of that statement, why say something that is specifically meant to lessen the impact of the statement? just say what you want to say without the hedging.
What you just wrote is the same as: 'the entire lifecycle of humanity has no precursor to the conditions' we are about to face.
We aren't facing the ice age that has been the last 120,000 years.
I'm sure the rocky planet will survive just fine, maybe even some extremophiles, even if we completely screw up the atmosphere. Not 6 billion humans though.
I can both be alarmed at how quickly the ice age humanity has evolved within is ending, and find that a very funny way of phrasing it. These things don't conflict in me, though it seems triggering to some. People are downvoting me with moral conscience, but I'm just over here laughing at a funny conjunction of paleoclimate and word choice. :) People getting offended by it kinda makes it funnier.
That's an interesting bit of detail. As you intended, it does not lessen the impact of the statement: "conditions never witnessed by us or our prehistoric relatives". It confirms it, with some additional context.
To me, it seems to make it even more significant. Because as you point out, Homo evolved under ice age conditions over millions of years. Well, here we are about to be thrust into uncharted territory, in an extremely short period of time. With very fragile global interdependencies, an overpopulated planet, and billions of people exposed to the consequences.
Right? I would only caution that neither has the ice age been particularly kind to humanity. It seems at least a couple times to have almost gotten us all. There's a genetic bottleneck in genus Homo which seems to date back ~80k years, which aligns suspiciously with the Toba supervolcano eruption. And another around 850k years ago. During each there were likely fewer than 2,000 breeding humans.
Earth has certainly thrived with a warmer climate. No reason we can't too. The problems - for us and other life - stem from the rate of change. Which is easy to see is very very rapid compared to the historical cycles, but still a slow motion trainwreck compared to an asteroid strike, supervolcano, or gamma ray pulse, all of which it seems Earth has experienced. Life and human society will adapt if it has enough time. The quicker the catastrophe the more challenging that is.
I guess what I'm saying is that we're not doing ourselves any favors, but we also shouldn't underestimate mother nature's ability to throw us a curve ball in the 9th inning that makes everything worse. Life has endured an awful lot on this little rock.
> At the Great Midnight at the century's end, signifying culture will flip over into a number-based counterculture, retroprocessing the last 100 years. Whether global disaster ensues or not, Y2K is a singularity for cybernetic culture. It's time to get Y2K positive.
As others have stated, the lack of visible effect is not the same thing as there never having been a land mine in the first place.
I can tell you anecdotally that on 12/31/1999 I was hanging with some friends. At midnight UTC we turned on the footage from London. At first it appeared to be a fiery hellscape armageddon. While it turned out to just be fireworks with a weird camera angle, there was a moment where we were concerned something was actually happening. Most of us in the room were technologists, and while we figured it'd all be no big deal, we weren't *sure* and it very much alarmed us to see it on the screen.
Most updates to avoid the 2038 problem really just delay it until 10889. Maybe in eight and a half millennia, they will have figured out something that lasts longer.
Depends on the unit and how you interpret the bits. Nanoseconds as a signed integer "only" make it about 300 years while seconds as a 64 bit IEEE float enjoy integral precision somewhere out past 250 million years (but if you need microsecond precision then it's the same number but as years instead of mega years).
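Those figures can be sanity-checked with a few lines of arithmetic (assumptions: ticks counted from an epoch in a signed 64-bit integer, a 53-bit float mantissa, 365.25-day years):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

ns_years = (2**63 - 1) / 1e9 / SECONDS_PER_YEAR   # int64 nanosecond ticks
float_s_years = 2**53 / SECONDS_PER_YEAR          # float64 integral seconds
float_us_years = 2**53 / 1e6 / SECONDS_PER_YEAR   # float64 integral microseconds

print(round(ns_years))       # ~292 years
print(round(float_s_years))  # ~285 million years
print(round(float_us_years)) # ~285 years
```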
i had to plant a 10 year time bomb in our SAML SP certificate because AFAIK there is no other way to do it. It’s been 7 years since then. Dreading contacting all the IDPs and getting them to update the SAML config.
But before you judge the fix too harshly, I bet it’s just a quick and easy fix that will suffice while a proper fix (to avoid depending on external state) is written.
of course it is just an easy fix. it's the kind of solution that even someone like me could write who has no understanding of the code at all. (i am not trying to imply that the submitter of the PR doesn't understand the code, just that understanding it is unlikely to be necessary, thus the change bears no risk.)
but, the solution now hides the problem. if i wanted to get someone to solve the problem i'd set the new date in the near future until someone gets annoyed enough to fix it for real.
and i have to ask, why is this a hardcoded date at all? why not "now plus one week"?
There’s a lot to be said for simplicity. The more logic you put into handling the dates correctly in the tests, the more likely you are to mess up the tests themselves. These tests were easy to write, easy to review, easy to verify, and served perfectly well for 10 years.
> Us, ten years after generating the certificate: "Who could have possibly foreseen that a computer science department would still be here ten years later."
This was why there was a Y2K bug. Most of that code was written in the 80s, during the Reagan era. Nobody expected civilization to make it to the year 2000.
No, people thought that storing a year as two digits was fine because computers were advancing so fast that it was unlikely they'd still be used in the year 2000 - or if they were it was someone else's problem.
And they were mostly right! Not many 80s machines were still being used in 1999, but lots of software that had roots to then was being used. Data formats and such have a tendency to stick around.
Software has incredible inertia compared to hardware.
It is effectively trivial to buy millions of dollars of hardware to upgrade your stuff when compared with paying for existing software to be rewritten for a new platform.
Funnily enough I worked at a company with a codebase written in the 1980s - no idea what it originally ran on but someone decided in the mid 2000s to update it to run on modern hardware. Unfortunately they chose Itanium... so 20 years later they're paying lots of money for Itanium hardware.
But still a kludge. Better: use something equivalent to Go's testing/synctest[0] package, which lets you write tests that run in a bubble where time is fixed and deterministic.
[0] https://pkg.go.dev/testing/synctest
in general
- generating test data in a realistic way is often better than hard-coding it (it also makes it easier to add prop testing or similar)
- make the current time an input to your functions (i.e. the whole old "prefer pure functions" discussion). This isn't just about making things more testable; it also can matter to make sure: 1. one unit of logic sees the same time, 2. you avoid unneeded calls to `now()` (only rarely matters, but it can)
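A minimal Python sketch of that idea (the names are invented for this example, not from any codebase in the thread):

```python
from datetime import datetime, timedelta, timezone

# The expiry check takes "now" as a parameter instead of calling
# datetime.now() internally, so the logic itself is a pure function.
def is_expired(expires_at: datetime, now: datetime) -> bool:
    return now >= expires_at

# Production code reads the clock once, at the boundary:
real_now = datetime.now(timezone.utc)

# Tests pass a fixed instant, so they stay deterministic forever:
fake_now = datetime(2039, 1, 1, tzinfo=timezone.utc)
assert is_expired(datetime(2038, 1, 19, tzinfo=timezone.utc), fake_now)
assert not is_expired(fake_now + timedelta(days=365), fake_now)
```

Every caller in one request sees the same `now`, and a test for the year 2039 needs no clock patching.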
Similarly, I like .NET's TimeProvider abstraction [1]. You pass a TimeProvider to your functions. At runtime you can provide the default TimeProvider.System. When testing FakeTimeProvider has a lot of handy tools to do deterministic testing.
A further benefit of .NET's TimeProvider is that it can also be passed to low-level async methods like `await Task.Delay(time, timeProvider, cancellationToken)`, which makes general asynchronous code testable in a deterministic sandbox too, once you learn to pass the TimeProvider even to low-level calls that take an optional one.
[1] https://learn.microsoft.com/en-us/dotnet/standard/datetime/t...
Java has an interface named InstantSource for this purpose: https://docs.oracle.com/en/java/javase/17/docs/api/java.base...
Paradoxically, InstantSource may have a delay. ;)
Also, if you do use `now()` in this case you can always do `now() + SomeDistantDuration`
This can cause other types of bugs to go unnoticed, such as leap year fun (if you handle 100 years, did you handle the 400th year?).
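Python's standard library encodes the full Gregorian rule, which makes a handy regression check:

```python
import calendar

# Divisible by 4 is a leap year, except centuries, except every 400th year.
print(calendar.isleap(1900))  # False: divisible by 100 but not 400
print(calendar.isleap(2000))  # True: divisible by 400
print(calendar.isleap(2100))  # False: the next century that will bite
```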
Doesn't that just turn bugs in test in n years into bugs in prod in n years?
That seems like a downgrade to me!
No, because prod doesn't have hardcoded cookies baked into it?
If you always test with a date of 1/1/2000 then you don't know that your choice fails in 2039.
These fake-time environments let you set the time, so you can test how the code will behave in 2039 without waiting for 13 years. For Go's synctest, 1-1-2000 is just the default initial value for now().
libfaketime is cool for testing this kind of thing too.
Not as convenient for unit tests cause you have to run the test with LD_PRELOAD.
I’ve used freeze_time (from Python's freezegun) a decent amount and have experienced some very very very funny flakes due to it.
- Sometimes your test code expects time to be moving forward
- sometimes your code might store classes into a hashmap for caching, and the cache might be built before the freeze time class override kicks in
- sometimes it happens after you have patched the classes and now your cache is weirdly poisoned
- sometimes some serialization code really cares about the exact class used
- sometimes test code acts really weird if time stops moving forward (when people use freezetime frozen=true). Selenium timeouts never clearing was funny
- sometimes your code gets a hold of the unpatched date class through silliness, but only in one spot
Fun times.
The nicest thing is being able to just pass in a “now” parameter in things that care about time.
Found this one for rust: https://github.com/museun/mock_instant
Ruby's timecop is great for those test scenarios https://github.com/travisjeffery/timecop
This sort of thing can be a real problem for bootstrappable/reproducible builds, where you want to verify that the tests all pass. For a while, GNU Guix wouldn't bootstrap with tests enabled because it wanted to build openssl-1.1.1l for some reason, and the test suite contained expired certificates. (This was especially bad in a Nix-ish environment, where changing whether or not tests run changes the build command that the derivation uses, which means that you can't turn the tests off without changing the hash of every dependent package.)
Isn't it common to set a fake static date and time for reproducible builds?
Interesting, from the title I thought it was intentional, as a "forced code review." Apparently not, but now I really like that idea!
We've done that at a few places I've been at - it's tricky because if the time to failure is too short it's just annoying toil, but if it's too long there's risk of losing context and having to remember what the heck we were thinking.
Overall it's still net positive for me in certain cases of enforcing things to be temporary, or at least revisited.
Which is why SSL certs are now 47 days long or whatever it is.
TLS certs are 200 days (as of last month). Or whatever
https://www.digicert.com/blog/tls-certificate-lifetimes-will...
I always wanted to make feature flags system where each FF must declare an expiration date max 1 year in the future and start failing CI beyond that date to force someone to reevaluate and clean up.
It's just too easy to keep adding new feature flags and never removing them. Until one day the FF backend goes down and you have 300 FFs all evaluate to false.
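A sketch of what that CI check could look like (the flag names, dates, and registry shape are all invented for illustration):

```python
from datetime import date, timedelta

# Hypothetical flag registry: each flag declares an expiry when created.
FLAGS = {
    "new_checkout_flow": date(2026, 3, 1),
    "legacy_search_fallback": date(2024, 11, 30),
}
MAX_LIFETIME = timedelta(days=365)

def stale_flags(today: date) -> list[str]:
    """Flags that are past their expiry, or declared too far in the future."""
    return sorted(name for name, expiry in FLAGS.items()
                  if expiry < today or expiry - today > MAX_LIFETIME)

# A CI step fails the build whenever this list is non-empty:
print(stale_flags(date(2025, 6, 1)))  # ['legacy_search_fallback']
```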
We had something like this where I last worked. Whenever we were adding new features or adding things that had potential for significant regressions, we were expected to add feature flags around the change/addition and set an expiration date for three months or so in advance. Once that rolled around, we'd either remove the old path or evaluate if it was necessary to have around as a permanent feature.
I think it worked out really well even though it increased the administrative overhead. We were always able to quickly revert behavior without needing to push code and it let us gradually shrink a lot of the legacy features we had on the project.
Just skimmed the PR, I'm sure the author knows more than I - but why hard code a date at all? Why not do something like `today + 1 year`?
Because it should be `today + 1 year + randomInt(1,42) days`.
Always include some randomness in test values.
Interesting, haven't heard this before (I don't know much about testing). Is this kind of like fuzzing?
I recently had race condition that made tests randomly fail because one test created "data_1" and another test also created "data_1".
- Test 1 -> set data_1 with value 1
- Test 1 -> `do some magic`
- Test 1 -> assert value 1 + magic = expected value
- Test 2 -> set data_1 with value 2
But this can fail if `do some magic` is slow and Test 2 starts before Test 1 asserts.
So I can either stop parallelism, but in real life parallelism exists, or ensure that each test has a random id, just like it would happen in real life.
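A sketch of the second option (the key format and store are made up for this example):

```python
import uuid

# Give each test its own key instead of a shared "data_1", so parallel
# tests can't clobber each other's fixtures.
def make_test_key(prefix: str = "data") -> str:
    return f"{prefix}_{uuid.uuid4().hex}"

store = {}

def test_magic_is_isolated():
    key = make_test_key()   # unique per invocation, e.g. "data_3f2a..."
    store[key] = 1
    store[key] += 41        # stand-in for "do some magic"
    assert store[key] == 42
```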
Are you joking? This is the kind of thing that leads to flaky tests. I was always counseled against the use of randomness in my tests, unless we're talking generative testing like quickcheck.
`today` is random.
It's dynamic, but it certainly isn't random, considering it follows a consistent sequence
If "today" were random, our universe would be pretty fricken weird.
or, maybe, there is something hugely wrong with your code, review pipeline or tests if adding randomness to unit test values makes your tests flaky and this is a good way to find it
or, maybe, it signals insufficient thought about the boundary conditions that should or shouldn't trigger test failures.
doing random things to hopefully get a failure is fine if there's an actual purpose to it, but putting random values all over the place in the hopes it reveals a problem in your CI pipeline or something seems like a real weak reason to do it.
I don't think anyone is advocating for random application of randomness.
Not a good idea for CI tests. It will just make things flaky and gum up your PR/release process. Randomness or any form of nondeterminism should be in a different set of fuzzing tests (if you must use an RNG, a deterministic one is fine for CI).
That's why it's "randomInt(1,42)", not "randomLong()".
if it makes things flaky, then it actually is a huge success, because it found a bug you overlooked in both impl. and tests (at least if we're speaking about unit tests)
Only if it becomes obvious why it is flaky. If it's just sometimes broken but really hard to reproduce then it just gets piled on to the background level of flakiness and never gets fixed.
To get around this, I have it log the relevant inputs, so it can be reproduced.
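For example (a sketch; `math_add` is this thread's toy function and the logger name is arbitrary):

```python
import logging
import random

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tests")

def math_add(x, y):
    return x + y

def test_add_random():
    x = random.randint(-1000, 1000)
    y = random.randint(-1000, 1000)
    # Reproducibility breadcrumb: a flaky failure can be replayed
    # straight from the CI log.
    log.info("test_add_random inputs: x=%d y=%d", x, y)
    assert math_add(x, y) == x + y
```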
The whole concept of allowing a flaky unit test to exist is wild and dangerous to me. It creates a culture of ignoring real failures in what should be deterministic code.
Well, if people can't reproduce the failures, people won't fix them.
So, yes, logging the inputs is extremely important. So is minimizing any IO dependency in your tests.
But then that runs against another important rule, that integration tests should test the entire system, IO included. So, your error handling must always log very clearly the cause of any IO error it finds.
Burma-shave
This will often break on stuff like daylight saving changes, while almost as often you don't give a rats ass about the boundary behaviour.
I remember having a flaky test with random number generation a few years ago - it failed very rarely (like once every few weeks) and when I finally got to fixing it, it was an actual issue (an off by one error).
> Always include some randomness in test values.
If this isn't a joke, I'd be very interested in the reasoning behind that statement, and whether or not there are some qualifications on when it applies.
Must be some Mandela effect about some TDD documentation I read a long time ago.
If you test math_add(1,2) and it returns 3, you don't know if the code does `return 3` or `return x+y`.
It seems I might need to revise my view.
I vaguely remember the same advice, it's pretty old. How you use the randomness is test specific, for example in math_add() it'd be something like:
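For instance (a sketch; `math_add` is the toy function from this thread):

```python
import random

def math_add(x, y):
    return x + y

def test_math_add_with_jitter():
    jitter = random.randint(1, 1000)
    # A hard-coded `return 3` passes math_add(1, 2) == 3, but it cannot
    # survive the same offset applied to an input and the expectation.
    assert math_add(1, 2) == 3
    assert math_add(1 + jitter, 2) == 3 + jitter
```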
If it was math_multiply(), then adding the jitter would fail - that would have to be multiplied in.
Nowadays I think this would be done with fuzzing/constraint tests, where you define "this relation must hold true" in a more structured way so the framework can choose random values, test more at once, and give better failure messages.
> it's pretty old.
Damn, must be why only white hair is growing on my head now.
>Nowadays I think this would be done with fuzzing/constraint tests, where you define "this relation must hold true" in a more structured way so the framework can choose random values, test more at once, and give better failure messages.
So the concept of random is still there but expressed differently ? (= Am I partially right ?)
Yes, the randomness is still there, just less manually specified by the developer. But I haven't actually used it myself, only seen stuff on it before, so I had the wrong term: it's "property-based testing" you want to look for.
Here's an example with a python library: https://hypothesis.readthedocs.io/en/latest/tutorial/introdu...
The strategy "st.lists(st.integers())" generates a random list of integers that get passed into the test function.
And also this page says by default tests would be run (up to) 100 times: https://hypothesis.readthedocs.io/en/latest/tutorial/setting...
So I'm thinking... (not tested)
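Something like this with Hypothesis (a sketch, untested against any real suite; `math_add` is the toy function from upthread):

```python
from hypothesis import given, strategies as st

def math_add(x, y):
    return x + y  # the toy function under test

# Hypothesis picks the random inputs; these relations must hold for
# every generated pair, not just for the hardcoded (1, 2) -> 3 case.
@given(st.integers(), st.integers())
def test_add_properties(x, y):
    assert math_add(x, y) == math_add(y, x)  # commutativity
    assert math_add(x, 0) == x               # identity

test_add_properties()
```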
...which is of course a little silly, but math_add() is a bit of a silly function anyway.
Randomness is useful if you expect your code to do the correct thing with some probability. You test lots of different samples and if they fail more than you expect then you should review the code. You wouldn't test dynamic random samples of add(x, y) because you wouldn't expect it to always return 3, but in this case it wouldn't hurt.
This sounds like the idea behind mutation testing
humans are very good at overlooking edge cases, off by one errors etc.
so if you generate test data randomly you have a higher chance of "accidentally" running into overlooked edge cases
you could say there is an "adding more randomness -> cost" ladder, like:
- no randomness, no cost, nothing gained
- a bit of randomness, very small cost, very rarely beneficial (<- doable in unit tests)
- (limited) prop testing, high cost (test runs multiple times with many random values), decent chance to find incorrect edge cases (<- can be barely doable in unit tests, if limited enough; often gated off as too expensive)
- (full) prop testing/fuzzing, very very high cost, very high chance incorrect edge cases are found IFF the domain isn't too large (<- a full test run might need days to complete)
I've learnt that if a test only fails sometimes, it can take a long time for somebody to actually investigate the cause; in the meantime it's written off as just another flaky test. If there really is a bug, it will probably surface sooner in production than it gets fixed.
sadly yes
people often take flaky tests way less seriously than they should
I had multiple bigger production issues which had been caught by tests >1 month before they happened in production, but were written off as flaky tests (ironically this was also not related to any random test data but to load/race-condition things, which failed when too many tests that created full separate tenants for isolation happened to run at the same time).
And in some CI environments flaky tests are too painful, so using "actual" random data isn't viable and a fixed seed has to be used on CI (that is, if you can, because too many libs/tools/etc. do not allow that). At least for "merge approval" runs. That many CI systems suck badly the moment your project and team size aren't around the size of a toy project doesn't help either.
Flaky tests are a very strong signal of a bug, somewhere. Problem is it's not always easy to tell if the bug's in the test or in the code under test. The developer who would rather re-run the test to make it pass than investigate probably thinks it's the test which is buggy.
> it's written off as just another flaky test
So don't do that. That's bad practice. The test has failed for a reason and that needs to be handled.
Can't one get randomness and determinism at the same time? Randomly generate the data, but do so when building the test, not when running the test. This way something that fails will consistently fail, but you also have better chances of finding the missed edge cases that humans would overlook. Seeded randomness might also be great, as it is far cleaner to generate and expand/update/redo, but still deterministic when it comes time to debug an issue.
Most test frameworks I have seen that support non-determinism in some way print the random seed at the start of the run, and let you specify the seed when you run the tests yourself. It's a good practice for precisely the reasons you wrote.
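A minimal version of that pattern (the names and the `TEST_SEED` variable are made up for illustration):

```python
import os
import random
import time

# Print the seed up front; a failing run can then be replayed exactly
# by rerunning with e.g. TEST_SEED=1234.
seed = int(os.environ.get("TEST_SEED", time.time()))
print(f"test seed: {seed}")
rng = random.Random(seed)

def random_test_values(n=10):
    """Random but reproducible test data, drawn from the seeded RNG."""
    return [rng.randint(-1000, 1000) for _ in range(n)]
```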
There's another good reason that hasn't been detailed in the comments so far: expressing intent.
A test should communicate its reason for testing the subject, and when an input is generated or random, it clearly communicates that this test doesn't care about the specific _value_ of that input, it's focussed on something else.
This has other beneficial effects on test suites, especially as they change over the lifetime of their subjects:
* keeping test data isolated, avoiding coupling across tests
* avoiding magic strings
* as mentioned in this thread, any "flakiness" is probably a signal of an edge case that should be handled deterministically
* it's more fun [1]
[1] https://arxiv.org/pdf/2312.01680
Generate fuzz tests using random values with a fixed seed, sure, but using random values in tests that run on CI seems like a recipe for hard-to-reproduce flaky builds unless you have really good logging.
That introduces a dependency on a clock, which might be undesirable; just had a similar problem where I also went for hardcoding for that reason.
Arguably you should have a fixed start date for any given test, but time is quite hard to abstract out like that (there's enough time APIs you'd want OS support, but linux for example doesn't support clock namespaces for the realtime clock, only a few monotonic clocks)
There's already a clock dependency. The test fails because of that.
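One way to remove it, in the spirit of the "make now an input" advice upthread (a sketch; `cookie_is_valid` is a hypothetical function):

```python
from datetime import datetime, timedelta, timezone

# Production code passes datetime.now(timezone.utc); tests pass a fixed
# instant, so the test never races the real clock.
def cookie_is_valid(expiry: datetime, now: datetime) -> bool:
    return now < expiry

def test_cookie_is_valid():
    fixed_now = datetime(2026, 2, 1, tzinfo=timezone.utc)
    expiry = fixed_now + timedelta(days=365)
    assert cookie_is_valid(expiry, now=fixed_now)
    assert not cookie_is_valid(expiry, now=fixed_now + timedelta(days=366))

test_cookie_is_valid()
```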
That can easily lead to breaking tests due to time-zones, daylight saving time or the variable length of months.
We experienced several of those over the years, and generally it was the test that was wrong, not the code it was testing.
For example, this simplified test hits several of those pitfalls:
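(A hypothetical reconstruction, since the original snippet isn't quoted here; `renewal_date` is made up:)

```python
from datetime import datetime

# "same day next month" silently assumes every month has that many days,
# and forgets that December's successor is January of the *next* year.
def renewal_date(now: datetime) -> datetime:
    return now.replace(month=now.month % 12 + 1)

def test_renewal_is_in_the_future():
    # Passes on most days, but raises ValueError whenever the current
    # day-of-month doesn't exist next month (Jan 29-31, Mar 31, ...),
    # and the assert itself fails for any date in December.
    assert renewal_date(datetime.now()) > datetime.now()
```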
I mean, sure, that can happen, but that obviously depends on what the test is testing, it's not like it's bad in all cases to say "now plus 1 year". In the case in question it's really just "cookie is far enough in the future so it hasn't expired", so "expire X years in the future from now" is fine.
Any time constant will be exceeded someday.
An impossibly short period of time after the heat death of the universe on a system that shouldn’t even exist: ERROR TIME_TEST FAILURE
Posted on HN in 2126: 100 years ago, someone wrote a test for servo that included an expiry in 2126
Now I feel bad for using (system foundation timestamp)+100 years as end of "forever" ownership relations in one of my systems. Looking now, it's only 89 years left. I think I should use nulls instead.
Well, it won't be your problem /j
https://factorio.com/blog/post/fff-388 - they wanted to use a 64 bit int for the tick count, but Lua doesn't have one; so they used the one available and worked out when it would lose precision.
"More than 2 million years seems to be enough for us to not be around any more when the bug reports start appearing."
I once saw a pop-up in a game saying something along the lines of: wow, it's 10 years later and this game is still being played! Made me laugh out loud, nice little easter egg.
Sadly, I don't recall which game it was. Maybe SpaceChem?
I've got some tests in active code bases that are using the end of 32-bit Unix time as "we'll never get there". That's not because the devs were lazy, these tests date from when that was the best they could possibly do. They're on track to be cycled out well before then (hopefully this year), so, hopefully, they'll be right that their code "won't get there"... but then there's the testing and code that assumes this that I don't know about that may still be a problem.
"End of Unix time" is under 12 years now, so, a bit longer than the time frame of this test, but we're coming up on it.
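For reference, the rollover instant is easy to compute:

```python
from datetime import datetime, timezone

# A signed 32-bit time_t overflows 2**31 - 1 seconds after the epoch.
overflow = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)
print(overflow)  # 2038-01-19 03:14:07+00:00
```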
I seem to recall much smugness on Slashdot around the "idiot winblows users limited by DOS y2k" and how the time_t was "so much better". Even then a few were prophesying that it would come bite us eventually ...
Who here remembers the fud of Y2K?
Exciting times with an anticlimactic end; I was in middle school, relishing the chaos of the adult world.
I remember the reality of all the work needed to avoid issues.
While there was a lot of FUD in the media, there were also a lot of scenarios that were actually possible but were averted due to a LOT of work and attention ahead of time. It should be looked at, IMO, as a success of communication, warnings, and a lot of effort that nothing of major significance happened.
Yes, Y2K is a success story, similar to the alert and response related to ozone layer and CFCs.
Dissimilar to the global climate catastrophe, unfortunately.
---
The 2024 state of the climate report: Perilous times on planet Earth
https://academic.oup.com/bioscience/article/74/12/812/780859...
"Tragically, we are failing to avoid serious impacts"
"We have now brought the planet into climatic conditions never witnessed by us or our prehistoric relatives within our genus, Homo"
"Despite six IPCC reports, 28 COP meetings, hundreds of other reports, and tens of thousands of scientific papers, the world has made only very minor headway on climate change"
"projections paint a bleak picture of the future, with many scientists envisioning widespread famines, conflicts, mass migration, and increasing extreme weather that will surpass anything witnessed thus far, posing catastrophic consequences for both humanity and the biosphere"
I don't mean to lessen the impact of that statement. I think climate change is a serious problem. But also most of the geologic time that genus Homo has existed, Earth has been in an ice age. Much of which we'd consider a "snowball Earth". The last warm interglacial period, the Eemian, was 120,000 years ago.
this is the same style comment as "no offense, but <offensive thing>"
if you didn't intend to lessen the impact of that statement, why say something that is specifically meant to lessen it? just say what you want to say, without the hedging.
What you just wrote is the same as: 'the entire lifecycle of humanity has no precursor to the conditions' we are about to face.
We aren't facing the ice age that has been the last 120,000 years.
I'm sure the rocky planet will survive just fine, maybe even some extremophiles, even if we completely screw up the atmosphere. Not 6 billion humans though.
The genus Homo dates back nearly 2 million years.
Yes. And virtually all of that time has been colder than average: https://en.wikipedia.org/wiki/Geologic_temperature_record#/m...
Sometimes a great deal so. Sometimes less. But nearly always below average. For our whole existence.
That's why the choice of wording struck me.
You can zoom out a bit more and it just gets clearer: https://en.wikipedia.org/wiki/Geologic_temperature_record#/m...
Further out and we're still one of the coldest periods: https://en.wikipedia.org/wiki/Geologic_temperature_record#/m...
We're ice-age dwellers. Always have been.
I can both be alarmed at how quickly the ice age humanity has evolved within is ending, and find that a very funny way of phrasing it. These things don't conflict in me, though it seems triggering to some. People are downvoting me with moral conscience, but I'm just over here laughing at a funny conjunction of paleoclimate and word choice. :) People getting offended by it kinda makes it funnier.
That's an interesting bit of detail. As you intended, it does not lessen the impact of the statement: "conditions never witnessed by us or our prehistoric relatives". It confirms it, with some additional context.
To me, it seems to make it even more significant. Because as you point out, Homo evolved under ice age conditions over millions of years. Well, here we are about to be thrust into uncharted territory, in an extremely short period of time. With very fragile global interdependencies, an overpopulated planet, and billions of people exposed to the consequences.
Right? I would only caution that neither has the ice age been particularly kind to humanity. It seems at least a couple times to have almost gotten us all. There's a genetic bottleneck in genus Homo which seems to date back ~80k years, which aligns suspiciously with the Toba supervolcano eruption. And another around 850k years ago. During each there were likely fewer than 2,000 breeding humans.
Earth has certainly thrived with a warmer climate. No reason we can't too. The problems - for us and other life - stem from the rate of change. Which is easy to see is very very rapid compared to the historical cycles, but still a slow motion trainwreck compared to an asteroid strike, supervolcano, or gamma ray pulse, all of which it seems Earth has experienced. Life and human society will adapt if it has enough time. The quicker the catastrophe the more challenging that is.
I guess what I'm saying is that we're not doing ourselves any favors, but we also shouldn't underestimate mother nature's ability to throw us a curve ball in the 9th inning that makes everything worse. Life has endured an awful lot on this little rock.
Another victim of the preparedness paradox.
Don't mistake a defused bomb for a dud.
https://en.wikipedia.org/wiki/Preparedness_paradox
Thanks! I think about this concept a lot, and now I know there's a name for it. "Preparedness paradox". I'll have to remember that.
And to your point, Y2K is right there on the wiki page for it.
Made me think of Mark Fisher's Y2K Positive text:
> At the Great Midnight at the century's end, signifying culture will flip over into a number-based counterculture, retroprocessing the last 100 years. Whether global disaster ensues or not, Y2K is a singularity for cybernetic culture. It's time to get Y2K positive.
Mark Fisher (2004). Y2K Positive in Mute.
Tell us you weren't involved in Y2K without telling us you weren't involved in Y2K.
As others have stated, the lack of visible effect is not the same thing as there never having been a land mine in the first place.
I can tell you anecdotally that on 12/31/1999 I was hanging with some friends. At midnight UTC we turned on the footage from London. At first it appeared to be a fiery hellscape armageddon. While it turned out to just be fireworks with a weird camera angle, there was a moment where we were concerned something was actually happening. Most of us in the room were technologists, and while we figured it'd all be no big deal, we weren't *sure* and it very much alarmed us to see it on the screen.
Yep - that's why I always choose my time constants to be during years when I will be retired, or possibly dead.
If you're going to kick the can down the road, why not kick it pretty far?
Most updates to avoid the 2038 problem really just delay it until 10889. Maybe in eight and a half millennia, they will have figured out something that lasts longer.
How is 10889 a problem? I thought the move to 64 bit added billions of years.
Depends on the unit and how you interpret the bits. Nanoseconds as a signed integer "only" make it about 300 years while seconds as a 64 bit IEEE float enjoy integral precision somewhere out past 250 million years (but if you need microsecond precision then it's the same number but as years instead of mega years).
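Back-of-the-envelope versions of those figures (rough, using a 365.25-day year):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Signed 64-bit nanoseconds since the epoch: ~292 years of total range.
ns_years = 2**63 / 1e9 / SECONDS_PER_YEAR

# An IEEE double holds integers exactly up to 2**53, so float seconds
# stay integrally precise for ~285 million years...
float_s_years = 2**53 / SECONDS_PER_YEAR

# ...but demanding microsecond precision from the same float cuts that
# to ~285 years.
float_us_years = 2**53 / 1e6 / SECONDS_PER_YEAR
```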
It's for 48-bit timestamps.
This is why I always use the year 2525. Not my problem, assuming man is still alive.
Hmm. Interesting to call out someone like this. Stuff happens. We're all humans. For now. At least we were back then.
i had to plant a 10 year time bomb in our SAML SP certificate because AFAIK there is no other way to do it. It’s been 7 years since then. Dreading contacting all the IDPs and getting them to update the SAML config.
Classic!
But before you judge the fix too harshly, I bet it’s just a quick and easy fix that will suffice while a proper fix (to avoid depending on external state) is written.
I'll bet you one US Dollar that this is a scenario where the temporary fix becomes the permanent one. (Well, at least, permanent for a hundred years.)
Some day, Pham Nuwen is going to be bitching about this test suite between a pair of star systems.
That’s one of my favorite books :)
I agree that it’s plausible!
of course it is just an easy fix. it's the kind of solution that even someone like me could write who has no understanding of the code at all. (i am not trying to imply that the submitter of the PR doesn't understand the code, just that understanding it is unlikely to be necessary, thus the change bears no risk.)
but, the solution now hides the problem. if i wanted to get someone to solve the problem i'd set the new date in the near future until someone gets annoyed enough to fix it for real.
and i have to ask, why is this a hardcoded date at all? why not "now plus one week"?
There’s a lot to be said for simplicity. The more logic you put into handling the dates correctly in the tests, the more likely you are to mess up the tests themselves. These tests were easy to write, easy to review, easy to verify, and served perfectly well for 10 years.
But doing it right shouldn’t be all that hard.
One of the comments:
> Us, ten years after generating the certificate: "Who could have possibly foreseen that a computer science department would still be here ten years later."
This was why there was a Y2K bug. Most of that code was written in the 80s, during the Reagan era. Nobody expected civilization to make it to the year 2000.
No, people thought that storing a year as two digits was fine because computers were advancing so fast that it was unlikely they'd still be used in the year 2000 - or if they were it was someone else's problem.
And they were mostly right! Not many 80s machines were still being used in 1999, but lots of software that had roots to then was being used. Data formats and such have a tendency to stick around.
Software has incredible inertia compared to hardware.
It is effectively trivial to buy millions of dollars of hardware to upgrade your stuff when compared with paying for existing software to be rewritten for a new platform.
This is a very SWE-centric perspective. The very names of software/hardware would imply the exact opposite.
Has the last piece of industrial hardware you've seen been updated to use protected memory, as most controllers have been able to for a few decades?
Or better, its drivers run in what Windows version?
Funnily enough I worked at a company with a codebase written in the 1980s - no idea what it originally ran on but someone decided in the mid 2000s to update it to run on modern hardware. Unfortunately they chose Itanium... so 20 years later they're paying lots of money for Itanium hardware.
A comment from the PR
> Not a serious problem, but the weekdays are wrong. For example, 18-Apr-2127 is a Friday, not Sunday.
There are now many magical dates to remember - 2126 (I think the PR was updated after that comment) and 2177. There is also a 2028 somewhere.
I fixed one of these test cases too. Attached to it was a comment:
Alas, he was still working, albeit at another firm.
“Someone”... please stop writing “Someone” in every possible post, especially on X.
quite the nostalgic test to fix lol
what a nostalgic test to fix lol
[flagged]
It was started by people who thought Twitter didn't have enough censorship (back when it had a lot more).
I guess that's a matter of personal sensibilities, but it's pretty funny to me.
(Note: this is the only fact I know about it, happy to learn more.)
Any social space will break down upon reaching a critical point in representation of the general populace.
I have no idea about the development however.
Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting.
https://news.ycombinator.com/newsguidelines.html
Worked for me.