It's an improvement because the smaller scale allows higher densities and more storage per chip.
Yes, that's the usual explanation, but I don't think it makes much sense here, since the ostensibly "better" memory can produce visible errors that the older generation didn't. I often see the word "tradeoff" used in situations like this, but I don't agree that it applies: at some point on the reliability scale it stops being memory at all and devolves into some weird approximation of it.
From the paper, the strong implication is that a user running unprivileged code on any modern computer can corrupt memory outside of their own process.
Indeed, that's the big message I get: a tiny and innocuous-looking piece of code can easily corrupt memory. I'm not someone who believes much in conspiracy theories, but this looks like an amazingly good backdoor, or a constituent of one, to me.

If memory controllers implement workarounds such as the one described in the paper to reduce these types of errors, they will also naturally have options to turn them off for testing/debugging purposes, etc. The great majority of the time, nothing unusual will be noticeable if they are turned off, but the system becomes vulnerable to the specific access patterns that trigger the fault. Since documentation on the latest memory controllers is largely kept secret, a firmware update that silently changes this setting wouldn't raise much concern - memory initialisation code uses lots of undocumented registers and values anyway.

Then all it takes is a tiny piece of user-level code (possibly obfuscated/concealed in some other mundane application), maybe with some cooperation from or knowledge of the OS's VM mapping, to enable relatively precise corruption of certain addresses in memory. Although largely (publicly) undocumented, it wouldn't be so difficult to reverse-engineer the row<>address mappings either. The results could range from DoS to bypassing access controls, depending on what gets targeted.
The subtle nature of this approach is what makes it all the scarier; the access patterns that trigger it aren't so unusual, and it's just reading from memory. I doubt it can be easily triggered (never say never...) from JIT-compiled languages like JS, but virtualised environments appear vulnerable (unless the hypervisor constantly moves the pages around, incurring a significant performance penalty).
although the problem is real, the paper is a bit alarmist about the prevalence
The paper assumes exact knowledge of the row<>address mappings and hammers the DRAM accordingly, whereas MemTest's implementation might not know the exact mapping used by a particular controller+configuration. Their estimate under "less optimal" hammering is 5-20% (a huge range), which is still cause for concern.