rleigh 5 years ago

Look how many C library functions implicitly call strlen() internally.

Also, look at functions like sscanf(). It can be orders of magnitude slower than fscanf() because every invocation calls strlen() on the whole input, while fscanf() reads incrementally from the current file position. I don't know why sscanf() doesn't also work incrementally, but the implementations I've tested don't.
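To make the pattern concrete, here is a minimal sketch of the kind of loop where this hurts. The function names are made up for illustration. Whether each sscanf() call really rescans the remaining input depends on the libc, so treat the quadratic cost as an assumption about the implementations described above, not a guarantee. The strtol() version is shown only as a contrast that never needs the length of the input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Sum whitespace-separated integers using sscanf() and %n.
 * If the libc measures the remaining string on every call, this loop
 * costs O(n) per token, i.e. O(n^2) overall for a long buffer. */
static long sum_ints_sscanf(const char *buf)
{
    assert(buf != NULL);
    long total = 0;
    int value, consumed;
    /* %n stores the number of characters consumed and is not counted
     * in the return value, so a successful parse returns 1. */
    while (sscanf(buf, "%d%n", &value, &consumed) == 1) {
        total += value;
        buf += consumed;
    }
    return total;
}

/* Same job with strtol(), which stops at the first character it cannot
 * use and reports where it stopped, so nothing is scanned twice. */
static long sum_ints_strtol(const char *buf)
{
    assert(buf != NULL);
    long total = 0;
    for (;;) {
        char *end;
        long v = strtol(buf, &end, 10);
        if (end == buf)   /* no digits found: end of input or garbage */
            break;
        total += v;
        buf = end;
    }
    return total;
}
```

Both functions give the same result on well-formed input; only the amount of rescanning differs.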

The main point here being: if strings carried a size_t size alongside the data, an O(n) scan would become an O(1) length lookup, and that would bring huge performance gains throughout the C library, not to mention your own code as well.
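A rough sketch of what I mean by "size plus data". The type and function names here (counted_str and friends) are hypothetical and not part of any C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type: a length stored next to the data. */
struct counted_str {
    size_t size;       /* number of bytes in data */
    const char *data;  /* does not need to be NUL-terminated */
};

/* Wrap an ordinary C string. The O(n) strlen() is paid once, here. */
static struct counted_str counted_from_cstr(const char *s)
{
    assert(s != NULL);
    struct counted_str cs = { strlen(s), s };
    return cs;
}

/* O(1) length lookup: just read the stored size. */
static size_t counted_len(struct counted_str cs)
{
    return cs.size;
}

/* O(1) suffix view: skip the first n bytes without rescanning.
 * n is clamped so the view never runs past the end. */
static struct counted_str counted_skip(struct counted_str cs, size_t n)
{
    if (n > cs.size)
        n = cs.size;
    struct counted_str out = { cs.size - n, cs.data + n };
    return out;
}
```

A parser that consumes input token by token can call counted_skip() after each token and always know how much is left without another scan.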

jstimpfle 5 years ago

I checked musl libc: it implements sscanf() by calling vsscanf() with a custom FILE stream. That stream is implemented using __string_read(), which does indeed call strnlen().

I figure it would be possible for musl to implement that stream using a function that scans for NUL and copies at the same time, but maybe that's not an improvement in the end.
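Something like the following is what I have in mind for "scan and copy at the same time". This is only a sketch, not musl's code: the cookie struct and the function name are invented for illustration, and a real stream read callback would have a different signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader state: just the current position in the string. */
struct str_cookie {
    const char *pos;
};

/* Copy up to len bytes into buf, stopping at the NUL terminator.
 * The terminator check and the copy happen in one pass, so there is
 * no separate strlen()/strnlen() over the source.
 * Returns the number of bytes copied; 0 means end of string. */
static size_t string_read_copy(struct str_cookie *c,
                               unsigned char *buf, size_t len)
{
    assert(c != NULL && buf != NULL);
    size_t n = 0;
    while (n < len && c->pos[n] != '\0') {
        buf[n] = (unsigned char)c->pos[n];
        n++;
    }
    c->pos += n;  /* the next call continues where this one stopped */
    return n;
}
```

A byte-at-a-time loop like this may well be slower than an optimized strnlen() followed by memcpy(), which is why I'm not sure it would be an improvement.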

It would be much simpler of course if sscanf() took the length of the input string as an additional argument. But actually, I don't really care.
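For illustration, here is roughly what such an interface could look like, built on POSIX fmemopen() and vfscanf(). The name sscanf_len is made up, no such function exists in libc, and this assumes a POSIX system. It only shows the shape of the API: opening a stream per call has its own overhead, so it is not a performance fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical sscanf() variant that takes the input length explicitly,
 * so the input does not need to be NUL-terminated. */
static int sscanf_len(const char *buf, size_t len, const char *fmt, ...)
{
    assert(buf != NULL && fmt != NULL);
    if (len == 0)
        return EOF;  /* assumption: treat empty input as an input failure */

    /* fmemopen() takes a non-const pointer; in "r" mode the buffer is
     * only read, so casting away const is safe here. */
    FILE *f = fmemopen((void *)buf, len, "r");
    if (f == NULL)
        return EOF;

    va_list ap;
    va_start(ap, fmt);
    int n = vfscanf(f, fmt, ap);  /* same return convention as sscanf() */
    va_end(ap);

    fclose(f);
    return n;
}
```

Called as sscanf_len(p, 3, "%d", &v), it reads at most the first three bytes at p, whatever follows them.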

Because, does this even matter? Using sscanf() is far from ideal anyway. The stdio functions are not what you use if you're going for performance. Their conversions are probably not the fastest (being quite featureful), and they are even locale-dependent, which is a huge mess!

Heck, when we're going for performance to a degree where a strlen() matters (bear in mind that we have to read the input at least once anyway, so the waste is definitely bounded) we should certainly not be parsing text at all. That is much more wasteful in comparison.

Much if not most of libc is there to provide you a portable base to (comparatively) quickly get your project up and running, and simply to keep old software going, but it's certainly not there to help you achieve performance.

http://git.musl-libc.org/cgit/musl/commit/?id=18efeb320b763e...