Thank you for the thoughtful reply. Maybe bins is the wrong word to use, so I'll try with intervals. Starting with 1 bit data, there are two numbers and one interval. I think where bins makes it confusing is that inside the interval there are two big rounding errors mapping everything to either 0 or 1 and many people seem to be considering those the bins.
Taking a step back, remember we're ultimately mapping these discrete numbers to some real world continuous variable like the saturation of red, frequency, mass on a scale, whatever. And our digital device can only represent a finite number of values. For 2 bit data, we can represent 0-3, and for 3 bit data we can represent 0-7.
The important part is that 0 represents the minimum and 1, 3, and 7 all represent the same maximum real value, and everything that can be measured by the device will fall within that range. So comparing 1, 2 and 3 bit data on a linear number line looks like this:
0                    1
0      1      2      3
0  1  2  3  4  5  6  7
You could assume that everything gets assigned to whatever number is nearest in the number scale or come up with another scheme, but that is ultimately defined by the ADC and likely nonlinear. All we know is that those are the numbers we have available to represent the real values we're measuring.
The question is about how to normalize the data. 1 bit data is already normalized. If you normalize 2 bit data by 3 you get [0, 1/3, 2/3, 1]. LGTM. If you normalize it by 4, you get [0, 1/4, 2/4, 3/4] and you're effectively throwing away some of the range of the ADC. You can try to get it back by offsetting by 0.5 and then normalizing, but now you get [1/8, 3/8, 5/8, 7/8]. And you could stretch that with some clever formula to fill from 0 to 1, but if you do it right then it's equivalent to normalizing by 3, so why not normalize by 3?
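Here's a quick Python sketch of that arithmetic for 2 bit data, using exact fractions. The function names are made up for illustration, and the "clever formula" is assumed to be a plain linear rescale of the minimum to 0 and the maximum to 1.

```python
from fractions import Fraction

def normalize_full_range(x, bits):
    """Divide by 2^bits - 1, so the largest code maps to exactly 1."""
    return Fraction(x, 2**bits - 1)

def normalize_centered(x, bits):
    """Offset by 0.5, then divide by 2^bits: (x + 0.5) / 2^bits."""
    return Fraction(2 * x + 1, 2 ** (bits + 1))

def stretch_to_unit(values):
    """Linearly rescale so the minimum lands on 0 and the maximum on 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

codes = range(4)  # the four codes of 2 bit data
full = [normalize_full_range(x, 2) for x in codes]
centered = [normalize_centered(x, 2) for x in codes]
stretched = stretch_to_unit(centered)

print(full)               # 0, 1/3, 2/3, 1 as Fractions
print(centered)           # 1/8, 3/8, 5/8, 7/8 as Fractions
print(stretched == full)  # True
```

Under that assumption, the stretched values land exactly on the divide-by-3 values.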
So the answer is, if you have N bit data, you normalize by 2^N-1.
>Maybe bins is the wrong word to use, so I'll try with intervals
Same thing, in both how you use it and how the author does.
>Taking a step back, remember we're ultimately mapping these discrete numbers to some real world continuous variable
I think this is where you have a misconception.
There are two maps.
The important one goes the other way: FROM a continuous variable TO a finite set.
It's not 1-to-1: it maps entire ranges of numbers (intervals, bins, whatever) to discrete values (samples, integers, whatever).
The bins are preimages of that map.
The discussion in the article comes from two ways of defining that map: FROM continuous signal TO discrete variable.
The map that goes the other way, from the integers into floats, has to be CONSISTENT with it.
The article presents this backwards, putting the cart before the horse. This causes confusion.
>All we know is that those are the numbers we have available to represent the real values we're measuring.
None of those numbers represents a single value; each represents a range.
Think about it this way: if we have a continuous signal that we're discretizing into a finite number of bits, we're invariably smashing ranges into single values (what you call "rounding error").
When we're reading this data (say, we read the number 5), we don't know which value of the continuous variable it came from.
To display it on a screen, we make a choice; we pick some number from the interval it came from, and call it a day.
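To make the many-to-one part concrete, here's a small Python sketch. The quantizer (floor of v * 2^bits, clamped to the top code) is only one assumed way to discretize a value in [0, 1]; a real ADC may behave differently, and `quantize_floor` is a name I made up.

```python
import math

def quantize_floor(v, bits):
    """Map a continuous value in [0, 1] to an integer code (assumed scheme)."""
    levels = 2**bits
    # v == 1.0 would give `levels`, so clamp it into the top code.
    return min(math.floor(v * levels), levels - 1)

# Three different inputs, all read back as the same 3 bit code.
samples = [0.63, 0.70, 0.74]
codes = [quantize_floor(v, 3) for v in samples]
print(codes)  # [5, 5, 5]
```

Once you only have the 5, any value in its interval is an equally valid guess at the original.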
>The important part is that 0 represents the minimum and 1, 3, and 7 all represent the same maximum real value
The important part is that this is a choice you make about what those point samples represent.
It's a convenient choice, which is why we all use it.
Some people prefer a different choice, that's all.
>If you normalize it by 4, you get [0, 1/4, 2/4, 3/4]
That's one way to do it, but not the one the article uses (re-read my previous comment, it has both).
Still, I'm with you here.
>and you're effectively throwing away some of the range of the ADC.
The map you're describing (FROM discrete INTO continuous) is approximating the DAC.
So, yes, with this scheme you're never getting 0.0 and 1.0.
Think of it this way. Say, you convert an image to a 1-bit representation, and render it on a screen in grayscale.
One choice is to render 0 as 0.0 and 1 as 1.0 (black and white).
Another is to render 0 as 0.25 and 1 as 0.75 (dark grey and light grey).
That's the "alternative" (divide by 2^N) approach. The formula here is x → (x + 0.5)/2^N.
Neither is inherently wrong or better than the other, especially when you ask which rendering is closer to the original image.
Plus: one man's "you're not using the entire range of the DAC" is another's "you leave a tiny bit of headroom".
In any case, you're not losing data in either [ discrete → continuous → discrete ] chain because you get the discrete values back perfectly.
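A minimal Python check of that round-trip claim, for 1 to 8 bits. The pairings are my assumption of what "consistent" means here: dividing by 2^N - 1 is paired with rounding to the nearest level, and (x + 0.5)/2^N is paired with floor. All function names are made up.

```python
import math

def dequantize_full_range(x, bits):
    """Discrete -> continuous: divide by 2^bits - 1."""
    return x / (2**bits - 1)

def quantize_round(v, bits):
    """Continuous -> discrete: round to the nearest of the 2^bits levels."""
    return round(v * (2**bits - 1))

def dequantize_centered(x, bits):
    """Discrete -> continuous: (x + 0.5) / 2^bits, the center of each bin."""
    return (x + 0.5) / 2**bits

def quantize_floor(v, bits):
    """Continuous -> discrete: floor into 2^bits equal bins, clamped at the top."""
    levels = 2**bits
    return min(math.floor(v * levels), levels - 1)

def round_trips(dequantize, quantize, bits):
    """True if every code survives discrete -> continuous -> discrete."""
    return all(quantize(dequantize(x, bits), bits) == x for x in range(2**bits))

full_ok = all(round_trips(dequantize_full_range, quantize_round, b) for b in range(1, 9))
centered_ok = all(round_trips(dequantize_centered, quantize_floor, b) for b in range(1, 9))
print(full_ok, centered_ok)  # True True
```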
What you divide by in the first step is dictated by what you do in the second.
>If you normalize 2 bit data by 3 you get [0, 1/3, 2/3, 1].
Let's see what this says about how we should go in the other direction to be consistent with this scheme.
Which continuous values get sent to 0 and 3? Which get sent to 1 and 2?
You wrote: {0, 1, 2, 3} → {0, 1/3, 2/3, 1}
So you can see that going in the other direction (discretizing), by rounding to the nearest of those values:
[0, 1/6) → 0
[1/6, 1/2) → 1
[1/2, 5/6) → 2
[5/6, 1] → 3
Some people don't like that 0 and 3 get smaller ranges than the rest.
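Here's a Python sketch that works out the range widths for 2 bit data under both schemes, with exact fractions. It assumes round-to-nearest for the divide-by-3 scheme (so the boundaries sit halfway between adjacent levels) and equal bins for the divide-by-4 scheme; the function names are hypothetical.

```python
from fractions import Fraction

def bin_edges_full_range(bits):
    """Bin edges when levels sit at k / (2^bits - 1) and inputs round to nearest."""
    top = 2**bits - 1
    # Boundaries are the midpoints between adjacent levels.
    mids = [Fraction(2 * k + 1, 2 * top) for k in range(top)]
    return [Fraction(0)] + mids + [Fraction(1)]

def bin_edges_centered(bits):
    """Bin edges when [0, 1] is cut into 2^bits equal bins."""
    levels = 2**bits
    return [Fraction(k, levels) for k in range(levels + 1)]

def widths(edges):
    """Width of each bin, given its sorted edges."""
    return [b - a for a, b in zip(edges, edges[1:])]

full_widths = widths(bin_edges_full_range(2))
centered_widths = widths(bin_edges_centered(2))
print(full_widths)      # 1/6, 1/3, 1/3, 1/6 as Fractions
print(centered_widths)  # 1/4 four times
```

The end codes get half-width ranges in the first scheme, while the second gives every code the same width.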
>So the answer is, if you have N bit data, you normalize by 2^N-1.
The answer is: it doesn't matter in practice, so use what's simpler in your context.
That's going to be dividing by 2^N - 1 for pretty much everyone.