©2020 Sound Speeds and Allen Williams
Before we get started, let me give you a disclaimer... this will get technical, but my goal isn't to lose you, it's to help clarify. So if you're a professional and/or familiar with 32-bit floating point, you'll notice that I take a little creative license in my explanations. It'll pay off in the end. Don't worry everyone, I'll take it slow and remove any distractions, including myself...
FADE TO BLACK
Let me start by asking you a question... using two single digit numbers, I want you to create the largest number possible. I'll give you a few seconds to think about it, and feel free to pause this video if you need more time... Got it? Ok, write it in the comments below BEFORE continuing. Again, pause if you need to... Alright. Let's look at a few possibilities. If you understood the question, then your two numbers would be 9 and 9. Let's try adding them... 18? Not very big, so let's try multiplying them... 81. That's lower than if you put the two 9s together and made 99. So... you might be wondering what my number is. Ok, I'll share it - 387,420,489. Did I win? Or do you think I cheated? Ok, what if I wrote it this way then? 9 to the 9th.
The big number 9 you probably understand, but the small 9 is what's called an exponent, or the number of times the big number is multiplied by itself. In this case my number is the result of multiplying 9 times 9 times 9 times 9 times 9 times 9 times 9 times 9 times 9. Remember this, and we'll come back to it in a moment.
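Since exponents come back later in this video, here's the opening example worked out in a few lines of Python (the language choice is mine for illustration; the numbers are the ones from above):

```python
# The candidates from the opening question, using two single-digit 9s.
print(9 + 9)   # addition: 18
print(9 * 9)   # multiplication: 81
print(99)      # writing the digits side by side
print(9 ** 9)  # 9 to the 9th: 387,420,489
```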
Let's talk for a moment about signal to noise ratio and dynamic range as they relate to sound. Signal to noise ratio is the amount of signal compared to the amount of noise. In a recording studio the noise level is very low, so a recording will have a high signal to noise ratio. This number in microphone specifications isn't written as a ratio like you may expect; it's actually written as a number of decibels equal to 94 minus the self noise of the microphone. So if your microphone has 10dB of self noise, then you have an 84dB signal to noise ratio. On the other hand, if you record in a loud room with, let's say, a 70dB noise floor, you don't even need to consider the self noise of the mic because the room noise is much higher, drowning out the self noise. A lot of noise anywhere in your signal equates to a lower signal to noise ratio. Dynamic range is similar, but instead of subtracting the self noise from 94, you subtract it from the maximum SPL, or the maximum level the microphone can take in without distorting. If a microphone has a maximum SPL of 130dB and a self noise of 10dB, the dynamic range would be 120dB. For more info on this, watch my video on calculating missing microphone specifications.
Link: https://youtu.be/niLqYks5Zuc
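Both of those spec-sheet numbers are simple subtraction; here's a quick sketch (the function names are just my own for illustration):

```python
def signal_to_noise(self_noise_db):
    # Mic spec sheets quote SNR as 94 dB SPL minus the mic's self noise.
    return 94 - self_noise_db

def dynamic_range(max_spl_db, self_noise_db):
    # Dynamic range runs from the self noise floor up to the maximum SPL.
    return max_spl_db - self_noise_db

print(signal_to_noise(10))     # 84 dB, as in the example above
print(dynamic_range(130, 10))  # 120 dB
```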
Now let's talk about the way sound is recorded. Analog audio, from a microphone for example, has to be converted from analog to digital so a recorder or computer can understand it. The sound waves (or sine waves) represent a certain frequency and amplitude. Frequency is how low or high pitched a sound is, and amplitude is the volume. The wider the sound waves are, the lower the frequency will be. Humans are usually said to be able to hear between 20-20,000 Hz, or 20-20,000 vibrations per second. At 20 degrees Celsius (68 degrees Fahrenheit), sound travels at 343 meters (1,125 feet) per second. At that temperature, one 20 Hz sine wave is 17.15 meters, or about 56 1/4 feet, long. As the number of vibrations per second increases, the waves get closer together. You can easily calculate the length of a certain frequency's sine wave by dividing the distance sound travels per second by the frequency in Hz. One 20,000 Hz sine wave would be 1/1000 as long as a 20 Hz wave, measuring 17.15 millimeters, or about 2/3 inch. This is important to consider because of how the computer uses zeros and ones to represent digital sound, but before we go there, let me explain how we visually measure it.
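That wavelength rule is a single division; here it is sketched in Python (my own illustration of the formula in the paragraph above):

```python
SPEED_OF_SOUND = 343.0  # meters per second in air at 20 degrees Celsius

def wavelength_m(frequency_hz):
    # Wavelength = distance sound travels in one second / vibrations per second.
    return SPEED_OF_SOUND / frequency_hz

print(wavelength_m(20))      # 17.15 meters for a 20 Hz wave
print(wavelength_m(20_000))  # 0.01715 meters (17.15 mm) for a 20 kHz wave
```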
Volume is measured in decibels, or dB for short. An average speaking volume 1 meter from your ear may be 76dB, and that same voice right next to your ear would be 96dB. A scream directly into your ear from 1 inch away, though, could be 135dB. Sound waves grow taller as the volume increases, and we measure this volume with an SPL meter, or Sound Pressure Level meter, when we want to put it into terms a human can measure and understand. Computers, on the other hand, can't hear, so saying a gunshot is 155dB doesn't mean anything to a computer even though in human terms, we know that would be deafening. To put things in a way a computer will understand, decibels become voltage, and to keep things easy... a 1000 Hz tone at 94 decibels produces 1 volt. That, by the way, is where the 94 came from in our signal to noise ratio calculation above. It was determined that 1 volt should represent maximum amplitude in digital recording, so if the signal is at its loudest, that would fill up the measuring scale. This is why digital audio metering is referred to as dBFS, or Decibels relative to Full Scale. It should also be noted that 0dBFS represents this maximum level, and amplitudes below 0dBFS are represented as negative numbers. As we discussed in my video "Adding Decibels Made Simple" (https://youtu.be/stT8iOq3LSE), every time a volume is cut in half, it decreases by 6dB. Because decibels are logarithmic, when we set our digital audio recording level to peak at -12dB, that's only 25% of the loudest volume we can record without the audio distorting.
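The dBFS-to-amplitude relationship is one logarithmic formula; here's a short sketch of it (the function name is my own):

```python
def dbfs_to_fraction(dbfs):
    # 0 dBFS = full scale (1.0); every -6.02 dB roughly halves the amplitude.
    return 10 ** (dbfs / 20)

print(round(dbfs_to_fraction(-6), 3))   # ~0.501, about half of full scale
print(round(dbfs_to_fraction(-12), 3))  # ~0.251, the 25% peak level mentioned above
```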
Computers use binary to represent information. Think of binary like a series of switches that can represent large numbers all the way down to the smallest info a computer can comprehend, which is one bit. One switch can be flipped off (represented by a 0) or on (represented by a 1), yielding 2 possible options. A 2 bit binary number would have 4 possible outcomes. Each bit added to the binary number doubles the number of possibilities. Now let's tie this into digital sound recording. A sine wave needs to be recorded as accurately as possible, and the number of bits plays a huge role in how accurately it's recorded. The number of bits available to represent the amplitude of a wave is the bit depth. The greater the number of bits, the more potential values the computer has to map out a sine wave. This process of assigning values to the wave is called quantization. If the wave falls between two values, the computer has to round up or down depending on which point it's closest to. This rounding results in inaccuracies, so the higher the bit depth, the greater the accuracy in the mapping process. That inaccuracy, by the way, is called quantization error.
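The doubling-per-bit idea above is just powers of two, sketched here:

```python
def quantization_levels(bit_depth):
    # Each added bit doubles the number of values available to map the wave.
    return 2 ** bit_depth

print(quantization_levels(1))   # 2: one switch, off or on
print(quantization_levels(2))   # 4 possible outcomes
print(quantization_levels(16))  # 65,536 values at 16 bit
print(quantization_levels(24))  # 16,777,216 values at 24 bit
```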
To reduce the amount of error, you must increase the number of potential values, or increase the bit depth. Since half the sine wave is above the center line and half is below, the values have to be split above and below the line. Bit depth is only half the story, though. The other half is the number of times per second the recording device maps these sine waves. If you want to record audio all the way up to 20,000Hz, the computer needs to quantize no fewer than 40,000 times per second because, remember, half of the wave is above the line and half is below, and each requires its own sample. The number of times per second the computer quantizes a value is the sample rate. If you want any more info on sample rates, watch my video on this topic. (https://youtu.be/iW-iq_G5sV0) Before we move on, it should be noted that each bit in the bit depth can represent about 6.021 decibels, so the highest signal to noise ratio in a 16 bit recording is 96.33dB, and in 24 bit it's 144.49dB, which, I might add, exceeds the maximum dynamic range of any single analog digital converter at the time of this video. In real world terms, each value mapped corresponds to the voltage of the analog signal's amplitude. Don't forget, the maximum voltage represented by binary numbers is 1 volt, which equals 0dBFS. There's no way to represent values over 0dBFS on the digital audio meter, so when the binary number maxes out, any volume that would exceed 0dB is represented with the highest binary value. On the meter, it would appear as a wave but with a flattened off top. This is where the term clipping comes from - because it looks like the top of the wave is clipped off with scissors.
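The 6.021 dB per bit figure comes from the decibel formula for a doubling of voltage; here's where the 96.33 and 144.49 numbers come from:

```python
import math

# Doubling the amplitude adds 20*log10(2) = ~6.021 dB, and each bit doubles
# the number of values, so each bit of depth buys about 6.021 dB of SNR.
DB_PER_BIT = 20 * math.log10(2)

print(round(DB_PER_BIT, 3))       # 6.021
print(round(16 * DB_PER_BIT, 2))  # 96.33 dB for 16 bit
print(round(24 * DB_PER_BIT, 2))  # 144.49 dB for 24 bit
```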
The way I just explained bit depth and sample rate is correct for fixed point audio recording, but in floating point, it's different. While a fixed point binary number is just one long binary word representing one value, a floating point binary word can be broken down into 3 parts: the 1 bit sign, the 8 bit exponent and the 23 bit mantissa. Let's look at those in reverse... The mantissa is the base binary number that gets multiplied by the exponent term, and the sign is whether the number is positive or negative. Let's look at an example.
Formula (for reference): value = (-1)^S × 1.M × 2^(E-127)
-1 raised to the power of the sign bit tells you if the rest of the number is positive (sign bit 0) or negative (sign bit 1).
Next you write 1 point the value of your 23 bit integer. This is actually a 24 bit value, though. The 24th bit is hidden and is referred to as the phantom bit. If you're wondering where it is... remember the value is 1 point the 23 bit value, and that leading 1 is the value of the phantom bit. Since we know it's always 1, it doesn't need to be stored. It's important to note that this value can only range between 1 and just under 2, and this decimal number is sometimes referred to as the significand.
Finally you take the value of the exponent, subtract 127 from it, and then raise 2 to that power - multiplying 2 by itself that number of times, the same way as I did in my 9 to the 9th example at the beginning of this video. Note: 2 to the 0 power is 1.
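To see the formula in action, here's a sketch that pulls the three fields out of a real 32 bit float and rebuilds the value by hand (Python's struct module gives us the raw bit pattern; the function name is my own):

```python
import struct

def decode_float32(value):
    # Pack the float into its raw 32 bit pattern, then slice out the fields.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = (bits >> 31) & 0x1       # 1 bit sign
    exponent = (bits >> 23) & 0xFF  # 8 bit exponent
    mantissa = bits & 0x7FFFFF      # 23 bit mantissa
    # Significand = 1.M; the hidden phantom bit supplies the leading 1.
    significand = 1 + mantissa / (1 << 23)
    return (-1) ** sign * significand * 2 ** (exponent - 127)

print(decode_float32(-6.5))  # -6.5: sign 1, significand 1.625, exponent 129
```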
Now, you should be able to calculate a 32 bit floating point value by hand. Theoretically, the dynamic range of 32 bit floating point is 1528dB, which represents -758 to 770dB. This sounds incredible, and it is in many ways, but there are trade-offs. Because of the estimations and rounding of 32 bit floating point numbers, they aren't as accurate as fixed point bit words. If you're wondering how accurate 32 bit floating point is, it's exactly the same as 24 bit over the same range. This makes sense because the 23 bit mantissa plus the phantom bit gives you 24 bits of precision, so over any single range set by the exponent, the available values match a 24 bit integer's. But what about outside of the 24 bit range? The simple answer is - the farther away from the 24 bit range the numbers get, the less accurate they can become. This is because the mantissa is being multiplied by ever larger exponent terms, so quantized values face small quantization errors that grow bigger as the exponent value increases. But what about negative exponent values making fractions? Wouldn't that make the precision even higher between certain ranges? No, because if the quantized values are fractions, ADCs will still quantize them into whole numbers and see the decimal as error. This is because ADCs at the time of this video are fixed point and don't recognize fractions. It's true that these errors are very small and represent a small fraction of a decibel, but the bigger the numbers get, the less accurate they become between one value and the next. In 16 and 24 bit fixed point words, the values are even and predictable, but in 32 bit float, the gaps between one value and the next vary with the exponent. This sometimes creates audible anomalies like a slightly higher noise floor, but you most likely wouldn't be able to hear it. Look at it this way... if 32 bit were truly accurate, we wouldn't also have 64 bit floating point, which produces even more accuracy and much larger values.
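To see those growing gaps concretely, here's a sketch using the rule that the spacing between neighboring normal 32 bit floats is 2^(exponent - 23) (my own illustration, not a measurement of any converter):

```python
import math

def float32_gap(value):
    # Gap between a normal 32 bit float and its neighbor: 2^(exponent - 23),
    # because the 23 bit mantissa is stretched across each power-of-two range.
    exponent = math.floor(math.log2(abs(value)))
    return 2.0 ** (exponent - 23)

for v in (1.0, 1000.0, 1_000_000.0):
    print(v, float32_gap(v))  # the gap widens as the value grows
```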
One more note worth mentioning here. Many times digital sound waves are visualized as stair steps, but this isn't really accurate. Quantization of analog sound waves plots a digital value as a point. People connect those points as stair steps along a grid, and that gives us a sense of its digital nature, but in reality it's just a tiny point that represents a value. If you were to do a lollipop graph instead, you could perhaps see the sound waves better. You may look at this and think that with random points, you could connect the dots any way you want, but that's where the Nyquist-Shannon sampling theorem comes into play. In simple terms, it establishes certain conditions a sample rate must follow when plotting a sequence of samples. This means that the plotted points can only be reconstructed one way regardless of frequency, and that allows you to convert the digital audio back to analog without much of a loss in quality. Sometimes distortions or artifacts occur during this process, and this can result in one sample resembling another. This is called aliasing. Luckily this isn't something often heard, but it is something to be aware of nonetheless.
If you're familiar with digital audio, then you've been waiting for me to mention analog digital converters, or ADCs for short. An ADC quantizes an analog audio wave into a digital representation of that wave. I go into it and even manually demonstrate the process in my video "How Audio Compression Lowers Bit Depth" (https://youtu.be/C6zlkwzYbqQ). One common misconception is that ADCs will convert any audio level to digital flawlessly. This isn't the case, though. The best ADCs on the market as of the time of this video only have the ability to quantize about 130 decibels of dynamic range. Even then, it doesn't come for free, because ADCs also have a noise floor of their own due to how they deal with the analog audio. This is important to note because in order for 32 bit float recording to have extended dynamic range, multiple ADCs have to be used. One is used similarly to how one is used for fixed point recording, and a second is used for higher audio levels. Manufacturers have different ways of combining the quantized audio from both ADCs, but the big takeaway is that because two different ADCs are used, each brings its own self noise to the quantized audio, and since they are set at different audio levels, one higher than the other, the combined noise floor may be slightly higher - though you won't likely notice it. This isn't as big of a deal as it may sound, because the combination algorithm is performed with such things in mind, and there may even be a DSP, or digital signal processor, in play to help this process while reducing excess noise. The Sound Devices MixPre II series, for example, extends dynamic range up to 142dB. That extra 12dB over a regular 130dB ADC gives it the ability to record sounds 4X higher in level, because every 6dB represents a doubling of sound pressure.
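Manufacturers keep their exact combination algorithms private, so here's only a toy sketch of the general idea (every name and number in it is my own assumption, not any real device's method): prefer the high gain ADC path for its lower noise floor, and switch to the low gain path when the high gain path nears clipping.

```python
CLIP_THRESHOLD = 0.95  # hypothetical level where the high gain path clips
GAIN_OFFSET = 4.0      # assume the high gain path runs 12 dB (4x) hotter

def merge_sample(low_gain, high_gain):
    # Use the lower-noise high gain path unless it is about to clip,
    # then fall back to the low gain path (already at its true scale).
    if abs(high_gain) < CLIP_THRESHOLD:
        return high_gain / GAIN_OFFSET
    return low_gain

print(merge_sample(low_gain=0.05, high_gain=0.2))  # quiet: high gain path wins
print(merge_sample(low_gain=0.5, high_gain=1.0))   # loud: low gain path takes over
```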
In reality, 32 bit floating point can only represent a dynamic range between -130.8dB (the lowest achievable 150 ohm EIN as of the time of this video - Watch: "EIN Made Easy" https://youtu.be/iutQGzn4Uf4) and 194dB undistorted, or 210dB distorted. That's in air, but in other mediums like water, it would be even higher. As a side note, this is because sound waves are vibration, and the highest crest of a sound wave represents high pressure while the trough represents low pressure. At 194dB, the positive pressure in air is so high that the trough of the wave would have to exceed a vacuum not to distort, and you can't exceed a vacuum, so higher than 194dB, the sound wave distorts. In water the rules are different because the trough of a 194dB wave isn't a vacuum.
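The 194dB ceiling can be checked with the standard SPL formula - 0 dB SPL is referenced to 20 micropascals, and a sine wave whose trough reaches a perfect vacuum has a peak pressure of one atmosphere:

```python
import math

P_REF = 20e-6             # 20 micropascals: the 0 dB SPL reference pressure
P_ATMOSPHERE = 101_325.0  # one standard atmosphere in pascals

# Peak pressure of one atmosphere = the loudest undistorted sine wave in air.
max_spl = 20 * math.log10(P_ATMOSPHERE / P_REF)
print(round(max_spl, 1))  # ~194.1 dB
```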
All of this considered, what's the big takeaway? If it's dead quiet, with a 16 bit bit depth you can probably hear the noise floor of the recording. With a 24 bit bit depth, you're not likely to, but what about a 32 bit bit depth? You're still limited by analog digital converters, and while you can extend your dynamic range by using multiples, do you need that? Are you just speaking into a microphone, or are you planning to go back and forth between shooting a shotgun 1 foot away from a microphone and whispering 5 feet away from the same microphone? Most professionals, including those recording sound for movies and TV shows, record at 24 bit bit depth because it's plenty. It doesn't take much to set a level correctly, and with digital noise floors as low as they are, you can boost up low levels in 24 bit just as well as you can in 32 bit. But what about higher volume levels? That is where 32 bit floating point may come in handy, but this is also why you have limiters. There's no limiter in 32 bit recording because you're not going to get loud enough to ever need it, but in 24 bit audio, you'll potentially hit those limiters if you're recording insanely loud sounds or don't have your levels set correctly ("Recording Gunfire" https://youtu.be/pP-W6ichb0w). You may be tempted to record in 32 bit floating point anytime you anticipate recording loud sounds, but first, look at your microphone. If you're going wireless, your sound will be compressed anyway, so it won't likely be necessary to record in 32 bit floating point. If hardlined, what's the maximum SPL your microphone can hit before it distorts and clips? Many condenser mics can't handle much over 130dB before clipping, so you might not want to record in 32 bit floating point when your microphone could clip before your analog digital converter. Are there applications where you may need 32 bit floating point? Sure there are. If you're recording sound and don't have access to your recorder, for example.
Maybe you're recording sounds in nature during a thunderstorm and have your gain way up. You might record in 32 bit floating point just in case you get an incredibly loud thunderclap amongst the falling rain on leaves. In this example, or any similar scenario when you'd have your gain up more than normal and need to protect yourself against unexpected loud sounds, then 32 bit floating point could be a saving grace. For everyday recording like podcasts, interviews and YouTube videos, save yourself the extra large file size and just record in 24 bit.
STUDIO:
There you go, 32 bit floating point 101. Ideal for extreme and unexpected differences in dynamic range but totally unnecessary for most recording so record in 24 bit bit depth with confidence but you might keep your limiter on just to be safe. Thank you for...
If you found this helpful, please consider making a donation at: https://www.soundspeeds.us/contribute