Monday 25 May 2015

The 747 Bus

A lot is made of the differences between IIR and FIR filters for high-end audio applications.  FIR stands for Finite Impulse Response, and IIR stands for Infinite Impulse Response.  It is perhaps not surprising, therefore, that in discussing the characteristics of various filters, the one thing you tend to read more often than any other is that IIR filters have this or that type of impulse response, whereas FIR filters have such and such an impulse response, as though the impulse response itself were the purpose of the filter.  Nothing could be further from the truth.

Although an impulse response has a waveform-like aspect, and is derived directly from the frequency and phase responses of the filter, there is an unfortunate affectation, common in the world of high-end audio, of characterizing the audible qualities of a filter in terms of the features of its impulse response.  It is a bit like saying, with the smug certainty of one stating the self-evidently obvious, that the number of a bus tells you where the bus is going.  Where I live, there is a bus route that goes to the airport, which has the (to me, at any rate) faintly amusing route number of ‘747’ (the famous Boeing 747 being the eponymous Jumbo Jet).  When I see the 747 bus I know it is going to the airport, but it would be wrong to deduce that the bus has an upper deck (which it doesn’t, as it happens), or that it holds more passengers than the average bus (which it also doesn’t).  Neither would it be wise to assume, in other cities around the world, that the way to get to the airport is to take the 747 bus.

All digital filters, whether IIR or FIR, work by manipulating the available data.  Broadly speaking, the available data comprises two sets of numbers.  One set is the actual music data that gets fed into the filter.  The other is the set of numbers comprising the previous outputs of the filter.  In practical terms, the primary distinction between FIR and IIR filters is that FIR filters are confined to using only the actual music data as their inputs, whereas IIR filters can use both.

The impulse response of an FIR filter is nothing more than a graphical representation of the so-called ‘taps’ of the filter.  Each ‘tap’ represents one of the actual input music data values, so the more ‘taps’ the filter has, the more of the input music data goes into the calculation of each output value.  The more complex the performance requirements of an FIR filter, the more ‘taps’ are needed to define it, and the more detail will be found in its impulse response.  An IIR filter, however, uses previous output values as well as previous input values in its calculation, and of course each previous output value will have been calculated from the input and output values before that.  As a result, if you were to go through all the math, you would find that an IIR filter can be re-written in a form that uses ALL of the actual music input values, and NONE of the previous output values, to calculate each new output value.  For this reason, it is mathematically identical to an FIR filter with an infinite number of taps.  Hence the term “Infinite” Impulse Response.  But the IIR filter can achieve this with a surprisingly compact filter structure, one that uses relatively few of the previous input and output values, and obviously requires far fewer calculations in order to arrive at the exact same result.
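
For the programmers among you, here is a minimal sketch (in Python with NumPy - my own choice of language for illustration, not anything BitPerfect actually ships) of the two difference equations side by side.  The coefficient values are made up purely for illustration; the point is only that the FIR output is built from past inputs alone, whereas the IIR output also feeds back its own past outputs.

```python
import numpy as np

def fir_filter(x, b):
    """FIR: each output is a weighted sum of past *inputs* only.
    b holds the 'tap' coefficients; len(b) is the number of taps."""
    y = np.zeros_like(x, dtype=float)
    for n in range(len(x)):
        for k in range(len(b)):
            if n - k >= 0:
                y[n] += b[k] * x[n - k]
    return y

def iir_filter(x, b, a):
    """IIR: outputs depend on past inputs *and* past outputs (feedback).
    a holds the feedback coefficients (a[0] is taken to be 1)."""
    y = np.zeros_like(x, dtype=float)
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):
            if n - k >= 0:
                acc += b[k] * x[n - k]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

# Illustrative coefficients only -- not a real audio filter design.
x = np.random.randn(1000)
y_fir = fir_filter(x, b=[0.25, 0.5, 0.25])
y_iir = iir_filter(x, b=[0.1], a=[1.0, -0.9])
```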

The biggest practical and worthwhile difference between FIR and IIR filters actually lies in the tools that are used to design them.  To learn more you need to read my previous posts on “Pole Dancing” where I discuss the basics of filter design.  This involves nothing more than placing a bunch of “Poles” and “Zeros” on a graph called a Z-space.  Once you have the poles and the zeros positioned, the digital filter itself is pretty much defined.  The problem is that the relationship between the performance of the filter and the location and number of poles and zeros is not a two-way street.  If I have my poles and zeros, I can easily calculate the frequency, phase, and impulse responses of my filter.  But if I start by defining the responses that I want from my filter, it is not possible to make the opposite calculation, and derive the requisite poles and zeros.  In other words, when it comes to digital filter design, the operative phrase is usually “you can’t get there from here”.
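
The forward direction, at least, is easy enough to demonstrate.  Here is a short sketch (Python with SciPy, purely illustrative - the pole and zero placements are arbitrary) showing how the frequency and phase responses fall straight out of the poles and zeros.  Going the other way, from a desired response back to a set of poles and zeros, is the part for which no such one-liner exists.

```python
import numpy as np
from scipy.signal import zpk2tf, freqz

# Arbitrary pole/zero placements, purely for illustration.
zeros = [np.exp(1j * np.pi * 0.9), np.exp(-1j * np.pi * 0.9)]              # a zero pair near Nyquist
poles = [0.7 * np.exp(1j * np.pi * 0.1), 0.7 * np.exp(-1j * np.pi * 0.1)]  # a pole pair near DC
gain = 0.05

# Easy direction: poles and zeros -> frequency and phase response.
b, a = zpk2tf(zeros, poles, gain)      # the filter's difference-equation coefficients
w, h = freqz(b, a, worN=2048)          # response at 2048 frequencies from DC to Nyquist

magnitude_db = 20 * np.log10(np.abs(h) + 1e-12)
phase_deg = np.degrees(np.unwrap(np.angle(h)))
print(magnitude_db[0], magnitude_db[-1])   # response at DC and at Nyquist

# The hard direction -- from a desired magnitude/phase curve back to a
# set of poles and zeros -- has no equivalent one-liner; that is what
# the various filter design methods can only approximate.
```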

The way we get around this bottleneck is the same way we get around every other tricky mathematical problem.  We reduce the complexity of the problem by simplifying it, and only considering a strict subset of the total spectrum of possibilities.  I’m not going to get into the technical details, but the outcome of that approach is that we end up with certain design methods that can be used to design certain classes of filters.  In other words, design tools for FIR filters produce slightly different results than design tools for IIR filters.  The extent to which FIR and IIR filters differ in any audible sense is more down to whether the design tools available in IIR or FIR space do a better job of realizing the desired output response.
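
To give a flavour of what those different toolkits look like in practice, here is a sketch using SciPy's stock filter-design routines (my choice of example, not necessarily what any particular product uses).  The windowed-sinc method on the FIR side and the elliptic prototype on the IIR side are just two of those 'strict subsets'; the cutoff and order values are picked out of thin air.

```python
from scipy.signal import firwin, ellip

# FIR design: windowed-sinc method.  The designer hands back the taps
# directly -- the taps *are* the impulse response.
fir_taps = firwin(numtaps=255, cutoff=0.45)   # cutoff given as a fraction of Nyquist

# IIR design: start from a classical analog prototype (elliptic here)
# and transform it into the digital domain.  Only a handful of coefficients.
b_iir, a_iir = ellip(N=7, rp=0.1, rs=100, Wn=0.45)

print(len(fir_taps))            # 255 taps
print(len(b_iir), len(a_iir))   # 8 feed-forward and 8 feedback coefficients
```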

Aside from the challenges faced in designing the filter, there are significant distinctions between IIR and FIR filters when it comes to actually implementing the filter.  In general IIR filters are very efficient.  Rarely do they have more than 10-20 ‘taps’ (a term not actually used when referring to IIR filters).  Therefore they tend to be inherently more efficient when run on computer architectures.  IIR filters can be very compact, and are usually designed to run in sequential blocks, with the output of one block forming the input to the next.  On the other hand, FIR filters lend themselves really well to being realized in massively parallel architectures.  For each FIR filter tap we need to multiply two numbers together, and when that’s done add all the answers together.  None of these multiplications rely on the outcomes of any of the other multiplications, so they can all be done in parallel.
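
A toy illustration of that last point, again in Python/NumPy with made-up coefficients: every FIR output sample is an independent dot product that could be computed in parallel, whereas each IIR output sample has to wait for the one before it.

```python
import numpy as np

taps = np.array([0.25, 0.5, 0.25])        # illustrative FIR taps
x = np.random.randn(10_000)

# FIR: every output sample is an independent dot product of the taps
# with a window of input samples, so they can all be computed at once
# (NumPy vectorizes it here; an FPGA would do it in parallel hardware).
windows = np.lib.stride_tricks.sliding_window_view(x, len(taps))
y_fir = windows @ taps[::-1]

# IIR: each output needs the *previous* output, so the loop is
# inherently sequential -- y[n] cannot start until y[n-1] exists.
b0, a1 = 0.1, -0.9                         # illustrative coefficients
y_iir = np.zeros_like(x)
for n in range(1, len(x)):
    y_iir[n] = b0 * x[n] - a1 * y_iir[n - 1]
```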

In a computer you usually don’t have many parallel processing paths available, but an FPGA can be easily programmed to do hundreds of operations in parallel.  An FIR filter with hundreds upon hundreds of taps can therefore be realized very efficiently indeed in an FPGA.  Additionally, FIR filters are stable when implemented using integer arithmetic, something that will rapidly trip up an IIR filter.  The ability to use integer arithmetic is something else that can be quite significant in an FPGA and less so in a computer.  ‘Efficiency’ is critically important in the majority of audio applications, which have to operate in ‘real time’ and will fail if they run more slowly than the music they are processing!

For all of those reasons, and also because the design tools available for IIR filters are generally a better match to the performance characteristics we are usually looking for, here at BitPerfect our preference is inevitably for an IIR filter.  And the filter we choose is one whose performance characteristics are the closest to what we need, regardless of what the impulse response might look like.  The filter we will offer is always the one which sounds best to us, rather than the one whose impulse response meets some perceived aesthetic ideal.

Friday 22 May 2015

Dmitri Shostakovich - Symphony No 7 ‘Leningrad’

I first heard the seventh symphony by Russian composer Dmitri Shostakovich on television back in the 1970s.  It must have been a televised Prom Concert, but I cannot really be sure.  I had learned to like the earlier fifth symphony, and was particularly intrigued by that work’s political undertones.  Shostakovich had previously been denounced by Stalin for writing music that was vulgar and incompatible with the artistic principles of the revolution (whatever those might have been).  This was a serious concern for him, because many of his friends and colleagues had been arrested and even shot following similar accusations.  Despite all this, Dmitri Shostakovich was made of considerably sterner (and smarter) stuff.  He published his fifth symphony with the sobriquet “A Soviet artist's creative response to just criticism”, and the attendant work did indeed appear to address the criticisms previously applied by the state.

However, even to an unsophisticated teenager’s ear, Shostakovich’s fifth symphony is a piece of unrelentingly biting satire, fairly bubbling with barely-suppressed sarcasm and cynicism.  Nourished as I was by the likes of Monty Python’s Flying Circus and George Orwell’s Animal Farm, it was wonderful to imagine the pompous ranks of the Soviet establishment gushing over the magnificent rehabilitation of a true Soviet artist, at the same time as the piece they were praising so fulsomely was savagely mocking them and everything they stood for.

After the fifth symphony, the seventh is probably the composer’s best known work.  Here the narrative was that it was written to commemorate the failed German siege of Leningrad.  In the first movement, the symphony depicts the slow and irresistible advance of the German army, followed by its onslaught and extended siege of the city of Leningrad.  The siege was truly terrible.  The city was cut off for three long and terribly cold winters, and something approaching a million civilians died of cold and starvation.  The symphony goes on to reflect the triumph of the Russian people over the repelled and beaten Germans.  This, too, was a wonderfully compelling story, and when I first heard the symphony on the TV (incredibly, I could not locate an LP at a price a penurious student was willing to pay) I was impressed by its depth, breadth, and scope.  The opening ‘battle’ movement was riveting, and the closing triumph very satisfying.  This was another symphony I intended to enjoy getting to know.

Except that it didn’t turn out that way.  Over the years I bought quite a few recordings, hoping to recapture what I thought I felt watching that original TV broadcast.  But it never happened.  There were two main problems.  First, I came to realize that much of the thematic material was unsatisfying.  The main ‘battle theme’, repeated (in a manner that evokes Ravel’s Bolero) over a series of twelve consecutive variations of steadily accumulating intensity, seemed to become ever more trite, naive, or even schoolboyish, with every repeated listening.  There was nothing particularly Germanic about it.  It sounded less and less like a great work, and more and more like an immature one.

The second problem was perhaps the graver one.  Upon closer inspection, the music did not appear to do an effective job of developing the narrative ascribed to it.  No matter how many times I listened to the first movement, I could not get it to evoke the savagery of a German panzer onslaught.  After all, the Germans did not creep up on you.  They fell upon you like a category 6 hurricane.  The battle was intense from the opening salvo, and went on at the same level for a long, long time.  And rather than gradually building in intensity, the opposite happened.  The intensity diminished as mother nature bestowed an equal lack of mercy on both sides.  At the end of the symphony, all of a sudden everything erupts in triumph.  Where does that come from?  And what do we make of the two long, slow, laborious movements that separate the battle from the triumph?  The symphony and its narrative did not stack up, and with that huge disconnect it was difficult to come to terms with its apparent musical deficiencies.

I am not a musicologist, and thus it was only recently, with a big assist from The Internet, that I was able to finally come up with a resolution to these problematic issues and turn the symphony into a major work that I could fully come to terms with.  The symphony was written over a very short period of time in the latter part of 1941, during the early months of the siege and a long time before its full reality was to emerge.  And even so, it was the composer’s habit to allow his musical ideas to percolate in his imagination for a long time - sometimes years - before he allowed himself to commit them to paper.  The symphony’s narrative therefore, in all likelihood, formed in his head long before Hitler’s armies even left Berlin.

So what was the real narrative then?  For the answer, it seems we have to go back to the fifth symphony, and the ease with which Shostakovich was able to skewer the pomposity and hubris of the communist establishment.  This symphony, then, is Shostakovich’s commentary on the great communist experiment itself.  The ‘battle’ movement is not at all about the German army approaching Leningrad.  It is about the insidious manner in which the communist movement took over Russian society, inserting itself into the fabric of the nation bit by bit, until it was too late to be able to resist.  The triumph it leads to is ugly and quite unmistakable.  We now see clearly that the trite, naive, and schoolboyish nature of the themes does not reflect an inept compositional talent, but rather a masterful composer with a sure grasp of parody, sarcasm, and skewering wit.  There is a lot hidden behind these apparently unsubtle themes.  They can be traced to folk songs and melodies, and even popular tunes of the period, which Shostakovich has twisted and adapted to signify the Communist Party’s positioning of the revolution as a popular (literally, ‘of the people’) movement.

The symphony opens with a majestic theme, presented in a manner to suggest the grandiosity and formality of Imperial Russian society, and leads into a pastoral interlude before the gentle but insistent pianissimo tapping of a snare drum unassumingly ushers in the communists.  From these apparently harmless beginnings, the communist takeover proceeds insidiously, yet relentlessly.  Four long movements later the symphony closes with the return of the majestic theme from the opening, this time sounding more like a movie soundtrack with the hero riding off into the sunset with his gal at his side.  The intervening period, following the ‘triumph’ of the communist takeover (more exultant than triumphant), is a lengthy stretch of bleak music.  It tries from time to time to rouse itself, but never seems to offer anything of ambition before descending back into the routine.  It seems to reflect the unrelieved hopelessness of communist Russian society.  Seen in this light, the concluding triumph does not appear to represent the triumph of communism itself.  Rather it appears to be a hopeful imagination of a rosy future, one much like its Imperial past, but with a smiley face.  And nothing in the long lead-up seems to suggest that this will be the natural outcome of the great communist experiment.  In fact, the final climax itself is in many ways a parody of a rosy future.  Clearly this is not the communist party’s grand deception at play - we’ve already heard what that sounds like in the fifth symphony - no, in my view this is Shostakovich’s gentle parody of the great hopes of the ordinary Russian people.  Hopes for a wonderful, happy future, a rose-tinted version of the best parts of their Imperial past.  But it’s not real.  They can’t have the real thing - only the movie version.

I haven’t until now found a recording that conveys this new view of the great seventh symphony.  Interestingly, I just last week received my Society of Sound free download, which was Gergiev’s recent recording of the 7th with the Mariinsky Orchestra (his second with this orchestra in little over a decade).  Gergiev has never convinced me as a conductor, although his reputation is absolutely stellar, and his latest seventh seems to be another one cut from the traditional “German Siege” template.  Ho-hum, I thought.

But then along comes Mariss Jansons and the Concertgebouw Orchestra of Amsterdam, playing live on SACD.  Jansons in my view is every bit the conductor that Gergiev’s reputation describes.  He could go down in history as one of the all-time greats.  And this recording of the ‘Leningrad’ is by quite some margin the best I have ever heard.  Finally, we have a deeply convincing portrayal of the modern interpretation.  Of course, if you prefer the traditional interpretation, then maybe this one isn’t going to cut it for you - but seriously, does anyone really buy that any more?  Apart from maybe Gergiev.  And, for an added bonus, the recording itself here is an absolute stunner.  Sure, the Concertgebouw hall’s ponderous bass does come across a little, but even so it is very well tamed and doesn’t detract in the slightest.  Whoever recorded this should have won a Grammy (and maybe he did - I actually wouldn’t know).

Thursday 21 May 2015

How does Sample Rate Conversion work?

I wrote this post mainly to address the following question: If I have a choice of sample rates available, which one should I choose?  I get asked this often, and the answer, like all things pertaining to digital audio, is both simple and complicated depending on how deeply you want to look into it.  So here is a quick primer on the technical issues that underpin SRC.  I have not attempted to sugar-coat the technical aspects, so feel free to go away and read something else if you are easily intimidated :)

First of all, what, exactly, is Sample Rate Conversion?  Well, digital audio works by encoding a waveform using a set of numbers.  Each number represents the magnitude of the waveform at a particular instant in time, so in principle, each time we measure (or ‘sample’) the waveform we need to store two numbers.  One number is the magnitude of the waveform itself and the other is the exact point in time at which it was measured.  That’s a lot of numbers, but we can cut them in half if we can eliminate having to store all the timing numbers.  Suppose, then, that we measure the waveform using a very specific, regular timing pattern determined in advance.  If we can do that, then we don’t have to store the timing information, because we can simply use a very accurate clock to regenerate it during playback.  This is how all digital audio is managed for consumer markets.

The “Sample Rate” is the rate at which we sample (or measure) the waveform.  Provided we know exactly what the sample rate is, we can relatively easily reconstruct the original waveform using those stored numbers.  The chosen sample rate imposes some very specific restrictions on the waveforms that we can encode in this manner.  Most particularly we must observe the Shannon-Nyquist criterion.  This states that the signal being sampled must contain no frequencies above one half of the sample rate.  If any such frequencies are present in the signal, they must be filtered out very strictly before being sampled.  Also, it is one of the simpler tenets of audio that human hearing is restricted to the frequency range below 20kHz.  Based on those two things, we can derive a commonly-quoted requirement that in order to achieve high quality, digital audio must therefore have a sample rate of at least 40kHz.  For that reason, the standard which has been chosen for CD audio, and widely adopted for digital audio in general, is 44.1kHz.  Interestingly, for DVD Audio, a slightly different sample rate of 48kHz was adopted.  These numbers have important consequences.
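
If you want to see why the Shannon-Nyquist criterion is not negotiable, here is a tiny numerical experiment (Python/NumPy, with an arbitrarily chosen tone frequency).  A 30kHz tone sampled at 44.1kHz does not vanish - it folds back down, or ‘aliases’, and becomes indistinguishable from a 14.1kHz tone sitting right in the audio band.

```python
import numpy as np

fs = 44100                       # sample rate in Hz
t = np.arange(fs) / fs           # one second's worth of sample instants

tone_30k = np.sin(2 * np.pi * 30000 * t)          # above fs/2 = 22050 Hz
alias_14k = np.sin(2 * np.pi * (fs - 30000) * t)  # 14100 Hz, inside the audio band

# Once sampled, the 30 kHz tone is just the 14.1 kHz tone with inverted
# phase -- their sum is (to within rounding error) zero everywhere.
print(np.max(np.abs(tone_30k + alias_14k)))       # ~0
```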

Of course, the above is not the whole story, and there are various good reasons why you might want to consider sampling your audio signal at sample rates significantly higher than 44.1kHz.  As a result, audio recordings exist at all sorts of different sample rates, and for distribution or playback compatibility purposes you may well have a good reason to want to convert existing audio data from one sample rate to another.

If you convert from a lower sample rate to a higher one, the process is called up-conversion.  In the opposite case, conversion from a higher to a lower sample rate is called down-conversion.  The alternative terms up-sampling and down-sampling can be used interchangeably.  I tend to use both, according only to whim.

We’ll start with a simple case.  Let’s say I have some music sampled at 44.1kHz and I want to convert it to a sample rate of 88.2kHz (a factor of exactly 2x the original sample rate).  This is a very simple case, because the 88.2kHz data stream comprises all of the 44.1kHz samples, with one additional sample inserted exactly half way between each pair of the original 44.1kHz samples.  The process of inserting those additional samples is called interpolation.  In effect, what I have to do is (i) figure out what the original analog waveform was, and then (ii) sample it at points in time located at the mid-points between each pair of the existing samples.  Are you with me so far?

Obviously, the key point here is to recreate the original waveform, and I have already said that “we can relatively easily reconstruct the original waveform using the stored numbers”.  However, like a lot of digital audio, once you start to look closely at it you find that what is easy from a mathematical perspective is often mightily tedious from a practical one.  For example, Claude Shannon (he of the Shannon-Nyquist sampling theorem) proved that the mathematics of a perfect recreation of the analog signal involves ‘simply’ the convolution of the sampled data with a continuous Sinc() function.  However, if you were to set about performing such a convolution, and evaluating the result at the interpolation points, you would find that it involves a truly massive amount of computation, and is not something you would want to do on any sort of routine basis.  Nonetheless, convolution with a Sinc() function does indeed give you a mathematically precise answer, and interpolations performed in this manner would in principle be as accurate as it is possible to make them.
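
For the curious, here is what that mathematically precise route looks like as a toy sketch (Python/NumPy): a direct Shannon reconstruction evaluated at the mid-points between the existing samples.  Notice that every interpolated value has to touch every input sample, which is exactly why nobody does it this way on a routine basis.

```python
import numpy as np

def sinc_interpolate_midpoints(x):
    """Evaluate the Shannon reconstruction at points half way between
    the existing samples.  O(N^2): accurate, but hopelessly slow."""
    n = np.arange(len(x))                 # original sample positions
    t = n + 0.5                           # the mid-points we want
    # Each interpolated value is a sum over *all* input samples,
    # weighted by the sinc function (np.sinc is sin(pi x)/(pi x)).
    return np.array([np.sum(x * np.sinc(ti - n)) for ti in t])

x = np.sin(2 * np.pi * 1000 * np.arange(100) / 44100)   # a snippet of a 1 kHz tone at 44.1 kHz
midpoints = sinc_interpolate_midpoints(x)
```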

So if a convolution is not practical, how else can we recreate the original analog signal?  The answer is that we can follow the process that happens inside a DAC (at least inside a theoretical DAC) and do something similar to recreate the original waveform in the digital domain.  Inside a DAC we pass the digital waveform through what is called a brick-wall filter, which blocks all of the frequencies above one-half of the sample rate while letting through as much as possible of the frequencies below one-half of the sample rate.

This is the type of interpolation filter which is most commonly used.  What we do is make a sensible guess for what the interpolated value ought to be, and pass the result through a digital brick-wall filter to filter out any errors we may have introduced via our guesswork.  If we have made a good guess, then the filter will indeed filter out all of the errors.  But if our guess is not so good, then the errors can contain components which fold down into our signal band and can degrade the signal.  This filtering method has the disadvantage (if you want to think of it that way) of introducing phase errors into the signal, and has the effect that if you look closely at the resulting data stream you will see that most of the original 44.1kHz samples will have been modified by the filter.  There is some debate as to whether such phase errors are audible, and here at BitPerfect we believe that they actually may be.  So your choice of filter may indeed have an impact upon the resulting sound quality of the conversion.
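
Here is a much-simplified sketch of that ‘guess, then filter’ approach for the 2x case, written in Python with SciPy.  The crude initial guess is simply a zero inserted at each new sample position, and a windowed-sinc FIR stands in for the brick-wall filter; the filter length and cutoff are illustrative choices only.

```python
import numpy as np
from scipy.signal import firwin, lfilter

def upsample_2x(x):
    """Double the sample rate: crude 'guess' (zeros) at the new
    sample positions, then a brick-wall low-pass to clean it up."""
    guessed = np.zeros(2 * len(x))
    guessed[::2] = x                    # originals at even positions, guesses in between

    # Stand-in brick-wall filter: pass everything below the *old*
    # Nyquist frequency, which is 0.5 of the new Nyquist frequency.
    brickwall = firwin(numtaps=255, cutoff=0.5)
    return 2.0 * lfilter(brickwall, 1.0, guessed)   # x2 restores the level lost to zero-stuffing

x_44k1 = np.sin(2 * np.pi * 1000 * np.arange(4410) / 44100)   # 0.1 s of a 1 kHz tone
x_88k2 = upsample_2x(x_44k1)
```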

Up-conversion in this manner is usually performed by a specialized filter which in effect combines the job of making the good guess and doing the filtering.

When up-converting by factors which are not nice numbers (for example when converting from 44.1kHz to 48kHz, a factor of roughly 1.088x) the same process applies.  However, it is further complicated by the fact that you can no longer rely on a significant fraction of the original samples being reusable as samples in the output.  Compare this with converting from 44.1kHz to 88.2kHz, where every second sample in the output stream is derived from an interpolated value.  The interpolated values, which contain the errors, alternate with original 44.1kHz sample values which, by definition, contain no errors.  It can be seen, therefore, that the resultant error signal will be dominated by higher frequencies that were not present in the original music signal, and can therefore be easily eliminated with a filter.  I hope that is clear.

On the other hand, if I am converting from 44.1kHz to 48kHz, then only 1 in every 160 samples of the 48kHz output stream will correspond directly to original samples from the 44.1kHz data stream (you’ll have to take my word for that).  In other words, 159 out of every 160 samples in the output stream will start off life as an interpolated value.  The quality of this conversion is going to be very dependent on the accuracy of those initial interpolation guesses.  Again, the process of making a best guess and doing the filtering is typically combined into a specialized filter, but the principle of operation remains the same.
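
If you use an off-the-shelf resampler, that combined machinery is hidden behind a single call.  SciPy's resample_poly, for example, performs the 44.1kHz-to-48kHz conversion by conceptually up-sampling by 160, filtering, and keeping every 147th sample (160/147 being exactly 48000/44100):

```python
import numpy as np
from scipy.signal import resample_poly

x_44k1 = np.sin(2 * np.pi * 1000 * np.arange(44100) / 44100)   # one second of a 1 kHz tone

# 48000 / 44100 reduces to 160 / 147: up-sample by 160, filter, keep every 147th sample.
x_48k = resample_poly(x_44k1, up=160, down=147)
print(len(x_48k))   # ~48000 samples for the same one second of audio
```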

Down-conversion is very similar, but with an additional wrinkle.  Let’s start with a very simple down-conversion from 88.2kHz to 44.1kHz.  It ought to be quite straightforward - just throw away every second sample, no?  No!  Here is the problem:  With a 44.1kHz sample rate you cannot encode any frequencies above 22.05kHz (i.e. one-half of the 44.1kHz sample rate).  On the other hand, if you have a music file sampled at 88.2kHz you must assume that it has encoded frequencies all the way up to 44.1kHz.  So before you can start throwing samples away you have to first put it through a brick-wall filter to remove everything above 22.05kHz.  Once you’ve done that then, yes, it is just a question of throwing away every second sample (a process often referred to as decimation).
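
In code, that filter-then-decimate order looks something like this (a Python/SciPy sketch; the windowed-sinc FIR is a stand-in for a proper brick-wall filter, and its length is arbitrary):

```python
import numpy as np
from scipy.signal import firwin, lfilter

def downsample_2x(x):
    """Halve the sample rate: brick-wall low-pass first, *then*
    throw away every second sample (decimation)."""
    # Remove everything above the new Nyquist frequency (22.05 kHz when
    # going from 88.2 kHz to 44.1 kHz), i.e. 0.5 of the current Nyquist.
    brickwall = firwin(numtaps=255, cutoff=0.5)
    filtered = lfilter(brickwall, 1.0, x)
    return filtered[::2]                  # decimate: keep every second sample

x_88k2 = np.random.randn(88200)           # one second of stand-in wide-band audio
x_44k1 = downsample_2x(x_88k2)
```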

This additional wrinkle makes the process of down-sampling by non-integer factors rather more complicated.  In fact, there are two specific complications.  First, how in the name of heck do you decimate by a non-integer fraction?  Second, because you’re now interpolating a signal which may contain frequencies that would be eliminated by the brick-wall filter, you need to do the interpolation first, before you do the brick-wall filtering, and then the decimation last of all (I’m sorry if that’s not immediately obvious - you’ll just have to stop and think it through).  Therefore, to get around these two issues, the process of down-sampling by a non-integer factor will usually involve (i) interpolative up-sampling to an integer multiple of the target sample rate; (ii) applying the brick-wall filter (which would not be the same filter that you would use if you were just up-sampling for its own sake); and finally (iii) performing decimation.  That is quite a lot to swallow, but I couldn’t see an easy way to simplify it without making it way too long (and I think this post is quite long enough as it is).
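
To make those three steps concrete, here is a deliberately naive sketch of a 96kHz-to-44.1kHz conversion (Python/NumPy/SciPy).  The ratio 44100/96000 reduces to 147/320, so we up-sample by 147 to reach an integer multiple of the target rate, apply the brick-wall at the target’s Nyquist frequency, and then decimate by 320.  A real resampler uses polyphase short-cuts so that it never actually constructs the enormous intermediate signal, but the arithmetic is the same.

```python
import numpy as np
from scipy.signal import firwin, lfilter

def downsample_96k_to_44k1(x):
    """Naive 96 kHz -> 44.1 kHz: up by 147, brick-wall, down by 320."""
    up, down = 147, 320                   # 96000 * 147 / 320 = 44100

    # (i) interpolative up-sampling to 147 x 96 kHz = 320 x 44.1 kHz
    stuffed = np.zeros(up * len(x))
    stuffed[::up] = x

    # (ii) brick-wall at 22.05 kHz.  At the intermediate rate the Nyquist
    # frequency is 7.056 MHz, and 22.05 kHz is exactly 1/320 of that.
    # (A serious brick-wall at so narrow a cutoff would need far more
    # taps than this -- the point here is only the order of operations.)
    brickwall = firwin(numtaps=2001, cutoff=1.0 / down)
    filtered = up * lfilter(brickwall, 1.0, stuffed)

    # (iii) decimation: keep every 320th sample
    return filtered[::down]

x_96k = np.sin(2 * np.pi * 1000 * np.arange(9600) / 96000)   # 0.1 s of a 1 kHz tone
x_44k1 = downsample_96k_to_44k1(x_96k)
```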

I hope you have followed enough of what I just wrote to at least enable you to understand why I always recommend sample rate conversions between members of the same “family” of sample rates.  One family includes 44.1kHz, 88.2kHz, 176.4kHz, 352.8kHz, DSD64, DSD128, etc.  The other includes 48kHz, 96kHz, 192kHz and 384kHz.  If you feel the need to up- or down-sample (for any number of good reasons), try to stay within the same family.  In other words, convert from 44.1kHz to 88.2kHz rather than 96kHz.  But in any case, SRC does involve a substantial manipulation of the signal, and the principle that generally guides me is that if you can avoid it you are usually better off without it.

And when you buy digital downloads, if 88.2kHz or 176.4kHz are available as format options, choose them in a heartbeat over 96kHz and 192kHz.

Wednesday 20 May 2015

More Green Socks

OK, so Fred, as you recall, discovered that wearing green socks resulted in an improvement in his car’s fuel economy, only to be put down and mocked by friends and colleagues alike who for the most part were not willing to even try it for themselves.  Even though Fred, a rational fellow, was unable to come up with any explanation for how the colour of his socks might impact his car’s fuel economy, he was perfectly happy to benefit from the result.

He decided that surely the shade of green must have an effect.  After all, hues of green can vary from bluish-green through to yellowish-greens.  Then there were light and dark shades, and patterned socks with a greater or lesser proportion of green threads.  He set about attempting to quantify this conundrum, and after a while came up with a clear conclusion.  The best results were obtained when the socks were all the same plain shade of green with as little patterning as possible.  Also, the best shade of green was that shade associated with St. Patrick’s day, Pantone PMS-347 (although, interestingly enough, while researching this, he discovered that the official colour of St. Patrick, patron saint of all Ireland, is in fact blue).

Fred was still happy to share his discoveries with anybody who was sufficiently interested, even though the inevitable response was something along the lines of “You’re a looney, mate”.  And then one day, much to his surprise, he spoke with a casual acquaintance, Long Xiao, who not only nodded sagely at his weird tale, but actually said “Yes, that makes sense”.  Xiao, it seems, was an adherent of the philosophy of Dress Feng Shui, a variant of the ancient Chinese metaphysical art.  Apparently, wearing footwear in this particular shade of green was associated with being able to travel freely without having to endure weariness or fatigue.

Fred became an adherent of Dress Feng Shui, studied with Xiao, and eventually became an acknowledged master of the art in his own right.  He regularly travels to China to lecture on the subject.  Fred soon learned that although green socks were great for driving, they were not great for a lot of other things.  Fred began to change clothes several times a day, according to what it was he wanted to do next.  It worked well for him.  Fred, of course is not his real name, and today, Fred is CEO of a Fortune 500 consulting company.  The inside of his private jet is decorated in a colour scheme that is said to leave his guests dumbstruck.  He closes a lot of deals over both the Atlantic and Pacific, and even once over the North Pole.  And, rumour has it, he has not been seen in a gas station for several years.

Of course Fred’s story has been invented to illustrate the apparently unbusinesslike situation we find ourselves in here at BitPerfect.  We know that if we write audio playback software in a certain way, we can improve the resulting perceived sound quality.  At the same time, we know of no measurements that can be applied that demonstrate any effect BitPerfect has on any parameter known to impact sound quality.  We are agreed that, when all the peripheral arguments are distilled to their essence, its audible impact boils down to whatever its users report it to be.  At the same time, the vast majority of our users are evidently disposed to agree with us that the audible impact is both beneficial and at the very least worth $10.  One statistic we monitor very closely is the uptake rate of our free updates.  We assume that, for the most part, only serious BitPerfect users are downloading those updates.  Currently, each of our updates is being taken up by something over 60,000 users worldwide.  It is not a bulletproof metric, but it surely means we are doing something right.

If the likes of BMW and Audi can deliver well-thought-out, impeccably conceived and designed, and flawlessly engineered products, and still have customers say “No!  It’s wrong!” then it just goes to show that a bulletproof engineering basis is not by itself the answer to the young maid’s prayer (if I may borrow an expression from my youth).

While it is true that we don’t have that bulletproof basis for precisely how our software impacts sound quality, like Fred and his green socks we at least understand pretty well which shade of green works best.  We understand where in our code we need to optimize the audio engine for sound quality, and how to set about doing those optimizations.  And now we are starting to learn some of our own “Dress Feng Shui” to maybe take us to the next level.  This gives us some new principles that we can apply to the way we code our audio engine - principles which, to a programmer, would normally appear to offer no particular benefit.  However, experience is telling us a different story.

For the last couple of weeks Tim and I have been testing out prototypes of our new Fourth-Generation Audio Engine.  It is early days yet, but it would be no exaggeration to say that we are profoundly shaken by the magnitude of improvements that we appear to have unleashed.  I would go so far as to say that the difference between our Gen III (the current version used in BitPerfect 2.0.2) and our new Gen IV audio engines is greater than the difference between BitPerfect 2.0.2 and plain old iTunes.  I had no idea that such untapped reserves remained to be unearthed.

It will be a little while before Gen IV makes it to the App Store.  It needs to be optimized and further refined, and then shaken down by our team of Beta Testers.  But it should emerge sometime this summer.

BitPerfect is only a small company, and no matter how successful it proves to be, it is not going to earn us enough money to buy a private jet like Fred’s.  But we might just be able to treat ourselves to a couple of pairs of green socks, maybe with an appropriately propitious matching wardrobe.

Tuesday 19 May 2015

Green Socks

Imagine Fred.  He received a pair of green socks as a present from his wife, to reflect their shared interest in the environment and green technology.  He was very pleased with them, and immediately decided to wear them for his forthcoming business trip, which would involve an 800km drive.  Normally, his car would achieve a comfortable 600km on a full tank of gas, but on this occasion he was surprised to find that he made the entire trip without having to stop to fill up.  On the return trip, with his green socks now in a laundry bag in the trunk, he was chagrined to have to fill up at the 600km mark as per normal.  When he reported this odd situation to his wife, she immediately concluded that the green socks must have been responsible.  After all, what else could it be?

The two of them laughed at the idea, and it became a running joke between them.  But the next time he repeated the trip he wore the green socks again, and to his amazement once again completed the 800km outbound trip on a single tank of gas.  This time, however, he washed the socks in his hotel room and wore them again for the return trip.  And sure enough, he made the return trip without the need to fill up.

Fred was nonplussed, but not wishing to look a gift horse in the mouth he bought himself a complete drawer full of green socks.  After a short while it became clear to him that by wearing green socks he could effect up to 25% improvement in his gas mileage.  However, when describing the incredible results to his friends and colleagues, he was surprised to find that they laughed him out of the room.  They were full of reasons why it could not work.  He was retiring his lead foot every time he put green socks on.  He was only wearing the green socks when driving routes that were inherently less demanding on gas.  Winter had since given way to summer, and fuel economy naturally picked up in the better weather.  There was only his word for it that the claimed improvements ever existed in the first place.  It was all in his head.  It was a placebo effect.  Where was his double-blind test data?  They had all their arguments neatly lined up, and for the most part none of them were even willing to try it out for themselves.  A few, though, did dip their toes in the water, and as a rule reported tangible improvements in their own fuel consumption figures.

I’m sure you see where I’m going with this, but the fact remains that this sort of thing does indeed happen in the tightly-spec’d, observed, tested, and regulated world of consumer automobiles.  Better fuel consumption is ALWAYS at (or near) the top of most car buyers’ needs and wants list.  Even so, if a car manufacturer’s engineering department suddenly starts promoting green socks, you can be sure that their marketing department is ready and waiting to ensure that they won’t see the light of day.  The thing is, consumers are generally dumb, and oftentimes even if you offer them what they say they want, if you don’t package it correctly they will reject it - sometimes quite irrationally.

I can think immediately of two examples, and they both come from Germany, where a solid engineering mindset is more deeply ingrained than in many other manufacturing cultures.  One imagines that even German heads of marketing all have engineering degrees.

First up, BMW in the 1980s.  This was when the diesel engine craze was starting to sweep Europe.  Diesels were popular because they delivered massive gains in fuel economy, which is what European consumers with their hyper-pricey gasoline were demanding.  But BMW’s engineers correctly pointed out that theoretically, diesel fuel offers barely more than a potential 5% gain in fuel efficiency when compared to gasoline.  Instead, diesel engines gain their impressive economy through the fact that they are fundamentally red-lined at not much more than 4,200rpm, and friction losses are way lower at lower revs.  BMW figured out that if they took their 2.8L petrol engine, optimized it for peak performance at low revs and limited it to 4,500 rpm, they would be able to replicate the expected performance of the 2-litre diesels that their competitors in Europe were touting.  Thus was born the BMW 528e.

BMW’s first mistake was in calling it the 528e rather than the 520e.  The last two digits in Bimmer-speak would announce the engine size, and the engine was actually 2.8 litres, so that’s what the German mindset mandated for its nomenclature.  However, their customers now expected 2.8 litre performance, even though BMW tried to make clear that it was actually offering 2-litre diesel fuel economy.  Their second mistake was in calling it 528e rather than 520d.  I have no idea what ‘e’ (for ‘electronic’, apparently) was intended to convey, and I guess their customers didn’t either.  But if ‘d’ makes customers think ‘Diesel’ then that should have been a good thing.  In 2015 BMW’s marketing department are a lot less pedantic about having their vehicle nomenclatures reflect the actual internal specifications.

The BMW 528e was actually a technological tour-de-force, but its commercial failure proved instead to be an engineering embarrassment.  Not long after, BMW relented and offered proper smoke-belching diesels that sounded like a London Taxi when idling.  They had a ‘d’ appended to their model numbers, and customers snapped them up in droves.

Second up is Audi’s initial foray into CVTs (Continuously Variable Transmissions).  A car’s gearbox is a compromise.  Whether manual or automatic, almost inevitably 99% of the time it will be in the wrong gear, and either the ideal gear will not be available, or changing to one of the available gears cannot be done quickly and accurately enough to apply the required correction.  The problem can be ameliorated to some extent by adding more and more gear ratios to the gearbox.  Manual transmissions which used to come with four forward gears now have six gears as often as not.  And automatics with up to 8 speeds can be found.  Even so, the ideal situation remains, in theory, a CVT.  With such a transmission, the engine can always be in the theoretically perfect gear, regardless of the conditions.  The thing is, though, that the electronic brainpower needed to figure out the ideal gear ratio several times a second requires a modern computer-controlled vehicle management system.  Thus it was that in the early 2000s, Audi’s “Multitronic”, with its highly sophisticated electronic management system, was one of the first high-performance CVTs offered to the market.

Multitronic was brilliant.  Finally, Audi drivers for the first time found themselves perpetually in the correct gear.  But the result was a disaster.  What happened was that, although the car was indeed always in the exact, optimally correct gear, the gear it chose was not the one that the drivers felt it should have been in.  Like I said at the start, consumers are dumb.  They are used to the sound of a car see-sawing its way through a sequence of fixed gear ratios.  But that’s not what they heard.  What they thought they were hearing was an automatic gearbox with a blown torque converter, with the engine speed spooling up and down unexpectedly.  New Multitronic Audis were driven in droves back to their dealerships by owners complaining of faulty transmissions.  They refused to be mollified by assurances that not only were these vehicles working perfectly, but that they were actually working better and more efficiently than anything they had previously driven.  Unfortunately for Audi, a dumb customer (and specifically a dumb Audi customer) will not accept that he is dumb.

Audi’s engineers were forced to re-map the electronic management of their Multitronics so that they would mimic the behaviour of a conventional automatic transmission with a limited number of fixed gear ratios, a task they performed, one imagines, wearing paper bags over their heads.  The result was something that threw away all of the advantages of the CVT, but which dumb customers at least were not bringing back to their dealerships.  Finally, in 2014, Audi discontinued the Multitronic, and disavowed CVTs in general.  In the meantime, many other manufacturers are now selling CVT-equipped cars with transmission mapping systems that behave correctly - just like those early “failed” Multitronics.  My daughter has such a car, and she for one has no pre-conceived notions of how an automatic transmission “should” sound.

Tomorrow, I’ll return once more to Fred and his green socks.  And computer audio playback.

Thursday 7 May 2015

Announcing DSD Master v1.1


After a successful first year we are finally announcing our first update to DSD Master.  This update is mainly a collection of minor bug fixes, but it also includes a major under-the-hood revision of our DSP engine in preparation for some significant enhancements of capability.

The new DSP engine delivers a significant increase in processing speed, yet with a dramatically reduced physical and virtual memory footprint.  Conversions are even more accurate, and we now use what we term "True Analog" normalization, which ensures that our PCM conversions do not encode for implied "inter-sample peaks" which can be a problem in certain circumstances.  Improved queueing logic allows large conversion batches to be managed more efficiently.

We have eliminated a stupid bug that required new installations of DSD Master to make a PCM conversion before they could create Hybrid-DSD files, and another bug which leaked file handles and prevented large conversion batches from being run.

Finally, there are some minor cosmetic changes, including the ability to select files for conversion via a "File | Open" dialog on the menu bar.