“Thanks” to the pandemic, I’ve spent much more time figuring out how to deal with video and live streaming than I ever thought I would. It’s turned out to be a surprisingly rewarding experience. In no particular order, here are some things I’ve learned along the way.
29.97 vs 30
For purely historical reasons (reasons that probably could have been dealt with better at the time but were instead swept under the rug, having to do with the switch from B&W TV to color), in the parts of the world that use the NTSC standard, including North America, footage is recorded and played back (on TV, at least) not at 30 frames per second (fps) but at an incredibly arcane 29.97 fps. This would be okay if that were always the case.
It’s not. For example, let’s say I film a concert at 29.97 fps but then create a Final Cut Pro project at 30 fps, because, believe it or not, lots of online footage is actually at 30 fps and not at 29.97 fps — perhaps because computer monitors actually refresh at 30 or 60 Hz, not at 29.97 or 59.94. I import my footage and place it in the timeline. Now, unbeknownst to me, Final Cut will play the whole thing imperceptibly faster than it happened in real life, including the audio, because the playback frame rate is now a smidgen faster than the recorded frame rate.
If that were it — if we exported our project and uploaded now — all would be okay, no-one would ever notice, because the discrepancy is so small. But, say I also recorded the concert with external audio gear. I import the sound file and place it underneath the previously imported video. I try to sync it up with the existing audio. What I find now is that if I sync it in one spot, it will be out of sync in every other spot. It’s unusable. Why? Because Final Cut is actually playing this newly imported sound file back at the same speed as it was recorded, whereas it’s playing the video footage back slightly faster. This is an easily solvable problem — just make sure you start your FCP project at the correct frame rate. But it’s an example of the headaches this historical artifact can cause. I’ve done this by accident twice, and it took me a little while both times to figure out what was going on.
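To put a number on that drift: “29.97” fps is really 30000/1001 fps, so playing it back at an even 30 fps speeds everything up by a factor of exactly 1.001. A back-of-the-envelope sketch (plain Python; the hour-long concert is just an illustrative duration):

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)  # "29.97" fps, exactly
PROJECT_FPS = Fraction(30, 1)     # the mismatched timeline rate

# Playing NTSC footage on a 30 fps timeline speeds it up by this factor:
speedup = PROJECT_FPS / NTSC_FPS  # = 1001/1000 = 1.001

# After an hour-long concert, the sped-up video has pulled ahead of
# separately recorded (correct-speed) audio by:
concert_seconds = 3600
drift = concert_seconds * (1 - 1 / speedup)

print(float(speedup))             # 1.001
print(round(float(drift), 2))     # ~3.6 seconds out of sync after an hour
```

A few seconds over an hour is invisible on its own, but hopeless to sync against externally recorded audio, which is exactly the failure described above.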
Much worse is what happened when I was making my Natural Machines album. I filmed it at 29.97 fps, only because my cameras can’t record at true 30 fps (note that they label this as 30 fps, not 29.97! You’re just supposed to know, it seems). While I improvised, programs I wrote on my computer answered me in real time, both by playing the Disklavier piano themselves, and by generating a visual representation of the music. I recorded the visual output by plugging an HDMI recorder into the output of my computer. But, lo and behold, when I tried to sync the footage of my live performance with the live visuals, they kept drifting out of sync. Lots of headaches later (I didn’t know anything about the NTSC / PAL wars at the time), I realized that my computer was outputting 30 fps footage over HDMI and not 29.97. Which makes sense, because again, computer monitors don’t refresh at 29.97 fps. They refresh at 30 fps or 60 fps (or higher) because, obviously, since alternating current oscillates at 60 Hz in North America, it would be INSANE for anything to oscillate at anything other than multiples of this number, right?
Now, you might think that I could just run my 30 fps footage through a converter to resample it to 29.97 fps. This is, in fact, possible, but it’s very slow and, more importantly, it looks terrible. And for good reason. With audio files, you’re dealing with tens of thousands of data points a second. Typical audio is recorded at 44.1, 48, 88.2 or 96 thousand data points per second. With that much density of data, it’s relatively straightforward to resample a file from 88.2 kHz, say, to 48 kHz (although there are subtleties to this, too, involving aliasing etc, but they won’t bite you nearly as hard as the video issues I’m talking about), because the data points can essentially be imagined to describe a continuous line, and you can choose how often you cut that line up into data points without affecting the continuous line too much.
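To see why dense audio resamples so forgivingly, here is a deliberately naive linear-interpolation resampler. (Real resamplers also filter to avoid aliasing, as noted above; the function name and list-of-samples representation are mine, for illustration only.)

```python
def resample_audio(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler. With tens of thousands of
    samples per second, the signal is dense enough that treating it as a
    continuous line and re-cutting it at a new rate works reasonably well."""
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source signal
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a + (b - a) * frac)     # straight line between neighbors
    return out

# Doubling the rate of a ramp just fills in the midpoints:
out = resample_audio([0.0, 1.0, 2.0, 3.0], 4, 8)
```

Because adjacent samples differ so little, the interpolated points land almost exactly where a true continuous recording would have put them.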
But video happens incredibly slowly — over one thousand times more slowly, typically. Instead of 44,100 events per second, we’re dealing with 30. So there’s nothing continuous about it. Whereas the change in audio level from one data point to the next at a normal sampling rate is minute, at 30 fps the change in the image from one frame to the next can be huge. It simply can’t (without advanced A.I., that is, and that’s coming) be imagined to be a continuous flow that you could choose to sample at whatever rate you wanted. Instead, resampling from 30 fps to 29.97 essentially involves removing frames here and there, or, if you’re being sophisticated, blending two of them together now and then. The resulting jerks in the image are immediately discernible, and to me, really annoying.
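That frame-dropping approach can be sketched in a few lines. This is a nearest-neighbor retimer of my own devising, not any real converter’s API, but it shows why the result stutters: going from 30 to 29.97 fps, roughly one frame in every thousand simply vanishes.

```python
def retime_frames(frames, src_fps, dst_fps):
    """Nearest-neighbor frame-rate conversion: for each output timestamp,
    grab the closest source frame. Converting 30 -> 29.97 fps, this
    silently drops about one frame per thousand, which the eye reads
    as a periodic stutter."""
    n_out = round(len(frames) * dst_fps / src_fps)
    return [frames[min(len(frames) - 1, round(i * src_fps / dst_fps))]
            for i in range(n_out)]

# 1000 frames of 30 fps footage become 999 frames at 29.97 fps;
# one source frame in the middle is gone entirely.
out = retime_frames(list(range(1000)), 30.0, 29.97)
```

Audio interpolates between neighbors gracefully; here there is no “between”: a frame either survives or it doesn’t, and the missing one is what you see as a jerk.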
The solution to this problem was to figure out (this was not easy and involved external software) how to get my computer to output 29.97 fps via HDMI rather than 30 fps. And just imagine! If the switch from B&W to color TV had been dealt with cleanly in North America when it happened, this absurd problem wouldn’t even exist in the first place!
0 – 255 vs 16 – 235
Most video — and most computer graphics, too — is encoded with 8 bits per channel. What does this mean? Each pixel in a frame of video has a brightness value (its luminance) described by a number. The question is, in how much detail do you describe it? If you have 8 bits per channel, that means you have 256 different possible values for it. If you have 10 bits per channel, which is becoming common now in pro and even less-pro video, you have 1024 possible values. In theory, at least.
Because, as I’ve recently learned, for another seemingly insane historical reason (fight me on this), there is a convention in video that the bottom and top values not be used. So, instead of your values ranging from 0 to 255, with 0 as black and 255 as white, they’re supposed to range from only 16 to 235. What happens to the rest of those values? Anything below 16, as I understand it, is considered black, and anything above 235 is considered white. I can’t begin to imagine why anyone would want to throw away 14% of their dynamic range, but that’s that.
How has this affected me? Well, I film with Panasonic LUMIX GH4 cameras, which I love because they’re very cheap used (~$400) and make a beautiful 4K image. I bought them when I was making my Natural Machines album. The LUMIX gives you the option of recording your luminance values either as 0 – 255, 16 – 235, or 16 – 255. Being a reasonable man, I opted for 0 – 255, of course. Why would I throw away those extra values?
Now, since I’ve been live streaming, I’ve noticed that the image I see on my computer, after it comes out of my cameras as HDMI, enters my Blackmagic ATEM Mini, and exits as USB, is noticeably higher in contrast than the image I see in my cameras’ viewfinders. This puzzled me until last night, when I dove down this rabbit hole and discovered that the ATEM Mini expects its video to be delivered as 16-235, which, I now understand, is the common video standard. So it’s natural that it was cutting off the bottom and top luminances of my image, making anything somber completely black, and anything light completely white. Again, why does this problem even exist in the first place? To an outsider, it’s patently absurd.
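To make the mismatch concrete, here’s roughly what happens when a device expecting 16 – 235 (“video range”) receives 0 – 255 (“full range”) material. This is a simplified model of the scaling involved, not the ATEM’s actual internal processing:

```python
def video_range_to_full(y):
    """A device expecting 16-235 stretches that window back out to 0-255
    for display, clamping anything outside it."""
    y = min(max(y, 16), 235)            # shadows below 16 crush to black,
                                        # highlights above 235 blow to white
    return round((y - 16) * 255 / 219)  # 219 = 235 - 16, the usable span

# Feeding it full-range (0-255) footage clips both ends and raises contrast:
print(video_range_to_full(8))    # 0   -> somber detail gone
print(video_range_to_full(128))  # 130 -> midtones stretched apart
print(video_range_to_full(245))  # 255 -> light detail gone
```

That clamping is exactly the higher-contrast, crushed-blacks image described above: the camera was sending the full 0 – 255 range, and the switcher treated everything outside 16 – 235 as pure black or pure white.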
Lighting is everything
One joyous thing which I simply had no real appreciation of before, even though I’d heard it talked about, is that lighting is everything. Wow does lighting completely determine how your image looks! I know that sounds like a platitude, but it was very impressive to experience it first hand. I feel lucky that I was forced to deal with it: early in the pandemic, the Karajan Institute hired me to produce a 30-minute long video of myself performing Natural Machines, for Austrian and German television. They partnered with Deutsche Telekom and a TV company to ensure the footage met professional TV standards. They asked me for some test footage before I filmed my set.
I got a call from a nice man named Daniel who told me “Dan, the sound you’re getting is top notch, the quality of the image is really nice too, we love your playing and the music, but… your lighting needs to be improved.” We talked about it for some time and he ultimately pointed me to a YouTube video which in turn pointed me to another, which I recommend to everyone: How To Make A $300 Camera Look Pro!
Follow these instructions and it will completely change how your video looks. For my first five or so livestreams after upgrading my lighting, I didn’t even have a real diffuser for my lights. I used a storage box that had a white, transparent bottom, and a vegetable bag. It still looked good.
Streaming is really hard on computers
On Macs, at least. I stream with the free software OBS, and apparently, Apple has chosen not to participate in the project (read, I think: has chosen not to give the project money); as a result, OBS for Mac doesn’t make efficient use of the hardware. So what happens? Macs get very, very hot when streaming, even new and powerful ones. At the beginning of the pandemic, this was okay because the weather was mild. But as summer rolled in, I started having strange episodes in my streams where my image would start to slide way behind my audio. Seconds behind. It took me a while to figure out that as my laptop got hot, the system throttled down the processes using the most energy to keep the processor from overheating, and as a result OBS could no longer encode the video fast enough. At one point, on a hunch inspired by the jet-engine-like sound my Mac’s fans were making, I ran to the fridge, grabbed a couple of ice packs, and put them under the computer. Zap! The image snapped back into sync with the audio in about 5 seconds. I later learned that cooling a computer with ice is a very bad idea: ice creates condensation, so the air your computer cools itself with fills with moisture, ultimately causing your computer’s components to corrode, rust, go bust, fall to ruin. The ultimate solution, which has worked well for me, was to buy a $35 cooling pad — essentially two big, quiet fans powered by USB — and set it under my laptop. I haven’t had a problem since, and these external fans are much quieter than the internal ones in my computer.
Logic is bad for live mixing
I’ve been happily using Apple’s Logic Pro for years to record, mix, and master. I’ve even made extended MIDI experiments with it. I’ve recorded all my albums myself since 2011, so I’ve put together some good gear over the years and have become fairly experienced with audio.
So, naturally, getting a good sound out of my piano was the easy part of figuring out how to livestream. I brought my mics (two DPA d:vote CORE 4099P mics, which conveniently fit under the closed lid of my piano, keeping out street noise) into Logic via my Apogee Ensemble (the same one I used to record my last four albums) and some nice preamps, record-armed the tracks there, applied some plugins to mix the sound, and routed the mixed sound to OBS via an excellent little piece of software called Loopback Audio. This all worked great, except that sometimes, for completely inexplicable reasons, one of my mics would develop an echo. A very pronounced one, too, with a delay of a couple seconds. A total dealbreaker. It tended to happen in my vocal mic, too, which made it even worse, because it meant people couldn’t understand what I was saying. Thank goodness for the feedback given by Facebook comments! People would let me know right away when it happened. I realized I could fix it during the stream by resetting the sample rate of the project, but it could recur at any moment, and it was a constant source of worry, something I didn’t need more of.
So, what did I do? I switched from Logic to Reaper for my live mixing. Reaper is very cheap ($65, and you can just use it for free if you want), extremely streamlined (it weighs about 35 MB to Logic’s many GB), and has worked perfectly for me. I haven’t had one bad moment with it. It’s what they’d call robust. I’d recommend it for this purpose to anyone.
It’s really possible to play music over the internet, and it helps to have nice neighbors
One of the best things that’s happened to me during the pandemic is the discovery that playing with other musicians over the internet is, contrary to popular perception, totally possible, and in fact super fun! And I’m not talking about the kind of playing where you accept the huge (~1/2 sec) latency of Zoom or Skype or Facetime and make the best of it. I’m talking about ultra-low-latency, real playing together, in rhythm, as if you were in a recording studio together.
I won’t recount the details of my journey with the open-source software Jacktrip here, because NPR did an extraordinary job of telling the story and explaining the challenges involved, all narrated by the great Christian McBride:
One thing not in the video: after that initial test with Jorge Roeder, I quickly discovered that in order to do Jacktrip properly while livestreaming — i.e. doing a live performance over the internet and simultaneously broadcasting it to an audience — one needs two computers and two independent internet connections. Fortunately, when I bought my new laptop last year, I kept my previous one, from 2015. I almost sold it and then reconsidered, on a hunch that it would be essential in the future. I realized that in order to get the very lowest latency possible out of Jacktrip, it needs to be the only thing using your internet connection. You certainly don’t want it competing with, say, a program (like OBS) that’s constantly uploading large amounts of data to YouTube for your livestream. Jacktrip’s ability to deliver packets of sound as quickly as possible is key to making these remote musical collaborations work.
So, what did I do? I borrowed my neighbor Carrie’s internet. She kindly agreed to let me string an ethernet cable from her apartment to mine, which has now been snaking along the side of our common hallway for months. And Jacktrip has worked like a charm. It’s taken time to get the most out of it, though. The trouble is that getting the lowest possible latency (which is important in order to play together in rhythm) is at odds with getting the best audio quality. I won’t go into the details, but if you’re going to start playing the packets of sound information you’ve received as soon as possible, there will probably be some, because of the vagaries of the internet, that won’t have arrived in time. And this causes glitches in the audio, very noticeable ones.
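The tradeoff is easy to quantify: the longer you buffer incoming packets before playing them, the more network jitter you can absorb, at the cost of added latency. A toy calculation (the sample rate and packet size are typical values I’m assuming, not Jacktrip’s specific settings):

```python
SAMPLE_RATE = 48_000       # audio samples per second
FRAMES_PER_PACKET = 128    # audio frames carried in one network packet

def buffer_latency_ms(queued_packets):
    """Extra latency added by holding `queued_packets` packets in a
    jitter buffer before playback."""
    return queued_packets * FRAMES_PER_PACKET / SAMPLE_RATE * 1000

# One packet of slack adds ~2.7 ms, but any packet later than that glitches.
# Eight packets add ~21 ms, and late packets within that window play cleanly.
print(round(buffer_latency_ms(1), 1))   # 2.7
print(round(buffer_latency_ms(8), 1))   # 21.3
```

Every millisecond of buffer is a millisecond further from playing truly together, which is why finding the sweet spot between glitches and latency is such a constant negotiation.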
So, for a while, I was constantly trying to find the best compromise between latency and audio quality. Remember, I was livestreaming these remote collaborations, sometimes for paying customers (I’m now on my sixth ticketed livestream concert), so glitchy audio wouldn’t do!
I had an epiphany while setting up for a Jacktrip session with trombonist Ryan Keberle: I could have my cake and eat it too. I could set up not one, but two simultaneous connections with my duet partner. One would have the lowest latency possible, and have the massive audio glitches to show for it. I would play along with that one, sometimes having to reconstruct a note in my ear through the noise. The other would have high latency, and perfect audio, and that’s what I would send to the livestream audience, after delaying my own audio to match it.
This felt like finding the holy grail. And I realized that having the extra Jacktrip connection was in fact superfluous. What was really needed was to be able to extract two different latency feeds from one single connection — there was no need to receive the same data twice. I wrote to Chris Chafe, who co-wrote Jacktrip, explained the idea, and he said that he had thought of that years ago but had never implemented it. He called it “slipstream”. The idea is now in active development among the community of programmers (especially one in particular, Anton Runov, a brilliant coder from St Petersburg) that maintains and develops Jacktrip.
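The slipstream idea can be sketched as one receive buffer read at two different delays: a short-delay tap you monitor while playing (tight but glitchy), and a long-delay tap you send to the audience (clean). This is my own illustration of the concept, not Jacktrip’s implementation:

```python
def playout(arrivals, n_ticks, delay):
    """Simulate playing one packet stream with a fixed playout delay.
    `arrivals` maps sequence number -> tick at which the packet arrived.
    At tick t we try to play packet t - delay; if it hasn't arrived yet,
    we emit None (an audible glitch)."""
    out = []
    for t in range(delay, n_ticks):
        seq = t - delay
        arrived = arrivals.get(seq)
        out.append(seq if arrived is not None and arrived <= t else None)
    return out

# Packet 5 arrives 3 ticks late; everything else arrives on time.
arrivals = {s: s for s in range(20)}
arrivals[5] = 8

low = playout(arrivals, 20, delay=1)   # tight tap: packet 5 glitches
high = playout(arrivals, 20, delay=4)  # safe tap: clean, a few ticks later
```

Both taps read the same received data, which is the whole point: no need for a second connection carrying duplicate packets.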
Live is sacred
One thing I never saw coming before I started doing livestreams is how authentically live a performance can feel over the internet. It was clear to me from the beginning that I needed to make it obvious — I would even say prove — to my audience that what they were hearing and seeing was live. One reason I felt this is that I knew that some presenters were asking musicians to pre-record performances of themselves, then presenting them to their audiences as if they were live. This is fraudulent, of course, but it’s also understandable at a time when doing things genuinely live comes with the risk of it all falling apart. Many livestreams never even start, many more disappear halfway through, others have no sound (surprisingly common), and yet others have sound that peaks the entire time. I’ve made every single one of these mistakes myself, and I’m more tech-oriented than many musicians.
So I felt that if I was to go to the trouble of actually doing it live, all of it, including the real-time Natural Machines graphics — which I eventually figured out how to superimpose over the main camera feed with transparency — as well as my live remote musical partners on Jacktrip, and my friend Kristin Berardi controlling my piano all the way from Australia, then I had better prove to my audience without a shadow of a doubt that it was truly live. And, of course, there’s a simple way of doing that, which is to interact strongly with the comments.
So, in my livestreams, I always take requests and questions. I’ll fix technical issues that people mention in the comments — for example, that my speaking mic is off, or that my video is no longer synced with my audio, problems that thankfully don’t happen often anymore — on the fly if I can. In many ways, experiencing unforeseen problems and solving them in the moment is the most direct way of recreating the feeling of live performance.
Because that’s what it’s all about, right? The feeling of live. The feeling of danger, of the possibility of things going wrong, and conversely, of them going gloriously right. It’s a sacred thing, this feeling, because it’s ultimately about reminding us all to dwell in the moment.
What I think has most surprised me about livestreaming, as someone who has lived and breathed live performance for most of his life, is how authentically live it feels to me as the performer. I feel the presence of the audience watching me through the ether. I feel their gaze, their judgement, and I swear I can feel their joy when I’m particularly in the zone and the music I’m making is really speaking from the heart. It sounds crazy, but the livestream experience has felt even more intimate to me at times than playing in a jazz club or a concert hall.
During one stream, I mentioned in between improvisations that I was reading Douglas Hofstadter’s book I Am a Strange Loop, because he had written me after the New York Times covered my #bachUpsideDown project, and because his classic Gödel, Escher, Bach had been a big influence on me in my early twenties. One person commented that they had an entire shelf of their library devoted to Hofstadter’s work. I talked a little about Gödel’s incompleteness theorem, and how it related to what I had just played, and another listener mentioned that she had just finished translating three hundred of his letters from German to English! She also corrected me on the pronunciation of his name. When has that kind of interaction been possible in a live performance? I never would have thought that it would be possible to foster a genuine sense of community in this format.
The great classical pianist Leon Fleisher died on August 2nd at the age of 92. On my regular Monday livestream August 3rd, one listener asked if I would play the jazz standard Someday My Prince Will Come, and immediately afterwards, someone else asked if I would play something with my left hand only as a tribute to Mr. Fleisher. One thing I’ve come to love doing during these streams is combining ideas from my audience. For example, one person provides a key, another a time signature, another a theme, and I play a freely improvised piece using those elements. In this case, I played a left-hand-only tribute to Mr. Fleisher, something that I never would have thought of doing without these serendipitous suggestions from the audience.