A.I. To Make Backing Tracks?

voutoreenie · December 2023

Eh, I'm with wim on Taylor's vocals...never did anything for me. Might also be because we used to play Nagasaki w/vocals in a group I used to play in and the "Back in Nagasaki, where the fellows chew tabacky..." was a gang vocal, which works way better for that section than a solo vocalist imo. Of course, I'm not sure if we would perform that song nowadays, same with Sheik and Chinatown, or at least with vocals...could still get away with a lot in the 2000s but probably not so much nowadays, especially how we "interpreted" the lyrics at times lmao (think Slim & Slam)

billyshakes · December 2023

I guess for me there are so few Django songs with actual sung lyrics that I like the addition, regardless of how good the interpretation is. He is in tune at least. I don't need a scat performance, but I do think adding a sung melody during live performances can help listeners unfamiliar with the melody latch on to it quicker. Which is a bit ironic to me because I frequently mishear lyrics and just make up nonsense syllables to bridge the gap. I could have written "Sussudio" (nonsense word) and I once lost a televised contest where I failed to finish the lyric to Jack & Diane by John Cougar (the part after "changes come around real soon...").

But this is diverging from the original topic of AI and backing tracks....

voutoreenie · December 2023

>but I do think adding a sung melody during live performances can help listeners unfamiliar with the melody latch on to it quicker.

@billyshakes yup and that's half the reason we had vocals, audiences tend to get captured by vocalists far more readily than purely instrumental performances, especially at casual gig settings vs. concerts. Also really helped that our lead vocalist had a perfect voice for the old standards.

And you're right, this is some serious thread drift...will quit now while I'm still ahead

wim · December 2023

Is it really too much to ask the vocalist to sing the melody? "You had one job..." 😃

Well, to steer back on topic, I tried separating the vocal and instrumental parts, and the software did a fairly good job of it. The vocal part is short so I trimmed the long silence out using Audacity. Results attached below.

That's better than I was expecting! Maybe separating the lead guitar part is not too far out of reach after all.

nagasaki-vocals-G major-118bpm.mp3

nagasaki-accompaniment-G major-118bpm.mp3

Svanis1337 · December 2023

I can't wait for an A.I. that can restore audio fidelity. Maybe if you fed it unrestored recordings from 78s that were later released on LP or CD from original master sources so it can compare. Or recordings that were recorded in the same studio but with varying quality on the preserved recordings. (ATC Band sessions, Rome, or alternate takes for example.) You could take the best quality recordings and tell the A.I. "This is how it's supposed to sound", after which it removes the noise and adjusts the audio of the poor quality recording to sound like the good quality one.

adrian · December 2023

@Svanis1337 That's an active area of research, and there are already tools that use machine learning to do this. :)

A while ago, I used this specific free one to clean up some old 78rpm recordings that I downloaded from archive.org. It had surprisingly good results! The resulting audio is lacking in presence, but the crackle and hiss are fully gone. It's definitely more listenable than the original.

This technology will only get better and better over time — and it's probably already better, because progress in machine learning has been very fast and furious over the last few years. I last tried this particular code more than a year ago.

But there's a caveat and inherent limitation about this sort of thing: the machine-learning algorithms have to "make up" their output. The hisses and noise were masking some audio that cannot be recovered (because it's literally been covered by the hisses and noise) — so the algorithms try to find the most likely audio to fill in the gaps.

How do they decide what's most likely? That comes from their training data, i.e., a lot of previously seen "correct answers." Ideally the system's training data has a lot of music of the same genre/aesthetic/recording technique that you're trying to clean up. Like you say, it would be nice to give the system some high-quality CD audio examples and their corresponding 78rpm versions — but there's an art to selecting the training data because you don't want the system to learn "how to master music in a specific way," you only want it to learn "how to remove the hisses and noise."

One approach is to generate synthetic training data, where you start with pristine audio and then deliberately introduce hiss and cracks/pops. The fact that you control both sides of the equation means you have definitive information about what noise you've added, and you haven't introduced any extra changes that would bias the output (such as remastering the audio). But this technique requires that your synthetic techniques are actually reflective of how 78rpm records distort audio in the real world.
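To make the synthetic-data idea above concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than how any real restoration tool works: the function names (`make_clean_tone`, `degrade`, `make_training_pair`) are hypothetical, a sine tone stands in for pristine audio, hiss is modeled as Gaussian noise, and pops are modeled as rare single-sample spikes. Real 78rpm degradation is certainly more complicated than this.

```python
import math
import random


def make_clean_tone(n_samples, sample_rate=8000, freq=440.0):
    """Stand-in for pristine audio: a sine tone with amplitude 0.5."""
    return [
        0.5 * math.sin(2 * math.pi * freq * i / sample_rate)
        for i in range(n_samples)
    ]


def degrade(clean, hiss_level=0.02, pop_rate=0.001, pop_level=0.8, seed=0):
    """Add assumed 78-style damage: Gaussian hiss plus rare loud pops."""
    rng = random.Random(seed)  # seeded so the same pair can be regenerated
    noisy = []
    for sample in clean:
        sample += rng.gauss(0.0, hiss_level)  # constant background hiss
        if rng.random() < pop_rate:  # occasional click/pop
            sample += rng.choice((-1, 1)) * pop_level
        noisy.append(max(-1.0, min(1.0, sample)))  # clip to valid range
    return noisy


def make_training_pair(n_samples=8000, seed=0):
    """Return (noisy_input, clean_target) for a denoising model."""
    clean = make_clean_tone(n_samples)
    return degrade(clean, seed=seed), clean


if __name__ == "__main__":
    noisy, clean = make_training_pair()
    print(len(noisy), len(clean))
```

Because the clean target is never altered, the only difference between the two halves of each pair is the noise that was deliberately added, which is the property the paragraph above describes. In practice you would swap the sine tone for real high-quality recordings and replace the noise model with something measured from actual 78s.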

BTW, this same caveat applies to image upscaling, the technique of making a higher-resolution image given a low-res photo. The algorithm decides how to fill in those extra pixels by effectively making things up, based on training data. So it doesn't really reflect reality. But it's probably good enough for many uses, and there are some interesting philosophical questions there, like "at what level of detail does 'guessing reality' become accurate enough and/or ethical?"

Adrian

Svanis1337 · December 2023

So to get the best quality out of 78s, maybe we need to start producing them again and put modern GJ tunes recorded with a vintage mono mic à la QHCF onto 78s 😃

V-dub · December 2023

The fellow techies on this board seem to have this well covered. There's no way to subtract just a lead guitar yet, but nothing would stop someone with a lot of time and knowledge (and $$) from training an AI model to do it. Sadly, it's probably too niche to be considered by anyone with all those things.

On a related note, I've seen some karaoke tracks pop up on YouTube that are definitely using this tech to subtract vocals from old '30s tunes that I assumed no one would ever bother making backing tracks for, so that's cool!

The Ink Spots - I Don't Want to Set the World on Fire

https://www.youtube.com/watch?v=wGslNxT9_fw

Ray Noble Orchestra - Guilty

https://www.youtube.com/watch?v=2Ia3JUXXo6U