Well, check out an AI project being worked on at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL).
PixelPlayer is described as a “deep-learning system” that can analyse a video of a musical performance and isolate the particular instruments involved, making them louder or softer.
It can identify the sounds of more than 20 commonly seen instruments.
“Seen” is the key word, because previous efforts to separate sounds have apparently focused exclusively on audio, which MIT states often requires extensive human labeling.
You can hear the system in action below:
“We expected a best-case scenario where we could recognize which instruments make which kinds of sounds,” says Hang Zhao, a PhD student at CSAIL.
“We were surprised that we could actually spatially locate the instruments at the pixel level. Being able to do that opens up a lot of possibilities, like being able to edit the audio of individual instruments by a single click on the video.”
PixelPlayer finds patterns in data using neural networks that have been trained on existing videos.
MIT says one neural network visually analyses the video, another concentrates on the audio, and a third “synthesizer” associates specific pixels with specific soundwaves to separate the different sounds.
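For the curious, the three-network arrangement described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not MIT's implementation: it assumes the video network yields a short feature vector per pixel, the audio network yields the same number of spectrogram-shaped feature maps, and the “synthesizer” combines them into a per-pixel mask over the mixed audio. The function names (`pixel_mask`, `apply_mask`) and all the numbers are hypothetical.

```python
import math


def sigmoid(x):
    """Squash a score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pixel_mask(pixel_features, audio_channels):
    """Toy "synthesizer": build a spectrogram mask for one pixel.

    pixel_features: K floats, standing in for the video network's output
                    at one pixel (assumed, not from the article).
    audio_channels: K maps shaped [freq][time], standing in for the audio
                    network's output (assumed, not from the article).
    """
    n_freq = len(audio_channels[0])
    n_time = len(audio_channels[0][0])
    mask = []
    for f in range(n_freq):
        row = []
        for t in range(n_time):
            # Weight each audio channel by the matching visual feature.
            score = sum(w * ch[f][t]
                        for w, ch in zip(pixel_features, audio_channels))
            row.append(sigmoid(score))
        mask.append(row)
    return mask


def apply_mask(mixture, mask, gain=1.0):
    """Keep the masked part of the mixture, made louder or softer by gain."""
    return [[gain * m * x for m, x in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(mask, mixture)]


# Made-up mixture spectrogram: 2 frequency bins by 2 time frames.
mixture = [[1.0, 1.0],
           [2.0, 2.0]]

# Made-up audio feature maps: channel 0 favours the low bin, channel 1 the high bin.
audio_channels = [
    [[5.0, 5.0], [-5.0, -5.0]],
    [[-5.0, -5.0], [5.0, 5.0]],
]

# A hypothetical pixel whose visual features match channel 0.
clicked_pixel = [1.0, 0.0]

mask = pixel_mask(clicked_pixel, audio_channels)
isolated = apply_mask(mixture, mask, gain=2.0)  # double that pixel's sound
```

In this toy run the mask for the clicked pixel is close to 1 in the low frequency bin and close to 0 in the high one, so raising `gain` boosts only the sound tied to that pixel, which is roughly the “single click” editing Zhao describes.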
The “deep-learning” is so deep (it uses so-called “self-supervised” learning) that the MIT team doesn’t necessarily understand everything the system does in terms of identifying separate instruments.
It seems, however, that certain harmonic frequencies correlate with specific instruments, such as a violin, while quick pulse-like patterns correspond to instruments like the xylophone.
What are the possible non-musical applications for the technology? Zhao suggests a system like PixelPlayer could be used to better understand the environmental sounds that external objects make, such as vehicles.
Hang Zhao is lead author of a paper co-written with MIT professors Antonio Torralba, in the Department of Electrical Engineering and Computer Science, and Josh McDermott, in the Department of Brain and Cognitive Sciences. Also involved are research associate Chuang Gan, undergraduate student Andrew Rouditchenko, and PhD graduate Carl Vondrick.
The paper will be presented at the European Conference on Computer Vision (ECCV) in Munich in September.
Thanks to Sue P. for highlighting this one.
Images: MIT CSAIL
[Via New Atlas]