Sound Classifier

How Sophisticated Can A Non-Organic Object’s Musical Hearing Become Utilizing Neural Networks?

A Computer Science Research Project

By Michael Kuperfish Steinberg

HaKfar HaYarok 2020

Advisor: Yooda Or

Theoretical Background

Recognizing people by their voices, differentiating between music genres, identifying animal sounds, and making sense of random noises from the kitchen are all standard tasks our human brain performs automatically. They are all achieved by our impeccable ability to find patterns. So how do we grant a computer this ability? The factor humans use to recognize sounds that we will be focusing on is timbre. Looking at what “sound” actually is, we see it is a wave moving through space. This wave is generated by vibrations, whether from a person’s vocal cords, a speaker’s diaphragm, or the strings of an instrument. Mersenne’s formula tells us that each string produces a fundamental frequency determined by its properties (length L, stretching force F, and mass per unit length μ): f = (1 / 2L) · √(F / μ). Evidently, we can now refer to the sound an instrument makes as a wave at a certain frequency.
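As a quick sketch of the formula above (the string length, tension, and mass per unit length below are illustrative values, not numbers from this project):

```python
import math

def mersenne_frequency(length_m: float, tension_n: float, mass_per_length: float) -> float:
    """Fundamental frequency of a stretched string: f = (1 / 2L) * sqrt(F / mu)."""
    return (1.0 / (2.0 * length_m)) * math.sqrt(tension_n / mass_per_length)

# Illustrative values roughly in the range of a guitar's high E string (~330 Hz).
print(mersenne_frequency(length_m=0.648, tension_n=72.0, mass_per_length=0.000401))
```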

Moreover, we can play chords, for example on a piano. A chord is the simultaneous sounding of multiple notes, so the sound produced is quite simply the sum of the individual waves. Take, for example, the natural A minor chord, which consists of the following notes and frequencies: A at 440.00 Hz, C at 523.25 Hz, and E at 659.26 Hz.

We can then plot the wave generated by summing sine waves at these frequencies, W(t) = sin( At ) + sin( Ct ) + sin( Et ). This wave describes the A minor chord explicitly (specifically the A minor chord rooted at the A above middle C on a grand piano, at a frequency of 440.0 Hz).
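As a minimal sketch (assuming NumPy; the sample rate and duration are arbitrary choices for illustration, not values from the project), the summed wave can be generated like this:

```python
import numpy as np

SAMPLE_RATE = 44_100                      # samples per second
DURATION = 1.0                            # seconds
FREQS = [440.00, 523.25, 659.26]          # A, C, E of the A minor chord

t = np.linspace(0.0, DURATION, int(SAMPLE_RATE * DURATION), endpoint=False)
# W(t) = sin(2*pi*A*t) + sin(2*pi*C*t) + sin(2*pi*E*t)
chord_wave = sum(np.sin(2.0 * np.pi * f * t) for f in FREQS)
```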

Now let’s assume we are the audience at this performance, and what we are given is the produced wave (the movement of air particles set off by the piano’s strings, travelling through the room and into our ears). How could we figure out which notes were played? To do this we use the Fourier Transform. The Fourier Transform takes the wave we “hear” as an input and returns an array of the frequencies which produce it. In this case, the Fourier Transform will return {440.0, 523.25, 659.26}, and thus tell us exactly which notes were played to produce the sound wave. (This is a very simplified description of the Fourier Transform’s output; a more in-depth explanation can be found below.)
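Continuing the sketch above, NumPy’s FFT (used here only as a stand-in for the Fourier Transform described in the text) recovers those frequencies from the summed wave:

```python
# Magnitude spectrum of chord_wave from the previous snippet.
spectrum = np.abs(np.fft.rfft(chord_wave))
freq_bins = np.fft.rfftfreq(len(chord_wave), d=1.0 / SAMPLE_RATE)

# The three strongest bins sit at roughly 440.0, 523.25 and 659.26 Hz.
top_three = np.sort(freq_bins[np.argsort(spectrum)[-3:]])
print(top_three)   # -> approximately [440., 523., 659.]
```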

Our brain is trained to do this automatically, which is why musicians are able to recognize different chords in songs. We can also derive information about the theme of a song, whether it is uplifting or melancholic, which style it is in (Jazz being a particularly distinctive example), and much more. This all stems from being able to recognize which frequencies are combined, and to use them to classify the sound. It is quite clear to a keen listener that the frequency arrays of Twinkle Twinkle Little Star and Seven Nation Army by The White Stripes differ significantly, while the frequency arrays of Beethoven’s 9th Symphony (Ode to Joy) and Vivaldi’s Spring are more alike. This is because different genres of music tend to follow a format, including certain frequency ranges and certain intervals between frequencies. We can use this fact to teach a computer to differentiate between music genres based solely on the frequency arrays our nifty Fourier Transform provides. This is in fact how Shazam works (Wang, “An Industrial-Strength Audio Search Algorithm.”), and much of the initial research was based on it. Interestingly, to the best of our understanding, the way the human brain differentiates between people’s voices is almost identical to how we differentiate between music genres, which means that if we can do one, we can easily do the other.
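To illustrate the idea of comparing sounds by their frequency arrays (a toy sketch only, not this project’s classifier and not Shazam’s actual fingerprinting), two equal-length clips can be compared by the cosine similarity of their magnitude spectra:

```python
import numpy as np

def frequency_fingerprint(wave: np.ndarray) -> np.ndarray:
    """Normalised magnitude spectrum, used here as a crude 'frequency array' descriptor."""
    magnitudes = np.abs(np.fft.rfft(wave))
    return magnitudes / np.linalg.norm(magnitudes)

def spectral_similarity(wave_a: np.ndarray, wave_b: np.ndarray) -> float:
    """Cosine similarity of two equal-length clips' spectra; values near 1.0 mean similar frequency content."""
    return float(np.dot(frequency_fingerprint(wave_a), frequency_fingerprint(wave_b)))
```

In line with the comparison above, two clips drawn from similar styles will tend to score higher than clips from very different genres.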
