July learnings – Audio Classification and Sound Localization (Direction only)

Week of 7/26/21 – 8/1/21

  • Direction of arrival (DOA) of sound: was able to estimate the direction of arrival of a sound source.
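Several of the references below build on GCC-PHAT, which can be sketched for a single microphone pair. This is a minimal NumPy sketch under stated assumptions, not the method of any one paper. It assumes a far-field source, a speed of sound of c = 343 m/s, a hypothetical 0.2 m microphone spacing, and plain PHAT weighting (reference 2 proposes a combined ML/PHAT weighting instead). `gcc_phat` and `doa_two_mics` are hypothetical helper names. The delay τ is taken as the peak of the PHAT-weighted cross-correlation, and the angle from broadside follows from θ = arcsin(c·τ/d), where d is the microphone spacing.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None, interp=16):
    """Delay of `sig` relative to `ref` in seconds, estimated with GCC-PHAT.

    A positive result means `sig` arrives later than `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so the FFT correlation is linear, not circular
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    # Inverse FFT of the weighted cross-spectrum, upsampled by `interp` for sub-sample resolution
    cc = np.fft.irfft(cross, n=interp * n)
    max_shift = interp * n // 2
    if max_tau is not None:
        max_shift = min(int(interp * fs * max_tau), max_shift)
    # Reorder so index 0 is lag -max_shift and the last index is lag +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs)


def doa_two_mics(tau, mic_distance, c=343.0):
    """Far-field angle of arrival in degrees (0 = broadside) for one microphone pair."""
    ratio = np.clip(c * tau / mic_distance, -1.0, 1.0)  # guard against |c*tau/d| > 1
    return float(np.degrees(np.arcsin(ratio)))


# Hypothetical setup: 16 kHz audio, microphones 0.2 m apart, white-noise source
fs, mic_distance = 16000, 0.2
rng = np.random.default_rng(0)
source = rng.standard_normal(4096)
mic1 = source
mic2 = np.concatenate((np.zeros(5), source[:-5]))  # mic2 hears the source 5 samples later

tau = gcc_phat(mic2, mic1, fs, max_tau=mic_distance / 343.0)
angle = doa_two_mics(tau, mic_distance)
```

With these settings the estimated delay should be about 5 samples (roughly 0.31 ms), which maps to roughly 32° off broadside. A single two-microphone pair cannot tell front from back, so the angle is only defined on one half-plane.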

References

  1. A New Moving Sound Source Localization Method Based on the Time Difference of Arrival (2010) – Read, provides a method to find the distance to the sound source
  2. Time difference of arrival estimation of sound source using cross correlation and modified maximum likelihood weighting function (2017) – Read, provides a new weighting function for the Generalized Cross Correlation (GCC) that combines the ML and PHAT-pi weighting functions.
  3. Direction of arrival estimation – A two microphones approach (2010)
  4. GCC-PHAT CROSS-CORRELATION AUDIO FEATURES FOR SIMULTANEOUS SOUND EVENT LOCALIZATION AND DETECTION (SELD) ON MULTIPLE ROOMS (2019)
  5. A Comparison of Generalized Cross-Correlation Methods for Time Delay Estimation (2017)
  6. A moving sound source localization method based on TDOA (2014)
  7. A TALKER TRACKING METHOD USING TWO MICROPHONES BASED ON THE SOUND SOURCE LOCALIZATION (2005)
  8. Localization of multiple acoustic sources with small arrays using a coherence test (2008)

Week of 7/19/21 – 7/25/21

  • Surround sound/Stereo/Mono: Surround sound uses more than 2 speakers plus a subwoofer. Stereo sound uses 2 speakers and does not always have a subwoofer.
    • Mono audio files = 1 channel
    • Stereo audio files = 2 channels
    • Surround audio files = more than 2 channels
  • Surround 2.1 uses two stereo speakers and one subwoofer
  • Convolution vs. cross-correlation: Convolution measures the effect of one signal on another (for example, the output of an LTI system is its input convolved with the system's impulse response). Cross-correlation measures the similarity between two signals as one is shifted relative to the other.
  • Convolution of two continuous-time signals (also called the convolution integral): $y(t) = \int_{-\infty}^{\infty} x(\tau)\,h(t-\tau)\,d\tau$. Note: for causal signals, the lower and upper limits can be set to 0 and t.
  • Convolution of two discrete-time signals (also called the convolution sum): $y[n] = \sum_{k=-\infty}^{\infty} h[k]\,x[n-k]$. Note: the lower limit can be set to 0 for causal systems. There are a few methods to evaluate it numerically:
    • Graphical Procedure
    • Sliding Tape Method
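  • Correlation (real signals; the sign convention for the lag varies between texts)
    • For continuous-time signals: $R_{xy}(\tau) = \int_{-\infty}^{\infty} x(t)\,y(t-\tau)\,dt$
    • For discrete-time signals: $r_{xy}[l] = \sum_{n=-\infty}^{\infty} x[n]\,y[n-l]$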
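The discrete-time convolution sum and correlation can be checked with a few lines of NumPy. This minimal sketch evaluates the convolution sum with a direct double loop and computes cross-correlation as convolution with the time-reversed second signal, using the lag convention r_xy[l] = Σ_n x[n] y[n − l] for real signals (an assumed convention; texts differ on the sign of the lag). `convolution_sum` and `cross_correlation` are hypothetical helper names; their results agree with `np.convolve` and `np.correlate(..., "full")`. The last lines read a time delay off the correlation peak, which is the idea behind TDOA estimation.

```python
import numpy as np


def convolution_sum(x, h):
    """Direct evaluation of y[n] = sum_k h[k] x[n - k] for finite sequences starting at n = 0."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    y = np.zeros(len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):  # x is zero outside its support
                y[n] += h[k] * x[n - k]
    return y


def cross_correlation(x, y):
    """r_xy[l] = sum_n x[n] y[n - l] for real sequences.

    Returns (lags, r) with lags running from -(len(y) - 1) to len(x) - 1.
    """
    y = np.asarray(y, dtype=float)
    # Correlation is convolution with the time-reversed second signal
    r = convolution_sum(x, y[::-1])
    lags = np.arange(-(len(y) - 1), len(x))
    return lags, r


# Convolution sum example (matches np.convolve)
x = [1, 2, 3]
h = [0, 1, 0.5]
y = convolution_sum(x, h)  # values 0, 1, 2.5, 4, 1.5

# Time delay from the correlation peak: `delayed` is `pulse` arriving 4 samples later
pulse = np.array([0, 0, 1, 3, 1, 0, 0, 0, 0, 0], dtype=float)
delayed = np.roll(pulse, 4)
lags, r = cross_correlation(delayed, pulse)
delay = lags[np.argmax(r)]  # 4
```

`np.convolve` and `np.correlate` do the same work much faster; for long signals an FFT-based routine such as `scipy.signal.fftconvolve` is faster still.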

References

  1. Surround sound
    1. 2.1 vs. 5.1 vs. 7.1 Surround Sound
    2. The Difference Between Mono and Stereo with Audio Example – the best reading material in this space
    3. What is a Stereo Microphone?
    4. How is Surround Sound Different Than Stereo?
    5. Basic Differences Among Monophonic, Stereophonic and Surround Sound
    6. Monophonic, Stereophonic and Surround Sound Differences
    7. Mono vs. Stereo Sound: The Difference Explained (With Audio Examples)
  2. Time Difference of Arrival (TDOA) estimation
    1. The generalized correlation method for estimation of time delay (1976) – Read, provides a Maximum Likelihood weighting function for the Generalized Cross Correlation
  3. Convolution and Correlation
    1. https://www.youtube.com/watch?v=O9-HN-yzsFQ&t=0s – Visualization of convolution and correlation
    2. How to Measure a Time Delay Using Cross Correlation? – video
    3. https://www.youtube.com/watch?v=oCcUm0_rUJw – Determining signal similarities
    4. Convolution and Correlation explained (Math) – A comprehensive explanation
    5. Teaching the concept of convolution and correlation using Fourier transform

Week of 7/12/21 – 7/18/21

  • Refactored the code that integrates Misty with ML sound classification on Ubuntu 20.04
  • Worked on audio categories from the AudioSet dataset
  • Read papers on the direction of arrival of sound (see references below)
[Figure: Misty architecture, from reference 2.1 below]

References

  1. Sound Localization
    1. Simulation of Human Ear Recognition Sound Direction Based on Convolutional Neural Network
    2. Sound Source Direction Estimation in Horizontal Plane Using Microphone Array (2013) – Read, provides a new algorithm for sound source localization using 4 microphones; uses a maximum likelihood (ML) estimator for TDOA estimation
    3. Localization of sound sources in robotics: A review – (2017) Read, a very comprehensive review paper
    4. Learning Sound Location from a Single Microphone (2009) – Read, provides a model for sound localization using a specially designed microphone
    5. Localization of Sound Sources: A Systematic Review (2021) – Read, provides a summary of different techniques for sound localization
    6. Microphone Array | Beamforming | Clean Voice – Read, not very useful for my task
    7. Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
    8. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
    9. Spectral Cues in Human Sound Localization
  2. Misty II
    1. https://www.mistyrobotics.com/blog/what-can-the-misty-ii-platform-do/ – misty platform architecture
    2. Misty Specification
    3. https://docs.mistyrobotics.com/misty-ii/robot/misty-ii/#connecting-to-adb – Connecting to Misty II's Android device via ADB
    4. https://developer.android.com/studio/command-line/adb – What is Android Debug Bridge
    5. https://docs.mistyrobotics.com/misty-ii/robot/misty-ii/#connecting-to-misty-39-s-file-system – Connecting to Misty’s filesystem
  3. ROS on Android/Misty II
    1. ros2-android-controller – An android app to control ROS2 robot
    2. http://wiki.ros.org/android/Tutorials/kinetic – Android Tutorials (ROS Kinetic)
    3. ROS_2_ANDROID – Receiving and Publishing data from Android using ROS2
    4. ROS Mobile – ROS-Mobile is an Android application designed for dynamic control and visualization of mobile robotic systems operated by the Robot Operating System (ROS). The application uses ROS nodes that initialize publishers and subscribers with standard ROS messages.
    5. Enabling ROS on our Qualcomm Snapdragon based Products
    6. ROS Support for Qualcomm® Snapdragon™ – Bringing ARM into Robotics

Week of 7/5/21 – 7/11/21

  • PANNs inference: PANNs inference works on audio from Misty. Note: the audio must be mono (one channel) for PANNs inference to work.
    • It works on Ubuntu but not on Windows 10
  • Studied signal sampling from B. P. Lathi's book; made videos on sampling and A/D conversion
  • Move to Sound audio works on Misty
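Since PANNs inference needs mono audio, a stereo or surround recording has to be downmixed first. This is a minimal sketch assuming 16-bit PCM WAV input, using only the standard-library `wave` module plus NumPy. `read_wav_int16` and `to_mono` are hypothetical helper names, and averaging the channels is one common downmix choice, not necessarily what PANNs or Misty expect. As far as I know, the pretrained PANNs checkpoints were trained on 32 kHz audio, so resampling may also be needed.

```python
import io
import wave

import numpy as np


def read_wav_int16(f):
    """Read a 16-bit PCM WAV file (path or file object) into an array of shape (frames, channels)."""
    with wave.open(f, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = w.getnchannels()  # 1 = mono, 2 = stereo, > 2 = surround
        fs = w.getframerate()
        raw = w.readframes(w.getnframes())
    # Samples are interleaved frame by frame, so reshape to (frames, channels)
    return np.frombuffer(raw, dtype="<i2").reshape(-1, n_channels), fs


def to_mono(data):
    """Average all channels into one and scale int16 samples to floats in [-1, 1)."""
    return data.astype(np.float32).mean(axis=1) / 32768.0


# Build a 1-second stereo test file in memory: 440 Hz tone on the left, silence on the right
fs = 16000
t = np.arange(fs) / fs
left = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype("<i2")
right = np.zeros_like(left)
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(fs)
    w.writeframes(np.column_stack((left, right)).tobytes())
buf.seek(0)

data, rate = read_wav_int16(buf)
mono = to_mono(data)  # shape (16000,), one channel
```

The same `to_mono` call works for surround files because it averages over however many channels the file has.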

References

  1. https://github.com/iver56/audiomentations – audiomentations – Python library for audio data augmentation
  2. https://github.com/facebookresearch/AugLy/tree/main/augly/audio – augly – Python library for audio data augmentation
  3. https://github.com/keunwoochoi/kapre – kapre – Python library for audio data augmentation and much more
  4. https://www.youtube.com/watch?v=RMfeYitdO-c – Audio classification with TensorFlow using kapre
  5. https://github.com/seth814/Audio-Classification – Audio classification with TensorFlow using kapre
  6. Listening for Event Messages with Simple WebSocket Client – A method to listen to event messages from Misty using a WebSocket client
  7. Expected round-trip times in REST API?
  8. Misty Coordinate System
  9. Python tips
  10. Classes in Python