August learnings – Sound source localization (azimuth angle of arrival and distance)

Week of 8/2/21 – 8/8/21

  • Estimating the distance to the sound source:
    • Use coherence, as described in reference 2.1 below.
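
A minimal sketch of the coherence idea, not the method of any paper listed below. It assumes a toy two-microphone model in which both channels share the same direct signal and the reverberant part is independent noise at each microphone. All names (`mean_coherence`, `simulate`, the gains) are hypothetical. A strong direct path, as with a nearby source, should give higher band-averaged coherence than a weak one.

```python
import numpy as np

def mean_coherence(x, y, frame_len=256, hop=128):
    """Band-averaged magnitude-squared coherence of two equal-length signals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    window = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    # Windowed short-time spectra, one row per frame
    X = np.array([np.fft.rfft(window * x[s:s + frame_len]) for s in starts])
    Y = np.array([np.fft.rfft(window * y[s:s + frame_len]) for s in starts])
    # Welch-style averages of the cross- and auto-spectra over frames
    sxy = np.mean(X * np.conj(Y), axis=0)
    sxx = np.mean(np.abs(X) ** 2, axis=0)
    syy = np.mean(np.abs(Y) ** 2, axis=0)
    msc = np.abs(sxy) ** 2 / (sxx * syy + 1e-12)
    return float(np.mean(msc))

# Toy signals (assumption): white noise stands in for speech and reverberation
rng = np.random.default_rng(0)
direct = rng.standard_normal(16000)
reverb_left = rng.standard_normal(16000)
reverb_right = rng.standard_normal(16000)

def simulate(direct_gain):
    """Coherence between two mics for a given direct-path gain."""
    left = direct_gain * direct + reverb_left
    right = direct_gain * direct + reverb_right
    return mean_coherence(left, right)

near = simulate(3.0)   # strong direct path, e.g. a close source
far = simulate(0.3)    # weak direct path, e.g. a distant source
print(f"near: {near:.2f}, far: {far:.2f}")
```

On real recordings the mapping from coherence to distance depends on the room and the microphone spacing, so it would need calibration.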

References

  1. Sound localization (angle)
    1. Lightweight multi-DOA tracking of mobile speech sources – mobile sound source
    2. Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching (NIPS 2020)
    3. POLYPHONIC SOUND EVENT DETECTION AND LOCALIZATION USING A TWO-STAGE STRATEGY (2019)
    4. Multiple Sound Sources Localization from Coarse to Fine (2020)
  2. Sound localization (distance)
    1. Analysis of Monaural and Binaural Statistical Properties for the Estimation of Distance of a Target Speaker (2019)
    2. Direct-to-reverberant Energy Ratio Estimation Based on Interaural Coherence and a Joint ITD/ILD Model
    3. Sound Source Distance Estimation Using Deep Learning: An Image Classification Approach (2019)
    4. Acoustic Source Position Estimation Based On Multi-Feature Gaussian Processes
    5. Exploiting the Distance Information of the Interaural Level Difference for Binaural Robot Motion Control
    6. Estimating Direct-to-Reverberant Energy Ratio Using D/R Spatial Correlation Matrix Model (2011)
  3. Sound localization for human speech only
    1. Lightweight multi-DOA tracking of mobile speech sources – mobile sound source
    2. Analysis of Monaural and Binaural Statistical Properties for the Estimation of Distance of a Target Speaker (2019)
  4. Datasets:
    1. CHIME-home data set (DCASE challenge 2016)
    2. Sound dataset for domestic environments – DCASE 2021 challenge
  5. Acoustic Scene Classification
    1. http://dcase.community/challenge2021/task-acoustic-scene-classification
    2. A Mobile Robot With Active Localization and Discrimination of a Sound Source (1997)
  6. For vehicles
    1. A comparative study of time delay estimation techniques for road vehicle tracking
    2. http://dcase.community/challenge2021/task-acoustic-scene-classification
    3. Acoustic Estimation of the Head Orientation for In Car Communication Systems
  7. How the elevation of the sound source affects the reception of sound at the microphone
    1. Azimuthal and elevation localization of two sound sources using interaural phase and level differences (2008)
    2. Sound Source and Loudspeaker Base Angle Dependency of Phantom Image Elevation Effect (2017) – reading.
    3. Vertical Sound Source Localization Influenced by Visual Stimuli (2013)
    4. The Encoding of Sound Source Elevation in the Human Auditory Cortex (2018)
    5. Three-Dimensional Sound Source Localization Using Inter-Channel Time Difference Trajectory (2015)
    6. Sound Source Omnidirectional Positioning Calibration Method Based on Microphone Observation Angle (2018)
    7. DIRECTIVITY OF HUMAN AND ARTIFICIAL SPEECH (2004)
    8. Analyzing the Directivity Patterns of Human Speakers (2020) – read. http://audiogroup.web.th-koeln.de/PUBLIKATIONEN/Poerschmann_DAGA2020.pdf
    9. Horizontal directivity of low- and high-frequency energy in speech and singing (2012)
    10. FACTS ABOUT SPEECH INTELLIGIBILITY (2021) – read. Speech level decreases with distance from the sound source; see the table for details. Also check the voice spectra for different speech levels (normal talk, whisper, shout, etc.). The frequency band around 2 kHz is the most important in English and other non-tonal languages. Applying a high-pass filter at 20 Hz (upper left) leaves the speech 100% understandable.
    11. 10 IMPORTANT FACTS ABOUT ACOUSTICS FOR MICROPHONE USERS – Definition of important terms
    12. Human voice phoneme directivity pattern measurements (2006)
    13. Acoustics and Vibration Animations (website)
    14. How voice directivity affects speech intelligibility (YouTube video) – read. High frequencies are radiated only in front of the person, so if a microphone is not directly in front, some of those frequencies will not reach it. Consonants are found at frequencies above 500 Hz (most are in the 2 kHz – 4 kHz range). Normal speech at a distance of 1 meter.
  8. https://ifr.org/ifr-press-releases/news/mobile-robots-revolutionize-industry
  9. Sound localization
    1. Transfer-Function Measurement with Sweeps
    2. Robust speaker localization for real-world robots
    3. Symphony: Localizing Multiple Acoustic Sources with a Single Microphone Array
    4. Robust Sound Source Localization Using a Microphone Array on a Mobile Robot
    5. Sound Source and Loudspeaker Base Angle Dependency of Phantom Image Elevation Effect
    6. Sound Source Omnidirectional Positioning Calibration Method Based on Microphone Observation Angle
  10. Head Related Transfer Function
    1. Measurement of Head-Related Transfer Functions: A Review
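
The note under reference 7.10 says that speech level decreases with distance from the source. As a rough sketch, assuming a point source in a free field (no reflections), the inverse distance law gives a drop of about 6 dB per doubling of distance. The function name and the 60 dB reference level are hypothetical illustration values, not figures from the cited table.

```python
import math

def level_at_distance(level_ref_db, distance_m, ref_distance_m=1.0):
    """Free-field level (dB) at distance_m, given the level at ref_distance_m."""
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    # Inverse distance law: sound pressure falls as 1/r, i.e. 20*log10(r) in dB
    return level_ref_db - 20.0 * math.log10(distance_m / ref_distance_m)

# Hypothetical reference: 60 dB measured at 1 m
for d in (1, 2, 4, 10):
    print(f"{d:>2} m: {level_at_distance(60.0, d):.1f} dB")
```

In a reverberant room the measured level stops falling this quickly once the reverberant field dominates, so this is only a first approximation.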