Thesis topics: speech & audio

Speech Recognition

Speech Therapy

Audio Detection & Classification

Music

Automatic monitoring of communication skills

Automatic tool to monitor communication style
Based on the Process Communication Model (PCM)
In collaboration with Communicate 2 Connect
Task: apply speech recognition technology to PCM:
- non-verbal information (emotion, prosody)
- keywords and key phrases → frequency counts
- spontaneous, noisy audio
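
As an illustration of the "frequency counts" step, here is a minimal sketch that assumes the recognizer has already produced a plain-text transcript. The function name, the example transcript and the phrase list are hypothetical and are not taken from PCM or from the project itself.

```python
import re
from collections import Counter


def count_key_phrases(transcript, phrases):
    """Count case-insensitive, whole-word occurrences of each phrase."""
    text = transcript.lower()
    counts = Counter()
    for phrase in phrases:
        # Allow any whitespace between the words of a multi-word phrase.
        words = [re.escape(w) for w in phrase.lower().split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        counts[phrase] = len(re.findall(pattern, text))
    return counts


# Hypothetical transcript and phrase list, for illustration only.
transcript = "I think we should, I think, maybe plan this. I feel it is the right plan."
phrases = ["I think", "I feel", "plan"]
counts = count_key_phrases(transcript, phrases)
for phrase in phrases:
    print(phrase, counts[phrase])
```

On spontaneous, noisy audio the transcript itself will contain recognition errors, so counts obtained this way are only as reliable as the recognizer output.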

Approach
Interested partners: myforce, KBC, Sebeco, MatterSight, Communicate 2 Connect, VDAB, ...
Full presentation

Towards a self-learning speech recognition system

Current machine learning techniques learn from labelled data
They can only learn relations prepared by human annotators → costly
Example: speech recognition
- Works well for standard, prepared speech
- Fails on new words, regiolects, dialects, hesitations, broken-off words, ...
Aim: a self-learning, weakly supervised speech recognizer

Approaches: study how humans do this and mimic it
Hawkins' memory-prediction framework (On Intelligence: How a New Understanding of the Brain will Lead to the Creation of Truly Intelligent Machines)
Deep Neural Networks (DNNs) with attention mechanisms

Automatic classification & evaluation of speech disorders

Speech therapy for patients with severe speech impairment: laryngectomy, dysarthria
Tool (website) to automate voice training (in collaboration with speech therapists)
Website: audio recordings + manual annotations + automatic feedback

Generating phonological feedback for evidence-based speech therapy

Feedback to
- patient (showing the progress he/she has made)
- speech therapist (helping to diagnose problems and adjust the exercises)
Problems:
- high variability (languages, type of speech disorder, age of the speaker)
- lack of annotated data
- handling spontaneous speech (speaking freely)
Possible solutions:
- use of data from different languages?
- use normal speech and convert to alaryngeal or dysarthric domain?
- search similar audio in existing data?

Semi-automatic bat call recognition

In collaboration with
More info: see lifewatch.be
Goal: custom-made real-time data processing tool to recognize bat species (or species groups)
Approach: exemplar-based techniques
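
To make "exemplar-based" concrete, here is a minimal sketch using a k-nearest-neighbour vote, which is one common exemplar-based technique and not necessarily the one used in the project. It assumes each call has already been reduced to a small feature vector. The features (peak frequency in kHz, call duration in ms), the numbers and the group labels are all made up for illustration.

```python
import math
from collections import Counter


def classify_by_exemplars(query, exemplars, k=3):
    """Return the majority label among the k exemplars closest to `query`."""
    # Rank stored exemplars by Euclidean distance to the query features.
    ranked = sorted(exemplars, key=lambda ex: math.dist(query, ex[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical exemplars: ((peak frequency in kHz, duration in ms), label).
exemplars = [
    ((45.0, 5.0), "group_A"),
    ((47.0, 6.0), "group_A"),
    ((44.0, 5.5), "group_A"),
    ((25.0, 12.0), "group_B"),
    ((27.0, 11.0), "group_B"),
    ((24.0, 13.0), "group_B"),
]

print(classify_by_exemplars((46.0, 5.2), exemplars))
```

No model is trained here: new labelled calls can be added to the exemplar list at any time, which suits a semi-automatic workflow. In practice the features would need scaling so that no single dimension dominates the distance.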

Automatic harbor porpoise echolocation recognition

In collaboration with
More info: see the Flanders Marine Institute website
Goals: new classification algorithms to detect sounds from other marine mammals and even boat sonars; open algorithms so that other, more versatile hydrophones can be added to the network

Machine Learning for Vocal Activity Detection

Detect when somebody sings ↔ instruments only
Non-trivial problem → deep learning

Developing an automatic DJ system