Integrating machine learning and physiological modeling tools for the assessment of vocal function using neck surface acceleration
Date
2024-05
Authors
Program
Department
Campus
Campus Casa Central Valparaíso
Abstract
This thesis is dedicated to advancing the ambulatory assessment of vocal function by utilizing a neck-surface accelerometer attached directly to the skin surface of the neck. The motivation lies in the fact that a fully developed ambulatory method, capable of precisely identifying the underlying pathophysiological characteristics of both normal and pathological vocal functions, could revolutionize clinical practices in monitoring, evaluating, and treating common voice disorders. Accordingly, this work exploits the advantages of a low-order voice production model to introduce a non-invasive technique for estimating relevant vocal function metrics, such as subglottal pressure, vocal fold collision pressure, and intrinsic laryngeal muscle activation of the cricothyroid and thyroarytenoid muscles, based on signals from an accelerometer sensor. In the first stage, a Bayesian framework based on a constrained extended Kalman filter is proposed to link a low-order voice production model with either a glottal area waveform extracted from high-speed video recordings or glottal airflow estimated from Rothenberg mask measurements. The results provide new insights into the capacity of the selected voice production model to replicate different phonation conditions and highlight the feasibility of using this method to estimate clinical measures that are difficult to ascertain in a clinical setting. The second stage of the thesis focuses on an alternate solution: a neural network trained exclusively with simulations from a voice production model. This nonlinear regressor maps seven input features, which can be extracted from an accelerometer signal, to the target measures of vocal function. The efficacy of this method, particularly in terms of subglottal pressure, was validated through in vivo recordings, which included synchronous measurements of oral volume velocity, intraoral pressure, microphone, and accelerometer signals.
This method was applied to healthy and disordered voices (unilateral vocal fold paralysis and both phonotraumatic and nonphonotraumatic vocal hyperfunction). Participants were prompted to articulate /p/-vowel syllable strings, varying loudness, vowels, pitch, and voice quality. The neural network, trained with synthetic data, demonstrated subglottal pressure estimation comparable to that of previous studies for subjects without voice disorders. However, this nonlinear mapping was found to be less robust in cases of pathology. In the search for more accurate subject-specific models, the final research stage focuses on refining the neural network regressor, initially trained solely with simulations from a synthetic voice production model. This refinement is carried out by employing a domain adaptation strategy from synthetic to in vivo laboratory data, resulting in an improved estimate of subglottal pressure. This method yielded a set of subject-specific models that provided the most accurate estimation of subglottal pressure to date for both normal and disordered voices using an accelerometer. Additionally, a case study that adds fine-wire laryngeal electromyography to the previously mentioned in vivo synchronous measurements demonstrates that the performance of the subject-specific regressor in estimating subglottal pressure is maintained while concurrently estimating muscle activation of the cricothyroid and thyroarytenoid muscles. Overall, this thesis advances the field of vocal function assessment through a series of significant contributions. The proposed Bayesian framework reduces the need for multiple observations while yielding robust and reliable estimates of features that are difficult to measure in clinical practice.
It also innovatively combines machine learning techniques with the voice production model to estimate physiologically relevant features such as subglottal pressure, vocal fold collision pressure, and laryngeal muscle activation from neck-surface accelerometers. Furthermore, this work introduces a subject-specific nonlinear regression enhanced by transfer learning, significantly improving the estimation of subglottal pressure from neck-surface vibration signals, with promising potential for application to other vocal function para
Description
Keywords
Transfer Learning, Voice Disorders, Voice Production Model
