About me

I am a developer and architect who designs and builds software professionally. These days I focus mainly on my own app business, creating tools that help people and organizations get tasks done more efficiently.

My experience is split between building Windows applications and developing ML/AI/speech systems on Linux. I focus on often-overlooked aspects of technology, such as data quality, documentation, and reliable processing at scale.

I have experience in the following areas:

Productivity tools: Building software that simplifies professional tasks.
Machine learning: Building intelligent systems that work reliably in production.
Audio & speech: Hands-on work with speech recognition and audio signal processing.

My approach is fairly simple: software should be powerful enough to do the heavy lifting, yet simple enough to remain comfortable to use and maintain. I get the most satisfaction from challenging projects, such as working in a team on a far-field speech recognition system built from scratch, or building big-data fuzzy-search systems for international fintech clients.

Piotr Chlebek
SharkTime Software

Publications

InterSpeech · 2022-08-18

Toward Corpus Size Requirements for Training and Evaluating Depression Risk Models Using Spoken Language

Mental health risk prediction is a growing field in the speech community, but many studies are based on small corpora. This study illustrates how variations in test and train set sizes impact performance in a controlled study. Using a corpus of over 65K labeled data points, results from a fully crossed design of different train/test size combinations are provided. Two model types are included: one based on language and the other on speech acoustics. Both use methods current in this domain. An age-mismatched test set was also included. Results show that (1) test sizes below 1K samples gave noisy results, even for larger training set sizes; (2) training set sizes of at least 2K were needed for stable results; (3) NLP and acoustic models behaved similarly with train/test size variations, and (4) the mismatched test set showed the same patterns as the matched test set. Additional factors are discussed, including label priors, model strength and pre-training, unique speakers, and data lengths. While no single study can specify exact size requirements, results demonstrate the need for appropriately sized train and test sets for future studies of mental health risk prediction from speech and language.

Biomedical Sensing and Analysis · Signal Processing in Medicine and Biology · 2022-07-20

Generalization of Deep Acoustic and NLP Models for Large-Scale Depression Screening

Co-author of the book chapter.


IEEE, arXiv · 2021-05-13

Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus

Speech-based algorithms have gained interest for the management of behavioral health conditions such as depression. We explore a speech-based transfer learning approach that uses a lightweight encoder and that transfers only the encoder weights, enabling a simplified run-time model. Our study uses a large data set containing roughly two orders of magnitude more speakers and sessions than used in prior work. The large data set enables reliable estimation of improvement from transfer learning. Results for the prediction of PHQ-8 labels show up to 27% relative performance gains for binary classification; these gains are statistically significant with a p-value close to zero. Improvements were also found for regression. Additionally, the gain from transfer learning does not appear to require strong source task performance. Results suggest that this approach is flexible and offers promise for efficient implementation.

IEEE, arXiv · 2021-03-25

Cross-Demographic Portability of Deep NLP-Based Depression Models

Deep learning models are rapidly gaining interest for real-world applications in behavioral health. An important gap in current literature is how well such models generalize over different populations. We study Natural Language Processing (NLP) based models to explore portability over two different corpora highly mismatched in age. The first and larger corpus contains younger speakers. It is used to train an NLP model to predict depression. When testing on unseen speakers from the same age distribution, this model performs at AUC=0.82. We then test this model on the second corpus, which comprises seniors from a retirement community. Despite the large demographic differences in the two corpora, we saw only modest degradation in performance for the senior-corpus data, achieving AUC=0.76. Interestingly, in the senior population, we find AUC=0.81 for the subset of patients whose health state is consistent over time. Implications for demographic portability of speech-based applications are discussed.

IEEE, arXiv · 2021-02-17

Robust Speech and Natural Language Processing Models for Depression Screening

Depression is a global health concern with a critical need for increased patient screening. Speech technology offers advantages for remote screening but must perform robustly across patients. We have described two deep learning models developed for this purpose. One model is based on acoustics; the other is based on natural language processing. Both models employ transfer learning. Data from a depression-labeled corpus in which 11,000 unique users interacted with a human-machine application using conversational speech is used. Results on binary depression classification have shown that both models perform at or above AUC=0.80 on unseen data with no speaker overlap. Performance is further analyzed as a function of test subset characteristics, finding that the models are generally robust over speaker and session variables. We conclude that models based on these approaches offer promise for generalized automated depression screening.

IEEE, arXiv · 2021-02-16

Depression and Anxiety Prediction Using Deep Language Models and Transfer Learning

Digital screening and monitoring applications can aid providers in the management of behavioral health conditions. We explore deep language models for detecting depression, anxiety, and their co-occurrence from conversational speech collected during 16k user interactions with an application. Labels come from PHQ-8 and GAD-7 results also collected by the application. We find that results for binary classification range from 0.86 to 0.79 AUC, depending on condition and co-occurrence. Best performance is achieved when a user has either both or neither condition, and we show that this result is not attributable to data skew. Finally, we find evidence suggesting that underlying word sequence cues may be more salient for depression than for anxiety.

5th International Workshop on Mental Health And Well-Being: Sensing And Intervention · 2020-09-12

Comparing Speech Recognition Services for HCI Applications in Behavioral Health

Presented at the 5th International Workshop on Mental Health And Well-Being: Sensing And Intervention. Website: UbiComp 2020 workshop.

Behavioral health conditions such as depression and anxiety are a global concern, and there is growing interest in employing speech technology to screen and monitor patients remotely. Language modeling approaches require automatic speech recognition (ASR) and multiple privacy-compliant ASR services are commercially available. We use a corpus of over 60 hours of speech from a behavioral health task, and compare ASR performance for four commercial vendors. We expected similar performance, but found large differences between the top and next-best performer, for both mobile (48% relative WER increase) and laptop (67% relative WER increase) data. Results suggest the importance of benchmarking ASR systems in this domain. Additionally we find that WER is not systematically related to depression itself. Performance is however affected by diverse audio quality from users’ personal devices, and possibly from the overall style of speech in this domain.

Patents

Filed 2024-06-13

Systems and methods for predicting mental health conditions based on processing of conversational speech/text and language

…systems and methods for identifying the severity of a mental health condition or symptoms of same by listening to a human-to-human conversation by receiving conversation data, processing the conversation data to generate a language model output and/or an acoustic model output using one or more language models and/or acoustic models…

Filed 2020-10-23

Acoustic and natural language processing models for speech-based screening and monitoring of behavioral health conditions

The present disclosure provides acoustic and natural language processing (NLP) models for predicting whether a subject has a behavioral or mental health state of interest based at least in part on input speech from said subject.

Filed 2015-12-22

Automatic tuning of speech recognition parameters

System and techniques for automatic tuning of ASR parameters are described herein. A clean audio segment and a dirty audio segment may be obtained. In an iterative fashion, optimized preprocessing parameters may be obtained by, at an iteration, selecting a set of parameters, preprocessing the clean audio segment with the set of parameters to produce a first result, preprocessing the dirty audio segment with the set of parameters to produce a second result,…

Filed 2015-06-26

Phase response mismatch correction for multiple microphones

For a multiple microphone system, a phase response mismatch may be corrected. One embodiment includes receiving audio from a first microphone and from a second microphone, the microphones being coupled to a single device for combining the received audio, recording the received audio from the first microphone and the second microphone…

Filed 2014-07-04

Replay attack detection in automatic speaker verification systems

Techniques related to detecting replay attacks on automatic speaker verification systems are discussed. Such techniques may include receiving an utterance from a user or a device playing back the utterance, determining features associated with the utterance, and classifying the utterance…