Challenges for Auditory and Automatic Speaker Recognition: Evaluating Cases of Highly Similar Voices

Publication date: 10.12.2025

Problems of Forensic Sciences (Z Zagadnień Nauk Sądowych), 2025, 142–143, pp. 143–155

https://doi.org/10.4467/12307483PFS.25.007.22914

Authors

Jacek Kudera
Universität Trier, Germany
Uniwersytet WSB Merito we Wrocławiu, Poland
https://orcid.org/0000-0003-3678-1067

Naima Islam Nodi
Universität Trier, Germany
https://orcid.org/0009-0009-9456-4045

Julia Roegner
Universität Trier, Germany
https://orcid.org/0009-0005-4553-1483

Abstract

Identical twins present a difficult case for both auditory and machine speaker recognition. This paper addresses this challenge and presents the findings of two studies: an auditory speaker discrimination test and a machine-based task using a forensic automatic speaker recognition (ASR) system. The outcomes of the perceptual judgement task were compared with the log-likelihood ratios (LLRs) yielded by an x-vector-based speaker recognition system. Although the task was given to lay listeners rather than forensic phonetic experts, the results appear to be congruent with the scores yielded by a state-of-the-art automatic system. The human raters were more accurate in judging same-speaker pairs than different-speaker pairs. The automatic system outperformed the human listeners in both conditions tested. Overall, the voices that were difficult for human listeners were different from those that the ASR system struggled with.
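As a rough illustration of the kind of comparison the abstract describes, the sketch below scores listener judgements and LLR-based decisions separately for same-speaker and different-speaker pairs. It is not the authors' analysis code: the `Trial` record, the helper names, the decision threshold of LLR > 0, and every value in `TOY_TRIALS` are assumptions made up for this example, and the toy numbers do not reproduce the reported results.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One voice-pair comparison (hypothetical record layout)."""
    same_speaker: bool   # ground truth for the pair
    llr: float           # log-likelihood ratio from an automatic system
    listener_same: bool  # listener's "same speaker" response


def llr_decision(trial):
    # Assumption: a positive LLR is read as support for the same-speaker hypothesis.
    return trial.llr > 0.0


def listener_decision(trial):
    return trial.listener_same


def accuracy_by_condition(trials, decide):
    """Proportion of correct decisions for same- and different-speaker pairs."""
    result = {}
    for label, truth in (("same", True), ("different", False)):
        subset = [t for t in trials if t.same_speaker == truth]
        correct = sum(decide(t) == truth for t in subset)
        result[label] = correct / len(subset) if subset else float("nan")
    return result


# Invented toy data, for illustration only.
TOY_TRIALS = [
    Trial(True, 2.1, True),
    Trial(True, 0.4, True),
    Trial(True, -0.3, True),
    Trial(False, -1.8, False),
    Trial(False, -0.2, True),
    Trial(False, 0.6, True),
    Trial(False, -2.5, False),
]

if __name__ == "__main__":
    print("system:  ", accuracy_by_condition(TOY_TRIALS, llr_decision))
    print("listener:", accuracy_by_condition(TOY_TRIALS, listener_decision))
```

Reporting accuracy per condition, rather than one pooled figure, keeps same-speaker and different-speaker errors apart, which matters when one type of pair is systematically harder to judge than the other.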

Data availability

The experimental data and code are available in the following OSF repository: https://osf.io/kxu6v/.

Information

Article type: Original research article

Titles:

English: Challenges for Auditory and Automatic Speaker Recognition: Evaluating Cases of Highly Similar Voices
Polish: Trudności związane ze słuchowym i automatycznym rozpoznawaniem osób na podstawie głosu: analiza przypadków głosów bardzo podobnych

Published: 10.12.2025

Received: 30.06.2025

Accepted: 23.10.2025

Article status: Open access

License: CC BY-NC-ND

Author contribution percentages:

Jacek Kudera (Author) - 33.33%
Naima Islam Nodi (Author) - 33.33%
Julia Roegner (Author) - 33.33%

Article corrections: -

Publication languages: English, Polish
