<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xml:lang="en"
    xmlns:mml="http://www.w3.org/1998/Math/MathML"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <processing-meta tagset-family="jats" base-tagset="publishing" mathml-version="2.0" table-model="xhtml"/>
    <front>
                        
                        <journal-meta>
            <issn>1230-7483</issn>
                                </journal-meta>
        <article-meta>
            <title-group>
                                    <article-title>Challenges for Auditory and Automatic Speaker Recognition: Evaluating Cases of Highly Similar Voices</article-title>
                                    <trans-title-group xml:lang="pl">
                                        <trans-title>Trudności związane ze słuchowym i automatycznym rozpoznawaniem osób na podstawie głosu: analiza przypadków głosów bardzo podobnych</trans-title>
                                    </trans-title-group>
                            </title-group>

                        <contrib-group>
                                                            <contrib contrib-type="author" corresp="yes">
                            <name>
                                <surname>Kudera</surname>
                                <given-names>Jacek</given-names>
                            </name>
                            <role>author</role>
                                                                                                                                    <xref ref-type="aff" rid="aff-1"/>
                                                                                                        <xref ref-type="aff" rid="aff-2"/>
                                                                                        <xref ref-type="corresp" rid="cor-1"/>
                        </contrib>
                                            <contrib contrib-type="author" corresp="yes">
                            <name>
                                <surname>Nodi</surname>
                                <given-names>Naima Islam</given-names>
                            </name>
                            <role>author</role>
                                                                                                                                    <xref ref-type="aff" rid="aff-3"/>
                                                                                        <xref ref-type="corresp" rid="cor-2"/>
                        </contrib>
                                            <contrib contrib-type="author" corresp="yes">
                            <name>
                                <surname>Roegner</surname>
                                <given-names>Julia</given-names>
                            </name>
                            <role>author</role>
                                                                                                                                    <xref ref-type="aff" rid="aff-4"/>
                                                                                        <xref ref-type="corresp" rid="cor-3"/>
                        </contrib>
                                                </contrib-group>

                                                                                        <aff id="aff-1">
                    <institution-wrap>
                        <institution>Universität Trier</institution>
                                                    <institution-id institution-id-type="ROR">02778hg05</institution-id>
                                            </institution-wrap>
                </aff>
                                                                                            <aff id="aff-2">
                    <institution-wrap>
                        <institution>Uniwersytet WSB Merito we Wrocławiu</institution>
                                            </institution-wrap>
                </aff>
                                                                                                                    
            <author-notes>
                                    <corresp id="cor-1">Correspondence to: Jacek Kudera <email>kudera@uni-trier.de</email></corresp>
                                    <corresp id="cor-2">Correspondence to: Naima Islam Nodi</corresp>
                                    <corresp id="cor-3">Correspondence to: Julia Roegner</corresp>
                            </author-notes>

                            <pub-date date-type="pub" publication-format="electronic" iso-8601-date="2025-12-10">
                    <day>10</day>
                    <month>12</month>
                    <year>2025</year>
                </pub-date>
            
            <volume>142–143</volume>
            <issue>2025</issue>
                        <fpage>143</fpage>
                                    <lpage>155</lpage>
            
            <permissions>
                <copyright-statement>Copyright &#x00A9; 2025</copyright-statement>
                                    <copyright-year>2025</copyright-year>
                            </permissions>

        </article-meta>
    </front>
    <body>
        <p>Identical twins present a difficult case for both auditory and machine speaker recognition. This paper addresses this challenge and presents the findings of two studies: an auditory speaker discrimination test and a machine-based task using a forensic automatic speaker recognition (ASR) system. The outcomes of the perceptual judgement task were compared with the log-likelihood ratios (LLRs) yielded by an x-vector-based speaker recognition system. Although the perceptual task was given to lay listeners rather than forensic phonetic experts, the results appear to be congruent with the scores yielded by the state-of-the-art automatic system. The human raters were more accurate in judging same-speaker pairs than different-speaker pairs. The automatic system outperformed the human listeners in both conditions tested. Overall, the voices that were difficult for human listeners differed from those that the ASR system struggled with.</p>
    </body>
    <back>
            </back>
</article>
