<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD with MathML3 v1.2 20190208//EN" "JATS-journalpublishing1-mathml3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.2" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">JBDGM</journal-id>
<journal-id journal-id-type="nlm-ta">Jahrb Musikpsychol</journal-id>
<journal-title-group>
<journal-title>Jahrbuch Musikpsychologie</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Jahrb. Musikpsychol.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2569-5665</issn>
<publisher><publisher-name>PsychOpen</publisher-name></publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">jbdgm.195</article-id>
<article-id pub-id-type="doi">10.5964/jbdgm.195</article-id>
<article-categories>
<subj-group subj-group-type="heading"><subject>Research Reports</subject></subj-group>

<subj-group subj-group-type="badge"><subject>Data</subject><subject>Code</subject><subject>Materials</subject></subj-group>
</article-categories>
<title-group>
<article-title>The Creative Performance of the AI Agents ChatGPT and Google Magenta Compared to Human-Based Solutions in a Standardized Melody Continuation Task</article-title>
<trans-title-group xml:lang="de">
<trans-title>Die Leistungen der Künstlichen Intelligenzen ChatGPT und Google Magenta im Vergleich mit Musikstudierenden bei einer standardisierten Melodie-Fortsetzungsaufgabe</trans-title>
</trans-title-group>
<alt-title alt-title-type="right-running">Creative Performance of AI Agents</alt-title>
<alt-title specific-use="APA-reference-style" xml:lang="en">The creative performance of the AI agents ChatGPT and Google Magenta compared to human-based solutions in a standardized melody continuation task</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0009-0003-4618-1783</contrib-id><name name-style="western"><surname>Schreiber</surname><given-names>Anton</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib>
<contrib contrib-type="author"><contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0003-3770-7483</contrib-id><name name-style="western"><surname>Sander</surname><given-names>Kilian</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib>
<contrib contrib-type="author" corresp="yes"><contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0003-3356-3478</contrib-id><name name-style="western"><surname>Kopiez</surname><given-names>Reinhard</given-names></name><xref ref-type="corresp" rid="cor1">*</xref><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib>
<contrib contrib-type="author"><contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0009-0000-4880-0187</contrib-id><name name-style="western"><surname>Thöne</surname><given-names>Raphael</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib>
<contrib contrib-type="editor">
<name>
	<surname>Lothwesen</surname>
	<given-names>Kai</given-names>
</name>
<xref ref-type="aff" rid="aff2"/>
</contrib>
	
	<contrib contrib-type="reviewer"><name name-style="western"><surname>Zaddach</surname><given-names>Wolf-Georg</given-names></name></contrib>
	
	<contrib contrib-type="reviewer"><name name-style="western"><surname>Demmer</surname><given-names>Theresa</given-names></name></contrib>
	
<aff id="aff1"><label>1</label><institution>Hanover University of Music, Drama and Media</institution>, <addr-line><city>Hannover</city></addr-line>, <country country="DE">Germany</country></aff>
<aff id="aff2"><label>2</label><institution>Staatliche Hochschule für Musik Trossingen</institution>, <addr-line><city>Trossingen</city></addr-line>, <country country="DE">Germany</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>*</label>Hanover Music Lab, Hanover University of Music, Drama and Media, Neues Haus 1, 30175 Hannover, Germany. <email xlink:href="mailto:Reinhard.kopiez@hmtm-hannover.de">Reinhard.kopiez@hmtm-hannover.de</email></corresp>
</author-notes>
<pub-date pub-type="epub"><day>05</day><month>09</month><year>2024</year></pub-date>
	<pub-date pub-type="collection" publication-format="electronic"><year>2024</year></pub-date>
<volume>32</volume><elocation-id>e195</elocation-id>
<history>
<date date-type="received">
<day>16</day>
<month>04</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>21</day>
<month>08</month>
<year>2024</year>
</date>
</history>
<permissions><copyright-year>2024</copyright-year><copyright-holder>Schreiber, Sander, Kopiez, &amp; Thöne</copyright-holder><license license-type="open-access" specific-use="CC BY 4.0" xlink:href="https://creativecommons.org/licenses/by/4.0/"><ali:license_ref>https://creativecommons.org/licenses/by/4.0/</ali:license_ref><license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution (CC BY) 4.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p></license></permissions>
<abstract>
<p>Many generative artificial intelligence (AI) systems have been developed over the last decade. Some systems are of a more generic character, while others are specialized in music composition. However, whether these AI systems are serious competitors for human composers remains unclear. Despite increased public interest, there is currently little empirical evidence on whether creative AI can perform at a level comparable to that of human experts in a controlled task. Thus, we conducted an online experiment to compare the subjectively perceived quality of AI compositions with that of human-made products (by music students) in a standardized task. Based on a melody continuation paradigm, creative products were generated by the AI agents <italic>ChatGPT</italic> (Version 3.5) and <italic>Google Magenta Studio</italic> (Version 2.0). The human creative products consisted of 57 melodic continuations composed by music students. In the online evaluation study, listeners (<italic>N</italic> = 71, mainly musicians) rated the aesthetic quality of the outcomes of the various systems. Additionally, the raters’ musical experience level was controlled, as was the length of the given melody completion task (two probe positions). As a main result, the overall quality of the AI compositions was rated significantly lower than that of the human-made products on all four target items (large effect sizes). Musical experience and the length of the melody did not influence the ratings. We conclude that the current capabilities of AI in the domain of musical creativity, as determined by a standardized composition task, are far below human capabilities. However, we assume that rapid progress will be made in the domain of generative music-specific AI systems.</p>
</abstract><trans-abstract xml:lang="de">
<p>Aktuell wird eine zunehmende Anzahl an generativen Systemen Künstlicher Intelligenz (KI) entwickelt. Einige Systeme sind eher von generischer Natur, andere wurden speziell für die Komposition von Musik entwickelt. Wie in anderen kreativen Bereichen ist noch unklar, welche Auswirkungen diese KIs auf Musikschaffende haben werden. Trotz des kontroversen Themas existiert bisher wenig Evidenz für die subjektiv bewertete Qualität von KI-Kompositionen im Vergleich zu menschlichen Kompositionen. Daher untersuchten wir in einem Online-Rating-Experiment die subjektiv bewertete Qualität von KI-Kompositionen im Vergleich zu Kompositionen von Musikstudierenden in einer standardisierten Aufgabe. Basierend auf einem Melodiefortsetzungsparadigma wurden Kompositionen mit den KIs <italic>ChatGPT</italic> (Version 3.5) und <italic>Google Magenta Studio</italic> (Version 2.0) erstellt. Musikstudierende generierten insgesamt 57 Fortsetzungsvarianten der gleichen Fortsetzungsaufgabe. In einem Online-Rating-Experiment bewerteten Teilnehmende (<italic>N</italic> = 71) die ästhetischen Qualitäten der Melodien. Zusätzlich wurden die musikalische Erfahrung der Teilnehmenden sowie die Länge der vervollständigten Anfangsmelodie (zwei Probe-Positionen) kontrolliert. Als Hauptergebnis wurden die Kompositionen der KIs für alle vier Bewertungs-Items schlechter als die menschlichen Lösungen bewertet (große Effekte). Musikalische Erfahrung sowie die Länge der Anfangsmelodie hatten keinen Einfluss auf die Bewertung. Wir schlussfolgern, dass die kompositorischen Fähigkeiten musikalischer KIs noch deutlich hinter menschlichen Fähigkeiten liegen. Allerdings sind zukünftig rasante Entwicklungen im Bereich der generativen musikalischen KI-Systeme zu erwarten.</p></trans-abstract>
<kwd-group kwd-group-type="author"><kwd>artificial intelligence</kwd><kwd>AI</kwd><kwd>composition</kwd><kwd>generative AI</kwd><kwd>empirical aesthetics</kwd><kwd>creativity</kwd></kwd-group>
<kwd-group kwd-group-type="translator" xml:lang="de"><kwd>Künstliche Intelligenz</kwd><kwd>KI</kwd><kwd>Komposition</kwd><kwd>generative KI</kwd><kwd>empirische Ästhetik</kwd><kwd>Kreativität</kwd></kwd-group>
</article-meta>
</front>
<body>
	<sec sec-type="intro"><title/>
<p>The idea of a creative machine that can generate music at the touch of a button has fascinated composers since the invention of musical dice games in the 18th century (for an overview see <xref ref-type="bibr" rid="r29">Steinbeck, 2016</xref>). The idea of an automated creative process has continued into the 21st century, accelerated by the availability of powerful computers since the late 1960s (<xref ref-type="bibr" rid="r30">Strawn &amp; Shockley, 2014</xref>). This development accelerated once again in the 1990s, when personal computers with sufficient computing power became accessible to individuals. This can be considered the starting point of the first systematic exploration of the capabilities of Artificial Intelligence (AI). Based on the computer language LISP (one of the earliest programming languages used in AI research), the American composer David Cope pursued the idea that musical compositions contain sets of instructions (style features) that, once identified, can be used to create closely related replications of themselves (<xref ref-type="bibr" rid="r6">Cope, 2000</xref>, p. 20). As a consequence, the understanding of creativity as a process that perpetually produces previously unheard music is replaced by a “recombinancy” paradigm. The elegance of the recombinations contributes to the quality of these musical style copies. Using style copies of composers such as Bach, Mozart, Beethoven, and Chopin, Cope provides compelling evidence for the recombinancy paradigm (<xref ref-type="bibr" rid="r4">Cope, 1991</xref>, <xref ref-type="bibr" rid="r5">1996</xref>, <xref ref-type="bibr" rid="r6">2000</xref>, <xref ref-type="bibr" rid="r7">2001</xref>). The development of the software for algorithmic composition was accompanied by Cope’s perceptual experiments. 
In his <italic>Experiments in Musical Intelligence</italic> (<xref ref-type="bibr" rid="r5">Cope, 1996</xref>), he reports the results of listening tests in which large numbers of participants had to discriminate between original works by Mozart and style copies. For example, based on data from about 2,000 participants (delegates at a conference who listened to the musical demonstrations and were blind to the composition process of the respective musical example), musical amateurs outperformed expert musicologists (<xref ref-type="bibr" rid="r7">Cope, 2001</xref>, p. 21). The hit rates were usually around chance level (between 40 and 60 percent). In another series of listening tests, <xref ref-type="bibr" rid="r7">Cope (2001)</xref> proposes a discrimination test called “The Game”: four completely machine-composed examples of music in the styles of Bach, Chopin, Smetana, and Cui are compared to similar style copies (recorded as human performances and added to the book on CD). Listeners scoring at least eight out of 12 correct responses (66%) are labelled as “High Scorers”.</p>
<p>Now that AI agents have become available to a broader public (we will use the terms “agent” and “agency” in line with Latour’s definition of agency as a property of human and non-human actors with the ability to alter a “state of affairs”; see <xref ref-type="bibr" rid="r14">Gioti, 2021</xref>, p. 55), these systems can be used to generate musical output, which in turn can influence musical thinking. Although this might occur in the context of co-creativity between humans and machines, our study focuses on automated composition. For example, the text-based AI agent <italic>ChatGPT</italic> (<xref ref-type="bibr" rid="r23">OpenAI, 2023</xref>), first released to the public in November 2022, is capable of accepting prompts for musical tasks in symbolic language (e.g., in MIDI code) and can therefore generate symbolic music as output (e.g., in MIDI format or other representations that can be converted into sounding music). Consequently, AI could pose a threat to creative musicians by offering a cheaper way to create music, an assumption that is supported by a recent survey of composers (<xref ref-type="bibr" rid="r13">GEMA, 2024</xref>). Although the use of AI agents for music production is a hot topic in the current technological development of AI agents, only a few studies have conducted a blind comparison of the subjective evaluation of AI-generated musical output with that of human-generated music (e.g., <xref ref-type="bibr" rid="r12">Frieler &amp; Zaddach, 2022</xref>; <xref ref-type="bibr" rid="r31">Tigre Moura &amp; Maw, 2021</xref>; for an overview see <xref ref-type="bibr" rid="r22">Oksanen et al., 2023</xref> and <xref ref-type="bibr" rid="r38">Yin et al., 2023</xref>).</p>
<p>For example, in their overview of subjective and objective evaluation strategies for AI-generated music, <xref ref-type="bibr" rid="r37">Xiong et al. (2023)</xref> have shown that basic empirical subjective evaluations of AI- and human-generated music are rather rare. In their review of perceptual studies on the aesthetic quality of computer-generated music, <xref ref-type="bibr" rid="r22">Oksanen et al. (2023)</xref> found only ten empirical investigations between 2003 and 2021. In an early study on the influence of different narratives regarding the composition source (AI vs. human), <xref ref-type="bibr" rid="r31">Tigre Moura and Maw (2021, Study 2)</xref> used the AI-based song “Daddy’s Car” (<xref ref-type="bibr" rid="r34">Vincent, 2016</xref>; for more details on the Flow Machines project see <xref ref-type="bibr" rid="r24">Pachet et al., 2021</xref>) and a piece of AI-based symphonic film music titled “Genesis Symphonic Fantasy” created by the AIVA software (<ext-link ext-link-type="uri" xlink:href="https://www.youtube.com/watch?v=Ebnd03x137A">https://www.youtube.com/watch?v=Ebnd03x137A</ext-link>). Surprisingly, listeners (blind to the composition process of the music) did not evaluate the AI condition negatively. However, a fundamental problem in the handling of “AI music” can be observed in this and other studies, which could be a reason for absent or inconsistent evaluations: the stimulus used in the study by <xref ref-type="bibr" rid="r31">Tigre Moura and Maw (2021)</xref> was produced with the AI agent AIVA and represents a highly polished demonstration audio track released by the company. It does not represent the typical audio output of the AI agent but is the result of massive post-editing by a human arranger and is based on high-quality orchestra samples. 
This kind of promotional audio clip should be regarded as the outcome of co-creativity between AI and humans, not as representative of the output quality of automated AI composition agents (<xref ref-type="bibr" rid="r14">Gioti, 2021</xref>; <xref ref-type="bibr" rid="r15">Gioti et al., 2022</xref>). When co-creativity is not known to the listener, framing effects can also influence the evaluation of the aesthetic qualities of (largely unknown) music by human classical composers. For example, <xref ref-type="bibr" rid="r28">Shank et al. (2023)</xref> labelled short musical excerpts as (a) composed by AI, (b) composed by a human composer, or (c) without any composer identity. Results showed a significant effect of composer identity on the ratings of musical quality. Excerpts labelled as AI-generated were rated significantly lower than those labelled as human-generated or those that provided no information on the composer’s identity. <xref ref-type="bibr" rid="r28">Shank et al. (2023)</xref> label this effect the “AI composer bias”. Bias effects in the evaluation of AI- vs. human-made artistic products were also confirmed in a study by <xref ref-type="bibr" rid="r18">Millet et al. (2023)</xref>: the authors investigated the influence of anthropocentric beliefs on the aesthetic evaluation of artworks such as paintings or music. Two pieces of AI-generated music (created by the same AI agent AIVA in the style of symphonic film music) were rated as less creative (medium effect size) and less awe-inducing (small to medium effect size) when labelled as AI-made than when labelled as human-made. This bias against AI-made art is known as the “anthropocentric bias” in the evaluation of artistic creativity (<xref ref-type="bibr" rid="r18">Millet et al., 2023</xref>). From the perspective of music production in popular music, <xref ref-type="bibr" rid="r8">Deruty et al. (2022)</xref> studied the use of AI tools in the recording studio. 
The authors conclude that future production routines will likely rely on AI tools as co-creative support in sound mixing, arrangement, and the production of rhythm tracks (for an overview of current tendencies see also <xref ref-type="bibr" rid="r19">Moffat, 2021</xref>).</p>
		<p>Most similar to our study, <xref ref-type="bibr" rid="r12">Frieler and Zaddach (2022)</xref> compared ratings of jazz solos produced by a generative model with ratings of human improvisations. They found that solos by jazz masters were generally rated better than algorithmically composed solos. However, in the classification task, even jazz experts achieved only 64.6% correct identifications. Therefore, in our explorative study, we conducted an online rating experiment to gain more insight into the qualitative differences between human- and AI-generated compositions (beyond a mere discrimination paradigm) and thus tested the aesthetic ratings of compositions produced by ChatGPT (Version 3.5; <xref ref-type="bibr" rid="r23">OpenAI, 2023</xref>) and Google Magenta Studio (Version 2.0; <xref ref-type="bibr" rid="r16">Google AI, 2023</xref>) compared to compositions by music students.</p>
<sec sec-type="other1"><title>Research Question and Study Aim</title>
<p>The main research question is as follows: In terms of aesthetic quality, how do evaluations of compositions generated by AI systems in a standardized melody continuation paradigm compare to evaluations of compositions by music students? The aim of this study is to develop an empirical basis for future research into the aesthetic evaluation of creative products generated by musical AI agents. However, we cannot answer the question of how AI performs in musical systems outside of Western culture. Thus, in this study, we focus on melodies in the style of Western music, evaluated by listeners familiar with Western musical grammar. Due to the potential future impact of AI on the music industry and musicians, we argue that empirical research with an objective approach is needed in this field in order to assess the power and potential dangers of musical AI.</p></sec>
<sec sec-type="other2"><title>Hypotheses and Study Aims</title>
	<p>In this explorative study, we did not test any specific hypotheses. No hypotheses could be derived due to the lack of specific previous studies and of a theoretical framework; it was therefore also impossible to conduct an a priori power analysis. However, in a post-hoc power analysis (see Results section), we tested whether a sufficient number of participants had taken part in the experiment to detect the observed effect. Our goal was to learn more about the effects in order to provide a basis for future research.</p></sec></sec>
<sec sec-type="methods"><title>Method</title>
<sec><title>Design</title>
<p>For the subjective quality ratings of AI- and human-based compositions, we measured four dependent variables with one item each. These items were in part derived from previous research on related topics. The first item <italic>musically convincing</italic> was derived from <xref ref-type="bibr" rid="r12">Frieler and Zaddach (2022)</xref>, and the second item <italic>musically logical and meaningful</italic> was obtained from <xref ref-type="bibr" rid="r35">Webster’s (1994)</xref> measurements of creative thinking in music. Loosely related to <xref ref-type="bibr" rid="r3">Charyton et al.’s (2008)</xref> scale of originality, the third item asked whether the melody was <italic>interesting</italic>, and the fourth item asked how much the participants <italic>liked</italic> the melody (<xref ref-type="bibr" rid="r12">Frieler &amp; Zaddach, 2022</xref>).</p>
	<p>As an independent variable, we varied the composer in three conditions (human-based vs. AI agent ChatGPT vs. AI agent Magenta). Because the focus of this research was to evaluate differences between AI and human-based compositions, the statistical analysis focuses on the dichotomous differentiation with only two conditions for this independent variable (human-based vs. AI). Additionally, we controlled for the length of the prescribed melody that had to be continued. Following <xref ref-type="bibr" rid="r27">Schmuckler (1989)</xref>, this independent variable was called <italic>probe position</italic> and had two conditions (long vs. short). Finally, we controlled for the prior musical experience of the participants based on <xref ref-type="bibr" rid="r39">Zhang and Schubert’s (2019)</xref> single-item scale of musical sophistication. This resulted in a 2 × 2 × 3 design with repeated measures on the composer and probe position factors and the control variable of musical sophistication as a between-subjects factor.</p></sec>
<sec><title>Musical Stimuli</title>
	<p>In line with previous research on musical expectation (e.g., <xref ref-type="bibr" rid="r27">Schmuckler, 1989</xref>; <xref ref-type="bibr" rid="r32">Unyk &amp; Carlsen, 1987</xref>), we decided to use a melody continuation paradigm. Thus, the prescribed material for the compositions was a melody in the style of film music, composed by one of the co-authors (RT), a professional composer, and arranged for strings (see <xref ref-type="fig" rid="fA.1">Figure A1</xref> in the Appendix; the full piece can be heard at <ext-link ext-link-type="uri" xlink:href="https://www.youtube.com/watch?v=eYKdZBeY2eE">https://www.youtube.com/watch?v=eYKdZBeY2eE</ext-link>). Due to the different harmonic implications depending on where the melody was truncated, we decided to use two different lengths of the melodic material for the continuation task (see <xref ref-type="fig" rid="fA.2">Figures A2</xref> and <xref ref-type="fig" rid="fA.3">A3</xref> in the Appendix). The first condition used four bars of the original melody ending on an E♭4, thereby implying a harmonic context of C minor or E♭ major (probe position 1; PP1; <xref ref-type="fig" rid="fA.2">Figure A2</xref>). The second condition had a length of seven bars and ended on a D4, implying a harmonic context of D or G minor/major or even B♭ major (probe position 2; PP2; <xref ref-type="fig" rid="fA.3">Figure A3</xref>). The musical stimuli were generated as follows: (a) Music students from Hanover University of Music, Drama and Media continued one of the two melody openings following a standardized instruction (see <xref ref-type="app" rid="app2">Appendix 2</xref>). This resulted in a total of <italic>N</italic> = 57 melodies (PP1 and PP2). (b) As part of a seminar, the students also composed 42 melodies in the PP2 condition by means of ChatGPT. Six of the 42 melodies in the GPT-PP2 condition were not included in the rating experiment because they were too long. 
For this purpose, we used Python syntax as a prompt (see <xref ref-type="app" rid="app3">Appendix 3</xref>) to transfer the prescribed original melody sections to ChatGPT. The results were exported in MIDI format using the Python module SCAMP (<xref ref-type="bibr" rid="r9">Evanstein, 2023</xref>). (c) To assess the quality of another AI agent, we produced an additional 50 melodies (25 for each condition) with the AI agent Google Magenta (<xref ref-type="bibr" rid="r16">Google AI, 2023</xref>; option <italic>continue</italic>; temperature [i.e., the randomness of the generated notes and pitches] = 1; length = 4 or 6 bars to add; number of variations [i.e., how many outputs the program will produce] = 4). (d) Additionally, we produced 25 melodies for the PP1 condition with ChatGPT (Version 3.5). The results from both AI agents were combined in the analysis of the participants’ ratings. The resulting MIDI files were exported as mp3 audio files (oboe timbre) using the MuseScore software (Version 4.0; <xref ref-type="bibr" rid="r21">MuseScore Team, 2023</xref>) and normalized for loudness to −10 LUFS with Audacity (<xref ref-type="bibr" rid="r1">Audacity Team, 2023</xref>).</p>
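To illustrate the general approach of passing symbolic melodic material to a text-based AI agent and reading its answer back as note events, the following minimal Python sketch can be considered. It is not the prompt used in the study (that is given in Appendix 3); the note values, function names, and the reply format are hypothetical placeholders.

```python
import ast

def build_prompt(melody):
    """Encode (midi_pitch, quarter_length) tuples as a Python literal and
    ask the model to continue the melody in the same symbolic format."""
    notes = ", ".join(f"({p}, {d})" for p, d in melody)
    return (
        "Here is a melody as (midi_pitch, quarter_length) tuples: "
        f"[{notes}]. Continue it for four bars and reply only with a "
        "Python list of tuples in the same format."
    )

def parse_reply(reply):
    """Parse the model's textual reply back into note events
    (a robust version would validate the reply first)."""
    return [(int(p), float(d)) for p, d in ast.literal_eval(reply)]

# Placeholder opening, not the melody used in the study:
opening = [(60, 1.0), (63, 0.5), (62, 0.5), (67, 2.0)]
prompt = build_prompt(opening)
continuation = parse_reply("[(65, 1.0), (63, 1.0)]")
```

The parsed note events could then be rendered to a MIDI file, for example with a module such as SCAMP as done in the study.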
<p>In total, <italic>N</italic> = 168 melodic continuations were tested:</p>
<list id="L1" list-type="bullet">
<list-item>
<p><italic>n</italic> = 29 human-made continuations based on PP1</p></list-item>
<list-item>
<p><italic>n</italic> = 28 human-made continuations based on PP2</p></list-item>
<list-item>
<p><italic>n</italic> = 25 AI-based continuations (ChatGPT) based on PP1</p></list-item>
<list-item>
<p><italic>n</italic> = 36 AI-based continuations (ChatGPT) based on PP2</p></list-item>
<list-item>
<p><italic>n</italic> = 25 AI-based continuations (Magenta) based on PP1</p></list-item>
<list-item>
<p><italic>n</italic> = 25 AI-based continuations (Magenta) based on PP2</p></list-item>
</list>
	<p>All materials are available in the Supplementary Materials section (see <xref ref-type="bibr" rid="sp1_r1">Schreiber et al., 2024</xref>).</p></sec>
<sec><title>Procedure</title>
	<p>This study was conducted as an online experiment: <italic>N</italic> = 71 participants completed a questionnaire on the SoSci Survey platform (<ext-link ext-link-type="uri" xlink:href="https://www.soscisurvey.de">https://www.soscisurvey.de</ext-link>). In a randomized blind trial, each participant rated 20 melodies (five for each combination of source [AI agents vs. human-made] and melody length [PP1 vs. PP2]). In the instructions, participants were informed that some of the melodies had been composed by AI (for the original wording see Supplementary Materials, <xref ref-type="bibr" rid="sp1_r1">Schreiber et al., 2024</xref>). Consequently, no participant rated all 168 melodies, which resulted in an incomplete study design. The melodies were rated on a 5-point rating scale (1 = <italic>not at all</italic> [<italic>gar nicht</italic>] to 5 = <italic>very much</italic> [<italic>sehr</italic>]), and no anchor melody or prime was given. In addition, we asked for age, gender, and musical experience. The experiment took about 20 minutes to complete. Participants gave informed consent before starting the experiment. No reimbursement was paid.</p></sec>
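The randomized, incomplete assignment described above (each participant rates five melodies from each source-by-length cell, 20 in total) can be sketched as follows. The cell sizes are taken from the stimulus list above; the stimulus IDs and function name are hypothetical, and the actual survey platform handled the randomization in the study.

```python
import random

def draw_trial_list(pools, per_cell=5, seed=None):
    """pools maps cell labels to lists of stimulus IDs; returns a shuffled
    selection of per_cell IDs sampled without replacement from each cell."""
    rng = random.Random(seed)
    selection = []
    for cell, stimuli in pools.items():
        selection.extend(rng.sample(stimuli, per_cell))
    rng.shuffle(selection)  # randomize presentation order across cells
    return selection

# Cell sizes as in the stimulus list (AI cells pool ChatGPT and Magenta):
pools = {
    "human_PP1": [f"h1_{i}" for i in range(29)],
    "human_PP2": [f"h2_{i}" for i in range(28)],
    "ai_PP1": [f"a1_{i}" for i in range(50)],
    "ai_PP2": [f"a2_{i}" for i in range(61)],
}
trials = draw_trial_list(pools, seed=1)  # 20 trials for one participant
```

Because each participant sees only 20 of the 168 stimuli, aggregating ratings across participants yields the incomplete (planned-missingness) design analyzed in the Results section.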
<sec><title>Sample</title>
<p>Of the 71 participants, one person was excluded due to missing values and improbable responses, leaving 70 participants for the analysis. Participants were recruited from university seminars and from the personal networks of the authors. All had a Western cultural background. The sample covered the broad range of musical sophistication necessary to detect rating differences influenced by various levels of musical education. Of the convenience sample, 31 (43.7%) were female, and most of the participants studied music. The participants were aged between 20 and 79 years (<italic>Mdn</italic> = 27, <italic>IQR</italic> = 29). Their amount of musical training was as follows: 32 (45.1%) had more than 10 years of instrumental or vocal training, 21 (29.6%) had six to 10 years, and 17 (24.3%) had less than six years. This means that our sample for the rating task can be characterized as having a high degree of musical experience. In terms of <xref ref-type="bibr" rid="r39">Zhang and Schubert’s (2019)</xref> categorization, about 75% of the participants can be considered raters with a musical identity (6–10 years of musical training) or a strong musical identity (&gt; 10 years of musical training).</p></sec></sec>
<sec sec-type="results"><title>Results</title>
<p>As shown in <xref ref-type="fig" rid="f1">Figure 1</xref>, the compositions by music students were rated higher overall on all four dependent variables. Among the AI agents, melodies from ChatGPT were rated slightly higher than those from Google Magenta. To test for the main effect of the independent variable “composer”, a repeated measures MANOVA was calculated in R (<xref ref-type="bibr" rid="r25">R Core Team, 2023</xref>) using the <italic>stats</italic> package. Since the comparison between the AI agents and humans was of special interest, the ratings of ChatGPT and Google Magenta Studio were aggregated and regarded as representative of the “AI-generated” category.</p>
	
	<fig id="f1" position="anchor" fig-type="figure" orientation="portrait"><label>Figure 1</label><caption>
		<title>Error Bar Diagram for the Mean Ratings of the Melodies for Each of the Four Items Grouped by Musical Experience and Sources of Melodic Creation</title><p><italic>Note.</italic> The rating scale ranged from 1 (<italic>not at all</italic>) to 5 (<italic>very much</italic>). Error bars represent 95% confidence intervals.</p></caption><graphic xlink:href="jbdgm.195-f1" position="anchor" orientation="portrait"/></fig>
	
<p>Statistical analyses revealed a significant effect of the factor composer (source of melodic creation) on the ratings of the four dependent variables, <italic>F</italic>(4, 64) = 96.114, <italic>p</italic> &lt; .001, Pillai’s trace = 0.857, η<sup>2</sup> = 0.576. The length of the completed melody had no influence on the ratings, <italic>F</italic>(4, 64) = 1.265, <italic>p</italic> = .293, Pillai’s trace = 0.073. The same was true for the raters’ degree of musical experience, <italic>F</italic>(8, 130) = 0.554, <italic>p</italic> = .813, Pillai’s trace = 0.066, η<sup>2</sup> = 0.019. There were also no interactions between the factors (see <xref ref-type="table" rid="t1">Table 1</xref>). In terms of effect sizes, the observed rating differences between AI agents and human-made compositions were larger than 1.5 standard deviations for all dependent variables (see <xref ref-type="table" rid="t2">Table 2</xref>).</p>
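The effect size Cohen's <italic>d<sub>Z</sub></italic> reported in Table 2 is the standard paired-samples effect size: the mean of the within-participant rating differences divided by the standard deviation of those differences. A minimal Python illustration with made-up ratings (not the study's data):

```python
from statistics import mean, stdev

def cohens_dz(ratings_human, ratings_ai):
    """Paired effect size: mean of per-participant differences divided by
    the sample standard deviation of those differences."""
    diffs = [h - a for h, a in zip(ratings_human, ratings_ai)]
    return mean(diffs) / stdev(diffs)

# Hypothetical per-participant mean ratings on a 1-5 scale:
human = [4.2, 3.8, 4.5, 4.0, 3.6]
ai = [2.1, 2.5, 2.0, 2.8, 2.2]
dz = cohens_dz(human, ai)  # about 2.98 for these made-up values
```

Because the differences are standardized within participants, <italic>d<sub>Z</sub></italic> values above 1.5, as observed here, indicate that nearly every rater preferred the human-made continuations.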
<table-wrap id="t1" position="anchor" orientation="portrait">
<label>Table 1</label><caption><title>MANOVA for all Factors With Simple Differentiation Between Human and AI Compositions</title></caption>
<table frame="hsides" rules="groups">
<col width="15%" align="left"/>
	<col width="40%" align="left"/>
<col width="9%"/>
<col width="9%"/>
<col width="9%"/>
<col width="9%"/>
<col width="9%"/>
<thead>
<tr>
<th>Effect</th>
<th>Factor</th>
<th><italic>df</italic></th>
<th><italic>V</italic></th>
<th><italic>F</italic><sub>approx</sub></th>
<th><italic>df</italic><sub>F</sub></th>
<th><italic>p</italic></th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="3">Main effects</td>
	<td align="left">Musical experience</td>
<td>2, 67</td>
<td align="char" char=".">0.066</td>
<td align="char" char=".">0.554</td>
<td>8, 130</td>
<td align="char" char=".">.813</td>
</tr>
<tr>
	<td align="left">Composer</td>
<td>1, 67</td>
<td align="char" char=".">0.857</td>
<td align="char" char=".">96.114</td>
<td>4, 64</td>
<td align="char" char=".">&lt; .001</td>
</tr>
<tr>
	<td align="left">Probe position</td>
<td>1, 67</td>
<td align="char" char=".">0.073</td>
<td align="char" char=".">1.265</td>
<td>4, 64</td>
<td align="char" char=".">.293</td>
</tr>
<tr>
<td rowspan="4">Interactions</td>
	<td align="left">Composer: musical experience</td>
<td>2, 67</td>
<td align="char" char=".">0.110</td>
<td align="char" char=".">0.946</td>
<td>8, 130</td>
<td align="char" char=".">.481</td>
</tr>
<tr>
	<td align="left">Probe position: musical experience</td>
<td>2, 67</td>
<td align="char" char=".">0.070</td>
<td align="char" char=".">0.591</td>
<td>8, 130</td>
<td align="char" char=".">.784</td>
</tr>
<tr>
	<td align="left">Composer: probe position</td>
<td>1, 67</td>
<td align="char" char=".">0.119</td>
<td align="char" char=".">2.152</td>
<td>4, 64</td>
<td align="char" char=".">.084</td>
</tr>
<tr>
	<td align="left">Composer: probe position: musical experience</td>
<td>2, 67</td>
<td align="char" char=".">0.061</td>
<td align="char" char=".">0.509</td>
<td>8, 130</td>
<td align="char" char=".">.848</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Note. V</italic> = Pillai’s trace. Colons in the Factor column represent interactions.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap id="t2" position="anchor" orientation="portrait">
<label>Table 2</label><caption><title>Effect Sizes (Cohen’s d<sub>Z</sub>) for the Four Dependent Variables for the Comparison of AI- and Human-Made Compositions</title></caption>
<table frame="hsides" rules="groups">
<col width="40%" align="left"/>
<col width="20%"/>
<col width="40%"/>
<thead>
<tr>
<th rowspan="2" valign="bottom">Dependent variable</th>
<th colspan="2" scope="colgroup">Effect size<hr/></th>
</tr>
<tr>
<th scope="colgroup" align="center">Cohen’s <italic>d<sub>Z</sub></italic></th>
<th>95% confidence interval</th>
</tr>
</thead>
<tbody>
<tr>
<td>Interesting</td>
<td align="char" char=".">−1.74</td>
<td>[−2.11, −1.37]</td>
</tr>
<tr>
<td>Logical and meaningful</td>
<td align="char" char=".">−1.93</td>
<td>[−2.32, −1.53]</td>
</tr>
<tr>
<td>Liking</td>
<td align="char" char=".">−2.23</td>
<td>[−2.66, −1.79]</td>
</tr>
<tr>
<td>Convincing</td>
<td align="char" char=".">−2.11</td>
<td>[−2.53, −1.69]</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Note.</italic> Negative <italic>d<sub>Z</sub></italic> values indicate lower ratings for AI compositions compared to human-made compositions.</p>
</table-wrap-foot>
</table-wrap>
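<p>The effect size reported above, Cohen’s <italic>d<sub>Z</sub></italic>, is the mean of the paired rating differences divided by their standard deviation. The following minimal Python sketch illustrates the computation; it is not the study’s analysis code (the analyses were run in R), and the confidence interval uses a simple large-sample normal approximation, which may differ from the method behind the intervals reported in Table 2.</p>

```python
import math

def cohens_dz(diffs, z=1.96):
    """Paired effect size d_Z with an approximate 95% confidence interval.

    diffs: per-rater rating differences (AI minus human), so negative
    values indicate lower ratings for the AI compositions.
    """
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in diffs) / (n - 1))
    dz = mean / sd
    # Large-sample standard error of a standardized paired difference
    se = math.sqrt(1 / n + dz ** 2 / (2 * n))
    return dz, (dz - z * se, dz + z * se)
```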
<p>Even though an a priori power analysis was not possible due to a lack of results from previous studies, a post hoc power analysis was calculated with G*Power (V3.1.9.6; <xref ref-type="bibr" rid="r10">Faul et al., 2007</xref>) to evaluate whether the sample was large enough to detect the observed effects. Based on <italic>N</italic> = 70 participants and a Pillai’s trace of <italic>V</italic> = .857 (the value for the main effect of composer), the calculated power was 1 − β = 1.0. Overall, the results show large effects in favor of the human compositions for all target variables. Neither the musical expertise of the participants nor the length of the melody to be completed affected the striking difference in evaluation between human and AI compositions.</p></sec>
<sec sec-type="discussion"><title>Discussion</title>
	<p>The results show that the subjectively perceived and empirically confirmed quality of the AI compositions falls far below that of the human-made compositions. This effect appears to be so large that even the raters’ degree of musical experience had no influence on the ratings. When listening to the stimuli, it became clear that, with just a few exceptions, the AI melodies sounded illogical and strange relative to Western conventions of melodic construction (see Supplementary Materials for sound examples, <xref ref-type="bibr" rid="sp1_r1">Schreiber et al., 2024</xref>). For example, although the harmonic context of the given melody was left intentionally ambiguous, most of the AI-generated continuations did not even finish in the same key, resulting in a feeling of unresolved tonal tension and a lack of coherence at the end. Some melodies also contained breaks at unexpected metrical positions. These properties of the AI-generated melodies could explain why no effect of musical experience was observed and why even musically naïve listeners gave lower ratings to the AI-generated versions. Following the perspective of <xref ref-type="bibr" rid="r2">Bigand and Poulin-Charronnat (2006)</xref> that all Western listeners can be considered “musically experienced listeners” with musical capacities acquired through mere exposure to music, we conclude that evaluation tasks like ours can be performed successfully without explicit training or expertise. Interestingly, the ChatGPT compositions were rated slightly better than those generated by Google Magenta on all dependent variables, although this direct comparison between AI agents was not the focus of this study. Although this difference was not significant and may reflect random variation in the data, it is surprising that the output of the music-unspecific, text-trained AI agent ChatGPT was rated better than that of the music-specific AI agent Google Magenta Studio. However, we should bear in mind that, although trained on musical material, Google Magenta Studio represents the previous generation of AI agents (first released in 2019), whereas ChatGPT belongs to the next generation (released at the end of 2022). We also conclude that our study provides initial empirical evidence against the often anecdotal scenarios of a potential threat posed by the musically creative capabilities of AI systems. Under the condition of a standardized creative task, at least, we found no support for this assumption.</p>
<p>Our skeptical assessment of the current performance level of AI agents in the domain of music is in line with the critique by <xref ref-type="bibr" rid="r26">Rohrmeier (2022)</xref>, who argues that computational creativity has to face four main challenges: (a) the cognitive challenge (creativity requires a cognitive model of music cognition, e.g., of tonality); (b) the challenge of the external world (creativity involves semantic, semiotic, and pragmatic references to the world, e.g., to an extra-musical program such as in Smetana’s <italic>Moldau</italic>); (c) the embodiment challenge (creativity requires a model of the human body, e.g., of the possibilities of playing techniques); and (d) the challenge of creativity at the meta-level (creativity draws on meta-creative strategies, such as embedding a fugue within the classical sonata form, e.g., in Liszt’s <italic>B minor sonata</italic>, or the use of musical quotations). Meeting these general demands of music and its creation would require Artificial General Intelligence (AGI), and as long as these prerequisites of musical creativity remain unmet, the four challenges will remain a fundamental problem for AI. However, according to the theoretical framework developed by <xref ref-type="bibr" rid="r20">Morris et al. (2024)</xref>, AI agents such as ChatGPT have currently reached Level 2 (out of 6 levels) of AGI. In other words, musical AI agents still have a long way to go before they even reach the expert level (Level 4). Nevertheless, we cannot exclude a significant increase in the creative musical capacities of the next generation of AI agents, particularly when trained with music-specific material of high quality and enriched with information about the external world. Finally, it remains to be seen whether extended training on additional musical material will increase the quality of AI output, for two reasons. First, <xref ref-type="bibr" rid="r24">Pachet et al. (2021)</xref> report that “the most interesting music generation was not obtained with the most sophisticated techniques” (p. 512); instead, the combination of various tools produced the more interesting musical output. Second, the training of current AI systems depends on the availability of high-quality data. The availability of such data seems to be limited, and the scaling of current LLM-based systems appears to be constrained by this scarcity of raw material (<xref ref-type="bibr" rid="r17">Jürgens, 2024</xref>; <xref ref-type="bibr" rid="r33">Villalobos et al., 2024</xref>). This could result in insufficient data for training: <xref ref-type="bibr" rid="r33">Villalobos et al. (2024)</xref> forecast that, if current trends in LLM development continue, there is a 50% probability that the effective stock of human-generated public data will be exhausted in 2024 and a 90% probability by 2026. Thus, the current trend in AI development is to attempt to compensate for the looming data scarcity by applying data augmentation techniques (<xref ref-type="bibr" rid="r36">Xie et al., 2020</xref>).</p>
<sec><title>Limitations</title>
<p>This study opted for a melody continuation task in order to produce stimuli that were comparable across the two AI agents and the music students. As a result, the choice of suitable AI systems was very restricted. It is possible that AI compositions could achieve higher ratings in a less restrictive task in which the agents were allowed to compose polyphonic music for different instruments. Finally, the rapid and constant development of AI will lead to fast improvements in AI compositions. For example, a new generation of music-specific AI agents such as Suno AI Chirp (<ext-link ext-link-type="uri" xlink:href="https://www.suno.ai">https://www.suno.ai</ext-link>), released in September 2023, integrates lyrics and human vocals as well as formal elements of song structure (e.g., verse and chorus) into the generation of popular music. Therefore, more studies like this one should be conducted to continually assess the quality of AI compositions. By doing so, music research can make a valuable contribution for musicians and creatives by empirically tracking the progress of musical AI.</p>
<p><bold>Statement of Ethics</bold></p>
<p>The present study was conducted in accordance with the ethical principles and standards of the German Society for Psychology (<xref ref-type="bibr" rid="r11">Föderation Deutscher Psychologenvereinigungen, 2022</xref>) and with the principles outlined in the Declaration of Helsinki. According to German law, no ethics approval was required. Written informed consent was obtained by asking participants to continue only if they were willing to participate and had read and understood the instructions and information provided. Participants were told that participation was voluntary and that they had the right to withdraw from the study at any time. The data were anonymized and treated confidentially.</p>

</sec></sec>
</body>
<back>
<app-group>
<app id="app"><title>Appendix</title>
<sec id="app1"><title>Appendix 1: Melodic Material for Stimulus Generation</title>
	
<fig id="fA.1" position="anchor" fig-type="figure" orientation="portrait"><label>Figure A1</label><caption>
	<title>Complete First Phrase of the Original Melody “Aus meiner Feder” [From My Pen]</title><p><italic>Note.</italic> Use of melodic material with the kind permission of Raphael Thöne. The full arrangement of this melody can be heard at <ext-link ext-link-type="uri" xlink:href="https://www.youtube.com/watch?v=eYKdZBeY2eE">https://www.youtube.com/watch?v=eYKdZBeY2eE</ext-link></p></caption><graphic xlink:href="jbdgm.195-fA.1" position="anchor" orientation="portrait"/></fig>
	
<fig id="fA.2" position="anchor" fig-type="figure" orientation="portrait"><label>Figure A2</label><caption>
	<title>Original Melody Used for Probe Position 1</title><p><italic>Note.</italic> Use of melodic material with kind permission of Raphael Thöne.</p></caption><graphic xlink:href="jbdgm.195-fA.2" position="anchor" orientation="portrait"/></fig>
	
<fig id="fA.3" position="anchor" fig-type="figure" orientation="portrait"><label>Figure A3</label><caption>
	<title>Original Melody Used for Probe Position 2</title><p><italic>Note.</italic> Use of melodic material with kind permission of Raphael Thöne.</p></caption><graphic xlink:href="jbdgm.195-fA.3" position="anchor" orientation="portrait"/></fig>
	
</sec>
	
<sec id="app2"><title>Appendix 2: Composition Instructions for Music Students</title>
<sec><title>Your Task</title>
<p>Please complete the melodic example below as you wish, but in line with the following rules. The continuation should have:</p>
<list id="L2" list-type="bullet">
<list-item>
<p>a length of 10–20 notes,</p></list-item>
<list-item>
<p>a range from D3 (d) to D5 (d″),</p></list-item>
<list-item>
<p>different note durations (not only quarter notes),</p></list-item>
<list-item>
<p>a clear melodic climax.</p></list-item>
</list>
<p>Each person should submit 5 melodic continuations.</p>
<p>You can use musical notation, play your solutions on an instrument of your choice, sing them and make a recording, or enter them directly into a notation program (e.g., MuseScore). Transposing instruments may be notated as fingered (i.e., at written pitch).</p></sec>
<sec><title>Melodic Beginning for Group A (PP1)</title>
	<p><fig id="fA.4" position="anchor" fig-type="figure" orientation="portrait"><graphic xlink:href="jbdgm.195-fA.4" position="anchor" orientation="portrait"/></fig></p></sec>
<sec><title>Melodic Beginning for Group B (PP2)</title>
	<p><fig id="fA.5" position="anchor" fig-type="figure" orientation="portrait"><graphic xlink:href="jbdgm.195-fA.5" position="anchor" orientation="portrait"/></fig></p>
</sec></sec>
<sec id="app3"><title>Appendix 3: Prompt for the AI Agent ChatGPT (V3.5)</title>
<sec><title>Prompt (Syntax With Composition Instructions) for the AI Agent ChatGPT (V3.5)</title>
<p>Continue the given melody in the form of a list of (pitch, duration) pairs in Python syntax, where the pitch uses the MIDI pitch standard, and the duration represents the number of quarter notes. Use a pitch of None to represent a rest. Ensure the following:</p>
<list id="L3" list-type="bullet">
<list-item>
<p>The continuation stays between MIDI pitch 52 and MIDI pitch 86</p></list-item>
<list-item>
<p>The continuation is between 10 and 20 notes in length</p></list-item>
<list-item>
<p>The melody should have a calm character and be in the style of film music</p></list-item>
<list-item>
<p>The continuation should use a variety of note lengths</p></list-item>
<list-item>
<p>The continuation should have a clear melodic peak</p></list-item>
</list>
<p>Option (a):</p>
<p>melody_pp1 = [(62, 1), (74, 3), (74, 1), (72, 1), (71, 1), (71, 1), (69, 1), (67, 1), (63, 2)]</p>
<p><?pagebreak-before?>Option (b):</p>
<p>melody_pp2 = [(62, 1), (74, 3), (74, 1), (72, 1), (71, 1), (71, 1), (69, 1), (67, 1), (63, 2), (62, 0.5), (60, 0.5), (62, 2.5), (60, 0.25), (59, 0.25), (60, 1), (63, 1.5), (62, 0.5), (62, 3)]</p>
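<p>With the exception of the stylistic requirements (calm character, film-music style), the constraints listed above can be checked mechanically on any (pitch, duration) list returned by the model. The following Python sketch is purely illustrative: the helper name and the single-peak criterion are our own assumptions, not part of the study’s materials.</p>

```python
def check_continuation(continuation):
    """Check a list of (MIDI pitch, duration) pairs against the prompt's rules."""
    pitches = [p for p, d in continuation if p is not None]  # None encodes a rest
    if not pitches:
        return False
    in_range = all(52 <= p <= 86 for p in pitches)           # MIDI pitch 52-86
    right_length = 10 <= len(pitches) <= 20                  # 10-20 notes
    varied = len({d for _, d in continuation}) > 1           # several note lengths
    single_peak = pitches.count(max(pitches)) == 1           # one clear melodic peak
    return in_range and right_length and varied and single_peak
```

<p>Applied to a returned list, the helper flags continuations that leave the pitch range, fall outside the required length, or lack rhythmic variety.</p>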
<p><italic>Note</italic>. Two melodic fragments of different lengths were used as inputs to be selected: (a) pp1 = probe position 1 (shorter melodic fragment, see <xref ref-type="fig" rid="fA.2">Figure A2</xref>); (b) pp2 = probe position 2 (longer melodic fragment; see <xref ref-type="fig" rid="fA.3">Figure A3</xref>).</p></sec></sec>
</app>
</app-group><ack><title>Acknowledgement</title>
<p>The authors thank Prof. Dr. Raphael Thöne for making his composition available as material for this study.</p></ack>
<fn-group><fn fn-type="conflict">
<p>RK is Editor-in-Chief and KS is Editorial Assistant of the <italic>Jahrbuch Musikpsychologie/Yearbook of Music Psychology</italic>. They were not involved in the editorial process of this manuscript.</p></fn></fn-group>
<ref-list><title>References</title>
<ref id="r1"><mixed-citation publication-type="web">Audacity Team. (2023). <italic>Audacity</italic> (Version 3.4.2) [Computer software]. <ext-link ext-link-type="uri" xlink:href="https://audacityteam.org">https://audacityteam.org</ext-link></mixed-citation></ref>
<ref id="r2"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Bigand</surname>, <given-names>E.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Poulin-Charronnat</surname>, <given-names>B.</given-names></string-name></person-group> (<year>2006</year>). <article-title>Are we “experienced listeners”? A review of the musical capacities that do not depend on formal musical training.</article-title> <source>Cognition</source>, <volume>100</volume>(<issue>1</issue>), <fpage>100</fpage>–<lpage>130</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2005.11.007</pub-id><pub-id pub-id-type="pmid">16412412</pub-id></mixed-citation></ref>
<ref id="r3"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Charyton</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Jagacinski</surname>, <given-names>R. J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Merrill</surname>, <given-names>J. A.</given-names></string-name></person-group> (<year>2008</year>). <article-title>CEDA: A research instrument for creative engineering design assessment.</article-title> <source>Psychology of Aesthetics, Creativity, and the Arts</source>, <volume>2</volume>(<issue>3</issue>), <fpage>147</fpage>–<lpage>154</lpage>. <pub-id pub-id-type="doi">10.1037/1931-3896.2.3.147</pub-id></mixed-citation></ref>
<ref id="r4"><mixed-citation publication-type="book">Cope, D. (1991). <italic>Computers and musical style</italic>. A-R Editions.</mixed-citation></ref>
<ref id="r5"><mixed-citation publication-type="book">Cope, D. (1996). <italic>Experiments in musical intelligence</italic>. A-R Editions.</mixed-citation></ref>
<ref id="r6"><mixed-citation publication-type="book">Cope, D. (2000). <italic>The algorithmic composer</italic>. A-R Editions.</mixed-citation></ref>
<ref id="r7"><mixed-citation publication-type="book">Cope, D. (2001). <italic>Virtual music: Computer synthesis of musical style</italic>. MIT Press.</mixed-citation></ref>
<ref id="r8"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Deruty</surname>, <given-names>E.</given-names></string-name>, <string-name name-style="western"><surname>Grachten</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Lattner</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Nistal</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Aouameur</surname>, <given-names>C.</given-names></string-name></person-group> (<year>2022</year>). <article-title>On the development and practice of AI technology for contemporary popular music production.</article-title> <source>Transactions of the International Society for Music Information Retrieval</source>, <volume>5</volume>(<issue>1</issue>), <fpage>35</fpage>–<lpage>49</lpage>. <pub-id pub-id-type="doi">10.5334/tismir.100</pub-id></mixed-citation></ref>
<ref id="r9"><mixed-citation publication-type="web">Evanstein, M. (2023). <italic>SCAMP (Suite for Computer-Assisted Music in Python)</italic> (Version 0.9.2) [Computer software]. <ext-link ext-link-type="uri" xlink:href="http://scamp.marcevanstein.com/">http://scamp.marcevanstein.com/</ext-link></mixed-citation></ref>
<ref id="r10"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Faul</surname>, <given-names>F.</given-names></string-name>, <string-name name-style="western"><surname>Erdfelder</surname>, <given-names>E.</given-names></string-name>, <string-name name-style="western"><surname>Lang</surname>, <given-names>A.-G.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Buchner</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2007</year>). <article-title>G*power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences.</article-title> <source>Behavior Research Methods</source>, <volume>39</volume>(<issue>2</issue>), <fpage>175</fpage>–<lpage>191</lpage>. <pub-id pub-id-type="doi">10.3758/BF03193146</pub-id><pub-id pub-id-type="pmid">17695343</pub-id></mixed-citation></ref>
<ref id="r11"><mixed-citation publication-type="web">Föderation Deutscher Psychologenvereinigungen. (2022). <italic>Berufsethische Richtlinien</italic> [Guidelines for professional ethics]. <ext-link ext-link-type="uri" xlink:href="https://www.dgps.de/fileadmin/user_upload/PDF/Berufsetische_Richtlinien/BER-Foederation-20230426-Web-1.pdf">https://www.dgps.de/fileadmin/user_upload/PDF/Berufsetische_Richtlinien/BER-Foederation-20230426-Web-1.pdf</ext-link></mixed-citation></ref>
<ref id="r12"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Frieler</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Zaddach</surname>, <given-names>W.-G.</given-names></string-name></person-group> (<year>2022</year>). <article-title>Evaluating an analysis-by-synthesis model for jazz improvisation.</article-title> <source>Transactions of the International Society for Music Information Retrieval</source>, <volume>5</volume>(<issue>1</issue>), <fpage>20</fpage>–<lpage>34</lpage>. <pub-id pub-id-type="doi">10.5334/tismir.87</pub-id></mixed-citation></ref>
<ref id="r13"><mixed-citation publication-type="web">GEMA. (2024). <italic>AI and music: Generative artificial intelligence in the music sector</italic>. <ext-link ext-link-type="uri" xlink:href="https://www.gema.de/en/news/ai-study">https://www.gema.de/en/news/ai-study</ext-link></mixed-citation></ref>
<ref id="r14"><mixed-citation publication-type="book">Gioti, A.-M. (2021). Artificial intelligence for music composition. In E. R. Miranda (Ed.), <italic>Handbook of artificial intelligence for music: Foundations, advanced approaches, and developments for creativity</italic> (pp. 53–73). Springer International Publishing.</mixed-citation></ref>
<ref id="r15"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Gioti</surname>, <given-names>A.-M.</given-names></string-name>, <string-name name-style="western"><surname>Einbond</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Born</surname>, <given-names>G.</given-names></string-name></person-group> (<year>2022</year>). <article-title>Composing the assemblage: Probing aesthetic and technical dimensions of artistic creation with machine learning.</article-title> <source>Computer Music Journal</source>, <volume>46</volume>(<issue>4</issue>), <fpage>62</fpage>–<lpage>80</lpage>. <pub-id pub-id-type="doi">10.1162/comj_a_00658</pub-id></mixed-citation></ref>
<ref id="r16"><mixed-citation publication-type="web">Google AI. (2023). <italic>Magenta</italic> (Version 2.0) [Computer software]. Google AI. <ext-link ext-link-type="uri" xlink:href="https://magenta.tensorflow.org">https://magenta.tensorflow.org</ext-link></mixed-citation></ref>
<ref id="r17"><mixed-citation publication-type="web">Jürgens, J. (2024, May 8). Alles aufgesaugt [All suctioned]. <italic>DIE ZEIT</italic>. <ext-link ext-link-type="uri" xlink:href="https://www.zeit.de/2024/21/kuenstliche-intelligenz-trainingsdaten-suche-google-meta-openai">https://www.zeit.de/2024/21/kuenstliche-intelligenz-trainingsdaten-suche-google-meta-openai</ext-link></mixed-citation></ref>
<ref id="r18"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Millet</surname>, <given-names>K.</given-names></string-name>, <string-name name-style="western"><surname>Buehler</surname>, <given-names>F.</given-names></string-name>, <string-name name-style="western"><surname>Du</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Kokkoris</surname>, <given-names>M. D.</given-names></string-name></person-group> (<year>2023</year>). <article-title>Defending humankind: Anthropocentric bias in the appreciation of AI art.</article-title> <source>Computers in Human Behavior</source>, <volume>143</volume>, <elocation-id>107707</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.chb.2023.107707</pub-id></mixed-citation></ref>
<ref id="r19"><mixed-citation publication-type="book">Moffat, D. (2021). AI music mixing systems. In E. R. Miranda (Ed.), <italic>Handbook of artificial intelligence for music: Foundations, advanced approaches, and developments for creativity</italic> (pp. 345–375). Springer International Publishing. <pub-id pub-id-type="doi">10.1007/978-3-030-72116-9_13</pub-id></mixed-citation></ref>
<ref id="r20"><mixed-citation publication-type="web">Morris, M. R., Sohl-Dickstein, J., Fiedel, N., Warkentin, T., Dafoe, A., Faust, A., Farabet, C., &amp; Legg, S. (2024). <italic>Levels of AGI: Operationalizing progress on the path to AGI</italic>. arXiv. <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/pdf/2311.02462v2">http://arxiv.org/pdf/2311.02462v2</ext-link></mixed-citation></ref>
<ref id="r21"><mixed-citation publication-type="web">MuseScore Team. (2023). <italic>MuseScore</italic> (Version 4.0) [Computer software]. <ext-link ext-link-type="uri" xlink:href="https://musescore.com/about">https://musescore.com/about</ext-link></mixed-citation></ref>
<ref id="r22"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Oksanen</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Cvetkovic</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Akin</surname>, <given-names>N.</given-names></string-name>, <string-name name-style="western"><surname>Latikka</surname>, <given-names>R.</given-names></string-name>, <string-name name-style="western"><surname>Bergdahl</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Chen</surname>, <given-names>Y.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Savela</surname>, <given-names>N.</given-names></string-name></person-group> (<year>2023</year>). <article-title>Artificial intelligence in fine arts: A systematic review of empirical research.</article-title> <source>Computers in Human Behavior</source>, <volume>1</volume>(<issue>2</issue>), <elocation-id>100004</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.chbah.2023.100004</pub-id></mixed-citation></ref>
<ref id="r23"><mixed-citation publication-type="book">OpenAI. (2023). <italic>ChatGPT</italic> (Version 3.5) [Computer software].</mixed-citation></ref>
<ref id="r24"><mixed-citation publication-type="book">Pachet, F., Roy, P., &amp; Carré, B. (2021). Assisted music creation with Flow Machines: Towards new categories of new. In E. R. Miranda (Ed.), <italic>Handbook of artificial intelligence for music: Foundations, advanced approaches, and developments for creativity</italic> (pp. 485–520). Springer International Publishing. <pub-id pub-id-type="doi">10.1007/978-3-030-72116-9_18</pub-id></mixed-citation></ref>
<ref id="r25"><mixed-citation publication-type="web">R Core Team. (2023). <italic>R: A language and environment for statistical computing</italic> (Version 4.3.2) [Computer software]. R Foundation for Statistical Computing. <ext-link ext-link-type="uri" xlink:href="https://www.R-project.org/">https://www.R-project.org/</ext-link></mixed-citation></ref>
<ref id="r26"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Rohrmeier</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2022</year>). <article-title>On creativity, music’s AI completeness, and four challenges for artificial musical creativity.</article-title> <source>Transactions of the International Society for Music Information Retrieval</source>, <volume>5</volume>(<issue>1</issue>), <fpage>50</fpage>–<lpage>66</lpage>. <pub-id pub-id-type="doi">10.5334/tismir.104</pub-id></mixed-citation></ref>
<ref id="r27"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Schmuckler</surname>, <given-names>M. A.</given-names></string-name></person-group> (<year>1989</year>). <article-title>Expectation in music: Investigation of melodic and harmonic processes.</article-title> <source>Music Perception</source>, <volume>7</volume>(<issue>2</issue>), <fpage>109</fpage>–<lpage>149</lpage>. <pub-id pub-id-type="doi">10.2307/40285454</pub-id></mixed-citation></ref>
<ref id="r28"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Shank</surname>, <given-names>D. B.</given-names></string-name>, <string-name name-style="western"><surname>Stefanik</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Stuhlsatz</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Kacirek</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Belfi</surname>, <given-names>A. M.</given-names></string-name></person-group> (<year>2023</year>). <article-title>AI composer bias: Listeners like music less when they think it was composed by an AI.</article-title> <source>Journal of Experimental Psychology. Applied</source>, <volume>29</volume>(<issue>3</issue>), <fpage>676</fpage>–<lpage>692</lpage>. <pub-id pub-id-type="doi">10.1037/xap0000447</pub-id><pub-id pub-id-type="pmid">36006713</pub-id></mixed-citation></ref>
<ref id="r29"><mixed-citation publication-type="web">Steinbeck, W. (2016). Würfelmusik [Dice music]. In <italic>MGG Online</italic>. <ext-link ext-link-type="uri" xlink:href="https://www.mgg-online.com/mgg/stable/12552">https://www.mgg-online.com/mgg/stable/12552</ext-link></mixed-citation></ref>
	<ref id="r30"><mixed-citation publication-type="web"><person-group person-group-type="author"><string-name name-style="western"><surname>Strawn</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Shockley</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2014</year>). Computers and music. In <italic>Grove Music Online</italic>. Oxford University Press.  <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/gmo/9781561592630.article.A2256184">https://doi.org/10.1093/gmo/9781561592630.article.A2256184</ext-link></mixed-citation></ref>
<ref id="r31"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Tigre Moura</surname>, <given-names>F.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Maw</surname>, <given-names>C.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Artificial intelligence became Beethoven: How do listeners and music professionals perceive artificially composed music?</article-title> <source>Journal of Consumer Marketing</source>, <volume>38</volume>(<issue>2</issue>), <fpage>137</fpage>–<lpage>146</lpage>. <pub-id pub-id-type="doi">10.1108/JCM-02-2020-3671</pub-id></mixed-citation></ref>
<ref id="r32"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Unyk</surname>, <given-names>A. M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Carlsen</surname>, <given-names>J. C.</given-names></string-name></person-group> (<year>1987</year>). <article-title>Influence of expectation on melodic perception.</article-title> <source>Psychomusicology: Music, Mind, and Brain</source>, <volume>7</volume>(<issue>1</issue>), <fpage>3</fpage>–<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1037/h0094189</pub-id></mixed-citation></ref>
<ref id="r33"><mixed-citation publication-type="web"><person-group person-group-type="author"><string-name name-style="western"><surname>Villalobos</surname>, <given-names>P.</given-names></string-name>, <string-name name-style="western"><surname>Ho</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Sevilla</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Besiroglu</surname>, <given-names>T.</given-names></string-name>, <string-name name-style="western"><surname>Heim</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Hobbhahn</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2024</year>). <italic>Will we run out of data? Limits of LLM scaling based on human-generated data</italic>. arXiv. <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/pdf/2211.04325v2">http://arxiv.org/pdf/2211.04325v2</ext-link></mixed-citation></ref>
<ref id="r34"><mixed-citation publication-type="web"><person-group person-group-type="author"><string-name name-style="western"><surname>Vincent</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2016</year>, September 26). This AI-written pop song is almost certainly a dire warning for humanity. <italic>The Verge</italic>. <ext-link ext-link-type="uri" xlink:href="https://www.theverge.com/2016/9/26/13055938/ai-pop-song-daddys-car-sony">https://www.theverge.com/2016/9/26/13055938/ai-pop-song-daddys-car-sony</ext-link></mixed-citation></ref>
<ref id="r35"><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name name-style="western"><surname>Webster</surname>, <given-names>P. A.</given-names></string-name></person-group> (<year>1994</year>). <italic>Measure of creative thinking in music (MCTM): Administrative guidelines</italic> [Unpublished manuscript].</mixed-citation></ref>
<ref id="r36"><mixed-citation publication-type="web"><person-group person-group-type="author"><string-name name-style="western"><surname>Xie</surname>, <given-names>Q.</given-names></string-name>, <string-name name-style="western"><surname>Dai</surname>, <given-names>Z.</given-names></string-name>, <string-name name-style="western"><surname>Hovy</surname>, <given-names>E.</given-names></string-name>, <string-name name-style="western"><surname>Luong</surname>, <given-names>M.-T.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Le</surname>, <given-names>Q. V.</given-names></string-name></person-group> (<year>2020</year>). <italic>Unsupervised data augmentation for consistency training</italic>. 34th Conference on Neural Information Processing Systems. <ext-link ext-link-type="uri" xlink:href="https://proceedings.neurips.cc/paper/2020/hash/44feb0096faa8326192570788b38c1d1-Abstract.html">https://proceedings.neurips.cc/paper/2020/hash/44feb0096faa8326192570788b38c1d1-Abstract.html</ext-link></mixed-citation></ref>
<ref id="r37"><mixed-citation publication-type="web"><person-group person-group-type="author"><string-name name-style="western"><surname>Xiong</surname>, <given-names>Z.</given-names></string-name>, <string-name name-style="western"><surname>Wang</surname>, <given-names>W.</given-names></string-name>, <string-name name-style="western"><surname>Yu</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Lin</surname>, <given-names>Y.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Wang</surname>, <given-names>Z.</given-names></string-name></person-group> (<year>2023</year>). <italic>A comprehensive survey for evaluation methodologies of AI-generated music</italic>. arXiv. <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/pdf/2308.13736v1">http://arxiv.org/pdf/2308.13736v1</ext-link></mixed-citation></ref>
<ref id="r38"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Yin</surname>, <given-names>Z.</given-names></string-name>, <string-name name-style="western"><surname>Reuben</surname>, <given-names>F.</given-names></string-name>, <string-name name-style="western"><surname>Stepney</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Collins</surname>, <given-names>T.</given-names></string-name></person-group> (<year>2023</year>). <article-title>Deep learning’s shallow gains: A comparative evaluation of algorithms for automatic music generation.</article-title> <source>Machine Learning</source>, <volume>112</volume>(<issue>5</issue>), <fpage>1785</fpage>–<lpage>1822</lpage>. <pub-id pub-id-type="doi">10.1007/s10994-023-06309-w</pub-id></mixed-citation></ref>
<ref id="r39"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Zhang</surname>, <given-names>J. D.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Schubert</surname>, <given-names>E.</given-names></string-name></person-group> (<year>2019</year>). <article-title>A single item measure for identifying musician and nonmusician categories based on measures of musical sophistication.</article-title> <source>Music Perception</source>, <volume>36</volume>(<issue>5</issue>), <fpage>457</fpage>–<lpage>467</lpage>. <pub-id pub-id-type="doi">10.1525/mp.2019.36.5.457</pub-id></mixed-citation></ref>
</ref-list>
	<sec sec-type="data-availability" id="das"><title/>
		<p>The research data for this article are available (see <xref ref-type="bibr" rid="sp1_r1">Schreiber et al., 2024</xref>).</p>
	</sec>	

	
	
	<sec sec-type="supplementary-material" id="sp1"><title/>
		<p>For this article, R scripts, data, codebook, and musical stimuli are available (see <xref ref-type="bibr" rid="sp1_r1">Schreiber et al., 2024</xref>).</p>
		
		<ref-list content-type="supplementary-material" id="suppl-ref-list">
			<ref id="sp1_r1">
				<mixed-citation publication-type="supplementary-material">
					<person-group person-group-type="author">
							<name name-style="western">
								<surname>Schreiber</surname>
								<given-names>A.</given-names>
							</name>
							<name name-style="western">
								<surname>Sander</surname>
								<given-names>K.</given-names>
							</name>
							<name name-style="western">
								<surname>Kopiez</surname>
								<given-names>R.</given-names>
							</name>
							<name name-style="western">
								<surname>Thöne</surname>
								<given-names>R.</given-names>
							</name>
					</person-group> (<year>2024</year>). <source>The creative performance of ChatGPT and Google Magenta compared to human-based solutions in a standardized melody continuation task</source> <comment>[Data, codebook, code, stimuli]</comment>. <publisher-name>OSF</publisher-name>. <ext-link ext-link-type="uri" xlink:href="https://osf.io/qj8fp">https://osf.io/qj8fp</ext-link>		
				</mixed-citation>
			</ref>
		</ref-list>
	</sec>	

<fn-group>
<fn fn-type="financial-disclosure"><p>The authors have no funding to report.</p></fn>
</fn-group>
</back>
</article>
