Over a decade ago, researchers found that a large proportion of experimental results failed to replicate (Open Science Collaboration, 2015), triggering the so-called replication crisis in psychology (also called crisis of confidence; Pashler & Wagenmakers, 2012). This led to critical examinations of research methods and practices. Concepts like publication bias, p-hacking, and the file-drawer problem became widely discussed. Still, shifting toward more transparent and rigorous practices remains challenging and slow (Sharpe, 2013; Washburn et al., 2018).
To observe and support such change, research focusing on academic research cultures is essential. This so-called meta-research (also known as scientometrics or science of science) uses methods like bibliometrics to analyse publication patterns. It enables the identification of systemic problems, the development of interventions, and the evaluation of ongoing developments (Hardwicke, Serghiou, et al., 2020). Beyond bibliometrics, meta-research also uses additional approaches: Surveys, for example, can capture researchers’ attitudes and experiences, while qualitative methods like interviews provide deeper insight into research practices. Moreover, FAIR (findable, accessible, interoperable, reusable) data assessment tools allow the systematic evaluation of the FAIRness of a dataset.
The present paper contributes to this field by analysing publication practices in German and German-speaking music psychology, focusing on contributions to the Yearbook of Music Psychology.
Open Science
Open science can be considered a response to problematic research practices exposed during the replication crisis. It promotes transparency by making all stages of research—such as study plans, methods, data, software, and reports—accessible to other researchers and the public. A wide range of practices are summarised under the umbrella term open science (e.g., see da Silveira et al., 2023).
This study focuses on key open science aspects including preregistration, open materials, open data, and the presence of funding and conflict of interest statements—similar to previous work (Eerola, 2025; Hardwicke et al., 2022, 2024; Hardwicke, Wallach, et al., 2020; Iqbal et al., 2016; Wallach et al., 2018).
Preregistration refers to registering a study’s hypotheses, design, measures, and planned analyses before data collection. It helps distinguish confirmatory from exploratory research by reducing the researcher’s leeway while conducting and analysing their study (Nosek et al., 2018, 2019). Although it is standard in medical research, preregistration is still rare in the humanities and social sciences, including psychology.
Open materials and open data refer to the public availability of primary research data and associated resources. While the term research data generally describes all (digital) data that is generated during the research process or is its result (Kindling & Schirmbacher, 2013, p. 130), this study defines data more narrowly: as primary data from measured subjects—often people, but in music psychology sometimes also songs or media sources. Materials include additional files such as analysis scripts, stimuli, or questionnaires. We examined whether such data and materials were openly available.
Finally, funding and conflict of interest statements contribute to transparency. Funding, especially from non-public sources, may influence a study’s conduct and should therefore be disclosed; at the same time, reporting such support acknowledges the funders. Conflicts of interest arise when personal or professional ties might bias the research. Reporting both enhances the credibility of scientific work.
New Statistics
Whereas the “classical” approach to data analysis, null hypothesis significance testing (NHST), focuses on p-values indicating statistical significance, the new statistics stand for an estimation approach employing effect sizes, confidence intervals (CIs), and meta-analyses (Cumming & Calin-Jageman, 2017). Here, by new statistics, we mean this specific approach, not newer statistical approaches in general, such as Bayesian statistics.
Many publications have critically discussed NHST, p-values, and the respective .05 criterion (Cohen, 1994; Wasserstein & Lazar, 2016), with some of them concluding that research might be better off without p-values. Indeed, when an effect is reported as an effect size (point estimate, for example “The support for proposition X is 53%.”; from Cumming, 2014, p. 11) supplemented with a confidence interval (interval estimate) indicating the precision of the observation (i.e., “The poll had an error margin of 2%”; example from Cumming, 2014, p. 11), all necessary information is conveyed. In contrast, the statement “The support for proposition X is greater than 50% (p < .01)” is less informative. The use of NHST and new statistics is, however, not mutually exclusive: the classical approach of calculating a p-value can be supplemented by reporting an effect size and the corresponding confidence interval. To gain insights into the statistical approaches used in the Yearbook of Music Psychology, we coded whether studies reported p-values (as an indicator of NHST), effect sizes, and confidence intervals.
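To illustrate the difference between the two reporting styles, the following minimal R sketch (with simulated data; all numbers are illustrative) computes a p-value, a confidence interval, and a standardised effect size for a simple two-group comparison:

```r
set.seed(42)
group_a <- rnorm(30, mean = 100, sd = 15)  # simulated control group
group_b <- rnorm(30, mean = 108, sd = 15)  # simulated comparison group

tt <- t.test(group_a, group_b)  # Welch t-test: yields the p-value used in NHST
tt$p.value
tt$conf.int                     # ...but also a 95% CI of the mean difference

# Standardised effect size: Cohen's d with a simple pooled-SD approximation
d <- (mean(group_b) - mean(group_a)) /
  sqrt((sd(group_a)^2 + sd(group_b)^2) / 2)
d
```

Reporting the effect size together with the CI conveys both the size and the precision of the effect, whereas the p-value alone does not.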
Additionally, we observed how researchers determined their sample size and whether they reported if any data or subjects were excluded from the analysis. Strictly speaking, these practices might not belong to the new statistics, but since they are central to rigorous research and relate closely to statistical properties such as statistical power, we also summarised them under the heading new statistics for this paper.
The Yearbook of Music Psychology
The Yearbook of Music Psychology (German: Jahrbuch Musikpsychologie, https://jbdgm.psychopen.eu) was founded in 1983 and is the official scientific outlet of the German Society for Music Psychology (German: Deutsche Gesellschaft für Musikpsychologie, DGM; Wöllner et al., 2023). Volumes were published approximately annually in the form of edited books, with research reports and other contributions as chapters. Since the 1980s, contributions could also be published in English (Wöllner et al., 2023, p. 9). Over the years, the Yearbook was edited by ten researchers, working in teams of up to three. The role of editorial assistant was introduced with Volume 16 (2002) to support the submission and publication processes and has since been held by two to four junior researchers at a time.
After Volume 27, major changes took place and the Yearbook was transformed into a modern journal: From Volume 28 onwards, the Yearbook used the portal PsychOpen GOLD (https://psychopen.eu) by the Leibniz Institute for Psychology (ZPID, https://leibniz-psychology.org/en), and research contributions were released as online-first publications on the Yearbook’s website. All publications were open access without the authors having to pay an article processing charge (APC), which can be termed diamond open access. After some time, the respective articles and additional spots, close-ups, or reviews were collected into a print volume published with Waxmann. In this way, the Yearbook continues and has recently completed its Volume 32.
After the transformation to an open access journal, the DGM bought all old volumes from the original publisher and made all older content publicly available under a CC BY 4.0 license. Therefore, all articles ever published in the Yearbook are nowadays available via the journal’s website (https://jbdgm.psychopen.eu/index.php/JBDGM/archive).
Even in comparison to international journals rooted in music psychology, the Yearbook has a longstanding tradition: For example, Psychology of Music was first published in 1973, Music Perception in 1983, and Musicae Scientiae in 1997. To our knowledge, the Yearbook has always been the only German outlet for music psychological research and therefore offers a unique hub for German, Austrian, Swiss, or German-speaking researchers. With the possibilities the journal offers today (especially diamond open access), it is not only attractive for the local community but could also attract submissions from international researchers.
Bibliometric Analyses as a Method of Meta-Research
The described developments (open science and new statistics, a journal changing to open access publishing) can themselves be the subject of research, as mentioned in the beginning. One method of this meta-research is bibliometrics (see Ninkov et al., 2022, for an overview), which treats publications as subjects and quantitatively analyses the number of publications in a specific field of research over time, authorship, citations, keywords, or other metadata of the articles. The necessary data can nowadays be extracted from databases such as Web of Science, Scopus, or the open and non-profit platform OpenAlex. In addition to the readily available information, supplementary properties of articles can be coded manually or extracted by scripts using, for example, specific search terms. Thereby, information on the studies’ methods, employed and reported statistical means, or frame criteria like funding, author contributions, or conflicts of interest can be extracted and analysed.
All in all, bibliometric analyses can yield detailed insights into publication practices. On their basis, potential problems can be identified and investigated (phases 1 and 2 of meta-research as formulated by Hardwicke, Serghiou, et al., 2020), or solutions that have been introduced to tackle a problem can be evaluated (phase 4 of meta-research; Hardwicke, Serghiou, et al., 2020).
Aims, Research Questions and Hypotheses
This study pursued two main goals: (1) to examine past practices to inform future guidelines and the development of the JBDGM, and (2) to compare the JBDGM with international journals from related fields to contextualise German music psychology publication practices.
To address these aims, research questions (RQs) and corresponding hypotheses (Hs) were formulated:
RQ1) How prevalent were open science practices in the JBDGM over time?
H1.1) The prevalence of (a) pre-registered studies, (b) open materials, and (c) open research data increased over time.
H1.2) Statements regarding (a) funding and (b) a potential conflict of interest became more prevalent over time.
RQ2) How prevalent was the use of new statistics in the JBDGM over time?
H2.1) The use of p-values as an indicator of null hypothesis significance testing (NHST) was high for all considered years and volumes.
H2.2) The reporting of (a) confidence intervals (CIs) and (b) effect sizes (such as Cohen's d or eta-squared) increased over time.
H2.3) Statements about (a) the determination of the sample size and (b) stopping rules or criteria for data exclusion became more prevalent over time.
RQ3) How do the findings about the JBDGM compare to findings about other journals regarding the surveyed practices and variables? The results will be compared to the following studies: Anglada-Tort and Sanfilippo (2019), Eerola (2025), Giofrè et al. (2017, 2023), Hardwicke, Wallach, et al. (2020), and Hardwicke et al. (2022, 2024).
RQ4) General bibliometric properties of the JBDGM: How did authorship develop over time?
H4.1) The number of authors per paper increased over time.
Method
To investigate the research questions and hypotheses, a bibliometric analysis was conducted. For an overview of the employed method, see Figure 1. Data were retrieved from the database PubPsych and complemented by manual coding of additional variables. The final dataset was quantitatively analysed. All steps from Figure 1 are detailed in the following sections.
Figure 1
Overview of the Methods Including Preparation and Planning of the Study, Data Collection, and Data Analysis and Results
Preparation and Planning
Preregistration of This Study
The present study was preregistered before the data on open science and new statistics practices were gathered. The preregistration was submitted on 26th April 2024 on the OSF (see Düvel & Altemeier, 2024). The study was conducted as preregistered, with minor deviations noted in the methods description.
Selection of the Sample
To examine recent shifts in publication practices in the Yearbook of Music Psychology, the most recent volumes were analysed. Since open science and new statistical practices are primarily documented in quantitative studies, only “research reports” and “research reports on thematic focus” were included. Other formats (e.g., spots, close-ups, reviews) were excluded. All eligible reports—including quantitative, qualitative, and non-empirical articles—were coded.
As preregistered, volumes up to Volume 31 were initially included, but during coding, five online-first papers from Volume 32 were added to capture recent developments. Data collection was extended back to Volume 22 (2012) to cover key changes. A stopping rule was defined: If none of the three earliest coded volumes (Vols. 22–24) contained open science practices (preregistration, open materials, or open data), coding would stop. If any such practice was found, earlier volumes would be added iteratively until three consecutive volumes showed no such practices. As no such practices were found in Volumes 22 to 24, data collection ended there. In total, research reports from Volumes 22 to 32 (including all papers through June 2024) were coded, resulting in N = 79 papers (see Figure 2 for distribution).
Figure 2
Number of Publications for Each Volume
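The stopping rule described above can be read as a simple iterative procedure. The following R sketch illustrates the logic; the `shows_practice` values are illustrative stand-ins for the manual coding of each volume:

```r
# Illustrative coding results: TRUE = at least one paper in the volume reports
# preregistration, open materials, or open data (values mirror the actual outcome)
shows_practice <- c("24" = FALSE, "23" = FALSE, "22" = FALSE)

earliest <- 22
# Go one volume further back as long as any of the three earliest coded
# volumes still shows an open science practice
while (earliest > 1 && any(tail(shows_practice, 3))) {
  earliest <- earliest - 1
  # in the real procedure, this volume would now be coded manually;
  # FALSE is only an illustrative placeholder for that coding result
  shows_practice[as.character(earliest)] <- FALSE
}
earliest  # with the values above, coding stops at Volume 22
```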
Data Collection
Data collection involved three phases: (1) existing data from PubPsych was reviewed, downloaded, and cleaned; (2) the authors (ND and FA) coded variables on new statistics and open science; (3) remaining uncertainties were resolved through discussion, yielding the final dataset.
Phase 1: Data From PubPsych Including the General Article Characteristics
First, all available bibliometric data on the Yearbook was reviewed to retrieve general information (e.g., title, authors) and streamline manual coding. Metadata for all articles from Volume 22 onward was sought. According to the journal's website (https://jbdgm.psychopen.eu), the Yearbook is indexed in several databases, but most (DOAJ, Dimensions, BASE, Scilit, EBSCO, Semantic Scholar) cover only articles from Volume 28 onward, following the shift to online-first, open access publishing. DNB and OOIR list only the journal, not individual articles. While Google Scholar and PsychArchives index all articles, they do not provide a practical way to download metadata. Only PubPsych enables comprehensive metadata retrieval for all volumes using the search term "Jahrbuch Musikpsychologie."
All 257 records were downloaded from PubPsych as a RIS file. Initial inspection showed unclean data, including duplicates, encoding errors, unclear variable names, and missing entries. The data was cleaned in OpenRefine (https://openrefine.org): empty columns were deleted, authors split into separate variables, duplicates removed, and volume numbers retrieved. This resulted in 179 unique entries (from Vols. 1–30), though some older papers were missing. Key variables (volume, year, title, authors, address, language, doi, abstract, keywords, database) were generally complete. Keywords, though untidy and not analysed in this study, were retained for potential future use. Finally, the dataset was imported into R to generate a unique key for each paper—combining volume number, the first three letters of the first author’s surname, and the first three letters of the title. The final dataset was then exported as a CSV file (“03_PubPsych-Datenexport_N=179_fromR.csv”; see online supplementary material).
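For illustration, the key generation might look like the following in R; this is a sketch in which the input file and all column names are assumptions, while the output file name is the one given above:

```r
# Hypothetical input: the cleaned OpenRefine export; all column names are assumed
dat <- read.csv("pubpsych_cleaned.csv", stringsAsFactors = FALSE)

# Unique key: volume number + first three letters of the first author's surname
# + first three letters of the title (lower-casing is an assumption)
dat$key <- paste0(dat$volume, "_",
                  tolower(substr(dat$author1_surname, 1, 3)), "_",
                  tolower(substr(dat$title, 1, 3)))

write.csv(dat, "03_PubPsych-Datenexport_N=179_fromR.csv", row.names = FALSE)
```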
Phase 2: Coding Procedure Via an Online Questionnaire
Main variables on open science practices and statistical methods were manually coded by the two authors (ND and FA), following coding prompts similar to the studies by Giofrè et al. (2017, 2023). To avoid errors due to coding in a spreadsheet, the coding form was implemented as an online questionnaire using SoSci Survey (https://www.soscisurvey.de; see questionnaire in the online supplementary material). Metadata from Phase 1 pre-filled the questionnaire fields: after entering each paper’s key, the corresponding case from the CSV file was retrieved and variables pre-filled, so coders only needed to check the information instead of re-entering it.
After recording general article characteristics, the research design was coded. For non-quantitative designs (theoretical, qualitative), coding ended; for quantitative designs (experimental, quasi-experimental, observational), items on new statistics and open science followed. Most variables included an “I’m not sure” option and a comment field was available on each page to avoid coding mistakes resulting from uncertainties. The two authors coded all papers from the relevant sections. The papers from each volume were split between the two coders in alternating order, with three papers double coded to assess inter-coder reliability. Completed data was exported from SoSci Survey as a CSV file with an R import script.
Phase 3: Finalising the Dataset
In the final phase, the authors discussed all remaining uncertainties and comments, documented decisions, and updated the dataset accordingly using an R script. Duplicates from double-coding were removed, and all “I’m not sure” ratings were eliminated. The final dataset was exported from R as a CSV file (“04_Jahrbuch_dataset.csv”; see online supplementary material).
Description of the Final Dataset: Variables and Measures
The dataset includes variables for coder and key (based on volume number, first author, and title). General article characteristics from PubPsych are: title, authors, year, language, DOI, volume, abstract, and institution of the corresponding or first author. Additional manually coded variables include: country of origin, PDF filename, replication status, study design, and comments. For new statistics, the presence of p-values, confidence intervals, effect sizes, sample size determination, and data exclusion was recorded. For open science, preregistration, open materials, open data, funding, and conflict of interest statements were coded. Criteria for coding new statistics and open science items were set liberally—following Giofrè et al. (2017, 2023)—to allow for direct comparison and to detect early changes in publication practices.
Data Analysis
The dataset comprises the full population of research reports in the JBDGM for the selected volumes. Analyses will focus on the proportion of papers per volume exhibiting each characteristic, visualised with stacked bar plots for each outcome variable. The number of authors per paper will be calculated and its change over time shown using boxplots or error bar charts, facilitating comparison with Anglada-Tort and Sanfilippo (2019). Proportions for categories across all volumes will also be calculated. Although the preregistration specified that the data would be analysed solely using descriptive statistics, we later opted to extend our analysis by incorporating linear regressions to assess the temporal trends stated in many of the hypotheses. All analyses will be conducted in R and RStudio, with the script “06_Jahrbuch_analysis.R” available in the online material.
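As a sketch of this trend analysis (the actual code is in “06_Jahrbuch_analysis.R”; the column name and the “available” level are assumptions):

```r
dat <- read.csv("04_Jahrbuch_dataset.csv", stringsAsFactors = FALSE)

# Proportion of papers per volume exhibiting a characteristic (here: open data;
# the column name and the "available" level are assumptions)
prop_open <- aggregate(open_data ~ volume, data = dat,
                       FUN = function(x) mean(x == "available"))

# Linear regression of the proportion on the volume number: the slope estimates
# the average change in the proportion from one volume to the next
fit <- lm(open_data ~ volume, data = prop_open)
summary(fit)  # F statistic, p-value, and R^2 as reported in the Results
```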
Results
Bibliometric information on N = 79 research reports from Volumes 22 to 32 was analysed. The results regarding their general bibliometric properties (and the corresponding hypotheses) will be presented first (notwithstanding this being Research Question 4).
Then, the results of the sub-sample including the n = 59 quantitative papers regarding Research Questions 1 (open science), 2 (new statistics), and 3 (comparisons) will be reported.
General Bibliometric Properties (Including RQ4)
First, general bibliometric properties of the articles will be reported for all N = 79 articles. They include language, country of origin, replication, research design, and authorship—the last being the focus of RQ4 and the associated hypothesis.
Language
The Yearbook is open to submissions in German and English. The proportions of research reports in German and English are displayed in Figure 3. In the analysed period, Volumes 22 to 24 contained only German publications.
Figure 3
Language of the Surveyed Research Reports
Note. The red dashed line indicates the regression line (statistical details in the text) for the proportion of English articles and features a slope of 5.6%.
From Volume 25 on, an increase in the proportion of English publications is visible. In the current Volume 32, most papers (4 of 5) are published in English. Although no specific hypothesis was formulated, we quantified this increase by calculating a linear regression: F(1, 9) = 14.7, p = .004, with R2 = 62.0% of the variance explained by the model. The slope of the regression line of 5.6% indicates that, on average, every volume included 5.6% more English articles than the previous one.
Country of Origin
As operationalised by Giofrè et al. (2017, 2023), the country of origin of a paper equals the country of the corresponding author’s institution. For the articles in Volume 27 and older, no corresponding author was indicated; in these instances, the institution of the first author was used wherever it was available (n = 5 missing cases). The countries of origin are displayed in Figure 4. The majority of papers stem from researchers at German institutions (89.2%). Sporadic publications stem from the German-speaking countries Austria (4.1%) and Switzerland (1.4%), or from the UK (5.4%, published by German researchers who are employed in the UK).
Figure 4
Countries of Origin of the Surveyed Research Reports (n = 74)
Replication
Regardless of the actual research design, it was coded whether the paper reported a replication study. The criterion was set very liberally: Not only strict replications were included but also, for example, partial replications. In the dataset, three studies were coded as replication studies (3.8%).
Research Design
The research design was coded based on the information in the abstract, methods, and results sections of a paper. The designs are distributed across the volumes as displayed in Figure 5. Visual inspection reveals that the proportion of experimental studies (black-striped in Figure 5) increased over time. Analogously, the proportion of non-empirical articles (white-dotted; e.g., theoretical contributions or reviews of a topic) decreased. Only a few papers feature quasi-experimental, meta-analytical (here in the broader sense of a paper jointly analysing several quantitative datasets), or qualitative designs. The remaining proportion of observational studies (grey-chequered in Figure 5) stays similar over time at about 50% of the publications.
Figure 5
Research Designs of the Surveyed Research Reports
Summed across all volumes, 48.1% of research reports presented observational studies, 24.1% non-empirical studies, 19.0% experimental studies, 3.8% meta-analyses, 3.8% quasi-experimental studies and 1.2% qualitative studies.
Authorship
Research Question 4 specifically addressed the development of authorship over time. The number of authors per paper was averaged across each volume and is displayed in Figure 6.
Figure 6
Boxplots of the Number of Authors per Paper for Each Volume
Note. The red dashed line indicates the regression line (statistical details in the text) for the average number of authors per paper and features a slope of 6.4%.
The number of authors increased over time: For Volume 22, the median was 1 author; from Volume 24 onwards, it was always 2 authors. For the most recent papers from Volume 32, the upper quartile reached 4 authors per paper. A significant linear regression supported the visual impression: F(1, 9) = 6.68, p = .029, R2 = 42.6%. On average, the articles in each volume had 6.4% more authors than the articles in the previous volume. All in all, Hypothesis 4.1 could be confirmed. Summed across all volumes, the papers had a median of 2 authors (25% quartile = 1, 75% quartile = 3).
The preregistration included the research question regarding the identification of the most prominent authors. To avoid highlighting individual authors in the manner of a Hall of Fame, we decided against listing author names here (see “08_most prevalent authors.pdf” in the online supplementary material for a non-anonymised list). In summary, one author has published seven papers, one has published six, one has published five, two have published four each, and eight have published three each. All other authors published two or just one paper in the selected volumes. All in all, 123 distinct authors are represented in the dataset.
Open Science Practices: RQ1
The following results on open science practices and new statistics (next subsection) are reported on the basis of the n = 59 quantitative research reports in the dataset. The leading research question was: “How prevalent were open science practices in the JBDGM over time?” (RQ1).
Preregistration
For all quantitative papers, it was coded whether the authors reported that their investigation had been preregistered. No paper reported a preregistration—this practice has not yet entered German music psychology publication practice. Therefore, H1.1a had to be rejected.
Open Materials
Each quantitative research report was scanned and searched for statements regarding the availability of research materials, e.g., questionnaires, stimuli (audio examples), or analysis scripts. If only one material (e.g., only the questionnaire but not the stimuli) was available, this variable was already coded positively. The results are displayed in Figure 7. In most cases (30.5% overall), the materials could be downloaded directly, only a few cases (3.4%) stated that they are available upon request. Overall, 66.1% of the quantitative papers included no statement about the availability of research materials. In the most recent volumes, the majority of papers report availability of at least some research materials. A significant linear regression supported the visual impression of an increase in available materials (be it on request or downloadable), F(1, 9) = 68, p < .001, R2 = 88.3%. On average, each volume had 9.4% more articles with some available materials than the previous volume. Overall, H1.1b was accepted.
Figure 7
Statements Regarding Open Materials of the n = 59 Quantitative Research Reports
Note. Additional categories were available during the coding procedure: The category statement says materials are not shared was not used. Also, in all cases where the paper stated that the material can be downloaded (white category in the figure), the material could indeed be accessed and downloaded by the coders. The red dashed line indicates the regression line (statistical details in the text) for the proportion of papers with available material (available upon request or can be downloaded/accessed) and features a slope of 9.4%.
Open Data
As detailed in the introduction, data was understood here as the table containing the subjects’ values on the measured variables. The analysis of the statements regarding the availability of the data is displayed in Figure 8.
Figure 8
Statements Regarding Open Data of the n = 59 Quantitative Research Reports
Note. Whenever the statement said that the data can be directly accessed and downloaded, the authors could successfully retrieve the data. The response option statement says data can be downloaded/accessed – it was not successful (page down, ...) was not used and the category was therefore omitted from the figure. The red dashed line indicates the regression line (statistical details in the text) for the proportion of papers with available data (available upon request or can be downloaded/accessed) and features a slope of 12.3%.
Since Volume 28, almost all quantitative papers included statements regarding the data availability. Roughly half have made their dataset available for download (dark grey in Figure 8). In some cases, the data is available upon request (light grey). Only a few papers state that the data cannot be published due to data protection issues (white). Therefore, in recent years, most authors published their dataset alongside the paper or at least stated that it is available upon request.
Summed across all volumes, 55.9% of quantitative research reports included no statement regarding data availability; in 25.4%, the data was available; in 15.2%, it was stated that the data is available on request; and 3.3% stated that the data is not shared. The proportion of papers with data publications did not increase stepwise over time but rather jumped between Volumes 27 and 28. A significant linear regression supported the hypothesis of an increase in available data (be it on request or downloadable): F(1, 9) = 33.5, p < .001, R2 = 78.8%. On average, each volume had 12.3% more articles with available research data than the previous volume. Therefore, H1.1c was accepted.
Funding Statement
For reasons of transparency as well as acknowledgement, funders should be reported in the publications of the associated research projects. Until Volume 27, funders were rarely reported, and the statement was optional (see Figure 9). From Volume 28 on, a statement regarding funding was obligatory, although still most of the publications did not report any funding. A significant linear regression supported the hypothesis of an increase in funding statements: F(1, 9) = 24.9, p < .001, R2 = 73.5%. On average, each volume had 12.9% more articles with funding statements than the previous volume. Therefore, funding statements did become more prevalent over time and H1.2a was accepted.
Figure 9
Statements Regarding Funding of the n = 59 Quantitative Research Reports
Note. The red dashed line indicates the regression line (statistical details in the text) for the proportion of papers with funding statements and features a slope of 12.9%.
Summed across all volumes, 49.2% of quantitative research reports included no funding statement, 35.6% stated that there was no funding, and 15.3% named funders.
Conflicts of Interest Statement
Similarly to the funding statement, a statement regarding potential conflicts of interest enhances transparency. This practice was not employed until Volume 27 and became obligatory from Volume 28 onwards. Only one quantitative research report (1.7%) stated potential conflicts of interest (see Figure 10); 52.5% included no statement regarding conflicts of interest, and 45.8% stated that the authors had no conflict of interest. A significant linear regression supported the hypothesis of an increase in conflict of interest statements: F(1, 9) = 27, p < .001, R2 = 75.0%. On average, each volume had 13.6% more articles with conflict of interest statements than the previous volume. All in all, conflict of interest statements did become more prevalent over time and H1.2b was also accepted.
Figure 10
Statements Regarding Potential Conflicts of Interest of the n = 59 Quantitative Research Reports
Note. The red dashed line indicates the regression line (statistical details in the text) for the proportion of papers with conflict of interest statements and features a slope of 13.6%.
Summary
All in all, the analysis revealed changes over time regarding the observed open science practices and yielded answers to the leading RQ1. The publication of datasets (open data) and the inclusion of statements regarding funding and conflicts of interest have improved over time and have been employed thoroughly since Volume 28—H1.1c, H1.2a, and H1.2b were accepted. The proportion of published research materials (open materials) has also increased over time (H1.1b was accepted). Only the practice of preregistration has not yet been adopted in German-speaking music psychology publications (H1.1a was rejected).
New Statistics: RQ2
For these analyses, the leading question was: How prevalent was the use of new statistics in the JBDGM over time? (RQ2).
NHST (as Indicated by p-Values)
The quantitative research reports were searched for p-values to gain insight into the prevalence of NHST. Most papers reported p-values, and most of them included at least one exact p-value (79.7% of reports included at least one exact p-value, while 15.2% included only relative p-values and 5.1% included no p-values; see Figure 11). The linear regression was non-significant, F(1, 9) = 2.04, p = .19, R2 = 18.5%, with a regression slope of −1.6%; therefore, no increase or decrease in papers including p-values was found. As assumed in H2.1, the use of NHST was quite popular among quantitative music psychology studies reported in the Yearbook and the hypothesis was accepted.
Figure 11
The Prevalence of p-values Among the n = 59 Quantitative Research Reports
Note. If a paper reported only relative p-values (e.g., p < .01), we coded it as relative p-value. If it (also) contained at least one exact p-value, it was coded as exact p-value.
Confidence Intervals
The papers were scanned for CIs that were reported in the text (mostly the results section), in a figure (as error bars around central tendencies; the error bars had to be labelled as CIs), or in a table. If at least one CI was reported, the item was rated positively. In almost all considered volumes, roughly half of the quantitative studies reported at least one CI (see Figure 12). There is no obvious tendency over time: the linear regression was non-significant, F(1, 9) = 4.2, p = .072, R2 = 31.6%, with a regression slope of 3.2%. Therefore, H2.2a was rejected.
Figure 12
The Prevalence of CIs Among the n = 59 Quantitative Research Reports
Summed across all volumes, 55.9% of quantitative research reports included no CI while the other 44.1% included at least one.
Effect Sizes
The reporting of (standardised) effect sizes was observed. Aside from common effect sizes like the correlation coefficient r or Cohen’s d, in many cases it had to be clarified whether a measure qualifies as an effect size. In these cases, statistics books (Eid et al., 2015; Sedlmeier & Renkewitz, 2013) and the internet were consulted. The criterion was, again, set quite liberally: If the measure was described as fulfilling the function of an effect size or enabling an interpretation similar to an effect size, the variable was coded positively. The classification decisions are noted in Table 1.
Table 1
Classification of Statistical Measurements as Effect Sizes During the Coding Procedure
| Coded as effect sizes | Not coded as effect sizes |
|---|---|
| … | … |
The prevalence of effect sizes over time is displayed in Figure 13. Across all volumes, 84.7% of the quantitative papers contained at least one effect size while 15.3% contained none. The linear regression was significant, F(1, 9) = 5.6, p = .042, R2 = 38.4%, the regression line having a slope of 2.7%. Therefore, the reporting of effect sizes increased over time—albeit by just 2.7% per volume—and H2.2b was accepted.
Figure 13
The Prevalence of Effect Sizes Among the n = 59 Quantitative Research Reports
Note. The red dashed line indicates the regression line (statistical details in the text) for the proportion of papers reporting at least one effect size and features a slope of 2.7%.
Sample Size Determination
It was noted if a reason for the employed sample size was given and if so, on what rationale it was based. If the size of the sample was determined by a statistical procedure (e.g., power analysis) or based on previous studies, the item was coded positively. If other reasons were given (pragmatic reasons like the size of a natural group under investigation) or no reason was reported, the item was coded negatively.
All in all, statements regarding sample size determination were rarely observed (13.6%, while the other 86.4% included no such statement; see Figure 14). The first such statement appeared in Volume 28. In the last two volumes, the proportion exceeded 50%. Considering that power analyses have been established for decades, German music psychology seems to be lagging behind the standard. Although the Yearbook itself published a paper on the topic by Platz et al. (2012), it does not seem to have had a major influence on researchers’ practices. The linear regression revealed a significant positive trend: F(1, 9) = 12.6, p = .006, R2 = 58.2%, the regression line featuring a slope of 5.5%. Therefore, the prevalence of sample size determination statements has increased over the last years and H2.3a was accepted.
Figure 14
The Prevalence of Statements Regarding the Sample Size Determination Among the n = 59 Quantitative Research Reports
Note. The red dashed line indicates the regression line (statistical details in the text) for the proportion of papers with statements regarding sample size determination and features a slope of 5.5%.
Data Exclusion
This item was taken over from Giofrè et al. (2017, 2023) and was rated positively if at least one of two criteria was fulfilled: (1) the researchers adopted a stopping rule for data collection and reported what the stopping rule was, or (2) they reported criteria for excluding data or participants.
Stopping rules could not be observed in the present dataset and do not seem to be used in music psychology very often. Therefore, Figure 15 displays whether the authors reported the exclusion of data or participants from the analysis: 44.1% of the quantitative publications included such a statement, while the other 55.9% did not. The linear regression revealed a significant positive tendency, F(1, 9) = 15.4, p = .004, R2 = 63.0%, the regression line having a slope of 7.7%. Therefore, H2.3b was accepted.
Figure 15
The Prevalence of Statements Regarding Data Exclusion Among the n = 59 Quantitative Research Reports
Note. The red dashed line indicates the regression line (statistical details in the text) for the proportion of papers with statements regarding data exclusion and features a slope of 7.7%.
Summary
The analyses revealed that NHST was popular in all regarded volumes: p-values were reported in most quantitative papers (H2.1 was confirmed). The reporting of effect sizes was widespread and showed a slight but significant increase, so H2.2b was accepted. CIs, in contrast, were reported in 44% of the research reports and their use did not increase over time; therefore, H2.2a was rejected. A strict use of new statistics (in the sense that it relied on effect sizes and CIs and omitted p-values) was observed in one recent paper by Ruth et al. (2024). Statements regarding the determination of the sample size were still quite rare, while reporting of data exclusion was somewhat more prevalent. Both showed an increase over time, so H2.3a and H2.3b could be confirmed.
International and Interdisciplinary Comparison: RQ3
In recent years, the prevalence of open science practices has been researched in several studies, covering different subject areas, journals, and publication years. Hardwicke et al. investigated the publication years 2014–2017 in the social sciences (Hardwicke, Wallach, et al., 2020) and psychology (Hardwicke et al., 2022). In 2024, they provided an update based on psychology publications from 2022. Eerola (2025) adopted Hardwicke’s method and researched music psychology publications from 2017–2022. Giofrè et al. (2017, 2023) focused not only on open science practices in psychology but also examined the prevalence of new statistics. These six studies are considered to yield the most insightful comparison points for the present study. Regarding the average number of authors per paper, the results will be compared to the bibliometric analysis of music psychology publications by Anglada-Tort and Sanfilippo (2019).1
Additionally, further studies on the prevalence of open science practices have been conducted for specific journals (Federer et al., 2018, for PLoS ONE; Hardwicke et al., 2018, for Cognition), for natural and life sciences (Womack, 2015), for biomedicine (Iqbal et al., 2016; Wallach et al., 2018), or for organisational behaviour research (Tenney et al., 2021). Since these disciplines differ substantially from music psychology, they will not be used as comparison points.
Since publication practices have been changing substantially over the last years, comparisons between publications should consider the publication years of the investigated papers. Some studies (Eerola, 2025; Giofrè et al., 2017, 2023) give prevalences or frequencies of the researched practices for each year. Therefore, these comparisons will also be displayed annually. Other studies (Hardwicke et al., 2022, 2024; Hardwicke, Wallach, et al., 2020) give the prevalences averaged across all considered years or include just one year. These averaged values will be compared to the corresponding averaged results from the JBDGM. Figure “09_studies_comparison.jpg” in the online supplementary material gives an overview of the investigated publication years of the papers that are consulted for comparison. In the following, their results will be compared to the results of the present study from the corresponding years.
Comparison Regarding Open Science Practices
Preregistration, Open Materials, and Open Data
The prevalence of the three core indicators of open science (preregistration, open materials, and open data) and the comparison between the studies are displayed in Figures A1 and A2 in the Appendix. Giofrè et al. (2017, 2023; see Figure A1a) observed an increasing number of preregistrations in the Journal of Experimental Psychology: General (JEP:G) and Psychological Science (PS). Eerola (2025) and the present study did not find any preregistrations. We can conclude that this practice is not yet established, let alone common, in music psychology publications in general (and not just in German publications or publications by German researchers). Hardwicke et al. (2022, 2024) and Hardwicke, Wallach, et al. (2020; see Figure A2a) also found no or few preregistrations in the social sciences and psychology, showing that other disciplines are not far ahead of German music psychology.
Regarding the publication of research materials, the studies by Eerola (2025), Giofrè et al. (2017, 2023), and the present study reported an increase over the last years (see Figure A1b). The regarded disciplines did not differ substantially, but compared to music psychology in general (Eerola, 2025), the JBDGM might have had a slightly higher prevalence of research materials during recent years (around 60%). The present study yielded similar results to the studies by Hardwicke et al. (2022) and Hardwicke, Wallach, et al. (2020) regarding the years 2014–2017 (see Figure A2b): Materials were only rarely shared. Compared to psychology publications in 2022 (Hardwicke et al., 2024), the JBDGM included a higher proportion of materials availability. All in all, the availability of research materials in the JBDGM seems to be similar to or slightly higher than the prevalence in similar disciplines.
The publication of data was more present in the JBDGM than in the JEP:G and PS (Giofrè et al., 2017, 2023; see Figure A1c) and music psychology in general (Eerola, 2025)—at least since 2018. Especially in English music psychology papers, the availability of data was still marginal (Eerola, 2025). Between 2014 and 2017, social sciences and psychology were a little ahead of the JBDGM regarding data publication (Hardwicke et al., 2022; Hardwicke, Wallach, et al., 2020; see Figure A2c). By 2022, the JBDGM showed better results than psychology (Hardwicke et al., 2024).
All in all, regarding preregistrations, the JBDGM is somewhat lagging behind current standards (as is English-language music psychology). Regarding open materials and data, however, it does not need to shy away from international and transdisciplinary comparison.
Funding and Conflict of Interest Statements
Funding and conflict of interest statements are viewed as additional indicators of open science practices. Comparisons are displayed in Figures A3 and A4 in the Appendix.
Both Eerola (2025) and the present study found very high prevalences of funding statements (see Figure A3a; disregarding whether the authors reported funding or not). Between 2014 and 2017, the JBDGM included fewer funding statements than the social sciences or psychology (Hardwicke et al., 2022; Hardwicke, Wallach, et al., 2020; see Figure A4a). By 2022, around 25% of psychology papers still did not include funding statements, whereas by that time funding statements had been rigorously employed in the JBDGM. All in all, the publication of funding statements has been widespread over the last years and has been employed in exemplary fashion in the JBDGM since 2018.
Conflict of interest statements were not as widespread in (English-language) music psychology as funding statements (Eerola, 2025; see Figure A3b), but the JBDGM has also included conflict of interest statements in all papers since 2018. Comparison to the studies by Hardwicke et al. (2022, 2024; Hardwicke, Wallach, et al., 2020; see Figure A4b) reveals the same pattern as for funding statements: In 2014–2017, the JBDGM was somewhat lagging behind psychology and the social sciences; by 2022, it had more than caught up with psychology publications. All in all, especially since 2018, the publication of conflict of interest statements in the JBDGM is exemplary, and although all considered fields have widely employed them over the last years, most have not yet reached full coverage.
Comparison Regarding New Statistics
p-Values, Confidence Intervals, and Effect Sizes
The studies by Giofrè et al. (2017, 2023) investigated the prevalence of new statistics between 2013 and 2020 and served as a template for the present study. Giofrè et al. coded whether at least one CI was reported and, separately, whether the authors referred to the CI in the text. For effect sizes, only the references in the text were coded. In the present study, we focused on coding whether a measure (CI or effect size) was reported, not whether it was interpreted in the text. This limits the possibility of comparison, since effect sizes were investigated differently. The results are compared to the present study; the comparison is displayed in Figure A5.
The prevalence of p-values showed similar results (see Figure A5a): Nearly all considered papers included at least one p-value. Confidence intervals were reported in the JBDGM about as often as in JEP:G (Giofrè et al., 2017, 2023, see Figure A5b). PS showed a slightly higher prevalence. All in all, the statistical approach reported in the JBDGM seems to be similar to JEP:G and PS between 2013 and 2020.
Sample Size Determination and Data Exclusion
Statements about how the sample size was determined were much rarer in the JBDGM than in JEP:G or PS (Giofrè et al., 2017, 2023, see Figure A6a). The reporting of data exclusion was only somewhat less prevalent in the JBDGM (see Figure A6b). To conclude, these statements representing rigorous research methods and transparency are less prevalent in the JBDGM.
Comparison Regarding the Average Number of Authors Per Paper
Anglada-Tort and Sanfilippo (2019) conducted a bibliometric analysis on three international music psychology journals. They reported the average number of authors per paper (see Table 6 in their paper, column “average TA [total authors] per document”) which will be compared to the results of the present study. Figure A7 displays the mean number of authors from the present study (black dots with error bars) and from Anglada-Tort and Sanfilippo (2019; triangles). Here, we calculated mean and standard errors for the JBDGM data instead of quartiles (as in Figure 6) to make the present data comparable to the results from Anglada-Tort and Sanfilippo. The years 2012–2017 were included in both studies and the data from Anglada-Tort and Sanfilippo (2019) showed a higher mean number of authors per paper than the present study, especially for the earlier years.
Results From Exploratory Analyses
In the preregistration, we noted that we wanted to look for associations between the measured variables and calculate a correlation matrix. At that point, we did not consider that several variables in this study have nominal scale levels (e.g., language, country of origin, study design, p-values, or open data). Therefore, it was not possible to calculate correlations.
Instead, a Multiple Correspondence Analysis (MCA; Greenacre & Blasius, 2006; Hjellbrekke, 2019; Husson et al., 2017; Le Roux & Rouanet, 2010) was conducted to explore latent structures in the dataset, visualising the results in a two-dimensional space. A detailed description can be found in the online material (file “10_Exploratory Analysis_Multiple Correspondence Analysis.pdf”). The MCA plot revealed an association between more recent volumes and a higher implementation of open science practices. Additionally, articles with more authors tended to demonstrate better open science and statistics practices, suggesting a possible link between collaborative efforts and progressive practices in research.
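A sketch of such an MCA using the FactoMineR package in R; the selected columns are illustrative assumptions, and the full analysis is documented in the supplementary file named above:

```r
library(FactoMineR)  # provides the MCA() function

dat <- read.csv("04_Jahrbuch_dataset.csv", stringsAsFactors = FALSE)

# MCA operates on categorical variables; the selected columns are illustrative
vars <- dat[, c("volume", "open_materials", "open_data",
                "funding_statement", "coi_statement")]
vars[] <- lapply(vars, factor)

res_mca <- MCA(vars, graph = FALSE)  # fit the multiple correspondence analysis
plot(res_mca, invisible = "ind")     # category map in the first two dimensions
```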
Discussion
Research Questions and Hypotheses
The present study aimed at gaining insight into the past practices in the Yearbook (first aim) regarding general bibliometric properties like authorship (RQ4) but focusing on the prevalence of open science practices (RQ1) and new statistics (RQ2).
The N = 79 research reports from Volumes 22 to 32 were mostly published in German, with a recent uptake of publications in English. The majority of articles were anchored at German institutions, with only a few stemming from Austria, Switzerland, or the United Kingdom. Three articles reported replications. Most articles reported observational empirical studies, while non-empirical articles became less frequent and experiments more frequent. The number of authors per paper increased from a median of 1 (Volume 22) to 2 (Volumes 24–32), with the distribution of the data indicating a positive tendency (H4.1 was confirmed).
Regarding open science, the practices of open materials, open data, and funding as well as conflict of interest statements improved over time (H1.1b and c as well as H1.2a and b were accepted). Only H1.1a was rejected: the practice of preregistration was not prevalent at any time. Regarding new statistics, p-values were prevalent at all times (H2.1 confirmed), indicating the popularity of NHST. Of the cornerstones of the new statistics, the prevalence of CIs did not change over time (H2.2a was rejected), while the reporting of effect sizes showed a slight increase (H2.2b was accepted). Statements regarding the determination of the sample size and the reporting of data exclusion became somewhat more prevalent over time—H2.3a and b were accepted.
Therefore, the present study gained detailed insights into the previous publication practices in the Yearbook (Aim 1) which can subsequently inform the development of the journal.
Comparison to Similar Studies
The second aim of the present study was to compare the JBDGM to international journals from related fields. RQ3 named the studies serving as primary comparison points.
Open science has gained global momentum across disciplines. While the JBDGM has not yet reported preregistrations, this is consistent with low or absent rates in comparable fields such as English-language music psychology (Eerola, 2025), social sciences (Hardwicke, Wallach, et al., 2020), and psychology (Hardwicke et al., 2022, 2024). However, since Volume 28, the JBDGM has shown an increase in open materials and data sharing—comparable to or even exceeding some other disciplines. Similarly, funding and conflict of interest statements, previously almost absent, have been included in all research reports since Volume 28. In contrast, other disciplines have shown more gradual progress and still often fall short of full reporting.
The prevalence of new statistics was also addressed by Giofrè et al. (2017, 2023), who noted the continued dominance of p-values in psychology—similar to the situation in the JBDGM. However, due to methodological differences, the prevalence of effect sizes and confidence intervals could not be directly compared. Statements about sample size determination and data exclusion were less common in the JBDGM than in the studies on psychological research by Giofrè et al.
The number of authors per paper in English-language music psychology journals (2012–2017; Anglada-Tort & Sanfilippo, 2019) was higher than in the JBDGM for the same period. This may reflect earlier trends toward larger collaborations, interdisciplinary teams, or complex research designs requiring more diverse expertise (Campbell & Simberloff, 2022).
In sum, the comparison reveals parallel trends and challenges. The JBDGM has been ahead in some areas (open data/materials, funding/conflict statements), especially since Volume 28, while in others (preregistration, reporting of sample size and exclusions) there is room to align more closely with international standards.
Generalisability, Limitations, and Further Research Questions
The present study provides detailed insights into publication practices in German-speaking music psychology between 2012 and 2024. As the JBDGM is the main outlet in this field2 and all its research reports from the examined period were included, the findings reflect not just a sample but nearly the full population of German-language music psychology publications. However, German researchers also publish in international, mostly English-language journals. As Anglada-Tort and Sanfilippo (2019) showed, they actively contribute to global discourse. Therefore, the results cannot be generalised to all publications by German researchers.
The JBDGM is a relatively small journal, resulting in a sample of only 79 research reports across more than 12 years. In contrast, studies by Giofrè et al. (2017, 2023), Hardwicke, Wallach, et al. (2020), Hardwicke et al. (2022, 2024), and Eerola (2025) included 200 to over 1000 papers. Accordingly, the present results are less robust, and annual averages may rely on only a few publications. Trends are harder to detect and easily biased by outliers.
While the research design (based largely on Giofrè et al., 2017, 2023) allows for precise observations of selected variables, the coding criteria were liberal. For example, reporting a single exact p-value was sufficient for a positive coding, regardless of whether all p-values were reported exactly or only, for instance, those above .05. The same applies to effect sizes and CIs. Similarly, for open materials, any available part (e.g., stimuli) sufficed, even if key elements (e.g., questionnaires, analysis scripts) were missing. Data were checked for plausibility, but not for completeness or reproducibility. These aspects must be considered when interpreting the results.
Based on these limitations, further research could refine and expand the current approach. Future studies might examine the completeness of shared materials and data or test computational reproducibility—whether reported results can be replicated. Reporting practices (e.g., p-values, effect sizes, CIs) and statistical methods could also be analysed in more detail (as conducted by Blanca et al., 2018, for psychology).
The present study used a stopping rule to avoid coding all volumes and instead go back only as far as necessary to identify the emergence of open science practices. Nonetheless, expanding the dataset to earlier volumes may yield additional findings. In the coming years, future volumes could be coded with the same or an expanded methodology to monitor ongoing developments—analogous to the sequential approach of Giofrè et al. (2017, 2023). Additionally, the study could be broadened by incorporating other music psychology journals into the investigation.
All in all, further studies might reveal more details about publication practices—be it in music psychology, in German publications, or focusing on the development over time.