DON'T WORRY, BE HAPPY - BUT ONLY SEASONALLY - SZYMON LIS MATEUSZ KIJEWSKI MICHAŁ WOŹNIAK MACIEJ WYSOCKI - Faculty of Economic Sciences, University ...

UNIVERSITY OF WARSAW
           FACULTY OF ECONOMIC SCIENCES

   Working Papers
      No. 12/2021 (360)

DON’T WORRY, BE HAPPY
– BUT ONLY SEASONALLY

    MATEUSZ KIJEWSKI
       SZYMON LIS
    MICHAŁ WOŹNIAK
    MACIEJ WYSOCKI

        Warsaw 2021
WORKING PAPERS 12/2021 (360)

Don’t Worry, Be Happy – But Only Seasonally

Mateusz Kijewski, Szymon Lis*, Michał Woźniak, Maciej Wysocki

University of Warsaw, Faculty of Economic Sciences
* Corresponding author: sm.lis@student.uw.edu.pl

Abstract: Current scientific knowledge allows us to assess the impact of socioeconomic variables
on musical preferences. The research methods in these studies were psychological experiments
and surveys conducted on small groups or analyzing the influence of only one or two variables at
the level of the whole society. Instead, inspired by an article in The Economist reporting that
February is the gloomiest month in terms of the music listened to, we created a dataset with many
different variables that allows us to build more reliable models than previous datasets did. We
used the Spotify API to create a monthly dataset with average valence for 26 countries for the
period from January 1, 2018, to December 1, 2019. Our study almost fully confirmed the effects
of summer, December, and number of Saturdays in a month and contradicted the February effect.
In the context of the index of freedom and diversity, the models do not show much consistency.
The influence of GDP per capita on the valence was confirmed, while the impact of the happiness
index was disproved. All models partially confirmed the influence of the music genre on the
valence. Among the weather variables, two models confirmed the significance of the temperature
variable. All in all, effects analyzed by us can broaden artists' knowledge of when to release new
songs or support recommendation engines for streaming services.

Keywords: valence, spotify, happiness, statistical panel analysis, explainable machine learning

JEL codes: C01, C23, I31

Working Papers contain preliminary research results. Please consider this when citing the paper. Please contact the
authors to give comments or to obtain a revised version. Any mistakes and the views expressed herein are solely those
of the authors.
Kijewski, M. et al. /WORKING PAPERS 12/2021 (360)                              1

1. Introduction

The popularity of online music via global streaming services made it possible to study the

similarities and differences in musical tastes between countries, the seasonality of listening to

different types of music, and the relationship between music trends and socioeconomic variables.

To begin with, we must consider the factors that influence people’s music preferences.
Listening to music is an inherently cultural behavior that can be shaped by users' backgrounds and
contextual characteristics, that is, by economic variables (e.g. Gross Domestic Product, GDP)
(Liu, Hu & Schedl, 2018), political conditions (e.g. Freedom of Expression, Rule of Law)
(Schedl et al., 2017), or weather conditions (e.g. average temperature, season, cloud cover,
or precipitation) (Lee & Lee, 2007). To go deeper into the topic, we need to understand why people
listen to music. As listening to music is a form of consumption, it is driven by the desire to
maximize pleasure and minimize pain, but in the case of music it is not that simple. People
generally tend to avoid negative emotional experiences. However, they can enjoy sadness portrayed
in music and other arts (Vuoskoski et al., 2011). This paradox is called “pleasurable sadness”, and
its clarification has puzzled music scholars for decades (Hospers, 1969; Levinson, 1997; Scherer,
2004). Data from streaming platforms now make it possible to approach this riddle empirically. In
2020, the article “Data from Spotify suggest that listeners are gloomiest in February” was
published; it can be summed up in one sentence: February is the month in which we listen to
the most depressive music (The Economist, 2020). The exceptions to this rule are three countries,
i.e., Chile, Paraguay, and Argentina. However, that article only presents the phenomenon itself,
without bringing us any closer to an explanation. This research tries not only to explain what
makes us listen to the least cheerful music in February, but also to explore other factors
influencing the choice of songs by their positivity.

         Musical preferences have been the subject of much sociological, psychological, and

economic research. Skowron et al. (2017) showed that we can reduce the error of prediction of the

popularity of genres using cultural and socioeconomic indicators such as GDP, income inequality,

agriculture’s share of the economy, unemployment rate, or life expectancy. Similar results were
obtained by Schedl et al. (2017) and Liu et al. (2018). Mellander et al. (2018) showed
that geographic differences in music preferences reflect underlying economic and political
divisions in American society. In agglomerations that are more affluent, better educated, more
densely populated, and more diverse (in terms of sexual and ethnic minorities), liberal tendencies
prevail and people prefer sophisticated and contemporary music, while in regions where people are
less privileged, less educated, more racially homogeneous, and more religious, they tend to be
conservative and prefer unpretentious and intense music. A similar pattern has been discovered at

the level of states, where the authors found that the geographic structure of music preference is

related to the key socioeconomic variables such as income, education, and occupation, as well as

political preferences expressed as voting patterns (Rentfrow, 2013).

         Lai (2004) found no significant relationship between musical preference and demographic
variables (age, gender, ethnicity, and educational level). Similar results were achieved
by Vlegels & Lievens (2017), with the exception that people over 65 years old have a much greater
interest in classical music than other groups. However, some research shows that variables such as
gender structure can improve the accuracy of prediction (Vigliensoni & Fujinaga, 2016; Roe,
1987).

         The listening patterns can be influenced by contextual factors such as an activity the listener

is involved in. Consequently, choices about listening to music can show some recurring time

patterns, such as certain days of the week. Predicting the listening day of a particular genre using

circular analysis was much more precise than chance (Herrera, Resa & Sordo,

2010; Baltrunas & Amatriain, 2009).

       Most young people report that they use music to improve their mood, especially when their
initial mood is already positive. However, some young people reported that their mood deteriorated
when they listened while feeling sad or stressed. Stressed young people were more likely to listen
to intense music and heavy metal, while reporting no more negative impact on their mood than from
any other music genre (McFerran et al., 2015; McFerran, 2016). Another study offers a completely
different view on this topic. Its results suggested that those in sad moods were not unfailingly
inclined to listen to sad songs, but rather were reluctant to listen to happy songs, apparently for
fear that the selection of such songs would seem inappropriate (Friedman, Gordis & Förster, 2012).
Yet another perspective suggests that musical preferences reflect mental health rather than
causing or affecting it. Some studies suggest that musical choices are related to students'
current academic successes or failures (Roe, 1987; Took & Weiss, 1994). Taken together, these
findings show that there is no scientific consensus in this area.

       Weather-related factors, such as the seasons or cloud cover, can shape people's musical
preferences; e.g., winter may sometimes isolate individuals and force them to adapt their way of
travel and dress to cope with the changing weather. The research of Pettijohn and Sacco (2009)
showed that more complex music, e.g., instrumental music, is preferred in winter. In summer, on
the other hand, people prefer dance music with an emphasis on rhythm, as found in the genres
of rap/hip-hop, soul/funk, and electronica/dance music (Rentfrow & Gosling, 2003). Incorporating
weather and temperature data into a recommendation system produced a model that outperformed
a comparison system using only the user’s demographics and behavioral
patterns (Kim et al., 2008; Lee & Lee, 2007).

       Based on the above results from the literature and the preliminary data analysis, we put

forward the following hypotheses:

   •   Hypothesis 1: The effect of summer is significant and positive in the model.
       Summertime and vacations are expected to positively influence people’s mood; hence they
       tend to listen to more happy songs.

   •   Hypothesis 2: The effect of December (Christmas) is significant and has a positive effect on
       the valence. Christmas is a special time around the world, especially
       considering the popularity of Christmas songs, which are full of happiness and love.

   •   Hypothesis 3: The February effect is irrelevant in the model. February is not a month
       with any holidays or spikes; thus, we do not expect any difference between February and
       other ordinary months.

   •   Hypothesis 4: The effect of the political environment is important in the model. We
       assume that a high level of democratization, rule of law, civil liberties, freedom of religion,
       freedom of speech and artistic expression will be positively related to the level of valence.
       Conversely, the relationship with state corruption should be negative.

   •   Hypothesis 5: An unfavorable socio-economic environment, expressed by GDP per
       capita and the Happiness Index, has a negative impact on valence. It is expected that sad
       music is chosen by people who are in a difficult financial situation and that happy songs are
       listened to by cheerful and peaceful people.

   •   Hypothesis 6: The genre of music significantly influences valence, i.e., the
       variables describing trends in listening are statistically significant. In general, some
       music genres are happier than others.

   •   Hypothesis 7a: Weather that forces people to stay at home negatively affects valence,
       i.e., the variable describing cloudiness of the sky will be significant and will have a negative
       impact on valence, and the temperature will have a positive impact on valence. It is
       expected that current music preferences are affected by the weather. Based on the literature,
       people are more likely to listen to sad music alone than in groups.

   •   Hypothesis 7b: Weather that forces people to stay at home negatively affects valence, i.e., the
       variable assigning countries to specific geographical regions will be statistically significant
       and will have negative values for Western Europe, Northern America, Eastern Europe,
       Northern Europe, Eastern Asia, and Western Asia, and positive values for Latin
       America and the Caribbean and Southern Europe.

   •   Hypothesis 8: The results show an effect of more Saturdays per month, i.e., a month
       with five Saturdays has a positive impact on the dependent variable. Saturdays are associated
       with choosing music with a more positive vibe, because people are expected to relax over the
       weekend and most parties are organized on Saturdays. Thus, the difference between two
       months, one with four Saturdays and the other with five, should be visible.

       We believe that this article extends the past literature on this topic by using data on the
aggregated choices of individuals together with many variables describing the current status of the
country. What is more, the results can offer useful guidance for the music business, e.g., radio
stations or playlist makers. They can select songs by their positivity using our analysis, which may
lead to higher popularity of the radio station or the playlist. The rest of this paper is organized
as follows. Section 2 introduces the data set and applied models. Section 3 presents the results.
Section 4 consists of a conclusion and an outlook on potential future research areas.

2. Empirical analysis

2.1 Data and variables

Earlier research on the impact of socioeconomic variables on musical preferences has focused
mainly on checking whether introducing new information improves the accuracy of predicting
which songs will be listened to. We, in contrast, focus on explaining the trends in listening to
songs with different positivity across months and countries. We created a dataset with many
variables chosen based on the knowledge gathered from the literature. We believe that this large
dataset allows us to obtain more reliable models than previous datasets. We used the Spotify API
to create the monthly average valence dataset for 26 countries for the period from 1 January 2018
to 1 December 2019. Valence describes the positivity of a song. High valence songs sound more
positive (happy, cheerful, euphoric), while low valence songs sound more negative (sad,
depressive, intense). To extend our dataset, we added monthly aggregated search indices from
Google Trends for all the countries, describing trends in music genres (i.e. rap, house, pop, rock,
and classical music). To describe the democratic situation of the countries, we used the Varieties
of Democracy (V-Dem) Project (Coppedge et al., 2020). To explain how the diversity of ethnic or
religious groups affects the selection of songs based on positivity, we gathered data from
Fractionalization research (Alesina et al., 2003).

       For each country we collected 24 variables regarding socio-economic issues, weather and
calendar data aggregated to the monthly level. The descriptions of these features, together with
their basic statistics, are summarized in Table 1. It contains the mean, standard deviation (in
brackets), minimum, 25th, 50th, and 75th quantiles, and maximum. Our dependent
variable ranges from 0.42 to 0.65. From quantiles and the maximum value, we may conclude that

the right tail of the distribution is fat, i.e., the distribution exhibits right skewness and/or high
kurtosis. The mean is equal to 0.4939 and is slightly higher than the median (0.487).

Table 1. Description and summary statistics for variables

                                                                                            25th       50th       75th
Variable       Description                                 Mean (sd)             Min        Quantile   Quantile   Quantile   Max
Valence        A measure from 0 to 1 describing the        0.4939 (0.041)        0.420      0.468      0.487      0.509      0.652
               musical positiveness conveyed by a track
HI_score       A happiness index from World Happiness      6.7140 (0.72)         5.287      6.180      6.908      7.332      7.769
               Report
Gdp            A GDP per capita, resampled from            48089.61 (16391.56)   9126.600   36042.525  49640.600  58023.300  89936.300
               quarterly to monthly
Dancing days   A variable with a value of 1 if a given     0.3333 (0.4718)       0.000      0.000      0.000      1.000      1.000
               month had 5 Saturdays, and a value of 0
               if it had 4 Saturdays
Ethnic_frac    An ethnic fractionalization describing      0.2270 (0.1909)       0.012      0.103      0.126      0.322      0.712
               probability of not belonging to the same
               ethnic group
Ling_frac      A linguistic fractionalization describing   0.2090 (0.1909)       0.000      0.053      0.146      0.323      0.577
               the probability of not belonging to the
               same linguistic group
Relig_frac     A religious fractionalization describing    0.3926 (0.2275)       0.000      0.205      0.353      0.608      0.824
               the probability of not belonging to the
               same religious group
Classical      A proportion of searches for classical      0.4026 (0.2686)       0.000      0.209      0.343      0.577      1.000
               music on YouTube to all music categories
               (Pop, Rock, Rap, House, Classical),
               download and prepared from Google Trends
Pop_music      A proportion of searches for pop music on   0.3882 (0.2868)       0.000      0.156      0.315      0.589      1.000
               YouTube to all music categories, download
               and prepared from Google Trends
Rap            A proportion of searches for rap music on   0.4306 (0.2770)       0.000      0.205      0.399      0.622      1.000
               YouTube to all music categories, download
               and prepared from Google Trends
Rock           A proportion of searches for rock music on  0.5041 (0.2566)       0.000      0.339      0.492      0.678      1.000
               YouTube to all music categories, download
               and prepared from Google Trends
House          A proportion of searches for house music    0.4517 (0.2716)       0.000      0.246      0.431      0.661      1.000
               on YouTube to all music categories,
               download and prepared from Google Trends
Sky_log        A logarithm of percent of the sky hidden    3.5542 (0.3381)       0.338      3.286      3.522      3.839      4.372
               behind the clouds, values from 0 to 100
Sun_hrs        A monthly sum of sunshine hours             164.2558 (80.1069)    5.000      99.000     162.050    222.750    363.000
Temperature    An average monthly temperature in           53.6035 (13.99)       16.245     42.788     53.598     64.920     83.591
               Fahrenheit
v2clacfree     A freedom of academic and cultural          2.2282 (1.1238)       -2.209     2.182      2.567      2.875      3.212
               expression. Ordinal converted to interval
               in the original dataset.
v2clrelig      A freedom of religion indicating to what    1.6816 (0.7078)       -0.661     1.398      1.726      2.090      2.800
               extent individuals are free to choose and
               practice their religions. Ordinal
               converted to interval in the original
               dataset.
v2x_corr       A political corruption index related to     0.1430 (0.1992)       0.002      0.021      0.056      0.202      0.765
               frequency of briberies and embezzlements.
               Interval from low to high (0-1)
v2x_polyarchy  A categorical variable indicating to what   0.8156 (0.1407)       0.279      0.826      0.870      0.881      0.913
               extent the electoral democracy applies in
               the country. Interval from low to high
               (0-1)
v2x_rule       A rule of law indicator, indicating         0.8973 (0.1773)       0.201      0.891      0.968      0.987      0.999
               independence, transparence, equality in
               law enforcements, and if actions of the
               government in line with the law. Interval
               from low to high (0-1)
v2xcl_disc     A freedom of discussion index indicating    0.9041 (0.1646)       0.120      0.908      0.953      0.975      0.987
               liberty of press and media, privilege to
               publicly discuss the political issues and
               liberty of academic and cultural
               discourse. Interval from low to high (0-1)
v2xcl_prpty    Rights to private property. Interval from   0.9001 (0.1021)       0.422      0.908      0.930      0.942      0.971
               low to high (0-1)

Source: Own calculations.
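
The Dancing days indicator described in Table 1 can be rebuilt from the calendar alone. The following is a minimal sketch rather than the code used for this paper; the function name `has_five_saturdays` and the dictionary `dancing_days` are illustrative assumptions.

```python
import calendar

def has_five_saturdays(year: int, month: int) -> int:
    """Return 1 if the given month contains five Saturdays, else 0."""
    weeks = calendar.monthcalendar(year, month)
    # Each week is a list of 7 day numbers (Monday first); 0 marks days outside the month.
    saturdays = sum(1 for week in weeks if week[calendar.SATURDAY] != 0)
    return int(saturdays == 5)

# Indicator for the 24 months of the sample period (2018-2019).
dancing_days = {
    (year, month): has_five_saturdays(year, month)
    for year in (2018, 2019)
    for month in range(1, 13)
}
share = sum(dancing_days.values()) / len(dancing_days)
```

For 2018-2019 this marks 8 of the 24 months, which is consistent with the mean of 0.3333 reported for Dancing days in Table 1.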

        For the Sky, Temperature and Valence variables, we encountered a few missing observations
for Turkey and the Czech Republic, which were replaced with the average for the country's subregion
group. A few variables had yearly or quarterly frequency (GDP, the Happiness index, and the
fractionalization and political (v2) variables); for these we replaced missing observations with
the last known value. Additionally, we used a min-max scaler for the trends in music genres.
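
The three preprocessing steps described above can be sketched with pandas as follows. This is an illustration on a made-up toy panel, not the code used for this paper: the column names and values are hypothetical, and two details are assumptions, namely that the subregion average is taken within the same month and that the min-max scaling is applied over the whole sample.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel; names and values are illustrative only.
df = pd.DataFrame({
    "country":     ["CZ", "CZ", "PL", "PL", "HU", "HU"],
    "subregion":   ["Eastern Europe"] * 6,
    "month":       pd.to_datetime(["2018-01-01", "2018-02-01"] * 3),
    "temperature": [np.nan, 30.0, 28.0, 31.0, 32.0, 35.0],
    "gdp":         [100.0, np.nan, 90.0, np.nan, 80.0, np.nan],  # lower-frequency series
    "rap":         [10.0, 20.0, 40.0, 30.0, 25.0, 15.0],
})
df = df.sort_values(["country", "month"]).reset_index(drop=True)

# 1) Replace missing weather values with the subregion average (assumed: same month).
subregion_mean = df.groupby(["subregion", "month"])["temperature"].transform("mean")
df["temperature"] = df["temperature"].fillna(subregion_mean)

# 2) Carry the last known value forward within each country.
df["gdp"] = df.groupby("country")["gdp"].ffill()

# 3) Min-max scale the genre trend to the [0, 1] interval (assumed: over the whole sample).
df["rap"] = (df["rap"] - df["rap"].min()) / (df["rap"].max() - df["rap"].min())
```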

        One variable, Subregion, which assigns a country to a given region, was not described in

the table due to its categorical character. We identified 8 regions, i.e. Western Europe (6 countries),

Northern America (Canada and USA), Eastern Europe (4 countries), Northern Europe (7

countries), Southern Europe (4 countries), Eastern Asia (Japan), Latin America and the Caribbean

(Mexico), and Western Asia (Turkey).

        The probability density function of Valence was estimated using a Kernel Density Estimate
(KDE). The estimation results along with the histogram are presented in Figure 1. The part of the
distribution from 0.40 to 0.55 resembles the normal distribution. However, the right tail is fat.
To understand the reasons behind this phenomenon, the valence values above 0.55 were analyzed.
There were 53 such observations, i.e., about 8.5% of the 624 country-month observations. Most of
them come from Spain and Mexico. Importantly, all 24 monthly observations for Mexico exceeded this
threshold, as did almost all observations for Spain (21 out of 24). In addition, six observations
come from Japan and two from Finland.
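
A kernel density estimate of this kind can be sketched in a few lines. The kernel and bandwidth used for Figure 1 are not stated in the text, so the Gaussian kernel and Silverman's rule of thumb below are assumptions, and the sample is synthetic data that only mimics a bulk near 0.48 with a fatter right tail.

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Silverman's rule of thumb (an assumed default, not taken from the paper).
        q75, q25 = np.percentile(samples, [75, 25])
        sigma = min(samples.std(ddof=1), (q75 - q25) / 1.34)
        bandwidth = 0.9 * sigma * n ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)

# Synthetic stand-in for the 624 valence observations (not the real data).
rng = np.random.default_rng(0)
valence = np.concatenate([rng.normal(0.48, 0.03, 570), rng.normal(0.60, 0.02, 54)])

grid = np.linspace(0.30, 0.80, 501)
density = gaussian_kde(valence, grid)
area = float(density.sum() * (grid[1] - grid[0]))  # should be close to 1
```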

Figure 1. Kernel Density Estimate (KDE) plot with histogram for Valence

Source: Own calculations.

        Figure 2 shows how valence changes over time. December stands out here in both years.
Therefore, we can expect the hypothesis for this effect to be confirmed. For 2019, the summer
effect may be noticeable, but for 2018 it is not very visible. What is more, Figure 2 does not
show February standing out with lower valence, as it has levels similar to the neighboring
months, January and March.

Figure 2. Valence probability density over time, ordered from the most recent (at the top)
to the oldest (at the bottom)

Source: Own calculations.

        Figure 3 shows how the values of valence percentiles change over time. Here, the December
effect is also clearly visible; interestingly, it appears not only at the average level but
also in every percentile. The summer effect (July and August) is clearly visible as well, although,
as expected, it is much more pronounced in 2019. In 2018, the effect was
observable only for the 90th percentile. Thus, countries that listen to happy music for most of the
year listen to even happier music in summer, in comparison to other countries.

Figure 3. Quantile trend plot for Valence

Source: Own calculations.

        Figure 4 shows the valence in individual regions. Southern Europe and Latin America and
the Caribbean stand out clearly from the other regions. In both regions, Spotify users listen to
much more positive music.

Figure 4. Valence in particular subregions

Source: Own calculations.

        Figure 5 shows the correlation between the variables. The most closely related variables are
the political variables, i.e. v2clacfree, v2clrelig, v2x_corr, v2x_rule, v2xcl_prpty, v2x_polyarchy,
and v2xcl_disc. The correlation between valence and the explanatory variables can be assessed as
moderate (it ranges from 0.2 to 0.5).

Figure 5. Correlation between the variables used in the research

Source: Own calculations.

2.2 Empirical design

2.2.1 Panel Data Regression Model

The data used in this research is a panel with 26 countries serving as groups and 24 monthly

observations for each variable, hence the most obvious choices are panel data regression models

with fixed effects and random effects. To determine the proper estimator, we used the Hausman
test, whose null hypothesis points towards using the random effects estimator, while the
alternative hypothesis indicates that the random effects estimates are inconsistent and hence that
the fixed effects estimator should be chosen. The Hausman test rejected the null hypothesis, and
hence the fixed effects (FE) panel regression is used. In order to arrive at a set of significant
variables for the regression analysis, we applied the General-to-Specific modelling procedure
(Campos, Ericsson & Hendry, 2005), which consists of iteratively estimating the model, dropping
the variable with the highest p-value in the significance test, and testing the joint hypothesis
of insignificance of the dropped variables.
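
The General-to-Specific loop described above can be sketched as follows for a fixed effects regression. This is a simplified illustration, not the estimation code behind the results: the data are simulated, the 5% significance threshold is an assumption, and all function and variable names are hypothetical. The fixed effects are handled by the within transformation (subtracting country means), and the degrees of freedom are reduced by the number of absorbed country effects.

```python
import numpy as np
from scipy import stats

def within_transform(a, groups):
    """Subtract group means (the fixed effects 'within' transformation)."""
    out = np.asarray(a, dtype=float).copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= out[mask].mean(axis=0)
    return out

def ols(y, X, absorbed=0):
    """OLS returning two-sided p-values, residual sum of squares, and degrees of freedom."""
    n, k = X.shape
    dof = n - k - absorbed
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    cov = (rss / dof) * np.linalg.inv(X.T @ X)
    t_stats = beta / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t_stats), dof), rss, dof

def general_to_specific(y, X, names, absorbed=0, alpha=0.05):
    """Backward elimination with a joint F-test on all variables dropped so far."""
    k_full = X.shape[1]
    _, rss_full, dof_full = ols(y, X, absorbed)
    keep = list(range(k_full))
    while keep:
        pvals, _, _ = ols(y, X[:, keep], absorbed)
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha:
            break  # every remaining regressor is individually significant
        candidate = keep[:worst] + keep[worst + 1:]
        rss_r = ols(y, X[:, candidate], absorbed)[1] if candidate else float(y @ y)
        q = k_full - len(candidate)  # number of variables dropped so far
        f_stat = ((rss_r - rss_full) / q) / (rss_full / dof_full)
        if stats.f.sf(f_stat, q, dof_full) < alpha:
            break  # the dropped variables are jointly significant; stop
        keep = candidate
    return [names[i] for i in keep]

# Simulated panel: 26 groups x 24 periods, two relevant and two irrelevant regressors.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(26), 24)
X_raw = rng.normal(size=(624, 4))
effects = rng.normal(size=26)[groups]
y_raw = 1.0 * X_raw[:, 0] - 0.5 * X_raw[:, 1] + effects + rng.normal(scale=0.5, size=624)

names = ["x0", "x1", "noise1", "noise2"]
selected = general_to_specific(
    within_transform(y_raw, groups), within_transform(X_raw, groups), names, absorbed=26
)
```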

2.2.2 Dynamic Panel Data Regression Model

Dynamic panel data regression models are used in cases where the autoregressive process of the
dependent variable is significant, so that its future values depend on the past. To check this
assumption, we tested the significance of an AR(1) process of the dependent variable. The results
strongly rejected the null hypothesis of insignificance and hence indicated that the autoregressive
term is not redundant in explaining the regressand. Therefore, we concluded that it is necessary to
include the lagged values of the dependent variable in the panel. The rationale behind this model
is also the persistence of music taste: people generally tend to listen to a specific type of music
for a longer period and to come back to songs they recently enjoyed. In such a case, using a fixed

effects regression model leads to Nickell bias and the estimated coefficients will be
inaccurate, especially in panels with small T and large N. This bias arises because the fixed
effects (within) transformation subtracts the individual mean from each observation, which, once
the lagged dependent variable is included as a regressor, introduces correlation between the
regressors and the error term. Since the panel used in this research is relatively short, as it
consists of only 24 monthly observations, we decided to use the Arellano and Bover / Blundell and
Bond system estimator, which is designed for dynamic panel data with a short time dimension.
Consistently with the previous panel regression model, the General-to-Specific modelling approach
was applied to select the set of significant variables.
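
The Nickell bias can be illustrated with a small simulation. The sketch below is not part of the empirical analysis: it simulates a pure AR(1) panel with individual effects, with an assumed autoregressive coefficient of 0.5 and hypothetical function names, and estimates the coefficient with the fixed effects (within) estimator for several panel lengths T.

```python
import numpy as np

def within_ar1_estimate(n_units, n_periods, rho, rng, burn_in=50):
    """Simulate y_it = rho * y_i,t-1 + a_i + e_it and estimate rho by fixed effects."""
    a = rng.normal(size=n_units)                      # individual effects
    y = np.zeros((n_units, n_periods + burn_in + 1))
    for t in range(1, y.shape[1]):
        y[:, t] = rho * y[:, t - 1] + a + rng.normal(size=n_units)
    y = y[:, burn_in:]                                # drop the burn-in periods
    y_lag, y_cur = y[:, :-1], y[:, 1:]                # n_periods usable observations per unit
    # Within transformation: subtract each unit's time mean.
    y_lag_d = y_lag - y_lag.mean(axis=1, keepdims=True)
    y_cur_d = y_cur - y_cur.mean(axis=1, keepdims=True)
    return float((y_lag_d * y_cur_d).sum() / (y_lag_d**2).sum())

rng = np.random.default_rng(42)
true_rho = 0.5
estimates = {
    T: within_ar1_estimate(n_units=2000, n_periods=T, rho=true_rho, rng=rng)
    for T in (5, 24, 200)
}
```

In this setup the estimate lies below the true coefficient for every T, and the gap shrinks as T grows, even though the number of units is large. This is why a large N alone does not remove the bias in a short panel.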

2.2.3 CatBoost model and Explainable Artificial Intelligence

To analyze the research problem in depth, we also applied the CatBoost model (Prokhorenkova et

al., 2017) in its classic regression form. That is, we entered panel data into the model, and the

machine learning estimator treats them as cross-sectional data. Importantly, we have chosen not to

consider the specificity of the time series in this model in order to simplify the estimation and

statistical inference process.

       The biggest advantage of the boosted trees model in our context is that it does not assume
a linear functional form; thus it can handle highly non-linear interactions in the data. We are
aware that a manual search for an appropriate polynomial or power functional form for a linear
panel approach such as the fixed-effects model usually fails due to the vast space of possible
solutions. What is more, the boosting schemes applied in CatBoost allowed us to control variance
(overfitting) in a responsible way. In addition, CatBoost handles high-cardinality categorical
variables well (we have such variables in the analysis). Importantly, the CatBoost model
interpretation is not as trivial as for FE or DPD.
Kijewski, M. et al. /WORKING PAPERS 12/2021 (360)                                16

However, it is feasible with techniques such as feature importance and feature effects powered by

SHapley Additive exPlanations (Lundberg & Lee, 2017).

       Our CatBoost modelling process was relatively straightforward. We searched for the best hyperparameters using a grid search with 5-fold cross-validation and the following setup (based on our experience): depth [2, 3, 4, 5, 6, 7], learning rate [0.01, 0.05, 0.1, 0.25, 0.5], iterations [50, 100, 150, 200, 250, 300]. During this process, our evaluation metric was the root mean squared error.
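To give a sense of the size of this search, the sketch below enumerates the grid listed above, splits a dataset into five folds, and defines the RMSE metric. It uses only the Python standard library and does not fit any model. The row count of 624 (26 countries times 24 months, with no missing rows) and the contiguous, unshuffled fold assignment are assumptions made for illustration, not details reported by the authors.

```python
from itertools import product
from math import sqrt

# Search space exactly as listed in the text.
param_grid = {
    "depth": [2, 3, 4, 5, 6, 7],
    "learning_rate": [0.01, 0.05, 0.1, 0.25, 0.5],
    "iterations": [50, 100, 150, 200, 250, 300],
}
N_FOLDS = 5

def grid_candidates(grid):
    """Yield every hyperparameter combination as a dict."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

def k_fold_indices(n_rows, n_folds):
    """Split row indices 0..n_rows-1 into contiguous validation folds."""
    base, extra = divmod(n_rows, n_folds)
    folds, start = [], 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def rmse(actual, predicted):
    """Root mean squared error, the metric used to rank candidates."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

candidates = list(grid_candidates(param_grid))
n_fits = len(candidates) * N_FOLDS
folds = k_fold_indices(n_rows=624, n_folds=N_FOLDS)  # 624 rows is an assumption

print(len(candidates), "candidates,", n_fits, "model fits")
print([len(fold) for fold in folds])
```

Each candidate would be trained on four folds and scored with `rmse` on the fifth, and the combination with the lowest average score would be kept.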

       Feature importance techniques enable us to assess the relevance of a given variable across the whole model and its contribution to the model's predictive power. SHapley Additive exPlanations (SHAP) is a game-theoretic approach to explaining the output of any machine learning model. As we focused on summarizing the effects of all the features, we used the SHAP summary plot. It sorts features by the sum of SHAP value magnitudes over all samples and uses SHAP values to show the distribution of the impact each feature has on the model output. The color represents the feature value (red for high, blue for low). What is more, we used the SHAP Partial Dependence Plot (and the 2D Partial Dependence Plot) to examine the overall effect of a single feature (or of two features) across the whole dataset. Plots of this kind show how the dependent variable changes as the independent variable changes.
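The idea behind Shapley-based attributions can be shown on a very small example. The sketch below computes exact Shapley values by enumerating all feature coalitions for a hypothetical three-feature model. The model, its coefficients, and the input and baseline values are invented for illustration, and features outside a coalition are simply set to a single baseline value. Libraries such as SHAP rely on much faster tree-specific algorithms, so this is not the computation used in the study.

```python
from itertools import combinations
from math import factorial

def toy_model(temperature, rap, xmas):
    """Hypothetical valence model with one interaction term (illustration only)."""
    return 0.45 + 0.002 * temperature - 0.05 * rap + 0.02 * xmas + 0.001 * temperature * xmas

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        args = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(**args)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                total += weight * (value(set(subset) | {name}) - value(set(subset)))
        phi[name] = total
    return phi

x = {"temperature": 20.0, "rap": 1.0, "xmas": 1.0}
baseline = {"temperature": 10.0, "rap": 0.0, "xmas": 0.0}
phi = shapley_values(toy_model, x, baseline)

# Additivity: contributions sum to the prediction minus the baseline prediction.
gap = toy_model(**x) - toy_model(**baseline)
print(phi, round(sum(phi.values()), 6), round(gap, 6))
```

The additivity check at the end is the property that lets a summary plot read each SHAP value as a feature's share of the deviation from the baseline prediction.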

3. Results

3.1 Fixed effects panel model

After applying the General-to-Specific approach, the final fixed effects model has 11 independent variables, of which 10 are significant at the 0.1 level or better and 8 at the 0.05 level or better. The coefficients, standard errors, and significance levels are reported in Table 2.

Table 2. Coefficients, standard errors, and p-values for Panel Data Regression Models

 Variable          Model 1           Model 2           Model 3

 Temperature       0.0002            0.00012           -
                   (0.0002)          (0.0001)
 Gdp               0.000003*         0.000003*         0.000003*
                   (1.14e-06)        (1.16e-06)        (1.12e-06)
 HI_score          -0.03021          -0.02816          -0.033 .
                   (0.019)           (0.0191)          (0.0197)
 V2clrelig         0.01314*          0.0139*           0.014854*
                   (0.0063)          (0.0058)          (0.006)
 V2x_corr          -0.35842***       -0.33093***       -0.33766***
                   (0.0466)          (0.0559)          (0.0556)
 V2xcl_prpty       0.13198           -                 -
                   (0.0975)
 Classical         0.00675 .         0.00733 .         0.00636
                   (0.0037)          (0.0039)          (0.00397)
 House             0.00016***        0.011896***       0.01355***
                   (0.0032)          (0.00316)         (0.0031)
 Rap               -0.01212***       -0.01446***       -0.01586***
                   (0.0041)          (0.00379)         (0.00343)
 Pop_music         0.0068129         0.0078 .          0.008 .
                   (0.0045155)       (0.00459)         (0.0048)
 Dancing_days      0.00381***        0.00376***        0.00435***
                   (0.0005)          (0.0005)          (0.00055)
 Summer            0.00787***        0.00804***        0.0104***
                   (0.00124)         (0.00127)         (0.00169)
 Xmas              0.01584***        0.015869***       0.0139***
                   (0.0035)          (0.00346)         (0.00344)

Source: Own calculations.

         This model strongly confirms hypotheses 1, 2, and 3: the summer and December effects are significant, while the February effect is not. Hypothesis 4 has been partially confirmed: only the variables for freedom of religion (v2clrelig) and political corruption (v2x_corr) are statistically significant. Hypothesis 5 has been confirmed. The variable gdp is significant and positive, but surprisingly HI_score is not significant. This may be caused by correlation between HI_score and gdp. Hypothesis 6 has been partially confirmed. The impact of house, rap, and pop has been confirmed: the effects of house and pop music are positive, while the effect of rap is negative. Hypotheses 7a and 7b have been fully rejected, as the variables temperature and sky are not statistically significant. Hypothesis 8 has been confirmed: the variable dancing_days is significant and positive.

3.2 Arellano and Bover / Blundell and Bond system estimator

Table 3. Coefficients, standard errors, and p-values for Arellano and Bover / Blundell and Bond
system estimator

 Variable            Model 1            Model 2            Final model
                     A-B / B-B          A-B / B-B          A-B / B-B

 Lag.Valence         0.6280***          0.632***           0.623***
                     (0.0276)           (0.027)            (0.0264)
 Temperature         0.00038***         0.0004***          0.00036***
                     (0.000055)         (0.00004)          (0.0004)
 Gdp                 -7.90e-07***       -7.69e-07**        -7.61e-07**
                     (3.13e-07)         (3.14e-07)         (3.31e-07)
 HI_score            0.0232***          0.023***           0.0225***
                     (0.0056)           (0.0056)           (0.0056)
 V2clrelig           0.0139***          0.0135***          0.01425***
                     (0.0033)           (0.0034)           (0.0033)
 V2x_polyarchy       0.1511***          0.15***            0.1495***
                     (0.0421)           (0.042)            (0.0422)
 V2xcl_disc          -0.1810***         -0.178***          -0.1795
                     (0.035)            (0.035)            (0.035)
 Log_sky             0.0041             0.004              -
                     (0.0037)           (0.03)
 House               0.0071***          0.007***           0.0071***
                     (0.0018)           (0.0018)           (0.0018)
 Rap                 -0.0028*           -0.0029*           -0.0032**
                     (0.0018)           (0.0019)           (0.0018)
 Pop_music           0.00921***         0.0091***          0.0094***
                     (0.0017)           (0.0017)           (0.0017)
 Dancing_days        0.0047***          0.0046***          0.0045***
                     (0.00077)          (0.00078)          (0.0007)
 Summer              0.0011             -                  -
                     (0.0011)
 Xmas                0.022***           0.022***           0.022***
                     (0.0014)           (0.0014)           (0.001)
 Sun_hours           0.000024***        0.000023***        0.00002***
                     (0.0000065)        (0.0000065)        (0.0000068)
 Ethnic_fraction     0.1546***          0.1545***          0.1578***
                     (0.0253)           (0.0254)           (0.0253)
 Lingual_fraction    -0.228***          -0.227***          -0.228***
                     (0.03599)          (0.036)            (0.218)
 Relig_fraction      0.089***           0.088***           0.087***
                     (0.0212)           (0.022)            (0.0218)

Source: Own calculations.

        Table 3 presents the results of the Arellano and Bover / Blundell and Bond system estimator on dynamic panel data. The table shows only the last three iterations of the General-to-Specific approach. All independent variables were significant at the 5% significance level in our final model. The autoregressive term was significant, which supports the use of Dynamic Panel Data estimators. The retention rate for valence is almost 63%. Temperature and the number of sun hours have a positive impact on valence, which confirms the 7th hypothesis that bad weather forces people to stay at home, which can lead to lower valence levels. In contrast to the FE model, GDP has a negative impact on valence. In countries with a favorable political environment (v2clrelig, v2x_polyarchy) we can expect higher valence. Variables describing music genre trends are also statistically significant for house, rap, and pop music. Only rap music has a negative impact on valence, which is in line with the intuition that rap music tends to be negative. The number of Saturdays in a month has a positive impact on the dependent variable, which suggests that Saturday itself has a significant impact. Next, we checked whether the second and third hypotheses were confirmed. The February dummy was insignificant, which confirms the third hypothesis. We also observed a positive and significant impact of the Christmas dummy; however, the variable flagging summer was insignificant.
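The "retention rate" reading of the autoregressive coefficient can be made concrete with a short calculation. Taking the final-model coefficient on lagged valence (0.623) from Table 3 as given, the sketch below computes how much of a one-off shock to valence remains after k months, the implied half-life, and the long-run multiplier 1 / (1 - rho) for a permanent change in a regressor. These are standard AR(1) quantities derived here for illustration, not figures reported by the authors.

```python
from math import log

rho = 0.623  # coefficient on Lag.Valence, final model in Table 3

# Share of a one-off shock that is still present after k months.
remaining = {k: rho ** k for k in range(1, 7)}

# Months until half of the shock has faded: rho ** h = 0.5.
half_life = log(0.5) / log(rho)

# Cumulative effect of a permanent one-unit change in a regressor,
# expressed as a multiple of its short-run coefficient.
long_run_multiplier = 1.0 / (1.0 - rho)

for k, share in remaining.items():
    print(f"after {k} month(s): {share:.3f}")
print(f"half-life: {half_life:.2f} months")
print(f"long-run multiplier: {long_run_multiplier:.2f}")
```

Under this reading, a shock to valence loses half of its effect in roughly a month and a half.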

        To verify the selection of the instruments, we ran the Arellano-Bond test for serial correlation. The test supports the form of the model: as expected, it rejects the null hypothesis of zero first-order autocorrelation in the first-differenced errors, while for second-order autocorrelation we cannot reject the null hypothesis at the 5% significance level (p-value is equal to 0.861).
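The pattern expected in the Arellano-Bond test follows from first-differencing. As a short derivation, assume that the level errors $\varepsilon_{it}$ are serially uncorrelated with constant variance $\sigma^2$ (this notation is introduced here for illustration and is not taken from the paper):

```latex
\begin{align*}
\Delta\varepsilon_{it} &= \varepsilon_{it} - \varepsilon_{i,t-1}, \\
\operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1})
  &= \operatorname{Cov}(\varepsilon_{it} - \varepsilon_{i,t-1},\; \varepsilon_{i,t-1} - \varepsilon_{i,t-2})
   = -\sigma^2, \\
\operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-2})
  &= \operatorname{Cov}(\varepsilon_{it} - \varepsilon_{i,t-1},\; \varepsilon_{i,t-2} - \varepsilon_{i,t-3})
   = 0.
\end{align*}
```

First-order autocorrelation in the differenced errors is therefore built in, while significant second-order autocorrelation would point to serial correlation in the level errors and cast doubt on the lagged instruments.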

3.3 CatBoost model

Our final CatBoost model uses 22 explanatory variables. Based on cross-validation (described in the methodology subsection), we set the following hyperparameter values: 250 iterations, learning rate 0.1, and tree depth 3. The general results of the model, obtained using the SHAP Summary Plot, are presented in Figure 6. It clearly shows that variables such as subregion, rock, GDP, rap, month of the year, house, and temperature are the most important for this discriminative model. We see that the variables generally affect the model’s output in the expected way. To be more specific, we analyze SHAP Partial Dependence Plots for the exogenous features. These plots are shown in Figure 7 (note that the vertical axes use different scales).

Figure 6. SHAP Summary Plot based on CatBoost model

Source: Own calculations.

       Let us first analyze the influence of music genres. We can conclude that the greater the popularity of pop, house, and rock in a country, the greater the valence. The effect of rap is strictly negative, while the popularity of classical music has a measurable positive effect on the expected value of the target variable only from a certain point onwards. In the case of meteorological variables, temperature has a clearly monotonic positive effect on valence, and the logarithm of sky cloudiness is not relevant to the model at all. The effect of sunshine hours seems to be positive and relevant only for the extreme values of this exogenous variable. For the months, valence is positively influenced by the holiday period (June to September) and Christmas (December). The impact of other months is insignificant. Dancing days have a positive impact on the target variable. The impact of the Happiness Index is unclear. An interesting finding is that countries with low and medium GDP per capita have higher expected valence than the richest countries. Jointly, the political and constitutional variables do not show a clear impact on the dependent variable. However, the rule of law, rights to private property, and freedom of religion have a very positive influence on the outcome of the model. The freedom of discussion index has an interesting effect: its largest values have a negative effect on the fitted value from the model. The higher the religious and ethnic diversity, the more we expect the valence to be positively affected. Linguistic diversity to some extent suggests a similar relationship. The subregion variable is hard to interpret because its categories are poorly balanced, but it shows that subregions are strongly homogeneous with respect to the target variable.

Figure 7. SHAP Partial Dependence Plots based on CatBoost model

Source: Own calculations.

       In addition, we used SHAP 2D Partial Dependence Plots to interpret the CatBoost results (see Figure 8). In this case, the pairs of variables of interest are pop_music and rap, sun_hours and temperature, and HI_score and gdp. We decided to examine the main effect of each feature and their interaction effect. Based on these graphs, we can confirm the earlier conclusions from the 1D PDP, i.e. the popularity of rap has a negative effect on valence, while pop has a positive effect. Altogether, we can see that even a relatively low popularity of rap combined with a high popularity of pop strongly negatively affects the final valence. When it comes to the relationship between temperature and sunshine, temperature is clearly the key. Sunshine creates a bulge in the graph, i.e. despite high temperature, little sunshine will lower the expected value of the target variable. A very interesting relationship is shown by the Happiness Index and GDP. It turns out that the highest expected valence is in countries with relatively low income and a high Happiness Index. Moreover, a moderate Happiness Index and average GDP also lead to above-average valence.
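How a two-feature partial dependence surface separates main effects from an interaction can be sketched in a few lines. The example below uses the classical partial dependence definition (average the model's prediction over the data while fixing the features of interest), which may differ in detail from the SHAP-based plots used in this study. The model, its coefficients, and the three data rows are all hypothetical.

```python
def toy_model(row):
    """Hypothetical fitted model: two main effects, one interaction, one nuisance feature."""
    return (0.40 + 0.03 * row["pop_music"] - 0.04 * row["rap"]
            - 0.02 * row["pop_music"] * row["rap"] + 0.001 * row["temperature"])

# Tiny made-up dataset; only the distribution of the other features matters.
data = [
    {"pop_music": 0.2, "rap": 0.1, "temperature": 5.0},
    {"pop_music": 0.5, "rap": 0.6, "temperature": 15.0},
    {"pop_music": 0.9, "rap": 0.3, "temperature": 25.0},
]

def partial_dependence(model, rows, fixed):
    """Average prediction when the features in `fixed` are forced to given values."""
    return sum(model({**row, **fixed}) for row in rows) / len(rows)

grid = [0.0, 0.5, 1.0]
surface = {
    (p, r): partial_dependence(toy_model, data, {"pop_music": p, "rap": r})
    for p in grid for r in grid
}

# With no interaction, the effect of rap would be the same at every pop level.
rap_effect_low_pop = surface[(0.0, 1.0)] - surface[(0.0, 0.0)]
rap_effect_high_pop = surface[(1.0, 1.0)] - surface[(1.0, 0.0)]
print(round(rap_effect_low_pop, 4), round(rap_effect_high_pop, 4))
```

A difference between the two printed effects is what an interaction looks like on a 2D plot: the slope along one axis changes as the other feature moves.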

Figure 8. SHAP 2D Partial Dependence Plots based on CatBoost model

Source: Own calculations.

        To sum up, two models fully confirmed hypotheses 1, 2, and 3; only the Dynamic Panel Data Regression Model did not confirm the summer effect. In the context of hypothesis 4, all models confirmed the significance of freedom of religion (v2clrelig). Two models confirmed the significance of the political corruption index (v2x_corr), electoral democracy (v2x_polyarchy), religious diversity (relig_frac), and ethnic diversity (ethnic_frac). Hypothesis 5 was confirmed for gdp, although in two models this variable had the opposite effect to that predicted: for the Dynamic Panel Data Regression Model and CatBoost it was negative, while for the fixed effects model it was positive. All models partially confirmed hypothesis 6. House and pop had a positive effect on valence, while rap had a negative effect. Only the CatBoost model confirmed the additional impact of rock. All models refuted hypothesis 7a with regard to the significance of cloud cover (sky_log). In the case of hypothesis 7b, subregions were strongly significant only for the CatBoost model (which handles high-cardinality variables well). All models confirmed the significance and a positive coefficient of the variable dancing_days, which confirmed the last hypothesis, hypothesis 8.

4. Conclusions

The models allowed us to confirm most of the hypotheses put forward at the beginning. These results are important insofar as they contradict the conclusion drawn by The Economist that February is the gloomiest month in terms of the music listened to. The remaining effects may broaden artists' knowledge of when to release new songs. Streaming services such as Spotify may be another beneficiary of the results. The recommendation engines for songs and playlists could be more accurate if they also considered the variables we added.

       The first limitation of this study is that valence may be largely related to the genre of music. Therefore, further research should focus on the analysis of disaggregated data and a possible valence comparison for given genres of music. The distribution of valence at the country level could also be interesting. The analysis of the mean alone does not provide all the information about the mood of the music being listened to: the distribution may be bimodal, as people can listen to both extremely negative and extremely positive music. Additionally, the influence of political variables is unclear, and the literature offers no theoretical basis for interpreting the obtained results. Another limitation is the short period covered by the data. Two years are not enough to properly capture seasonality, which was our main interest in the third hypothesis. A further limitation is the lack of monthly macroeconomic and social data. For this reason, some of the variables in our analysis had only two unique values.

References

       Alesina, A., Devleeschauwer, A., Easterly, W., Kurlat, S., & Wacziarg, R. (2003).
Fractionalization. Journal of Economic Growth, 8(2), 155-194.
https://doi.org/10.1023/A:1024471506938

       Baltrunas, L., & Amatriain, X. (2009). Towards time-dependent recommendation based on
implicit feedback. In Workshop on context-aware recommender systems, 25 - 30.

       Campos, J., Ericsson, N. R., & Hendry, D. F. (2005). General-to-specific modeling: an
overview and selected bibliography. FRB International Finance Discussion Paper, 838.

       Coppedge, M., Gerring, J., Knutsen, C. H., Lindberg, S. I., Teorell, J., Altman, D., Ziblatt,
D. et al. (2020). V-dem Dataset v10. Varieties of democracy (V-Dem) Project.
https://doi.org/10.23696/vdemds20

       Herrera, P., Resa, Z., & Sordo, M. (2010). Rocking around the clock eight days a week: an
exploration of temporal patterns of music listening. In 1st Workshop On Music Recommendation
And Discovery, ACM RecSys, 2010, Barcelona, Spain.

       Friedman, R. S., Gordis, E., & Förster, J. (2012). Re-exploring the influence of sad mood
on music preference. Media Psychology, 15(3), 249 - 266.
https://doi.org/10.1080/15213269.2012.693812

       Hospers, J. (1969). Introductory readings in aesthetics. Free Press.

       Kellaris, J. J. (2008). Music and consumers in Haugtvedt, C. P., Herr, P. M., & Kardes, F.
R. (Eds.). (2018). Handbook of consumer psychology, 837 - 856. Routledge.

       Kim, J.-H., Jung, K.-Y., Ryu, J.-K., Kang, U.-G., & Lee, J.-H. (2008). Design of ubiquitous
music recommendation system using MHMM. In Int. Conf. on Networked Computing and
Advanced Information Management, 2, 369 - 374.

         Lai, H. L. (2004). Music preference and relaxation in Taiwanese elderly people. Geriatric
Nursing, 25(5), 286 - 291. https://doi.org/10.1016/j.gerinurse.2004.08.009

         Lee, J. S., & Lee, J. C. (2007). Context awareness by case-based reasoning in a music
recommendation system. In International symposium on ubiquitious computing systems, 45-58.
Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76772-5_4

         Levinson, J. (1997). Music and negative emotion. In J. Robinson (Ed.), Music and meaning
(215-241). Ithaca, NY: Cornell University Press.

         Liu, M., Hu, X., & Schedl, M. (2018). The relation of culture, socio-economics, and
friendship to music preferences: A large-scale, cross-country study. PloS one, 13(12).

         Lundberg, S.M. and Lee, S. (2017). A Unified Approach to Interpreting Model Predictions,
Advances in Neural Information Processing Systems, 30, 4765-4774.

         McFerran, K. S., Garrido, S., O’Grady, L., Grocke, D., & Sawyer, S. M. (2015). Examining
the relationship between self-reported mood management and music preferences of Australian
teenagers. Nordic Journal of Music Therapy, 24(3), 187 - 203.
https://doi.org/10.1080/08098131.2014.908942

         McFerran, K. S. (2016). Contextualising the relationship between music, emotions and the
well-being of young people: A critical interpretive synthesis. Musicae Scientiae, 20(1), 103-121.
https://doi.org/10.1177/1029864915626968

         Mellander, C., Florida, R., Rentfrow, P. J., & Potter, J. (2018). The geography of music
preferences. Journal of Cultural Economics, 42(4), 593 - 618.
https://doi.org/10.1007/s10824-018-9320-x

         Pettijohn, T. F., II, & Sacco, D. F., Jr. (2009). Tough times, meaningful music, mature
performers: popular Billboard songs and performer preferences across social and economic
conditions in the USA. Psychology of Music, 37, 155 - 179.
https://doi.org/10.1177/0305735608094512

       Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2017).
CatBoost: unbiased boosting with categorical features. arXiv preprint arXiv:1706.09516.

       Rentfrow, P., Mellandar, C., Florida, R., Hracs, B. J., & Potter, J. (2013). The geography
of music preferences.

       Rentfrow, P. J., & Gosling, S. D. (2003). The do re mi’s of everyday life: the structure and
personality correlates of music preference. Journal of Personality and Social Psychology, 84, 1236
- 1253. https://doi.org/10.1037/0022-3514.84.6.1236

       Roe, K. (1987). The school and music in adolescent socialisation. In J. Lull (Ed.), Pop music
and communication, 212–230. Thousand Oaks, CA: Sage.

       Skowron, M., Lemmerich, F., Ferwerda, B., & Schedl, M. (2017). Predicting genre
preferences from cultural and socio-economic factors for music retrieval. In European Conference
on Information Retrieval, 561 - 567. Springer, Cham.
https://doi.org/10.1007/978-3-319-56608-5_49

       Schedl, M., Lemmerich, F., Ferwerda, B., Skowron, M., & Knees, P. (2017). Indicators of
country similarity in terms of music taste, cultural, and socio-economic factors. In 2017 IEEE
International Symposium on Multimedia (ISM), 308 - 311. IEEE.
https://doi.org/10.1109/ISM.2017.55

       Scherer, K. R. (2004). Which emotions can be induced by music? What are the underlying
mechanisms? And how can we measure them?. Journal of new music research, 33(3), 239-251.
https://doi.org/10.1080/0929821042000317822

       The Economist. (2020). Data from Spotify suggest that listeners are gloomiest in February.
The Economist.
https://www.economist.com/graphic-detail/2020/02/08/data-from-spotify-suggest-that-listeners-are-gloomiest-in-february

       Took, K. J., & Weiss, D. S. (1994). The relationship between heavy metal and rap music
on adolescent turmoil: Real or artifact? Adolescence, 29, 613 - 621.

       Vigliensoni, G., & Fujinaga, I. (2016). Automatic Music Recommendation Systems: Do
Demographic, Profiling, and Contextual Features Improve Their Performance?. In ISMIR, 94-100.

       Vlegels, J., & Lievens, J. (2017). Music classification, genres, and taste patterns: A ground-
up network analysis on the clustering of artist preferences. Poetics, 60, 76 - 89.
https://doi.org/10.1016/j.poetic.2016.08.004

       Vuoskoski, J. K., Thompson, W. F., McIlwain, D., & Eerola, T. (2011). Who enjoys
listening to sad music and why?. Music Perception, 29(3), 311 - 317.
https://doi.org/10.1525/mp.2012.29.3.311
UNIVERSITY OF WARSAW
FACULTY OF ECONOMIC SCIENCES
44/50 DŁUGA ST.
00-241 WARSAW
WWW.WNE.UW.EDU.PL