Exploring a Masked Language Model for Creative Text Transformation
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Exploring a Masked Language Model for Creative Text Transformation Hugo Gonçalo Oliveira CISUC, Department of Informatics Engineering University of Coimbra, Portugal hroliv@dei.uc.pt Abstract As it happens for other creative artefacts, several subjec- tive aspects must be considered during evaluation. There- We explore a masked-language model based on BERT for fore, we had to rely on human opinions for the manual eval- shifting the meaning of text towards a target theme. Con- uation of grammaticality, semantic coherence and overall tent words in the original text are masked and the model pro- appreciation. Whenever possible, we also explored auto- vides a list of filling candidates, out of which one is selected based on its similarity to the theme and constraints regarding matic and semi-automatic procedures, namely for assessing: morphology and metre. Experimentation is performed with relatedness to the theme, given by the semantic similarity Portuguese song lyrics and trade-offs between grammatical- between human-given titles and the original theme; novelty ity, semantics, form, and novelty are analysed. We confirm towards the original lyrics, given by the textual overlap be- that BERT is a useful tool for creative text transformation. tween the new and the original lyrics. Aforementioned aspects are compared for a set of lyrics transformed with different constraints, confirming some ini- Introduction tial expectations. Results of human evaluation confirm the Many creative systems rely on some kind of inspiration set, subjectivity of the task and, overall, show moderate success. but only some assumedly aim at transforming a single exist- While it is true that the meaning of the lyrics shifts towards ing artefact with a pre-defined purpose. When it comes to the selected themes, trade-offs exist, and increasing the sim- linguistic creativity, transformation-based approaches have ilarity with the theme, by considering more candidates, of- been adopted for the generation of song lyrics (Bay, Bod- ten results in more grammatical issues. Moreover, overall ily, and Ventura, 2017; Gonçalo Oliveira, 2020) and head- appreciation is lower than for the original song lyrics. Nev- lines (van Stegeren and Theune, 2019; Mendes and Gonçalo ertheless, we believe to have confirmed that BERT is not Oliveira, 2020). In this domain, transformation consists of only a powerful model for NLP, but has also potential for replacing parts of the text, often words, with others that meet Computational Creativity. desired constraints. Applications include advertising cam- In the remainder of the paper, we overview previous work paigns and others, such as the creation of parodies, either on the generation of creative text with neural language mod- celebrating an event / someone, or making fun of them. And els, as well as on the transformation of text. We then de- starting with a well-known text or melody makes the result scribe the proposed approach, followed by the experimen- more memorable, also helping to spread the message. tation setup, including implementation details. Before con- Following recent trends on using neural language mod- cluding, evaluation results are presented and discussed, to- els in natural language processing (NLP) and generation gether with insights and examples. tasks, we explore the language-modelling capabilities of BERT (Devlin et al., 2019) for transforming creative text. Related Work BERT is a bidirectional language model based on a Trans- In the last 20 years, poetry generation, including song lyrics, former neural network, trained for the prediction of masked has arguably been one of the most active research topics in words, considering both their left and their right context. Computational Creativity, with a broad range of computa- Given some text, ideally well-known, and a theme, we rely tional techniques explored (Gonçalo Oliveira, 2017; Lamb, on BERT for providing replacement candidates for some Brown, and Clarke, 2017). Lately, we have seen a growing words. Yet, instead of always using the first candidate, in application of models based on neural networks for this task. an attempt to control the semantics of the resulting text, we For instance, recurrent neural networks with LSTM layers consider the similarity between the resulting sequences and were used for generating rap lyrics (Potash, Romanov, and the theme. This is the logic underlying Zorro, a creative Rumshisky, 2015). An advantage of such approaches is that system that transforms Portuguese text according to a given they do not require handcrafted rules. theme. It is applied to song lyrics and, besides semantic sim- Specifically, the introduction of Transformers (Vaswani et ilarity, two constraints are tested, for better consistency and al., 2017) made possible the development of large language aesthetics, namely morphology and metre. models, like GPT (Radford et al., 2019), with powerful gen- Proceedings of the 12th International Conference on Computational Creativity (ICCC ’21) 62 ISBN: 978-989-54160-3-5
eration capabilities. This model was used for poetry genera- BERT for suggesting a large list of replacement candidates, tion, namely Chinese classic poetry (Liao et al., 2019), after and further constrain the selection, based on a target theme, pre-training on a corpus of this kind of text. Using only as well as morphology and metre constraints. poems with a specific format, GPT could be fine-tuned for that format, characterised by the number of lines, their me- tre and rhymes. However, the latter features are only cap- Approach tured implicitly. So, unless specific symbols are introduced We exploit the capability of BERT predicting masked words for length, rhyme and others (Li et al., 2020), not all formal in the transformation of text. For this purpose, we mask constraints are guaranteed to be met. the content words in the text we want to transform and use Even if previous approaches may result in fluent text BERT for obtaining replacement candidates. The first can- that matches desired formal and aesthetic requirements, they didate will be BERT’s best prediction and will result in the give few or no control on the message to transmit. To deal most fluent sentence. On the other hand, the result will al- with this limitation, a Transformer-based autoencoder can ways be the same and, except for the words to replace, we learn to map input words to related text, while matching the will not have any control in the result. target form (Nikolov et al., 2020). Content words can be But BERT can provide a ranked list of replacement can- stripped from original lyrics, and the model trained to recon- didates. So, we can explore this list and select the candidate struct the latter from part of the former and their synonyms. that best suits our goal, in this case, shifting the meaning of Generating lyrics on a new topic becomes a matter of chang- the text towards a given theme. Moreover, other constraints ing the input words to others related to the target topic. can be applied to the candidates, namely on morphology and While the previous work generates new text from scratch, on their metre. We consider four configurations: others assume the goal of transforming a given text with some creative purpose. Here, another Transformer-based • Basic: selects the candidate that maximises the semantic language model, BERT (Devlin et al., 2019), can be useful similarity with the theme. for suggesting replacements. BERT is less used in genera- tion tasks, but it is very powerful when it comes to predicting • Morphology: besides similarity, for increasing syntac- words based on their context. A BERT model is often pre- tic consistency, only considers candidates that may have trained in two tasks, Masked Language Model (MLM) and the same part-of-speech (PoS) and are inflected (number, Next Sentence Prediction, but can be fine-tuned for many gender, tense) as the replaced word. downstream tasks, like Question Answering or Natural Lan- • Metre: besides similarity, to agree with the original me- guage Inference. In the MLM task, about 15% of the tokens tre, only considers candidates with the same number of in the training sequences are masked (i.e., replaced by the syllables and stress position as the replaced word. For token [MASK]) and the model is trained to predict each, line-ending words, it gives priority to candidates that considering both their left and right context. This makes rhyme with the original word. such a model different from conditional language models. In fact, the work of Nikolov et al. (2020) has a post- • Morphology and Metre: combines both of the previ- processing step where words ending each line are masked, ous (hereafter, MM). and BERT used for providing suitable replacements, out of which those maximising rhymes are selected. Constraints are applied to any list of candidates, and only Though not using neural networks, several works have nouns, verbs and adjectives in the original text are masked. tackled the transformation of text. This approach has been This is done both for keeping some syntactic consistency applied to the automatic generation of poetry (Toivanen et and resemblance with the original text, but also because sim- al., 2012), also including song lyrics, where words have been ilarity between function words (e.g., prepositions and deter- replaced by: key concepts extracted from daily news, al- miners) and any other would be meaningless. We should ways considering syntactic and metre constraints (Gatti et add that, even though the method tries to replace all content al., 2017); words selected according to a theme, emotion, words, it might not find a replacement that matches all the meter, or rhyme (Bay, Bodily, and Ventura, 2017); words morphology and meter constraints. Only in that case, the whose relation to a new theme is analogous to the rela- original word is kept. tion between the original words and the original song ti- BERT can also be used for computing the semantic simi- tle (Gonçalo Oliveira, 2020). In the last two works, distri- larity. We use it for representing both the theme and the re- butional representations of words (word embeddings) were sulting sequences, after each replacement, and then compute used for computing semantic similarity. their cosine. The higher the cosine, the higher the similarity. Moreover, transformation-based approaches were applied to the adaptation of human-produced news headlines to new Experimentation Setup contexts, by replacing the nouns in the headlines with nouns from the context (van Stegeren and Theune, 2019), or re- Even though the proposed approach is applicable to any lan- placing pairs of content words, based on analogy (Mendes guage, we adopted it in the development of Zorro, a creative and Gonçalo Oliveira, 2020). system for the transformation of Portuguese text, ideally po- In our work, we combine the earlier approach with the ems. This section describes its implementation and illus- previous constraints for text transformation, i.e., we use trates the process with an example. Proceedings of the 12th International Conference on Computational Creativity (ICCC ’21) 63 ISBN: 978-989-54160-3-5
Implementation 2. List of candidates: Implementation of Zorro is based on Python and relies [jardim, espaço, amor, verde, olhar ... ] (‘garden’, ‘space’, ‘love’, ‘green’, ‘look’) on the transformers library from Hugging Face,1 for loading BERT; and on NLTK (Loper and Bird, 2002), for 3. Maximum similarity with criatividade computacional: PoS tagging. As our language model, we use BERTim- Usando o projeto (‘using the project’) bau (Souza, Nogueira, and Lotufo, 2020) base, a pre-trained Selected replacement: projeto (‘project’) BERT model for Portuguese, with 12 layers, that encodes Going through every line and content word results in the text sequences in 768-sized vectors. In addition to NLTK, following four lines, roughly translated to those after them: LABEL-Lex (Ranchhod, Mota, and Baptista, 1999), a mor- phology lexicon for Portuguese, is used for identifying con- Usando o projeto as tecnologias e tent words and checking the PoS of the candidates. Syllable as nuvens simples e inteligente criativa division and rhyme identification were performed accord- Que oferece perto da eficiência de ing to a set of rules adapted from Tra-la-Lyrics (Gonçalo um recurso Oliveira, Cardoso, and Pereira, 2007). Recursos matemáticos a engenharia Example Using the project, the technologies and the clouds simple and intelligent creative To illustrate the proposed approach, we consider the lyrics That offers close to the efficiency of a resource of the song Efectivamente (in English, ‘actually’), by the Mathematical resources to engineering Portuguese band GNR. This song starts with the following four lines, roughly translated by those following them: Although we did not use the first candidate, the text con- sistency is still ok and, more importantly, several words re- Adoro o campo as árvores e as flores lated to the theme are used (e.g., project, technologies, in- Jarros e perpétuos amores telligent, creative, resource, engineering). The main issue is Que fiquem perto da esplanada de um bar that the metre of the new text does not match the original, Pássaros estúpidos a esvoaçar meaning that it cannot be sung with the same melody. I love the countryside, the trees and the flowers To fix the latter, we constrain the candidates by consider- Arums and perpetual loves ing only those with the same number of syllables and stress That stay close to the terrace of a bar position as the original words. Plus, whenever such a word Stupid birds flying exists, we try to end each line with a word that rhymes with In order to transform these lyrics according to a theme, the original word, in an attempt to keep the original rhyming each line is first PoS-tagged. Then, for each content word: scheme. Once these constraints are added, we get: Usando o design as máquinas e as cores 1. It is replaced by the mask token [MASK]; simples e legı́timo conceito 2. A ranked list of filling candidates is obtained from BERT; Que gira perto da novidade de um lar Cı́rculo contı́nuo a executar 3. Similarity between the theme and the sequence resulting from filling the mask with each candidate is computed.2 Using the design, the machines and the colours If it is the highest so far, it is kept. simple and legitimate concept That revolves near the novelty of a home For the theme criatividade computacional (‘computa- Continuous circle to perform tional creativity’), and considering the first 500 filling candi- Words in the domain of the theme are still used (e.g., de- dates, with no additional constraint, here is how the previous sign, machines, colours, concept, novelty), the rhythm now steps would be instantiated for the first content word (in En- matches the original lyrics, and the third line rhymes with glish, ‘[I] love’): the fourth. To increase the probability of the first and sec- 1. Masked sequence: ond lines rhyming, more replacement candidates have to be [MASK] o campo as árvores e as flores considered. This is the result with the top-5000: 2. List of candidates: Software o design as máquinas e as cores [Sobre, E, Entre, É, Para, São, ... ] Ino e inúmeros motores (‘about’, ‘and’, ‘between’, ‘is’, ‘for’, ‘are’) Que monta perto da teoria de um ar 3. Maximum similarity with criatividade computacional: máquinas contı́nua a modificar Usando (‘using’) Software the design, the machines and the colours Selected replacement: Usando Ino and countless engines That assembles close to the theory of an air Then, for the second content word (‘country’): continuous machines modifying 1. Masked sequence: Increasing the search space made it possible to have the Usando o [MASK] as árvores e as flores first two lines rhyme, also increasing the probability of us- 1 https://huggingface.co/transformers/ ing words in the domain of the theme (e.g., new words 2 We use vector representations of the [CLS] tokens, returned like ‘software’ and ‘modifying’). But this was done at the by the feature-extraction pipeline of the transformers library. expense of less syntactic consistency and stranger words, Proceedings of the 12th International Conference on Computational Creativity (ICCC ’21) 64 ISBN: 978-989-54160-3-5
e.g.: in the first line, the word software is used as a verb; Variações; O Anzol (the fishing hook), by Rádio Macau; in the second, an unknown word, Ino, possibly a suffix, is Cavalos de Corrida (racehorses), by UHF. used; in the fourth line, there is a number inconsistency, as máquinas (‘machines’) is in the plural, but its modifier • Using five different themes, namely criatividade contı́nua (‘continuous’) is in the singular. While the first computacional (computational creativity), plus four candidates should suit the syntactic structure well, matching trending topics: portal das finanças (the website the PoS and inflection, the lower we get in the rank, more where Portuguese contributors declare their taxes), and more words for which this does not happen will appear. liga dos campeões (Champions League), festival This is especially true for poetry, where less common syn- da eurovisão (Eurovision Song Contest), plano de tactic structures are frequent and some sentences are broken vacinação (vaccination plan). into different lines. • Following the four different strategies: Basic, Morphol- A way to minimise the latter issues would be constraining ogy, Metre, MM. the candidates to only those that match both the PoS and the inflection of the original words they will replace. This is • Considering the top-500 (more conservative) and the done with the help of the lexicon, which should have all the top-5000 first replacement candidates. possible PoS of each candidate. Adding this constrain to the This results in 200 different combinations (5×5×4×2) previous configuration results in the following text: and thus 200 different new lyrics. desenho o design as máquinas e as cores robôs e inúmeros rumores Human Evaluation Que contem perto da teoria de um ar cálculos cenários a utilizar We resorted to the crowdsourcing platform Amazon Me- drawing the design, the machines and the colours chanical Turk (AMT), where human workers were in- robots and countless rumours structed to perform a task consisting of: (1) reading a pre- Which count close to the theory of an air sented poem (i.e., one of the lyrics in the sample); (2) us- scenarios calculations to be used ing Likert scales for rating it according to three main as- pects (grammaticality, semantic coherence, overall apprecia- Not only it uses words in the domain of the theme, but it tion) and providing a suitable title in a text field. Instructions follows the original rhythm, has two rhymes, and no syntac- and questions were written in Portuguese, the same language tic issues, except for an odd last line. Semantic coherence as the text. The translation of the questions was: could also be better, especially for the third line. For illustrating purposes, we show the result of the last 1. At the grammar level, the text has: 1-Many issues, configuration in the same lyrics, but now with a different 3-Some issues, 5-No issues. theme, liga dos campeões (Champions League): 2. On the transmitted message, the text: 1-Does not make conjunto o clube as lı́deres e as cores any sense; 3-Has some coherence; 5-Is perfectly coherent. Clubes e teóricos cantores Que tenham perto da temporada de um mar 3. Considering its contents, a good title for the poem is: ... tı́tulos estádios a classificar 4. My overall appreciation of the poem was: 1-Hated it; set the club, the leaders and the colours 3-Interesting; 5-Loved it. Singing clubs and theorists That have close to the season of a sea For each of the 200 lyrics, questions were answered by stadium titles to be classified two different workers, who were not told that the text had been automatically transformed. Given titles are important Evaluation for assessing meaningfulness, i.e., the more semantically Experimentation showed interesting results regarding syn- similar they are to the themes, the better the theme is ex- tax, semantics and metre, all important in a poem. However, pressed by the lyrics. these are all subjective aspects, for which we cannot rely ex- Initially, it was also our intention to gather human opin- clusively on the opinion of a single person, especially if they ions on two additional aspects of the new lyrics: rhythm were involved in the development of the system. and novelty, both considering the original lyrics. Yet, for So, a sample of lyrics was created for the evaluation of this to work well, the workers had to previously know the the proposed approach. We report on its assessment, start- melody of the original songs and their lyrics. All songs are ing with a human evaluation, but also applying automatic by Portuguese artists, and even if they are popular enough measures for computing the similarity and novelty. for being known by a vast Portuguese audience, their pop- ularity is mostly restricted to Portugal and Portuguese peo- Data Sample ple. This meant that we would have to restrict the evaluation to workers located in Portugal. We actually started to do Results were produced for evaluation purposes, with: it, but had no answers for some time, which made us open • Lyrics of five well-known Portuguese pop-rock songs: the survey to other Portuguese-speaking countries, namely Efectivamente, by GNR; Contentores (containers), by Xu- Brazil. For this reason, it is expected that workers will not tos & Pontapés; Estou Além (I’m beyond), by António be familiar with the original lyrics and thus unapt for as- Proceedings of the 12th International Conference on Computational Creativity (ICCC ’21) 65 ISBN: 978-989-54160-3-5
sessing the rhythm and novelty of the new lyrics.3 While an and the theme. BERT encodings are suitable for this, as they approximation to the latter was obtained automatically (see can represent whole sequences in a vector of real numbers, Novelty below), we ended up not assessing the rhythm. Still, based on the meaning of the words in context. from our observation of the results, we can assume that, for a Medians are not very different than for grammaticality, large majority of cases, the Metre constraint guarantees that i.e., combining morphology and metre with the top-500 can- all replacements have the same metre as the original words didates gets slightly higher than the other configurations. they are replacing. On the other hand, this is not always the Scores also suggest that morphology constraints contribute case of rhymes. We now look at the results for each human- to semantic coherence more than the metre. assessed aspect. However, we were also assessing to what extent the lyrics are related to the theme, i.e., their semantic similarity with Grammaticality Table 1 summarises the scores regarding the theme. Since there is not a threshold for which we can the grammaticality of the lyrics transformed by each con- consider that two texts are similar enough, we look at the figuration with a different number of top candidates con- similarity values relatively. We can compare average simi- sidered (Top-k). It shows the proportion of scores below, larities of the given title with the theme, for different con- equal and above 3, followed by the median score. The lat- figurations, and also with the original title of the song. As ter suggests that, except perhaps for the Morphology and we were trying to shift the meaning of the lyrics, similarity Metre configuration with 500 candidates (hereafter, MM- of the given title should be higher for the theme than for the 500), there are no substantial differences among configura- original title. tions. Yet, even if differences are low, score distribution is in line with three expectations: grammaticality is harmed Even though average similarity values are all very close, by a higher number of candidates and by the metre con- the previous expectation is confirmed for every configura- straint; with the morphology constraint on and 500 candi- tion. However, some differences are very low and not sig- dates, the proportion of lyrics with scores >3 is higher, even nificant when the standard deviation is considered. This hap- when the metre constraint is added. In fact, the highest pro- pens especially for the MM configuration, the one with high- portion (62%) is when both constraints are used. Here, we est scores for grammaticality and semantic coherence. In ad- speculate that a correct metre contributes to better reading, dition to what was said about this configuration, we started possibly hiding grammar issues. At the same time, when to question whether the application of both constraints was 5,000 candidates are considered, there is a chance of select- preventing the replacement of all words, leaving large se- ing less natural replacements, leading to less fluent text and quences of the original lyrics, and thus a better grammati- exposing grammar issues. This, however, does not seem to cality, semantic coherence and a meaning that is also closer make a difference for the Basic configuration, as the propor- to the original. In the following sections, we show examples tion is the same for the three categories (3), either of lyrics transformed with this configuration and compute considering 500 or 5,000 candidates. their overlap with the original lyrics, to confirm that this is not exactly the case. Config Top-k 3 Med Out of curiosity, we looked at the most given titles to find Basic 500 24% 40% 36% 3.0 that a total of nine lyrics got the title “Agora” (now). Though Basic 5000 24% 40% 36% 3.0 not very related to any of our themes, it is the starting word Morphology 500 16% 38% 46% 3.0 of all lines in the lyrics of Cavalos de Corrida and, as an ad- Morphology 5000 28% 42% 30% 3.0 verb, it is never replaced. A similar situation occurred for the Metre 500 34% 28% 38% 3.0 title “Debaixo do Sol” (under the sun), given four times, for Metre 5000 46% 30% 24% 3.0 being the end of the lyrics of O Anzol, and kept intact with Morph+Metre 500 22% 16% 62% 4.0 metre constraints on. Other common titles that are related Morph+Metre 5000 38% 28% 34% 3.0 to the themes include: Prevenção (prevention), Vı́rus (virus) and Gripe (flu), all related to vaccination plan, respectively Table 1: Grammaticality in lyrics transformed by each con- given six, five and four times; “Futebol” (football), related figuration, considering a different number of top candidates. to Champions League, given five times; and Finanças (fi- nances), part of one of the themes, given four times. Semantics Table 2 is on the semantic-related aspects. It There was a single case for which the given title was ex- has the scores given by the workers for semantic coherence, actly the same as the theme and it was Liga dos Campeões, followed by two similarity values which are average cosines given to the lyrics of O Anzol, transformed with Basic-500. between the BERT representations of the worker-given ti- Other titles highly similar (cosine > 0.87) to the themes tles, respectively with the theme of the transformed lyrics were: Os Clubes Campeões (champion clubs) and Campeão and with the original title of the song. Since it is virtually dos Campeões (champion of the champions), respectively impossible for the workers to guess exactly the words of the for the lyrics of Efectivamente transformed with Basic-500, theme, computing the semantic similarity is a shortcut for and Cavalos de Corrida with Basic-5000; Inovação Com- quantifying the proximity between the meaning of the title putacional (computational innovation) and Design de In- teligência (design of intelligence), respectively for the lyrics 3 A link for the original song could be included, but having to of Contentores transformed by Basic-5000 and Cavalos de listen to the song first would make the task slower. Plus, since we Corrida with Morph-5000. This analysis showed that, for would have no control, many workers could still not listen to it. some original lyrics and themes (computational creativity, Proceedings of the 12th International Conference on Computational Creativity (ICCC ’21) 66 ISBN: 978-989-54160-3-5
Coherence Config Top-k 3 Med Sim(theme) Sim(original) Diff Basic 500 28% 42% 30% 3.0 0.716±0.07 0.629±0.08 (+0.088) Basic 5000 34% 42% 24% 3.0 0.740±0.08 0.615±0.08 (+0.125) Morphology 500 20% 38% 42% 3.0 0.698±0.06 0.627±0.09 (+0.072) Morphology 5000 20% 44% 36% 3.0 0.707±0.07 0.634±0.07 (+0.073) Metre 500 38% 36% 26% 3.0 0.682±0.07 0.650±0.07 (+0.032) Metre 5000 40% 46% 14% 3.0 0.670±0.08 0.628±0.07 (+0.042) Morph+Metre 500 20% 30% 50% 3.5 0.648±0.07 0.647±0.07 (+0.001) Morph+Metre 5000 42% 34% 24% 3.0 0.665±0.08 0.656±0.08 (+0.011) Table 2: Semantic coherence and average similarity of given titles with the theme and the original title. champions league) given titles were closer to the theme than MM-500 and five with MM-5000, three of them in Fig- for others. ure 2. The remaining were by Basic-500 (1), Morph-500 (1), Metre-500 (4) and Metre-5000 (2). Semantic coherence is Overall Appreciation Table 3 is on the overall apprecia- lower in all three examples shown. In the first and third, tion, which, in the end, is probably what matters the most. we note a higher presence of odd line constructions and In this aspect, we cannot say that the results were very pos- words (e.g., els, log, cm). itive, as all configurations have a great proportion of lyrics with scores 1 and 2. When comparing configurations, we see Novelty that the metre constraints alone, which favor words based on their length and stress, lead to the worst appreciation. Some- The higher average similarity between given titles and times, a better rhythm and sound can compensate for some themes was already a hint that some original words were ungrammatical text. Yet, we recall that the workers did not being replaced, resulting in a meaning shift. Still, we at- know the original song nor its rhythm, meaning that metre tempted to quantify the novelty, based on the overlap be- would hardly play a role in their judgement. tween the original and the transformed lyrics. Some overlap is expected, and even desired for increased familiarity, but Config Top-k 3 Med the resulting lyrics should not just be a copy of the original. Basic 500 36% 48% 16% 3.0 In order to approximate novelty towards the original, Basic 5000 48% 40% 12% 3.0 we adopted ROUGE (Lin, 2004), a set of metrics com- Morphology 500 30% 58% 12% 3.0 monly used for assessing automatic summarisation and ma- Morphology 5000 34% 46% 20% 3.0 chine translation, based on the overlap between the gen- Metre 500 52% 38% 10% 2.0 erated and the original text. The main difference here is Metre 5000 50% 28% 22% 2.5 that we are aiming for lower ROUGE scores, meaning that Morph+Metre 500 16% 36% 48% 3.0 overlap with the original text is also lower. In the scope Morph+Metre 5000 50% 34% 16% 2.5 of Computational Creativity, ROUGE has previously been used for assessing variation in automatically-generated po- Table 3: Overall appreciation for each configuration. ems (Gonçalo Oliveira et al., 2017). We consider the overlap between uni (R-1), bi (R-2), MM-500 was again the configuration with more lyrics and tri (R-3) grams, as well as the longest common sub- rated 4 or 5 and less below 3. As discussed earlier, the im- sequence (R-LCS). Alone, the obtained values are probably possibility of replacing some words could end up favouring not so meaningful, but they enable us to compare different this configuration. Still, looking at some lyrics by MM-500, configurations. For each metric considered, Table 4 shows we see that many content words are indeed replaced. In Fig- the average values for the simplest configuration (Basic-500) ure 1, we show the three top-scored (average=4.5) lyrics. and the difference for all the others. First of all, values con- For each, the titles given and their similarity to the theme, firm that, for any configuration, several changes are made plus the average scores for grammaticality (Gram) and se- to the original lyrics. Otherwise, they would be 1 or close. mantic coherence (Sem) are shown. We note that, in the first They further confirm some initial expectations, even if by two, a single line from the original lyrics is kept intact (Para low margins. For instance, novelty is higher for the Basic outro mundo). In addition, two other lines are the same in strategy, which does not constrain the replacements, mean- both, despite the different theme. In the third lyrics, two ing that all content words end up being replaced. It is also lines are equal to the original (Porque até aqui eu só, Quem higher with the morphology than with the metre constraints não conheci). Moreover, four and seven lines have a single alone, showing that it is more difficult to find words with a word replaced, respectively in the first two and in the third. certain number of syllables and stress than with a target PoS Rough translations for these lyrics are in the Appendix of the and inflection. In fact, when applying only the morphology paper. Further ahead on the paper, we analyse the novelty of constraints, ROUGE values are almost the same as for the the new lyrics based on their overlap with the original. Basic configuration. A final confirmation is that novelty is A total of 14 lyrics had the lowest overall apprecia- higher when considering the top-5000 candidates, for which tion (average= 1.5), out of which, one was produced with chances of finding suitable replacements increase. Proceedings of the 12th International Conference on Computational Creativity (ICCC ’21) 67 ISBN: 978-989-54160-3-5
Original: Estou Além Theme: criatividade computacional Original: Contentores Original: Contentores Titles: ao me conhecer (meeting me, 0.68), Theme: festival da eurovisão Theme: portal das finanças arte de gerar (art of generating, 0.68) Titles: a despedida (the farewell, 0.72), Titles: eu me vou (i’m leaving, 0.68), Gram: 4.5, Sem: 4.5 adeus romantico (romantic goodbye, 0.69) impostos (taxes, 0.70) Gram: 4, Sem: 4 Gram: 3.5, Sem: 3 Não preciso transformar Este sistema de utilidade A crise grega e chegada nos corredores A folha falsa e travada nos corredores A Arte de gerar Adeus aos meus sapatos que me vou Adeus aos meus impostos que me vou P’ra não tornar tarde Para outro mundo Para outro mundo Não sou do que tem que eu uso Num astro surgido num projecto judicial Num banco externo num governo comercial Será desta dimensão Não faze nada mal isto onde vou Não guarda nada mal isto onde vou Mas porque faz que eu abuso P’lo teatro mundo P’lo inteiro mundo Quem faz dar-me a mão Mudaram todas as dores Mudaram todas as dores Vou continuar a explorar fazem sozinho os leitores usam sozinho os valores A quem eu me porto dar E numa crise impossı́vel E numa crise impossı́vel Porque até aqui eu só vendo a Lisboa Central vendo a revista geral Tendo quem quem eu nunca vi Não faze nada mal Não entra nada mal Porque eu só acho quem Quem não conheci Figure 1: Lyrics with high overall appreciation, all with MM-500 configuration. Original: Contentores Original: Cavalos de Corrida Original: O Anzol Theme: plano de vacinação Theme: festival da eurovisão Theme: criatividade computacional Config: MM-5000 Config: MM-500 Config: MM-5000 Titles: vı́rus (virus, 070), Titles: hipocresia (hypocrisy, 0.59), Titles: medir (measuring, 0.80), o vı́rus (the virus, 0.70) evento de comida (food event, 0.59) medição (measurement, 0.80) Gram: 3.5, Sem: 2.5 Gram: 3.5, Sem: 2.5 Gram: 2, Sem: 2 A gripe paga e consulta nos portadores Agora há que a Europa terminou, e Ai, eu já medi Adeus aos meus arquivos que me vou os media se seguem num esforço pensar mover o top em log de azul Para outro fundo Agora há que todos eles jogaram, a Pra cm estrutural Num vı́rus tratado num esquema oriental mitologia em jogo Mas só depois medi Não trata nada mal isto onde vou Agora há que eles fazem os media, faz que plural já ele tens, monta alguém P’lo completo fundo destruindo todas as leis Que monta ideia mental indicam todas as dores Agora há que els seguem ao buraco Eu não sou se hei de medir devem sozinho os menores e fazem-no por qualquer preço Ou gerar o padrão E numa lista compatı́vel Agora, agora, agora, agora, tu dá Já não cm nada de novo aqui Plano a Saúde natal um evento de comida Debaixo do sol Não trata nada mal Figure 2: Lyrics with low overall appreciation. Config Top-k R-1 R-2 R-3 R-LCS Basic 500 0.535±0.02 0.250±0.01 0.113±0.01 0.525±0.01 Basic 5000 -0.002±0.01 0.000±0.01 0.000±0.01 -0.004±0.01 Morphology 500 +0.006±0.01 +0.005±0.01 +0.001±0.00 +0.009±0.01 Morphology 5000 +0.005±0.00 +0.001±0.01 0.000±0.00 +0.004±0.01 Metrics 500 +0.053±0.02 +0.081±0.05 +0.080±0.04 +0.058±0.03 Metrics 5000 +0.042±0.02 +0.055±0.04 +0.053±0.04 +0.045±0.03 Morph+Metrics 500 +0.080±0.02 +0.118±0.05 +0.110±0.04 +0.087±0.03 Morph+Metrics 5000 +0.056±0.02 +0.077±0.05 +0.073±0.05 +0.061±0.03 Table 4: ROUGE scores of the Basic-500 configuration and differences of the others when compared to it. Scoring the original lyrics for the former, on the same aspects we were considering for the latter, also giving us another term of comparison. Since workers were not from Portugal, they would not recognise the original lyrics. Therefore, we thought that it Table 5 shows the scores obtained by the original lyrics of would be interesting to mix the original lyrics among the au- the songs used in this experimentation. They are based on tomatically transformed. This enabled us to gather opinions four human opinions for each of the five original lyrics. The Proceedings of the 12th International Conference on Computational Creativity (ICCC ’21) 68 ISBN: 978-989-54160-3-5
semantic similarity of the given title was computed against by European Social Fund, through the Regional Operational the original title of the song. Program Centro 2020. Results are interesting to show that, according to human opinions, original songs still have issues. This happens for References grammaticality, less expected, but also for semantics and, es- pecially, for the overall appreciation, which is also the most Bay, B.; Bodily, P.; and Ventura, D. 2017. Text transforma- subjective. Our interpretation is that song lyrics are not al- tion via constraints and word embedding. In Proceedings ways easy to interpret, also due to the presence of figura- 8th International Conference on Computational Creativ- tive language. Together with the position of the line breaks, ity, ICCC 2017, 49–56. this might also have a minor impact on grammaticality. Not Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. to mention that all lyrics were in European Portuguese, but BERT: Pre-training of deep bidirectional transformers for scored by Brazilian workers. language understanding. In Proceedings 2019 Confer- Regarding the average similarity between given and origi- ence of the North American Chapter of the Association nal titles, it is only higher than similarity between given titles for Computational Linguistics: Human Language Tech- for the MM configuration and their themes. This suggests nologies, 4171–4186. ACL. that, even though replacements are made in this configura- Gatti, L.; Özbal, G.; Stock, O.; and Strapparava, C. 2017. tion, relatedness between the target themes and the transmit- Automatic generation of lyrics parodies. In Proceedings ted message could be stronger. 25th ACM International Conference on Multimedia, 485– 491. Conclusion Gonçalo Oliveira, H.; Hervás, R.; Dı́az, A.; and Gervás, P. We have explored the application of a popular masked lan- 2017. Multilingual extension and evaluation of a poetry guage model, BERT, in the transformation of creative text, generator. Natural Language Engineering 23(6):929– namely Portuguese song lyrics. We took advantage of the 967. mask-filling feature of this model and further constrained the filling predictions, according to their similarity to a given Gonçalo Oliveira, H. R.; Cardoso, F. A.; and Pereira, F. C. theme, PoS and metre. 2007. Tra-la-Lyrics: an approach to generate text based on rhythm. In Proceedings 4th International Joint Work- There are trade-offs between increasing the similarity shop on Computational Creativity, 47–55. IJWCC 2007. with a theme and keeping the text grammatical, so, finding the right balance can be tricky. Our evaluation was limited Gonçalo Oliveira, H. 2017. A survey on intelligent poetry to a language, a set of original lyrics and constraints, in any generation: Languages, features, techniques, reutilisation case leading to an overall appreciation of the transformed and evaluation. In Proceedings 10th International Confer- lyrics below the one for the original. Towards more success- ence on Natural Language Generation, INLG 2017, 11– ful results, testing with different parameters is required, and, 20. ACL. depending on the final purpose, human curation might still Gonçalo Oliveira, H. 2020. WeirdAnalogyMatic: Ex- be necessary in the end. A possibly erroneous conclusion perimenting with analogy for lyrics transformation. In was that matching the metre does not contribute to higher Proceedings 11th International Conference on Computa- appreciation, but this might be due to our impediments on tional Creativity, ICCC 2020, 228–235. ACC. properly assessing singability, due to lack of workers that knew the original songs. Lamb, C.; Brown, D. G.; and Clarke, C. L. 2017. A taxon- We cannot say that we are completely satisfied, but inter- omy of generative poetry techniques. Journal of Mathe- esting results were obtained. Enough to believe that BERT matics and the Arts 11(3):159–179. should be seen as a useful tool in the transformation of cre- Li, P.; Zhang, H.; Liu, X.; and Shi, S. 2020. Rigid formats ative text. For stronger conclusions, further experimenta- controlled text generation. In Proceedings of the 58th An- tion is needed, e.g., in other languages, considering other nual Meeting of the Association for Computational Lin- texts, different numbers of candidates and possibly other guistics, 742–751. Online: Association for Computa- constraints. This also applies to alternative ways of using tional Linguistics. BERT, e.g., on computing the similarity of each candidate to the theme (e.g., just the token, right context followed by Liao, Y.; Wang, Y.; Liu, Q.; and Jiang, X. 2019. GPT-based the token, full resulting sequence), or of getting sequence generation for classical Chinese poetry. arXiv preprint representations (e.g., from different layers, considering the arXiv:1907.00151. contextual token embeddings), possibly after fine-tuning the Lin, C.-Y. 2004. ROUGE: A package for automatic evalua- model to some task where creative text is used. tion of summaries. In Text Summarization Branches Out, 74–81. Barcelona, Spain: ACL. Acknowledgments Loper, E., and Bird, S. 2002. NLTK: The natural language This work was supported by national funds through the FCT toolkit. In Proceedings ACL-02 Workshop on Effective – Foundation for Science and Technology, I.P., within the Tools and Methodologies for Teaching Natural Language scope of the project CISUC – UID/CEC/00326/2020 and Processing and Computational Linguistics, 63–70. Proceedings of the 12th International Conference on Computational Creativity (ICCC ’21) 69 ISBN: 978-989-54160-3-5
Grammar Semantic Coherence Appreciation 3 Med 3 Med Sim(title) 3 Med 10% 5% 85% 4.0 10% 10% 80% 4.0 0.670±0.08 5% 30% 65% 4.0 Table 5: Scores for the original lyrics. Mendes, R., and Gonçalo Oliveira, H. 2020. TECo: Ex- ploring word embeddings for text adaptation to a given context. In Proceedings 11th International Conference on Computational Creativity, ICCC 2020, 185–188. ACC. Nikolov, N. I.; Malmi, E.; Northcutt, C.; and Parisi, L. 2020. Rapformer: Conditional rap lyrics generation with denoising autoencoders. In Proceedings 13th In- ternational Conference on Natural Language Generation, INLG 2020, 360–373. ACL. Potash, P.; Romanov, A.; and Rumshisky, A. 2015. Ghost- Writer: Using an LSTM for automatic rap lyric gener- ation. In Proceedings 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, 1919–1924. ACL. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; and Sutskever, I. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1(8):9. Ranchhod, E.; Mota, C.; and Baptista, J. 1999. A computa- tional lexicon of Portuguese for automatic text parsing. In Proceedings of SIGLEX99 Workshop: Standardizing Lex- ical Resources. ACL. Souza, F.; Nogueira, R.; and Lotufo, R. 2020. BERTim- bau: Pretrained BERT models for Brazilian Portuguese. In Proceedings Brazilian Conference on Intelligent Sys- tems (BRACIS 2020), volume 12319 of LNCS, 403–417. Springer. Toivanen, J. M.; Toivonen, H.; Valitutti, A.; and Gross, O. 2012. Corpus-based generation of content and form in po- etry. In Proceedings of the 3rd International Conference on Computational Creativity, ICCC 2012, 211–215. van Stegeren, J., and Theune, M. 2019. Churnalist: Fictional headline generation for context-appropriate flavor text. In Proceedings 10th International Conference on Computa- tional Creativity, ICCC 2019, 65–73. UNC Charlotte, North Carolina, USA: ACC. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In Advances in neural informa- tion processing systems, 5998–6008. Appendix For better understanding of their contents, Figures 3 and 4 show rough English translations of the Portuguese lyrics in Figures 1 and 2, respectively. Proceedings of the 12th International Conference on Computational Creativity (ICCC ’21) 70 ISBN: 978-989-54160-3-5
Original: Estou Além Original: Contentores Original: Contentores Theme: criatividade computacional Theme: festival da eurovisão Theme: portal das finanças Titles: ao me conhecer (meeting me, 0.68), Titles: a despedida (the farewell, 0.72), Titles: eu me vou (i’m leaving, 0.68), arte de gerar (art of generating, 0.68) adeus romantico (romantic goodbye, 0.69) impostos (taxes, 0.70) Gram: 4.5, Sem: 4.5 Gram: 4, Sem: 4 Gram: 3.5, Sem: 3 No need to transform The Greek crisis and arrival in the corridors The fake and locked sheet in the corridors This utility system Farewell to my shoes cause I am going Farewell to my taxes, I’m going The art of generating To another world To another world So that it doesn’t become late In a star appearing in a judicial project In a foreign bank in a commercial govern- I am not of what there is I use This where I go does not make anything bad ment Is it from of this dimension To the theatre world It doesn’t save that bad this where I’m going But why does it make me abuse All the colours have changed All over the world Who makes me give my hand make alone the readers All the pains have changed I will continue to explore And in an impossible crisis They use the values alone To whom I give myself Seeing Central Lisbon And in an impossible crisis Because until here I only Do nothing wrong I sell the general magazine I have who, who I’ve never seen Does not enter that bad Because I only find who Who I haven’t met Figure 3: Lyrics with high overall appreciation, all with MM-500 configuration. Original: Contentores Original: O Anzol Original: Cavalos de Corrida Theme: plano de vacinação Theme: criatividade computacional Theme: festival da eurovisão Config: MM-5000 Config: MM-5000 Config: MM-500 Titles: vı́rus (virus, 070), Titles: medir (measuring, 0.80), Titles: hipocresia (hypocrisy, 0.59), o vı́rus (the virus, 0.70) medição (measurement, 0.80) evento de comida (food event, 0.59) Gram: 3.5, Sem: 2.5 Gram: 2, Sem: 2 Gram: 3.5, Sem: 2.5 Influenza pay and consultation in the bearers Oh, I’ve already measured Now that Europe is over, and the Farewell to my archives, I’m leaving think about moving the top in log from blue media follow in an effort To another background For structural cm Now they’ve all played, the mythol- In a virus treated in an oriental scheme But only then I measured ogy at stake It does not treat badly this where I go make plural you already have, it assembles Now they make the media, destroy- To the complete fund someone ing all the laws indicate all the pains Who assembles a mental idea Now they follow in the hole and should alone the underaged I am not whether to measure they do it at any price And in a compatible list Or generate the pattern Now, now, now, now, you give a Home Health Plan There’s nothing new here anymore food event It does not treat that badly Under the sun Figure 4: Lyrics with low overall appreciation. Proceedings of the 12th International Conference on Computational Creativity (ICCC ’21) 71 ISBN: 978-989-54160-3-5
You can also read