Conversational Machine Reading Comprehension for Vietnamese Healthcare Texts
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Conversational Machine Reading Comprehension for Vietnamese Healthcare Texts Son T. Luu1,2,♣ , Mao Nguyen Bui1,2,+ , Loi Duc Nguyen1,2,+ , Khiem Vinh Tran1,2,+ , Kiet Van Nguyen1,2,♣? , and Ngan Luu-Thuy Nguyen1,2,♣ 1 University of Information Technology, Ho Chi Minh City, Vietnam 2 Vietnam National University Ho Chi Minh City, Vietnam arXiv:2105.01542v5 [cs.CL] 2 Jul 2021 ♣ {sonlt,kietnv,ngannlt}@uit.edu.vn,+ {16520724, 16521722, 17520634}@gm.uit.edu.vn Abstract. Machine reading comprehension (MRC) is a sub-field in nat- ural language processing that aims to assist computers understand un- structured texts and then answer questions related to them. In practice, the conversation is an essential way to communicate and transfer in- formation. To help machines understand conversation texts, we present UIT-ViCoQA, a new corpus for conversational machine reading compre- hension in the Vietnamese language. This corpus consists of 10,000 ques- tions with answers over 2,000 conversations about health news articles. Then, we evaluate several baseline approaches for conversational machine comprehension on the UIT-ViCoQA corpus. The best model obtains an F1 score of 45.27%, which is 30.91 points behind human performance (76.18%), indicating that there is ample room for improvement. Our dataset is available at our website: http://nlp.uit.edu.vn/datasets/ for research purposes. Keywords: conversations, question answering, machine reading com- prehension, deep neural models, texts 1 Introduction Conversation is a standard method to communicate between people, and it plays an important role in human daily life. The process of asking a question and responding to an answer brings helpful information about a specific domain. Healthcare is one of the most concerning problems for many people. Many audiences often read the healthcare news, and people tend to discuss frequently about health and medicine. Thus, based on the conversations about healthcare, we constructed a corpus named UIT-ViCoQA for conversational question an- swering on healthcare texts in Vietnamese. The UIT-ViCoQA contains 2,000 con- versations and 10,000 questions from articles about health news in Vietnamese. This corpus is used to train the computer for understanding the conversation and giving the right answers based on the conversation context from questions ? Corresponding author: Kiet Van Nguyen. Email: kietnv@uit.edu.vn
2 of users. Besides, we implement neural-based models for conversational question answering including: DrQA [1], GraphFlow [2], FlowQA [8], and SDNet [22] on the UIT-ViCoQA corpus. Then, we evaluate the performance of those models on the UIT-ViCoQA dataset. The main contribution in this paper includes providing a corpus for conversa- tional machine comprehension about healthcare texts in Vietnamese and evaluat- ing the performance of baseline MRC models on the dataset. Our paper is struc- tured as described. Section 2 takes a literature review about the conversation machine comprehension corpora and models. Section 3 provides overview infor- mation about the UIT-ViCoQA dataset. Section 4 introduces available state-of- the-art approaches for the conversational machine comprehension task. Section 5 shows our empirical results and error analysis of question-answering models on the UIT-ViCoQA corpus. Finally, Section 6 concludes our works. 2 Related Works Machine reading comprehension (MRC) is a challenging task of natural language processing (NLP) which enables machines to understand the reading text and answer the questions [16]. Many of MRC corpora are constructed on specific domains, and open domains in English such as SQuAD [16] (extractive MRC) on Wikipedia articles, RACE [11] (multiple choices MRC) on High school students English Exams domain, and NarrativeQA [9] (abstractive MRC) on books and stories domain. For the Vietnamese language, the UIT-ViQuAD [14] (Wikipedia domain), and UIT-ViNewsQA [21] (Health news domain) are two extractive MRC corpora for machine reading comprehension. Besides, the ViMMRC [13] is the multiple-choice reading comprehension corpus on the Vietnamese students’ textbook for primary schools domain. Machine reading comprehension applied in question-answering (QA) systems is another challenge that the MRC models have to understand both given texts and conversational context and then answer relevant questions. These questions are often paraphrased, contain co-reference queries, and their answers can be spans texts or free-form. This type of MRC is called Conversational Machine Comprehension (CMC) [7]. CoQA [17] and QuAC [3] are two CMC corpora in English. Based on the CoQA works, we constructed the UIT-ViCoQA for automated reading comprehension on the health news articles in the Vietnamese language. Attention-based reasoning with sequence models and FLOW mechanism are two approaches for CMC models, according to Gupta et al. [7]. DrQA [1] and PGNet [19] are two neural attention-based models implemented in the CoQA cor- pus. Next, SDNet is another attention-based model that combines inter-attention and self-attention to comprehend the conversation context. Finally, FlowQA [8] and GraphFlow [2] are two flow-based models that used to yield the contextual information through sequences.
3 3 The Corpus Our data creation process consisting of three phases is described in Figure 1. In the first phase, we collect news articles about health from VnExpress3 - the most read online newspapers in Vietnam by using scrapy4 - a web crawler tool for collecting articles from the online newspaper. In the next phase, we construct an annotation tool for creating conversational data. Our annotation tool allows two annotators to create the conversation based on the given articles. Finally, in the third phase, we hire a team of annotators who create data on our annotation tool. The detailed steps from the annotation process are described below. 3.1 Data collection Create data Crawler crawler Collected data Phase 1: Data collection Create data Annotation annotation tool tool Phase 2: Building annotation tool Data Data Data Annotators creation annotators creation hiring process UIT-ViCoQA Corpus Phase 3: Data creation Fig. 1: The creation process of the UIT-ViCoQA corpus. For each conversation (C), we hire two different annotators, which are question- ers and answerers, respectively. The questioner goes first by asking a question (Q). The question is sent to the answerer then. After receiving the question, the 3 https://vnexpress.net/suc-khoe 4 https://scrapy.org/
4 answerer gives the answer by selecting a span of text from the article (S) and then submits the natural answer (A). Next, the annotation system compares the answer given by the answerer with the asked question of the questioner by character level. If the given answer matches about 70% with the asked question, it is a valid answer, and two annotators can move to the next turn. In contrast, the answerer must give another answer. There is a total of five turns for asking and answer per article. In the data creation process, we have some requirements for questioners and answerers as: (1) The answers must be extracted from the article. Questions that cannot be answered according to the article are not allowed, (2) Questioners are encouraged to give questions with synonyms, opposite words, and coreference, and (3) The answers should be short and limited to use new words from the article content. Moreover, the selected answerers need to give full answers with complete texts, correct syntax, and punctuations. 3.2 Dataset overview Table 1: An example of conversation in the UIT-ViCoQA corpus. Trạng thái "ngủ" là cách các tế bào ngay lập tức thay đổi để kháng lại phương pháp điều trị. Các phương pháp điều trị ung thư vú thường thành công, tuy nhiên một số trường hợp ung thư tái phát và tiên lượng xấu hơn. Ông Luca Magnani, Khoa Dược, Đại học Hoàng Gia London, Anh, cho biết phương pháp điều trị bằng hormone hiện được sử dụng cho phần lớn bệnh nhân ung thư vú ... (The status of "sleep" is the way when the cell changes immediately to resist treatment. The treatment methods of breast cancer are often successful. However, some cases of cancer recur, and the prognosis worsens. Mr. Luca Magnani, Faculty of Medicine, Imperial College London, says that the treatment method by using hormones is used for a huge amount of breast cancer patients ... ) Q1 Phương pháp thường được sử dụng để chữa trị ung thư vú là gì ? (What is the treatment method usually use for breast cancer treatment?) S1 Ông Luca Magnani, Khoa Dược, Đại học Hoàng Gia London, Anh, cho biết phương pháp điều trị bằng hormone hiện được sử dụng cho phần lớn bệnh nhân ung thư vú . (Mr. Luca Magnani, Faculty of Medicine, Imperial College London, says that treatment method by using hormone is used for a huge amount of breast cancer patients.) A1 điều trị bằng hormone (using hormone) Q2 Các bác sĩ có lo ngại gì về phương pháp này? (What are doctors concerned about for this treatment?) S2 Từ lâu, các nhà khoa học đã đặt câu hỏi, liệu pháp này thực chất có tiêu diệt được các tế bào ung thư vú không, hay chỉ là chuyển các tế bào sang trạng thái "ngủ yên". (Scientists have long questioned whether this therapy actually kills breast cancer cells, or just puts the cells in an "inactive" state.) A2 nó đưa các tế bào ung thư sang trạng thái "ngủ yên" (This treatment puts the cells in an "inactive" state) Q3 Vậy những nghiên cứu này có ý nghĩa như thế nào? (What profits from these studies?) S3 cũng giải thích rằng những phát hiện hiện tại sẽ mở ra lộ trình mới cho việc nghiên cứu chữa trị ung thư. (explaining that current works can open new future researchs about cancer treatments) A3 mở ra lộ trình mới cho việc nghiên cứu chữa trị ung thư (Opening new research for cancer treatments) The UIT-ViCoQA corpus contains 2,000 conversations. Each conversation con- sists of a reading article and five question-answer pairs. We follow the structure of the CoQA [17] for our dataset. According to Table 1, to answer question Q2, the answerer needs to read the passage and looks back to question Q1 and
5 answer A1 to retrieve the relevant information. Similar to question Q2, the an- swerer needs to read the reading passage and two previous question-answer pairs (Q1, A1) and (Q2, A2) to extract the answer A3. The chain of question-answer pairs Q1-A1, Q2-A2 is the history of the conversation. Table 2 provides the overview of the UIT-ViCoQA corpus and compares it with the CoQA corpus. The result illustrates that although the number of ques- tions and answers in the UIT-ViCoQA corpus is lower than the CoQA corpus, the average number of words in the UIT-ViCoQA dataset is larger than the CoQA dataset. This is because the interrogative words in English contain a single word (e.g., who?, when?, and why?) while they may have two words in Vietnamese. For example, the words "who" means "ai", "when" means "khi nào" and "why" means "tại sao". Besides, the UIT-ViCoQA is constructed on a specific domain. Hence it is not as diverse as the CoQA corpus. Table 2: Overview information about the UIT-ViCoQA and CoQA corpus. UIT-ViCoQA CoQA Domain text Health domain Diverse domains Number of passages 2,000 8,399 Number of questions 10,000 127,000 Passage length 404.1 271.0 Question length 9.4 5.5 Answer length 9.7 2.7 3.3 Dataset analysis Table 3: The types of question in the UIT-ViCoQA corpus. Question Ratio Example types (%) What trans fat là gì? (what is trans fat?) 32.6 How many Vietnam có bao nhiêu ca nhiễm COVID-19? (How many cases of COVID 19 are 17.2 detected in Vietnam?) How DCVax hoạt động như thế nào? (How does DCVax work?) 7.6 Yes/No Có tiền sử bị bệnh gì không ? (Have a history of any illness?) 6.6 Who Những người nào dễ bị xơ gan? (Who is susceptible to cirrhosis?) 9.0 Why Vì sao nhang có thể ảnh hưởng xấu tới cơ thể? (Why incense can adversely affect 7.8 the body?) Which Nhóm nào chiếm tỉ lệ cao nhất? (Which group accounts for the highest percentage?) 7.0 When Khi nào thì cô có thể kết thúc điều trị? (When can she finish treatment?) 2.6 Where Zhou Xiaoying sinh sống ở đâu? (Where does Zhou Xiaoying live?) 4.0 Others Còn du thuyền Diamond Princess? Kể tên một số quốc gia có số mắc cao (About the 5.6 Diamond Princess yacht? Name a few countries with high risk?)
6 In Vietnamese, the process of interaction contains statements between two peo- ple. Each statement contains two functional elements, including the negotiatory for carrying the argument in statements that go through the conversation and the remainder to keep the rest information of statements [20]. The negotiatory is an essential part of the statement in the conversation. The negotiatory element comprises interrogatives particles, element interrogatives items, and imperative particles. The interrogatives are the characteristic of questions. In Table 3, we show all kinds of questions in Vietnamese that are usually used in daily life. The interrogative words are marked bold in the sentence. According to Table 3, the "What" type accounts for the highest ratio in the UIT-ViCoQA corpus (32.6%). Table 4: Linguistic phenomena in UIT-ViCoQA questions. Phe- Ratio Example nomenon (%) Relationship between a question and its passage Q: Ai làm giám đốc quốc gia của Hiệp hội Sảy thai? (Who is the director of the association of miscarriage?) Lexical A: Ruth Bender - Atik 47.6 match S: Ruth Bender - Atik, giám đốc quốc gia của Hiệp hội Sảy thai (Ruth Bender - Atik, national director of the association of miscarriage) Q: Giá cho mỗi con robot là bao nhiêu? (How much is the price of each robot?) Paraphras- A: 500000 RMB 48.0 ing S: Các robot có giá 500000 RMB (khoảng 72000 USD) (Robots have price 500000 RBM, about 72000 USD) Q: Vì sao? (Why?) A: Do sầu riêng chứa nhiều chất dinh dưỡng, nhiều năng lượng, cộng với cồn nồng độ cao làm cho nhịp tim tăng (Because durian contains lots of nutrients, energy, combining with high concentration of alcohol, make heartbeat increase.) Pragmatics 4.4 S: Chuyên gia dinh dưỡng Nguyễn Mộc Lan cho biết sầu riêng nhiều chất dinh dưỡng, nhiều năng lượng, cộng với rượu nồng độ cao làm cho nhịp tim tăng. (Nutritionist Nguyen Moc Lan said durian has a lot of nutrients, lots of energy, plus a high concentration of alcohol makes your heart rate increase.) Relationship between a question and its conversation history No Q: Phô mai có giá trị dinh dưỡng thế nào? (How does cheese have nutritional 73.6 coreference value?) Q1: Loại bệnh nào Tiểu Lý mắc phải từ ban đầu? (What kind of illness was Tieu Explicit Ly initially? ) 20.6 coreference A1: bệnh lao phổi (tuberculosis) Q2: Anh ta chữa bệnh trong thời gian bao lâu? (How long does he treat?) Q1: Ở Hải Phòng bệnh nhân từ đâu trở về? (Where does the patient come from in Implicit Hai Phong?) 5.8 coreference A1: Quảng Đông (Guangdong) Q2: Hiện có triệu chứng gì? (What symptoms are there?) Next, we randomly divide our corpus into training, development, and test sets with proportions 70%, 15%, and 15%, respectively. Then, we take 100 articles by random from the development set to analyze and evaluate the corpus, which
7 is called analysis set [17]. We segment texts in the corpus by the Underthesea framework5 . According to Gupta et al. [7], the Conversational Machine Comprehension (CMC) model answers the question by extracting information not only from the reading texts but also from conversational history. Therefore, the main linguistic phenomena in the UIT-ViCoQA are based on the relationship between questions and the reading passage and the relationship between questions and the conver- sation history. Table 4 displays the linguistic phenomena in the UIT-ViCoQA corpus. For the relationship between questions and the reading texts, there are three types of phenomena: lexical match, paraphrasing, and pragmatic. The lexical match indicates that the questions contain the same words as the reading texts. In contrast, paraphrasing is the question in which their words use synonyms from the reading texts, and pragmatic means the question uses words that do not relate to the reading texts. The proportions of lexical match, paraphrasing, and pragmatic phenomenon in the UIT-ViCoQA corpus are 47.6%, 48.0%, and 4.4%, respectively, as shown in Table 4. In addition, for the relationship between questions and the conversation his- tory, there are three types of relational phenomena: no coreference, explicit coref- erence, and implicit coreference. The percentages of no coreference, explicit coref- erence, and implicit coreference in the UIT-ViCoQA corpus are 73.6%, 20.6%, and 5.8%, respectively, according to Table 4. 4 Methodologies According to Gupta et al. [7], a typical conversation reading comprehension task consists of reading passage as context (C), the conversation history (H) in- cludes multiple question-answer pairs, and the generated answers (A). Therefore, this task combines two models: the machine reading comprehension model for encoding the questions and context into neural space vectors and the question- answering model to generate and decode answers from questions to natural lan- guage. For the machine reading comprehension model, the Document Reader (DrQA) introduced by Chen et al. [1] is a powerful model on various of machine reading comprehension corpora such as: SQuAD [16], TextWorldsQA [10], and UIT- ViQuAD [14]. The DrQA model consists of two modules: Document Retriever and Document Reader. We use the Document Reader of the DrQA to extract the answer spans for the questions. Besides, for the conversational comprehension task, the generated answers are not only from the reading passage but also the conversation history. The model extracts the history of conversations as a special context to generate new answers. SDNeT model [22] is a contextual attention-based model based on the idea of DrQA with a special mechanism to extract the context of the conversation. 5 https://github.com/undertheseanlp/underthesea
8 Furthermore, The FLOW mechanism enables the MRC models to encode the history of the conversation comprehensively. Hence, this mechanism integrates well the latent semantic of the conversation history. FlowQA [8] and GraphFlow [2] are two flow-based neural models that grasping the conversational history context to generate answers. 5 Experiments 5.1 Data preparation We pre-process the data before fitting to the model by these following steps: (1) Removing special characters and stop words, (2) Segmenting sentences into words by using the Underthesea tool, and (3) Transforming the texts into vectors by using fastText word embedding in the Vietnamese language provided by Grave et al. [6]. The dimension of fastText word embedding is 300. 5.2 Evaluation metrics We evaluate the performance of the models by comparing the generated answers with the accurate answers on F1-score and Exact match (EM) score. The F1- score measures the right predicted answers comparing with the correct answers. The EM score measures the exact matching of prediction answers with original answers [16]. 5.3 Experiment results The FLOW models give optimistic results on the UIT-ViCoQA corpus. Accord- ing to Table 5, FlowQA obtains the highest result by F1-score on both develop- ment and test sets. For the EM score, the SDNet model gives the highest results. However, there is a large gap between the F1 and the EM scores as well as the performance of CMC models and human performance. Table 5: Experimental results on the UIT-ViCoQA corpus. EM (%) F1-score (%) Model Dev Test Dev Test DrQA 13.17 13.50 43.28 37.71 SDNet 15.40 15.60 41.90 40.50 FlowQA 13.13 12.53 44.84 45.27 GraphFlow 13.77 14.73 44.69 45.16 Human performance 35.67 38.66 73.33 76.18
9 Table 6: The answers predicted by models on a sample in the UIT-ViCoQA corpus Tính đến ngày 18/2, Việt Nam có 16 ca nhiễm covid-19. Trong đó, Vĩnh Phúc có tới 5 công nhân và 6 người thân của họ bị lây nhiễm. Con số này khiến các doanh nghiệp đặt ra câu hỏi về nguy cơ lây lan virus khó lường trong môi trường doanh nghiệp. Chỉ cần một trường hợp phát hiện nhiễm Covid-19 là cả văn phòng, phân xưởng tiếp xúc với người bệnh sẽ phải cách ly cô lập, gây gián đoạn hoạt động sản xuất kinh doanh, tạo áp lực lên hệ thống y tế công. Ông Đoàn Đình Duy Khương - Tổng Giám đốc điều hành Dược Hậu Giang về vấn đề bảo vệ sức khỏe lao động cho biết, mỗi ngày họ phải dành hơn 1/3 thời gian cho nơi làm việc ... (Up to 18/2, Vietnam has 16 affected cases of covid-19. Specifically, Vinh Phuc has 5 workers and 6 relatives of whom are affected. This number makes the enterprises question about the risk of virus spreading in working environment. If only one case is detected to be affected Covid-19, the whole offices, factories which are contacted with the patients will be quarantined, disrupting production and business activities, and putting pressure on the public health system. Mr Đoàn Đình Duy Khương - General director of Hau Giang Pharmacy about protecting labor health affairs, says that, everyday they have to spend more than 1/3 of their time at work ... ) Q1 Việt Nam có bao nhiêu ca nhiễm tính đến 18/2? (How many affected cases of Vietnam have been counted up to 2/18?) Original Tính đến ngày 18/2, Việt Nam có 16 ca nhiễm covid-19. (Up to 18/2, Vietnam has 16 affected cases of covid-19 ) DrQA 16 SDNet 16 ca nhiễm covid-19. (16 affected cases of covid-19) FlowQA 16 ca (16 cases) GraphQA 16 ca nhiễm covid-19 (16 affected cases of covid-19) Q2 Cụ thể? (Detail?) Original Trong đó, Vĩnh Phúc có tới 5 công nhân và 6 người thân của họ bị lây nhiễm. (Specifically, Vinh Phuc has 5 workers and 6 relatives of whom are affected.) DrQA Vĩnh Phúc có tới 5 công nhân và 6 người thân của họ bị lây nhiễm. (Vinh Phuc has 5 workers and 6 relatives of whom are affected.) SDNet Trong đó, Vĩnh Phúc có tới 5 công nhân và 6 người thân của họ bị lây nhiễm. (Vinh Phuc has 5 workers and 6 relatives of whom are affected.) FlowQA Trong đó, Vĩnh Phúc có tới 5 công nhân và 6 người thân của họ bị lây nhiễm. (Vinh Phuc has 5 workers and 6 relatives of whom are affected.) GraphQA Trong đó, Vĩnh Phúc có tới 5 công nhân và 6 người thân của họ bị lây nhiễm. (Vinh Phuc has 5 workers and 6 relatives of whom are affected.) Q3 Doanh nghiệp nghĩ gì? (What does the enterprise think about?) Original Con số này khiến các doanh nghiệp đặt ra câu hỏi về nguy cơ lây lan virus khó lường trong môi trường doanh nghiệp.(This number makes the enterprises question about the risk of virus spreading in working environment.) DrQA doanh nghiệp đặt ra câu hỏi về nguy cơ lây lan virus khó lường trong môi trường doanh nghiệp.(the enterprise questions about the risk of virus spreading in working environment.) SDNet ca nhiễm covid-19. (affected cases of covid-19) FlowQA các doanh nghiệp đặt ra câu hỏi về nguy cơ lây lan virus khó lường trong môi trường doanh nghiệp.. (the enterprises question about the risk of virus spreading in working environment.) GraphQA virus khó lường trong môi trường doanh nghiệp. (The virus is dangerous in working environment) Q4 Nguy cơ là gì? (What is risk?) Original Chỉ cần một trường hợp phát hiện nhiễm Covid-19 là cả văn phòng, phân xưởng tiếp xúc với người bệnh sẽ phải cách ly cô lập, gây gián đoạn hoạt động sản xuất kinh doanh, tạo áp lực lên hệ thống y tế công. (If only one case is detected to be affected Covid-19, the whole offices, factories which are contacted with the patients will be quarantined, disrupting production and business activities, and putting pressure on the public health system.) DrQA Chỉ cần một trường hợp phát hiện nhiễm Covid-19 là cả văn phòng. (If only one case is detected to be affected Covid-19, the whole offices.) SDNet khó phát triển bền vững. FlowQA Chỉ cần một trường hợp phát hiện nhiễm Covid-19 là cả văn phòng. (If only one case is detected to be affected Covid-19, the whole offices.) GraphQA Chỉ cần một trường hợp phát hiện nhiễm Covid-19 là cả văn phòng. (If only one case is detected to be affected Covid-19, the whole offices.) Q5 Đoàn Đình Duy Khương là ai? (Who is Đoàn Đình Duy Khương?) Original Ông Đoàn Đình Duy Khương - Tổng Giám đốc điều hành Dược Hậu Giang. (Mr. Đoàn Đình Duy Khương - General director of Hau Giang Pharmacy) DrQA Ông Đoàn Đình Duy Khương - Tổng Giám đốc điều hành Dược Hậu Giang về vấn đề bảo vệ sức khỏe lao động cho biết. (Mr Đoàn Đình Duy Khương - General director of Hau Giang Pharmacy about protecting labor health affairs says) SDNet khó phát triển bền vững. (hard to develop stably) FlowQA Ông Đoàn Đình Duy Khương - Tổng Giám đốc điều hành Dược Hậu Gia. (Mr Đoàn Đình Duy Khương - General director of Hau Gia) GraphQA Ông Đoàn Đình Duy Khương - Tổng Giám đốc điều hành Dược Hậu Giang về vấn đề bảo vệ sức khoẻ. (Mr Đoàn Đình Duy Khương - General director of Hau Giang Pharmacy about protecting health affairs)
10 5.4 Error analysis Table 6 shows the predicted answers given by four different models, including DrQA, SDNet, FlowQA, and GraphFlow, respectively. In general, FlowQA and GraphFlow give the most relevant answer as the original answer. For example, in the question Q3 - "What the enterprise think about?", the reader needs to look back to the previous question-answer Q1-A1 and Q2-A2 to inference the context about the "affected cases of COVID-19" (Q1) and the "detailed of affected cases" (Q2). GraphFlow and FlowQA offer the most relevant answer than DrQA for the question Q3. For question Q5, GraphFlow provides the most relevant answer about the person mentioned in the reading passage, while other models give the answer with redundant information in comparison with the original answer. For the question Q4, both four models cannot give the exact answer. This is due to the ambiguity of Vietnamese interrogative words in questions where it is written in the genuine and non-genuine form. For example, the question Q2: "Cụ thể?" can be understood as "What is the detail?" or "How it happened?". Besides, the question Q4: "Nguy cơ là gì?" can be understood as "What is the risk?" or "How bad is the risk?". This is known as the MOOD in the Vietnamese. The interrogative clause in Vietnamese consists of two main elements: the negotiatory and the remainders. The negotiatory carries the centroid of the interaction. This aspect of Vietnamese interrogative is described carefully by Thai [20]. 35.12 Ratio of right answers (%) 30 20 14.55 11.7 10 8.19 7.02 6.86 5.02 4.35 4.01 3.18 0 y hy o ow t ho ch n re rs ha an /N he e he W hi W H th W m W s W W Ye O ow H Fig. 2: The impact of question types on the performance of models. In addition, we study the ability of the models for retrieving correct answers based on the type of questions on the development set. Figure 2 shows the ratio of correct answers by different kinds of questions in the UIT-ViCoQA corpus. A question gives the right answers if the F1-score is greater than 70%. According to Figure 2, the question type "What" has the highest ratio, which is 35.12%.
11 Besides, the question type "What" accounts for 32.6% as described in Table 3. Therefore, the models mostly give the correct answers to this kind of question. Furthermore, the question types "How many" and "Who" also have a high ratio. Table 7: Types of predicted answer given by the models. Ratio Types Description Example (%) Matching The predicted answers fully Q: Việc này có giúp tình trạng tốt lên không? (Does this help improve the condition?) 16.73 answers match with truth answers P: Không (No) A: Không (No) Free-form The predicted answer only Q:Tỷ lệ ung thu Việt Nam có cao không? (Is the rate of cancer in Vietnam high?) 59.93 answers match the a part of truth answers P: cao (High) A: có (Yes) Wrong Q: Béo phì có gây dậy thì sớm không? (Does obesity The predicted answer does 23.27 answers cause early puberty?) not match the truth answer P: Không (No) A: Có (Yes) Finally, we analyze the predicted answers on the development set. According to Table 7, there are three types of the answer given by the models, and most of the predicted answers are concentrated on the free-form type, which accounts for 59.93%. This is why the F1 and EM scores have a considerable difference, as described in Table 5. In general, most error predictions are due to the number of questions and the variety of answers, as well as the linguistic phenomena. Therefore, it is necessary to increase the number of questions and the question types as well as enriching answers to make the corpus more diverse. 6 Conclusion and future work In this paper, we propose the dataset about machine reading comprehension for healthcare texts in Vietnamese. This dataset includes 2,000 health articles with 10,000 questions. We also conduct experiments on several baseline models, and the best result in the F1-score is 45.27%. Nevertheless, the difference between F1 and EM scores is large. This is due to the linguistic phenomena about the Vietnamese interrogative particles and the limited answers. Therefore, it is nec- essary to increase the number of questions and answers as well as make questions and answers more diverse in further research. Besides, enabling the CMC models to capture and understand the contextual meaning of the conversation history is also a challenging task in the conversational machine reading comprehension model researching. In future, we plan to increase the quantity and quality of the UIT-ViCoQA corpus as well as to conduct further experiments on deep learning and transfer
12 learning using pre-trained language models [4, 5, 12, 18] to enhance the perfor- mance of CMC models on the UIT-ViCoQA corpus. Inspired by the conversa- tional question answering system [15], we suggest using this model and UIT- ViCoQA for building Vietnamese conversational question answering systems. Acknowledgements We would like to express our thanks to reviewers for their valuable comments to help improve our work. Besides, we would like to thank our annotators for their cooperation. References 1. Chen, D., Fisch, A., Weston, J., Bordes, A.: Reading Wikipedia to answer open- domain questions. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1870–1879. Associa- tion for Computational Linguistics, Vancouver, Canada (Jul 2017) 2. Chen, Y., Wu, L., Zaki, M.J.: Graphflow: Exploiting conversation flow with graph neural networks for conversational machine comprehension. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artifi- cial Intelligence, IJCAI-20. pp. 1230–1236. International Joint Conferences on Ar- tificial Intelligence Organization (2020). https://doi.org/10.24963/ijcai.2020/171, https://doi.org/10.24963/ijcai.2020/171 3. Choi, E., He, H., Iyyer, M., Yatskar, M., Yih, W.t., Choi, Y., Liang, P., Zettlemoyer, L.: QuAC: Question answering in context. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. pp. 2174–2184. Association for Computational Linguistics, Brussels, Belgium (2018) 4. Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., Stoyanov, V.: Unsupervised cross-lingual representation learning at scale. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 8440–8451. Association for Com- putational Linguistics, Online (Jul 2020). https://doi.org/10.18653/v1/2020.acl- main.747 5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (Jun 2019). https://doi.org/10.18653/v1/N19-1423 6. Grave, E., Bojanowski, P., Gupta, P., Joulin, A., Mikolov, T.: Learning word vec- tors for 157 languages. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan (2018) 7. Gupta, S., Rawat, B.P.S., Yu, H.: Conversational machine comprehension: a lit- erature review. In: Proceedings of the 28th International Conference on Compu- tational Linguistics. pp. 2739–2753. International Committee on Computational Linguistics, Barcelona, Spain (Online) (Dec 2020) 8. Huang, H.Y., Choi, E., Yih, W.t.: Flowqa: Grasping flow in history for conversa- tional machine comprehension. arXiv preprint arXiv:1810.06683 (2018)
13 9. Kočiský, T., Schwarz, J., Blunsom, P., Dyer, C., Hermann, K.M., Melis, G., Grefen- stette, E.: The NarrativeQA reading comprehension challenge. Transactions of the Association for Computational Linguistics 6, 317–328 (2018) 10. Labutov, I., Yang, B., Prakash, A., Azaria, A.: Multi-relational question answering from narratives: Machine reading and reasoning in simulated worlds. In: Proceed- ings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 833–844. Association for Computational Linguistics, Melbourne, Australia (2018) 11. Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.: RACE: Large-scale ReAding comprehension dataset from examinations. In: Proceedings of the 2017 Con- ference on Empirical Methods in Natural Language Processing. pp. 785–794. Association for Computational Linguistics, Copenhagen, Denmark (Sep 2017). https://doi.org/10.18653/v1/D17-1082 12. Nguyen, D.Q., Tuan Nguyen, A.: PhoBERT: Pre-trained language models for Viet- namese. In: Findings of the Association for Computational Linguistics: EMNLP 2020. pp. 1037–1042. Association for Computational Linguistics, Online (Nov 2020). https://doi.org/10.18653/v1/2020.findings-emnlp.92 13. Nguyen, K.V., Tran, K.V., Luu, S.T., Nguyen, A.G.T., Nguyen, N.L.T.: Enhanc- ing lexical-based approach with external knowledge for vietnamese multiple-choice machine reading comprehension. IEEE Access 8, 201404–201417 (2020) 14. Nguyen, K., Nguyen, V., Nguyen, A., Nguyen, N.: A Vietnamese dataset for eval- uating machine reading comprehension. In: Proceedings of the 28th International Conference on Computational Linguistics. pp. 2595–2605. International Committee on Computational Linguistics, Barcelona, Spain (Online) (2020) 15. Qu, C., Yang, L., Chen, C., Qiu, M., Croft, W.B., Iyyer, M.: Open-retrieval conver- sational question answering. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 539–548 (2020) 16. Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. pp. 2383–2392. Association for Computational Linguistics, Austin, Texas (2016) 17. Reddy, S., Chen, D., Manning, C.D.: CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics 7, 249– 266 (2019) 18. Rogers, A., Kovaleva, O., Rumshisky, A.: A primer in bertology: What we know about how bert works. Transactions of the Association for Computational Linguis- tics 8, 842–866 (2020) 19. See, A., Liu, P.J., Manning, C.D.: Get to the point: Summarization with pointer- generator networks. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1073–1083. Associa- tion for Computational Linguistics, Vancouver, Canada (Jul 2017) 20. Thai, M.D.: Metafunctional profile of the grammar of vietnamese. Language ty- pology: A functional perspective 253, 185–254 (2004) 21. Van Nguyen, K., Van Huynh, T., Nguyen, D.V., Nguyen, A.G.T., Nguyen, N.L.T.: New vietnamese corpus for machine reading comprehension of health news articles. arXiv preprint arXiv:2006.11138 (2020) 22. Zhu, C., Zeng, M., Huang, X.: Sdnet: Contextualized attention-based deep network for conversational question answering. arXiv preprint arXiv:1812.03593 (2018)
You can also read