Conversational Machine Reading Comprehension for Vietnamese Healthcare Texts

Page created by Lori Shelton

Society

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Conversational Machine Reading
                                        Comprehension for Vietnamese Healthcare Texts

                                            Son T. Luu1,2,♣ , Mao Nguyen Bui1,2,+ , Loi Duc Nguyen1,2,+ , Khiem Vinh
                                               Tran1,2,+ , Kiet Van Nguyen1,2,♣? , and Ngan Luu-Thuy Nguyen1,2,♣
                                                 1
                                                      University of Information Technology, Ho Chi Minh City, Vietnam
                                                        2
                                                          Vietnam National University Ho Chi Minh City, Vietnam
arXiv:2105.01542v5 [cs.CL] 2 Jul 2021

                                                     ♣
                                                       {sonlt,kietnv,ngannlt}@uit.edu.vn,+ {16520724, 16521722,
                                                                         17520634}@gm.uit.edu.vn

                                                Abstract. Machine reading comprehension (MRC) is a sub-field in nat-
                                                ural language processing that aims to assist computers understand un-
                                                structured texts and then answer questions related to them. In practice,
                                                the conversation is an essential way to communicate and transfer in-
                                                formation. To help machines understand conversation texts, we present
                                                UIT-ViCoQA, a new corpus for conversational machine reading compre-
                                                hension in the Vietnamese language. This corpus consists of 10,000 ques-
                                                tions with answers over 2,000 conversations about health news articles.
                                                Then, we evaluate several baseline approaches for conversational machine
                                                comprehension on the UIT-ViCoQA corpus. The best model obtains an
                                                F1 score of 45.27%, which is 30.91 points behind human performance
                                                (76.18%), indicating that there is ample room for improvement. Our
                                                dataset is available at our website: http://nlp.uit.edu.vn/datasets/
                                                for research purposes.

                                                Keywords: conversations, question answering, machine reading com-
                                                prehension, deep neural models, texts

                                        1     Introduction

                                        Conversation is a standard method to communicate between people, and it plays
                                        an important role in human daily life. The process of asking a question and
                                        responding to an answer brings helpful information about a specific domain.
                                           Healthcare is one of the most concerning problems for many people. Many
                                        audiences often read the healthcare news, and people tend to discuss frequently
                                        about health and medicine. Thus, based on the conversations about healthcare,
                                        we constructed a corpus named UIT-ViCoQA for conversational question an-
                                        swering on healthcare texts in Vietnamese. The UIT-ViCoQA contains 2,000 con-
                                        versations and 10,000 questions from articles about health news in Vietnamese.
                                        This corpus is used to train the computer for understanding the conversation
                                        and giving the right answers based on the conversation context from questions
                                        ?
                                            Corresponding author: Kiet Van Nguyen. Email: kietnv@uit.edu.vn

2

of users. Besides, we implement neural-based models for conversational question
answering including: DrQA [1], GraphFlow [2], FlowQA [8], and SDNet [22] on
the UIT-ViCoQA corpus. Then, we evaluate the performance of those models
on the UIT-ViCoQA dataset.
    The main contribution in this paper includes providing a corpus for conversa-
tional machine comprehension about healthcare texts in Vietnamese and evaluat-
ing the performance of baseline MRC models on the dataset. Our paper is struc-
tured as described. Section 2 takes a literature review about the conversation
machine comprehension corpora and models. Section 3 provides overview infor-
mation about the UIT-ViCoQA dataset. Section 4 introduces available state-of-
the-art approaches for the conversational machine comprehension task. Section
5 shows our empirical results and error analysis of question-answering models
on the UIT-ViCoQA corpus. Finally, Section 6 concludes our works.

2   Related Works

Machine reading comprehension (MRC) is a challenging task of natural language
processing (NLP) which enables machines to understand the reading text and
answer the questions [16]. Many of MRC corpora are constructed on specific
domains, and open domains in English such as SQuAD [16] (extractive MRC) on
Wikipedia articles, RACE [11] (multiple choices MRC) on High school students
English Exams domain, and NarrativeQA [9] (abstractive MRC) on books and
stories domain. For the Vietnamese language, the UIT-ViQuAD [14] (Wikipedia
domain), and UIT-ViNewsQA [21] (Health news domain) are two extractive
MRC corpora for machine reading comprehension. Besides, the ViMMRC [13] is
the multiple-choice reading comprehension corpus on the Vietnamese students’
textbook for primary schools domain.
    Machine reading comprehension applied in question-answering (QA) systems
is another challenge that the MRC models have to understand both given texts
and conversational context and then answer relevant questions. These questions
are often paraphrased, contain co-reference queries, and their answers can be
spans texts or free-form. This type of MRC is called Conversational Machine
Comprehension (CMC) [7]. CoQA [17] and QuAC [3] are two CMC corpora
in English. Based on the CoQA works, we constructed the UIT-ViCoQA for
automated reading comprehension on the health news articles in the Vietnamese
language.
    Attention-based reasoning with sequence models and FLOW mechanism are
two approaches for CMC models, according to Gupta et al. [7]. DrQA [1] and
PGNet [19] are two neural attention-based models implemented in the CoQA cor-
pus. Next, SDNet is another attention-based model that combines inter-attention
and self-attention to comprehend the conversation context. Finally, FlowQA [8]
and GraphFlow [2] are two flow-based models that used to yield the contextual
information through sequences.

3

3     The Corpus
Our data creation process consisting of three phases is described in Figure 1.
In the first phase, we collect news articles about health from VnExpress3 - the
most read online newspapers in Vietnam by using scrapy4 - a web crawler tool
for collecting articles from the online newspaper. In the next phase, we construct
an annotation tool for creating conversational data. Our annotation tool allows
two annotators to create the conversation based on the given articles. Finally, in
the third phase, we hire a team of annotators who create data on our annotation
tool. The detailed steps from the annotation process are described below.

3.1    Data collection

                 Create data                          Crawler
                  crawler

                                                  Collected data
               Phase 1: Data collection

                Create data               Annotation
                annotation                  tool
                   tool

               Phase 2: Building
               annotation tool

                     Data                                                      Data
                                             Data               Annotators
                   creation                                                  annotators
                                           creation                            hiring
                   process

                                          UIT-ViCoQA
                                            Corpus

                Phase 3: Data creation

            Fig. 1: The creation process of the UIT-ViCoQA corpus.

For each conversation (C), we hire two different annotators, which are question-
ers and answerers, respectively. The questioner goes first by asking a question
(Q). The question is sent to the answerer then. After receiving the question, the
3
    https://vnexpress.net/suc-khoe
4
    https://scrapy.org/

answerer gives the answer by selecting a span of text from the article (S) and
then submits the natural answer (A). Next, the annotation system compares
the answer given by the answerer with the asked question of the questioner by
character level. If the given answer matches about 70% with the asked question,
it is a valid answer, and two annotators can move to the next turn. In contrast,
the answerer must give another answer. There is a total of five turns for asking
and answer per article.
In the data creation process, we have some requirements for questioners and
answerers as: (1) The answers must be extracted from the article. Questions that
cannot be answered according to the article are not allowed, (2) Questioners are
encouraged to give questions with synonyms, opposite words, and coreference,
and (3) The answers should be short and limited to use new words from the
article content. Moreover, the selected answerers need to give full answers with
complete texts, correct syntax, and punctuations.

3.2 Dataset overview

Table 1: An example of conversation in the UIT-ViCoQA corpus.
Trạng thái "ngủ" là cách các tế bào ngay lập tức thay đổi để kháng lại phương pháp điều trị. Các
phương pháp điều trị ung thư vú thường thành công, tuy nhiên một số trường hợp ung thư tái phát và
tiên lượng xấu hơn. Ông Luca Magnani, Khoa Dược, Đại học Hoàng Gia London, Anh, cho biết phương
pháp điều trị bằng hormone hiện được sử dụng cho phần lớn bệnh nhân ung thư vú ... (The status of
"sleep" is the way when the cell changes immediately to resist treatment. The treatment methods of
breast cancer are often successful. However, some cases of cancer recur, and the prognosis worsens. Mr.
Luca Magnani, Faculty of Medicine, Imperial College London, says that the treatment method by using
hormones is used for a huge amount of breast cancer patients ... )
Q1 Phương pháp thường được sử dụng để chữa trị ung thư vú là gì ? (What is the treatment method
usually use for breast cancer treatment?)
S1 Ông Luca Magnani, Khoa Dược, Đại học Hoàng Gia London, Anh, cho biết phương pháp điều trị
bằng hormone hiện được sử dụng cho phần lớn bệnh nhân ung thư vú . (Mr. Luca Magnani, Faculty
of Medicine, Imperial College London, says that treatment method by using hormone is used for a
huge amount of breast cancer patients.)
A1 điều trị bằng hormone (using hormone)
Q2 Các bác sĩ có lo ngại gì về phương pháp này? (What are doctors concerned about for this treatment?)
S2 Từ lâu, các nhà khoa học đã đặt câu hỏi, liệu pháp này thực chất có tiêu diệt được các tế bào ung thư
vú không, hay chỉ là chuyển các tế bào sang trạng thái "ngủ yên". (Scientists have long questioned
whether this therapy actually kills breast cancer cells, or just puts the cells in an "inactive" state.)
A2 nó đưa các tế bào ung thư sang trạng thái "ngủ yên" (This treatment puts the cells in an "inactive"
state)
Q3 Vậy những nghiên cứu này có ý nghĩa như thế nào? (What profits from these studies?)
S3 cũng giải thích rằng những phát hiện hiện tại sẽ mở ra lộ trình mới cho việc nghiên cứu chữa trị ung
thư. (explaining that current works can open new future researchs about cancer treatments)
A3 mở ra lộ trình mới cho việc nghiên cứu chữa trị ung thư (Opening new research for cancer treatments)

The UIT-ViCoQA corpus contains 2,000 conversations. Each conversation con-
sists of a reading article and five question-answer pairs. We follow the structure
of the CoQA [17] for our dataset. According to Table 1, to answer question
Q2, the answerer needs to read the passage and looks back to question Q1 and

answer A1 to retrieve the relevant information. Similar to question Q2, the an-
swerer needs to read the reading passage and two previous question-answer pairs
(Q1, A1) and (Q2, A2) to extract the answer A3. The chain of question-answer
pairs Q1-A1, Q2-A2 is the history of the conversation.
Table 2 provides the overview of the UIT-ViCoQA corpus and compares it
with the CoQA corpus. The result illustrates that although the number of ques-
tions and answers in the UIT-ViCoQA corpus is lower than the CoQA corpus, the
average number of words in the UIT-ViCoQA dataset is larger than the CoQA
dataset. This is because the interrogative words in English contain a single word
(e.g., who?, when?, and why?) while they may have two words in Vietnamese.
For example, the words "who" means "ai", "when" means "khi nào" and "why"
means "tại sao". Besides, the UIT-ViCoQA is constructed on a specific domain.
Hence it is not as diverse as the CoQA corpus.

Table 2: Overview information about the UIT-ViCoQA and CoQA corpus.
UIT-ViCoQA CoQA
Domain text Health domain Diverse domains
Number of passages 2,000 8,399
Number of questions 10,000 127,000
Passage length 404.1 271.0
Question length 9.4 5.5
Answer length 9.7 2.7

3.3 Dataset analysis

Table 3: The types of question in the UIT-ViCoQA corpus.
Question Ratio
Example
types (%)
What trans fat là gì? (what is trans fat?) 32.6

How many Vietnam có bao nhiêu ca nhiễm COVID-19? (How many cases of COVID 19 are 17.2
detected in Vietnam?)
How DCVax hoạt động như thế nào? (How does DCVax work?) 7.6
Yes/No Có tiền sử bị bệnh gì không ? (Have a history of any illness?) 6.6
Who Những người nào dễ bị xơ gan? (Who is susceptible to cirrhosis?) 9.0

Why Vì sao nhang có thể ảnh hưởng xấu tới cơ thể? (Why incense can adversely affect 7.8
the body?)
Which Nhóm nào chiếm tỉ lệ cao nhất? (Which group accounts for the highest percentage?) 7.0
When Khi nào thì cô có thể kết thúc điều trị? (When can she finish treatment?) 2.6
Where Zhou Xiaoying sinh sống ở đâu? (Where does Zhou Xiaoying live?) 4.0

Others Còn du thuyền Diamond Princess? Kể tên một số quốc gia có số mắc cao (About the 5.6
Diamond Princess yacht? Name a few countries with high risk?)

6

In Vietnamese, the process of interaction contains statements between two peo-
ple. Each statement contains two functional elements, including the negotiatory
for carrying the argument in statements that go through the conversation and
the remainder to keep the rest information of statements [20]. The negotiatory is
an essential part of the statement in the conversation. The negotiatory element
comprises interrogatives particles, element interrogatives items, and imperative
particles. The interrogatives are the characteristic of questions. In Table 3, we
show all kinds of questions in Vietnamese that are usually used in daily life. The
interrogative words are marked bold in the sentence. According to Table 3, the
"What" type accounts for the highest ratio in the UIT-ViCoQA corpus (32.6%).

            Table 4: Linguistic phenomena in UIT-ViCoQA questions.
Phe-                                                                                               Ratio
                                                 Example
nomenon                                                                                             (%)
                           Relationship between a question and its passage
           Q: Ai làm giám đốc quốc gia của Hiệp hội Sảy thai? (Who is the director of the
           association of miscarriage?)
Lexical
           A: Ruth Bender - Atik                                                                    47.6
match
           S: Ruth Bender - Atik, giám đốc quốc gia của Hiệp hội Sảy thai (Ruth Bender -
           Atik, national director of the association of miscarriage)
           Q: Giá cho mỗi con robot là bao nhiêu? (How much is the price of each robot?)
Paraphras- A: 500000 RMB
                                                                                                    48.0
ing        S: Các robot có giá 500000 RMB (khoảng 72000 USD) (Robots have price 500000
           RBM, about 72000 USD)
           Q: Vì sao? (Why?)
           A: Do sầu riêng chứa nhiều chất dinh dưỡng, nhiều năng lượng, cộng với cồn nồng
           độ cao làm cho nhịp tim tăng (Because durian contains lots of nutrients, energy,
           combining with high concentration of alcohol, make heartbeat increase.)
Pragmatics                                                                                           4.4
           S: Chuyên gia dinh dưỡng Nguyễn Mộc Lan cho biết sầu riêng nhiều chất dinh
           dưỡng, nhiều năng lượng, cộng với rượu nồng độ cao làm cho nhịp tim tăng.
           (Nutritionist Nguyen Moc Lan said durian has a lot of nutrients, lots of energy, plus
           a high concentration of alcohol makes your heart rate increase.)
                     Relationship between a question and its conversation history
No          Q: Phô mai có giá trị dinh dưỡng thế nào? (How does cheese have nutritional
                                                                                                    73.6
coreference value?)

            Q1: Loại bệnh nào Tiểu Lý mắc phải từ ban đầu? (What kind of illness was Tieu
Explicit    Ly initially? )
                                                                                                    20.6
coreference A1: bệnh lao phổi (tuberculosis)
            Q2: Anh ta chữa bệnh trong thời gian bao lâu? (How long does he treat?)

            Q1: Ở Hải Phòng bệnh nhân từ đâu trở về? (Where does the patient come from in
Implicit    Hai Phong?)
                                                                                                     5.8
coreference A1: Quảng Đông (Guangdong)
            Q2: Hiện có triệu chứng gì? (What symptoms are there?)

   Next, we randomly divide our corpus into training, development, and test sets
with proportions 70%, 15%, and 15%, respectively. Then, we take 100 articles
by random from the development set to analyze and evaluate the corpus, which

7

is called analysis set [17]. We segment texts in the corpus by the Underthesea
framework5 .
    According to Gupta et al. [7], the Conversational Machine Comprehension
(CMC) model answers the question by extracting information not only from the
reading texts but also from conversational history. Therefore, the main linguistic
phenomena in the UIT-ViCoQA are based on the relationship between questions
and the reading passage and the relationship between questions and the conver-
sation history. Table 4 displays the linguistic phenomena in the UIT-ViCoQA
corpus.
    For the relationship between questions and the reading texts, there are three
types of phenomena: lexical match, paraphrasing, and pragmatic. The lexical
match indicates that the questions contain the same words as the reading texts.
In contrast, paraphrasing is the question in which their words use synonyms
from the reading texts, and pragmatic means the question uses words that do
not relate to the reading texts. The proportions of lexical match, paraphrasing,
and pragmatic phenomenon in the UIT-ViCoQA corpus are 47.6%, 48.0%, and
4.4%, respectively, as shown in Table 4.
    In addition, for the relationship between questions and the conversation his-
tory, there are three types of relational phenomena: no coreference, explicit coref-
erence, and implicit coreference. The percentages of no coreference, explicit coref-
erence, and implicit coreference in the UIT-ViCoQA corpus are 73.6%, 20.6%,
and 5.8%, respectively, according to Table 4.

4     Methodologies
According to Gupta et al. [7], a typical conversation reading comprehension
task consists of reading passage as context (C), the conversation history (H) in-
cludes multiple question-answer pairs, and the generated answers (A). Therefore,
this task combines two models: the machine reading comprehension model for
encoding the questions and context into neural space vectors and the question-
answering model to generate and decode answers from questions to natural lan-
guage.
    For the machine reading comprehension model, the Document Reader (DrQA)
introduced by Chen et al. [1] is a powerful model on various of machine reading
comprehension corpora such as: SQuAD [16], TextWorldsQA [10], and UIT-
ViQuAD [14]. The DrQA model consists of two modules: Document Retriever
and Document Reader. We use the Document Reader of the DrQA to extract
the answer spans for the questions.
    Besides, for the conversational comprehension task, the generated answers
are not only from the reading passage but also the conversation history. The
model extracts the history of conversations as a special context to generate
new answers. SDNeT model [22] is a contextual attention-based model based
on the idea of DrQA with a special mechanism to extract the context of the
conversation.
5
    https://github.com/undertheseanlp/underthesea

8

    Furthermore, The FLOW mechanism enables the MRC models to encode the
history of the conversation comprehensively. Hence, this mechanism integrates
well the latent semantic of the conversation history. FlowQA [8] and GraphFlow
[2] are two flow-based neural models that grasping the conversational history
context to generate answers.

5     Experiments

5.1     Data preparation

We pre-process the data before fitting to the model by these following steps:
(1) Removing special characters and stop words, (2) Segmenting sentences into
words by using the Underthesea tool, and (3) Transforming the texts into vectors
by using fastText word embedding in the Vietnamese language provided by
Grave et al. [6]. The dimension of fastText word embedding is 300.

5.2     Evaluation metrics

We evaluate the performance of the models by comparing the generated answers
with the accurate answers on F1-score and Exact match (EM) score. The F1-
score measures the right predicted answers comparing with the correct answers.
The EM score measures the exact matching of prediction answers with original
answers [16].

5.3     Experiment results

The FLOW models give optimistic results on the UIT-ViCoQA corpus. Accord-
ing to Table 5, FlowQA obtains the highest result by F1-score on both develop-
ment and test sets. For the EM score, the SDNet model gives the highest results.
However, there is a large gap between the F1 and the EM scores as well as the
performance of CMC models and human performance.

           Table 5: Experimental results on the UIT-ViCoQA corpus.
                                  EM (%)                  F1-score (%)
           Model
                              Dev        Test            Dev        Test
    DrQA                         13.17       13.50          43.28       37.71
    SDNet                       15.40       15.60           41.90       40.50
    FlowQA                       13.13       12.53         44.84       45.27
    GraphFlow                    13.77       14.73          44.69       45.16
    Human performance           35.67       38.66          73.33       76.18

Table 6: The answers predicted by models on a sample in the UIT-ViCoQA
corpus
Tính đến ngày 18/2, Việt Nam có 16 ca nhiễm covid-19. Trong đó, Vĩnh Phúc có tới 5 công nhân và 6
người thân của họ bị lây nhiễm. Con số này khiến các doanh nghiệp đặt ra câu hỏi về nguy cơ lây lan
virus khó lường trong môi trường doanh nghiệp. Chỉ cần một trường hợp phát hiện nhiễm Covid-19 là
cả văn phòng, phân xưởng tiếp xúc với người bệnh sẽ phải cách ly cô lập, gây gián đoạn hoạt động sản
xuất kinh doanh, tạo áp lực lên hệ thống y tế công. Ông Đoàn Đình Duy Khương - Tổng Giám đốc điều
hành Dược Hậu Giang về vấn đề bảo vệ sức khỏe lao động cho biết, mỗi ngày họ phải dành hơn 1/3 thời
gian cho nơi làm việc ... (Up to 18/2, Vietnam has 16 affected cases of covid-19. Specifically, Vinh Phuc
has 5 workers and 6 relatives of whom are affected. This number makes the enterprises question about the
risk of virus spreading in working environment. If only one case is detected to be affected Covid-19, the
whole offices, factories which are contacted with the patients will be quarantined, disrupting production
and business activities, and putting pressure on the public health system. Mr Đoàn Đình Duy Khương -
General director of Hau Giang Pharmacy about protecting labor health affairs, says that, everyday they
have to spend more than 1/3 of their time at work ... )
Q1 Việt Nam có bao nhiêu ca nhiễm tính đến 18/2? (How many affected cases of Vietnam have
been counted up to 2/18?)
Original Tính đến ngày 18/2, Việt Nam có 16 ca nhiễm covid-19. (Up to 18/2, Vietnam has 16 affected
cases of covid-19 )
DrQA 16
SDNet 16 ca nhiễm covid-19. (16 affected cases of covid-19)
FlowQA 16 ca (16 cases)
GraphQA 16 ca nhiễm covid-19 (16 affected cases of covid-19)
Q2 Cụ thể? (Detail?)
Original Trong đó, Vĩnh Phúc có tới 5 công nhân và 6 người thân của họ bị lây nhiễm. (Specifically, Vinh
Phuc has 5 workers and 6 relatives of whom are affected.)
DrQA Vĩnh Phúc có tới 5 công nhân và 6 người thân của họ bị lây nhiễm. (Vinh Phuc has 5 workers
and 6 relatives of whom are affected.)
SDNet Trong đó, Vĩnh Phúc có tới 5 công nhân và 6 người thân của họ bị lây nhiễm. (Vinh Phuc has
5 workers and 6 relatives of whom are affected.)
FlowQA Trong đó, Vĩnh Phúc có tới 5 công nhân và 6 người thân của họ bị lây nhiễm. (Vinh Phuc has
5 workers and 6 relatives of whom are affected.)
GraphQA Trong đó, Vĩnh Phúc có tới 5 công nhân và 6 người thân của họ bị lây nhiễm. (Vinh Phuc has
5 workers and 6 relatives of whom are affected.)
Q3 Doanh nghiệp nghĩ gì? (What does the enterprise think about?)
Original Con số này khiến các doanh nghiệp đặt ra câu hỏi về nguy cơ lây lan virus khó lường trong
môi trường doanh nghiệp.(This number makes the enterprises question about the risk of virus
spreading in working environment.)
DrQA doanh nghiệp đặt ra câu hỏi về nguy cơ lây lan virus khó lường trong môi trường doanh
nghiệp.(the enterprise questions about the risk of virus spreading in working environment.)
SDNet ca nhiễm covid-19. (affected cases of covid-19)
FlowQA các doanh nghiệp đặt ra câu hỏi về nguy cơ lây lan virus khó lường trong môi trường doanh
nghiệp.. (the enterprises question about the risk of virus spreading in working environment.)
GraphQA virus khó lường trong môi trường doanh nghiệp. (The virus is dangerous in working environment)
Q4 Nguy cơ là gì? (What is risk?)
Original Chỉ cần một trường hợp phát hiện nhiễm Covid-19 là cả văn phòng, phân xưởng tiếp xúc với
người bệnh sẽ phải cách ly cô lập, gây gián đoạn hoạt động sản xuất kinh doanh, tạo áp lực
lên hệ thống y tế công. (If only one case is detected to be affected Covid-19, the whole offices,
factories which are contacted with the patients will be quarantined, disrupting production and
business activities, and putting pressure on the public health system.)
DrQA Chỉ cần một trường hợp phát hiện nhiễm Covid-19 là cả văn phòng. (If only one case is detected
to be affected Covid-19, the whole offices.)
SDNet khó phát triển bền vững.
FlowQA Chỉ cần một trường hợp phát hiện nhiễm Covid-19 là cả văn phòng. (If only one case is detected
to be affected Covid-19, the whole offices.)
GraphQA Chỉ cần một trường hợp phát hiện nhiễm Covid-19 là cả văn phòng. (If only one case is detected
to be affected Covid-19, the whole offices.)
Q5 Đoàn Đình Duy Khương là ai? (Who is Đoàn Đình Duy Khương?)
Original Ông Đoàn Đình Duy Khương - Tổng Giám đốc điều hành Dược Hậu Giang. (Mr. Đoàn Đình
Duy Khương - General director of Hau Giang Pharmacy)
DrQA Ông Đoàn Đình Duy Khương - Tổng Giám đốc điều hành Dược Hậu Giang về vấn đề bảo vệ sức
khỏe lao động cho biết. (Mr Đoàn Đình Duy Khương - General director of Hau Giang Pharmacy
about protecting labor health affairs says)
SDNet khó phát triển bền vững. (hard to develop stably)
FlowQA Ông Đoàn Đình Duy Khương - Tổng Giám đốc điều hành Dược Hậu Gia. (Mr Đoàn Đình Duy
Khương - General director of Hau Gia)
GraphQA Ông Đoàn Đình Duy Khương - Tổng Giám đốc điều hành Dược Hậu Giang về vấn đề bảo vệ sức
khoẻ. (Mr Đoàn Đình Duy Khương - General director of Hau Giang Pharmacy about protecting
health affairs)

10

5.4            Error analysis
Table 6 shows the predicted answers given by four different models, including
DrQA, SDNet, FlowQA, and GraphFlow, respectively. In general, FlowQA and
GraphFlow give the most relevant answer as the original answer. For example, in
the question Q3 - "What the enterprise think about?", the reader needs to look
back to the previous question-answer Q1-A1 and Q2-A2 to inference the context
about the "affected cases of COVID-19" (Q1) and the "detailed of affected cases"
(Q2). GraphFlow and FlowQA offer the most relevant answer than DrQA for
the question Q3. For question Q5, GraphFlow provides the most relevant answer
about the person mentioned in the reading passage, while other models give the
answer with redundant information in comparison with the original answer. For
the question Q4, both four models cannot give the exact answer. This is due to
the ambiguity of Vietnamese interrogative words in questions where it is written
in the genuine and non-genuine form. For example, the question Q2: "Cụ thể?"
can be understood as "What is the detail?" or "How it happened?". Besides,
the question Q4: "Nguy cơ là gì?" can be understood as "What is the risk?" or
"How bad is the risk?". This is known as the MOOD in the Vietnamese. The
interrogative clause in Vietnamese consists of two main elements: the negotiatory
and the remainders. The negotiatory carries the centroid of the interaction. This
aspect of Vietnamese interrogative is described carefully by Thai [20].

                                            35.12
      Ratio of right answers (%)

                                   30

                                   20
                                                     14.55
                                                              11.7
                                   10                                    8.19
                                                                                  7.02     6.86
                                                                                                     5.02     4.35              4.01
                                                                                                                       3.18

                                    0
                                                      y

                                                                                         hy

                                                                                                     o
                                                                     ow
                                            t

                                                             ho

                                                                                  ch

                                                                                                               n

                                                                                                                       re

                                                                                                                                rs
                                        ha

                                                    an

                                                                                                  /N

                                                                                                            he

                                                                                                                               e
                                                                                                                     he
                                                                                       W
                                                                                hi
                                                          W

                                                                     H

                                                                                                                            th
                                        W

                                                m

                                                                                                         W
                                                                                                 s
                                                                           W

                                                                                                                   W
                                                                                              Ye

                                                                                                                            O
                                             ow
                                            H

                Fig. 2: The impact of question types on the performance of models.

    In addition, we study the ability of the models for retrieving correct answers
based on the type of questions on the development set. Figure 2 shows the ratio
of correct answers by different kinds of questions in the UIT-ViCoQA corpus. A
question gives the right answers if the F1-score is greater than 70%. According
to Figure 2, the question type "What" has the highest ratio, which is 35.12%.

Besides, the question type "What" accounts for 32.6% as described in Table 3.
Therefore, the models mostly give the correct answers to this kind of question.
Furthermore, the question types "How many" and "Who" also have a high ratio.

Table 7: Types of predicted answer given by the models.
Ratio
Types Description Example
(%)
Matching The predicted answers fully Q: Việc này có giúp tình trạng tốt lên không? (Does
this help improve the condition?) 16.73
answers match with truth answers
P: Không (No)
A: Không (No)

Free-form The predicted answer only Q:Tỷ lệ ung thu Việt Nam có cao không? (Is the
rate of cancer in Vietnam high?) 59.93
answers match the a part of truth
answers P: cao (High)
A: có (Yes)

Wrong Q: Béo phì có gây dậy thì sớm không? (Does obesity
The predicted answer does 23.27
answers cause early puberty?)
not match the truth answer
P: Không (No)
A: Có (Yes)

Finally, we analyze the predicted answers on the development set. According
to Table 7, there are three types of the answer given by the models, and most
of the predicted answers are concentrated on the free-form type, which accounts
for 59.93%. This is why the F1 and EM scores have a considerable difference, as
described in Table 5.
In general, most error predictions are due to the number of questions and the
variety of answers, as well as the linguistic phenomena. Therefore, it is necessary
to increase the number of questions and the question types as well as enriching
answers to make the corpus more diverse.

6 Conclusion and future work
In this paper, we propose the dataset about machine reading comprehension for
healthcare texts in Vietnamese. This dataset includes 2,000 health articles with
10,000 questions. We also conduct experiments on several baseline models, and
the best result in the F1-score is 45.27%. Nevertheless, the difference between
F1 and EM scores is large. This is due to the linguistic phenomena about the
Vietnamese interrogative particles and the limited answers. Therefore, it is nec-
essary to increase the number of questions and answers as well as make questions
and answers more diverse in further research. Besides, enabling the CMC models
to capture and understand the contextual meaning of the conversation history
is also a challenging task in the conversational machine reading comprehension
model researching.
In future, we plan to increase the quantity and quality of the UIT-ViCoQA
corpus as well as to conduct further experiments on deep learning and transfer

12

learning using pre-trained language models [4, 5, 12, 18] to enhance the perfor-
mance of CMC models on the UIT-ViCoQA corpus. Inspired by the conversa-
tional question answering system [15], we suggest using this model and UIT-
ViCoQA for building Vietnamese conversational question answering systems.

Acknowledgements We would like to express our thanks to reviewers for their
valuable comments to help improve our work. Besides, we would like to thank
our annotators for their cooperation.

References

 1. Chen, D., Fisch, A., Weston, J., Bordes, A.: Reading Wikipedia to answer open-
    domain questions. In: Proceedings of the 55th Annual Meeting of the Association
    for Computational Linguistics (Volume 1: Long Papers). pp. 1870–1879. Associa-
    tion for Computational Linguistics, Vancouver, Canada (Jul 2017)
 2. Chen, Y., Wu, L., Zaki, M.J.: Graphflow: Exploiting conversation flow with
    graph neural networks for conversational machine comprehension. In: Bessiere, C.
    (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artifi-
    cial Intelligence, IJCAI-20. pp. 1230–1236. International Joint Conferences on Ar-
    tificial Intelligence Organization (2020). https://doi.org/10.24963/ijcai.2020/171,
    https://doi.org/10.24963/ijcai.2020/171
 3. Choi, E., He, H., Iyyer, M., Yatskar, M., Yih, W.t., Choi, Y., Liang, P., Zettlemoyer,
    L.: QuAC: Question answering in context. In: Proceedings of the 2018 Conference
    on Empirical Methods in Natural Language Processing. pp. 2174–2184. Association
    for Computational Linguistics, Brussels, Belgium (2018)
 4. Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán,
    F., Grave, E., Ott, M., Zettlemoyer, L., Stoyanov, V.: Unsupervised cross-lingual
    representation learning at scale. In: Proceedings of the 58th Annual Meeting of the
    Association for Computational Linguistics. pp. 8440–8451. Association for Com-
    putational Linguistics, Online (Jul 2020). https://doi.org/10.18653/v1/2020.acl-
    main.747
 5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep
    bidirectional transformers for language understanding. In: Proceedings of the 2019
    Conference of the North American Chapter of the Association for Computational
    Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers).
    pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota
    (Jun 2019). https://doi.org/10.18653/v1/N19-1423
 6. Grave, E., Bojanowski, P., Gupta, P., Joulin, A., Mikolov, T.: Learning word vec-
    tors for 157 languages. In: Proceedings of the Eleventh International Conference on
    Language Resources and Evaluation (LREC 2018). European Language Resources
    Association (ELRA), Miyazaki, Japan (2018)
 7. Gupta, S., Rawat, B.P.S., Yu, H.: Conversational machine comprehension: a lit-
    erature review. In: Proceedings of the 28th International Conference on Compu-
    tational Linguistics. pp. 2739–2753. International Committee on Computational
    Linguistics, Barcelona, Spain (Online) (Dec 2020)
 8. Huang, H.Y., Choi, E., Yih, W.t.: Flowqa: Grasping flow in history for conversa-
    tional machine comprehension. arXiv preprint arXiv:1810.06683 (2018)

13

 9. Kočiský, T., Schwarz, J., Blunsom, P., Dyer, C., Hermann, K.M., Melis, G., Grefen-
    stette, E.: The NarrativeQA reading comprehension challenge. Transactions of the
    Association for Computational Linguistics 6, 317–328 (2018)
10. Labutov, I., Yang, B., Prakash, A., Azaria, A.: Multi-relational question answering
    from narratives: Machine reading and reasoning in simulated worlds. In: Proceed-
    ings of the 56th Annual Meeting of the Association for Computational Linguistics
    (Volume 1: Long Papers). pp. 833–844. Association for Computational Linguistics,
    Melbourne, Australia (2018)
11. Lai, G., Xie, Q., Liu, H., Yang, Y., Hovy, E.: RACE: Large-scale ReAding
    comprehension dataset from examinations. In: Proceedings of the 2017 Con-
    ference on Empirical Methods in Natural Language Processing. pp. 785–794.
    Association for Computational Linguistics, Copenhagen, Denmark (Sep 2017).
    https://doi.org/10.18653/v1/D17-1082
12. Nguyen, D.Q., Tuan Nguyen, A.: PhoBERT: Pre-trained language models for Viet-
    namese. In: Findings of the Association for Computational Linguistics: EMNLP
    2020. pp. 1037–1042. Association for Computational Linguistics, Online (Nov
    2020). https://doi.org/10.18653/v1/2020.findings-emnlp.92
13. Nguyen, K.V., Tran, K.V., Luu, S.T., Nguyen, A.G.T., Nguyen, N.L.T.: Enhanc-
    ing lexical-based approach with external knowledge for vietnamese multiple-choice
    machine reading comprehension. IEEE Access 8, 201404–201417 (2020)
14. Nguyen, K., Nguyen, V., Nguyen, A., Nguyen, N.: A Vietnamese dataset for eval-
    uating machine reading comprehension. In: Proceedings of the 28th International
    Conference on Computational Linguistics. pp. 2595–2605. International Committee
    on Computational Linguistics, Barcelona, Spain (Online) (2020)
15. Qu, C., Yang, L., Chen, C., Qiu, M., Croft, W.B., Iyyer, M.: Open-retrieval conver-
    sational question answering. In: Proceedings of the 43rd International ACM SIGIR
    Conference on Research and Development in Information Retrieval. pp. 539–548
    (2020)
16. Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions
    for machine comprehension of text. In: Proceedings of the 2016 Conference on
    Empirical Methods in Natural Language Processing. pp. 2383–2392. Association
    for Computational Linguistics, Austin, Texas (2016)
17. Reddy, S., Chen, D., Manning, C.D.: CoQA: A conversational question answering
    challenge. Transactions of the Association for Computational Linguistics 7, 249–
    266 (2019)
18. Rogers, A., Kovaleva, O., Rumshisky, A.: A primer in bertology: What we know
    about how bert works. Transactions of the Association for Computational Linguis-
    tics 8, 842–866 (2020)
19. See, A., Liu, P.J., Manning, C.D.: Get to the point: Summarization with pointer-
    generator networks. In: Proceedings of the 55th Annual Meeting of the Association
    for Computational Linguistics (Volume 1: Long Papers). pp. 1073–1083. Associa-
    tion for Computational Linguistics, Vancouver, Canada (Jul 2017)
20. Thai, M.D.: Metafunctional profile of the grammar of vietnamese. Language ty-
    pology: A functional perspective 253, 185–254 (2004)
21. Van Nguyen, K., Van Huynh, T., Nguyen, D.V., Nguyen, A.G.T., Nguyen, N.L.T.:
    New vietnamese corpus for machine reading comprehension of health news articles.
    arXiv preprint arXiv:2006.11138 (2020)
22. Zhu, C., Zeng, M., Huang, X.: Sdnet: Contextualized attention-based deep network
    for conversational question answering. arXiv preprint arXiv:1812.03593 (2018)

You can also read