Probing Prior Knowledge Needed in Challenging Chinese Machine Reading Comprehension
Kai Sun¹* Dian Yu² Dong Yu² Claire Cardie¹
¹Cornell University, Ithaca, NY  ²Tencent AI Lab, Bellevue, WA
ks985@cornell.edu, {yudian, dyu}@tencent.com, cardie@cs.cornell.edu

arXiv:1904.09679v2 [cs.CL] 30 Apr 2019

*Part of this work was conducted when K. S. was an intern at the Tencent AI Lab, Bellevue, WA.

Abstract

With an ultimate goal of narrowing the gap between human and machine readers in text comprehension, we present the first collection of Challenging Chinese machine reading Comprehension datasets (C³) collected from language and professional certification exams, which contains 13,924 documents and their associated 23,990 multiple-choice questions. Most of the questions in C³ cannot be answered merely by surface-form matching against the given text.

As a pilot study, we closely analyze the prior knowledge (i.e., linguistic, domain-specific, and general world knowledge) needed in these real-world reading comprehension tasks. We further explore how to leverage linguistic knowledge, including a lexicon of idioms and proverbs, graphs of general world knowledge (e.g., ConceptNet), and domain-specific knowledge such as textbooks to aid machine readers, through fine-tuning a pre-trained language model. Experimental results demonstrate that linguistic and general world knowledge may help improve the performance of the baseline reader in both general and domain-specific tasks. C³ will be available at http://dataset.org/c3/.

1 Introduction

Machine reading comprehension (MRC) tasks, which aim to teach machine readers to read and understand a reference material (e.g., a document) and evaluate the comprehension ability of machines by letting them answer questions relevant to the given content (Poon et al., 2010; Richardson et al., 2013), have attracted substantial attention from both academia and industry.

An increasing number of studies focus on developing MRC datasets that contain a significant number of questions that require prior knowledge in addition to the given context (Richardson et al., 2013; Mostafazadeh et al., 2016; Lai et al., 2017; Ostermann et al., 2018; Khashabi et al., 2018; Talmor et al., 2018; Sun et al., 2019a). Therefore, they can serve as good test-beds for evaluating progress towards the goals of teaching machine readers to use different kinds of prior knowledge for better text comprehension and of narrowing the performance gap between human and machine readers in real-world settings such as language or subject examinations. However, progress on these kinds of tasks is mostly limited to English (Storks et al., 2019) because of the unavailability of large-scale datasets in other languages.

To study the prior knowledge needed to better comprehend written and oral texts in Chinese, we propose the first collection of Challenging Chinese multiple-choice machine reading Comprehension datasets (C³) that contain both general and domain-specific tasks. For the general-domain task: given a reference document that can be either written or oral (i.e., a dialogue), select the correct answer option from all options associated with a question. Besides, we present a challenging task that has not been explored in the literature: given a counseling oral text (a third-person narrative or a dialogue) mostly about life concerns and an additional domain-specific reference corpus, select the correct answer option(s) from the options associated with a question. Compared to relevant datasets (Ostermann et al., 2018; Sun et al., 2019a), besides the oral text, we also need additional domain-specific knowledge for answering questions. However, it is relatively difficult to link the content in less formal oral language to the corresponding well-written facts, explanations, or definitions in the domain-specific reference corpus. For all the mentioned tasks, we collect questions from language and professional certification exams designed by experts (Section 3.1).
We observe that three kinds of prior knowledge are required for in-depth understanding of the written and oral texts to answer most of the questions in both general and domain-specific reading comprehension tasks: linguistic knowledge, domain knowledge, and general world knowledge, the last of which is further broken down into eight types such as arithmetic, connotation, and cause-effect (Section 3.2). Around 86.8% of general-domain questions and all of the domain-specific questions require knowledge beyond the given context. We further investigate the utilization of linguistic knowledge including a lexicon of common Chinese idioms and proverbs (Section 4.2), general world knowledge in the form of graphs (Speer et al., 2017) (Section 4.3), and domain-specific knowledge such as textbooks to improve the comprehension ability of machine readers, via fine-tuning a pre-trained language model (Devlin et al., 2019) (Section 4.1). Experimental results show that a general-domain lexicon and a general world knowledge graph generally improve the baseline performance on both general and domain-specific tasks. Experiments also demonstrate that for domain-specific questions, typical methods such as enriching the given text with retrieved sentences from additional domain-specific corpora actually hurt the performance of the baseline, which indicates the great challenge in finding external knowledge relevant to informal oral texts (Section 5.2). We hope our observations and proposed challenging datasets may inspire further research on knowledge acquisition and utilization for Chinese or cross-language reading comprehension.

2 Related Work

2.1 Machine Reading Comprehension and Question Answering

We discuss tasks in which texts are written in English. Much of the early work focuses on constructing large-scale extractive MRC datasets: answers are spans from the reference document (Hermann et al., 2015; Hill et al., 2016; Bajgar et al., 2016; Rajpurkar et al., 2016; Trischler et al., 2017; Joshi et al., 2017). As a question and its answer are usually in the same sentence, deep neural models (Devlin et al., 2019; Radford et al., 2019) have outperformed human performance on many such tasks. To increase the difficulty of MRC tasks, researchers have explored ways including adding unanswerable (Trischler et al., 2017; Rajpurkar et al., 2018) or conversational (Reddy et al., 2018; Choi et al., 2018) questions that might require reasoning (Zhang et al., 2018a), designing free-form answers (Kočiský et al., 2018), or designing (question, answer) pairs that cover the content of multiple sentences or documents (Welbl et al., 2018; Yang et al., 2018). Still, questions usually provide sufficient information for finding answers in the given context.

There are also a variety of non-extractive machine reading comprehension (Richardson et al., 2013; Mostafazadeh et al., 2016; Lai et al., 2017; Khashabi et al., 2018; Talmor et al., 2018; Sun et al., 2019a) and question answering tasks (Clark et al., 2016, 2018; Mihaylov et al., 2018), mostly in multiple-choice form. For question answering tasks, there is no reference document provided for each question. Instead, a reference corpus is provided, which contains a collection of domain-specific textbooks and/or related encyclopedia articles. For these tasks, besides the given reference documents or corpora, knowledge from other resources may be necessary to solve a significant percentage of questions.

Besides these standard tasks, we are aware of a trend of formalizing tasks such as relation extraction (Levy et al., 2017), word prediction (Chu et al., 2017), and judgment prediction (Long et al., 2018) as extractive or non-extractive machine reading comprehension problems, which are beyond the scope of this paper.

We compare our proposed tasks with similar datasets in Table 1. C³-2A and C³-2B can be regarded as new challenging tasks that have never been studied before.

2.2 Chinese Machine Reading Comprehension and Non-Extractive Question Answering

We have seen the construction of span-style Chinese machine reading comprehension datasets (Cui et al., 2016, 2018b,a; Shao et al., 2018), using Chinese news reports, books, and Wikipedia articles as source documents, similar to their English counterparts CNN/Daily Mail (Hermann et al., 2015), CBT/BT (Hill et al., 2016; Bajgar et al., 2016), and SQuAD (Rajpurkar et al., 2016), in which all answers are extractive spans from the provided reference documents.
Previous work also focuses on non-extractive question answering tasks (Cheng et al., 2016; Guo et al., 2017a,b; Zhang and Zhao, 2018; Zhang et al., 2018b; Hao et al., 2019), in which questions are usually collected from examinations. Another kind of non-extractive question answering task (He et al., 2017) is based on search engines (similar to English MS MARCO (Nguyen et al., 2016)): researchers collect questions from query logs and ask crowd workers to generate answers. Compared to the tasks mentioned above, we focus on Chinese machine reading comprehension tasks that require prior knowledge to facilitate the understanding of the given text.

Task    Reference Document   Reference Corpus   Domain          Chinese   English
C³-1A   mixed-genre          ×                  general         N/A       RACE (Lai et al., 2017)
C³-1B   dialogue             ×                  general         N/A       DREAM (Sun et al., 2019a)
C³-2A   narrative            √                  psychological   N/A       N/A
C³-2B   dialogue             √                  psychological   N/A       N/A

Table 1: Comparison of C³ and existing representative large-scale multiple-choice reading comprehension tasks (the Chinese and English columns list comparable existing datasets by language).

3 Data

In this section, we describe how we construct C³ (Section 3.1), and we analyze the data (Section 3.2) and the knowledge needed (Section 3.3).

3.1 Collection Methodology and Task Definitions

We collect general-domain problems from Hanyu Shuiping Kaoshi (HSK) and Minzu Hanyu Kaoshi (MHK), which are designed to evaluate the Chinese listening and reading comprehension ability of second-language learners such as international students, overseas Chinese, and ethnic minorities. We collect domain-specific problems from Psychological Counseling Examinations (a national qualification test that certifies level two and level three psychological counselors in China), which focus on assessing the acquisition and retention of subject knowledge. We include problems from real and practice exams, and all of them are freely accessible online for public usage.

Each general-domain problem consists of a reference document and a series of questions. Each question is associated with several answer options, EXACTLY ONE of which is correct. The goal is to select the correct option. According to the reference document type, we divide the collected general-domain problems into two sub-tasks: C³-1A and C³-1B. In C³-1B, a dialogue serves as the reference document. The rest of the problems belong to C³-1A. We show a sample problem for each type in Table 2 and Table 3, respectively.

Each domain-specific problem comprises a reference document mostly about life concerns (e.g., social, work, family, school, and emotional or physical health), which contains one or multiple sub-documents. Every sub-document is followed by a series of questions designed mainly for this sub-document. Each question is associated with several answer options, AT LEAST ONE of which is correct. The goal is to select all correct answer options. An answerer is allowed to read the complete reference document and utilize the relevant knowledge in an additional reference corpus such as a psychological counseling textbook (Section 5.2) to reach the correct answer options. Similarly, the domain-specific problems are divided into sub-tasks C³-2A and C³-2B. For each problem in C³-2B, its reference document is made up of one or multiple dialogues (in chronological order). C³-2A contains the remaining problems, in which reference documents are third-person narratives. Due to limited space, we show a sample problem for each type in Appendix A (Table 12 and Table 13).

We remove duplicate problems and randomly split the data (13,924 documents and 23,990 questions in total) at the problem level, with 60% training, 20% development, and 20% test.
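Splitting at the problem level means that a document and all of its associated questions stay in the same partition, so questions about the same document never leak across the train/dev/test boundary. A minimal sketch of such a procedure follows; the data format and function name are illustrative assumptions, not the authors' released code:

```python
import random

def split_problems(problems, seed=0):
    """problems: list of (document, questions) pairs; each pair is one problem.

    Shuffling and slicing whole problems keeps every document together with
    all of its questions in a single partition.
    """
    shuffled = problems[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train, n_dev = int(0.6 * n), int(0.2 * n)
    return (shuffled[:n_train],                  # 60% training
            shuffled[n_train:n_train + n_dev],   # 20% development
            shuffled[n_train + n_dev:])          # 20% test
```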
3.2 Data Analysis

We summarize the overall statistics of C³ in Table 4. We observe that some differences, which may be relevant to the difficulty level of questions, exist between the general-domain tasks (i.e., C³-1A and C³-1B) and the domain-specific tasks (i.e., C³-2A and C³-2B). For example, the percentage of non-extractive correct answer options in the domain-specific tasks (C³-2A: 91.4%; C³-2B: 95.2%) is much higher than that in general-domain Chinese (C³-1A: 81.9%; C³-1B: 78.9%) and English language exams (RACE (Lai et al., 2017): 87.0%; DREAM (Sun et al., 2019a): 83.7%).
In 1928, recommended by Hsu Chih-Mo (1897-1931), Hu Shih (1891-1962), who was the president of the previous National University of China, employed Shen Ts'ung-wen (1902-1988) as a lecturer of the university who was in charge of teaching the optional course of modern literature.

At that time, Shen already made himself conspicuous in the literary world and was a little famous in society. For this sake, even before the beginning of class, the classroom was crowded with students. Upon the arrival of class, Shen went into the classroom. Seeing a dense crowd of students sitting beneath the platform, Shen was suddenly startled and his mind went blank. He was even unable to utter the first sentence he had rehearsed repeatedly.

He stood there motionlessly, extremely embarrassed. He wrung his hands without knowing where to put them. Before class, he believed that he had had a ready plan to meet the situation (成竹在胸), so he did not bring his teaching plan and textbook. For up to 10 minutes, the classroom was in perfect silence. All the students were curiously waiting for the new teacher to open his mouth. Breathing deeply, he gradually calmed down. Thereupon, the materials he had previously prepared gathered in his mind for the second time. Then he began his lecture. Nevertheless, since he was still nervous, it took him less than 15 minutes to finish the teaching contents he had planned to complete in an hour.

What should he do next? He was again caught in embarrassment. He had no choice but to pick up a piece of chalk before writing several words on the blackboard: This is the first time I have given a lecture. In the presence of a crowd of people, I feel terrified.

Immediately, a peal of friendly laughter filled the classroom. Presently, a round of encouraging applause was given to him. Hearing this episode, Hu heaped praise upon Shen, thinking that he was very successful.

Because of this experience, Shen always reminded himself of not being nervous in his class for years afterwards. Gradually, he began to give his lecture at leisure in class.

Q1 In paragraph 2, "a dense crowd" refers to
A. the light in the classroom was dim.
B. the number of students attending his lecture was large.⋆
C. the room was noisy.
D. the students were active in voicing their opinions.

Q2 Shen did not bring the textbook because he felt that
A. the teaching contents were not many.
B. his preparation was sufficient.⋆
C. his mental pressure could be reduced in this way.
D. the textbook was likely to restrict his ability to give a lecture.

Q3 Seeing the sentence written by Shen, the students
A. hurriedly consoled him.
B. blamed him in mind.
C. were greatly encouraged.
D. expressed their understanding and encouraged him.⋆

Q4 The passage above is mainly about
A. the development of the Chinese educational system.
B. how to make self-adjustment if one is nervous.
C. the situation where Shen gave his lecture for the first time.⋆
D. how Shen turned into a teacher from a writer.

Table 2: English translation of a sample C³-1A problem (⋆: the correct answer option).

Besides, the average document/question length of C³-2A and C³-2B is much longer than that of C³-1A and C³-1B. The differences are probably due to the fact that domain-specific exam designers (experts) assume that most of the participants are high-proficiency native readers who have obtained at least a bachelor's degree in psychology, education, or sociology, while C³-1A and C³-1B are designed for less proficient second-language learners.
F: How is it going? Have you bought your ticket?
M: There are so many people at the railway station. I have waited in line all day long. However, when my turn comes, they say that there is no ticket left unless the Spring Festival is over.
F: It doesn't matter. It is all the same for you to come back after the Spring Festival is over.
M: But according to our company's regulation, I must go to the office on the 6th day of the first lunar month. I'm afraid I have no time to go back after the Spring Festival, so could you and my dad come to Shanghai for the coming Spring Festival?
F: I am too old to endure the travel.
M: It is not difficult at all. After I help you buy the tickets, you can come here directly.

Q1 What is the relationship between the speakers?
A. father and daughter
B. mother and son⋆
C. classmates
D. colleagues

Q2 What difficulty has the male met?
A. his company does not have a vacation.
B. things are expensive during the Spring Festival.
C. he has not bought his ticket.⋆
D. he cannot find the railway station.

Q3 What suggestion does the male put forth?
A. he invites the female to come to Shanghai.⋆
B. he is going to wait in line the next day.
C. he wants to go to the company as soon as possible.
D. he is going to go home after the Spring Festival is over.

Table 3: English translation of a sample problem from C³-1B (⋆: the correct answer option). We show the original problem in Chinese in Appendix A (Table 8).

Chinese idioms and proverbs, which are widely used in both written and oral language, play an essential role in Chinese learning and understanding because of their conciseness in form and expressiveness in meaning (Lewis et al., 1998; Yang and Xie, 2013). We notice that a significant percentage of reference documents in C³ (especially C³-2A; see Table 4) contain at least one idiom or proverb. As the meaning of such an expression may not be predictable from the meanings of its constituent parts, we require culture-specific background knowledge (Wong et al., 2010). For example, to answer Q2 in Table 2, we need to know that the idiom "成竹在胸" means "has a ready plan to meet the situation" instead of its literal meaning "chest-have-fully developed-bamboo," which is derived from a story about a painter who has a complete image of the bamboo in mind before drawing it. Therefore, the frequent use of idioms and proverbs in C³ may impede comprehension by human readers as well as pose challenges for machine readers. We will introduce details about how we attempt to teach machine readers idioms and proverbs in Section 4.2.

3.3 Categories of Knowledge

Because there is no prior work discussing the required knowledge in Chinese machine reading comprehension, we carefully analyze a subset of questions randomly sampled from the development and test sets of C³ (Table 4) and arrive at the following three kinds of prior knowledge.

LINGUISTIC: To answer a given question (e.g., Q2 in Table 2 and Q3 in Table 3), we require lexical/grammatical knowledge including, but not limited to: idioms, proverbs, negation, antonymy, synonymy, and sentence structures.

DOMAIN-SPECIFIC: This kind of world knowledge consists of, but is not limited to, facts about domain-specific concepts, their definitions and properties, and relations among these concepts (Grishman et al., 1983; Hansen, 1994).

GENERAL WORLD: This refers to general knowledge about how the world works, sometimes called commonsense knowledge. We focus on the sort of world knowledge that an encyclopedia would assume readers know without being told (Lenat et al., 1985; Schubert, 2002) instead of factual knowledge such as the properties of famous entities. We further break down general world knowledge into eight types, some of which (marked with †) are similar to the categories for recognizing textual entailment summarized by LoBue and Yates (2011).

• Arithmetic†: This includes numerical computation and analysis (e.g., comparisons).

• Connotation: This includes knowledge about implicit and implied sentiments (Feng et al., 2013; Van Hee et al., 2018).

• Cause-effect†: The occurrence of an event A causes the occurrence of an event B. See Q2 in Table 2 for an example.

• Implication: This category indicates implicit inference from the content explicitly described in the text, which cannot be reached by paraphrasing sentences using linguistic knowledge. For example, Q1 and Q4 in Table 2 belong to this category.
• Part-whole: We require knowledge that object A is a part of object B. Relations such as member-of, stuff-of, and component-of between two objects also fall into this category (Winston et al., 1987; Miller, 1998).

• Precondition†: If event A had not happened, event B would not have happened (Ikuta et al., 2014; O'Gorman et al., 2016).

• Scenario: This includes knowledge about human behaviors or activities, which may involve corresponding time and location information. We also consider knowledge about the profession, education, personality, and mental or physical health of an involved participant, as well as the relations among the participants, indicated by the behaviors or activities described in texts. For example, we put Q3 in Table 2 in this category, as "friendly laughter" may express "understanding."

• Other: Knowledge that belongs to none of the above categories.

Metric                                                 C³-1A           C³-1B           C³-2A           C³-2B
Min./Avg./Max. # of options per question               2 / 3.7 / 4     3 / 3.8 / 4     4 / 4 / 4       4 / 4 / 4
Min./Avg./Max. # of correct options per question       1 / 1 / 1       1 / 1 / 1       1 / 1.9 / 4     1 / 1.8 / 4
Min./Avg./Max. # of questions per reference document   1 / 1.9 / 6     1 / 1.2 / 6     2 / 10.0 / 20   1 / 6.4 / 22
Avg./Max. option length (in characters)                6.5 / 45        4.4 / 31        5.6 / 39        6.5 / 36
Avg./Max. question length (in characters)              13.5 / 57       10.9 / 34       22.5 / 97       26.0 / 91
Avg./Max. reference document length (in characters)    180.2 / 1,274   76.3 / 1,540    395.9 / 995     440.1 / 1,651
Character vocabulary size                              4,120           2,922           2,093           2,075
Non-extractive correct option (%)                      81.9            78.9            91.4            95.2
(Sub-)documents that contain proverbs or idioms (%)    25.4            7.8             70.9            33.6
# of (sub-)documents / # of questions
  Training                                             3,138 / 6,013   4,885 / 5,856   143 / 1,414     225 / 1,216
  Development                                          1,046 / 1,991   1,628 / 1,825   45 / 469        41 / 406
  Testing                                              1,045 / 2,002   1,627 / 1,890   49 / 492        52 / 416
  All                                                  5,229 / 10,006  8,140 / 9,571   237 / 2,375     318 / 2,038

Table 4: The overall statistics of C³. For C³-2A and C³-2B, the reference documents refer to the sub-documents, with the exception that we regard an option that does not appear in the full reference document as non-extractive.

Metric                     C³-1A   C³-1B   C³-1    C³-2A   C³-2B   C³-2
Matching                   12.0    14.3    13.2    0.0     0.0     0.0
Prior knowledge            88.0    85.7    86.8    100.0   100.0   100.0
 ⋄ Linguistic              49.0    30.7    39.8    33.3    6.7     20.0
 ⋄ Domain-specific         0.7     1.0     0.8     91.7    98.3    95.0
 ⋄ General world           50.7    64.0    57.3    71.7    80.0    75.8
   Arithmetic              3.0     4.7     3.8     0.0     0.0     0.0
   Connotation             1.3     5.3     3.3     0.0     8.3     4.2
   Cause-effect            14.0    6.7     10.3    6.7     1.7     4.2
   Implication             17.7    20.3    19.0    6.7     10.0    8.3
   Part-whole              5.0     5.0     5.0     0.0     0.0     0.0
   Precondition            2.7     4.3     3.5     0.0     0.0     0.0
   Scenario                9.6     24.3    17.0    61.7    68.3    65.0
   Other                   3.3     0.3     1.8     0.0     0.0     0.0
Single sentence            50.7    22.7    36.7    1.7     3.3     2.5
Multiple sentences         47.0    77.0    62.0    83.3    81.7    82.5
Independent                2.3     0.3     1.3     15.0    15.0    15.0
# of annotated questions   300     300     600     60      60      120

Table 5: Distribution (%) of the types of required knowledge, based on a subset of the test and development sets of C³.

As shown in Table 5, compared to the narrative-based (C³-2A) or dialogue-based (C³-1B and C³-2B) tasks, we tend to require more linguistic knowledge and less general world knowledge to answer questions designed for the well-written texts in C³-1A. In C³-2, not surprisingly, 95.0% of questions require domain-specific knowledge, and we notice that a higher percentage (75.8%) of questions require general world knowledge,
especially scenario-based knowledge (65.0%), compared to that in C³-1. Besides, we require multiple sentences to answer most of the domain-specific questions, which also reflects the difficulty of these kinds of tasks.

4 Approaches

4.1 Reading Comprehension Model

We follow the framework of discriminatively fine-tuning pre-trained language models on machine reading comprehension tasks (Radford et al., 2018). We use the Chinese BERT-Base model (denoted as BERT_CN) released by Devlin et al. (2019) as the pre-trained language model.

Given a reference document d, a question q, and an answer option o_i, we construct the input sequence by concatenating a [CLS] token, the tokens in d, a [SEP] token, the tokens in q, a [SEP] token, the tokens in o_i, and a [SEP] token, where [CLS] and [SEP] are the classifier token and sentence separator token in BERT, respectively. We add an embedding A to every token before the first [SEP] token (inclusive) and an embedding B to every other token, where A and B are pre-trained segmentation embeddings in BERT. We denote the final hidden state of the first token in the input sequence as S_i ∈ R^(1×H). For C³-1A and C³-1B, we introduce a classification layer W_1 ∈ R^(1×H) and obtain the unnormalized log probability P_i ∈ R of o_i being correct by P_i = S_i W_1^T. For C³-2A and C³-2B, we introduce a classification layer W_2 ∈ R^(2×H) and obtain the probabilities P_i ∈ R^2 of answer option o_i being correct and incorrect by P_i = softmax(S_i W_2^T). We refer readers to Devlin et al. (2019) for more details.
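The following is a minimal sketch of this scoring scheme, written with the Hugging Face transformers library rather than the authors' original implementation; "bert-base-chinese" stands in for BERT_CN, and the final training objective (e.g., normalizing the per-option logits of a question with a softmax and applying cross-entropy for C³-1) is our assumption where the paper does not spell it out:

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
H = bert.config.hidden_size

def encode(d, q, o_i):
    """Build [CLS] d [SEP] q [SEP] o_i [SEP] with A/B segment ids."""
    first = tokenizer.tokenize(d)
    second = tokenizer.tokenize(q) + ["[SEP]"] + tokenizer.tokenize(o_i)
    tokens = ["[CLS]"] + first + ["[SEP]"] + second + ["[SEP]"]
    ids = tokenizer.convert_tokens_to_ids(tokens)
    # Segment A up to and including the first [SEP]; segment B afterwards.
    type_ids = [0] * (len(first) + 2) + [1] * (len(second) + 1)
    return torch.tensor([ids]), torch.tensor([type_ids])

w1 = nn.Linear(H, 1)   # W_1 for C3-1A/1B: one logit per option
w2 = nn.Linear(H, 2)   # W_2 for C3-2A/2B: correct vs. incorrect

def score_option(d, q, o_i, multi_label=False):
    ids, type_ids = encode(d, q, o_i)
    s_i = bert(ids, token_type_ids=type_ids)[0][:, 0]   # S_i: [CLS] state
    if multi_label:
        return torch.softmax(w2(s_i), dim=-1)  # P_i = softmax(S_i W_2^T)
    return w1(s_i)                             # P_i = S_i W_1^T
```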
4.2 Proverb and Idiom Knowledge

As mentioned in Section 3.2, Chinese proverbs and idioms are usually difficult to understand without enough background knowledge. Teachers generally believe that these expressions can be learned effectively via a proverb or idiom dictionary (Lewis et al., 1998). Thus, we consider infusing the linguistic knowledge in a lexicon of proverbs and idioms into the baseline reader.

We propose to introduce an additional fine-tuning stage: instead of directly fine-tuning BERT_CN on C³, we first fine-tune BERT_CN on multiple-choice proverb and idiom problems that are automatically generated based on proverb and idiom dictionaries, and then fine-tune the resulting model on C³. Specifically, we generate two types of problems: (1) given an explanation of a proverb/idiom, choose the corresponding proverb/idiom; (2) given a proverb/idiom, select the corresponding explanation. To generate distractors (wrong answer options), we first sort all entries (i.e., proverbs and idioms) in alphabetical order and assume that two entries are more likely to be close in meaning if they share more characters. For type (1) problems, we treat the entries close to the correct entry as distractors. For type (2) problems, distractors are the explanations of entries close to the given entry. See examples of generated problems in Table 9 in Appendix A. When fine-tuning on the generated problems, we regard the given explanation (for type (1) problems) or the given entry (for type (2) problems) as the reference document and leave the question context empty.

4.3 General World Knowledge Graph

A graph of general world knowledge such as ConceptNet (Speer et al., 2017) is useful for helping us understand the meanings behind words and therefore may bridge knowledge gaps between human and machine readers. For instance, relational triples under the relation categories CAUSES and PARTOF in ConceptNet may be helpful for solving questions in C³ that fall into the cause-effect and part-whole subcategories of the general world knowledge defined in Section 3.3.

We propose to introduce an additional fine-tuning stage to incorporate general world knowledge. We first fine-tune BERT_CN on multiple-choice problems that are automatically generated based on ConceptNet. We then fine-tune the resulting model on C³. Let (a, r, b) denote a relational triple in ConceptNet: a and b are Chinese words or phrases; r represents the relation type (e.g., CAUSES) between a and b. For each relation type r, we introduce two special tokens [r→] and [r←] to represent r and its reverse relation type, respectively. We convert each (a, r, b) into two problems: (1) given a and [r→], choose b; (2) given b and [r←], choose a. Distractors are formed by randomly picked Chinese words or phrases in ConceptNet. See examples of generated problems in Table 10 in Appendix A. During the fine-tuning stage on the generated problems, we regard the given word or phrase as the reference document and the given relation type token as the question.
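A compact sketch of both generation procedures follows. The data structures are assumptions, and for Section 4.2 we read "entries close to the correct entry" in the character-overlap sense the paper describes (more shared characters implies closer meaning):

```python
import random

def n_shared_chars(a, b):
    return len(set(a) & set(b))

def idiom_problems(lexicon, k=3):
    """Section 4.2. lexicon: list of (entry, explanation) pairs."""
    expl = dict(lexicon)
    entries = sorted(expl)  # alphabetical order, as in the paper
    problems = []
    for entry, explanation in lexicon:
        # Entries sharing more characters are treated as closer in meaning
        # and therefore make harder distractors.
        close = sorted((e for e in entries if e != entry),
                       key=lambda e: -n_shared_chars(e, entry))[:k]
        # Type (1): explanation -> entry. Type (2): entry -> explanation.
        problems.append({"doc": explanation, "question": "",
                         "options": [entry] + close, "answer": entry})
        problems.append({"doc": entry, "question": "",
                         "options": [explanation] + [expl[e] for e in close],
                         "answer": explanation})
    return problems

def conceptnet_problems(triples, vocab, k=3):
    """Section 4.3. triples: list of (a, r, b); vocab: ConceptNet terms."""
    problems = []
    for a, r, b in triples:
        for doc, rel, answer in ((a, f"[{r}→]", b), (b, f"[{r}←]", a)):
            distractors = random.sample(vocab, k)  # random terms as distractors
            problems.append({"doc": doc, "question": rel,
                             "options": [answer] + distractors,
                             "answer": answer})
    return problems
```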
5 Experiment

5.1 Experimental Settings

We set the learning rate to 2 × 10⁻⁵, the batch size to 24, and the maximum sequence length to 512. We truncate the longest sequence among d, q, and o_i (Section 4.1) when the input sequence length exceeds 512. The embeddings of the relation type tokens are initialized randomly (Section 4.3). For C³-2A and C³-2B, we regard each sub-document as d. When we introduce one additional fine-tuning stage before fine-tuning on the target C³ task, we first fine-tune on the additional task for one epoch. For all experiments, we fine-tune on the target C³ task(s) for eight epochs. We run every experiment five times with different random seeds and report the best development set performance and its corresponding test set performance.
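One common reading of this truncation rule is an iterative trim: while the full input is over the limit, drop a token from whichever of d, q, and o_i is currently longest. A sketch under that assumption, with token-list inputs:

```python
def truncate_to_max_len(d_toks, q_toks, o_toks, max_len=512):
    # Four special tokens: [CLS] plus three [SEP]s (see Section 4.1).
    while len(d_toks) + len(q_toks) + len(o_toks) + 4 > max_len:
        longest = max(d_toks, q_toks, o_toks, key=len)
        longest.pop()  # trim the longest sequence from the end
    return d_toks, q_toks, o_toks
```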
5.2 Baseline Results and Discussion

We report the baseline performance in Table 6 and discuss the following aspects of our observations.

Domain-specific knowledge: For the domain-specific questions in C³-2, we explore three ways to introduce domain-specific knowledge.

First, we follow previous work (Sun et al., 2019b) for English question answering tasks. Given a reference document d, a question q, and an answer option o_i, we use Lucene (McCandless et al., 2010) to retrieve the top 50 sentences from two psychological counseling textbooks, using the concatenation of q and o_i as the query. We append the retrieved sentences to the end of d to form the new input sequence.

In another attempt, we collect 4,544 multiple-choice question answering problems¹ on the core knowledge of psychological counseling from Psychological Counseling Examinations (the same source as the problems in C³-2A and C³-2B). Each problem is composed of a question and four answer options, at least one of which is correct. See an example in Table 11 in Appendix A. We first fine-tune BERT_CN on the core knowledge problems and then fine-tune the resulting model on C³-2A/C³-2B. In the first stage, we leave the reference document context empty, as no context is provided.

In the third method, we run pre-training steps on the two psychological counseling textbooks starting from the BERT_CN checkpoint and use the resulting model as the new pre-trained model.

However, none of the above methods outperforms the BERT_CN baseline. It remains a challenge, left for future investigation, to leverage expert knowledge to improve the performance on C³-2.

Gap between machine and human: We show human performance on the same subset of C³-1 used for the analysis of the required knowledge in Section 3.3. We do not report human performance on C³-2 due to the wide variance in human expertise with psychological counseling. We see a significant gap between the automated approach and human performance on C³-1, and the gap is especially large on questions that require prior knowledge or multiple sentences, compared to questions that can be answered by surface matching or that only involve content from a single sentence (Section 3.3).

¹We will release them along with C³.

5.3 Impact of Linguistic Knowledge and General World Knowledge

Linguistic Knowledge: We generate 86,720 problems based on 43,360 proverbs/idioms and their explanations. By introducing an additional fine-tuning stage on the generated proverb and idiom problems, we see a consistent gain in accuracy over all C³ tasks compared to the BERT_CN baseline, with an absolute improvement of 2% in average accuracy (Table 6).

We also compare with an alternative approach of imparting proverb and idiom knowledge. For each reference document d, we use the same lexicon as used in proverb and idiom problem generation to look up the proverbs and idioms in d and their explanations. Let a_1...n and b_1...n denote the proverbs/idioms in d and their corresponding explanations, respectively. We replace d with the concatenation of d and a_1: b_1 a_2: b_2 ... a_n: b_n when constructing the input sequence. However, this approach does not yield promising results.

General World Knowledge: We generate 737,534 problems based on ConceptNet. By introducing an additional fine-tuning stage on the generated problems, we see a gain in accuracy over most C³ tasks compared to the BERT_CN baseline (Table 6). We notice that C³-1 benefits more than C³-2 from the general world knowledge graph introduced by the proposed approach.
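A sketch of the look-up enrichment compared above (the lexicon format is an assumption): the document is extended with "a_i: b_i" explanation pairs for every proverb or idiom it contains, before the input sequence of Section 4.1 is built.

```python
def enrich_with_explanations(d, lexicon):
    """lexicon: dict mapping each proverb/idiom a_i to its explanation b_i."""
    found = [(a, b) for a, b in lexicon.items() if a in d]
    if not found:
        return d
    # d followed by "a_1: b_1 a_2: b_2 ... a_n: b_n".
    return d + " " + " ".join(f"{a}: {b}" for a, b in found)
```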
5.4 Other Attempt: Cross-Genre Training

We also fine-tune BERT_CN on sub-tasks from similar domains but in different genres simultaneously, instead of fine-tuning it on each of the four C³ sub-tasks separately. We observe that BERT_CN trained on the combination of C³-1A and C³-1B consistently outperforms the same model trained solely on C³-1A or C³-1B. We have a similar observation on C³-2.

Method                                  C³-1A        C³-1B        C³-2A        C³-2B        Average
                                        Dev   Test   Dev   Test   Dev   Test   Dev   Test   Dev   Test
Random                                  27.8  27.8   26.4  26.6   6.7   6.7    6.7   6.7    16.9  17.0
BERT_CN                                 63.0  62.6   62.3  62.1   36.7  26.2   34.7  31.3   49.2  45.6
Domain-specific Knowledge:
BERT_CN + IR from Textbooks             –     –      –     –      35.2  26.0   32.5  30.0   –     –
BERT_CN + Core Knowledge Problems       –     –      –     –      34.8  27.0   35.5  29.6   –     –
BERT_CN + Textbook Pre-Training         –     –      –     –      36.2  29.7   33.7  28.1   –     –
Linguistic Knowledge:
BERT_CN + Proverb and Idiom Look-Up     62.6  63.2   62.8  62.2   37.1  27.6   32.3  30.0   48.7  45.8
BERT_CN + Proverb and Idiom Problems    63.6  63.5   63.9  64.0   38.8  30.9   38.4  32.2   51.2  47.7
General World Knowledge:
BERT_CN + Graph-Structured Knowledge    63.8  65.0   63.9  64.5   37.7  28.9   34.7  32.0   50.0  47.6
Cross Genre:
BERT_CN                                 64.8  65.1   65.4  64.3   37.1  28.0   36.7  33.7   51.0  47.8
Human Performance⋆                      96.0  93.3   98.0  98.7   –     –      –     –      –     –

Table 6: Performance of the baseline and of models absorbing linguistic knowledge and general world knowledge, in accuracy (%) on the C³ dataset (⋆: performance based on a subset of the test and development sets of C³).

                       BERT_CN              Human
                       C³-1A  |  C³-1B      C³-1A  |  C³-1B
Matching               100.0  |  74.1       100.0  |  100.0
Prior knowledge        57.6   |  60.2       95.7   |  97.6
Single sentence        65.8   |  79.4       97.0   |  97.0
Multiple sentences     55.6   |  57.8       94.0   |  98.0

Table 7: Performance comparison in accuracy (%) based on the coarse-grained question categories.

6 Conclusion

We present the first collection of Challenging Chinese multiple-choice machine reading Comprehension datasets (C³), collected from real-world exams and requiring linguistic, general world, or domain-specific knowledge to answer questions based on a given oral or written text. We study the prior knowledge needed in these challenging reading comprehension tasks and further explore how to utilize linguistic, general world, and domain-specific knowledge to improve the comprehension ability of machine readers through fine-tuning BERT. Experimental results show that linguistic and general world knowledge may help the baseline reader perform better in both general and domain-specific reading comprehension tasks.

References

Ondrej Bajgar, Rudolf Kadlec, and Jan Kleindienst. 2016. Embracing data abundance: BookTest dataset for reading comprehension. CoRR, cs.CL/1610.00956v1.

Gong Cheng, Weixi Zhu, Ziwei Wang, Jianghui Chen, and Yuzhong Qu. 2016. Taking up the Gaokao challenge: An information retrieval approach. In Proceedings of the IJCAI, pages 2479–2485, New York City, NY.

Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. 2018. QuAC: Question answering in context. In Proceedings of the EMNLP, pages 2174–2184, Brussels, Belgium.

Zewei Chu, Hai Wang, Kevin Gimpel, and David McAllester. 2017. Broad context language modeling as reading comprehension. In Proceedings of the EACL, pages 52–57, Valencia, Spain.

Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. 2018. Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge. CoRR, cs.CL/1803.05457v1.

Peter Clark, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter D. Turney, and Daniel Khashabi. 2016. Combining retrieval, statistics, and inference to answer elementary science questions. In Proceedings of the AAAI, pages 2580–2586, Phoenix, AZ.
Yiming Cui, Ting Liu, Zhipeng Chen, Shijin Wang, and Guoping Hu. 2016. Consensus attention-based neural networks for Chinese reading comprehension. In Proceedings of the COLING, pages 1777–1786, Osaka, Japan.

Yiming Cui, Ting Liu, Zhipeng Chen, Wentao Ma, Shijin Wang, and Guoping Hu. 2018a. Dataset for the first evaluation on Chinese machine reading comprehension. In Proceedings of the LREC, pages 2721–2725, Miyazaki, Japan.

Yiming Cui, Ting Liu, Li Xiao, Zhipeng Chen, Wentao Ma, Wanxiang Che, Shijin Wang, and Guoping Hu. 2018b. A span-extraction dataset for Chinese machine reading comprehension. CoRR, cs.CL/1810.07366v1.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the NAACL-HLT, Minneapolis, MN.

Song Feng, Jun Seok Kang, Polina Kuznetsova, and Yejin Choi. 2013. Connotation lexicon: A dash of sentiment beneath the surface meaning. In Proceedings of the ACL, pages 1774–1784, Sofia, Bulgaria.

Ralph Grishman, Lynette Hirschman, and Carol Friedman. 1983. Isolating domain dependencies in natural language interfaces. In Proceedings of the ANLP, pages 46–53.

Shangmin Guo, Kang Liu, Shizhu He, Cao Liu, Jun Zhao, and Zhuoyu Wei. 2017a. IJCNLP-2017 Task 5: Multi-choice question answering in examinations. In Proceedings of the IJCNLP 2017, Shared Tasks, pages 34–40, Taipei, Taiwan.

Shangmin Guo, Xiangrong Zeng, Shizhu He, Kang Liu, and Jun Zhao. 2017b. Which is the effective way for Gaokao: Information retrieval or neural networks? In Proceedings of the EACL, pages 111–120, Valencia, Spain.

Steffen Leo Hansen. 1994. Reasoning with a domain model. In Proceedings of the NODALIDA, pages 111–121.

Yu Hao, Xien Liu, Ji Wu, and Ping Lv. 2019. Exploiting sentence embedding for medical question answering. In Proceedings of the AAAI, Honolulu, HI.

Wei He, Kai Liu, Jing Liu, Yajuan Lyu, Shiqi Zhao, Xinyan Xiao, Yuan Liu, Yizhong Wang, Hua Wu, Qiaoqiao She, et al. 2017. DuReader: A Chinese machine reading comprehension dataset from real-world applications. In Proceedings of the MRQA, pages 37–46, Melbourne, Australia.

Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Proceedings of the NIPS, pages 1693–1701, Montreal, Canada.

Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston. 2016. The Goldilocks principle: Reading children's books with explicit memory representations. In Proceedings of the ICLR, Caribe Hilton, Puerto Rico.

Rei Ikuta, Will Styler, Mariah Hamang, Tim O'Gorman, and Martha Palmer. 2014. Challenges of adding causation to richer event descriptions. In Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation, pages 12–20, Baltimore, MD.

Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. 2017. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. CoRR, cs.CL/1705.03551v2.

Daniel Khashabi, Snigdha Chaturvedi, Michael Roth, Shyam Upadhyay, and Dan Roth. 2018. Looking beyond the surface: A challenge set for reading comprehension over multiple sentences. In Proceedings of the NAACL-HLT, pages 252–262, New Orleans, LA.

Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, and Edward Grefenstette. 2018. The NarrativeQA reading comprehension challenge. Transactions of the Association for Computational Linguistics, 6:317–328.

Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. 2017. RACE: Large-scale reading comprehension dataset from examinations. In Proceedings of the EMNLP, pages 785–794, Copenhagen, Denmark.

Douglas B. Lenat, Mayank Prakash, and Mary Shepherd. 1985. CYC: Using common sense knowledge to overcome brittleness and knowledge acquisition bottlenecks. AI Magazine, 6(4):65–65.

Omer Levy, Minjoon Seo, Eunsol Choi, and Luke Zettlemoyer. 2017. Zero-shot relation extraction via reading comprehension. In Proceedings of the CoNLL, pages 333–342, Vancouver, Canada.

R. Lewis, R. W. P. Luk, and A. B. Y. Ng. 1998. Computer-assisted learning of Chinese idioms. Journal of Computer Assisted Learning, 14(1):2–18.

Peter LoBue and Alexander Yates. 2011. Types of common-sense knowledge needed for recognizing textual entailment. In Proceedings of the ACL, pages 329–334, Portland, OR.

Shangbang Long, Cunchao Tu, Zhiyuan Liu, and Maosong Sun. 2018. Automatic judgment prediction via legal reading comprehension. CoRR, cs.AI/1809.06537v1.

Michael McCandless, Erik Hatcher, and Otis Gospodnetic. 2010. Lucene in Action, Second Edition: Covers Apache Lucene 3.0. Manning Publications Co., Greenwich, CT.

Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. 2018. Can a suit of armor conduct electricity? A new dataset for open book question answering. In Proceedings of the EMNLP, pages 2381–2391, Brussels, Belgium.

George Miller. 1998. WordNet: An Electronic Lexical Database. MIT Press.

Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, and James Allen. 2016. A corpus and evaluation framework for deeper understanding of commonsense stories. In Proceedings of the NAACL-HLT, pages 839–849, San Diego, CA.

Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A human generated machine reading comprehension dataset. CoRR, cs.CL/1611.09268v2.

Tim O'Gorman, Kristin Wright-Bettner, and Martha Palmer. 2016. Richer event description: Integrating event coreference with temporal, causal and bridging annotation. In Proceedings of the CNS, pages 47–56, Austin, TX.

Simon Ostermann, Michael Roth, Ashutosh Modi, Stefan Thater, and Manfred Pinkal. 2018. SemEval-2018 Task 11: Machine comprehension using commonsense knowledge. In Proceedings of the SemEval, pages 747–757, New Orleans, LA.

Hoifung Poon, Janara Christensen, Pedro Domingos, Oren Etzioni, Raphael Hoffmann, Chloe Kiddon, Thomas Lin, Xiao Ling, Alan Ritter, Stefan Schoenmackers, et al. 2010. Machine reading at the University of Washington. In Proceedings of the NAACL-HLT FAM-LbR, pages 87–95, Los Angeles, CA.

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. Preprint.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. Preprint.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the EMNLP, pages 2383–2392, Austin, TX.

Pranav Rajpurkar, Robin Jia, and Percy Liang. 2018. Know what you don't know: Unanswerable questions for SQuAD. In Proceedings of the ACL, pages 784–789, Melbourne, Australia.

Siva Reddy, Danqi Chen, and Christopher D. Manning. 2018. CoQA: A conversational question answering challenge. CoRR, cs.CL/1808.07042v1.

Matthew Richardson, Christopher J. C. Burges, and Erin Renshaw. 2013. MCTest: A challenge dataset for the open-domain machine comprehension of text. In Proceedings of the EMNLP, pages 193–203, Seattle, WA.

Lenhart Schubert. 2002. Can we derive general world knowledge from texts? In Proceedings of the HLT, pages 94–97, San Diego, CA.

Chih Chieh Shao, Trois Liu, Yuting Lai, Yiying Tseng, and Sam Tsai. 2018. DRCD: A Chinese machine reading comprehension dataset. CoRR, cs.CL/1806.00920v2.

Robyn Speer, Joshua Chin, and Catherine Havasi. 2017. ConceptNet 5.5: An open multilingual graph of general knowledge. In Proceedings of the AAAI, pages 4444–4451, San Francisco, CA.

Shane Storks, Qiaozi Gao, and Joyce Y. Chai. 2019. Commonsense reasoning for natural language understanding: A survey of benchmarks, resources, and approaches. CoRR, cs.CL/1904.01172v1.

Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Yejin Choi, and Claire Cardie. 2019a. DREAM: A challenge dataset and models for dialogue-based reading comprehension. Transactions of the Association for Computational Linguistics.

Kai Sun, Dian Yu, Dong Yu, and Claire Cardie. 2019b. Improving machine reading comprehension with general reading strategies. In Proceedings of the NAACL-HLT, Minneapolis, MN.

Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. 2018. CommonsenseQA: A question answering challenge targeting commonsense knowledge. In Proceedings of the NAACL-HLT, Minneapolis, MN.

Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, and Kaheer Suleman. 2017. NewsQA: A machine comprehension dataset. In Proceedings of the RepL4NLP, pages 191–200, Vancouver, Canada.

Cynthia Van Hee, Els Lefever, and Véronique Hoste. 2018. We usually don't like going to the dentist: Using common sense to detect irony on Twitter. Computational Linguistics, 44(4):793–832.

Johannes Welbl, Pontus Stenetorp, and Sebastian Riedel. 2018. Constructing datasets for multi-hop reading comprehension across documents. Transactions of the Association for Computational Linguistics, 6:287–302.

Morton E. Winston, Roger Chaffin, and Douglas Herrmann. 1987. A taxonomy of part-whole relations. Cognitive Science, 11(4):417–444.

Lung-Hsiang Wong, Chee-Kuen Chin, Chee-Lay Tan, and May Liu. 2010. Students' personal and social meaning making in a Chinese idiom mobile learning environment. Journal of Educational Technology & Society, 13(4):15–26.

Chunsheng Yang and Ying Xie. 2013. Learning Chinese idioms through iPads. Language Learning & Technology, 17(2):12–23.

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. 2018. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the EMNLP, pages 2369–2380, Brussels, Belgium.

Sheng Zhang, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Kevin Duh, and Benjamin Van Durme. 2018a. ReCoRD: Bridging the gap between human and machine commonsense reading comprehension. CoRR, cs.CL/1810.12885v1.

Xiao Zhang, Ji Wu, Zhiyang He, Xien Liu, and Ying Su. 2018b. Medical exam question answering with large-scale reading comprehension. In Proceedings of the AAAI, pages 5706–5713, New Orleans, LA.

Zhuosheng Zhang and Hai Zhao. 2018. One-shot learning for question-answering in Gaokao history challenge. In Proceedings of the COLING, pages 449–461, Santa Fe, NM.
A Appendices

女: 怎么样?买到票了吗?
男: 火车站好多人啊,我排了整整一天的队,等排到我了,他们说没票了,要等过了年才有。
女: 没关系,过了年回来也是一样的。
男: 公司初六就上班了,我怕过了年来不及。要不今年您和我爸来上海过年吧?
女: 我这老胳膊老腿的不想折腾了。
男: 一点儿不折腾,等我帮你们买好票,你们直接过来就行。
Q1 说话人是什么关系? A. 父女 B. 母子⋆ C. 同学 D. 同事
Q2 男的遇到了什么困难? A. 公司不放假 B. 过年东西贵 C. 没买到车票⋆ D. 找不到车站
Q3 男的提出了什么建议? A. 让女的来上海⋆ B. 明天再去排队 C. 早点儿去公司 D. 过了年再回家

Table 8: A sample problem from C³-1B (⋆: the correct answer option).

Sample Problem 1:
古谚语,意思是为将者善战,其士卒亦必勇敢无前。亦比喻凡事为首者倡导于前,则其众必起而效之。
A. 一人向隅,满坐不乐 B. 一人善射,百夫决拾⋆ C. 一人得道,鸡犬升天 D. 一人传虚,万人传实
Sample Problem 2:
龙跳虎伏
A. 犹言龙腾虎卧。比喻笔势。⋆ B. 比喻高举远逝。 C. 比喻文笔、书法纵逸雄劲。 D. 比喻超逸雄奇。

Table 9: Examples of generated Chinese proverb/idiom problems (⋆: the correct answer option).

Sample Problem 1:
努力赚钱
Q [MotivatedByGoal→]
A. 增广见识 B. 打开电视机 C. 刚搬家 D. 美好的生活⋆
Sample Problem 2:
美好的生活
Q [MotivatedByGoal←]
A. 想偷懒的时候 B. 努力赚钱⋆ C. 抢妹妹的食物 D. 龟起来

Table 10: Examples of generated problems based on general relational knowledge (⋆: the correct answer option).

对求助者形成初步印象的工作程序包括( )。
A. 对求助者心理健康水平进行衡量⋆
B. 对求助者心理问题的原因作解释
C. 对求助者的问题进行量化的评估⋆
D. 对某些含混的临床表现作出鉴别⋆

Table 11: An example of problems on the core knowledge of psychological counseling (⋆: the correct answer option).
General information: a female, at the age of 24, unmarried, a cashier.

The help seeker's self-narration: In the past two years, I have always felt everything is dirty, especially money. For this sake, I wash my hands so frequently that the skin of them has peeled off. Even so, I do not feel at ease. I know that this is not good for me but I cannot help doing so.

Case introduction: Ten years ago, the help seeker went to hospital to pay a visit to her classmate. After she went home, she ate an apple without washing her hands and was scolded by her parents after being noticed by them. They warned her that she would fall ill if she eats things without washing her hands. For this sake, she was worried and suffered from insomnia for two days. Since then, whenever she has come home from school, she has remembered to wash her hands earnestly. Little by little, this episode has gone by. Two years ago, she became a cashier dealing with money every day. She always believes that money is dirty for it is covered with numerous bacteria. So she washes her hands repeatedly after work. In spite of being clearly aware that her hands are quite clean, she is still unable to control her mentality and is always afraid that they might not be washed clean. Much time has been spent in her perplexity. She is so vexed that she has been unable to go to work recently. A month ago, she began to worry that she was likely to suffer from a mental disorder, which has made her dispirited and sleepless. As a result, she often suffers from a headache and frequently sees doctors. However, she is unwilling to take the medicine prescribed by doctors because of the many side effects written in the instructions. Her parents think that she does not have a mental illness and accompany her to seek psychological counseling.

According to the parents of the help seeker, she is the only child in her family and her parents are strict with her. During her childhood, she was not permitted to go out to play by herself. Since her parents were busy, she was sent to her grandmother's home to be taken care of. Her grandmother tightly protected her and did not allow her to play with other little friends. Every day, she must have meals and go to bed on time. Later, she was at school. She was very meticulous in her study and her academic results were excellent. Yet her classmate relations were ordinary. She listened to her parents and was careful about what she did. She was a little cowardly. After failing to pass the postgraduate entrance examination after her graduation from college, she once suffered from depression and insomnia. Fortunately, everything has gone well with her since she was employed. As a cashier, she has to deal with money every day and always thinks that money is dirty. Consequently, she has washed her hands more and more frequently. Recently, she has been so depressed that she is unable to work.

Q1 The main symptoms of the help seeker are ( )
A. fear B. depression⋆ C. obsession⋆ D. compulsive behavior⋆
Q2 Which of the following symptoms does not happen to the help seeker ( )
A. palpitation B. insomnia C. headache D. nausea⋆
Q3 The main behavioral symptoms of the help seeker are ( )
A. repeated behavior⋆ B. repeated counseling C. repeated examination⋆ D. repeated weeping
Q4 Which of the following behavioral symptoms does not happen to the help seeker ( )
A. worry B. depression C. frequent headaches D. nervousness and fear⋆
Q5 The course of disease on the part of the help seeker is ( )
A. one month B. two years⋆ C. three months D. ten years
Q6 The psychological characteristics of the help seeker include ( )
A. strict family education B. cowardliness and overcaution⋆ C. failure in passing the postgraduate entrance examination D. headaches and insomnia
Q7 The reasons for judging whether the help seeker has a normal mentality are ( )
A. whether she has self-consciousness⋆ B. seeing doctors of her own accord⋆ C. severity of symptoms D. impairment of social function
Q8 The causes of the help seeker's psychological problem exclude ( )
A. personality factors B. cognitive factors C. examination stress⋆ D. stress imposed by her parents⋆
Q9 The causes of the formation of the help seeker's personality may be ( )
A. parental control⋆ B. tight protection given by her grandmother⋆ C. work environment D. failure in passing the postgraduate entrance examination
Q10 The help seeker's characteristics of personality include ( )
A. excessive demands on herself⋆ B. excessive pursuit of perfection⋆ C. excessive sentimentality D. excessively self-righteous
Q11 The crux of the psychological problem on the part of the help seeker lies in ( )
A. self-inferiority and parental criticism B. fear and being afraid of falling ill⋆ C. anxiety and excessive demands on herself D. depression and the loss of work
Q12 The life events affecting the help seeker include ( )
A. separation from her parents during her childhood⋆ B. being unable to play by herself when she was a child C. insomnia caused by her failure in passing the postgraduate entrance examination D. being scolded for eating things without washing her hands⋆

Table 12: English translation of a sample problem (Part 1) from C³-2A (⋆: the correct answer option). We show the original problem in Chinese in Table 14.
Q13 The grounds for the help seeker's mental disease exclude ( )
A. free of depressive symptoms B. free of hallucination⋆ C. free of delusional disorder⋆ D. free of thinking disorder
Q14 The definite diagnosis on the part of the help seeker needs a further understanding of the following materials ( )
A. her body conditions B. characteristics of her inner world⋆ C. her economic conditions D. her inter-personal communication⋆
Q15 While offering the counseling service to the help seeker, what a counselor should pay attention to are ( )
A. job change B. behavior change⋆ C. mood change⋆ D. cognitive change⋆

Table 12: English translation of a sample problem (Part 2) from C³-2A (⋆: the correct answer option). We show the original problem in Chinese in Table 14.
General information: a help seeker, male, at the age of 24, graduated from a university, waiting for employment.

Case introduction: Two years have passed since the help seeker graduated from a university. Coerced by his parents, he once went to a job fair. However, shortly after he arrived at the job fair, he went away without saying one sentence. He confesses that he is not good at expressing himself. Seeing that other people are skilled at promoting themselves, he feels that he is inferior to others in this regard. Because of the lack of work experience, he is afraid that he cannot be competent for a job. Therefore, he always has no self-confidence, only to stay at home.

The information about the help seeker observed and understood by a psychological counselor includes introverted personality, poor independence, no interest in making friends, and low mood. The following is a section of the talk between the psychological counselor and the help seeker:

Psychological Counselor: What aspect would you like to receive my help in?
Help Seeker: I am afraid to go to a job fair.
Psychological Counselor: Have you been there?
Help Seeker: Yes, I have. But I only wandered around before going away.
Psychological Counselor: You only wandered about without saying anything. Can you be called a candidate?
Help Seeker: I'm afraid to tell my intention in case I might be declined. What's more, I'm worried that I am not competent at a job.
Psychological Counselor: Just now, you've said that you are eager to land a job but now you tell me that you have participated in almost no job fair. There seems to be a contradiction. Can you explain it?
Help Seeker: Almost two years have passed since I graduated from university. Yet I have not landed a job. I am really anxious. But I really have no idea how I can prepare for a job fair.
Psychological Counselor: As a university graduate, you'd better consider how to attend a job fair.
Help Seeker: (keeps silent for a moment) I have to go to a job fair and tell employers that I need a job, and I must discuss with them the work nature, conditions, wage and so on.
Psychological Counselor: For what reasons have you not done these?
Help Seeker: I'm afraid that they may not accept me if I go there.
Psychological Counselor: You mean that as long as you go there, employers will be scrambling to employ you. In other words, if you go to the State Council to apply for a job, you will make it; if you go to the municipal government to seek a job, you will still realize your intention; and if you go to an enterprise to apply for a post, you will succeed all the same.
Help Seeker: Ah... it seems that I don't think so (shaking his head).
Psychological Counselor: Since time is limited, so much for this counseling. Please go home to think it over. Next time, we can continue to discuss this issue.

Q1 The major causes that trigger the help seeker's psychological problem include ( )
A. personality factors⋆ B. inter-personal stress C. cognitive factors⋆ D. economic pressures
Q2 At the beginning of the counseling, the approaches to asking questions adopted by the psychological counselor are ( )
A. to question intensely B. to ask open-ended questions⋆ C. to ask indirectly about something⋆ D. to ask closed questions
Q3 The psychological counselor says, "You only wandered about without saying anything. Can you be called a candidate?" He indicates his ( ) attitude to the help seeker.
A. reprobation B. query⋆ C. enlightenment D. encouragement
Q4 By saying "There seems to be a contradiction. Can you explain it?" the psychological counselor adopts the following tactic ( )
A. guidance B. confrontation⋆ C. encouragement D. explanation
Q5 The silence phenomenon has several major types except for ( )
A. suspicion type B. intellectual type⋆ C. vacant type D. resistant type
Q6 "I'm afraid that they may not accept me if I go there." This sentence reflects the help seeker's ( )
A. pessimistic attitude B. overgeneralization C. absolute requirement D. extreme awfulness⋆
Q7 The psychological counselor says, "You mean that as long as you go there, ..." What tactic is employed in this paragraph?
A. Aristotle's sophistry B. "mid-wife" argumentation⋆ C. technique of rational-emotion imagination D. rational analysis report

The help seeker is the same person as in the passage above. Here is a section of the second talk between the psychological counselor and the help seeker:

Psychological Counselor: From the perspective of psychology, what triggers your emotional reaction is not some events that have happened externally but your opinions of these events. It is necessary to change your emotion instead of external events. That is to say, it is necessary to change your opinions on and comments about these events. In fact, people have their opinions of things. Some of their opinions are reasonable while others are not. Different opinions may lead to different emotional results. If you have realized that your present emotional state has been caused by some unreasonable ideas in your mind, perhaps you are likely to control your emotion.

Table 13: English translation of a sample problem (Part 1) from C³-2B (⋆: the correct answer option). We show the original problem in Chinese in Table 15.