Natural Language Processing for Computer Scientists and Data Scientists at a Large State University
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Natural Language Processing for Computer Scientists and Data Scientists at a Large State University Casey Kennington Department of Computer Science Boise State University caseykennington@boisestate.edu Abstract I reflect on my experience setting up and main- taining a class in NLP at Boise State University, a The field of Natural Language Processing large state university, how to prepare students for (NLP) changes rapidly, requiring course offer- research and industry careers, and how the class has ings to adjust with those changes, and NLP is not just for computer scientists; it’s a field changed over four years to fit the needs of students. that should be accessible to anyone who has a The next section explains the course objectives. sufficient background. In this paper, I explain I then explain challenges that are likely common to how students with Computer Science and Data many university student populations, how NLP is Science backgrounds can be well-prepared for designed for students with Data Science and Com- an upper-division NLP course at a large state puter Science backgrounds, then I explain course university. The course covers probability and content including lecture topics and assignments information theory, elementary linguistics, ma- that are designed to fulfill the course objectives. I chine and deep learning, with an attempt to bal- ance theoretical ideas and concepts with prac- then offer a reflection on the three times I taught tical applications. I explain the course objec- this course over the past four years, and future plans tives, topics and assignments, reflect on adjust- for the course. ments to the course over the last four years, as well as feedback from students. 2 Course Objectives 1 Introduction Most students who take an NLP course will not pursue a career in NLP proper; rather, they take Thanks in part to a access to large datasets, in- the course to learn skills that will help them find creases in compute power, and easy-to-use pro- employment or do research in areas that make use gramming programming libraries that leverage neu- of language, largely focusing on the medium of text ral architectures, the field of Natural Language Pro- (e.g., anthropology, information retrieval, artificial cessing (NLP) has become more popular and has intelligence, data mining, social media network seen more widespread adoption in research and in analysis, provided they have sufficient data science commercial products. On the research side, the training). My goal for the students is that they Association for Computational Linguistics (ACL) can identify aspects of natural language (phonet- conference—the flagship NLP conference—and ics, syntax, semantics, etc.) and how each can be related annual conferences have seen dramatic in- processed by a computer, explain the difference creases in paper submissions. For example, in 2020 between classification models and approaches, be ACL had 3,429 paper submissions, whereas 2019 able to map from (basic) formalisms to functional had 2,905, and this upward trend has been happen- code, and use existing tools, libraries, and data sets ing for several years. Certainly, lowering barriers for learning while attempting to strike a balance to access of NLP methods and tools for researchers between theory and practice. In my view, there are and practitioners is a welcome direction for the several aspects of NLP that anyone needs to grasp, field, enabling researchers and practitioners from and how to apply NLP techniques in novel circum- many disciplines to make use of NLP. stances. Those aspects are illustrated in Figure 1. It is therefore becoming more important to better No single NLP course can possibly account for equip students with an understanding of NLP to a level of depth in all of the aspects in Figure 1, but prepare them for careers either directly related to a student who has taken courses or has experience NLP, or which leverage NLP skills. In this paper, in at least two of the areas (e.g., they have taken a 115 Proceedings of the Fifth Workshop on Teaching NLP, pages 115–124 June 10–11, 2021. ©2021 Association for Computational Linguistics
3 Preparing Students with Diverse Academic and Technical Backgrounds Boise State University is the largest university in Idaho, situated in the capital of the State of Idaho. The university has a high number of non-traditional students (e.g., students outside the traditional stu- dent age range, or second-degree seeking students). Moreover, the university has a high acceptance rate (over 80%) for incoming first-year students. As is the case with many universities and organizations, a greater need for “computational thinking" among students of many disciplines has been an important driver of recent changes in course offerings across Figure 1: Areas of content that are important for a many departments. Moreover, certain departments course in Natural Language Processing. have responded to the need and student interest in machine learning course offerings. In this section, we discuss how we altered the Data Science and statistics course and have experience with Python, Computer Science curricula to meet these needs or they have taken linguistics courses and have used and the implications these changes have had on the some data science or machine learning libraries) NLP course.1 will find success in the course more easily than those with no experience in any aspect. Data Science The Data Science offerings begin This introduces a challenge that has been ex- with a foundational course (based on Berkeley’s plored in prior work on teaching NLP (Fosler- data8 content) that has only a very basic math pre- 2 It introduces and allows students to Lussier, 2008): the diversity of the student pop- requisite. ulation. NLP is a discipline that is not just for practice Python, Jupyter notebooks, data analysis computer science students, but it is challenging to and visualization, and basic statistics (including prepare students for the technical skills required in the bootstrap method of statistical significance). a NLP course. Moreover, similar to the student pop- Several courses follow this course that are more ulation in Fosler-Lussier (2008), there should be domain specific, giving the students options for course offerings for both graduate and undergradu- gaining practical experience in Data Science skills ate students. In my case, which is fairly common relative to their abilities and career goals. One path in academia, as the sole NLP researcher at the uni- more geared towards students of STEM-related versity I can only offer one course once every four majors (though not targeting Computer Science semesters for both graduate and undergraduate stu- majors) as well as some majors in the Humanities, dents, but also students with varied backgrounds– is a certificate program that includes the founda- not only computer science. As a result, this is not a tional course, a follow-on course that gives students research methods course; rather, it is more geared experience with more data analysis as well as prob- towards learning the important concepts and techni- ability and information theory, an introductory ma- cal skills surrounding recent advances in NLP. Oth- chine learning course, and a course on databases. ers have attempted to gear the course content and The courses largely use Python as the programming delivery towards research (Freedman, 2008) giving language of choice. the students the opportunity to have open-ended as- Computer Science In parallel to the changes signments. I may consider this for future offerings, in Data Science-related courses, the Department but for now the final project acts as an open-ended of Computer Science has seen increased enroll- assignment, though I don’t require students to read ment and increased request for machine learning- and understand recent research papers. related courses. The department offers several In the following section, I explain how we pre- 1 pare students of diverse backgrounds to succeed in We do not have a Data Science department or major de- gree program; the Data Science courses are housed in different an NLP course for upper-division undergraduate departments across campus. 2 and graduate students. http://data8.org/ 116
Figure 2: Two course paths that prepare students to succeed in the NLP course: Data Science and Computer Sci- ence. Solid arrows denote pre-requisites, dashed arrows denote co-requisites. Solid outlines denote Data Science and Computer Science courses, dashed outlines denote Math courses. courses, though they focus on upper-division stu- learning course. Figure 2 depicts the two course dents (e.g., artificial intelligence, applied deep paths visually. learning, information retrieval and recommender systems). This is a challenge because the main 4 NLP Course Content Computer Science curriculum focuses on procedu- In this section, I discuss course content including ral languages such as Java with little or no expo- topics and assignments that are designed to meet sure to Python (similar to the student population the course objectives listed above. Woven into the reported in Freedman (2008)), forcing the upper- topics and assignments are the themes of ambiguity division machine learning-related courses to spend and limitations, explained below. time teaching Python to the students. This cannot be underestimated–programming languages may 4.1 Topics & Assignments not be natural languages, but they are languages Theme of Ambiguity Figure 3 shows the topics in that it takes time to learn them, particularly rele- (solid outlines) that roughly translate to a single vant Python libraries. To help meet this challenge, lecture, though some topics require multiple lec- the department has recently introduced a Machine tures. One main theme that is repeated through- Learning Emphasis that makes use of the Data Sci- out the course, but is not a specific lecture topic ence courses mentioned above, along with addi- is ambiguity. This helps the students understand tional requirements for math and upper-division differences between natural human languages and Computer Science courses. programming languages. The Introduction to Lin- guistics topic, for example, gives a (very) high- Preqrequisites Unlike Agarwal (2013) which at- level overviews of phonetics, morphology, syntax, tempted and introductory course for all levels of semantics, and pragmatics, with examples of am- students, this course is designed for upper-division biguity for each area of linguistics (e.g., phonetic students or graduate students. Students of either ambiguity is illustrated by hearing someone say it’s discipline, Computer Science or Data Science, are hard to recognize speech but it could be heard as prepared for this NLP course, albeit in different it’s hard to wreck a nice beach, and syntactic ambi- ways. Computer Scientists can think computation- guity is illustrated by the sentence I saw the person ally, have experience with a syntactic formalism, with the glasses having more than one syntactic and have experience with statistics. Data Science parse). students likewise have experience with statistics as well as Python, including data analysis skills. Probability and Information Theory This Though the backgrounds can be quite diverse, my course does not focus only on deep learning, NLP course allows two prerequisite paths: all stu- though many university NLP offerings seem to dents must take a statistics course, but Computer be moving to deep-learning only courses. There Science students must take a Programming Lan- are several reasons not to focus on deep learning guages course (which covers context free grammars for a university like Boise State. First, students for parsing computer languages and now covers will not have a depth of background in probabil- some Python programming), and the Data Science ity and information theory, nor will they have a students must have taken the introductory machine deep understanding of optimization (both convex 117
and non-convex) or error functions in neural net- Syntax The Syntax & Parsing assignment also works (e.g., cross entropy). I take time early in the deserves mention. The students use any parser in course to explain discrete and continuous probabil- NLTK to parse a context free grammar of a fictional ity, and information theory. Discrete probability language with a limited vocabulary.3 This helps theory is straight forward as it requires counting, the students think about structure of language, and something that is intuitive when working with lan- while there are other important ways to think about guage data represented as text strings. Continuous syntax such as dependencies (which we discuss in probability theory, I have found, is more difficult the course), another reason for this assignment is for students to grasp as it relates to machine learn- to have the students write grammars for a small ing or NLP, but building on students’ understand- vocabulary of words in a language they don’t know, ing of discrete probability theory seems to work but also to create a non-lexicalized version of the pedagogically. For example, if we use continuous grammar based on parts of speech, which helps data and try somehow to count values in that data, them understand coverage and syntactic ambiguity it’s not clear what should be counted (e.g., using more concretely.4 There is no machine learning binning), highlighting the importance of continu- or estimating a probabilistic grammar here, just ous probability functions that fit around the data, parsing. and the importance of estimating parameters for those continuous functions–an important concept Semantics An important aspect of my NLP class for understanding classifiers later in the course. To is semantics. I introduce them briefly to formal illustrate both discrete and continuous probability, semantics (e.g., first-order logic), WordNet (Miller, I show students how to program a discrete Naive 1995), distributional semantics, and grounded se- Bayes classifier (using ham/spam email classifica- mantics. We discuss the merits of representing tion as a task) and a continuous Gaussian Native language “meaning" as embeddings and the limi- Bayes classifier (using the well-known iris data set) tations of meaning representations trained only on from scratch. Both classifiers have similarities, but text and how they might be missing important se- the differences illustrate how continuous classifiers mantic knowledge (Bender and Koller, 2020). The learn parameters. Topic Modeling assignment uses word-level em- beddings (e.g., word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014)) to represent Sequential Thinking Students experience prob- texts and gives them an opportunity to begin us- ability and information theory in a targeted and ing a deep learning library (tensorflow & keras or highly-scaffolded assignment. They then extend pytorch). their knowledge and program, from scratch, a part- We then consider how a semantic representation of-speech tagger using counting to estimate proba- that has knowledge of modalities beyond text, i.e., bilities modeled as Hidden Markov Models. These images is part of human knowledge (e.g., what is models seem old-fashioned, but it helps students the meaning of the word red?), and how recent gain experience beyond the standard machine learn- work is moving in this direction. Two assignments ing workflow of mapping many-to-one (i.e., fea- give the students a deeper understanding of these tures to a distribution over classes) because this is ideas. The transfer learning assignment requires a many-to-many sequential task (i.e., many words the students to use convolutional neural networks to many parts-of-speech), an important concept to pre-trained on image data to represent objects in understand when working with sequential data like images and train a classifiers to identify simple ob- language. It also helps students understand that ject types, tying images to words. This is extended NLP often goes beyond just fitting “models" be- in the Grounded Semantics assignment where a cause it requires things like building a trellis and binary classifier (based on the words-as-classifiers decoding a trellis (undergraduate students are re- model introduced in Kennington and Schlangen quired to program a greedy decoder; graduates are (2015), then extended to work with images with required to program a Viterbi decoder). This is a "real" objects in Schlangen et al. (2016)) is trained challenging assignment for most students, irrespec- 3 tive of their technical background, but grasping the I opt for Tamarian: https://en.wikipedia.org/ wiki/Darmok concepts of this assignment helps them grasp more 4 I have received feedback from multiple students that this difficult concepts that follow. was their favorite assignment. 118
Figure 3: Course Topics and Assignments. Colors cluster topics (green=probability and information theory, blue=NLP libraries, red=neural networks, purple=linguistics, yellow=assignments). Courses have solid lines; topic dependencies are illustrated using arrows. Assignments have dashed lines. Assignments are indexed with the topics on which they depend. for all words in referring expressions to objects in like how neurons in neural networks “mimic" real images in the MSCOCO dataset annotated with re- neurons in human brains, something that is very ferring expressions to objects in images (Mao et al., far from true, though certainly the idea of neural 2016). Both assignments require ample scaffolding networks is inspired from human biology. For my to help guide the students in using the libraries and students, we progress from linear regression to lo- datasets, and the MSCOCO dataset is much bigger gistic regression (illustrating how parameters are than they are used to, giving them more real-world being fit and how gradient descent is different from experience with a larger dataset. directly estimating parameters in continuous prob- ability functions; i.e., maximum likelihood estima- Deep Learning An understanding of deep learn- tion vs convex and non-convex optimization), build- ing is obviously important for recent NLP re- ing towards small neural architectures and feed- searchers and practitioners. One constant challenge forward networks. We also cover convolutional is determining what level of abstraction to present neural networks (for transfer learning and grounded neural networks (should students know what is hap- semantics), attention (Vaswani et al., 2017), and pening at the level of the underlying linear alge- transformers including transformer-based language bra, or is a conceptual understanding of parame- models like BERT (Devlin et al., 2018), and how ter fitting in the neurons enough?). Furthermore, to make use of them; understanding how they are deep learning as a topic requires and understand- trained, but then only assigning fine-tuning for stu- ing of its limitations and at least to some degree dents to experience directly. I focus on smaller how it works “under the hood" (learning just how datasets and fine-tuning so students can train and to use deep learning libraries without understand- tune models on their own machines. ing how they work and how they “learn" from the data is akin to giving someone a car to drive with- Final Project There is a "final project" require- out teaching them how to use it safely). This also ment. Students can work solo, or in a group of means explaining some common misconceptions up to three students. The project can be anything 119
NLP related, but projects generally are realized as learn to critically read popular articles about NLP. using an NLP or machine/deep learning library to Given an article, they need to summarize the article, train on some specific task, but others include meth- then scrutinize the sources using the following as ods for data collection (a topic we don’t cover in a guide: can they (1) access the primary source, class specifically, but some students have interest such as original published paper, (2) assess if the in the data collection process for certain settings claims in the article relate to what’s claimed by the like second language acquisition), as well as in- primary source, (3) determine if experimental work terfaces that they evaluate with some real human was involved or if the article is simply offering users. Scoping the projects is always the biggest conjecture based on current trends, and (4) if the challenge as many students initially envision very article did not carry out an evaluation, offer ideas ambitious projects (e.g., build an end-to-end chat- on what kind of evaluation would be approrpriate bot from scratch). I ask students to consider how to substantiate any claims made by the article’s au- much effort it would take to do three assignments thor(s). Then students should relate the headline of and use that as a point of comparison. Throughout the article to the main text and determine if reading the first half of the semester students can ask for the headline provides an abstract understanding of feedback on project ideas, and at the halfway point the article’s contents, and determine to what extent in the semester, students are required to submit a the author identified limitations to the NLP technol- short proposal that outlines the scope and timeline ogy they were reporting on, what someone without for their project. They have the second half of the training in NLP might take away from the article, semester to then work through the project (with and if the authors identified the people who might a "checkpoint" part way through to inform me of be affected (negatively or positively) by the NLP progress and needed adjustments), then they write technology. This assignment gives students expe- a project report on their work at the end of the rience in recognizing the gap between the reality semester with evaluations and analayses of their of NLP technology, how it is perceived by others, work. Graduate students must write a longer re- whom it affects, and its limitations. port than the undergraduate students, and graduate We dedicate an entire lecture to ethics, and stu- students are required to give a 10-minute presen- dents are also asked to consider the implications of tation on their project. The timeline here is crit- their final projects, what their work can and cannot ical: the midway point for beginning the project reasonably do, and who might be affected by their allows students to have experience with classifica- work.6 tion and NLP tasks, but have enough time to make adjustments as they work on their project. For ex- Discussion Striking a balance between content ample, students students attempt to apply BERT on probability and information theory, linguistics, fine-tuning after the BERT assignment even though and machine learning is challenging for a single it wasn’t in their original project proposal. course, but given the diverse student population at a public state school, this approach seems to Theme of Limitations As is the case with am- work for the students. An NLP class should have at biguity, limitations is theme in the course: lim- least some content about linguistics, and framing itations of using probability theory on language aspects of linguistics in terms of ambiguity gives phenomena, limitations on datasets, and limitations students the tools to think about how much they on machine learning models. The theme of limi- experience ambiguity on a daily basis, and the fact tations ties into an overarching ethical discussion that if language were not ambiguous, data-driven that happens at intervals throughout the semester NLP would be much easier (or even unnecessary). about what can reasonably be expected from NLP The discussions about syntax and semantics are technology and whom it affects as more practical especially important as many have not considered models are deployed commercially. (particularly those who have not learned a foreign The final assignment critical reading of the pop- 6 ular press is based on a course under the same Approaching ethics from a framework of limitations I think helps students who might otherwise be put off by a title taught by Emily Bender at the University of discussion on ethics because it can clearly be demonstrated Washington.5 The goal of the assignment is to that NLP models have not just technical limitations and those limitations have implications for real people; an important 5 The original assignment definition appears to no longer message from this class is that all NLP models have limita- be public. tions. 120
language) how much they take for granted when it for many languages. All code I write or show in comes to understanding and producing language, class is accessible to the students throughout the both speech and written text. The discussions on semester so they can refer back to the code exam- how to represent meaning computationally (sym- ples for assignments and projects. This of course bolic strings? classifiers? embeddings? graphs?) means that students only obtain a fairly shallow and how a model should arrive at those representa- experience for any library; the goal is to show them tions (using speech? text? images?) is rewarding enough examples and give them enough experience for the students. While most of the assignments and in assignments to make sense of public documen- examples focus on English, examples of linguistic tation and other code examples that they might phenomena are often shown from other languages encounter. (e.g., Japanese morphology and German declen- The course uses two books, both which are avail- sion) and the students are encouraged to work on able free online, the NLTK book,7 and an ongoing other languages for their final project. draft of Jurafsky and Martin’s upcoming 3rd edi- Assignments vary in scope and scaffolding. For tion.8 The first assignment (Python & Jupyter in the probability and information theory and BERT Figure 3) is an easy, but important assignment: I assignments, I provide a fairly well-scaffolded tem- ask the students to go through Chapter 1 and parts plate that the students fill in, whereas most other of Chapter 4 of the NLTK book and for all code ex- assignments are more open-ended, each with a set amples, write them by hand into a Jupyter notebook of reflection and analysis questions. (i.e., no copy and pasting). This ensures that their programming environments are setup, steps them 4.2 Content Delivery through how NLTK works, gives them immediate exposure to common NLP tasks like concordance Class sizes vary between 35-45 students. Class and stemming, and gives them a way to practice content is presented largely either as presentation Python syntax in the context of a Jupyter notebook. slides or live programming using Jupyter note- Another part of the assignment asks them to look books. Slides introduce concepts, explain things at some Jupyter notebooks that use tokenization, outside of code (e.g., linguistics and ambiguity or counters, stop words, and n-grams, and asks them graphical models), but most concepts have concrete questions about best practices for authoring note- examples using working code. The students see books (including formatted comments).9 code for Naive Bayes (both discriminative and con- Students can use cloud-based Jupyter servers tiniuous) classifiers, I use Python code to explain for doing their assignments (e.g., Google colab), probability and information theory, classification but all must be able to run notebooks on a local tasks such as spam filtering, name classification, machine and spend time learning about Python topic modeling, parsing, loading and prepossess- environments (i.e., anaconda). Assignments are ing datasets, linear and logistic regression, senti- submitted and graded using okpy which renders ment classification, an implementation of neural notebooks and allows instructors to assign grading networks from scratch as well as popular libraries. to themselves or teaching assistants, and students While we use NLTK for much of the instruc- can see their grades and written feedback for each tion following in some ways what is outlined in assignment.10 Bird et al. (2008), we also look at supported NLP Python libraries including textblob, flair (Akbik 4.3 Adjustments for Remote Learning et al., 2019), spacy, stanza (Qi et al., 2020), scik- This course was relatively straightforward to adjust itlearn (Pedregosa et al., 2011), tensorflow (Abadi for remote delivery. The course website and okpy et al., 2016) and keras (Chollet et al., 2015), py- (for assignment submissions) are available to the torch (Paszke et al., 2019), and huggingface (Wolf students at all times. I decided to record lectures et al., 2020). Others are useful, but most libraries live (using Zoom) then make them available with help students use existing tools for standard NLP 7 http://www.nltk.org/book_1ed/ pre-processing like tokenization, sentence segmen- 8 https://web.stanford.edu/~jurafsky/ tation, stemming or lemmatization, part-of-speech slp3/ 9 tagging, and many have existing models for com- I use the notebooks listed here for this part of the assignment https://github.com/bonzanini/ mon NLP tasks like sentiment classification and nlp-tutorial 10 machine translation. The stanza library has models https://okpy.org/ 121
semester Python Ling ML DL Spring 2019 Between 2017 and 2019, several Spring 2017 none none 10% none important papers showing how transformer net- Spring 2019 50% 30% 30% 10% works can be used for robust language modeling Spring 2021 80% 30% 60% 20% were gaining in momentum, resulting in a shift to- wards altering and understanding their limitations Table 1: Impressions of preparedness for Python, Lin- (so called BERTology, see Rogers et al. (2020) for guistics (Ling), Machine Learning (ML), and Deep Learning (DL). a primer). This, along with the fact that changes in the curriculum gave students better experience with Python, caused me to shift focus from gen- transcripts to the students. This course has one erative models to neural architectures in NLP and midterm, a programming assignment that is similar to shift to cover word-level embeddings more rig- in structure to the regular assignments. During orously. I spent the second half of the semester an in-person semester, there would normally be a introducing neural networks (including multi-layer written final, but I opted to make the final be part perceptrons, convolutional, and recurrant architec- of their final project grade. tures) and giving students assignments to give them practice in tensorflow and keras. After the 2017 course, I changed the pre-requisite structure to re- 5 Reflection on Three Offerings over 4 quire our programming languages course instead of years data structures. This led to greater pareparedness in at least the syntax aspect of linguistics. Due to department constraints on offering required courses vs. elective courses (NLP is elective), I Spring 2021 In this iteration, I shifted focus from am only able to offer the NLP course in the Spring recurrant to attention/transformer-based models semester of odd years; i.e., every 4 semesters. The and assignmed a BERT fine-tuning assignment on a course is very popular, as enrollment is always novel dataset using huggingface. I also introduced over the standard class size (35 students). Below pytorch as another option for a neural network li- I reflect on changes that have taken place in the brary (I also spend time on tensorflow and keras). course due to the constant and rapid change in the This shift reflects a shift in my own research and field of NLP, in our undergraduate curriculum, and understanding of the larger field, though exposure the implications those changes had on the course. to each library is only partial and somewhat ab- These reflections are summarized in Table 1. As stract. I note that students who have a data science I am, to my knowledge, the first NLP researcher background will likely appreciate tensorflow and at Boise State University, I had to largely develop keras more as they are not as object-oriented than the contents of the course on my own, requiring pytorch, which seems to be more geared towards adjustments over time as I better understand student students with Computer Science backgrounds. Stu- preparedness. At this point, despite the two paths dents can choose which one they will use (if any) into the course, most students who take the course for their final projects. More students are gaining are still Computer Science students. interest in machine learning and deep learning and are turning to MOOC courses or online tutorials, Spring 2017 The first time I taught the course, which has led in some degree to better prepara- only a small percentage of the students had expe- tion for the NLP course, but often students have rience with Python. The only developed Python little understanding about the limitations of ma- library for NLP was NLTK, so that and scikit-learn chine learning and deep learning after completing were the focus of practical instruction. I spent the those courses and tutorials. Moreover, students first three weeks of the course helping students gain from our university have started an Artificial In- experience with Python (including Jupyter, numpy, telligence Club (the club started in 2019; I am the pandas) then used Python as a means to help them faculty advisor), which has given the students guid- understand probability and information theory. The ance on courses, topics, and skills that are required course focused on generative classification includ- for practical machine learning. Many of the NLP ing statistical n-gram language modeling with some class students are already members of the AI Club, exposure to discriminative models, but no exposure and the club has members from many academic to neural networks. disciplines. 122
5.1 Student Feedback Much of my course materials including note- I reached out to former students who took the class books, slides, topics, and assignments can be found to ask for feedback on the course. Specifically, I on a public Trello board.12 asked if they use the skills and concepts from the NLP class directly for their work, or if the skills Acknowledgements and concepts transferred in any way to their work. I am very thankful for the anonymous reviewers for Student responses varied, but some answered that giving excellent feedback and suggestions on how they use NLP directly (e.g., to analyze customer to improve this document. I hope that it followed feedback or error logs), while most responded that those suggestions adequately. they use many of the Python libraries we covered in class for other things that aren’t necessarily NLP related, but more geared towards Data Science. For References several students, using NLP tools helped them in Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng research projects that led to publications. Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, 6 Conclusions & Open Questions et al. 2016. Tensorflow: A system for large-scale machine learning. In 12th {USENIX} symposium With each offering, the NLP course at Boise State on operating systems design and implementation University is better suited pedagogically for stu- ({OSDI} 16), pages 265–283. dents with some Data Science or Computer Science training, and the content reflects ongoing changes Apoorv Agarwal. 2013. Teaching the basics of NLP and ML in an introductory course to information sci- in the field of NLP to ensure their preparation. The ence. In Proceedings of the Fourth Workshop on topics and assignments cover a wide range, but as Teaching NLP and CL, pages 77–84, Sofia, Bulgaria. students have become better prepared with Python Association for Computational Linguistics. (by the introduction of new prerequisite courses that cover Python as well as changing some courses Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland Vollgraf. 2019. to include assignments in Python), more focus is Flair: An easy-to-use framework for state-of-the-art spent on topics that are more directly related to nlp. In NAACL 2019, 2019 Annual Conference of NLP. the North American Chapter of the Association for Though I feel it important to stay abreast of the Computational Linguistics (Demonstrations), pages 54–59. ongoing changes in NLP and help students gain the knowledge and skills needed to be successful Emily M Bender and Alexander Koller. 2020. Climb- in NLP, an open question is what changes need to ing towards NLU: On Meaning, Form, and Under- be made, and a related question is how soon. For standing in the Age of Data. In Association for Com- putational Linguistics, pages 5185–5198. example, I think at this point it is clear that neu- ral networks are essential for NLP, though it isn’t Steven Bird, Ewan Klein, Edward Loper, and Jason always clear what architectures should be taught Baldridge. 2008. Multidisciplinary instruction with (e.g., should we still cover recurrant neural net- the natural language toolkit. In Proceedings of works or jump directly to transformers?). It seems the Third Workshop on Issues in Teaching Compu- tational Linguistics, pages 62–70, Columbus, Ohio. important to cover even new topics sooner than Association for Computational Linguistics. later, though a course that is focused on research methods might be more concerned with staying up- François Chollet et al. 2015. Keras. https:// to-date with the field, whereas a course that is more keras.io. focused on general concepts and skills should wait Jacob Devlin, Ming-Wei Chang, Kenton Lee, and for accessible implementations (e.g., huggingface Kristina Toutanova. 2018. BERT: Pre-training of for transformers) before covering those topics. Deep Bidirectional Transformers for Language Un- With recordings and updated content, I hope derstanding. arXiv. to flip the classroom in the future by assigning taught the course, I flipped it and used class time for students readings and watching lectures before class, then to work on assignments, which gave me a better indication of use class time for working on assignments.11 how students were understanding the material, time for one- on-one interactions, and fewer issues with potential cheating. 11 12 This worked well for the Foundations of Data Science https://trello.com/b/mRcVsOvI/ course that I introduced to the university; the second time I boise-state-nlp 123
Eric Fosler-Lussier. 2008. Strategies for teaching cessing Systems 32, pages 8024–8035. Curran Asso- “mixed” computational linguistics classes. In Pro- ciates, Inc. ceedings of the Third Workshop on Issues in Teach- ing Computational Linguistics, pages 36–44, Colum- F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, bus, Ohio. Association for Computational Linguis- B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, tics. R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duch- Reva Freedman. 2008. Teaching NLP to computer esnay. 2011. Scikit-learn: Machine learning in science majors via applications and experiments. Python. Journal of Machine Learning Research, In Proceedings of the Third Workshop on Issues 12:2825–2830. in Teaching Computational Linguistics, pages 114– 119, Columbus, Ohio. Association for Computa- Jeffrey Pennington, Richard Socher, and Christopher tional Linguistics. Manning. 2014. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Con- C. Kennington and D. Schlangen. 2015. Simple learn- ference on Empirical Methods in Natural Language ing and compositional application of perceptually Processing (EMNLP), pages 1532–1543. groundedword meanings for incremental reference resolution. In ACL-IJCNLP 2015 - 53rd Annual Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, Meeting of the Association for Computational Lin- and Christopher D Manning. 2020. Stanza: guistics and the 7th International Joint Conference A python natural language processing toolkit on Natural Language Processing of the Asian Feder- for many human languages. arXiv preprint ation of Natural Language Processing, Proceedings arXiv:2003.07082. of the Conference, volume 1. Anna Rogers, Olga Kovaleva, and Anna Rumshisky. Junhua Mao, Jonathan Huang, Alexander Toshev, Oana 2020. A Primer in BERTology: What we know Camburu, Alan L Yuille, and Kevin Murphy. 2016. about how BERT works. arXiv. Generation and comprehension of unambiguous ob- David Schlangen, Sina Zarriess, and Casey Kenning- ject descriptions. In Proceedings of the IEEE con- ton. 2016. Resolving References to Objects in Pho- ference on computer vision and pattern recognition, tographs using the Words-As-Classifiers Model. In pages 11–20. Proceedings of the 54th Annual Meeting of the Asso- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Cor- ciation for Computational Linguistics, pages 1213– rado, and Jeffrey Dean. 2013. Distributed Represen- 1223. tations of Words and Phrases and their Composition- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob ality. In Proceedings of NIPS. Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz George A Miller. 1995. Wordnet: a lexical database for Kaiser, and Illia Polosukhin. 2017. Attention is all english. Communications of the ACM, 38(11):39– you need. arXiv preprint arXiv:1706.03762. 41. Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Adam Paszke, Sam Gross, Francisco Massa, Adam Chaumond, Clement Delangue, Anthony Moi, Pier- Lerer, James Bradbury, Gregory Chanan, Trevor ric Cistac, Tim Rault, Rémi Louf, Morgan Funtow- Killeen, Zeming Lin, Natalia Gimelshein, Luca icz, Joe Davison, Sam Shleifer, Patrick von Platen, Antiga, Alban Desmaison, Andreas Kopf, Edward Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Yang, Zachary DeVito, Martin Raison, Alykhan Te- Teven Le Scao, Sylvain Gugger, Mariama Drame, jani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Quentin Lhoest, and Alexander M. Rush. 2020. Junjie Bai, and Soumith Chintala. 2019. Py- Transformers: State-of-the-art natural language pro- torch: An imperative style, high-performance deep cessing. In Proceedings of the 2020 Conference on learning library. In H. Wallach, H. Larochelle, Empirical Methods in Natural Language Processing: A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Gar- System Demonstrations, pages 38–45, Online. Asso- nett, editors, Advances in Neural Information Pro- ciation for Computational Linguistics. 124
You can also read