Attacks against Ranking Algorithms with Text Embeddings: a Case Study on Recruitment Algorithms
Anahita Samadi, Debapriya Banerjee, and Shirin Nilizadeh
University of Texas at Arlington
Emails: anahita.samadi@mavs.uta.edu, debapriya.banerjee2@mavs.uta.edu, shirin.nilizadeh@uta.edu

arXiv:2108.05490v1 [cs.CL] 12 Aug 2021

Abstract

Recently, some studies have shown that text classification tasks are vulnerable to poisoning and evasion attacks. However, little work has investigated attacks against decision-making algorithms that use text embeddings and whose output is a ranking. In this paper, we focus on ranking algorithms for the recruitment process that employ text embeddings to rank applicants' resumes against a job description. We demonstrate both white-box and black-box attacks that identify text items which, based on their location in the embedding space, contribute significantly to increasing the similarity score between a resume and a job description. The adversary then uses these text items to improve the ranking of their resume among others. We tested recruitment algorithms that use the similarity scores obtained from Universal Sentence Encoder (USE) and Term Frequency–Inverse Document Frequency (TF-IDF) vectors. Our results show that the attacker is successful on average in both adversarial settings. We also found that attacks against TF-IDF are more successful than those against USE.

1 Introduction

Recently, some studies have shown that text classification tasks are vulnerable to poisoning and evasion attacks (Liang et al., 2017; Li et al., 2018; Gao et al., 2018; Grosse et al., 2017). For example, some works have shown that an adversary can fool toxic content detection (Li et al., 2018), spam detection (Gao et al., 2018), and malware detection (Grosse et al., 2017) by modifying some text items in the adversarial examples. A recent work (Schuster et al., 2020) showed that applications that rely on word embeddings are vulnerable to poisoning attacks, where an attacker can modify the corpus that the embedding is trained on, i.e., Wikipedia and Twitter posts, and modify the meaning of new or existing words by changing their locations in the embedding space. In this work, however, we investigate a new type of attack, i.e., a rank attack, against text applications that utilize text embedding approaches. In this attack, the adversary does not poison the training corpora; instead, they try to learn about the embedding space and about how adding some keywords to a document changes its representation vector, and based on that they try to improve the ranking of the adversarial document among a collection of documents.

As a case study, we focus on ranking algorithms in a recruitment process scenario. The recruitment process is key to finding a suitable candidate for a job opening, and nowadays companies use ML-based approaches to rank resumes from a pool of candidates (Sumathi and Manivannan, 2020; Roy et al., 2020).

One naive approach to boosting the ranking of a resume is to add as many words and phrases as possible from the job description. However, this is not always the best approach, for two reasons. First, the attacker must add specific, meaningful words to their resume; for example, they cannot claim proficiency in skills they do not have. Second, adding a random keyword from the job description to the resume does not always increase the similarity between the resume and the job description. Instead, we demonstrate that the adversary can learn which influential words and phrases increase the similarity score, and can use this knowledge to decide which keywords or phrases to add to their resume.

Therefore, in this work, we investigate: (1) How can an adversary utilize the text embedding space to extract words and phrases that have a higher impact on the ranking of a specific document (here, a resume)? and (2) How can an adversary with no knowledge about the ranking algorithm modify a text document (here, a resume) to improve its ranking among a set of documents?

While we focus on the recruitment application, the same approaches can be employed against other ranking algorithms that use text embeddings for measuring the similarity between text documents and ranking them. For example, such ranking algorithms are applied as part of summarization (Rossiello et al., 2017; Ng and Abrecht, 2015) and question answering systems (Bordes et al., 2014; Zhou et al., 2015; Esposito et al., 2020).

We consider both white-box and black-box settings, depending on the adversary's knowledge about the specific text embedding approach used for the resume ranking. In the white-box setting, we propose a novel approach which utilizes the text embedding space to extract words and phrases that influence the rankings significantly. We consider both Universal Sentence Encoder (USE) (Cer et al., 2018) and TF-IDF as the approaches for obtaining the word vectors. USE computes embeddings based on the Transformer architecture (Wang et al., 2019a) and captures contextual information, while TF-IDF does not. In the black-box setting, we propose a neural network based model that can identify the most influential words/phrases without knowing the exact text embedding approach used by the ranking algorithm.
2 System and Threat Model

The recruitment process helps to find a suitable candidate for a job opening. Based on Glassdoor statistics, the average job opening attracts approximately 250 resumes (Team), and a recent survey found that the average cost per hire is just over $4,000 (Bika). To limit the work in progress, many companies use machine learning to rank resumes more efficiently. A successful approach to ranking resumes is to calculate the similarity between each resume and the job description using NLP techniques (Excellarate).

In this study, the recruitment algorithm is a ranking algorithm. It takes the resumes and the job description as input and ranks the resumes by their matching scores. We use Universal Sentence Encoder (USE) as the text embedding approach to vectorize each resume and the job description. As shown in Figure 1, our algorithm includes three steps: (1) all resumes and the job description are vectorized using the USE text embedding approach; (2) the similarity between the job description and each individual resume is computed, using cosine similarity as the metric; and (3) the resumes are sorted by the similarity scores computed in the second step, so that the resume with the highest similarity score appears at the top of the list and the resume with the lowest similarity score appears at the bottom.

[Figure 1: Ranking algorithm for recruitment. Resumes and the job description are vectorized with text embeddings; the cosine similarity between each resume and the job description is computed; and the resumes are sorted by their cosine similarity scores.]
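The three steps above can be condensed into a short script. Below is a minimal sketch, not the authors' code, assuming the public TensorFlow Hub release of USE; the function name and variable names are our choices.

    import numpy as np
    import tensorflow_hub as hub

    # Load the pretrained Universal Sentence Encoder (512-dimensional vectors).
    use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

    def rank_resumes(job_description, resumes):
        # Step 1: vectorize the job description and all resumes with USE.
        vectors = use([job_description] + resumes).numpy()
        job_vec, resume_vecs = vectors[0], vectors[1:]
        # Step 2: cosine similarity between each resume and the job description.
        sims = resume_vecs @ job_vec / (
            np.linalg.norm(resume_vecs, axis=1) * np.linalg.norm(job_vec))
        # Step 3: sort resumes by similarity score, highest first.
        order = np.argsort(-sims)
        return [(resumes[i], float(sims[i])) for i in order]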
Threat Model. Adversaries have a strong motivation to change the rankings produced by algorithms that are used for decision-making, e.g., for recruitment. Adjusting a resume to the job description is a well-known approach for boosting the chance of being selected for the next rounds of recruitment (Team, 2021). In this work, we show how an adversary can automatically generate adversarial examples tailored to a specific recruitment algorithm and text embedding. We define this attack as a rank attack, where the adversary adds some words or phrases to their document to improve its ranking among a collection of documents. We consider white-box and black-box settings. In the white-box setting, we assume the attacker has complete knowledge about the ranking algorithm. In the black-box setting, the attacker has no knowledge of the recruitment process but has limited access to the recruitment algorithm and can test some resumes against it.

3 Background

USE Text Embedding. This embedding has been used to solve tasks such as semantic search, text classification, and question answering (Rossiello et al., 2017; Ng and Abrecht, 2015; Bordes et al., 2014; Zhou et al., 2015; Esposito et al., 2020).
USE uses an encoder to convert a given text to a fixed-length 512-dimensional vector. It has been shown that, after embedding, sentences with closer meanings have higher cosine similarity (Tosik et al., 2018). We used the USE pretrained model, which is trained on the STS (Semantic Textual Similarity) benchmark (Agirre).

TF-IDF. TF-IDF, or term frequency-inverse document frequency, is a widely used approach in information retrieval and text mining. TF-IDF is computed as the multiplication of TF, the frequency of a word in a document, and IDF, the inverse of the document frequency.
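For completeness, the TF-IDF variant of the similarity computation is sketched below using scikit-learn. This is an illustrative sketch under our own naming, not the authors' implementation.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def tfidf_similarities(job_description, resumes):
        # Fit one vocabulary over the whole corpus so all vectors are comparable.
        vectorizer = TfidfVectorizer(stop_words="english")
        matrix = vectorizer.fit_transform([job_description] + resumes)
        job_vec, resume_vecs = matrix[0], matrix[1:]
        # One cosine similarity score per resume, in the original order.
        return cosine_similarity(resume_vecs, job_vec).ravel()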
4 Data Collection

We collected 100 real, public applicant resumes from LinkedIn public job seeker resumes, GitHub, and personal websites. To give applicants an equal chance and to make our experiments closer to real-world recruitment procedures, we only considered resumes related to computer science. Resumes were chosen to span different levels of education (bachelor's, master's, and Ph.D., with equal distribution), skills, experience (entry level, mid level, senior level), and gender (around fifty percent men and fifty percent women).

We also developed a web scraper in Python to extract computer science jobs from the Indeed website (https://www.indeed.com/). Our dataset includes over 10,000 job descriptions, extracted randomly from cities in the USA. We randomly chose 50 job descriptions and used them in our experiments.

For the black-box setting, our neural network architecture needs a large amount of training data, so we augmented our data. For a simple setting, we created 5,000 records by concatenating each of the 100 resumes to each of the 50 selected job descriptions, to augment our data for recruitment algorithms that are not totally dependent on the job description. For a more complex setting, we split resumes in half, then joined the upper part of each resume to the lower parts of the other resumes. With this approach we could maintain the structure of each resume, and each synthetic resume still contained the common resume information, such as skills, education, and work experience. We did this procedure for all possible combinations and created a database with 10,000 resumes.
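The splitting-and-recombining augmentation can be sketched as follows; this is our own illustrative reading of the procedure, assuming resumes are plain-text strings and that "half" means half of the lines. With 100 source resumes, the 100 x 99 ordered pairs yield roughly the 10,000 synthetic resumes reported above.

    def augment_resumes(resumes):
        # Split each resume into an upper and a lower half (by lines).
        halves = []
        for text in resumes:
            lines = text.splitlines()
            mid = len(lines) // 2
            halves.append((lines[:mid], lines[mid:]))
        # Join every upper half with every other resume's lower half.
        synthetic = []
        for i, (upper, _) in enumerate(halves):
            for j, (_, lower) in enumerate(halves):
                if i != j:
                    synthetic.append("\n".join(upper + lower))
        return synthetic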
5 White-Box Setting and a Recruitment Algorithm that Employs USE

In the white-box setting, we assume the adversary has knowledge of the recruitment process and of the use of Universal Sentence Encoder (USE) or term frequency–inverse document frequency (TF-IDF) for obtaining the embedding vectors. We propose a novel approach which utilizes knowledge about the text embedding space, here USE, to identify keywords in the job description that can be added to the adversarial resume to increase its similarity score with the job description. As shown in Figure 2, our approach consists of three phases: in Phase 1, the adversary identifies the important words in the job description; in Phase 2, the adversary ranks the effectiveness of those words for their own resume, i.e., identifies the words that can increase the similarity between their resume and the job description; and in Phase 3, the adversary modifies their resume based on their expertise and skill set and decides to add N of the identified words and phrases. Note that in this attack scenario the adversary does not need any knowledge about the other candidates' resumes. The details of Phases 1 and 2 are depicted in Algorithm 1 and Algorithm 2, respectively, and we explain them in the following.

[Figure 2: White-box setting. Phase 1: finding the best job description keywords; Phase 2: sorting keywords based on the resume; Phase 3: adding N keywords to the adversarial resume.]

5.1 Phase one: Identifying the Influential Keywords in the Job Description

In this phase, the adversary focuses on identifying the most important keywords in the job description. Because the recruitment algorithm uses cosine similarity between the vectors of resumes and job descriptions (Figure 1), the highest similarity is achieved if the resume is identical to the job description. We propose to remove words/phrases from the job description and then examine the impact on the similarity score. A substantial decrease in the similarity score when a specific word/phrase is removed demonstrates the importance of that keyword, in the word embedding space, for this job description. In the white-box setting, the adversary knows the details of the algorithms; therefore, they employ the USE text embedding to obtain the vectors and cosine similarity to calculate the similarity scores. In addition, to examine the importance of phrases instead of individual words, the adversary can remove phrases of one word (unigrams), two words (bigrams), three words (trigrams), etc., and then compute the similarity score. Algorithm 1 shows the details of Phase 1, which consists of the following steps:

(1) Text pre-processing: the keywords of the job description are obtained as a bag of words, and stop words are removed to lower the dimensionality of the space.

(2) USE embedding: the embedding vector of the original job description is obtained using USE.

(3) Token removal: this step measures the importance of each word in the job description. A single token is deleted from the job description, producing a new job description; the embedding vector for this new job description is then obtained from USE.

(4) Scoring keywords: cosine similarity is calculated between the two vectors USE_newJob and USE_OriginalJob. A lower cosine similarity indicates that the deleted token caused a substantial change in the job description's content, and therefore that it might be an important keyword.

(5) Repetition: steps three and four are repeated for all tokens in the job description. This procedure yields a dictionary whose keys are tokens and whose values are the corresponding cosine similarity scores.

(6) Sorting keywords: finally, the extracted keywords are sorted by similarity score in ascending order.

Algorithm 1 Job Description Keywords Extraction
Input: job description document
Output: a list of keywords in ascending order of similarity score
1: procedure Phase1(OriginalJob)
2:   OriginalJob ← FilterStopWords(OriginalJob)
3:   Tokens ← Tokenize(OriginalJob)
4:   USE_Job ← USE(OriginalJob)
5:   Token_Sim ← {}
6:   for Token_i ∈ Tokens do
7:     newJob ← DeleteToken(OriginalJob, Token_i)
8:     USE_newJob ← USE(newJob)
9:     Token_Sim ← Token_Sim + {Token_i : CosSim(USE_Job, USE_newJob)}
10:  end for
11:  return AscendingSortByValue(Token_Sim)

N-gram phrases. We also extended Algorithm 1 so that it can identify influential phrases, i.e., bigrams and trigrams. In that case, in each repetition, two or three neighboring words, instead of one individual word, are removed from the job description before the similarity score is computed.
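Phase 1 can be prototyped directly on top of any sentence encoder. The sketch below is our own illustration of Algorithm 1, not the authors' code; embed stands for a callable such as the USE model loaded earlier, and the stop-word list is left to the caller.

    import numpy as np

    def cos_sim(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def phase1_keywords(job_description, embed, stop_words=frozenset()):
        tokens = [t for t in job_description.split()
                  if t.lower() not in stop_words]
        job_vec = np.asarray(embed([" ".join(tokens)]))[0]
        scores = {}
        for token in dict.fromkeys(tokens):  # unique tokens, original order
            reduced = " ".join(t for t in tokens if t != token)
            scores[token] = cos_sim(job_vec, np.asarray(embed([reduced]))[0])
        # Ascending order: removing an important token drops the similarity most.
        return sorted(scores, key=scores.get)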
5.2 Phase two: Re-sorting the Influential Keywords based on a Specific Resume

The previous phase identifies the words and phrases of the job description that have the highest impact on the similarity score in the USE embedding space. However, each resume is unique, and adding the most influential words obtained from the job description might not have the same impact on every resume. In Phase 2, we therefore identify the best words and phrases for boosting the similarity score between a specific resume and the job description. Algorithm 2 shows the details of Phase 2, which consists of the following steps:

(1) Adding keywords: a keyword from the list of fifty keywords obtained in Phase 1 is added to the adversarial resume.

(2) Obtaining the embedding vector: the embedding vector of the adversarial resume is obtained using USE.

(3) Calculating the similarity: the cosine similarity between the adversarial resume and the job description is computed. A higher cosine similarity indicates that the added keyword moves the resume closer to the job description, and therefore that it is a more effective keyword for this resume.

(4) Repetition: these steps are repeated for all fifty keywords. This procedure yields a dictionary whose keys are the fifty keywords and whose values are the corresponding cosine similarity scores. The list of keywords is then sorted by cosine similarity score.

N-gram phrases. Algorithm 2 is likewise extended to produce a sorted list of bigrams or trigrams to be added to the adversarial resume.

Algorithm 2 Re-sorting the Extracted Job Description Keywords based on a Resume
Input: job description document, resume document, a list of tokens
Output: an ordered list of keywords
1: procedure Phase2(JobDescription, Resume, OrderedTokens)
2:   USE_JobDescription ← USE(JobDescription)
3:   Keyword_Sim ← {}
4:   for keyword_i ∈ OrderedTokens do
5:     adversarialResume ← AddToken(Resume, keyword_i)
6:     USE_advResume ← USE(adversarialResume)
7:     Keyword_Sim ← Keyword_Sim + {keyword_i : CosSim(USE_advResume, USE_JobDescription)}
8:   end for
9:   return SortByValue(Keyword_Sim)
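A matching sketch of Phase 2, again our own illustration rather than the authors' code, reusing cos_sim from the Phase 1 sketch:

    import numpy as np

    def phase2_resort(job_description, resume, keywords, embed):
        job_vec = np.asarray(embed([job_description]))[0]
        gains = {}
        for kw in keywords:
            adv_vec = np.asarray(embed([resume + " " + kw]))[0]
            gains[kw] = cos_sim(adv_vec, job_vec)
        # Descending order: keywords that pull this resume closest to the
        # job description come first.
        return sorted(gains, key=gains.get, reverse=True)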
5.3 Experimental Setup

We implemented the rank attack in the white-box setting and tested all combinations of 100 resumes and 50 job descriptions. The attack is shown in Algorithm 3. For each job description, we first obtained the ranking of the original resumes. Then, in each experiment, we assumed that one resume is adversarial, and the best keywords for that specific resume were added to it. To investigate the impact of the number of added keywords on the ranking of the resume, we tested with n ∈ (1, 2, 5, 10, 20, 50) keywords. We also repeated these experiments for bigram and trigram phrases.

Algorithm 3 White-Box Adversarial Attack
Input: job description documents, resume documents
Output: ranking changes
1: procedure WhiteBox(Jobs, Resumes)
2:   for job_i ∈ Jobs do
3:     Tokens ← Phase1(job_i)
4:     for resume_j ∈ Resumes do
5:       Rank ← GetRanking(job_i, resume_j, Resumes)
6:       words ← Phase2(job_i, resume_j, Tokens)
7:       for n ∈ (1, 2, 5, 10, 20, 50) do
8:         AdvResume ← AddWords(resume_j, n, words)
9:         Resumes2 ← Resumes − resume_j
10:        Resumes2 ← Resumes2 + AdvResume
11:        Rank_new ← Ranking(job_i, Resumes2)
12:        RankChange[job_i, resume_j, n] ← Rank_new − Rank
13:      end for
14:    end for
15:  end for
16:  return RankChange
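In Python, the evaluation loop of Algorithm 3 might look as follows. This condensed sketch reuses phase1_keywords and phase2_resort from the sketches above; get_rank is a hypothetical helper that returns the 1-based position of a resume in the pool sorted by similarity to the job.

    def whitebox_attack(jobs, resumes, embed, ns=(1, 2, 5, 10, 20, 50)):
        rank_improvement = {}
        for job in jobs:
            tokens = phase1_keywords(job, embed)
            for resume in resumes:
                old_rank = get_rank(job, resume, resumes, embed)
                words = phase2_resort(job, resume, tokens, embed)
                for n in ns:
                    adv = resume + " " + " ".join(words[:n])
                    pool = [adv if r is resume else r for r in resumes]
                    new_rank = get_rank(job, adv, pool, embed)
                    # Positive values mean the adversarial resume moved up.
                    rank_improvement[(job, resume, n)] = old_rank - new_rank
        return rank_improvement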
5.4 Experimental Results

[Figure 3: Average rank improvement for 100 resumes and 50 job descriptions, in the white-box setting when the recruitment algorithm employs USE embeddings. (a) Histogram of average rank improvement; (b) average rank improvement vs. number of keywords, for unigrams, bigrams, and trigrams.]

Figure 3a shows the histogram of average rank improvements for 100 resumes and 50 random job descriptions. All resumes had some rank improvement, and for most of them the improvement was significant. For example, on average an adversarial resume moved up 16 positions, about 6 resumes moved up 28 positions on average, and more than 65 resumes moved up more than 10 positions.

Figure 3b shows the average rank improvement as a function of the number of words or phrases (bigrams and trigrams) added to the adversarial resume. We see a similar trend for unigrams, bigrams, and trigrams: adding more words and phrases increases the average rank improvement. For example, while adding 2 bigrams improves the ranking of the adversarial resume on average by 6 positions, adding 20 bigrams improves it by 30, among 100 resumes. Interestingly, however, adding too many words/phrases might not have the same effect; e.g., adding 50 bigrams yields a rank improvement of only about 28. This suggests there might be an optimal number of words/phrases for improving the ranking of a document.

Comparing the addition of unigrams, bigrams, and trigrams, we see that trigrams provide better rank improvement. For example, adding 10 unigrams, bigrams, or trigrams yields a rank improvement of 12, 21, and 25, respectively. This finding can be explained by USE being a context-aware embedding approach, which takes into account the order of words in addition to their meaning.

6 White-box Setting and a Recruitment Algorithm that Employs TF-IDF

We also investigated the effectiveness of the attacks in the white-box setting when the recruitment algorithm uses TF-IDF vectors to compute the similarity between resumes and job descriptions. The attack is the same; the only difference is in the recruitment algorithm. Since the adversary knows that the recruitment algorithm uses TF-IDF vectors, they compute TF-IDF vectors instead of USE vectors.

6.1 Experimental Setup

For these experiments, we implemented the recruitment algorithm that ranks resumes based on similarity scores calculated between the TF-IDF vectors of the resumes and the job description.
We implemented the rank attack and tested it on all combinations of the 100 resumes and 50 job descriptions. For each job description, we first obtained the ranking of the original resumes; then, in each experiment, we assumed that one resume is adversarial, and the best keywords for that specific resume were added to it. To investigate the impact of the number of keywords on the ranking of the resume, we tested with n ∈ (1, 2, 5, 10, 20, 50) keywords. We also repeated these experiments for bigram and trigram phrases.

6.2 Experimental Results

[Figure 4: Average rank improvement in the white-box setting when the recruitment algorithm employs TF-IDF vectors. (a) Histogram of average rank improvement; (b) average rank improvement vs. number of keywords, for unigrams, bigrams, and trigrams.]

Figure 4a shows the histogram of average rank improvements for 100 resumes and 50 random job descriptions. Most resumes had a rank improvement, and for most of them the improvement was significant. For example, on average an adversarial resume moved up about 25 ranking positions, and more than 85 resumes moved up more than 10 positions.

In Figure 4b, we investigated the effect of adding bigram and trigram bags of words on the average rank improvement. Interestingly, in contrast to our results for recruitment algorithms that employ USE embeddings, in these experiments we achieved better results for unigrams than for bigrams and trigrams. Note that the TF-IDF approach is based only on word overlap and does not consider context, which might explain these results. Regardless of which n-gram is used, increasing the number of keywords improves the ranks.

Comparing the results in Figure 3b and Figure 4b also shows that it is easier for the adversary to attack a recruitment algorithm that employs TF-IDF than one that employs USE, as the rank improvement is larger for attacks against the TF-IDF-based recruitment algorithm, especially in the case of unigrams. For example, adding 20 keywords to attack the recruitment algorithm that employs USE yields an average rank improvement of 18, 29, and 33 for unigrams, bigrams, and trigrams, respectively; adding only 10 keywords to attack the recruitment algorithm that employs TF-IDF yields 48, 32, and 37, respectively.
7 Black-Box Setting

In the black-box setting, the adversary does not have access to the model specification but has access to the recruitment algorithm as an oracle machine, which receives an input, e.g., a resume, and generates a response, e.g., accept/reject or the ranking among all the resumes in its pool. We assume the resume dataset is static, i.e., the pool of resumes does not change during the attack, and the adversary only modifies their own input until it gets accepted or obtains a better ranking.

We develop a neural network model to predict the keywords that improve the ranking of a target resume for a specific job. The input of our neural network model is a resume, and the output is a vector indicating the best keywords. In our experiments, we examine attacks against two recruitment algorithms: a binary classifier, which labels a resume as accept or reject, and a ranking algorithm that ranks resumes by similarity scores between the job description and the resumes, where both are vectorized with USE embeddings.

The attacks against both algorithms have two phases: Phase 1, pre-processing, which prepares the data for the neural network, and Phase 2, the neural network model itself.

Phase 1: Pre-processing. To provide an acceptable format for the neural network input, we applied one-hot encoding (Harris and Harris, 2012) following these steps:
(1) Tokenization: the job description and the resumes are tokenized, and a dictionary of words is created, where the key is the token/word and the value is the frequency of the word in these documents.

(2) Vectorization of resumes: tokens/words are encoded as a one-hot numeric array, with one binary column per word. The result is a sparse matrix whose rows represent the resumes and whose columns represent the words: if a resume includes a word, the entry for that row and column is 1, and otherwise it is 0.

Phase 2: Neural Network Architecture. Neural networks have been applied to many real-world problems (Nielsen, 2015; Goodfellow et al., 2016; Hassanzadeh et al., 2020; Ghazvinian et al., 2021; Krizhevsky et al., 2012). We propose a deep neural network architecture that consists of an input layer, three dense hidden layers, and an output layer representing the labels. In the output vector, ones indicate the indices of dictionary words whose addition to a resume will increase the rank of the resume. For the first two hidden layers we used the rectified linear unit (ReLU) (Nair and Hinton, 2010) as the activation function; due to their formulation, ReLUs provide faster training and better convergence relative to other activation functions (Nair and Hinton, 2010). For the output layer we used the sigmoid activation function to map the output to the range [0, 1], i.e., actual probability values (Han and Moraga, 1995). As our problem is a multilabel problem, we applied the binary cross-entropy loss; in contrast to the softmax cross-entropy loss, binary cross-entropy is independent per class, i.e., the loss measured for one class is not affected by the other classes. To optimize the loss function (train the neural network), we used a stochastic gradient descent-based optimization algorithm (Adam; Kingma and Ba, 2014). For regularization, to avoid overfitting (Goodfellow et al., 2016), we tested dropout with rates from 0 to 0.5, applied to all hidden layers; our assessment showed that a dropout rate of 0.1 yielded the best results.
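This architecture is straightforward to express in Keras. The sketch below is our own reading of the description, not the authors' code: the paper fixes three dense hidden layers, ReLU on the first two, sigmoid outputs, binary cross-entropy, Adam, and dropout 0.1, while the layer widths and the third hidden layer's activation are our assumptions.

    from tensorflow import keras
    from tensorflow.keras import layers

    def build_model(input_dim, vocab_size):
        model = keras.Sequential([
            layers.Input(shape=(input_dim,)),
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.1),
            layers.Dense(64, activation="relu"),
            layers.Dropout(0.1),
            layers.Dense(64, activation="relu"),
            layers.Dropout(0.1),
            # One sigmoid unit per dictionary word: a multilabel prediction
            # of which words should be added to the resume.
            layers.Dense(vocab_size, activation="sigmoid"),
        ])
        model.compile(optimizer="adam",
                      loss="binary_crossentropy",
                      metrics=[keras.metrics.Precision(),
                               keras.metrics.Recall()])
        return model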
7.1 Experiments for the Binary Classifier

This is a simpler model, where the recruitment process is defined as a binary classification algorithm and a resume is accepted based on some rules. In our experiments, we defined simple rules, e.g., whether Python appears as a skill in the resume.

After tokenizing the resumes and the job description, instead of generating the one-hot encoding for all the words obtained, we chose the 20 most frequent words across all resumes and job descriptions, in order to test with a low-dimensional vector. We then concatenated the vectors of the resume and the job description.

Creating the ground-truth dataset. We employ two steps: (1) Xtrain: 5,000 records of 40-dimensional vectors; each vector is a resume concatenated to a job description and encoded in one-hot format. Section 4 explains how these resumes were obtained. (2) Ytrain: 5,000 records of 20-dimensional vectors; if the value at a word's index is set to one, then adding that word to the resume makes the resume be accepted.
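A toy version of this encoding step is sketched below; the helper names are ours, and vocab stands for the 20 most frequent words.

    import numpy as np

    def one_hot(text, vocab):
        words = set(text.lower().split())
        return np.array([1.0 if w in words else 0.0 for w in vocab])

    def make_example(resume, job_description, accept_words, vocab):
        # X: resume vector concatenated with the job description vector
        # (2 x 20 = 40 dimensions for a 20-word vocabulary).
        x = np.concatenate([one_hot(resume, vocab),
                            one_hot(job_description, vocab)])
        # Y: for each vocabulary word, 1 if adding it gets the resume accepted.
        y = np.array([1.0 if w in accept_words else 0.0 for w in vocab])
        return x, y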
Model Training. To train our neural network model, we split our data into train and test sets (70% train, 30% test); to evaluate our training results, we used the validation set approach (allocating 30% of the training set for validation), training on 2,450 samples and validating on 1,050 samples. We set the batch size to 50 and the number of epochs to 100. Our results showed that the model trains well across epochs: after 10 epochs, the recall, precision, and F1-score of the model on the validation set reached their maximum values, i.e., 0.62, 0.68, and 0.75, respectively.

We then examined the performance of the trained neural network model on test data. We added the predicted words to each related resume, submitted each resume to the recruitment algorithm, and obtained the response from the recruitment oracle in the form of a binary value. Figure 5 shows that the success rate of getting accepted increases significantly when the suggested keywords are used: without adding the keywords, the acceptance rate was 20%; after the attack, the acceptance rate increased to about 100%.

[Figure 5: Acceptance and reject statistics, showing the number of resumes accepted and rejected before and after adding the suggested keywords.]

7.2 Experiments for the Ranking Algorithm

For the ranking algorithm, the output label is likewise a vector encoded in one-hot format. Model Training. To train our neural network model, we split our data into train and test sets (70% train, 30% test). Our results showed that the model trains well across epochs: after 50 epochs, the recall, precision, and F1-score of the model on the validation set reached their maximum values of 0.6, 0.7, and 0.82, respectively.

To test the performance of our trained neural network model, we added the 50 predicted words to each related resume, submitted each resume to the recruitment algorithm, and obtained the response from the recruitment oracle in the form of a ranking score. In Figure 6, we see that most of the resumes have a significant rank improvement. For example, more than 200 resumes had a rank improvement of more than 400 positions, while more than 400 resumes had a rank improvement between 150 and 200 positions after adding the suggested keywords.

[Figure 6: Number of resumes by ranking before (OLDrank) and after (NEWrank) the attack.]
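End to end, the black-box evaluation works purely through oracle queries. The following sketch is ours, not the authors' code; oracle_rank is a hypothetical stand-in for the black-box recruitment algorithm, and make_example and vocab come from the sketches above.

    def evaluate_attack(model, resumes, job_description, vocab,
                        oracle_rank, k=50):
        improvements = []
        for resume in resumes:
            x, _ = make_example(resume, job_description, set(), vocab)
            probs = model.predict(x[None, :])[0]
            # Take the k dictionary words the network is most confident about
            # (k is capped by the vocabulary size).
            suggested = [vocab[i] for i in probs.argsort()[::-1][:k]]
            adv = resume + " " + " ".join(suggested)
            # Positive values mean the adversarial resume ranks higher.
            improvements.append(oracle_rank(resume) - oracle_rank(adv))
        return improvements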
8 Related Work

A recent work (Cheng et al., 2020) attacked text summarization models that are based on seq2seq models; in our paper, however, we propose crafting adversarial text against ranking algorithms.

Attacks against resume search. A recent work (Schuster et al., 2020) showed that applications that rely on word embeddings are vulnerable to poisoning attacks, where an attacker can modify the corpus that the embedding is trained on and thereby modify the meaning of new or existing words by changing their locations in the embedding space. Our attack, in contrast, does not poison the corpus. In addition, we focus on the recruitment process, which employs cosine similarity for ranking applicants' resumes against a job description.

9 Conclusion

In this work we found that an automatic recruitment algorithm, as an example of a ranking algorithm, is vulnerable to adversarial examples. We proposed a successful adversarial attack in two settings: white-box and black-box. We proposed a new approach for keyword extraction based on USE and observed that the majority of resumes obtain significant rank improvements when the more influential keywords are added. Finally, in the black-box setting, we proposed a multilabel neural network architecture to predict the proper keywords for each resume and job description.

References

Eneko Agirre. STSbenchmark. https://ixa2.si.ehu.eus/stswiki/index.php/STSbenchmark.

Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. 2018. Generating natural language adversarial examples. arXiv preprint arXiv:1804.07998.

N. Bika. Recruiting costs FAQ: Budget and cost per hire. https://resources.workable.com/tutorial/faq-recruitment-budget-metrics.

Antoine Bordes, Jason Weston, and Nicolas Usunier. 2014. Open question answering with weakly supervised embedding models. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 165–180. Springer.

Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, et al. 2018. Universal sentence encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 169–174.

Minhao Cheng, Jinfeng Yi, Pin-Yu Chen, Huan Zhang, and Cho-Jui Hsieh. 2020. Seq2Sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. In AAAI, pages 3601–3608.

Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018. HotFlip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36.

Massimo Esposito, Emanuele Damiano, Aniello Minutolo, Giuseppe De Pietro, and Hamido Fujita. 2020. Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering. Information Sciences, 514:88–105.

Excellarate. Resume ranking using machine learning. https://medium.com/@Excellarate/resume-ranking-using-machine-learning-implementation-47959a4e5d8e.

Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. 2018. Black-box generation of adversarial text sequences to evade deep learning classifiers. In 2018 IEEE Security and Privacy Workshops (SPW), pages 50–56. IEEE.

Mohammadvaghef Ghazvinian, Yu Zhang, Dong-Jun Seo, Minxue He, and Nelun Fernando. 2021. A novel hybrid artificial neural network - parametric scheme for postprocessing medium-range precipitation forecasts. Advances in Water Resources, 151:103907.

Zhitao Gong, Wenlu Wang, Bo Li, Dawn Song, and Wei-Shinn Ku. 2018. Adversarial texts with gradient methods. arXiv preprint arXiv:1801.07175.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning, volume 1. MIT Press, Cambridge.

Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. 2017. Adversarial examples for malware detection. In European Symposium on Research in Computer Security, pages 62–79. Springer.

Jun Han and Claudio Moraga. 1995. The influence of the sigmoid function parameters on the speed of backpropagation learning. In International Workshop on Artificial Neural Networks, pages 195–201. Springer.

David Harris and Sarah Harris. 2012. Digital Design and Computer Architecture (2nd ed.). Morgan Kaufmann.

Yousef Hassanzadeh, Mohammadvaghef Ghazvinian, Amin Abdi, Saman Baharvand, and Ali Jozaghi. 2020. Prediction of short and long-term droughts using artificial neural networks and hydro-meteorological variables.

Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2021–2031.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25:1097–1105.

Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, and Ting Wang. 2018. TextBugger: Generating adversarial text against real-world applications. arXiv preprint arXiv:1812.05271.

Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi. 2017. Deep text classification can be fooled. arXiv preprint arXiv:1704.08006.

Vinod Nair and Geoffrey E. Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In ICML.

Jun-Ping Ng and Viktoria Abrecht. 2015. Better summarization evaluation with word embeddings for ROUGE. arXiv preprint arXiv:1508.06034.

Michael A. Nielsen. 2015. Neural Networks and Deep Learning, volume 25. Determination Press, San Francisco, CA.

Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang. 2016. Crafting adversarial input sequences for recurrent neural networks. In MILCOM 2016 - 2016 IEEE Military Communications Conference, pages 49–54. IEEE.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392.

Gaetano Rossiello, Pierpaolo Basile, and Giovanni Semeraro. 2017. Centroid-based text summarization through compositionality of word embeddings. In Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres, pages 12–21.

Pradeep Kumar Roy, Sarabjeet Singh Chowdhary, and Rocky Bhatia. 2020. A machine learning approach for automation of resume recommendation system. Procedia Computer Science, 167:2318–2327.

Roei Schuster, Tal Schuster, Yoav Meri, and Vitaly Shmatikov. 2020. Humpty Dumpty: Controlling word meanings via corpus poisoning. In 2020 IEEE Symposium on Security and Privacy (SP), pages 1295–1313. IEEE.

D. Sumathi and S. S. Manivannan. 2020. Machine learning-based algorithm for channel selection utilizing preemptive resume priority in cognitive radio networks validated by ns-2. Circuits, Systems, and Signal Processing, 39(2):1038–1058.

Mengying Sun, Fengyi Tang, Jinfeng Yi, Fei Wang, and Jiayu Zhou. 2018. Identify susceptible locations in medical records via adversarial attacks on deep predictive models. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 793–801.

Glassdoor Team. Glassdoor. https://www.glassdoor.com/employers/blog/50-hr-recruiting-stats-make-think.

Indeed Editorial Team. 2021. How to tailor your resume to a job description. https://www.indeed.com/career-advice/resumes-cover-letters/tailoring-resume.

Melanie Tosik, Antonio Mallia, and Kedar Gangopadhyay. 2018. Debunking fake news one feature at a time. CoRR, abs/1808.02831.

Chenguang Wang, Mu Li, and Alexander J. Smola. 2019a. Language models with transformers.

Xiaosen Wang, Hao Jin, and Kun He. 2019b. Natural language adversarial attacks and defenses in word level. arXiv preprint arXiv:1909.06723.

Guangyou Zhou, Tingting He, Jun Zhao, and Po Hu. 2015. Learning continuous word embedding with metadata for question retrieval in community question answering. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 250–259.