Attacks against Ranking Algorithms with Text Embeddings: a Case Study on Recruitment Algorithms

Anahita Samadi, Debapriya Banerjee, and Shirin Nilizadeh
University of Texas at Arlington
Emails: anahita.samadi@mavs.uta.edu, debapriya.banerjee2@mavs.uta.edu, shirin.nilizadeh@uta.edu
arXiv:2108.05490v1 [cs.CL] 12 Aug 2021

Abstract

Recently, some studies have shown that text classification tasks are vulnerable to poisoning and evasion attacks. However, little work has investigated attacks against decision-making algorithms that use text embeddings and whose output is a ranking. In this paper, we focus on ranking algorithms for the recruitment process that employ text embeddings for ranking applicants' resumes against a job description. We demonstrate both white-box and black-box attacks that identify text items which, based on their location in the embedding space, contribute significantly to increasing the similarity score between a resume and a job description. The adversary then uses these text items to improve the ranking of their resume among others. We tested recruitment algorithms that use the similarity scores obtained from Universal Sentence Encoder (USE) and Term Frequency–Inverse Document Frequency (TF-IDF) vectors. Our results show that in both adversarial settings, on average, the attacker is successful. We also found that attacks against TF-IDF are more successful than attacks against USE.

1 Introduction

Recently, some studies have shown that text classification tasks are vulnerable to poisoning and evasion attacks (Liang et al., 2017; Li et al., 2018; Gao et al., 2018; Grosse et al., 2017). For example, some works have shown that an adversary can fool toxic content detection (Li et al., 2018), spam detection (Gao et al., 2018) and malware detection (Grosse et al., 2017) by modifying some text items in the adversarial examples. A recent work (Schuster et al., 2020) showed that applications that rely on word embeddings are vulnerable to poisoning attacks, where an attacker can modify the corpus that the embedding is trained on, i.e., Wikipedia and Twitter posts, and modify the meaning of new or existing words by changing their locations in the embedding space. In this work, however, we investigate a new type of attack, i.e., a rank attack, when the text application utilizes text embedding approaches. In this attack, the adversary does not poison the training corpora but tries to learn about the embedding space and how adding some keywords to a document can change its representation vector, and based on that tries to improve the ranking of the adversarial text document among a collection of documents.

As a case study, we focus on ranking algorithms in a recruitment process scenario. The recruitment process is key to finding a suitable candidate for a job. Nowadays, companies use ML-based approaches to rank resumes from a pool of candidates (Sumathi and Manivannan, 2020; Roy et al., 2020).

One naive approach for boosting the ranking of a resume is adding as many words and phrases as possible from the job description to the resume. However, this is not the best approach in all cases. First, the attacker must add specific, meaningful words to their resume; for example, they cannot claim proficiency in skills they do not have. Second, adding a random keyword from the job description to the resume does not always increase the similarity between the resume and the job description. Instead, we demonstrate that the adversary can learn which influential words and phrases increase the similarity score, and use this knowledge to decide which keywords or phrases to add to their resume.

Therefore, in this work, we investigate: (1) How can an adversary utilize the text embedding space to extract words and phrases that have a higher impact on the ranking of a specific document (here, a resume)? and (2) How can an adversary, with no knowledge about the ranking algorithm, modify a
text document (here, a resume) to improve its ranking among a set of documents?

While we focus on the recruitment application, the same approaches can be employed against other ranking algorithms that use text embeddings for measuring the similarity between text documents and ranking them. For example, such ranking algorithms are applied as part of summarization (Rossiello et al., 2017; Ng and Abrecht, 2015) and question answering systems (Bordes et al., 2014; Zhou et al., 2015; Esposito et al., 2020).

We consider both white-box and black-box settings, depending on the adversary's knowledge about the specific text embedding approach used for the resume ranking. In the white-box setting, we propose a novel approach which utilizes the text embedding space to extract words and phrases that influence the rankings significantly.

We consider both Universal Sentence Encoder (USE) (Cer et al., 2018) and TF-IDF as the approaches for obtaining the text vectors. USE computes embeddings based on the Transformer architecture (Wang et al., 2019a) and captures contextual information, whereas TF-IDF does not capture contextual information.

In the black-box setting, we propose a neural network based model that can identify the most influential words/phrases without knowing the exact text embedding approach used by the ranking algorithm.

2 System and Threat Model

The recruitment process helps to find a suitable candidate for a job opening. Based on Glassdoor statistics, the average job opening attracts approximately 250 resumes (Team), and a recent survey found that the average cost per hire is just over $4,000 (Bika). To limit work in progress, many companies use machine learning to rank resumes more efficiently. A successful approach to ranking resumes is calculating the similarity between each resume and the job description, leveraging NLP techniques (Excellarate).

In this study, the recruitment algorithm is a ranking algorithm: it takes the resumes and the job description as input and scores how well they match. We use the Universal Sentence Encoder (USE) as the text embedding approach to vectorize each resume and the job description. As shown in Figure 1, our algorithm includes three steps: (1) all resumes and the job description are vectorized using the USE text embedding; (2) the similarity between the job description and each individual resume is computed, using cosine similarity as the metric; and (3) the resumes are sorted by their similarity scores, so that the resume with the highest similarity score appears at the top of the list and the resume with the lowest similarity score appears at the bottom.

Figure 1: Ranking algorithm for recruitment (all resumes and the job description are vectorized with text embeddings, the cosine similarity between each resume and the job description is computed, and the resumes are sorted by their similarity scores).

Threat Model. Adversaries have a strong motivation to change the rankings produced by such algorithms when they are used for decision-making, e.g., for recruitment purposes. Adjusting a resume based on the job description is a well-known approach for boosting the chance of being selected for the next rounds of recruitment (Team, 2021). In this work, we show how an adversary can automatically generate adversarial examples specific to a recruitment algorithm and text embeddings. We define this attack as a rank attack, where the adversary adds some words or phrases to their document to improve its ranking among a collection of documents. We consider white-box and black-box settings. In the white-box setting, we assume the attacker has complete knowledge about the ranking algorithm. In the black-box setting, however, the attacker has no knowledge of the recruitment process but has limited access to the recruitment algorithm and can test some resumes against it.
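To make the three-step pipeline of Figure 1 concrete, the following is a minimal Python sketch (illustrative only, not the authors' code); it assumes the publicly available USE model on TensorFlow Hub, whose exact variant and version the paper does not specify.

# Minimal sketch of the ranking algorithm in Figure 1 (illustrative).
import numpy as np
import tensorflow_hub as hub

use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")  # assumed USE variant

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(a, b) = (a . b) / (||a|| * ||b||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_resumes(job_description: str, resumes: list[str]) -> list[tuple[int, float]]:
    # Step 1: vectorize the job description and all resumes with the text embedding.
    vectors = use([job_description] + resumes).numpy()
    job_vec, resume_vecs = vectors[0], vectors[1:]
    # Step 2: compute the cosine similarity between each resume and the job description.
    scores = [cosine_similarity(v, job_vec) for v in resume_vecs]
    # Step 3: sort resumes by similarity score, highest first.
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)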
3 Background

USE Text Embedding. This embedding has been used to solve tasks such as semantic search, text classification, and question answering (Rossiello et al., 2017; Ng and Abrecht, 2015; Bordes et al., 2014; Zhou et al., 2015; Esposito et al., 2020). USE uses an encoder to convert a given text to a fixed-length 512-dimensional vector. It has been shown that, after embedding, sentences with closer meanings have higher cosine similarity (Tosik et al., 2018). We used the USE pretrained model, which is trained on the STS (Semantic Textual Similarity) benchmark (Agirre).

TF-IDF. TF-IDF, or term frequency-inverse document frequency, is a widely used approach in information retrieval and text mining. TF-IDF is computed as the multiplication of TF, the frequency of a word in a document, and IDF, the inverse of the document frequency.
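As a reference point, one common log-scaled formulation of this weight (the paper does not state which exact TF-IDF variant it uses) is

\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \log \frac{N}{|\{d' \in D : t \in d'\}|}

where \mathrm{tf}(t, d) is the number of occurrences of term t in document d, D is the document collection, and N = |D|.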
4 Data Collection

We collected 100 real, public applicant resumes from LinkedIn public job-seeker profiles, GitHub, and personal websites. To give applicants an equal chance and to make our experiments closer to real-world recruitment procedures, we only considered resumes related to computer science. The resumes were chosen to cover different levels of education (bachelor, master, and Ph.D., with equal distribution), skills, experience (entry level, mid level, senior level), and gender (around fifty percent men and fifty percent women).

We also developed a web scraper in Python to extract computer science jobs from the Indeed website (https://www.indeed.com/). Our dataset includes over 10,000 job descriptions, extracted randomly from cities across the USA. We randomly chose 50 job descriptions and used them in our experiments.

For the black-box settings, our neural network architecture needs a large amount of training data, so we augmented our data. For the simple-setting model, we created 5,000 records by concatenating the 100 resumes with the 50 selected job descriptions, to augment our data for recruitment algorithms that are not totally dependent on the job description. For a more complex setting, we split each resume in half and joined the upper part of each resume to the lower parts of the other resumes. With this approach we could maintain the structure of each resume, and each resume could contain common resume information such as skills, education, work experience, etc. We did this procedure for all possible combinations and created a database of 10,000 resumes.

5 White-Box Setting and a Recruitment Algorithm that Employs USE

In the white-box setting, we assume the adversary has knowledge about the recruitment process and about the use of Universal Sentence Encoder (USE) or term frequency-inverse document frequency (TF-IDF) for obtaining the embedding vectors. We propose a novel approach which utilizes knowledge about the text embedding space, here USE, to identify keywords in the job description that can be added to the adversarial resume to increase its similarity score with the job description. As shown in Figure 2, our approach consists of three phases: in Phase 1, the adversary identifies the important words in the job description; in Phase 2, the adversary ranks the effectiveness of those words for their own resume, i.e., identifies the words that can increase the similarity between their resume and the job description; and in Phase 3, after identifying the most effective words, the adversary modifies their resume based on their expertise and skill set and decides to add N of the identified words and phrases. Note that in this attack scenario the adversary does not need any knowledge about the other candidates' resumes. The details of Phases 1 and 2 of this attack are given in Algorithm 1 and Algorithm 2, respectively, and we explain them in the following.

Figure 2: White-box setting (Phase 1: finding the best job description keywords; Phase 2: sorting keywords based on the resume; Phase 3: adding N keywords to the adversarial resume).

5.1 Phase One: Identifying the Influential Keywords in the Job Description

In this phase, the adversary focuses on identifying the most important keywords in the job description. Because the recruitment algorithm uses the cosine similarity between the vectors of resumes and job descriptions (depicted in Figure 1), the highest similarity is achieved if the resume is identical to the job description. We propose to remove words/phrases from the job description and then examine the impact on the similarity score.
A substantial decrease in the similarity score when a specific word/phrase is removed demonstrates the importance of that keyword in the embedding space corresponding to this job description. In the white-box setting, the adversary is aware of the details of the algorithm; therefore, they employ the USE text embedding to obtain the vectors and cosine similarity to calculate the similarity scores.

In addition, to examine the importance of phrases instead of individual words, the adversary can remove phrases of one word (unigram), two words (bigram), three words (trigram), etc., and then compute the similarity score. Algorithm 1 shows the details of Phase 1, which consists of the following steps:

(1) Text pre-processing: The keywords of the job description are obtained as a bag of words, and stop words are removed to lower the dimensionality of the space.

(2) USE embedding: The embedding vector for the original job description is obtained using the Universal Sentence Encoder (USE).

(3) Token removal: This step measures the importance of each word in the job description. A single token is deleted from the job description to create a new job description, and the embedding vector for the new job description is obtained by passing it to USE.

(4) Scoring keywords: Cosine similarity is calculated between the two vectors USE_newJob and USE_OriginalJob. A lower cosine similarity indicates that the deleted token caused a larger change in the job description content, and therefore that it might be an important keyword.

(5) Repetition: Steps three and four are repeated for all tokens in the job description. This procedure produces a dictionary, where the keys are tokens and the values are their corresponding cosine similarity scores.

(6) Sorting keywords: Finally, the extracted keywords are sorted by similarity score in ascending order.

N-gram phrases: Moreover, we extended Algorithm 1 so that it can also identify influential phrases, i.e., bigrams and trigrams. In that case, in each repetition, instead of one individual word, 2 or 3 neighboring words are removed from the job description before the similarity score is computed.

Algorithm 1 Job Description Keywords Extraction
Input: Job description document
Output: A list of keywords in ascending order of their similarity score
1: procedure PHASE1(OriginalJob)
2:     OriginalJob ← FilterStopWords(OriginalJob)
3:     Tokens ← Tokenize(OriginalJob)
4:     USE_Job ← USE(OriginalJob)
5:     Token_Sim ← {}
6:     for Token_i ∈ Tokens do
7:         newJob ← DeleteToken(OriginalJob, Token_i)
8:         USE_newJob ← USE(newJob)
9:         Token_Sim ← Token_Sim + {Token_i : CosSim(USE_Job, USE_newJob)}
10:    end for
11:    return AscendingSortByValue(Token_Sim)
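A minimal Python sketch of Algorithm 1 is given below (illustrative only, not the authors' code); it assumes the `use` model handle and `cosine_similarity` helper from the ranking sketch in Section 2, whitespace tokenization, and NLTK's English stop-word list, none of which are specified in the paper.

# Illustrative sketch of Phase 1 (Algorithm 1): leave-one-token-out keyword scoring.
from nltk.corpus import stopwords  # assumed stop-word source; run nltk.download("stopwords") once

def phase1_keywords(job_description: str) -> list[tuple[str, float]]:
    stop = set(stopwords.words("english"))
    tokens = [t for t in job_description.split() if t.lower() not in stop]
    filtered_job = " ".join(tokens)
    job_vec = use([filtered_job]).numpy()[0]
    token_sim = {}
    for i, token in enumerate(tokens):
        # Delete one token and re-embed the shortened job description.
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        reduced_vec = use([reduced]).numpy()[0]
        token_sim[token] = cosine_similarity(reduced_vec, job_vec)
    # Ascending order: the lower the score, the more the removed token mattered.
    return sorted(token_sim.items(), key=lambda kv: kv[1])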
5.2 Phase Two: Re-sorting the Influential Keywords Based on a Specific Resume

The previous phase identifies the words and phrases in the job description that, in the USE embedding space, have a higher impact on the similarity score. However, each resume is unique, and adding the most influential words obtained from the job description might not have the same impact on every resume. In Phase 2, we try to identify the best words and phrases to boost the similarity score between a specific resume and the job description. Algorithm 2 shows the details of Phase 2, which consists of the following steps:

(1) Adding keywords: A keyword from the list of fifty keywords obtained in Phase 1 is added to the adversarial resume.

(2) Obtaining the embedding vector: The embedding vector for the adversarial resume is obtained using USE.

(3) Calculating the similarity: The cosine similarity between the adversarial resume and the job description is computed. A higher cosine similarity indicates that the added keyword has a larger impact on the similarity between this specific resume and the job description.

(4) Repetition: These steps are repeated for all fifty keywords. This procedure produces a dictionary, where the keys are the fifty keywords and the values are their corresponding cosine similarity scores. The list of keywords is then sorted by cosine similarity score.

N-gram phrases: Algorithm 2 is also extended to produce sorted lists of bigrams or trigrams to be added to the adversarial resume.
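Before presenting Algorithm 2, a brief Python sketch of this phase is shown (illustrative; it reuses the `use` model and `cosine_similarity` helper from the earlier sketches, and simply appends each candidate keyword to the end of the resume text, which is an assumption about how keywords are inserted).

# Illustrative sketch of Phase 2 (Algorithm 2): re-scoring Phase-1 keywords for one resume.
def phase2_resort(job_description: str, resume: str, ordered_tokens: list[str]) -> list[tuple[str, float]]:
    job_vec = use([job_description]).numpy()[0]
    keyword_sim = {}
    for keyword in ordered_tokens:
        # Append the candidate keyword to the resume and re-embed the adversarial resume.
        adversarial_resume = resume + " " + keyword
        adv_vec = use([adversarial_resume]).numpy()[0]
        keyword_sim[keyword] = cosine_similarity(adv_vec, job_vec)
    # Keywords that raise the resume/job-description similarity the most come first.
    return sorted(keyword_sim.items(), key=lambda kv: kv[1], reverse=True)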
Algorithm 2 Re-sorting the extracted job description keywords based on a resume
Input: Job description document, resume document, a list of tokens
Output: An ordered list of keywords
1: procedure PHASE2(JobDescription, Resume, OrderedTokens)
2:     USE_JobDescription ← USE(JobDescription)
3:     Keyword_Sim ← {}
4:     for keyword_i ∈ OrderedTokens do
5:         adversarialResume ← AddToken(Resume, keyword_i)
6:         USE_advResume ← USE(adversarialResume)
7:         Keyword_Sim ← Keyword_Sim + {keyword_i : CosSim(USE_advResume, USE_JobDescription)}
8:     end for
9:     return SortByValue(Keyword_Sim)

5.3 Experimental Setup

We implemented the rank attack in the white-box setting and tested all combinations of the 100 resumes and 50 job descriptions. The attack is shown in Algorithm 3. For each job description, we first obtained the ranking of the original resumes. Then, in each experiment, we assumed that one resume is adversarial and added the best keywords for that specific resume to it. To investigate the impact of the number of keywords on the ranking of the resume, we tested with n ∈ {1, 2, 5, 10, 20, 50} keywords. We also repeated these experiments for bigram and trigram phrases.

Algorithm 3 White-Box Adversarial Attack
Input: Job description documents, resume documents
Output: Rank changes
1: procedure WHITEBOX(Jobs, Resumes)
2:     for job_i ∈ Jobs do
3:         Tokens ← Phase1(job_i)
4:         for resume_j ∈ Resumes do
5:             Rank ← GetRanking(job_i, resume_j, Resumes)
6:             words ← Phase2(job_i, resume_j, Tokens)
7:             for n ∈ {1, 2, 5, 10, 20, 50} do
8:                 AdvResume ← AddWords(resume_j, n, words)
9:                 Resumes2 ← Resumes - resume_j
10:                Resumes2 ← Resumes2 + AdvResume
11:                Rank_new ← Ranking(job_i, Resumes2)
12:                RankChange[job_i, resume_j, n] ← Rank_new - Rank
13:            end for
14:        end for
15:    end for
16:    return RankChange

5.4 Experimental Results

Figure 3a shows the histogram of average rank improvements for the 100 resumes and 50 random job descriptions. All resumes had a rank improvement, and for most of them the improvement is significant. For example, on average an adversarial resume moved up 16 ranking positions, about 6 resumes moved up 28 ranking positions on average, and more than 65 resumes moved up more than 10 ranking positions.

Figure 3b shows the average rank improvement as a function of the number of words or phrases (bigrams and trigrams) added to the adversarial resume. We see a similar trend for unigrams, bigrams, and trigrams: adding more words and phrases increases the average rank improvement. For example, while adding 2 bigrams improves the ranking of the adversarial resume on average by 6, adding 20 bigrams improves it by 30, among 100 resumes. Interestingly, however, adding too many words/phrases might not have the same effect; e.g., adding 50 bigrams yields a rank improvement of only about 28. This suggests there might be an optimal number of words/phrases that helps the ranking of a document.

Comparing the addition of unigrams, bigrams, and trigrams, we see that trigrams provide better rank improvement. For example, adding 10 unigrams, bigrams, and trigrams yields rank improvements of 12, 21 and 25, respectively. This finding can be explained by USE being a context-aware embedding approach, which takes into account the order of words in addition to their meaning.

6 White-Box Setting and a Recruitment Algorithm that Employs TF-IDF

We also investigate the effectiveness of the attacks in the white-box setting when the recruitment algorithm uses TF-IDF vectors to compute the similarity between resumes and job descriptions. The attack is the same; the only difference is in the recruitment algorithm. Since the adversary knows that the recruitment algorithm uses TF-IDF vectors, they compute TF-IDF vectors instead of USE vectors.

6.1 Experimental Setup

For these experiments, we implemented the recruitment algorithm that ranks resumes based on the similarity scores calculated between the TF-IDF vectors of the resumes and the job description.
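A minimal sketch of this TF-IDF-based ranking step is shown below (illustrative; it uses scikit-learn's TfidfVectorizer with default settings, which the paper does not specify).

# Illustrative sketch of the TF-IDF variant of the recruitment ranking algorithm.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity as sk_cosine

def rank_resumes_tfidf(job_description: str, resumes: list[str]) -> list[tuple[int, float]]:
    # Fit TF-IDF on the job description plus all resumes so they share one vocabulary.
    matrix = TfidfVectorizer().fit_transform([job_description] + resumes)
    job_vec, resume_vecs = matrix[0], matrix[1:]
    scores = sk_cosine(resume_vecs, job_vec).ravel()
    # Sort resumes by similarity to the job description, highest first.
    return sorted(enumerate(scores), key=lambda x: x[1], reverse=True)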
[Figure 3 shows two panels: (a) a histogram of the average rank improvement across resumes, and (b) the average rank improvement versus the number of added keywords (1, 2, 5, 10, 20, 50) for unigrams, bigrams, and trigrams.]

Figure 3: Average rank improvement for 100 resumes and 50 job descriptions, in the white-box setting when the recruitment algorithm employs USE embeddings.

We implemented the rank attack and tested it on all combinations of the 100 resumes and 50 job descriptions. For each job description, we first obtained the ranking of the original resumes, and then in each experiment we assumed that one resume is adversarial and added the best keywords for that specific resume to it. To investigate the impact of the number of keywords on the ranking of the resume, we tested with n ∈ {1, 2, 5, 10, 20, 50} keywords. We also repeated these experiments for bigram and trigram phrases.

6.2 Experimental Results

Figure 4a shows the histogram of average rank improvements for the 100 resumes and 50 random job descriptions. Most resumes had a rank improvement, and for most of them the improvement was significant. For example, on average an adversarial resume moved up about 25 ranking positions, and more than 85 resumes moved up more than 10 positions.

In Figure 4b, we investigated the effect of adding bigram and trigram bags of words on the average rank improvement. Interestingly, in contrast to our results for recruitment algorithms that employ USE embeddings, in these experiments we achieved better results for unigrams than for bigrams and trigrams. Note that the TF-IDF approach is only based on word similarity and does not consider context, which might explain these results. No matter which n-gram is used, increasing the number of keywords improves the ranks. Comparing the results in Figure 3b and Figure 4b also shows that it is easier for the adversary to attack a recruitment algorithm that employs TF-IDF than one that employs USE, as the rank improvement is larger for attacks against the recruitment algorithm that employs TF-IDF, especially in the case of unigrams. For example, by adding 20 keywords to attack the recruitment algorithm that employs USE, we observe average rank improvements of 18, 29, and 33 for unigrams, bigrams and trigrams, respectively, whereas by adding 10 keywords to attack the recruitment algorithm that employs TF-IDF, the improvements are 48, 32, and 37 for unigrams, bigrams and trigrams, respectively.

7 Black-Box Setting

In the black-box setting, an adversary does not have access to the model specification but has access to the recruitment algorithm as an oracle, which receives an input, e.g., here a resume, and then generates a response, e.g., here accept/reject, or the ranking among all the resumes in its pool. We assume the resume dataset is static, i.e., the pool of resumes does not change during the attack, and the adversary only modifies their own input until it gets accepted or obtains a better ranking.

We develop a neural network model to predict keywords that improve the ranking of a target resume for a specific job. The input of our neural network model is the resumes, and the output is a vector that indicates the best keywords. In our experiments, we examine attacks against two recruitment algorithms: a binary classifier, which labels a resume as accept or reject, and a ranking algorithm that ranks resumes based on similarity scores between the job description and the resumes, where both are vectorized by USE embeddings. The attacks against both algorithms have two phases: Phase 1, pre-processing, which prepares the data for the neural network, and Phase 2, the neural network model.
[Figure 4 shows two panels: (a) a histogram of the average rank improvement across resumes, and (b) the average rank improvement versus the number of added keywords (1, 2, 5, 10, 20, 50) for unigrams, bigrams, and trigrams.]

Figure 4: Average rank improvement in the white-box setting when the recruitment algorithm employs TF-IDF vectors.

Phase 1: Pre-processing. To provide an acceptable format for the neural network input, we applied one-hot encoding (Harris and Harris, 2012), following these steps:

(1) Tokenization: The job description and the resumes are tokenized, and a dictionary of words is created, where the key is the token/word and the value is the frequency of that word in these documents.

(2) Vectorization of resumes: Tokens/words are encoded as a one-hot numeric array, where a binary column is created for each word. As a result, a sparse matrix is returned, which represents the resumes in rows and the words in columns. If a resume includes some word, the entry for that row and column is 1; otherwise it is 0.

Neural Network Architecture. Neural networks have been applied to many real-world problems (Nielsen, 2015; Goodfellow et al., 2016; Hassanzadeh et al., 2020; Ghazvinian et al., 2021; Krizhevsky et al., 2012). We propose a deep neural network architecture which consists of an input layer, three dense hidden layers, and an output layer representing the labels. In the output vector, ones indicate the indices of words in the dictionary whose addition to a resume will increase the rank of that resume. For the first two hidden layers we used the rectified linear unit (ReLU) (Nair and Hinton, 2010) as the activation function; ReLUs provide faster training and better convergence relative to other activation functions (Nair and Hinton, 2010). For the output layer we used the sigmoid activation function to map the outputs to the range [0, 1], i.e., actual probability values (Han and Moraga, 1995). As our problem is a multilabel problem, we applied the binary cross-entropy loss. In contrast to the softmax cross-entropy loss, binary cross-entropy is independent in terms of class, i.e., the loss measured for one class is not affected by the other classes.

For the optimization of the loss function (training the neural network), we used a stochastic gradient descent-based optimization algorithm (Adam; Kingma and Ba, 2014).

For regularization, to avoid overfitting (Goodfellow et al., 2016), we tested dropout with rates from 0 to 0.5. Dropout was applied to all hidden layers, and our assessment showed that a dropout rate of 0.1 yielded the best results.
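The pre-processing and the network described above can be sketched as follows (illustrative only; the whitespace tokenization, the vocabulary handling, the hidden-layer widths, and the activation of the third hidden layer are assumptions, since the paper specifies the one-hot encoding, the layer types, activations, loss, optimizer, and dropout rate but not these details).

# Illustrative sketch of the black-box pre-processing and keyword-prediction model.
from collections import Counter
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_vocabulary(documents, top_k=20):
    # Word -> frequency dictionary over all resumes and job descriptions; keep the top_k words.
    counts = Counter(w for doc in documents for w in doc.lower().split())
    return [w for w, _ in counts.most_common(top_k)]

def one_hot(documents, vocabulary):
    # Rows are documents, columns are words; 1 if the word occurs in the document.
    index = {w: i for i, w in enumerate(vocabulary)}
    matrix = np.zeros((len(documents), len(vocabulary)), dtype=np.int8)
    for row, doc in enumerate(documents):
        for w in set(doc.lower().split()):
            if w in index:
                matrix[row, index[w]] = 1
    return matrix

def build_keyword_model(input_dim, vocab_size):
    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),                 # one-hot resume + job description vector
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.1),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.1),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.1),
        layers.Dense(vocab_size, activation="sigmoid"),  # one probability per candidate keyword
    ])
    # Multilabel objective: binary cross-entropy, optimized with Adam.
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[keras.metrics.Precision(), keras.metrics.Recall()])
    return model

With the setup reported below for the binary-classifier experiments, training would correspond to something like model.fit(Xtrain, Ytrain, validation_split=0.3, batch_size=50, epochs=100).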
7.1 Experiments for the Binary Classifier

This is a simpler model, where the recruitment process is defined as a binary classification algorithm and a resume is accepted based on some rules. In our experiments, we defined simple rules, e.g., whether Python appears as a skill in the resume.

After tokenization of the resumes and the job description, instead of generating the one-hot encoding for all of the words obtained, we chose the 20 most frequent words across all resumes and job descriptions, in order to test with a low-dimensional vector. We then concatenated the vectors of the resume and the job description.

Creating the ground-truth dataset. We employ two steps: (1) Xtrain: 5,000 records of 40-dimensional vectors, where each vector is a resume concatenated with a job description and encoded in one-hot format (Section 4 explains how these resumes were obtained); and (2) Ytrain: 5,000 records of 20-dimensional vectors, where setting the value of a word index to one means that adding this word to the resume makes the resume be accepted.

Model Training. To train our neural network model, we split our data into training and test sets (70% training, 30% test). To evaluate our training results, we used the validation set approach (we allocated 30% of the training set for validation), training on 2,450 samples and validating on 1,050 samples. We also set the batch size to 50 and the number of epochs to 100.

Our results show that the model is trained well through the epochs. After 50 epochs, the recall, precision and F1-score of this model over the validation set reached their maximum, which are 0.6, 0.7 and 0.82, respectively.

We also examined the performance of the trained neural network model on the test data. For that, we added the predicted words to each related resume, submitted each resume to the recruitment algorithm, and obtained the response from the recruitment oracle in the form of a binary value. Figure 6 shows that the success rate of getting accepted increases significantly when using the suggested keywords: while without adding the keywords the acceptance rate was 20%, after the attack the acceptance rate increased to about 100%.

[...] the output label is also encoded as a one-hot vector.

Model Training. To train our neural network model, we split our data into training and test sets (70% training, 30% test).

Our results showed that the model is trained well through the epochs. After 10 epochs, the recall, precision and F1-score of this model over the validation set reached their maximum values, i.e., 0.62, 0.68 and 0.75, respectively. To test the performance of our trained neural network model, we added the 50 predicted words to each related resume, submitted each resume to the recruitment algorithm, and obtained the response from the recruitment oracle in the form of a ranking score. In Figure 6, we see that most of the resumes have a significant rank improvement. For example, more than 200 resumes had a rank improvement of more than 400, while more than 400 resumes had a rank improvement between 150 and 200 after adding the suggested keywords.
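The evaluation loop described above, in which the model's suggested keywords are added to a resume and the resume is re-submitted to the recruitment oracle, can be sketched as follows (illustrative; the top-k selection and the `query_oracle` interface are assumptions).

# Illustrative sketch of evaluating the black-box attack against the recruitment oracle.
import numpy as np

def attack_resume(model, resume_vector, resume_text, vocabulary, top_k=50):
    # Score every candidate keyword for this resume and keep the top-k suggestions.
    scores = model.predict(resume_vector[np.newaxis, :])[0]
    best = np.argsort(scores)[::-1][:top_k]
    suggested = [vocabulary[i] for i in best]
    # The adversary appends the suggested keywords and re-submits the modified resume.
    return resume_text + "\n" + " ".join(suggested)

# query_oracle is hypothetical: it stands for whatever accept/reject or ranking response
# the attacker can observe from the recruitment system.
# adversarial = attack_resume(model, x_test[0], resumes[0], vocabulary)
# response = query_oracle(adversarial)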

[Figures: acceptance vs. reject counts for the binary classifier before (OLDrank) and after (NEWrank) the attack, and a histogram of the number of resumes by rank improvement for the ranking algorithm.]
black-box attacks.
   A recent work (Cheng et al., 2020) attacked text summarization models that are based on seq2seq architectures. In our paper, however, we propose crafting adversarial text for ranking algorithms.
   Attacks against resume search. A recent work (Schuster et al., 2020) showed that applications relying on word embeddings are vulnerable to poisoning attacks, where an attacker modifies the corpus that the embedding is trained on and thereby changes the meaning of new or existing words by shifting their locations in the embedding space. Our attack, however, does not poison the corpus. In addition, we focus on the recruitment process, which employs cosine similarity for ranking applicants' resumes against a job description.

9    Conclusion

In this project, we found that an automatic recruitment algorithm, as an example of a ranking algorithm, is vulnerable to adversarial examples. We proposed a successful adversarial attack in two settings: white-box and black-box. We proposed a new approach for keyword extraction based on USE. We observed that the majority of resumes gain significant rank improvements when more influential keywords are added. Finally, in the black-box setting, we proposed a multilabel neural network architecture to predict the proper keywords for each resume and job description.
References

Eneko Agirre. STSbenchmark. https://ixa2.si.ehu.eus/stswiki/index.php/STSbenchmark.

Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. 2018. Generating natural language adversarial examples. arXiv preprint arXiv:1804.07998.

N Bika. Recruiting costs FAQ: Budget and cost per hire. https://resources.workable.com/tutorial/faq-recruitment-budget-metrics.

Antoine Bordes, Jason Weston, and Nicolas Usunier. 2014. Open question answering with weakly supervised embedding models. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 165–180. Springer.

Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, et al. 2018. Universal sentence encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 169–174.

Minhao Cheng, Jinfeng Yi, Pin-Yu Chen, Huan Zhang, and Cho-Jui Hsieh. 2020. Seq2Sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. In AAAI, pages 3601–3608.

Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018. HotFlip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36.

Massimo Esposito, Emanuele Damiano, Aniello Minutolo, Giuseppe De Pietro, and Hamido Fujita. 2020. Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering. Information Sciences, 514:88–105.

Excellarate. Resume ranking using machine learning. https://medium.com/@Excellarate/resume-ranking-using-machine-learning-implementation-47959a4e5d8e.

Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. 2018. Black-box generation of adversarial text sequences to evade deep learning classifiers. In 2018 IEEE Security and Privacy Workshops (SPW), pages 50–56. IEEE.

Mohammadvaghef Ghazvinian, Yu Zhang, Dong-Jun Seo, Minxue He, and Nelun Fernando. 2021. A novel hybrid artificial neural network - parametric scheme for postprocessing medium-range precipitation forecasts. Advances in Water Resources, 151:103907.

Zhitao Gong, Wenlu Wang, Bo Li, Dawn Song, and Wei-Shinn Ku. 2018. Adversarial texts with gradient methods. arXiv preprint arXiv:1801.07175.

Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio. 2016. Deep Learning, volume 1. MIT Press, Cambridge.

Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. 2017. Adversarial examples for malware detection. In European Symposium on Research in Computer Security, pages 62–79. Springer.

Jun Han and Claudio Moraga. 1995. The influence of the sigmoid function parameters on the speed of backpropagation learning. In International Workshop on Artificial Neural Networks, pages 195–201. Springer.

David Harris and Sarah Harris. 2012. Digital Design and Computer Architecture (2nd ed.). Morgan Kaufmann.

Yousef Hassanzadeh, Mohammadvaghef Ghazvinian, Amin Abdi, Saman Baharvand, and Ali Jozaghi. 2020. Prediction of short and long-term droughts using artificial neural networks and hydro-meteorological variables.

Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2021–2031.

Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25:1097–1105.

Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, and Ting Wang. 2018. TextBugger: Generating adversarial text against real-world applications. arXiv preprint arXiv:1812.05271.

Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi. 2017. Deep text classification can be fooled. arXiv preprint arXiv:1704.08006.

Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In ICML.

Jun-Ping Ng and Viktoria Abrecht. 2015. Better summarization evaluation with word embeddings for ROUGE. arXiv preprint arXiv:1508.06034.

Michael A Nielsen. 2015. Neural Networks and Deep Learning, volume 25. Determination Press, San Francisco, CA.

Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang. 2016. Crafting adversarial input sequences for recurrent neural networks. In MILCOM 2016 - 2016 IEEE Military Communications Conference, pages 49–54. IEEE.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392.

Gaetano Rossiello, Pierpaolo Basile, and Giovanni Semeraro. 2017. Centroid-based text summarization through compositionality of word embeddings. In Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres, pages 12–21.

Pradeep Kumar Roy, Sarabjeet Singh Chowdhary, and Rocky Bhatia. 2020. A machine learning approach for automation of resume recommendation system. Procedia Computer Science, 167:2318–2327.

Roei Schuster, Tal Schuster, Yoav Meri, and Vitaly Shmatikov. 2020. Humpty Dumpty: Controlling word meanings via corpus poisoning. In 2020 IEEE Symposium on Security and Privacy (SP), pages 1295–1313. IEEE.

D Sumathi and SS Manivannan. 2020. Machine learning-based algorithm for channel selection utilizing preemptive resume priority in cognitive radio networks validated by ns-2. Circuits, Systems, and Signal Processing, 39(2):1038–1058.

Mengying Sun, Fengyi Tang, Jinfeng Yi, Fei Wang, and Jiayu Zhou. 2018. Identify susceptible locations in medical records via adversarial attacks on deep predictive models. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 793–801.

Glassdoor Team. Glassdoor. https://www.glassdoor.com/employers/blog/50-hr-recruiting-stats-make-think.

Indeed Editorial Team. 2021. How to tailor your resume to a job description. https://www.indeed.com/career-advice/resumes-cover-letters/tailoring-resume.

Melanie Tosik, Antonio Mallia, and Kedar Gangopadhyay. 2018. Debunking fake news one feature at a time. CoRR, abs/1808.02831.

Chenguang Wang, Mu Li, and Alexander J. Smola. 2019a. Language models with transformers.

Xiaosen Wang, Hao Jin, and Kun He. 2019b. Natural language adversarial attacks and defenses in word level. arXiv preprint arXiv:1909.06723.

Guangyou Zhou, Tingting He, Jun Zhao, and Po Hu. 2015. Learning continuous word embedding with metadata for question retrieval in community question answering. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 250–259.