Ideas for research from social bots to beer - Thesis proposals - Florian Daniel

Thesis proposals
Ideas for research from social bots to beer

Florian Daniel
florian.daniel@polimi.it   February 25, 2019
Detection of harmful social bots

Identification of harmful communication patterns

Conversational screen readers

Domain-specific content extraction
(Social) Bot = algorithmically driven entity that behaves like a human
in online communications
Empirical study shows: bots may cause harm to humans

[Figure: what bots do (actions), example incidents, and the abuse that results]

What bots do (Action)
  Chat         Talk with user
               Redirect user
  Post         Write post
               Comment post
               Forward post
  Endorse      Like message
               Follow user
  Participate  Create user
  What else can they do?

Examples
  eCommerce, CustomerSvc, AASlang, BoostJuice, SMSsex, Oreo, MSTay, Geico,
  DeathThreat, SethRich, Dating, Puma, Ivana, Spam, PolarBot, WiseShibe,
  Instagress, Trump, Colluding Bots, InstaClone, Jason Slotkin

When abuse happens (Abuse), listed from "Regulated by law" to "Not regulated"
  Disclose sensitive facts
  Denigrate
  Be grossly offensive
  Be indecent or obscene
  Be threatening
  Make false allegations
  Deceive
  Spam
  Spread misinformation
  Mimic interest
  Clone profile
  Invade space
  What else can go wrong?

F. Daniel, C. Cappiello, B. Benatallah. Bots Acting Like Humans: Understanding and Preventing Harm. IEEE Internet Computing,
2019, accepted for publication. https://ieeexplore.ieee.org/document/8611348
Detection of harmful social bots

  Input: Account, Tweets
  Output classes: Harm1, Harm2, ..., HarmN, Human

  How do we identify and classify bots according to the harm they may cause?
What has been done so far?
Not Safe For Work

  Profile picture stolen from existing people (usually women)
  Personalized profile preferences aimed at emulating actual users
  Catchy profile description paired with unsafe URL
  Tweets aim to redirect users to untrusted URLs


Lorenzo Cannone, Matteo Di Pierro. Detection and Classification of Harmful Bots in Human-Bot Interactions on Twitter.
MSc Thesis, Politecnico di Milano, December 2018.
News-Spreader

  Propagandistic profile picture and tile
  Propaganda hashtags and messages in description
  High number of interactions
  Lots of (political) news retweeting

Lorenzo Cannone, Matteo Di Pierro. Detection and Classification of Harmful Bots in Human-Bot Interactions on Twitter.
MSc Thesis, Politecnico di Milano, December 2018.
Spam-Bot

  Profile picture with corporate logo
  Profile tile with catchy messages
  Corporate link in description paired with messages of job offers (product sales)
  High number of tweets with high frequency
  Repetitive tweets aimed at spamming URLs

Lorenzo Cannone, Matteo Di Pierro. Detection and Classification of Harmful Bots in Human-Bot Interactions on Twitter.
MSc Thesis, Politecnico di Milano, December 2018.
Fake-Follower

  Optional profile picture (usually missing)
  Poor profile customization
  Poor description (usually empty)
  Low number of content posting (usually null)
  Following as main interaction
  Optional (re)tweeting activity

Lorenzo Cannone, Matteo Di Pierro. Detection and Classification of Harmful Bots in Human-Bot Interactions on Twitter.
MSc Thesis, Politecnico di Milano, December 2018.
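
The four profile sketches above can be read as rough feature signatures. The following is a minimal illustrative sketch of that reading, not the classifier from the cited thesis: the `Account` fields, every threshold, and the `label_account` helper are assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical per-account features; field names are illustrative."""
    has_profile_picture: bool
    description: str
    tweet_count: int
    following_count: int
    retweet_ratio: float    # share of tweets that are retweets, 0..1
    url_ratio: float        # share of tweets containing a URL, 0..1
    duplicate_ratio: float  # share of tweets repeating earlier text, 0..1


def label_account(a: Account) -> str:
    """Map an account to one of the four bot types using toy thresholds."""
    # Fake-Follower: empty description, hardly any posts, following dominates.
    if not a.description and a.tweet_count < 10 and a.following_count > 100:
        return "Fake-Follower"
    # Spam-Bot: repetitive tweets that mostly push URLs.
    if a.duplicate_ratio > 0.5 and a.url_ratio > 0.5:
        return "Spam-Bot"
    # News-Spreader: activity dominated by retweeting.
    if a.retweet_ratio > 0.8:
        return "News-Spreader"
    # NSFW: tweets redirect to URLs, without the repetition of a spam bot.
    if a.url_ratio > 0.5:
        return "NSFW"
    return "Unlabeled"


follower = Account(False, "", 0, 500, 0.0, 0.0, 0.0)
spammer = Account(True, "Jobs! http://example.com", 5000, 10, 0.1, 0.9, 0.8)
print(label_account(follower), label_account(spammer))
```

Hand-written rules like these are brittle; they mainly serve to make the candidate features explicit before a learned classifier replaces the thresholds.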
Thesis ingredients

  Creation of datasets
  Multi-class ensemble classifier
  Feature selection and engineering
  Web app (thesis Figure 7.3: Client - Server architecture)

Can we add more types of harm?
How to train classes as users use BotBuster?

[Thesis excerpt, truncated in the source: "Inception neural network. Users with less than 10 tweets, or less than 10"]
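
The "multi-class ensemble classifier" ingredient can be illustrated with hard (majority) voting. This is only a sketch of the general technique: the slides do not say which ensemble method the thesis uses, and the three base classifiers, their feature names, and their thresholds are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Features = Dict[str, float]
Classifier = Callable[[Features], str]


def majority_vote(classifiers: List[Classifier], x: Features) -> str:
    """Return the label predicted by most base classifiers (hard voting).

    On a tie, the label that received a vote first wins, because
    Counter.most_common keeps first-encountered order among equal counts.
    """
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Three toy base classifiers, each looking at one hypothetical feature.
def by_tweet_rate(x: Features) -> str:
    return "Spam-Bot" if x["tweets_per_day"] > 100 else "Human"


def by_url_share(x: Features) -> str:
    return "Spam-Bot" if x["url_ratio"] > 0.5 else "Human"


def by_retweet_share(x: Features) -> str:
    return "News-Spreader" if x["retweet_ratio"] > 0.8 else "Human"


ensemble = [by_tweet_rate, by_url_share, by_retweet_share]
sample = {"tweets_per_day": 250.0, "url_ratio": 0.9, "retweet_ratio": 0.1}
prediction = majority_vote(ensemble, sample)
print(prediction)
```

Adding a new type of harm then amounts to adding base classifiers that can emit the new label, which is one way to approach the first open question above.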
Identification of harmful communication patterns

  Bot code repositories: GitHub, Twitter, Python

  Which potential harms can we identify inside the code of the bots?

  Harmful code patterns

  Action    Pattern
  Follow    Indiscriminate follow
            Whitelist-based follow
            Blacklist-based follow
            Phantom follow
  Like      Indiscriminate like
            Whitelist-based like
            Blacklist-based like
            Mass like
  Tweet     Fixed-content tweet
            AI-generated tweet
            Trusted source tweet
  Mention   Indiscriminate mention
            Opt-in mention
            Targeted mention
            Whitelist-based mention
            Blacklist-based mention
  Retweet   Indiscriminate retweet
            Whitelist-based retweet
            Blacklist-based retweet
            Mass retweet
  Talk to   Indiscriminate talk
            Fixed-content talk
            AI-generated talk
            Talk with opt-in
            Targeted talk
  Pause     Mimic human
            Satisfy API constraints
  Store     Store persistently

  [Table residue: in the slide, each pattern is rated against the abuse types
  Disclose sensitive facts, Denigrate, Be grossly offensive, Be indecent or
  obscene, Be threatening, Make false allegations, Deceive, Spam, Spread
  misinformation, Mimic interest, Clone profile, Invade space. Legend: Enables,
  Prevents, Vulnerable to content abuse, Vulnerable to trust abuse. The
  per-cell ratings did not survive extraction.]

  Next: patterns search engine + web site for users

  What else can we learn from the code of bots? Is it possible to trace back
  from messages to code?

Andrea Millimaggi and Florian Daniel. On Social Bots Behaving Badly: Empirical Study of Code Patterns on GitHub.
Submitted for publication to ICWE 2019, under review.
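
As a hedged illustration of where a "patterns search engine" over Python bot code could start, the sketch below walks a file's syntax tree and reports Twitter actions performed inside loops, noting whether an `if` guards them. The method names are those of Tweepy's `API` class, assumed here as a typical client library. The loop-plus-guard heuristic and the `find_actions` helper are assumptions for the example, not the method of the cited study.

```python
import ast

# Tweepy-style method names mapped to the slide's actions (assumed mapping).
ACTION_CALLS = {
    "create_friendship": "Follow",
    "create_favorite": "Like",
    "retweet": "Retweet",
    "update_status": "Tweet",
}


def find_actions(source: str):
    """Return (action, guarded, line) for each action call inside a loop.

    guarded is True when the call sits under an `if` within the loop, a hint
    at a whitelist- or blacklist-based pattern; False hints at an
    indiscriminate one.
    """
    findings = []

    def visit(node, in_loop, guarded):
        if isinstance(node, (ast.For, ast.While)):
            in_loop, guarded = True, False
        elif isinstance(node, ast.If) and in_loop:
            guarded = True
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            action = ACTION_CALLS.get(node.func.attr)
            if action and in_loop:
                findings.append((action, guarded, node.lineno))
        for child in ast.iter_child_nodes(node):
            visit(child, in_loop, guarded)

    visit(ast.parse(source), False, False)
    return findings


# The bot code is only parsed, never executed.
bot_code = """
for user in api.search_users("python"):
    api.create_friendship(user.id)

for tweet in timeline:
    if tweet.user.screen_name in whitelist:
        api.retweet(tweet.id)
"""
for action, guarded, line in find_actions(bot_code):
    print(line, action, "guarded" if guarded else "unguarded")
```

A purely syntactic check like this cannot tell a whitelist from a blacklist, or a real guard from an unrelated condition; separating those cases would need inspection of the guard expression itself.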
Conversational screen readers

  Amazon Alexa
  Google Assistant

 Can we extract conversational knowledge from webpages?
 What about “talking” to websites?
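
One simple reading of "conversational knowledge" is a map from each heading on a page to the text below it, so that an assistant could answer "tell me about X". The sketch below uses Python's standard `html.parser` module. The heading-to-text pairing, the `SectionExtractor` class, and the `answer` helper are assumptions made for illustration, not a proposed design.

```python
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Collect {heading text: paragraph text} pairs from an HTML page."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.sections = {}    # heading -> text found under it
        self._current = None  # heading the parser is currently under
        self._tag = None      # heading or paragraph tag being read

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag == "p":
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._tag is None:
            return
        if self._tag in self.HEADINGS:
            self._current = text
            self.sections[text] = ""
        elif self._current is not None:
            # Append paragraph text to the section opened by the last heading.
            joined = self.sections[self._current] + " " + text
            self.sections[self._current] = joined.strip()


def answer(sections, query):
    """Return the text of the first section whose heading mentions the query."""
    for heading, text in sections.items():
        if query.lower() in heading.lower():
            return text
    return None


page = (
    "<h1>Opening hours</h1><p>Mon to Fri, 9 to 18.</p>"
    "<h2>Contact</h2><p>Call us.</p><p>Or write.</p>"
)
parser = SectionExtractor()
parser.feed(page)
parser.close()
print(answer(parser.sections, "contact"))
```

Real pages rarely have such clean structure, which is part of what makes the research question interesting.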
Domain-specific content extraction

  How effectively can we extract recipes from free text? And how do we do it?
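
A natural baseline for this question is plain pattern matching, against which learned extractors can be compared. The sketch below pulls (quantity, unit, ingredient) triples out of a sentence with a regular expression. The unit list, the pattern, and the sample sentence are all assumptions made up for the example.

```python
import re

# A quantity, a unit from a small assumed list, then the ingredient name up to
# the next comma, full stop, "and", or end of text.
INGREDIENT = re.compile(
    r"(?P<qty>\d+(?:\.\d+)?)\s*(?P<unit>kg|g|l|ml)\s+(?:of\s+)?"
    r"(?P<name>[A-Za-z][A-Za-z -]*?)(?=,|\.|\s+and\b|$)"
)


def extract_ingredients(text: str):
    """Return (quantity, unit, name) triples found in free text."""
    return [
        (float(m.group("qty")), m.group("unit"), m.group("name").strip())
        for m in INGREDIENT.finditer(text)
    ]


text = "Mash 4.5 kg of Pilsner malt, then add 30 g Cascade hops and 11 g dry yeast."
ingredients = extract_ingredients(text)
for qty, unit, name in ingredients:
    print(qty, unit, name)
```

Such a pattern misses quantities written in words, unlisted units, and ingredients named without amounts, so its error rate gives a concrete floor for the "how effectively" part of the question.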
Typical data science steps

  Domain understanding
  Data collection
  Manual inspection of data
  Hypotheses formulation
  Feature engineering, data labeling
  Algorithm engineering (from AI/machine learning to statistics)
  Validation and hypotheses verification

  Online tool
Soft skills

Proposals

Template

Florian Daniel
http://www.floriandaniel.it