Highlights of EARL 2018 - Adnan Fiaz Julian Ferry Hannah Frick Dragoș Moldovan-Grünfeld - LondonR

Highlights of
EARL 2018
Adnan Fiaz
Julian Ferry
Hannah Frick
Dragoș Moldovan-Grünfeld
Agenda

   Facts

   Highlights

   Next
Facts
EARL London 2018
•   5th EARL London Conference
•   3 Keynote speakers
•   5 Workshops
•   3 Streams
•   56 Presentations
    – lightning talks for the first time
• 1 Panel Discussion
• 2 Evening Networking Events
The Workshops
•   R in 6 Hours
•   Shiny – Beyond the Basics
•   Deep Learning with Keras in R
•   A Crash Course in Python for R Users
•   Functional Programming with purrr
Attendees
Speakers
Reception
On the way to the IWM
Data Driven Decision-Making
Adnan Fiaz
Data Driven Decision-making
 Keynotes:
  • Winning in a data-driven world, Edwina Dunn
  • Building a Data Driven Company, Rich Pugh

 Talks:
  • Decision Lead Data Science, Steven Wilkins
  • A brief history of Data at Autotrader, Paul
    Owens
  • R – The tool for Screwfix, Gavin Jackson
Have a (data) strategy
“Focus on the data you need rather than the
data you have” – Edwina Dunn

“Know how to build the ‘engine’, now it needs
to drive the car” – Rich Pugh
Not madmen but math (wo)men
“A key differentiator for businesses…is a culture
of continuous learning” – Edwina Dunn

“The key role of data scientists in the coming
years is one of educator” – Rich Pugh
Special mention
Finding out what Parliament thinks, Sam
Tazzyman (Ministry of Justice)

• Explaining complex topics simply
• Show your code in action (and link to it)
• Why so serious?
Machine Learning
Julian Ferry
Hannah Frick
Balancing model complexity
and interpretability
 In defence of complexity:
  • The power of machine learning in segmenting
    CRM databases, Jeremy Horne
  • The making of a real-world Moneyball – finding
    undervalued players with h2o, Jo-Fai Chow

 In defence of interpretability:
  • Understanding your model, Kasia Kulma
  • Measuring Marketing Performance, Wojtek
    Kostelecki
Complex models in CRM
segmentation - Jeremy Horne
• How do we identify the customers on a
  CRM database who are most likely to
  make a purchase this month?
• Most databases are dominated by lower
  value segments
Separating low value segments
• Tools used:
  – kernlab package
  – Boosting to focus on outliers:
    outcomes that are not ‘normal’

Key takeaway:

Machine learning models can help
us differentiate between customers
within the same group, where
decision-tree type rules fail.
In defence of interpretability –
Kasia Kulma
LIME – Local Interpretable Model-Agnostic Explanations
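To make the idea concrete, here is a minimal sketch of LIME in R. It is not taken from the talk: it assumes the `lime` and `MASS` packages are installed and uses the built-in `iris` data with a linear discriminant model purely as a stand-in for whatever model was presented.

```r
library(MASS)   # lda(), a model type that lime supports out of the box
library(lime)

# Hold out one observation to explain; train on the rest
# (iris and lda are illustrative assumptions, not the speaker's setup)
iris_test  <- iris[1, 1:4]
iris_train <- iris[-1, 1:4]
iris_lab   <- iris[[5]][-1]

model <- lda(iris_train, iris_lab)

# Build an explainer from the training data and the fitted model
explainer <- lime(iris_train, model)

# Explain the held-out prediction: most likely class, two strongest features
explanations <- explain(iris_test, explainer, n_labels = 1, n_features = 2)

explanations[, c("label", "feature", "feature_weight")]
```

Each row of the result describes how one feature pushed this single prediction towards or away from the predicted class; `plot_features(explanations)` draws the same information as a bar chart.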
Predicting baseball player
performance with h2o, Jo-Fai Chow
• Problem: Finding undervalued baseball
  players in Major League Baseball (MLB)
End result – Shiny + LIME
The beauty of linear models, Wojtek
Kostelecki
• Modelling contributions to mileage
Using a linear model we can extract the individual
contribution of each variable to sales
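As a rough illustration of that point, the sketch below decomposes the predictions of a linear model into per-variable contributions using base R only. The `mtcars` data and the choice of `wt` and `hp` as predictors are assumptions made for the example; the data used in the talk is not shown in the slides.

```r
# Fit a linear model of fuel efficiency (mpg) on weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Contribution of each term to each prediction, centred so that a car with
# average wt and hp has contributions of zero
contrib  <- predict(fit, type = "terms")
baseline <- attr(contrib, "constant")   # prediction for the "average" car

# Baseline plus the contributions reproduces the fitted values
reconstructed <- baseline + rowSums(contrib)

head(round(contrib, 2), 3)
```

Because the model is additive, each column can be read on its own: a negative value in `wt` for a given car is the number of miles per gallon that car loses, relative to the baseline, because of its weight.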
David Smith – Not Hotdog
• Not Hotdog: Image recognition with R and
  the Custom Vision API
R Code:
https://github.com/revodavid/nothotdog
Lars Kjeldgaard - modelgrid
• A ‘caret’-based Framework for
   Training Multiple Tax Fraud
   Detection Models
• Framework for creating,
   managing and training
   multiple caret models
• Pipe-friendly
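library(modelgrid)
library(caret)   # GermanCredit data, train(), resamples()
library(dplyr)   # %>%, pull(), select()

data(GermanCredit)

# create model grid object with settings shared by all models
# (tr_control is assumed to be a caret trainControl() object defined earlier)
credit_default_models <-
  model_grid() %>%
  share_settings(
    y = GermanCredit %>% pull(Class),
    x = GermanCredit %>% select(-Class),
    metric = "ROC",
    trControl = tr_control
  )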
# add a random forest model
credit_default_models <-
  credit_default_models %>%
  add_model(model_name = "Funky Forest",
            method = "rf",
            tuneGrid = data.frame(mtry = c(10, 20)))

# add an eXtreme gradient boosting model
credit_default_models <-
  credit_default_models %>%
  add_model(model_name = "Big Boost",
            method = "xgbTree",
            nthread = 8)
# train models and evaluate
credit_default_models <-
  credit_default_models %>%
  train(.)

# compare resampled performance across the fitted models
credit_default_models$model_fits %>%
  resamples(.) %>%
  bwplot(.)
Reproducibility and R in Production
Dragoș Moldovan-Grünfeld
Reproducibility & R in Production
• Keynote:
  – RMarkdown: The Bigger Picture, Garrett
    Grolemund, RStudio
• Talks:
  – Beyond Prototypes. A Journey to The
    Production Land, Omayma Said, Freelance
  – Bridging the gap between Data Scientists and
    Engineers; using R in production, Leanne
    Fitzpatrick, HelloSoda
Garrett Grolemund (RStudio)
• Reproducibility crisis:
  – “We created a cargo cult by
    confusing math with science.
    Now we must undo it.”

  – “Create maps, not proofs”

  – “Reproducibility is an
    opportunity”
Leanne Fitzpatrick (HelloSoda)
• “Bridging the gap between
  Data Scientists and Engineers;
  using R in production”
• Barriers to entry (R in
  production)
  – Engineering
  – Infrastructure
  – Data science
  – Cultural
Overcoming barriers
• Deployment:
  – central to the data science process
  – Solution: Docker
• Plumbing/integration
  – Solution: code as a service with Plumber
• Package and dependency management
  – Solution: pacman
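One way "code as a service" might look with the plumber package is sketched below. This is an assumption-laden illustration rather than the setup described in the talk: the model, the `predict_mpg` handler, the `/predict` path and port 8000 are all hypothetical, and the router is built with plumber's programmatic interface (`pr()`, `pr_get()`, `pr_run()`), which needs plumber 1.0 or later.

```r
# Stand-in model; the model served in the talk is not shown in the slides
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical handler: query parameters arrive as strings, so convert first
predict_mpg <- function(wt) {
  wt <- as.numeric(wt)
  list(mpg = unname(predict(fit, newdata = data.frame(wt = wt))))
}

if (requireNamespace("plumber", quietly = TRUE)) {
  library(plumber)

  # Register the handler as GET /predict on a new, empty router
  api <- pr_get(pr(), "/predict", predict_mpg)

  # pr_run() blocks the R session, so only start the server interactively
  if (interactive()) pr_run(api, port = 8000)
}
```

With the server running, a request to `http://localhost:8000/predict?wt=3` should come back as JSON containing the predicted value, which lets engineers call the model without touching any R code.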
Overcoming barriers (cont’d)
• Reproducible framework
  – Solution: ProjectTemplate http://projecttemplate.net
• Stability & error handling
  – Solution: testing & CI
  – testthat and usethis
• Scaling
  – Solution: Docker
• Culture
  – Solution: collaboration
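For the testing point, a small testthat example gives a flavour of what such checks can look like. The `rescale01` helper is hypothetical and exists only so there is something to test; in a real project the tests would live under `tests/testthat/` (which `usethis::use_testthat()` sets up) and be run by the CI service on every commit.

```r
# Hypothetical helper: rescale a numeric vector onto the interval [0, 1]
rescale01 <- function(x) {
  stopifnot(is.numeric(x))   # fail early on bad input
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

if (requireNamespace("testthat", quietly = TRUE)) {
  library(testthat)

  test_that("rescale01 maps values onto [0, 1]", {
    out <- rescale01(c(2, 4, 6))
    expect_equal(out, c(0, 0.5, 1))
    expect_equal(range(out), c(0, 1))
  })

  test_that("rescale01 rejects non-numeric input", {
    expect_error(rescale01("a"))
  })
}
```

The second test covers the error-handling side: it documents that bad input is expected to fail loudly rather than produce a silent wrong answer in production.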
Omayma Said (Freelance)
• “Beyond Prototypes. A Journey
  to The Production Land”
• Challenges: reproducibility,
  portability, and accessibility
• Docker
• Use/Modify available
  Dockerfiles
• Use helper packages
Helper packages
• containerit
   – Package an R workspace and all
     dependencies as a Docker container
• liftr
   – Containerize R Markdown documents
     for continuous reproducibility
• rize
   – A robust method to automagically dockerize
     your Shiny application
Special mention
Using R and Shiny to improve hospital
operations, Christian Moroy and Jonathan Bruce
(Edge Health)

• Predict how long operations take using R
• Recommend free slots that should be filled
  via Shiny
• Disseminate daily reports via markdown +
  email (from R)
• Saved a predicted £4m in 2017/18
Next?
EARL US Roadshow
7 November 2018, Seattle, WA
            Julia Silge
            Data Scientist @ Stack Overflow
            Co-author Text Mining with R with David Robinson
            Co-author tidytext package
EARL US Roadshow
9 November 2018, Houston, TX
           Robert Gentleman
           Vice President of Computational Biology @ 23andMe
           One of the designers of the R programming language

           Hadley Wickham
           Chief Scientist @ RStudio
           Author of numerous books on R
           Prolific R package author
EARL US Roadshow
13 November 2018, Boston, MA
           Bob Rudis (@hrbrmstr)
           Chief Security Data Scientist @ Rapid7
           Prolific tweeter, package author and blogger
The End