2021 Research Boot Camp Series - Presented by: Research in Outcomes for Children's Surgery (ROCS) Center for Children's Surgery

Page created by Charles Simpson
 
Day 2:
Working with a Biostatistician, Data Collection Practices, and REDCap Overview
Maxene Meier, MS
Series Schedule
❑ Day 1, Friday August 7th: “Designing Your Study and Protocol”
      ֍ Study Design
      ֍ Research Questions and Aims
❑ Day 2, Friday August 15th: “IRB Submission and Data Collection”
      ֍ Human Subject Research
      ֍ Required Training and Submission Guidelines
      ֍ Chart Review Basics and database variable types
      ֍ Collecting Data and Database Building: REDCap
❑ Day 3, Friday August 21st: “Data Analysis and Manuscript Writing”
      ֍ Basics of Manuscript Writing
      ֍ Working with a Statistician
      ֍ Preparing Data for Analysis
      ֍ Presenting Your Results: figures and tables
      ֍ Authorship

From Jill:
     1) ask people for successful grants, aims pages, and IRB protocols
     2) get help with your research idea as soon as you can so that you can have help outing the projects and how to pull together a research program
     3) aims and proposals are an interactive process and are done simultaneously throughout the grant writing process
Thank you Jill!
Meet your ROCS Biostatisticians…

Suhong Tong, MS
Senior Research Instructor, Department of Pediatrics
Area of Expertise:
• Large data
• Statistical consulting and data analysis
• Longitudinal data and Time series
• Survey analysis with population complex design
• Structural Equation Modeling
• Quality Improvement Analysis

Maxene Meier, MS
Research Instructor, Department of Pediatrics
Areas of expertise:
• Large data
• Assessing categorical & continuous forms of agreement
• Reproducible research techniques
• Statistical consulting

Kaci Pickett, MS
Research Instructor, Department of Pediatrics
Areas of expertise:
• Survival analysis
• Dynamic prediction
• Statistical consulting

Emily Cooper, BS
Student Research Assistant, Biostatistics & Informatics
Area of Expertise:
• Statistical consulting
Steps to working with your Biostatistician
1. Involve us early
2. Education
3. Collaboration and Communication
4. Sample Size
5. Time
6. Scruples
7. Acknowledgement
1: Involve Us Early
•Involve the biostatistician early,
starting with the design phase of your
research

•Reach out to us early and often!
2: Education
• Plan for two-way educational process

• You will often need to teach the biostatistician
  about your field of inquiry: disease process,
  anatomy, measurements/key variables

• Provide the biostatistician with background
  articles in your field
We are excited to learn about your topic!
2: Education
• Do I have more than one primary outcome?
• Which of my outcomes is the focus of interest?
• Do I have preliminary data from previous
  studies?
• What does the literature review tell me I could
  expect in my control group?
2: Education

• What size of difference am I likely to observe?

• Do I have a restriction on patient numbers
 (due to timescale, rare disease, etc.)?

• What kind of data do I have?
3: Collaboration & Communication

• Background
• Objectives
• Basic study design
• Study population
• Stratification/randomization
• Definition of treatments/endpoints
• Baseline/follow-up data collection
• Quality control
• Sample Size
• Statistical Analysis
• Organization
• Budget
3: Collaboration & Communication

• Confirmation you received and understand the information
• Keep us updated on the submission process for abstract, poster, and
  paper
    •   Even if it is to say it is not being submitted
• Submission for funding results
4: Sample Size
• Feasibility:
     • Are the numbers doable?
     • Do you have the time it will take to get that number?
• Everything is related:
     • Given a sample size and power, what is the effect
       size I can detect?
     • Given an effect size and sample size, what is the
       power?
• Ethics and safety of sample size
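The "everything is related" point can be made concrete with a rough calculation. The sketch below uses a normal approximation for comparing the means of two equally sized groups. The function names, the two-sided alpha of 0.05, and the use of a standardized effect size (difference in means divided by the standard deviation) are assumptions for illustration and are not from the slides. Numbers for a real study should come from your biostatistician.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate sample size per group for a two-sided comparison of two means."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def power_for(effect_size, n, alpha=0.05):
    """Approximate power given an effect size and n per group."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(effect_size * sqrt(n / 2) - z_alpha)


def detectable_effect(n, power=0.80, alpha=0.05):
    """Smallest standardized effect detectable given n per group and power."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n)


print(n_per_group(0.5))        # sample size from effect size and power
print(power_for(0.5, 63))      # power from effect size and sample size
print(detectable_effect(63))   # effect size from sample size and power
```

Fixing any two of sample size, power, and effect size determines the third, which is why halving the expected effect size roughly quadruples the number of patients needed.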
5: Time
• You need to give the biostatistician time to
  learn and work through the solutions

• They are typically working on a number of
  projects simultaneously

• Their time is usually much in demand
5: Time
• Plan ahead!
• We need at least 4 weeks from receiving data
• Timing can change depending on data quality and unforeseen complexities

Request                                        General Timeframe
Power/Sample size alone                        5-10 hours
Writing proposal and analysis plan             20-40 hours
Analysis (depending on complexity)             40-120 hours +
6: Scruples
Beware:
• Do not go on a fishing expedition with your biostatistician.
  Stick to the protocol objectives
• “If you torture the data enough, they will confess”
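One way to see why a fishing expedition is risky: the more unplanned tests you run, the more likely at least one comes back "significant" by chance alone. The sketch below assumes the tests are independent, that each uses alpha = 0.05, and that no true effects exist. These are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests
    when there are no true effects."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 20):
    print(k, "tests:", round(familywise_error_rate(k), 2))
```

Under these assumptions, 20 unplanned comparisons give roughly a 64% chance of at least one spurious finding.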
7: Authorship and Acknowledgement
• The biostatistician should be considered a co-investigator
• Budget for their time in grants
• Collaborate regularly with them throughout the study
• Consider them as co-authors on final papers, usually as 2nd author
• Authorship guidelines:
 http://www.ucdenver.edu/academics/colleges/PublicHealth/research/centers/CBC/grants/Pages/Authorship.aspx
Data Variable types
Data?
How comfortable are you with data?
           1. Super comfortable, could build a database right now
           2. Semi comfortable but would like more instruction
           3. You know nothing (Jon Snow)
Types of data
If you google “types of data” you get this:
[Figure omitted. This came from “6 types of data”.]
                    Categorical
                    Continuous
Types of data: Categorical
Categories or groups

Examples:
• Gender: Male, Female, Other
• Eye Color: Blue, Brown, Hazel
• Grade: 1st, 2nd, 3rd…
Types of data: Continuous
Numeric values

Examples:
• Height
• Weight
• Temperature
What type of data?
1. Age in years
2. Age (0 – 5 years, 6 – 10 years, 11-15 years, etc.)
3. Blood Pressure
4. Blood type
5. Income
6. Income ($30,000 - $50,000, $51,000 - $70,000, etc.)
7. Midge

This is an example of:
                         1. Categorical Data
                         2. Continuous Data
                         3. This isn’t even data
                         4. Do not know

Income is a sensitive question. Others include: drug usage, abortion, and sexual behavior
Data

    Continuous → Categorical
    Age in years → Age groups
    1, 2, 3, 3, 3, 4, 5, 7, 10, 11

    As continuous data:
        Median [IQR]: 3.5 [3, 6.5]
        Mean (SD): 4.9 (3.4)

    As categorical data:
        Under 5: 1, 2, 3, 3, 3, 4
        5 and over: 5, 7, 10, 11

        Under 5: 6 (60%)
        5 and Over: 4 (40%)
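The age example can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the quartile method ("inclusive") is an assumption chosen because it matches the IQR shown on the slide, and other software may use a different default.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

ages = [1, 2, 3, 3, 3, 4, 5, 7, 10, 11]  # ages in years, from the slide

# Continuous summary: median with quartiles, and mean with standard deviation
q1, _, q3 = quantiles(ages, n=4, method="inclusive")
print(f"Median [IQR]: {median(ages)} [{q1}, {q3}]")
print(f"Mean (SD): {mean(ages):.1f} ({stdev(ages):.1f})")


def age_group(age):
    """Collapse a continuous age into the two groups used on the slide."""
    return "Under 5" if age < 5 else "5 and Over"


# Categorical summary: counts and percentages per group
counts = Counter(age_group(a) for a in ages)
for group, n in counts.items():
    print(f"{group}: {n} ({n / len(ages):.0%})")
```

Going from continuous to categorical is easy, but the reverse is not possible: once only the groups are recorded, the mean and median can no longer be computed.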
Data
Think about how your question will be answered

Is it a sensitive topic (i.e. will your patient answer it
truthfully/at all)? Will they give you the truthful
answer or one they think you want to hear?

Will you need to find the mean?
How will you accurately gather it?
Data Management
Data Management Overview
             1. PHI
             2. Things to consider before patients
             3. REDCap
             4. Data Hygiene
             5. Excel and REDCap
             6. HIPAA Compliance
             7. Tips and Tricks
1. PHI
• Protected Health Information
• Patients’ privacy comes first
• What is PHI?
    • Name, MRN, Birthdate, address (even zip code)
    • Visit Dates, phone numbers, email address
    • Diagnosis, treatment information, medical test results, prescriptions
• Electronic or physical records
• Can you identify someone from it?
2. Things to Consider Before Patients
Consider everything you might want to look at
•   Any potential confounders?
•   Baseline demographics and clinical characteristics
•   ID variables – if you have multiple data sources, how will
    they be linked?
•   Longitudinal variables?
•   Process variables – e.g., visit dates

When in doubt: CONSULT A BIOSTATISTICIAN or Methodologist
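To illustrate the question about linking multiple data sources, here is a minimal sketch in which two sources share a study ID. The IDs, field names, and values are made up for illustration, and `link_sources` is a hypothetical helper rather than part of any tool mentioned in the slides.

```python
# Hypothetical example data: two sources keyed by the same study ID
demographics = {
    "S001": {"age_years": 4, "sex": "Female"},
    "S002": {"age_years": 11, "sex": "Male"},
    "S003": {"age_years": 7, "sex": "Female"},
}
lab_results = {
    "S001": {"hemoglobin_g_dl": 12.1},
    "S003": {"hemoglobin_g_dl": 11.4},
    "S004": {"hemoglobin_g_dl": 13.0},
}


def link_sources(left, right):
    """Merge records that share an ID and list IDs found in only one source."""
    shared = sorted(left.keys() & right.keys())
    merged = {sid: {**left[sid], **right[sid]} for sid in shared}
    unmatched = sorted(left.keys() ^ right.keys())
    return merged, unmatched


merged, unmatched = link_sources(demographics, lab_results)
print(merged)
print("Unmatched IDs:", unmatched)
```

Without a shared ID variable, this kind of merge is not possible, and the list of unmatched IDs is a quick check for records that were missed or mistyped in one source.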
2. Things to Consider Before Patients
Continuous variables

•   Anything measured on a numeric scale with an infinite
    number of possible values – weight, time, concentrations

•   Make sure units are consistent throughout the dataset

•   Build in range checks, if possible
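A range check can be as simple as comparing each value against agreed limits. The sketch below is an illustration only: the field names and limits are hypothetical, and real limits should be set by the study team.

```python
# Hypothetical limits for illustration; units are part of the field name
RANGES = {
    "weight_kg": (0.5, 150.0),
    "temp_c": (30.0, 43.0),
}


def out_of_range(record):
    """Return the names of fields whose values fall outside the allowed range."""
    flagged = []
    for field, (low, high) in RANGES.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            flagged.append(field)
    return flagged


print(out_of_range({"weight_kg": 14.2, "temp_c": 37.0}))
print(out_of_range({"weight_kg": 14.2, "temp_c": 98.6}))  # Fahrenheit entered by mistake
```

The second record shows how a range check can also catch inconsistent units, since a temperature recorded in Fahrenheit falls outside a Celsius range.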
2. Things to Consider Before Patients
Categorical variables

•   Data that can be grouped – race, sex, region

•   Two levels = Dichotomous

•   Be careful not to break down into too many levels (makes it
    hard to ensure enough data in each level)
2. Things to Consider Before Patients
Before collection, determine how data will be managed
• Does your data have a unique identifier (ideally a study ID
   which is linked to their MRN or HAR)?
• Do you have a clear description of how the data were
   generated and processed – if you were unable to continue for
   any reason could someone come in and understand exactly
   what you did? Also important for methods section!
2. Things to Consider Before Patients
Before collection, determine how data will be managed
• Where will it be stored?
• Will it be accessible by everyone that needs to access it (and
   only those individuals)? Do people who are not on your IRB
   have access to your files?
• Do you have a codebook?
2. Building a Data Dictionary/Codebook
•   Full dictionary should be made prior to collection
•   Stored on a new sheet or separate Excel file
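As a rough illustration of what a codebook might contain, the sketch below writes one row per variable to CSV text that could be saved as a separate file. The variables, codes, and column headings are hypothetical and are not taken from the slides.

```python
import csv
import io

# Hypothetical codebook entries: one row per variable in the dataset
codebook = [
    {"name": "study_id", "label": "Study ID", "type": "text", "values": "", "units": ""},
    {"name": "scr_wt", "label": "Screening weight", "type": "continuous", "values": "0.5-150", "units": "kg"},
    {"name": "sex", "label": "Sex", "type": "categorical", "values": "0=Female; 1=Male", "units": ""},
]


def codebook_to_csv(rows):
    """Render the codebook as CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "label", "type", "values", "units"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(codebook_to_csv(codebook))
```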
3. REDCap

Research Electronic Data Capture
• Developed at Vanderbilt University
• Secure way to store data
• Must be invited to access the database
• HIPAA Compliant
• On campus: Amanda Miller
3. REDCap

A few capabilities
• Email out surveys (linked to main database)
• Branching logic
• Customize user rights
• Easy to use – tons of online tutorials
• Tracks all changes
4. Data Hygiene
• 10 minutes on the front end saves hours (and $$$!)
  on the back end
• The first variable on the first form is a unique
  record identifier
• Create a data dictionary as you go (if not in
  REDCap)
• Validate and spot check
• Do double data entry if possible
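Double data entry means two people enter the same records independently and the two copies are then compared. The sketch below shows one way such a comparison might look; the record IDs, fields, and the `compare_entries` helper are hypothetical.

```python
def compare_entries(first, second):
    """Return (record_id, field, first_value, second_value) for every mismatch."""
    mismatches = []
    for record_id in sorted(first.keys() | second.keys()):
        a = first.get(record_id, {})
        b = second.get(record_id, {})
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                mismatches.append((record_id, field, a.get(field), b.get(field)))
    return mismatches


# Hypothetical data: the second entry has transposed digits in one weight
entry_1 = {"S001": {"age_years": 12, "weight_kg": 40.0},
           "S002": {"age_years": 3, "weight_kg": 14.2}}
entry_2 = {"S001": {"age_years": 12, "weight_kg": 40.0},
           "S002": {"age_years": 3, "weight_kg": 41.2}}

print(compare_entries(entry_1, entry_2))
```

Each mismatch is then resolved by going back to the source record, not by guessing which entry is right.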
4. Data Hygiene
• Validate text fields and define minimum/maximum
  ranges
• Similar variables go together on short, separate
  forms
• Use field notes to describe units, formats, etc.
• Determine units and coding before data
  collection; don’t change halfway through
• BE CONSISTENT
5. Excel vs REDCap
Data Storage
• Most preferred = REDCap
• More convenient = Excel
• More secure = REDCap
• Data dictionary automatic generation = REDCap
REDCap also automatically tracks changes, can easily
adjust user rights, and is automatically HIPAA compliant
6. HIPAA Compliance
• Excel requires additional steps
• REDCap is HIPAA compliant automatically
6. HIPAA Compliance- Your Responsibilities
•   Never share your username or password
•   Never share an API token (unique token used to link applications)
•   Mark all identifiers as identifiers (PHI)
•   Strip identifiers before exporting if not necessary or if exporting
    to unsecured device
•   Give minimum required user rights to all users
•   Never give Project Design or User Management rights to users with
    basic accounts
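The idea of stripping identifiers before export can be sketched as follows. This only illustrates the principle and is not REDCap's own export feature; the field names and the example record are hypothetical.

```python
# Hypothetical set of fields that were marked as identifiers (PHI)
IDENTIFIER_FIELDS = {"name", "mrn", "birthdate", "zip_code", "phone", "email"}


def strip_identifiers(record, identifiers=IDENTIFIER_FIELDS):
    """Return a copy of the record without any identifier fields."""
    return {key: value for key, value in record.items() if key not in identifiers}


record = {
    "study_id": "S001",
    "name": "Example Patient",
    "mrn": "0000000",
    "age_years": 4,
    "weight_kg": 17.9,
}
print(strip_identifiers(record))
```

This only works if every identifier was marked as one in the first place, which is why that step appears in the list above.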
7. Tips and Tricks
•   Provide only Patient ID numbers: Protected Health Information (PHI)
    should not be included (do NOT include patient names)
•   Character (A, AB, O) and numeric values (0, 1, 2) should not be
    mixed within one column.
•   Use range checks when setting up the database
•   Missing values should either be marked with "NA" or simply left blank
     •   BE CONSISTENT
•   No units in continuous variable entries – Should be in data dictionary
     •   BE CONSISTENT- units should be same through entire column (“kg” vs
         “lbs”)
7. Tips and Tricks
•No punctuation or spaces (e.g. commas, quotes) in data values
•No special formatting (programming languages do not recognize these
 features, which could result in lost information):
    •colored text
    •italics
    •bolding
    •super or sub scripting
    •the “comment” feature
•No capitalization/spelling variations (creates additional
 unintentional levels)
7. Tips and Tricks

    ID        Sex      Age   Weight
    1         Male     12    90
    2         male     3     14.2
    3         F        2     28
    4         Female   50    40
    5         Male     5     17.9
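The Sex column in the table above shows how capitalization and spelling variations create unintended levels. The sketch below counts the levels as entered and again after a cleanup step; the cleanup mapping is a hypothetical example.

```python
from collections import Counter

sex_as_entered = ["Male", "male", "F", "Female", "Male"]  # from the table above

# Software treats each distinct spelling as its own level
print(Counter(sex_as_entered))

# Hypothetical cleanup mapping, keyed by lowercased value
CLEAN = {"male": "Male", "m": "Male", "female": "Female", "f": "Female"}


def standardize(value):
    """Map known spelling variants to one label; leave unknown values unchanged."""
    return CLEAN.get(value.strip().lower(), value)


sex_clean = [standardize(v) for v in sex_as_entered]
print(Counter(sex_clean))
```

Five rows that should have two levels produce four as entered. Entering values consistently from the start avoids this cleanup step.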
7. Tips and Tricks
•Choose appropriate format for the variable (text vs. numbers)
     •Avoid text boxes and un-validated fields
•For categorical variables consider whether a check box (multiple
selection) or radio button (single selection) is more appropriate
     •Consider what the variable is asking: can you ‘check’ more
     than one?
•Make sure the variable labels are meaningful
7. Tips and Tricks
•Name – how the computer recognizes the variable
     •Unique
     •Short but descriptive
     •No spaces
     •No special characters except “_”
     •E.g., SCR_WT
•Label – (slightly) longer description of the variable including units
     •E.g., “screening weight (kg)”
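The naming rules above can be checked automatically. The sketch below is one possible check and not a rule set from any particular tool. The pattern (letters, digits, and underscores, starting with a letter) and the length cap are assumptions for illustration.

```python
import re

NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")  # assumed rule set
MAX_LENGTH = 26  # hypothetical cap for "short but descriptive"


def name_problems(names):
    """Return {name: [issues]} for names that break the rules above."""
    problems = {}
    seen = set()
    for name in names:
        issues = []
        if not NAME_PATTERN.fullmatch(name):
            issues.append("spaces or special characters")
        if len(name) > MAX_LENGTH:
            issues.append("too long")
        if name.lower() in seen:
            issues.append("duplicate")
        seen.add(name.lower())
        if issues:
            problems[name] = issues
    return problems


print(name_problems(["SCR_WT", "screening weight", "SCR_WT", "bp-sys"]))
```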
7. Tips and Tricks

This is a learning process and no dataset is perfect (right away)
Questions?