2021 Research Boot Camp Series - Presented by: Research in Outcomes for Children's Surgery (ROCS) Center for Children's Surgery
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
2021 Research Boot Camp Series Presented by: Research in Outcomes for Children’s Surgery (ROCS) Center for Children’s Surgery
Day 2: Working with a Biostatistician, Data Collection Practices, and REDCap Overview Maxene Meier, MS
From Jill: Series Schedule 1) ask people for successful grants, aims pages, ❑ Dayand 1, FridayIRB August protocols th ֍ Study Design 7 : “Designing Your Study and Protocol” Thank 2) getQuestions ֍ Research helpandwith Aims your research idea as soon as you ❑ Day 2, Friday August 15 : “IRB Submission and Data Collection” can ֍ Humanso th that Subject Researchyou can have help outing the projects you Jill! ֍ Required Training and Submission Guidelines and howBasicstoandpull ֍ Chart Review databasetogether variable types a research program ֍ Collecting Data and Database Building: REDCap 3) aims and proposals are an interactive process ❑ Day 3, Friday August 21 : “Data Analysis and Manuscript Writing” st and are done simultaneously throughout the grant ֍ Basics of Manuscript Writing ֍ Working with a Statistician writing process ֍ Preparing Data for Analysis ֍ Presenting Your Results: figures and tables ֍ Authorship
Meet your ROCS Biostatisticians… Suhong Tong, MS Maxene Meier, MS Kaci Pickett, MS Emily Cooper, BS Senior Research Instructor, Research Instructor, Research Instructor, Student Research Assistant, Department of Pediatrics Department of Pediatrics Department of Pediatrics Biostatistics & Informatics Area of Expertise: Areas of expertise: Areas of expertise: Area of Expertise: • Large data • Large data • Survival analysis • Statistical consulting • Statistical consulting and data • Assessing categorical & • Dynamic prediction analysis continuous forms of • Statistical consulting • Longitudinal data and Time agreement series • Reproducible research • Survey analysis with population techniques complex design • Statistical consulting • Structural Equation Modeling • Quality Improvement Analysis
Steps to working with your Biostatistician 1. Involve us early 2. Education 3. Collaboration and Communication 4. Sample Size 5. Time 6. Scruples 7. Acknowledgement
1: Involve Us Early •Involve the biostatistician early, starting with the design phase of your research •Reach out to us early and often!
2: Education • Plan for two-way educational process • Often need to teach the biostatistician about your field of inquiry- disease process, anatomy, measurements/key variables • Provide the biostatistician with background articles in your field
2: Education • Plan for two-way educational process • Often need to teach the biostatistician about your field of inquiry- disease process, We measurements/key anatomy, are excited to learn about variables your topic! • Provide the biostatistician with background articles in your field
2: Education • Do I have more than one primary outcome? • Which of my outcomes is the focus of interest? • Do I have preliminary data from previous studies? • What does the literature review tell me I could expect in my control group?
2: Education • What size of difference am I likely to observe? • Do I have a restriction on patient numbers (due to timescale, rare disease, etc.)? • What kind of data do I have?
3: Collaboration & Communication • Background • Baseline/follow-up data • Objectives collection • Basic study design • Quality control • Study population • Sample Size • Stratification/randomization • Statistical Analysis • Definition of • Organization treatments/endpoints • Budget
3: Collaboration & Communication • Confirmation you received and understand the information • Keep us updated on the submission process for abstract, poster, and paper • Even if it to say it is not being submitted • Submission for funding results
4: Sample Size
4: Sample Size • Feasibility: • Are the numbers doable? • Do you have the time it will take to get that number? • Everything is related: • Given a sample size and power, what is the effect size I can detect? • Given an effect size and sample size, what is the power? • Ethics and safety of sample size
5: Time • You need to give the biostatistician time to learn and work through the solutions • They are typically working on a number of projects simultaneously • Their time is usually much in demand
5: Time • Plan ahead! • We need at least 4 weeks from receiving data • Timing can change depending on data quality and unforeseen complexities Request General Timeframe Power/Sample size alone 5-10 hours Writing proposal and analysis plan 20-40 hours Analysis (depending on complexity) 40-120 hours +
6: Scruples Beware: • Do not go on a fishing expedition with your biostatistician. Stick to the protocol objectives • “If you torture the data enough, they will confess”
7. Authorship and Acknowledgement • The biostatistician should be considered a co-investigator • Budget for their time in grants • Collaborate regularly with them throughout the study • Consider them as co-authors on final papers- usually 2nd author • Authorship guidelines : http://www.ucdenver.edu/academics/colleges/PublicHealth/research/centers/CBC/gra nts/Pages/Authorship.aspx
Data Variable types
How comfortable are you with data? 1. Super comfortable, could build a Data? database right now 2. Semi comfortable but would like more instruction 3. You know nothing (Jon Snow)
Data? If you google “types of data” you get this:
Types of data This came from “6 types of data”
Types of data Categorical This came from “6 types of data” Continuous
Types of data: Categorical Categories or groups Examples: • Gender: Male, Female, Other • Eye Color: Blue, Brown, Hazel • Grade: 1st, 2nd, 3rd…
Types of data: Categorical Categories or groups Examples: • Gender: Male, Female, Other • Eye Color: Blue, Brown, Hazel • Grade: 1st, 2nd, 3rd…
Types of data: Continuous Numeric values Examples: • Height • Weight • Temperature
Types of data: Continuous Numeric values Examples: • Height • Weight • Temperature
This is an example of: 1. Categorical Data 2. Continuous Data What type of data? 3. This isn’t even data 4. Do not know 1. Age in years
This is an example of: 1. Categorical Data 2. Continuous Data What type of data? 3. This isn’t even data 4. Do not know 1. Age in years 2. Age (0 – 5 years, 6 – 10 years, 11-15 years, etc.)
This is an example of: 1. Categorical Data 2. Continuous Data What type of data? 3. This isn’t even data 4. Do not know 1. Age in years 2. Age (0 – 5 years, 6 – 10 years, 11-15 years, etc.) 3. Blood Pressure
This is an example of: 1. Categorical Data 2. Continuous Data What type of data? 3. This isn’t even data 4. Do not know 1. Age in years 2. Age (0 – 5 years, 6 – 10 years, 11-15 years, etc.) 3. Blood Pressure 4. Blood type
This is an example of: 1. Categorical Data 2. Continuous Data What type of data? 3. This isn’t even data 4. Do not know 1. Age in years 2. Age (0 – 5 years, 6 – 10 years, 11-15 years, etc.) 3. Blood Pressure 4. Blood type 5. Income
This is an example of: 1. Categorical Data 2. Continuous Data What type of data? 3. This isn’t even data 4. Do not know 1. Age in years 2. Age (0 – 5 years, 6 – 10 years, 11-15 years, etc.) 3. Blood Pressure 4. Blood type 5. Income 6. Income ($30,000 - $50,000, $51,000 - $70,000, etc.)
This is an example of: 1. Categorical Data 2. Continuous Data What type of data? 3. This isn’t even data Income is a sensitive 4. Do not know 1. Age in years question 2. Others Age (0 – 5 years, 6 – 10 years, include: 11-15 years, etc.) drug 3. Blood Pressure 4. Blood type usage, abortion, and 5. Income sexual behavior 6. Income ($30,000 - $50,000, $51,00 - $70,000, etc.)
This is an example of: 1. Categorical Data 2. Continuous Data What type of data? 3. This isn’t even data 4. Do not know 1. Age in years 2. Age (0 – 5 years, 6 – 10 years, 11-15 years, etc.) 3. Blood Pressure 4. Blood type 5. Income 6. Income ($30,000 - $50,000, $51,00 - $70,000, etc.) 7. Midge
This is an example of: 1. Categorical Data 2. Continuous Data What type of data? 3. This isn’t even data 4. Do not know 1. Age in years 2. Age (0 – 5 years, 6 – 10 years, 11-15 years, etc.) 3. Blood Pressure 4. Blood type 5. Income 6. Income ($30,000 - $50,000, $51,00 - $70,000, etc.) 7. Midge
Data Continuous → Categorical
Data Continuous → Categorical Age in years → Age groups
Data Continuous → Categorical Age in years → Age groups 1, 2, 3, 3, 3, 4, 5, 7, 10, 11
Data Continuous → Categorical Age in years → Age groups 1, 2, 3, 3, 3, 4, 5, 7, 10, 11 Median [IQR]: 3.5 [3, 6.5] Mean (SD): 4.9 (3.4)
Data Continuous → Categorical Age in years → Age groups 1, 2, 3, 3, 3, 4, 5, 7, 10, 11
Data Continuous → Categorical Age in years → Age groups 1, 2, Under 3, 3,5 3, 4, 5, 57,and10, over11 Under 5: 6 (60%) 5 and Over: 4 (40%)
Data Continuous / Categorical Age in years / Age groups 1, 2, Under 3, 3,5 3, 4, 5, 57,and10, over11
Data Think about how your question will be answered Is it a sensitive topic (i.e. will your patient answer it truthfully/at all)? Will they give you the truthful answer or one they think you want to hear? Will you need to find the mean? How will you accurately gather it?
Data Management
Data 1. PHI 5. Excel and REDCap Management Overview 2. Things to consider 6. HIPAA Compliance before patients 7. Tips and Tricks 3. REDCap 4. Data Hygiene
1. PHI • Protected Health Information • Patients’ privacy comes first • What is PHI? • Name, MRN, Birthdate, address (even zip code) • Visit Dates, phone numbers, email address • Diagnosis, treatment information, medical test results, prescriptions • Electronic or physical records • Can you identify someone from it?
1. PHI • Protected Health Information • Patients’ privacy comes first • What is PHI? • Name, MRN, Birthdate, address (even zip code) • Visit Dates, phone numbers, email address • Diagnosis, treatment information, medical test results, prescriptions • Electronic or physical records • Can you identify someone from it?
2. Things to Consider Before Patients Consider everything you might want to look at • Any potential confounders? • Baseline demographics and clinical characteristics • ID variables – if you have multiple data sources, how will they be linked? • Longitudinal variables?
2. Things to Consider Before Patients Consider everything you might want to look at • Any potential confounders? • Baseline demographics and clinical characteristics • ID variables – if you have multiple data sources, how will they be linked? • Longitudinal variables? variables – e.g., visit dates
2. Things to Consider Before Patients Consider everything you might want to look at • • When in doubt: Any potential confounders? Baseline demographics and clinical characteristics CONSULT A BIOSTATISTICIAN • ID variables – if you have multiple data sources, how will they be linked? Process variables – e.g., visit dates • or Methodologist
2. Things to Consider Before Patients Continuous variables • Anything measured on a numeric scale with an infinite number of possible values – weight, time, concentrations • Make sure units are consistent throughout the dataset • Build in range checks, if possible
2. Things to Consider Before Patients Categorical variables • Data that can be grouped – race, sex, region • Two levels = Dichotomous • Be careful to not breakdown into too many levels (makes it hard to ensure enough data in each level)
2. Things to Consider Before Patients Before collection, determine how data will be managed • Does you data have a unique identifier (ideally a study id which is linked to their MRN or HAR) • Do you have a clear description of how the data were generated and processed – if you were unable to continue for any reason could someone come in and understand exactly what you did? Also important for methods section!
2. Things to Consider Before Patients Before collection, determine how data will be managed • Where will it be stored? • Will it be accessible by everyone that needs to access it (and only those individuals)? Do people who are not on your IRB have access to your files? • Do you have a codebook?
2. Building a Data Dictionary/Codebook • Fill dictionary should be made prior to collection • Stored on a new sheet or separate excel file
2. Building a Data Dictionary/Codebook
2. Building a Data Dictionary/Codebook • Fill dictionary should be made prior to collection • Stored on a new sheet or separate excel file
2. Building a Data Dictionary/Codebook • Fill dictionary should be made prior to collection • Stored on a new sheet or separate excel file
3. REDCap Research Electronic Data Capture • Developed at Vanderbilt University • Secure way to store data • Must be invited to access the database • HIPAA Compliant • On campus: Amanda Miller
3. REDCap A few capabilities • Email out surveys (linked to main database) • Branching logic • Customize user rights • Easy to use – tons of online tutorials • Tracks all changes
4. Data Hygiene • 10 minutes on the front end saves hours (and $$$!) on the back end • The first variable on the first form is a unique record identifier • Create a data dictionary as you go (if not in REDCap) • Validate and spot check • Do double data entry if possible
4. Data Hygiene • Validate text fields and define minimum/maximum ranges • Similar variables go together on short, separate forms • Use field notes to describe units, formats, etc. • Determine units and coding before data collection; don’t change halfway through • BE CONSISTENT
4. Data Hygiene
5. Excel vs REDCap Data Storage • Most preferred = REDCap • More convenient = Excel • More secure = REDCap • Data dictionary automatic generation = REDCap
5. Excel vs REDCap Data Storage • Most preferred = REDCap • More convenient = Excel • More secure = REDCap • Data dictionary automatic generation = REDCap
5. Excel vs REDCap Data Storage • Most preferred = REDCap • More convenient = Excel REDCap • More secure = REDCap • Data dictionary automatic generation = REDCap REDCap also automatically tracks changes, can easily adjust user rights, and it automatically HIPAA compliant
6. HIPAA Compliance • Excel require additional steps • REDCap is HIPAA compliant automatically
6. HIPAA Compliance- Your Responsibilities • Never share your username or password • Never share an API token (unique token used to link applications) • Mark all identifiers as identifiers (PHI) • Strip identifiers before exporting if not necessary or if exporting to unsecured device • Give minimum required user rights to all users • Never give Project Design or User Management rights to users with basic accounts
7. Tips and Tricks • Provide only Patient ID numbers: Protected Health Information (PHI) should not be included (do NOT include patient names) • Character (A, AB, O) and numeric values (0, 1, 2) should not be mixed within one column. • Use range checks when setting up the database • Missing values should either be marked with "NA" or simply left blank • BE CONSISTENT • No units in continuous variable entries – Should be in data dictionary • BE CONSISTENT- units should be same through entire column (“kg” vs “lbs”)
7. Tips and Tricks •No punctuation or spaces (e.g. commas, quotes, ) in data values •No special formatting: •colored text •Italics •bolding, •super or sub scripting •the “comment” feature •No capitalization/ spelling variations
7. Tips and Tricks •No punctuation or spaces (e.g. commas, quotes, ) in data values •No special formatting: •colored text Programming languages do not recognize these •Italics •bolding, features which could •super or sub scripting result in lost information •the “comment” feature •No capitalization/ spelling variations
7. Tips and Tricks •No punctuation or spaces (e.g. commas, quotes, ) in data values •No special formatting: •colored text •Italics •bolding, •super or sub scripting •the “comment” feature Creates additional •No capitalization/ spelling variations unintentional levels
7. Tips and Tricks ID Sex Age Weight 1 Male 12 90 2 male 3 14.2 3 F 2 28 4 Female 50 40 5 Male 5 17.9
7. Tips and Tricks ID Sex Age Weight 1 Male 12 90 2 male 3 14.2 3 F 2 28 4 Female 50 40 5 Male 5 17.9
7. Tips and Tricks •Choose appropriate format for the variable (text vs. numbers) •Avoid text boxes and un-validated fields •For categorical variables consider whether a check box (multiple selection) or radio button (single selection) is more appropriate •Consider what the variable is asking, can you ‘check’ more than one •Make sure the variable labels are meaningful
7. Tips and Tricks •Name – how the computer recognizes the variable •Unique •Short but descriptive •No spaces •No special characters except “_” •E.g., SCR_WT •Label – (slightly) longer description of the variable including units •E.g., “screening weight (kg)”
7. Tips and Tricks This is a learning process and no dataset is perfect (right away)
Questions?
You can also read