Measuring working dog performance - Dr Nicola Rooney Anthrozoology Institute
Value of quantifying performance
• Determine relative impact of factors, e.g. genetics vs rearing environment
• Confirm ideas on best practice, e.g. ideal frequency of training
• Measure trends in performance and the effect of changes in procedure, e.g. procurement sources, different breeds
• Need to ensure measures are consistent
• They must be valid and reliable
Series of steps to ensure this!!!
Robust, validated and evidence-based performance measurement tools
Are you sure that they are fully validated and reliable measures?
Human
• pain in patients (Masters 1974)
• depression (McCormack et al. 1988)
• customer satisfaction (Westbrook and Oliver 1981)
Animals
• lameness in sheep (Kaler et al. 2009)
• horses (Popescu and Diugan 2013)
• quality of life in dogs (Wiseman et al. 2006)
• Questionnaire-based approach (Serpell 2015)
• Dog Mentality Assessment (Svartberg and Forkman 2002)
Step 1: Identify which aspects of performance are most important
Step 2: Standardise vocabulary
Step 3: Determine how best to measure the behaviours
Step 4: Consider which contexts are optimal to measure
Step 5: Check measures are reliable and valid
Step 6: Determine who best to measure them
Step 7: Optimise presentation method
Step 8: Consider whether training is required - if so, design it
Feedback to raters!!!
What? How? Where? Why? Who?
Step 1: Identify which aspects of performance are most important
• Greyhound – winning race
Measuring multiple traits
• different aspects of performance affected differentially
• e.g. rearing in outdoor pens
• Increase fitness
• Increase fearfulness
What makes the best search dog?
Using psychosocial techniques: Interviews, Workshops, Surveys
37 experts – 100 characteristics!
Terms in use: "Drive", "Determination", "Compulsion", "Focus"
Step 2: Standardise vocabulary
• Motivation or enthusiasm to search
• Stamina throughout search
• Independence. Ability of the dog to search without guidance
• Control (responsiveness to verbal and/or physical commands)
Prioritise attributes: quantitatively
1 Health
2 Acuity of sense of smell
3 Incentive to find an object which is out of sight
4 Ability to learn from being rewarded
5 Stamina
6 Intelligence – ability to act on own initiative
7 Tendency to hunt by smell alone
8 Motivation to chase an object
9 Agility
10 Obedience to human command
11 Independence – ability to work on own initiative
12 Consistency of behaviour from day to day
13 Tendency to be distracted when searching
14 Interest in toys or objects
15 Travel ability
16 Motivation to retain possession of an object
17 Boldness
18 Playfulness
19 Fear of specific things
20 Level of aggression towards humans
21 Excitability
22 Ease of adaptation to kennel environment
23 Friendliness to people
24 Willingness to bring an object back to a person
25 Reaction to sudden loud noises
26 Body sensitivity - reactivity to touch and contact with objects
27 Ease of adaptation to new handler
28 Level of aggression towards other dogs
29 Size
30 Motivation to obtain food
VERY HIGH: Health; Acuity of sense of smell; Incentive to find an object which is out of sight; Ability to learn from being rewarded; Stamina
LOW: Motivation to obtain food; Reaction to sudden loud noises; Level of aggression towards humans; Level of aggression towards other dogs
VERY LOW: Tendency to be distracted when searching; Fear of specific things
Rooney, Bradshaw & Almey, 2004. Journal of Forensic Science 49 (2) 300-306.
Rooney & Bradshaw 2004 Applied Animal Behaviour Science 86 123-135
Step 3: Determine how best to measure performance
Longitudinal assessments
• Expert opinion
• Consider longer time period
• Rating scales – need to be robust
Standardised task
• Point sample
• Can film, to compare raters
• Can use objective measures
Free search, Systematic search, Indication
Rate search
Subjective ratings: 2 underlying traits
• General search ability
• Ability to work without false indications
Objective measures: 4 measurable traits
• Free search thoroughness
• Location ability
• Systematic search ability
• Mean number of false indications
How did trainers’ ratings compare to the standard search task?
[Figure: trainer ranking compared with the standardised search task; subjective rating: general search ability; relationships marked ***]
Step 4: Consider which contexts are optimal to measure Is measuring performance in one context indicative of general performance?
Ability measures vary between searches
Style measures were relatively consistent:
• Natural tendency to re-search
• Presentation rate
• Mean reward duration
Multiple roles may need a test with multiple elements
Need to measure performance
• Research and Development / Controlled trials
• Test effects of specific factors on performance
• Example of our rearing study
• Day to day – during normal operations and training
• Methods devised during trials can be adapted for this
What affects operational performance day to day? ©Commonwealth of Australia, Department of Defence
Interviews and surveys reveal that important measures differ between dog roles
Roles: Vehicle; Arms and Explosives; High Assurance
Measures listed across the three roles (order of importance differs by role):
• Stamina
• Motivation or enthusiasm to search
• Control (responsiveness to commands)
• Thoroughness / coverage of search area
• Strength of indication
• Ability to follow search pattern
• Boldness / confidence in the environment
• Tendency to falsely indicate finds
• Independence
• Agility
• Stress during transit
• Tolerance to work around other dogs
• Distraction when searching
• Consistency from day to day
Rooney and Clark (submitted)
Pictures © Crown copyright 2012
Step 5: Check the measures are reliable and valid
Lone workers
• Can they use scales consistently and reliably?
• Can they rate their own dogs’ ability?
Trainers and handlers rating large numbers of searches
[Figure: number of times each score was used, by score (1-5) for distraction, for Handler B and Handler C]
Differences between handlers in use of scales
Some show range restriction
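Range restriction can be screened for by tabulating how often each rater uses each point on the scale. The following is a minimal Python sketch, not the method used in the study: the handler names, the example scores and the `min_span` threshold are all hypothetical.

```python
from collections import Counter


def score_usage(scores, scale=(1, 2, 3, 4, 5)):
    """Count how many times each point on the rating scale was used."""
    counts = Counter(scores)
    return {point: counts.get(point, 0) for point in scale}


def is_range_restricted(scores, min_span=3):
    """Flag a rater who used fewer than `min_span` distinct scale points.

    `min_span` is an arbitrary illustrative threshold, not a published criterion.
    """
    return len(set(scores)) < min_span


# Hypothetical distraction scores (1-5) recorded by two handlers.
ratings = {
    "Handler B": [1, 2, 3, 3, 4, 5, 2, 4],
    "Handler C": [3, 3, 4, 3, 3, 4, 3, 3],
}

for handler, scores in ratings.items():
    print(handler, score_usage(scores), "restricted:", is_range_restricted(scores))
```

A rater flagged this way is a candidate for the rater training discussed under Step 8, rather than proof that the dogs rated were all alike.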
Are their ratings reliable and valid? • Reliable - Is there agreement between raters? • Valid - Do their ratings relate to an underlying construct?
How similar are handlers in their ratings to each other?
9 trainees AES handlers
[Table: agreement between raters by characteristic: Control, Motivation, Stamina, Distraction, Confidence, Independence, Indication strength, False indications, Overall rating]
Clark, Sibbald and Rooney submitted
People have trouble recognising signs of fear and anxiety (Tami and Gallagher 2009; Correia et al. 2007; Mariti et al. 2012).
• tail tucked
• lip licking
• yawning
• panting
• paw-raising
• lowered body
• ears flat
• averted gaze
• hackles up
• dilated pupils
• startle
• wrinkled muzzle
• lips curled & mouth pulled back
Training in recognising signs
Loftus, Casey & Rooney 2013
http://www.bristol.ac.uk/vetscience/services/behaviour-clinic/dogbehaviouralsigns/index.html
What about more expert handlers?: intra-class correlations (ICC)
Compare average agreement with predicted single-observer reliabilities
Characteristic          Average agreement
Control                 0.75
Motivation              0.93
Stamina                 0.87
Distraction             0.83
Confidence              0.91
Independence            0.69
Consistency             0.78
Search pattern          0.80
Thoroughness            0.82
Speed                   0.88
Detect and locate       0.72
Strength of indication  0.80
• Overall agreement (ICC) may seem high
• Examine single raters – variable in their ability
• Important to look at single-rater reliability
Clark, Sibbald and Rooney submitted
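Why average agreement can look high while single-rater reliability is modest follows from the Spearman-Brown relation: for the mean of k raters, r_k = k·r_1 / (1 + (k − 1)·r_1). The Python sketch below inverts that relation. The number of raters `k` is a hypothetical value, since the slide does not state how many raters contributed, and the study may have estimated single-observer reliabilities differently.

```python
def average_reliability(single_reliability, k):
    """Spearman-Brown: reliability of the mean of k raters."""
    return k * single_reliability / (1 + (k - 1) * single_reliability)


def single_rater_reliability(avg_reliability, k):
    """Inverse Spearman-Brown: reliability of one rater, given the
    reliability of the mean of k raters."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return avg_reliability / (k - (k - 1) * avg_reliability)


k = 5  # hypothetical number of raters (assumption, not from the slides)
for avg in (0.93, 0.75, 0.69):
    print(f"average {avg:.2f} -> single rater {single_rater_reliability(avg, k):.2f}")
```

Under this assumption, an average agreement that looks acceptable can correspond to a much lower reliability for any one handler working alone, which is the situation lone workers are in.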
Can the raters discriminate between traits?
Are their ratings for the behaviours independent of each other?
Construct Validity – high discriminant validity (DV) and low convergent validity (CV)
Group level: Motivation with Stamina
But if we look at individual raters’ CV, it is important to consider the incidence of raters who showed low construct validity
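One simple way to look at individual raters is to correlate each rater's scores for two traits across the same searches: a rater whose Motivation and Stamina scores are almost perfectly correlated may not be telling the traits apart. This is a hedged sketch only; the rater names, the scores and the 0.8 cut-off are hypothetical, and the study's own validity analysis is not described on the slide.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores given by two raters to the same six searches.
raters = {
    "Rater 1": {"motivation": [4, 5, 3, 4, 2, 5], "stamina": [4, 5, 3, 4, 2, 5]},
    "Rater 2": {"motivation": [4, 5, 3, 4, 2, 5], "stamina": [3, 3, 4, 2, 4, 3]},
}

CUT_OFF = 0.8  # illustrative threshold, not a published criterion
for name, scores in raters.items():
    r = pearson(scores["motivation"], scores["stamina"])
    print(name, round(r, 2), "may not discriminate" if r > CUT_OFF else "discriminates")
```

Group-level correlations can hide this: pooling all raters averages over individuals who separate the traits and individuals who do not.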
Lone workers
• Can they use scales consistently and reliably?
• Can they rate their own dogs’ ability?
Handlers rated their own dogs differently to their colleagues’ dogs
[Table: handlers A to L (columns) by characteristic (rows): Control, Motivation, Stamina, Distraction, Confidence, Independence, Indication strength, Overall rating; cells marked ↑ or ↓, with ↑ the more common mark]
Suggestion of leniency
Clark, Sibbald and Rooney submitted
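A leniency check of this kind can be sketched by comparing the mean score a handler gives their own dog with the mean score the same handler gives colleagues' dogs. The Python example below is illustrative only: the handler labels and scores are hypothetical, and the statistical test actually used in the study is not given on the slide.

```python
from statistics import mean


def leniency_gap(own_scores, colleague_scores):
    """Mean score given to own dog minus mean score given to colleagues' dogs.

    A positive gap suggests the handler rates their own dog higher.
    """
    return mean(own_scores) - mean(colleague_scores)


# Hypothetical Control scores (1-5) recorded by two handlers.
records = {
    "A": {"own": [4, 5, 4, 5], "others": [3, 4, 3, 4]},
    "B": {"own": [3, 3, 4], "others": [3, 4, 3]},
}

for handler, scores in records.items():
    gap = leniency_gap(scores["own"], scores["others"])
    print(f"Handler {handler}: gap = {gap:+.2f}")
```

A gap on its own does not prove bias, since the handler's own dog may simply be better; it only identifies where an independent rating of the same searches would be informative.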
Step 6: Determine who best to measure them
Naïve raters
• Evidence of leniency
• Range restriction - in some raters’ use of the scales
• Different levels of reliability between measures
• Variability between raters in their reliability and understanding of measures of performance
Step 7: Optimise presentation method
Control - response to your commands
1  Very Low      Rarely responds to commands
2  Low           Often ignores, or responds slowly
3  Intermediate  Sometimes ignores or responds slowly
4  High          Usually responds quickly
5  Very High     Always responds, and generally very quickly
Subtle aspects of presentation are important
Verbal anchors on rating scales (Melchers et al., 2011).
Benchmarking can help handlers score certain traits
[Figures: results by trait: Control, Motivation, Stamina, Distraction, Confidence, Independence, Search pattern, Thoroughness, Speed, Detect & locate, Indication]
(Clark and Rooney submitted)
Step 8: Consider training raters
• Evaluative accuracy training
• increasing validity by moving respondent ratings closer to a gold standard (Woehr & Huffcutt, 1994)
• Frame-of-reference (FOR) training
• teaching raters to use a common conceptualisation when evaluating performance (Melchers et al., 2011)
• provide gold-standard examples of the different levels - improve rater accuracy (Noonan & Sulsky, 2001; Schlientz et al., 2009)
Guided training on use of scales
Interactive teaching device - Turning Point
How would you rate this search for Control?
1 - Very low
2 - Low
3 - Intermediate
4 - High
5 - Very high
[Figure: audience poll results]
Rater training:
• Improve reliability and validity
• Overcome leniency
• Increase handlers’ motivation to provide accurate assessments
Step 1: Identify which aspects of performance are most important
Step 2: Standardise vocabulary
Step 3: Determine how best to measure the behaviours
Step 4: Consider where optimal to measure them
Step 5: Check measures are reliable and valid
Step 6: Determine who will measure them
Step 7: Optimise presentation method
Step 8: Consider whether training is required - if so, design it
Feedback to raters!!!
Feedback to raters!!!
Hypoglycaemia alert dogs
[Figure: odds ratio (Alert/Routine) for glucose outside target range, on a log scale from 0.1 to 10000, by client (number of alerts): 1 (110), 2 (194), 3A+3B (898), 5 (5), 7 (22), 8 (97), 10 (333), 13 (260), 15 (306), 16 (223), and overall]
Rooney, Morant & Guest (2013). PLoS ONE, 8
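An odds ratio of this form compares the odds that glucose is outside the target range when the dog alerts with the odds at routine tests. The Python sketch below shows the arithmetic for a single client; the counts are hypothetical and are not taken from the published data.

```python
def odds_ratio(alert_out, alert_in, routine_out, routine_in):
    """Odds of an out-of-range reading at alerts divided by the odds at routine tests.

    Arguments are counts of readings outside / inside the target range.
    """
    if 0 in (alert_in, routine_out, routine_in):
        raise ValueError("zero count makes the odds ratio undefined")
    return (alert_out / alert_in) / (routine_out / routine_in)


# Hypothetical counts for one client.
# Alerts: 60 readings out of range, 40 in range.
# Routine tests: 20 out of range, 80 in range.
print(odds_ratio(60, 40, 20, 80))  # 6.0
```

A ratio above 1 means out-of-range readings are more likely at alerts than at routine tests; a ratio of 1 means the alerts carry no information.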
Rolling analysis
[Figure: percentage of correct and wrong responses (0% to 100%) by hour of day (01, 06 to 22), with counts per bar]
Raters can see the value of the data they collect
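A breakdown like this can be produced directly from the records raters collect. The Python sketch below computes the percentage of correct responses per hour of day; the record format (hour, correct flag) and the example data are assumptions made for illustration.

```python
from collections import defaultdict


def percent_correct_by_hour(records):
    """Return {hour: percentage of correct responses} from (hour, correct) pairs."""
    totals = defaultdict(lambda: [0, 0])  # hour -> [number correct, number of records]
    for hour, correct in records:
        totals[hour][1] += 1
        if correct:
            totals[hour][0] += 1
    return {hour: 100.0 * c / n for hour, (c, n) in sorted(totals.items())}


# Hypothetical records: (hour of day, whether the response was correct).
records = [(8, True), (8, False), (9, True), (9, True), (8, True)]

for hour, pct in percent_correct_by_hour(records).items():
    print(f"{hour:02d}:00  {pct:.0f}% correct")
```

Re-running the summary as new records arrive gives raters regular feedback on the data they contribute.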
Commonalities between working dogs
Companion dogs
• Allows us to determine effect of different factors
• e.g. breed vs rearing environment
• As selection pressures change, e.g. select against disease
• Important to measure aspects of performance in the companion role
Conclusions
• Value in trials and in long-term recording of performance
• Performance traits to be measured need to be specific to the role and task in hand
• Measuring tools take investment in time and refining to be meaningful, valid and reliable
• No simple “off-the-shelf” tool
• Stakeholder consultation and training are critical
Collecting good data can…
• Improve future dog capability
• Select for meaningful performance traits
• Ensure we breed and rear for optimal performance
Good data = honest, reliable, repeatable and validated
©Crown copyright 2012
Acknowledgements
Collaborators:
• Corinna Clark
• John Bradshaw
• Nicola Sibbald
• Liz Paul
• Sam Gaines
• Rachel Casey
for funding
All agencies who have taken part in these studies: Defence Animal Centre, HMP, TSA, UK Army & RAF, Medical Detection Dogs
Many images ©Crown copyright 2012