Persistence and Attrition among Participants in a Multi-Page Online Survey Recruited via Reddit's Social Media Network
Article

Dirk H.R. Spennemann
School of Agricultural, Environmental and Veterinary Sciences, Charles Sturt University, P.O. Box 789, Albury, NSW 2640, Australia; dspennemann@csu.edu.au

Citation: Spennemann, Dirk H.R. 2022. Persistence and Attrition among Participants in a Multi-Page Online Survey Recruited via Reddit’s Social Media Network. Social Sciences 11: 31. https://doi.org/10.3390/socsci11020031
Academic Editor: Nigel Parton
Received: 13 December 2021; Accepted: 14 January 2022; Published: 18 January 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Soc. Sci. 2022, 11, 31. https://www.mdpi.com/journal/socsci

Abstract: Participant attrition is a major concern for the validity of longer or complex surveys. Unlike paper-based surveys, which may be discarded even if partially completed, multi-page online surveys capture responses from all completed pages until the time of abandonment. This can result in different item response rates, with pages earlier in the sequence showing more completions than later pages. Using data from a multi-page online survey administered to cohorts recruited on Reddit, this paper analyses the pattern of attrition at various stages of the survey instrument and examines the effects of survey length, time investment, survey format and complexity, and survey delivery on participant attrition. The participant attrition rate (PAR) differed between cohorts, with cohorts drawn from Reddit showing a higher PAR than cohorts targeted by other means. Common to all was that the PAR was higher among younger respondents and among men. Changes in survey question design resulted in the greatest rise in PAR irrespective of age, gender or cohort.

Keywords: Reddit; survey methodology; social media; online surveys; participant attrition

1. Introduction

Surveys conducted online have become popular due to their ease of administration, low cost of dissemination and distributed data entry, as well as for their geographic reach. Like any other type of survey, online surveys suffer from participant attrition (‘survey break off,’ ‘drop out’), i.e., the phenomenon that participants will abandon the survey once they get distracted or bored, no longer perceive the questions to be relevant, or simply run out of the time that they had set aside for it. In paper-based surveys, this may lead to incomplete surveys or, more likely, to the survey form not being returned at all. This directly affects the unit response rate (i.e., fully completed surveys). Online surveys, where the questions are delivered as a set of discrete pages (screenfuls) with the respondent actively moving from one to the next, will save the response that has been submitted on the previous page. This allows for partial responses to be recorded even when respondents abandon the effort part way through completion. Thus, while the entire survey may not have been answered, the sets of questions on the saved discrete pages will have been, leading to different item response rates (Edwards 2002).

A number of studies have commented on participant attrition in online surveys (Monroe and Adams 2012), but only a few studies have been carried out to examine the underlying patterns (Hochheimer et al. 2016; Hochheimer et al. 2019; Zhou and Fishbach 2016).

Participant attrition may introduce a bias in the survey responses and thus their implied representativeness, either within the same survey cohort (Liu and Wronski 2018; Zhou and Fishbach 2016) or between survey cohorts of different years (longitudinal studies) (Khadjesari et al. 2011). Factors that may cause participant attrition are questions that are deemed irrelevant to the respondent (Zhou and Fishbach 2016) as well as the complexity and length of the survey instrument (Hoerger 2010; Kato and Miura 2021; Liu and Wronski 2018; Mirta and Michael 2009; Robb et al. 2017). In addition to outright attrition, studies
have shown that the quality of responses provided in the later part of a survey can be lower, with respondents providing faster and more uniform responses (i.e., with less reflection) than to questions earlier in the survey (Mirta and Michael 2009).

This paper will report on participant attrition in a multi-page online survey (examining the perceptions of risk in outdoor recreation activities) as administered to cohorts recruited on the social media network Reddit.

Reddit as a Sampling Universe

Reddit is a social media network, touting itself as the ‘front page of the internet.’ It is, in essence, an array of multi-channel discussion board-type groupings and online communities (sub-Reddits), where users (‘redditors’) congregate, express opinions, ask questions and share images, videos and links to other social media and websites (Amaya et al. 2019; Gaffney and Matias 2018; Shatz 2017). These sub-Reddits can be topical and thematic (e.g., covering activities, hobbies, TV shows, etc.), geographic and country-specific (e.g., Brazil, Kenya), event-specific (e.g., COVID-19) or generic (e.g., AskReddit). While the primary language on Reddit is English, there are sub-Reddits in all other languages and scripts supported by ASCII-based standard character encodings. The members of the online community drive the nature and extent of content as well as the volume and frequency of discussion threads and the posted comments. The nature of the content ranges from semi-professional advice in Q&A formats to flippant postings or internet memes.

The content of each sub-Reddit is managed by sub-Reddit-specific moderators, who are guided by a general system-wide Reddit code of conduct (Almerekhi et al. 2020), which is also enforced by an artificial intelligence-driven auto-moderation bot (Jhaver et al. 2019). Moderators have the discretion to add sub-Reddit-specific usage rules and codes of conduct (Moore and Chuang 2017; Squirrell 2019). This is not the place to discuss the politics of content moderation, or lack thereof, in some of the sub-Reddits, which has recently attracted some media attention (Copland 2020; Gaudette et al. 2020; Potter 2021).

The breadth and specificity of the various Reddit communities make them an attractive target for data mining, primarily the mining of discussion comments, such as in the fields of public health (Balsamo et al. 2021; Bunting et al. 2021; Lu et al. 2019; Okon et al. 2020; Wang et al. 2015), private finance (Glenski et al. 2019), education (Staudt Willet and Carpenter 2020), or public information and disinformation management (Achimescu and Chachev 2021; Balalau and Horincar 2021; Dosono et al. 2017; Duguay 2021). The use of Reddit as a data set, however, is not without its critics, as the sociodemographic characteristics of participants on Reddit are not comparable to the general population (Amaya et al. 2019). The participation is skewed towards younger males from more affluent English-speaking backgrounds (Amaya et al. 2019; Shatz 2017).

Descriptive demographic data exist for the representativeness of the overall Reddit universe as it manifested itself in early 2019. The percentage of U.S. citizens using Reddit decreases with age, with 22% of U.S. adults aged 18–29 years using Reddit, compared to 6% of people over 50 (Tankovska 2021d). Reddit users are twice as likely to be male and tend to be better educated, have a higher income, are more than three times as likely to be Caucasian or Hispanic than African American, and are less likely to reside in rural areas (Tankovska 2021a, 2021b, 2021c, 2021e, 2021h). While the uptake of Reddit as a platform has increased, the pattern of demographics has not changed much since 2013, with the exception being that the dominance of male participants has decreased from a ratio of 3:1 in 2013 to 2:1 in 2019 (Duggan and Smith 2013).
The frequency of access has implications for survey responses and the latency of posts. In mid-2020, 52% of Reddit users reputedly accessed the site on a daily basis, while 82% accessed Reddit at least weekly (Tankovska 2021f). The consumption of Reddit was lower in other countries, such as Finland (35% daily, 77% weekly) (Tankovska 2020a), Sweden (36% daily, 71% weekly) (Tankovska 2020b), Norway (41% daily, 78% weekly) (Tankovska 2020c), and Denmark (49% daily) (Tankovska 2020d).
In terms of geographical reach, U.S. citizens were the primary consumers of Reddit, with 49.3% of the desktop traffic originating in the U.S.A. This was followed by Canada and the United Kingdom (both 7.8%), Australia (4.3%) and Germany (3.1%) (Tankovska 2021g). The main reason for consumption of Reddit posts (by U.S. residents) was to ‘get entertainment’ (72%) followed by ‘news’ (43%) and other (17%) (Tankovska 2020b).

Reddit usage varies with the time of day and the day of the week (Shatz 2017). Given the diversity of Reddit communities, usage patterns cannot be generalized but will depend on the specific sub-Reddit(s) targeted. Factors that are known to influence this are related to global geographic location (time zones), age structure and socio-economics (Moore and Chuang 2017; Shatz 2017).

Limited data exist on the nature and depth of engagement on Reddit, which may have an influence on survey participation. Analyses of discussion comments showed that older users, as well as women, tend to provide more detailed comments in a discussion thread (Finlay 2014), while others favor the apparent anonymity that allows for voicing contentious opinions (Kilgo et al. 2018).

2. Methodology

Between 2019 and 2021, the author carried out a survey into the perceptions and attitudes towards risk in outdoor recreation activities. The findings of that survey will be discussed elsewhere. The purpose of this paper is to examine a number of methodological aspects associated with the administration of the survey. This section describes the purpose of the main study (merely to provide context), the survey instrument, the sampling frames and modes of administration, and the limitations.

2.1. Purpose of the Study

Adventure recreation encompasses a broad range of outdoor activities that require physical and mental participation, as well as an element of risk of injury and misadventure.
Examples are SCUBA diving, mountain biking, mountaineering or hang gliding. Adventure recreation includes internal motivations such as fear, control, skill development, and a sense of achievement, as well as external motives such as social-based factors defined as friends, image, escape, and competition with others or the environment (Buckley 2012).

While there is an abundance of literature on motivations for participation in outdoor recreation and adventure tourism (Albayearak and Caber 2017; Buckley 2012; Caber and Albayearak 2016; Holm et al. 2017; Pomfret 2011; Yang et al. 2017), the vast majority of research into the motivations for participation and perception of risk in adventure recreation has drawn on participants during activities or their instructors (Ewert et al. 2013; Maria Gstaettner et al. 2017). While valid in their own right, these are predefined samples that do not consider the range of motivations exhibited by the general public, nor do they explore the barriers to participation. Moreover, most of these studies rarely included or explored social determinants for participation. A systematic literature survey (Yang et al. 2017), for example, showed that most of the surveys limit themselves to querying gender without considering aspects of spousal status and responsibility of care for children. Possible determinants such as education, occupation and ethnicity are also rarely explored (Naidoo et al. 2015).

The research project explores the attitudes towards personal motivation and perceived risk among a broad range of participants, both those who have and those who have not (yet) participated in outdoor adventure recreation. This survey specifically looks at samples of the general population.

The study was approved for general distribution by Charles Sturt University’s Human Research Ethics Committee for the period from 6 February 2019 to 3 February 2024.
The study was also approved by the Institutional Review Board of the University of Guam, for dissemination among the University of Guam staff and student population for the period from 22 April 2021 to 31 May 2022.
The survey was first administered between March and May 2019 to various Australian cohorts using a two-page paper form or its PDF version, which could be disseminated and completed electronically. The survey was repeated between March and May 2020, with respondents from then on offered completion via an online survey form disseminated via the Survey Monkey platform.

2.2. The Survey Instrument

The survey instrument, containing the same questions, exists in three versions: a two-page paper survey (Appendix A), a PDF version of the paper survey that could be disseminated and completed electronically (Appendix B), and a multi-page online version disseminated via the Survey Monkey platform.

The survey instrument comprises three sections: (1) demographics; (2) general attitudes towards risk and social determinants of risk-taking; and (3) questions related to specific activities. In the paper/PDF-based survey, Section (1) and Section (2) formed the front page, while Section (3) formed the reverse (see Appendix A). The participant information sheet was provided as a separate document.

On conversion for delivery via the online platform Survey Monkey, the survey instrument was broken up into a number of individual screens. In the online form, page 1 comprised participant information that had to be agreed to in order to progress. Section (1) (demographics) comprised two pages with a third conditional page collecting ZIP codes if the answer to Q1 (In which country do you live?) was either Australia or the U.S.A. This demographic section asked a total of 11 (12) questions. Conditional on individuals stating a disability, an additional page was inserted, asking the nature of the disability. Subsequent Section (2) and Section (3) comprised eight pages each. Figure 1 shows the arrangement of the discrete pages as delivered by Survey Monkey.
To ensure that the survey was as similar as possible across the three modes of application (paper, PDF, SurveyMonkey), responses in the online survey were not forced, i.e., a respondent could choose to not answer a question but still be able to progress to the next page.

When filling out the paper-based survey, participants could estimate the required survey effort at any given time. The presence of a progress bar notwithstanding, the screen-by-screen online delivery caused fatigue among some respondents, leading to the abandonment of the survey partway through. As the progression from one question screen to another entailed the saving of the information that had been entered on that screen, partial responses could be captured even when a respondent abandoned the survey.

To ensure that all activity sets had an even chance of being assessed even if the survey was abandoned partway through, the pages of Section (3) were presented in a random sequence until all pages were exhausted. As Survey Monkey records the page order, it is possible to verify the randomization. The maximum deviation from the average number of responses for the randomly delivered P11 to P19 was 6.5% ± 3.5%.

2.3. Sampling Frames

The project used two different sampling frames, a semi-random sample of the general population (2019–2021) and a sample drawn from Reddit users (2021). None of the respondents were offered incentives or rewards. The participant population was restricted to persons aged 18 years of age or older. This was clearly spelled out in the information provided to prospective participants.
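The randomization check mentioned in Section 2.2 can be illustrated with a minimal Python sketch. It assumes that the check compares the number of responses recorded for each randomly delivered page with the average across those pages; the function name `page_deviation` and all response counts are hypothetical and are not taken from the survey data.

```python
from statistics import mean


def page_deviation(counts):
    """Percentage deviation of each page's response count from the mean count."""
    avg = mean(counts.values())
    return {page: 100.0 * (n - avg) / avg for page, n in counts.items()}


# Hypothetical response counts for five randomly delivered activity pages
# (illustrative values only, not the survey's actual figures).
counts = {"P11": 410, "P12": 395, "P13": 402, "P14": 388, "P15": 405}

deviation = page_deviation(counts)

# Page whose count lies furthest from the average, in either direction.
max_page = max(deviation, key=lambda page: abs(deviation[page]))
print(max_page, round(deviation[max_page], 2))
```

If the randomization works as intended, the deviations should stay small and show no systematic preference for any one page.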
Figure 1. Flow chart showing the arrangement of pages as delivered by Survey Monkey. Pages in order: P1 participant information; P2 country; P2a postcode/ZIP (Australia and U.S.A. only); P3 other demographic; P4 perceptions of self; P5 fears; P6 motivations; P7 motivations (continued); P8 consequences; P9 peer pressure; P10 activity set 1 (6 options); P11 to P17 activity sets 2–8 (9, 6, 7, 6, 3, 6 and 5 options), delivered in randomised sequence; P18 thank-you screen.

2.3.1. General Population

The survey was administered by students enrolled in the subject Social Psychology of Risk taught by the author. The subject forms part of the Bachelor of Applied Science (Outdoor Recreation and Ecotourism) offered by Charles Sturt University (Australia). This is a specialized degree offered in face-to-face and distance (online) mode of instruction, drawing participants from across Australia with a higher representation of southeast Australian states (Queensland, New South Wales, Victoria). The students were required to administer 15 copies through direct contact (digital or in-person) to their social circle of friends and family, for which they received course credit. While the course credit was applied to attracting 15 completed questionnaires, the students had no control over whether these surveys were fully completed or abandoned partway through. While each student carried out a purposive sample selection, the aggregate sample across all students enrolled in the subject generates a random sample of the population.

The survey was also administered by the author to the general public. This occurred through direct contact (digital or in-person) with his personal as well as his national and international professional networks, with separate response collector URLs for Australian and overseas cohorts. In addition, participants were recruited through snowballing, i.e., inviting contacts to send out invitations through their own networks. Purposive sampling also occurred by targeting underrepresented participant classes, such as people over 65, who were sampled at events through Rotary Clubs and retirement villages, or people with below-school-age children, who were sampled through placement of surveys in waiting areas of childcare centers and preschools.

In addition, to obtain a larger cohort of a different cultural group, students of the University of Guam were also sampled using invitations disseminated through the university’s centralized mail system.

Several Reddit users engaged with the author in offline discussions about the project after the call for participation had been posted on the sub-Reddits (see below). These users were sent links to the overseas participant URL and invited to distribute this to their non-Reddit social networks.

2.3.2. Reddit

Reddit users were sampled to obtain cohorts of the general public, but also cohorts of participants who self-identified as having an interest in the general outdoors or in specific adventure recreation activities.
To adequately assess differences in the perception of risk in outdoor recreation activities, five conceptual sampling frames were chosen on Reddit. Two frames were specific to outdoor recreationists, i.e., adventure activity-specific sub-Reddits (e.g., canyoning, diving) and general outdoor activity sub-Reddits (e.g., outdoors, hiking). In addition, two general sampling frames were chosen to circumscribe the general population: general and research-related sub-Reddits (e.g., sample size, psychology) and country-specific sub-Reddits (e.g., Brazil, Pakistan). The latter was chosen as an attempt to address the heavy North American-centered imbalance in Reddit responses. The fifth sampling frame comprised health- and phobia-related sub-Reddits (e.g., depression, acrophobia). There is an increasing realization that participation in adventure has mental health benefits. Thus it was desirable to understand the perception of risk held by that cohort.

The online survey form was ‘cloned,’ with a series of cohort-specific URLs feeding into the same dataset as discrete collectors (Appendix C). The calls for participation were posted directly in the Reddit discussion forums (Figure 2), except where sub-Reddit rules required that pre-approval for surveys be sought from moderators. For the sub-Reddits related to mental health, phobias and disabilities, prior moderator approval was sought as a matter of principle, irrespective of stated rules. While this approval was not always granted, some moderators promoted the survey on their sub-Reddits by ‘pinning’ the thread to the top of the page for a set time.

The survey was progressively posted between 11 March and 19 April 2021. Attempts were made to post early Saturday morning U.S. east coast time to ensure that the posts would be read over the weekend.
A reminder was sent out (as a repost) one week after original posting, except in instances where the overall volume of posts was low, and the original post was still within the ten most recent posts. A second reminder was sent one week after the first reminder for all but the forums that were posted after 5 April. The data collection was concluded on 28 April 2021.
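The reminder schedule described above can be written out as a short Python sketch. It encodes only the date rules stated in the text (first reminder one week after posting, second reminder one week later unless the forum was posted after 5 April 2021). Dropping reminders that would fall after the close of data collection is an added assumption, the function name `reminder_dates` is hypothetical, and the exception for low-volume forums is deliberately left out.

```python
from datetime import date, timedelta

SECOND_REMINDER_CUTOFF = date(2021, 4, 5)   # forums posted after this got one reminder only
COLLECTION_END = date(2021, 4, 28)          # data collection concluded


def reminder_dates(posted):
    """Return the reminder dates for a call for participation posted on `posted`."""
    first = posted + timedelta(weeks=1)
    dates = [first]
    if posted <= SECOND_REMINDER_CUTOFF:
        dates.append(first + timedelta(weeks=1))
    # Assumption: no reminder is sent once data collection has closed.
    return [d for d in dates if d <= COLLECTION_END]


# Two Saturday postings within the posting window (illustrative dates).
print(reminder_dates(date(2021, 3, 13)))
print(reminder_dates(date(2021, 4, 10)))
```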
Figure 2. Example of a call for participation on a sub-Reddit (canyoneering in this instance).

Critical for a comprehensive design are regular reminders (Fan and Yan 2010). Dillman et al. advocate two, if not three, reminders (Dillman et al. 2009). Posting reminders, as reposts of the survey on Reddit, tended to incur the wrath of forum participants who considered any repeat posting as spamming, even though formal reference was made that this was standard survey methodology. Resistance by Reddit forum members (and some moderators) emerged after the first reminder and occasionally became vocal after the second reminder. If a pre-notification were to have been posted a few days prior to the survey, that too would have attracted the ire of vocal participants, which in turn would have affected the willingness to participate.

To increase the perception of credibility of the survey, the researcher did not merely post the survey on the forum, but engaged regularly and in a timely manner, responding to any discussion comments that were posted in the discussion thread, as well as, where required or appropriate, offline with specific users who had commented. As noted, several Reddit users engaged with the author in offline discussions about the project after the call for participation had been posted on the sub-Reddits. As many users are participants in more than one sub-Reddit, they were invited to distribute a generic Reddit participation URL
to their Reddit social networks beyond the specific sub-Reddit that generated the offline discussions.

2.4. Data Cleaning and Statistical Analysis

The data used for this paper are a subset of the full data set provided by the SurveyMonkey data collectors.

2.4.1. Data Cleaning

For the purposes of this paper, the full dataset was imported to MS Excel and reduced from a total of 294 columns to 32 columns by retaining the survey administrative (collector, timestamps) and demographic data and by replacing the answer columns with a set of columns indicating whether the respondents had progressed to a given page.

The Survey Monkey platform provides timestamp data that give the time of the submission of the first page (in this case, the agreement with the participation information) and a timestamp for the last page submitted, which can be the final survey page or any page in between. This provides the opportunity of computing the time spent on the survey, which ranged from 8 s (respondent did not progress past the country demographic) to a maximum of two days, 13 h and 6 min (incomplete), which is clearly an unrealistic time. In total, 2.6% of the respondents took longer than 2 h to complete (or abandon) the survey, suggesting that they were interrupted or chose to set the survey aside and return to it later. In each case, the final timestamp represents an active submission of that page, irrespective of whether any questions were answered on that page, and not a mere closing of the browser window (confirmed by testing). As these extreme data points would distort the findings, all times longer than 2 h were excluded from analyses that included completion times.

Careless and/or mischievous responders are known factors that are more prevalent online than in paper-based surveys (Robinson-Cimpian 2014; Ward et al. 2017). Cross-checking of responses, taking into account age, and free-form responses to country of origin, profession and cultural background identified some mischievous responses, which were removed from the data set.

2.4.2. Statistical Analysis

The correlation between the various participant attrition curves of different cohorts or survey methods was determined using the CORREL function in MS Excel. Given that the PAR continually declines as the users progress from page to page, it is inevitable that the curves will always show some level of positive correlation. Thus, for the purposes of this paper, a very high level (***) of correlation was arbitrarily attributed to r ≥ 0.995, a high level (**) to r ≥ 0.985, and a moderate level (*) to r ≥ 0.95. A paired-sample t-test was used to compare the PAR between different cohorts or survey methods.

2.5. Limitations

There are a number of limitations to the survey, both of a general and a Reddit-specific nature, which are placed on record here.

2.5.1. Data Quality

Since all data were self-reported, they are subject to a recall bias. While this does not affect the data collected in Section (1) (demographic) and Section (2) (general attitudes towards risk and social determinants of risk-taking), recall bias may affect responses to the activity-specific questions (Section (3)), in particular the rating of the risk posed by, and the apprehension of a participant in, activities they had participated in in the past. The granularity of options (participated in the past year, prior, never) is (by necessity) coarse, which allows for recall bias to creep in among those who answered ‘prior.’ In addition, all responses from the 2021 cohort may be affected by the prolonged period of enforced inactivity due to COVID-19.
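The completion-time filter of Section 2.4.1 and the correlation bands of Section 2.4.2 can be sketched in Python as follows. This is an illustrative sketch, not the spreadsheet workflow used for the paper: the function names and the two example curves are hypothetical, the Pearson coefficient is computed by hand (the quantity that a spreadsheet CORREL function returns), and the paired t statistic is given without a p-value.

```python
from datetime import datetime, timedelta
from math import sqrt
from statistics import mean, stdev

MAX_DURATION = timedelta(hours=2)  # times longer than 2 h are excluded


def usable_durations(stamps):
    """Time between first and last page submission, dropping spans over 2 h."""
    durations = [last - first for first, last in stamps]
    return [d for d in durations if d <= MAX_DURATION]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_level(r):
    """Map r onto the bands used in the paper: ***, **, * or none."""
    if r >= 0.995:
        return "***"
    if r >= 0.985:
        return "**"
    if r >= 0.95:
        return "*"
    return ""


def paired_t(a, b):
    """t statistic of a paired-sample t-test (degrees of freedom: len(a) - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical percentages of respondents still present at five successive pages.
cohort_a = [100, 90, 80, 70, 60]
cohort_b = [100, 88, 79, 66, 58]

r = pearson_r(cohort_a, cohort_b)
print(correlation_level(r), round(paired_t(cohort_a, cohort_b), 3))
```

Because both curves decline monotonically, r is high almost by construction, which is why the bands start at 0.95 rather than at conventional thresholds.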
2.5.2. Participation and Response Rate

The literature notes low response rates for online surveys in general (Monroe and Adams 2012). To boost response rates, Dillman et al., as well as other authors drawing on their work, advocated the approach of a personalized and repeated contact (Cook et al. 2000; Dillman et al. 2009; Fan and Yan 2010; Koitsalu et al. 2018). While this is possible with fixed, well-circumscribed cohorts of known potential participants (Monroe and Adams 2012), it was not possible in the general public cohorts or the Reddit cohorts.

Other modes to boost response rates are perceptions of scarcity (i.e., those surveyed are a group of a select few) (Fan and Yan 2010), pre-notification (Fan and Yan 2010; Koitsalu et al. 2018) and reminders (Koitsalu et al. 2018). Although there are dissenting opinions (Brown and Knowles 2019), remunerative incentives are frequently commented upon favorably (Fan and Yan 2010; Monroe and Adams 2012), in particular for longitudinal studies (Choga 2019; Khadjesari et al. 2011), with better response rates resulting from uniform monetary incentives rather than prize draws (Brown and Knowles 2019; Robb et al. 2017) and higher incentive values for longitudinal studies (Khadjesari et al. 2011).

The main limitation to assessing participation is that the mode of survey administration does not give the opportunity to adequately assess the response rate. Participation and uptake on survey invitations were voluntary, and the selection of the cohorts for the general population (see Section 2.3.1) was opportunistic. Thus, it can be surmised that the fact of participation entails a bias of general interest in either outdoor activities (signaled via the title of the survey), interest in the general issue of risk behavior, or social desirability bias with participants feeling compelled to support research in general or the individual disseminators of the survey.
Among Reddit users, the number of actual participants in a survey is subject to a range of filters and represents a small fraction of the overall population registered for a specific sub-Reddit (Figure 3). While the total number of registered users in each sub-Reddit is publicly posted, and while the number of participants reading the same sub-Reddit as a user is also visible, the number of people actively engaging (posting) with a sub-Reddit is not readily discernible. It can be posited that the total number of persons consuming the content of a specific sub-Reddit will be greater than the number who registered for the sub-Reddit. The universe of readers looking at a sub-Reddit at any given time is subject to time richness of the participant population due to employment and social/family factors, the time of day at the user’s location (day, night) and the geographic mix of the sub-Reddit’s users, i.e., whether it is primarily a single nation (with associated time zone implications) or truly global.

The readers looking at a sub-Reddit need to be sufficiently interested to click on the headline of the specific post and then remain engaged to read that post. Only a fraction of readers will be further motivated to click on the link that takes them to the survey form hosted on SurveyMonkey. For reasons of survey ethics, all relevant participation information needs to be posted on the first page of an online survey. A downside of the lengthy required text is that it may further discourage participation. On the other hand, that step may have filtered out some users who would have commenced, but then quickly abandoned, the survey after a handful of questions.

It can be assumed that reading the initial post and subsequent participation entails a bias of general interest in the outdoor activities of the targeted sub-Reddits and/or a social desirability bias with participants feeling compelled to support research in general or the survey in particular.
The commercial version of SurveyMonkey subscribed to by Charles Sturt University records the answers but does not record the number of times the survey form was called up but was not progressed beyond reading the participation information section.
Figure 3. Sampling vs. participant universe of a sub-Reddit. (Levels shown: registered users; users reading the sub-Reddit; users reading the post; users clicking on the link; users starting the survey.)

Some approximations of participation at that stage, however, can be made. The literature suggests that 90% of an internet community, such as Reddit, are pure consumers (readers), 9% contribute in general, and only 1% contribute and engage heavily (Carron-Arthur et al. 2014; Gasparini et al. 2020; Glenski et al. 2017; Van Mierlo 2014). In December 2020, Reddit claimed to have 52 million daily users out of a total population of 430 million (Patel 2020), suggesting that 12% of the users visit daily. As this usage cuts across the entire site, the participation percentage will vary between sub-Reddits. Using the 1% rule, we can estimate that 0.12% of registered users will be active participants.

System-wide data suggest that the average user will visit the site for a 10-minute duration (SimilarWeb 2021), but it can be assumed that the duration is longer for specific-interest sub-Reddits which have developed into their own ecosystems. Moreover, the duration of active users will far exceed the average.

A glimpse of the user-reader-participant relationship of sub-Reddits used in the survey can be gleaned from Table 1. Most sub-Reddit forums allow creating posts with a single-question, fixed choice polls with a maximum of six short answer options, running for a duration of between one and seven days. A simple poll of three-day duration was administered on various adventure bicycling-related sub-Reddits asking respondents to choose a primary motivation for their participation in the bicycling activities. The number of users reading the sub-Reddit was recorded at six instances between 20:00 h and 8:00 h GMT (7 a.m. and 9 p.m. ADST) for three days, which allows us to calculate an average percentage of registered users reading the discussions at any one time. The percentages range from 0.1% to 0.7% (Table 1). While this is the average at any given time, it does not
allow the cumulative total over a single day, or over the total three-day exposure period of the poll, to be estimated. Neither does it indicate the duration of participation.

Table 1. Participation statistics of a Reddit poll.
Sub-Reddit | Registered Users in the Sub-Reddit | Average % of Registered Users Online | Total of Poll Participants | Poll Participants in % of Registered Users | Poll Participants in % of Average Number of Registered Users Online
bmx | 38,200 | 0.585 | 64 | 0.168 | 28.6
cyclocross | 18,000 | 0.266 | 54 | 0.300 | 112.9
dirtjumping | 5900 | 0.538 | 36 | 0.610 | 113.4
fatbike | 8900 | 0.465 | 49 | 0.551 | 118.4
fixed gear | 69,100 | 0.539 | 140 | 0.203 | 37.6
gravelcycling | 45,000 | 0.628 | 97 | 0.216 | 34.3
mountainbiking | 86,200 | 0.283 | 71 | 0.082 | 29.1
MTB | 223,000 | 0.707 | 170 | 0.076 | 10.8
single speed | 6600 | 0.105 | 100 | 1.515 | 1440.0
xbiking | 35,800 | 0.499 | 88 | 0.246 | 49.2

When considering the participation in the poll, the average percentage of registered users doing so ranges from 0.08% to 1.51% (Table 1).
A formal response rate can be calculated for the student cohort recruited through the University of Guam mail system. The total e-mail list contains 3082 addresses. In total, 210 responses were received, resulting in a response rate of 6.8%.

3. Results
3.1. Demographics
In total, 4198 surveys were commenced, 422 by general online (not Reddit) users and 3776 by Reddit users. The two online cohorts show a gender bias with male respondents significantly overrepresented both among the non-Reddit (χ2 = 3.92, df = 1, p = 0.0476) and the Reddit population (χ2 = 1758.59, df = 1, p < 0.0001). When examining the gender differential among the major Reddit cohorts, women respondents are significantly better represented among the mental health Reddits (53.3%, n = 210; χ2 = 39.65, df = 1, p < 0.0001) than among the general population (30.0%, n = 793) and the outdoor activities related Reddits (31.0%, n = 786; χ2 = 35.87, df = 1, p < 0.0001).
The representation of female respondents among adventure activities related Reddits is a sixth of that of the male respondents (14.3%, n = 1613, χ2 = 723.74, df = 1, p < 0.0001).
The gender representation varies between five-year age cohorts, with female representation among the non-Reddit respondents rising from 31.3% among the 16–19 years old age cohort to 70% among the 65–69 years old age cohort. No such trend is observable among Reddit respondents (Table 2).
When looking at the age structure of the Reddit respondent population by gender, differences emerge (Figure 4). While the age curves generally track in a similar fashion, the general respondent cohorts tend to be younger than those in the adventure cohorts and those in the general outdoor cohort. Among both genders, adventure cohort respondents show a peak in the 25–29 year age bracket, while the outdoor cohort respondents peak in the 30–34 year age bracket. Among the general Reddit population, the age structure of female respondents shows a distinct peak in the 16–19 year age bracket, while among men, it is more diffuse, spanning the 16–34 age range (Figure 4).
Table 2. Gender and age breakdown of the Reddit and non-Reddit respondent population.
Age | Reddit: Men (%) | Reddit: Women (%) | Reddit: n | Non-Reddit: Men (%) | Non-Reddit: Women (%) | Non-Reddit: n
16–19 | 77.3 | 22.7 | 242 | 68.8 | 31.3 | 16
20–24 | 74.4 | 25.6 | 687 | 67.3 | 32.7 | 55
25–29 | 73.4 | 26.6 | 804 | 64.1 | 35.9 | 64
30–34 | 76.0 | 24.0 | 663 | 54.2 | 45.8 | 48
35–39 | 77.5 | 22.5 | 364 | 40.8 | 59.2 | 49
40–44 | 81.8 | 18.2 | 242 | 46.4 | 53.6 | 28
45–49 | 81.5 | 18.5 | 130 | 52.0 | 48.0 | 25
50–54 | 78.6 | 21.4 | 117 | 48.4 | 51.6 | 31
55–59 | 69.8 | 30.2 | 63 | 46.2 | 53.8 | 26
60–64 | 64.6 | 35.4 | 48 | 47.4 | 52.6 | 19
65–69 | 88.2 | 11.8 | 17 | 30.0 | 70.0 | 10
70+ | 100.0 | 0.0 | 4 | 37.5 | 62.5 | 8
All | 75.8 | 24.2 | 3381 | 53.6 | 46.4 | 379

Figure 4. Age structure of the Reddit respondent population by gender. (a) men; (b) women. (The general category includes all sub-Reddits not classified as adventure, outdoor or mental health.)
The non-Reddit user respondents came from 25 countries, primarily the U.S.A. (48.65%), Australia (35.1%) and Canada (4%). The Reddit respondents came from 68 different countries, primarily the U.S.A. (66.6%), Canada (8%), the UK (5.8%) and Australia (4.2%). On a major geographic scale, the Reddit population is dominated by participants from North America (64.6%), followed by Europe (14.5%) and Australia/New Zealand (5.2%). Least represented are the Middle East, Latin America and South East Asia (0.2% each), as well as East Asia and Africa (0.4% each).

3.2. Participant Attrition
Participant attrition rates (PAR) were assessed by establishing how many pages of the multi-page online survey a given participant completed before they abandoned the survey. In the online form delivered by SurveyMonkey, Page 1 was the participant information documentation and the invitation of the survey. The count started when participants progressed from page 1 to page 2, thus equating the start of page 2 with 100% participation. Pages 2 to 9 covered demographics and questions related to general attitudes towards risk and social determinants of risk taking (coded as P2–P9 in the graphs).
P10 was the first page with questions related to specific activities, which also explained what was asked. The following seven pages are related to specific activities. These were presented to the participant in a randomized fashion to ensure that each had an equal chance of being
answered in those cases where participants did not complete the survey (coded as R1–R7 in the graphs). In survey forms using the standard page layout (in paper or PDF), the first page equates to P2 to P9 and the obverse page to P10 and R1–R7.

3.2.1. Effects of the Mode of Submission
The different modes of submission resulted in different PARs. In the case of the physical survey, which followed a standard page layout (in paper or PDF), the respondents tended to fully or almost fully complete the first page (P2 to P9 equivalent), but the PAR dropped after the first set of questions related to specific activities (P10 equivalent) (Figure 5). Thereafter, the PAR remained stable among paper surveys, whereas it continued to drop, albeit gradually (final PAR 16.3%), among respondents filling out the PDF versions (final PAR 12.8%). The difference between the two PAR trajectories is very significant (paired t-test, p = 0.0017). By comparison, the PAR of respondents using online forms dropped following the first set of demographic questions (P3), remained stable until the end of the section dealing with questions related to general attitudes towards risk and social determinants of risk-taking (P4–P9) but then dropped off steeply for the questions related to specific activities (Figure 5). The same trajectory was observed among Reddit users, except that the PAR already dropped continually among the section dealing with questions related to general attitudes towards risk and social determinants of risk-taking (final PAR 42.1%). The decline in PAR for the questions related to specific activities was steeper than that of non-Reddit participants dropping to a final PAR value of 61.1%. While the PAR decay curves (Figure 5) show a very high level of correlation (r = 0.998), the difference between the two PAR trajectories is highly significant (paired t-test, p < 0.0001).

Figure 5. Differences in participant attrition between various modes of submission.

The gender differences in participant attrition for paper and PDF versions are shown in Figure 6, with the greater final PAR by men filling out paper versions (92.6%) and the least loss among men filling out PDF versions (98.4%).
The remainder of the discussion of results focuses solely on participant attrition rates observed using online surveys hosted on SurveyMonkey.

Figure 6. Differences in participant attrition by mode of submission and gender. (a) paper and pdf submission; (b) Reddit and non-Reddit online cohorts.

3.2.2. Effects of Gender on Participant Attrition among Reddit and Non-Reddit Cohorts
Gender differentiation in PAR can be observed both for Reddit and non-Reddit online cohorts (Figure 6b). While the PAR trajectories among men and women show a high level of correlation in the non-Reddit online cohort (r = 0.987) and a very high level of correlation in the Reddit cohort (r = 0.999), women have a very significantly lower PAR than men in both the non-Reddit and the Reddit cohorts (both at p
Figure 7. Differences in participant attrition among online survey respondents between major sub-Reddit cohort groups. The curve for non-Reddit online surveys is shown for comparison. (a) Men; (b) Women.

3.2.3. Effects of Age on Participant Attrition among Reddit and Non-Reddit Cohorts
To test whether a participant's age has an influence on attrition rates, male and female respondents were grouped into ten-year age cohorts (Figure 8). Among both genders, increasing age had a positive effect on survey completion rates. For men of the general online cohorts (Figure 8a), the PAR of the age group 55+ was significantly less than all other age cohorts (paired t-test; range: p = 0.0001 for 35–44 year–p = 0.0096 for 44–54 year). For men of the Reddit cohorts (Figure 8b), the PAR of the age group 55+ was also very significantly less than all other age cohorts with p < 0.0001 for all except for the 44–45 year cohort (p = 0.01). Among women, the same trend can be observed among the Reddit cohorts (Figure 8d), where the age group 55+ was also very significantly less than all other age cohorts with p < 0.0001 for all except for the 44–45 year cohort (p = 0.0149), but not for the general online cohort (Figure 8c). Here the PAR for the age group 55+ was significantly less for all other age cohorts (range: p = 0.0002 for 45–54 year–p = 0.0161 for 34–44 year) with the exception of the 25–34 cohort (p = 0.8723).
Using the 55+ cohort, which shows the smallest PAR, as the yardstick, the PAR decay curves for the age groups of male (Figure 8b) and female (Figure 8d) Reddit respondents each show a very high level of correlation (r = 0.987–0.992 for men and r = 0.9787–0.9889 for women), while the correlation for the male and female general online cohorts (Figure 8a,d) is not significant.
Looking at the trajectories of PAR among men, the age groups 18–24 and 25–34 show the greatest and most rapid decline in PAR once the activity set questions were asked. The PAR curves for the Reddit cohorts all follow the same trajectory, with the PAR decreasing with each increase in age cohort (Figure 8b). The PAR curves show a high to a very high level of correlation depending on the combination of adjacent age cohorts tested, ranging from r = 0.995 (35–44 vs. 45–54) to r = 0.999 (18–24 vs. 25–34).
The PAR curves for women Reddit cohorts follow similar trajectories compared to those of men but exhibit a more pronounced decrease once activity set questions were asked (Figure 8d). Differing from the men, however, the PAR curves for women respondents do not show a decrease in PAR with each increase in age cohort as the 35–44 year cohort shows a lower PAR than the 25–34 year cohort (Figure 8d). The PAR curves show a moderate to a high level of correlation depending on the combination of adjacent age cohorts tested, ranging from r = 0.978 (45–54 vs. 55+) to r = 0.992 (25–34 vs. 35–44). Among the general
online cohorts of women respondents, the PAR curves are much more diverse without a clear pattern (Figure 8c). While the 55+ cohort shows the smallest increase in PAR (final value 78.6%), the 18–24 year cohort shows the steepest and greatest increase in PAR (final value 41.1%). Compared to the Reddit cohorts, which showed a gradual increase in PAR even among the attitude questions (P4–P9), the women respondents of the general online cohorts exhibited a high level of perseverance, with the younger age groups (18–24 and 25–34) maintaining 100% until P8 and P9 respectively. Two other cohorts (35–44 and 55+) maintained a PAR of over 95% until P9. From P10 onwards, the PAR increased rapidly in these four age groups. Only the 44–54 year cohort showed a gradual, almost linear increase in PAR (Figure 8c). The correlations of the curves are not significant or only moderately significant.

Figure 8. Differences in participant attrition among male respondents by age group for general and Reddit cohorts. (a) Men: general online cohorts; (b) Men: Reddit cohorts; (c) Women: general online cohorts; (d) Women: Reddit cohorts.