
Research:Knowledge Gaps Index/Measurement/Readers Survey 2023

From Meta, a Wikimedia project coordination wiki
Tracked in Phabricator: Task T341890
Created: 11:19, 6 September 2023 (UTC)
Duration: 2023-07 – ??-??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


This project aims to understand the demographics and motivations of Wikipedia readers across language editions. It is part of the Knowledge Gaps Index focus on readers of Wikipedias and continues the work of the 2019 Readers survey.

Progress on this project can be followed at T341890.

Key Takeaways


Here is a summary of the 2023 Global Readers Survey results. For the full results, please see the Results section.

Demographics


Age

  • Wikipedia readers skew young, although this varies by project. Readers aged 18-24 make up a plurality of those 18+ in nearly all surveyed projects (the exceptions are dewiki and nlwiki).

Gender

  • Wikipedia readers across all projects disproportionately identify solely as men. By project, the readerships of ukwiki, rowiki, and ruwiki are closest to gender parity.

Education

  • Wikipedia readers are highly educated, and many are current students. In every surveyed project, a majority of readers under age 30 are current students.

Language

  • Wikipedia readers are highly multilingual: a majority speak two or more languages fluently. However, readers of any given Wikipedia are overwhelmingly reading in (one of) their primary language(s). Readers of enwiki are the most likely to be reading in a language that is not one of their primary languages.

Reader Behavior

  • Consistent with previous research, we observe a pronounced gender gap in the length of reading sessions. However, an updated gender identity survey item allows us to determine that readers who identify solely as women read substantially fewer articles per reading session than readers of any other gender identity.
  • We observe some mixed evidence for gender-based differences in topical preferences. More research is needed to better understand this relationship.
  • Watch this page for planned future analysis!

Data Collection


This project employed simple random sampling of Wikipedia readers using the QuickSurveys extension. The QuickSurveys opt-in was displayed to non-logged-in users and asked whether they would like to participate in a survey to help improve Wikipedia. Survey responses were collected using LimeSurvey, an external survey tool.

The goals of the survey are to make demographic estimates of Wikipedia readers across different language projects within the scope of the Knowledge Gaps Index, to understand motivations for reading Wikipedia, and to analyze whether there are differences by motivation and demographics in who reads which type of content.

Analyses of the survey data will primarily follow the 2019 edition of the survey.

Timeline

Date | Milestone
September 22–October 4, 2023 | enwiki pilot survey
November 14–22, 2023 | enwiki full sample survey
November 30–December 14, 2023 | arwiki, cswiki, dewiki, elwiki, eswiki, fawiki, frwiki, hewiki, hiwiki, jawiki, kowiki, nlwiki, plwiki, ptwiki, rowiki, ruwiki, trwiki, viwiki, ukwiki, zhwiki full sample surveys

Policy, Ethics and Human Subjects Research


This survey is governed by the Global Readers Survey privacy statement.


Survey Administration Results


Surveys were fielded across 23 projects from November 14 to December 18, 2023. A total of 80,242 complete survey responses were collected.

Global Readers Demographic Survey (2023) Fielding Summary
Project | Fielding dates (DD/MM, 2023) | QuickSurveys sampling ratio | Total LimeSurvey initiations | Total completes
arwiki (Arabic) | 28/11–18/12 | 12.4% | 40526 | 5186
cswiki (Czech) | 28/11–11/12 | 10.0% | 4592 | 1618
dewiki (German) | 28/11–11/12 | 5.2% | 23797 | 9589
elwiki (Greek) | 28/11–13/12 | 20.0% | 5557 | 1537
enwiki (English) | 14/11–22/11 | 2.0% | 40497 | 9479
eswiki (Spanish) | 28/11–18/12 | 6.6% | 39071 | 8769
fawiki (Farsi) | 28/11–11/12 | 2.1% | 7689 | 1850
frwiki (French) | 28/11–11/12 | 9.7% | 23368 | 6617
hewiki (Hebrew) | 28/11–13/12 | 8.6% | 5044 | 1609
hiwiki (Hindi) | 28/11–18/12 | 20.0% | 62278 | 715
idwiki (Indonesian) | 28/11–13/12 | 15.0% | 13671 | 1516
itwiki (Italian) | 28/11–11/12 | 2.5% | 5855 | 1996
jawiki (Japanese) | 28/11–18/12 | 1.6% | 7023 | 1905
kowiki (Korean) | 28/11–18/12 | 20.0% | 12800 | 1575
nlwiki (Dutch) | 28/11–18/12 | 7.5% | 5420 | 1528
plwiki (Polish) | 28/11–11/12 | 7.5% | 12341 | 3672
ptwiki (Portuguese) | 28/11–18/12 | 15.0% | 32457 | 4619
rowiki (Romanian) | 28/11–11/12 | 21.1% | 9820 | 2399
ruwiki (Russian) | 28/11–18/12 | 1.5% | 15637 | 5357
trwiki (Turkish) | 28/11–11/12 | 7.5% | 8568 | 1792
ukwiki (Ukrainian) | 28/11–11/12 | 6.4% | 6576 | 2094
viwiki (Vietnamese) | 28/11–18/12 | 7.5% | 6111 | 1075
zhwiki (Simplified and Traditional Chinese) | 28/11–18/12 | 7.1% | 15841 | 3745

Response Results


Age Screener


Only readers aged 18 years and older were considered eligible for the survey. As a result, all readers who opted into the survey were first shown an age-based screener question. Those who indicated they were under 18 had their survey sessions terminated.

Are you at least 18 years of age?
○ Yes
○ No

Unfortunately, legal protections for people under 18 mean we cannot survey you. Thank you for your interest!

Reader Motivation


Consistent with previous survey research conducted by the Wikimedia Foundation [1] [2], we asked readers about their motivations for reading Wikipedia. However, in this survey, we allowed respondents to select multiple motivations and to write in other motivations that were not listed as answer options.

I am reading this article because ...

Please select all answers that apply

□ I have a work or school-related assignment
□ I need to make a personal decision based on this topic (e.g., buy a book, choose a travel destination)
□ I want to know more about a current event (e.g., a soccer game, a recent earthquake, somebody’s death)
□ the topic was referenced in a piece of media (e.g., TV, radio, article, film, book)
□ the topic came up in a conversation
□ I am bored or randomly exploring Wikipedia for fun
□ this topic is important to me and I want to learn more about it (e.g., to learn about a culture)
□ Other:__________________________

When asked what motivated them to read the article from which they were sampled during the 2023 Global Readers Survey, respondents overall were most likely to say that the article topic was "personally important" to them (respondents could select multiple motivations).

Barchart showing percentage of Wikipedia readers across 22 surveyed projects who selected each listed motivation for reading the article they were sampled from (readers could select more than one motivation)
Motivations cited by Wikipedia readers for reading the article they were sampled from

Similarly, at the project level, readers of all surveyed projects except for Korean Wikipedia were most likely to say they were reading the article because it is personally important to them. Korean Wikipedia readers were most likely to say they were "bored or randomly exploring Wikipedia for fun".

Faceted barchart showing percentage of readers citing each listed motivation for reading articles in each of 22 surveyed projects.
Cited motivations for reading article by readers of each surveyed Wikipedia project

Reader Information Needs


Again, following previous survey research, we asked readers about the specific information needs that motivated them to read the article from which they were sampled.

I am reading this article to …
○ look up a specific fact or to get a quick answer
○ get an overview of the topic
○ get an in-depth understanding of the topic

Overall, Wikipedia readers are most likely to say they are reading to "get an overview of the topic". However, reader information needs are fairly evenly distributed: 41.2% say they are reading for an "overview", 32.1% to "look up a specific fact or to get a quick answer", and 26.0% to "get an in-depth understanding of the topic".

Barchart showing Wikipedia readers' information needs
Information sought in Wikipedia articles by readers

At the project level, Farsi Wikipedia readers are most likely to say they are looking for "an in-depth understanding" (52.5%), Hebrew Wikipedia readers are most likely to say they are seeking "an overview" (50.0%), and Vietnamese Wikipedia readers are most likely to say they need to "look up a specific fact or...get a quick answer" (42.2%).

Faceted barchart showing Wikipedia readers' information needs in each of 22 surveyed projects
Wikipedia readers' information needs by project

Reader Topic Prior Knowledge


To measure readers' prior knowledge of the topic of the article they were reading, we used the same survey question as in previous readers surveys.

Prior to visiting this article … 
○ I was already familiar with the topic
○ I was not familiar with the topic, and I am learning about it for the first time

Overall, Wikipedia readers are more likely to say that they are already familiar with the topic they are reading about (55.0%) than not (44.2%).

Barchart showing Wikipedia readers' level of prior familiarity with the topic of the article they were reading when sampled
Wikipedia readers' topic familiarity

Readers of most language projects are similarly more likely to be reading about topics with which they are already familiar. There are some exceptions, however: readers of Chinese Wikipedia are particularly likely to be reading about unfamiliar topics (59.5%). In contrast, Dutch Wikipedia readers are the most likely to be reading about familiar topics (74.6%).

Faceted bar chart showing reader topic familiarity across each of 22 surveyed projects
Wikipedia readers' topic familiarity by project

Reader Age


Age has been robustly associated with a broad range of social attitudes and behaviors[1][2] (and even survey response quality[3]) in addition to internet use and digital proficiency. Moreover, previous Wikimedia Foundation research has found that Wikipedia readers are disproportionately young. We measured age with the following item drawn from the Community Insights survey.

What is your age?

○ 18-24
○ 25-29
○ 30-39
○ 40-49
○ 50-59
○ 60-69
○ 70+
○ I prefer not to say

Of those 18 and older, respondents across all surveyed projects are most likely to be aged 18-24 (27.9% of readers 18+). However, the age distribution of readers varies considerably across the surveyed projects.

Bar chart showing the distribution of Wikipedia readers across all sampled projects (18+ respondents only)
Reader age across all sampled projects (18+ only)

In particular, readers of Vietnamese Wikipedia are most likely to be under the age of 30 (61.5% aged 18-29), while Dutch Wikipedia (21.8% aged 18-29) and German Wikipedia (21.0% aged 18-29) readers are least likely to be under the age of 30.

Faceted bar chart showing the age distribution of readers of 22 different Wikipedia projects (18+ readers only)
Age of Wikipedia readers by surveyed project (respondents 18+ only)
Bar chart showing the share of readers aged 18-29 across 22 surveyed Wikipedia projects
Wikipedia readers aged 18-29 by surveyed project

Reader Gender Identity


Johnson et al. (2021)[4] demonstrate key gender differences in Wikipedia readership: specifically, that men are overrepresented among Wikipedia readers, that they read more frequently and in longer sessions, and that men and women show distinctive topical preferences. This is consistent with the well-known gender bias of Wikipedia content and the persistent overrepresentation of men among Wikipedia editors.

In order to facilitate comparisons between surveys of Wikipedia readers and contributors to Wikimedia projects, this research employed a gender identity survey item aligned with that used in the 2024 Community Insights survey. Note that respondents to the arwiki, fawiki, and inwiki surveys were not presented with the "transgender", "non-binary", and "genderfluid" response options.

Which of these categories describe your gender identity? Select all that apply.

□ Man
□ Woman
□ Transgender
□ Non-binary
□ Genderfluid
□ Other: _________________
○ I prefer not to say

Across all surveyed projects, a clear majority (63.3%) of respondents identified solely as men, 25.1% identified solely as women, 6.4% identified as genderdiverse, and 5.1% declined to provide an answer.
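To illustrate how multi-select answers can be collapsed into the mutually exclusive categories reported above, here is a minimal Python sketch. The coding rule is an assumption, not the documented one: exactly "Man" counts as men only, exactly "Woman" as women only, an empty or "I prefer not to say" answer as declined, and every other selection (including several boxes at once) as genderdiverse. The function name is hypothetical.

```python
from collections import Counter

MAN, WOMAN = "Man", "Woman"
DECLINED = "I prefer not to say"


def recode_gender(selected: set[str]) -> str:
    """Map a multi-select gender answer to one mutually exclusive category.

    Assumed rule (for illustration only): exactly {"Man"} -> "men only",
    exactly {"Woman"} -> "women only", empty or declined -> "declined",
    anything else (e.g. "Transgender", "Non-binary", "Genderfluid", a
    write-in, or several boxes) -> "genderdiverse".
    """
    if not selected or selected == {DECLINED}:
        return "declined"
    if selected == {MAN}:
        return "men only"
    if selected == {WOMAN}:
        return "women only"
    return "genderdiverse"


# Hypothetical answers, not survey data.
responses = [{"Man"}, {"Woman"}, {"Man", "Transgender"}, {"Non-binary"}, set()]
print(Counter(recode_gender(r) for r in responses))
```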

Bar chart showing the distribution of gender identities selected by Wikipedia readers across 22 surveyed projects (responses have been recoded to mutually exclusive categories)
Gender identities of Wikipedia readers (responses recoded to mutually exclusive categories)

Readers identifying solely as men made up an outright majority in every surveyed project, but projects like Romanian Wikipedia (54.6% of readers identifying solely as men) and Ukrainian Wikipedia (51.7%) are substantially closer to gender parity than projects like Turkish Wikipedia (71.7%) or Indonesian Wikipedia (70.6%).

Bar chart showing the proportion of readers who identify solely as men across 22 surveyed Wikipedia projects
Share of readers who identify solely as men by surveyed Wikipedia project

Reader Education


As summarized in the Taxonomy of Knowledge Gaps, a substantial body of research demonstrates that Wikipedia readers are disproportionately highly educated. A related body of research suggests that English Wikipedia articles[5] may not be readable for readers with lower levels of literacy, particularly for health-related content[6][7], while more recent research suggests these findings extend to most other language versions.

Measuring educational attainment cross-nationally is a longstanding methodological challenge in survey research[8]. This is further complicated in our case by the fact that Global Readers surveys are designed and sampled by language project rather than by geography (e.g., enwiki respondents alone are educated under a wide variety of very different educational systems). We also sought to balance survey item simplicity with cross-system comparability. Together, these constraints made it difficult for us to substantially localize our measures of educational attainment.

In this survey, we measured education with two survey items: one asking whether respondents were currently enrolled as students, and a follow-up item asking non-students to indicate their level of educational attainment, based loosely on the ISCED-1997 classifications. We used this scheme, rather than the years of education completed used in previous readers survey research, to facilitate more direct comparisons both cross-nationally[9] and with Community Insights data on contributors.

Are you currently enrolled as a student in school (for example, high school, vocational or trade school, a college or university)?
○ Yes
○ No
○ I'm not sure
○ I prefer not to say

Only shown to respondents who selected "No" above

What is the highest level of formal education you have completed?
○ I have no formal schooling
○ Some primary or elementary school
○ Primary or elementary school
○ Lower secondary or middle school
○ Upper secondary or high school
○ A post-secondary technical or vocational degree or certificate
○ A post-secondary or university degree
○ A post-graduate degree (e.g., master's, doctorate, or professional degree)
○ I'm not sure
○ I prefer not to say

Current students


Substantial shares of readers in every surveyed project indicated that they are currently enrolled students, although this share varies considerably, from fewer than one-in-five among Dutch (19.7%) and German (19.5%) Wikipedia readers to an outright majority of Vietnamese Wikipedia readers (54.2%).

Bar chart showing the proportion of readers who are currently enrolled as students across 22 surveyed projects
Share of readers who are currently students by surveyed project

In addition, current students comprise a majority of younger readers (those 18-29) in each surveyed project.

Bar chart showing the proportion of respondents aged 18-29 who are currently enrolled as students across 22 surveyed projects
Proportion of readers aged 18-29 who are currently enrolled as students by project

Educational attainment (non-students)


Overall, Wikipedia readers are highly educated: a majority of non-students (56.0% total) have completed a Bachelor's degree (28.8%) or a post-graduate degree (27.2%).

Bar chart showing educational attainment for Wikipedia readers (non-students)
Proportional shares of Wikipedia readers by educational attainment (non-students only)

At the project level, Indonesian Wikipedia readers are most likely to report an educational attainment at the upper secondary (high school) level or lower, while Polish Wikipedia readers are most likely to report holding a post-graduate degree.

Faceted bar chart showing the distribution of educational attainment of Wikipedia readers (non-students only) across 22 surveyed projects
Reader educational attainment (non-students only) by project

Among non-students, Ukrainian Wikipedia readers (76.5%) are most likely overall to report having at least a Bachelor's degree, while Indonesian Wikipedia readers are the least likely (38.1%) relative to other surveyed projects.

Bar chart showing the proportion of readers holding a Bachelor's degree or Post-graduate degree across 22 surveyed projects (non-students only)
Proportion of readers with Bachelor's or Post-graduate degrees by surveyed project (non-students only)

Reader Languages


In general, Wikipedia readers are highly multilingual. When asked what languages they speak fluently, fewer than half (44%) say they are fluent in only one language, while more than one-in-five (21.5%) say they speak three or more fluently. However, readers are overwhelmingly reading in (one of) their primary language(s).

Bar chart showing the distribution of readers by the number of languages they speak fluently
Readers by the number of languages they speak fluently

In all but one surveyed project, about nine-in-ten (or more) readers say they are reading in one of their primary languages. The relative exception to this finding is English Wikipedia, where more than one-in-four say English is not one of their primary languages.


Bar chart showing the share of readers who are reading in (one of) their primary language(s) by project
Readers reading in (one of) their primary language(s) by project

In contrast, the prevalence of monolinguality in the project language varies considerably by project. In general, East Asian language projects (and Greek Wikipedia) show the highest levels of monolinguality among readers—especially readers of Japanese Wikipedia (90.1%). Conversely, readers of German Wikipedia (22.3%) and Turkish Wikipedia (22.8%) were least likely to say they were monolingual in the project language.

A bar chart showing the share of readers in each of 22 surveyed projects with monolingual fluency in the project language
Monolingual fluency among Wikipedia readers across 22 surveyed projects

Reader Identities


To measure cultural background gaps, as described in the Taxonomy of Knowledge Gaps, we employ survey items adapted from the European Social Survey[10] (and also used in the Community Insights survey of Wikimedia contributors) designed to measure whether respondents belong to:

  • A minority ethnicity in the country where they live
  • A group that is discriminated against in the country where they live
  • Why they are discriminated against (if applicable)

Minority Ethnicity

Do you belong to a minority ethnic group in the country where you currently live?
○ Yes
○ No
○ I'm not sure
○ I prefer not to say

The UN Office of the High Commissioner for Human Rights (OHCHR) roughly estimates that 10-20 percent of the world population belongs to a national, ethnic, religious, or linguistic minority. This is broadly consistent with our Global Readers sample, in which 15% of respondents indicate that they belong to an ethnic minority in the country where they live.

Bar chart showing proportion of Wikipedia readers belonging to a minority ethnic group in their country
Share of Wikipedia readers belonging to a minority ethnic group

At the project level, readers of idwiki (20.7%) and enwiki (19.6%) are most likely to identify as an ethnic minority. Conversely, readers of itwiki (4.0%) and elwiki (3.7%) are least likely to identify as belonging to a minority ethnic group.

Bar chart showing proportions of Wikipedia readers belonging to a minority ethnic group in their country in each of 22 surveyed projects
Share of Wikipedia readers belonging to a minority ethnic group in each surveyed project

Discriminated Group Belonging

Sometimes people are discriminated against based on characteristics like abilities, physical appearance, or group belonging.

Would you describe yourself as a member of a group that has been discriminated against in the country where you currently live?

○ Yes
○ No
○ I'm not sure
○ I prefer not to say

One-in-four (25.0%) readers indicated that they belong to a group that is discriminated against in the country where they live. These findings are broadly similar to those reported in the 2023 Community Insights survey of Wikimedia contributors.

Bar chart showing proportion of Wikipedia readers who say they belong to a group discriminated against in their country
Share of Wikipedia readers belonging to discriminated groups

Readers of English Wikipedia appear most likely to describe themselves as belonging to a discriminated group (31.9%), while readers of Vietnamese Wikipedia are the least likely to identify that way (5.5%). Unfortunately, we are not yet able to determine the extent to which project-level variation on this item reflects different experiences, varying levels of willingness to identify as belonging to a marginalized group, or varying understandings of what it means to be discriminated against.

Bar chart showing proportion of Wikipedia readers who say they belong to a group discriminated against in their country in each surveyed project
Share of Wikipedia readers belonging to discriminated groups by project

Readers who indicated that they belonged to a discriminated group were then asked to indicate on what grounds their identity/identities are discriminated against. Respondents were able to select as many as applied. Overall, readers were most likely to say they were discriminated against due to their gender (29.4%) or their skin color or race (27.6%).

Bar chart showing the reasons Wikipedia readers experience discrimination, ordered by frequency named (respondents can select more than one option)
Reasons for discrimination named by Wikipedia readers, ordered by frequency

Methodology


Sampling


This project employed simple random sampling of Wikipedia readers using the QuickSurveys extension. Sampling rates vary by project and are shown above. The QuickSurveys opt-in was displayed only to non-logged-in readers and asked whether they would consent to "Take a short survey and help us improve Wikipedia". We chose the QuickSurveys tool to sample readers (rather than, e.g., a Central Notice Banner) both for consistency with previous readers research conducted by the Wikimedia Foundation and to avoid sampling readers from non-article pages (e.g., talk pages, community pages, Wikipedia home pages).
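The per-project sampling ratios in the fielding summary can be pictured with the following sketch, which approximates simple random sampling as independent (Bernoulli) selection of eligible pageviews at the project's ratio. It is not the QuickSurveys extension's code; the `Pageview` record and the `shows_opt_in` function are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Pageview:
    # Hypothetical fields, for illustration only.
    project: str
    is_article: bool
    logged_in: bool


def shows_opt_in(view: Pageview, sampling_ratio: float, rng: random.Random) -> bool:
    """Decide whether this pageview displays the survey opt-in.

    Only article pages viewed by non-logged-in readers are eligible; each
    eligible view is then selected independently with probability
    `sampling_ratio` (e.g. 0.02 for a 2.0% ratio).
    """
    if view.logged_in or not view.is_article:
        return False
    return rng.random() < sampling_ratio


rng = random.Random(2023)  # fixed seed so the sketch is reproducible
views = [Pageview("enwiki", is_article=True, logged_in=False) for _ in range(100_000)]
shown = sum(shows_opt_in(v, 0.02, rng) for v in views)
print(shown / len(views))  # should be close to 0.02
```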

Global Readers Survey 2023 enwiki pilot Quicksurvey initiation message

Readers who consented to the survey were then linked out to a survey hosted on LimeSurvey, an open-source survey platform.

Screenshot of Limesurvey landing page for 2023 enwiki Global Readers Survey

Weighting


To account for the sampling design and to better match the global population of Wikipedia readers, we apply weights based on global population parameters, following the method described in DeBell and Krosnick (2009)[11] and implemented using the 'anesrake' software package for R.
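Raking (iterative proportional fitting), the core of this kind of weighting, repeatedly rescales respondent weights so that each weighting variable's weighted distribution matches a population target. The following is a minimal pure-Python sketch assuming complete categorical data; the function name, variable names, and targets are hypothetical, and the anesrake package used in this analysis adds features, such as weight capping, that are omitted here.

```python
from collections import defaultdict


def rake(rows, targets, max_iter=100, tol=1e-8):
    """Rake respondent weights to match target marginal distributions.

    rows:    one dict per respondent mapping a weighting variable to its
             category, e.g. {"os": "Android", "referrer": "search"}.
    targets: {variable: {category: target proportion}}; each variable's
             proportions should sum to 1, and every category present in
             `rows` must appear in the targets.
    Returns weights normalised to average 1.0.
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, target in targets.items():
            # Current weighted total of each category of this variable.
            totals = defaultdict(float)
            for row, w in zip(rows, weights):
                totals[row[var]] += w
            grand_total = sum(totals.values())
            # Rescale so this variable's weighted margin hits its target.
            factors = {cat: target[cat] * grand_total / t for cat, t in totals.items()}
            weights = [w * factors[row[var]] for row, w in zip(rows, weights)]
            largest_adjustment = max(
                largest_adjustment, max(abs(f - 1.0) for f in factors.values())
            )
        # Stop once a full pass over all variables barely changes any weight.
        if largest_adjustment < tol:
            break
    scale = len(weights) / sum(weights)
    return [w * scale for w in weights]


# Hypothetical respondents and population margins (not survey data).
rows = [
    {"os": "Android", "referrer": "search"},
    {"os": "Android", "referrer": "internal"},
    {"os": "iOS", "referrer": "search"},
    {"os": "Windows", "referrer": "search"},
]
targets = {
    "os": {"Android": 0.5, "iOS": 0.3, "Windows": 0.2},
    "referrer": {"search": 0.7, "internal": 0.3},
}
weights = rake(rows, targets)
print([round(w, 3) for w in weights])  # [0.8, 1.2, 1.2, 0.8]
```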

Survey responses were weighted at the project level by OS family (Android, iOS, Windows, other), referrer class (external via search engine, internal, other), session length (one, two, three or more), and geography (weighting categories vary by project). For analyses at the global level, responses were also weighted by each project's share of overall traffic while the surveys were in the field.
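One simple way to combine within-project weights with project traffic shares for global estimates is sketched below. The rescaling rule (each project's total weight set equal to its share of traffic) is an assumption for illustration, and `global_weights` is a hypothetical helper, not the analysis code.

```python
from collections import defaultdict


def global_weights(projects, project_weights, traffic_share):
    """Rescale within-project weights so each project's total weight equals
    its share of overall traffic (assumed to sum to 1 across projects).

    projects:        project code for each respondent, e.g. "enwiki".
    project_weights: each respondent's within-project (raked) weight.
    traffic_share:   {project: share of overall traffic during fielding}.
    """
    totals = defaultdict(float)
    for project, w in zip(projects, project_weights):
        totals[project] += w
    return [
        w / totals[project] * traffic_share[project]
        for project, w in zip(projects, project_weights)
    ]


# Hypothetical example: two enwiki respondents and one dewiki respondent.
projects = ["enwiki", "enwiki", "dewiki"]
weights = global_weights(projects, [1.0, 1.0, 1.0], {"enwiki": 0.6, "dewiki": 0.4})
print(weights)  # [0.3, 0.3, 0.4]
```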

Reader Behavior


Overall Topic Prevalence


Using webrequest data, survey responses can be linked to the pages each reader viewed during the reading session from which they were sampled. These pages can then be classified into one or more of 64 topics using a language-agnostic topic classification method developed by Johnson, Gerlach, and Sáez-Trumper (2021)[12]. Topic classifications recorded for each respondent are not mutually exclusive at either the article or the reading session/respondent level. That is, any given article may be classified into multiple topics, and respondents may read multiple articles per session.
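Because topics are not mutually exclusive, topic prevalence can be computed as the (weighted) share of respondents whose session contains at least one article carrying a given topic. A minimal sketch, assuming hypothetical article IDs and topic sets rather than real webrequest data; `topic_prevalence` is a hypothetical helper.

```python
from collections import defaultdict


def topic_prevalence(sessions, article_topics, weights=None):
    """Share of respondents who viewed at least one article in each topic.

    sessions:       list of lists of article IDs viewed in each session.
    article_topics: {article_id: set of topic labels}; an article may carry
                    several topics, so shares can sum to more than 1.
    weights:        optional survey weights (default 1.0 per respondent).
    """
    if weights is None:
        weights = [1.0] * len(sessions)
    hits = defaultdict(float)
    for articles, w in zip(sessions, weights):
        # Union over the session: each topic counts once per respondent.
        topics = set().union(*(article_topics.get(a, set()) for a in articles))
        for topic in topics:
            hits[topic] += w
    total = sum(weights)
    return {topic: h / total for topic, h in hits.items()}


# Hypothetical articles and sessions (not survey data).
article_topics = {
    "A": {"Culture", "Geography"},
    "B": {"STEM"},
    "C": {"Culture"},
}
sessions = [["A", "C"], ["B"], ["C"], []]
print(topic_prevalence(sessions, article_topics))
# Culture: 0.5, Geography: 0.25, STEM: 0.25 (key order may vary)
```

Shares computed this way can sum to more than 100%, which is consistent with the top-level figures reported below adding up to well over 100%.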

At the most general level, Wikipedia articles are classified under four "top-level" topics: "Culture", "Geography", "STEM", and "History & Society". Overall, a majority of Global Readers Survey 2023 respondents (53%) viewed a Culture-related article. Geography-related articles were viewed by 43% of respondents, while readers were much less likely to view either STEM-related (19%) or History & Society-related (18%) articles.

Bar plot showing the proportion of surveyed readers reading articles in each (top-level) topic during the reading session from which they were sampled

Next, the figure below depicts the proportion of readers who read articles on a given topic during the reading session from which they were sampled for the survey. Only "bottom-level" topics are shown, to facilitate like-for-like comparisons. For example, every article about North America (a region within the Americas) is simultaneously a Geography article, a Geography article about a region, and a Geography article about a region within the Americas, so we show statistics only for the most specific topic, "Geography.Regions.Americas.North_America".
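The restriction to "bottom-level" topics can be expressed with the dotted topic names themselves: each label implies all of its dotted prefixes, and a label is bottom-level if no other label in the set refines it. A minimal sketch; `implied_topics` and `bottom_level` are hypothetical helpers.

```python
def implied_topics(label):
    """Return every topic implied by a dotted label, coarsest first."""
    parts = label.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def bottom_level(labels):
    """Keep only labels that no other label in the set refines."""
    labels = set(labels)
    return {t for t in labels if not any(o.startswith(t + ".") for o in labels)}


# Expanding article topics into all implied levels would count one article
# under several nested topics; keeping only the most specific labels avoids
# comparing a topic with its own parents.
expanded = set()
for label in ["Geography.Regions.Americas.North_America", "Culture.Media.Music"]:
    expanded.update(implied_topics(label))
print(sorted(bottom_level(expanded)))
# ['Culture.Media.Music', 'Geography.Regions.Americas.North_America']
```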

Dot plot showing the proportion of readers who read articles on a given topic (topics are not mutually exclusive)

Overall, the most frequently read Geography topic is "Geography.Regions.Americas.North_America" (visited by 10.3% of readers). The most frequently read Culture topic is "Culture.Media.Music" (visited by 7.7% of readers), the most frequently read History and Society topic is "History_and_Society.Politics_and_government" (visited by 5.0% of readers), and the top STEM topic is "STEM.Biology" (visited by 3.8% of readers).


Reader Behavior by Gender Identity


Building on findings from Johnson et al. (2021)[4], we examine how the way readers interact with Wikipedia varies by their gender identities.

Session length by Gender Identity


For this research, we define session length as the number of webrequests logged within a reading session, including pages viewed both before and after the survey response. This follows the method described in Singer et al. (2017) and used in the 2019 Global Readers survey.[13] In other words, session length is measured in articles viewed, not in time elapsed. Overall, the weighted mean session length for 2023 Global Readers Survey respondents across all projects was 4.2 articles viewed.
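The weighted mean session length, and the comparison by gender identity that follows, amount to weighted averages of per-respondent webrequest counts. A minimal sketch with hypothetical respondents, weights, and helper names:

```python
from collections import defaultdict


def weighted_mean(values, weights):
    """Survey-weighted mean, e.g. of session length in articles viewed."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def weighted_mean_by_group(values, weights, groups):
    """Survey-weighted mean of `values` within each group label."""
    num, den = defaultdict(float), defaultdict(float)
    for v, w, g in zip(values, weights, groups):
        num[g] += v * w
        den[g] += w
    return {g: num[g] / den[g] for g in num}


# Hypothetical respondents: session length = number of webrequests in the
# sampled session (pages before and after the survey both count).
lengths = [1, 2, 7, 3]
weights = [0.5, 1.0, 1.5, 1.0]
groups = ["women only", "men only", "men only", "genderdiverse"]
print(weighted_mean(lengths, weights))  # 4.0
print(weighted_mean_by_group(lengths, weights, groups))
# {'women only': 1.0, 'men only': 5.0, 'genderdiverse': 3.0}
```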

Consistent with findings reported by Johnson et al. (2021)[4], we again observe a significant and substantial gender gap in session length, with readers identifying as women viewing fewer articles per reading session. However, the 2023 Global Readers Survey introduced a new gender identity item that allows respondents to select multiple gender identities. Using this item, our findings suggest that the gender gap in session length may be driven by readers who identify solely as women, who read distinctively fewer articles per reading session.

Bar plot showing readers' session length by their gender identities (coded to be mutually exclusive).

Similarly, when genderdiverse identities are shown as non-mutually-exclusive categories, it becomes clear that it is readers who identify solely as women who have distinctively shorter reading session lengths. This should complicate our understanding of the nature of the Wikipedia readership gender gap—and how to close it. For example, product and policy interventions aimed at closing the gender gap may have to specifically target readers who identify solely as women.

Bar plot showing reader session length by gender identity (not mutually exclusive)
Reader session length by gender identity (not mutually exclusive)

Topical Preferences by Gender Identity

Women's Biographies

Johnson et al. (2021)[4] find that readers identifying as women are generally more likely to read biographies of women than readers identifying as men (although men, as a majority of Wikipedia readers, nonetheless comprise an absolute majority of readers of women's biographies). We find a similar pattern in the 2023 Global Readers Survey: readers identifying (solely) as women are about 1.65× as likely to read women's biography articles as readers identifying (solely) as men. The figure below shows the probability that a reader will read a woman's biography article, conditional on their gender identity. For example, 2.7% (±1.4%) of readers identifying as genderdiverse viewed a biographical article about a woman during the reading session from which they were sampled for the survey.
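A comparison like "about 1.65× as likely" is a ratio of two weighted proportions (a risk ratio). The sketch below shows that arithmetic on made-up flags; the helper names and data are hypothetical, not the survey's.

```python
def weighted_share(flags, weights):
    """Weighted share of respondents whose flag is True."""
    return sum(w for f, w in zip(flags, weights) if f) / sum(weights)


def relative_likelihood(flags_a, weights_a, flags_b, weights_b):
    """How many times as likely group A is as group B to have the flag set
    (a weighted risk ratio)."""
    return weighted_share(flags_a, weights_a) / weighted_share(flags_b, weights_b)


# Hypothetical flags: did each respondent view a woman's biography?
women_only = [True, False, False, False]
men_only = [True, False, False, False, False, False, False, False]
ratio = relative_likelihood(women_only, [1.0] * 4, men_only, [1.0] * 8)
print(ratio)  # 2.0
```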

Bar plot showing the probability that a reader will read a woman's biography, by reader gender identity
Probability of reading a woman's biography by reader gender identity

However, some of the apparent "self-focused" gender gap in readership for women's biography articles is due to a broader gender gap in interest in biographical articles in general. When we focus only on those readers who viewed any biographical article in the course of their reading session, the gender gap begins to close: readers identifying (solely) as women are about 1.3× as likely to read women's biographies as readers identifying (solely) as men.
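Restricting to readers who viewed any biography changes the denominator of each group's proportion. A sketch of that conditional comparison, again with a hypothetical helper and toy data:

```python
def conditional_share(read_women_bio, read_any_bio, weights):
    """Weighted P(viewed a woman's biography | viewed any biography)."""
    num = sum(
        w for rw, rb, w in zip(read_women_bio, read_any_bio, weights) if rb and rw
    )
    den = sum(w for rb, w in zip(read_any_bio, weights) if rb)
    return num / den


# Hypothetical flags for two groups (not survey data).
women_any = [True, True, False, False]
women_wbio = [True, False, False, False]
men_any = [True, True, True, False, False, False, False, False]
men_wbio = [True, False, False, False, False, False, False, False]

ratio = conditional_share(women_wbio, women_any, [1.0] * 4) / conditional_share(
    men_wbio, men_any, [1.0] * 8
)
print(round(ratio, 3))  # 1.5
```

In this toy data the unconditional ratio is 2.0 (1 of 4 versus 1 of 8 respondents), so restricting to biography readers narrows it, mirroring the direction (not the size) of the pattern described above.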

Bar plot showing the probability that readers of biographies will read a biography of a woman, by reader gender identity

What's more, when we incorporate covariate adjustments for session length, age, access method, educational attainment, and urbanity in a logistic regression predicting P(reading a woman's biography), gender identity is not a statistically significant predictor. However, interpreting this finding is complicated by the fact that, e.g., age, educational attainment, and urbanity are also likely (although this is unknowable with our data) to be systematically related to survey response propensity, raising the possibility that adjusting for these covariates amounts to conditioning on a collider. Further research may therefore be required to better identify (and quantify) the extent of potential systematic nonresponse bias (e.g., by comparing demographic data collected via different sampling methods such as QuickSurveys, QuickSurveys link-out to LimeSurvey, and Central Notice Banner link-out to LimeSurvey).
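A model of the kind described could be written as follows. This is a sketch, not the documented specification: the reference category, the coding of covariates, and whether survey weights enter the fit are all assumptions. Here Y_i = 1 if respondent i viewed a woman's biography during the sampled session, G_ig indicates gender identity category g (one category serves as the reference), and x_i collects session length, age, access method, educational attainment, and urbanity. "Not a statistically significant predictor" corresponds to failing to reject the null hypothesis in the last line.

```latex
\begin{aligned}
\operatorname{logit} \Pr(Y_i = 1 \mid G_i, \mathbf{x}_i)
  &= \beta_0 + \sum_{g \neq \mathrm{ref}} \gamma_g \, G_{ig} + \boldsymbol{\beta}^{\top} \mathbf{x}_i, \\
\hat{p}_i
  &= \frac{1}{1 + \exp\!\bigl(-(\hat{\beta}_0 + \sum_{g \neq \mathrm{ref}} \hat{\gamma}_g \, G_{ig} + \hat{\boldsymbol{\beta}}^{\top} \mathbf{x}_i)\bigr)}, \\
H_0 &: \gamma_g = 0 \quad \text{for all } g \neq \mathrm{ref}.
\end{aligned}
```

The predicted probabilities \hat{p}_i, averaged within each gender identity category, are one way to obtain covariate-conditional estimates like those in the dot plot.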

A dot plot showing the predicted probability (conditional on covariates) of reading women's biographical articles
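A minimal, dependency-free sketch of the kind of covariate-adjusted model described above, under explicit assumptions. The covariate encoding (a single woman indicator, log session length, an 18–24 age indicator) and the synthetic data are hypothetical, and the page does not say how its predicted probabilities were computed. Here they are obtained by marginal standardization (setting every row's gender indicator to one value and averaging the predictions), which may differ from the method behind the dot plot. The model is fitted by Newton-Raphson in pure Python only to keep the example self-contained; in practice a library such as statsmodels (`Logit`) or R's `glm` would be used and would also provide the standard errors needed for a significance test.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X: list[list[float]], y: list[float], max_iter: int = 25, tol: float = 1e-10) -> list[float]:
    """Maximum-likelihood logistic regression fitted by Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k                     # gradient of the log-likelihood, X'(y - p)
        info = [[0.0] * k for _ in range(k)]  # Fisher information, X'WX
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for j in range(k):
                score[j] += (yi - p) * xi[j]
                for c in range(k):
                    info[j][c] += w * xi[j] * xi[c]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def adjusted_probability(beta: list[float], X: list[list[float]], woman: float) -> float:
    """Average predicted probability with every row's gender indicator (column 1) set to `woman`."""
    preds = [sigmoid(sum(b * v for b, v in zip(beta, [row[0], woman, *row[2:]]))) for row in X]
    return sum(preds) / len(preds)


# Synthetic data (hypothetical, not survey responses); columns are
# [intercept, woman, log(articles viewed in session), age 18-24].
rng = random.Random(2023)
X, y = [], []
for _ in range(600):
    woman = 1.0 if rng.random() < 0.4 else 0.0
    log_len = rng.uniform(0.0, 3.0)
    young = 1.0 if rng.random() < 0.3 else 0.0
    # By construction the outcome depends on session length and age, not on gender.
    true_logit = -3.0 + 1.0 * log_len + 0.2 * young
    X.append([1.0, woman, log_len, young])
    y.append(1.0 if rng.random() < sigmoid(true_logit) else 0.0)

beta = fit_logistic(X, y)
p_women_adj = adjusted_probability(beta, X, 1.0)
p_men_adj = adjusted_probability(beta, X, 0.0)
print("coefficients:", [round(b, 3) for b in beta])
print(f"adjusted P(read woman's biography): women {p_women_adj:.3f}, men {p_men_adj:.3f}")
```

Because gender has no effect in the simulated data, any gap between the two adjusted probabilities reflects only sampling noise. The sketch does not address the collider concern raised above, which concerns how respondents were selected into the sample rather than the fitting procedure.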

Planned Future Analysis

We plan to conduct the following further analyses of the 2023 Global Readers Survey data and to share their results here:

References
  1. Neundorf, Anja; Niemi, Richard G. (2014). "Beyond political socialization: New approaches to age, period, cohort analysis". Electoral Studies: 1–6. ISSN 0261-3794. doi:10.1016/j.electstud.2013.06.012. 
  2. Dinas, Elias; Stoker, Laura (2014). "Age-Period-Cohort analysis: A design-based approach". Electoral Studies: 1–6. ISSN 0261-3794. doi:10.1016/j.electstud.2013.06.006. 
  3. Andrews, Frank M.; Herzog, A. Regula (1986). "The Quality of Survey Data as Related to Age of Respondent". Journal of the American Statistical Association 81 (394): 403–410. doi:10.1080/01621459. 
  4. Johnson, Isaac; Lemmerich, Florian; Sáez-Trumper, Diego; West, Robert; Strohmaier, Markus; Zia, Leila (2021). "Global Gender Differences in Wikipedia Readership". Proceedings of the International AAAI Conference on Web and Social Media 15 (1): 254–265. doi:10.1609/icwsm.v15i1.18058.
  5. Lucassen, Teun; Dijkstra, Roald; Schraagen, Jan Maarten (2012). "Readability of Wikipedia". First Monday 17 (9). ISSN 1396-0466. doi:10.5210/fm.v0i0.3916. 
  6. Reavley, NJ; Mackinnon, AJ; Morgan, AJ; Alvarez-Jimenez, M; Hetrick, SE; Killackey, E; Nelson, B; Purcell, R; Yap, MBH; Jorm, AF (2012). "Quality of information sources about mental disorders: a comparison of Wikipedia with centrally controlled web and printed sources". Psychological Medicine 42 (8): 1753–1762. doi:10.1017/S003329171100287X. 
  7. Brezar, Aleksandar; Heilman, James (2019). "Readability of English Wikipedia's health information over time". WikiJournal of Medicine 6 (1): 1–6. ISSN 2002-4436. doi:10.15347/wjm/2019.007. 
  8. Connelly, Roxanne; Gayle, Vernon; Lambert, Paul S. (2016). "A review of educational attainment measures for social survey research". Methodological Innovations 9: 1–11. ISSN 2059-7991. doi:10.1177/2059799116638001. 
  9. Schneider, Silke L. (2010). "Nominal comparability is not enough: (In-)equivalence of construct validity of cross-national measures of educational attainment in the European Social Survey". Research in Social Stratification and Mobility 28: 343–357. doi:10.1016/j.rssm.2010.03.001.
  10. European Social Survey European Research Infrastructure (ESS ERIC) (2023). ESS round 10 - 2020. Democracy, Digital social contacts. Sikt - Norwegian Agency for Shared Services in Education and Research. doi:10.21338/NSD-ESS10-2020.
  11. DeBell, Matthew; Krosnick, Jon A. (2009). "Computing Weights for American National Election Study Survey Data". ANES Technical Report series (nes012427): 1–14.
  12. Johnson, Isaac; Gerlach, Martin; Sáez-Trumper, Diego (2021). "Language-agnostic Topic Classification for Wikipedia". WWW '21: Companion Proceedings of the Web Conference 2021: 594–601. doi:10.1145/3442442.3452347. 
  13. Singer, Philipp; Lemmerich, Florian; West, Robert; Zia, Leila; Wulczyn, Ellery; Strohmaier, Markus; Leskovec, Jure (April 3, 2017). "Why We Read Wikipedia". arXiv:1702.05379.
  14. Trokhymovych, Mykola; Sen, Indira; Gerlach, Martin (June 3, 2024). "An Open Multilingual System for Scoring Readability of Wikipedia". arXiv:2406.01835.
  15. Cruciani, Caterina; Joubert, Léo; Jullien, Nicolas; Mell, Laurent; Piccione, Sasha; Vermeirsche, Jeanne (2023-12-01). "Surveying Wikipedians: a dataset of users and contributors' practices on Wikipedia in 8 languages". arXiv:2311.07964.  Dataset: Cruciani, Caterina; Joubert, Léo; Jullien, Nicolas; Mell, Laurent; Piccione, Sasha; Vermeirsche, Jeanne (2023-12-01). Surveying Wikipedians: a dataset of users and contributors' practices on Wikipedia in 8 languages. doi:10.34847/nkl.4ecf4u8m.