How to weight survey data for non-binary gender questions
There are at least three broad categories of solutions for dealing with non-binary respondents in the absence of reliable population data: removal, aggregation and assignment.
Removal
Removal refers to excluding non-binary respondents, either by dropping their responses entirely or by excluding them from the weighting procedure. In the latter case, these responses can still be used in analysis by assigning them a weight of 1.
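A minimal sketch of the second variant, assuming a pandas DataFrame with a `gender` column and hypothetical population targets; binary respondents are post-stratified while non-binary respondents keep a weight of 1:

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["man", "woman", "woman", "nonbinary", "man", "woman"],
})

# Hypothetical population shares for the binary categories.
targets = {"man": 0.49, "woman": 0.51}

# Non-binary respondents are excluded from weighting and keep weight 1.
binary = df["gender"].isin(list(targets))
weights = pd.Series(1.0, index=df.index)

# Simple post-stratification: weight = target share / sample share.
sample_shares = df.loc[binary, "gender"].value_counts(normalize=True)
weights[binary] = df.loc[binary, "gender"].map(
    lambda g: targets[g] / sample_shares[g]
)
df["weight"] = weights
```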
Aggregation
Aggregation refers to collapsing gender categories. For example, we might collapse the “woman” and “not listed” categories and weight on “man” and “not man” categories. This is common practice when variables with many categories are used in weighting, particularly when some categories are very small; for example, many researchers will collapse a race question with 12 categories into one with only three or four broader categories.
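A minimal sketch of this collapse, again assuming a pandas DataFrame with a `gender` column; the targets are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["man", "woman", "not listed", "woman", "man"],
})

# Collapse "woman" and "not listed" into a single weighting cell.
df["gender_weighting"] = df["gender"].where(df["gender"] == "man", "not man")

# Hypothetical targets: "not man" inherits the population share of women,
# since no reliable non-binary population figure exists.
targets = {"man": 0.49, "not man": 0.51}
```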
Assignment
Assignment refers to assigning non-binary respondents to one of the two binary gender categories for weighting purposes. This could be done through random assignment, imputation or some other method.
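A minimal sketch of random assignment (the “coin flip” variant tested below), assuming a pandas DataFrame with a `gender` column; the original responses are preserved for analysis and only the weighting variable changes:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
df = pd.DataFrame({
    "gender": ["man", "woman", "nonbinary", "woman", "nonbinary"],
})

# Assign each non-binary respondent to "man" or "woman" with equal
# probability, for weighting purposes only.
is_nb = df["gender"] == "nonbinary"
df["gender_weighting"] = df["gender"]
df.loc[is_nb, "gender_weighting"] = rng.choice(["man", "woman"], size=is_nb.sum())
```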
Which weighting solutions should survey researchers use?
Others have made thoughtful arguments about selecting a solution to this problem. In their excellent article, Kennedy et al. (2022) suggest considering ethics, accuracy, practicality and flexibility when evaluating potential solutions. These are all important considerations, and researchers should make their own decisions about which solution strikes the right balance.
In this analysis, we will focus on accuracy: specifically, how well different weighting methods represent the non-binary population within the broader survey population.
To measure accuracy, we first searched for high-quality surveys of the non-binary population to use as a benchmark. We considered a number of surveys, including NIH’s PATH Study, the TransPop Study and the Household Pulse Survey, all of which measure gender identity beyond the two traditional response categories. We ultimately selected the Household Pulse Survey because it is recent, uses a high-quality sampling procedure, and the Census Bureau may adopt its question wording in the future.
Next, we weighted data from Morning Consult’s Intelligence survey using a variety of methods:
- Removal: Non-binary respondents were excluded during weighting and assigned a weight of 1.
- Coin Flip: Non-binary respondents were randomly assigned to the “man” or “woman” category for weighting purposes.
- Imputation: Non-binary respondents were assigned to the “man” or “woman” category using an imputation model. The imputation model was trained on Household Pulse data and used age, race, ethnicity, education and census region to predict sex assigned at birth for each respondent.
- Probabilistic Assignment: Using Household Pulse data, we calculated the probability that a given respondent was assigned male at birth conditional on age, education and race. We then used these probabilities to assign a binary sex given each respondent’s demographic profile (see the sketch after this list).
- Aggregation: Respondents who selected “woman” and “not listed” were collapsed into a single category.
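A minimal sketch of the probabilistic assignment step, with hypothetical stand-ins for the Household Pulse benchmark (`pulse`) and the survey data (`survey`); all column names and values are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
demo_cols = ["age_group", "education", "race"]

# Hypothetical benchmark rows standing in for Household Pulse microdata.
pulse = pd.DataFrame({
    "age_group": ["18-29", "18-29", "30-44", "30-44"],
    "education": ["college", "no college", "college", "no college"],
    "race": ["white"] * 4,
    "sex_at_birth": ["male", "female", "female", "male"],
})

# Hypothetical survey rows with the same demographic columns.
survey = pd.DataFrame({
    "gender": ["nonbinary", "woman", "nonbinary"],
    "age_group": ["18-29", "30-44", "30-44"],
    "education": ["college", "college", "no college"],
    "race": ["white"] * 3,
})
survey["gender_weighting"] = survey["gender"]

# P(assigned male at birth) within each demographic cell of the benchmark.
p_male = (
    pulse.assign(male=(pulse["sex_at_birth"] == "male").astype(float))
         .groupby(demo_cols)["male"]
         .mean()
)

# Look up each non-binary respondent's cell probability and draw a
# binary sex for weighting purposes.
is_nb = survey["gender"] == "nonbinary"
probs = survey.loc[is_nb, demo_cols].apply(tuple, axis=1).map(p_male)
survey.loc[is_nb, "gender_weighting"] = np.where(
    rng.random(is_nb.sum()) < probs, "man", "woman"
)
```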
We then benchmarked the weighted Morning Consult data against Household Pulse estimates for a variety of variables among the non-binary population. Following the framework used in Pew benchmarking studies, we calculated the difference between the population benchmark and the weighted sample distribution for each category of each question. We then calculated question-level bias by averaging the category-level biases, and a single summary of overall bias by averaging the question-level biases.
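A minimal sketch of this calculation with made-up numbers; the question names and shares are hypothetical, and averaging the absolute category-level differences is an assumption (the signed differences within a question sum to zero by construction):

```python
# Hypothetical category shares (as proportions) for two questions.
benchmark = {
    "employment": {"employed": 0.60, "not employed": 0.40},
    "insured":    {"yes": 0.75, "no": 0.25},
}
weighted_sample = {
    "employment": {"employed": 0.55, "not employed": 0.45},
    "insured":    {"yes": 0.80, "no": 0.20},
}

# Question-level bias: mean absolute difference across a question's categories.
question_bias = {
    q: sum(abs(weighted_sample[q][c] - benchmark[q][c]) for c in cats) / len(cats)
    for q, cats in benchmark.items()
}

# Overall bias: mean of the question-level biases.
overall_bias = sum(question_bias.values()) / len(question_bias)
```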
Results of each approach
The following chart shows the question-level bias for a variety of questions under each weighting method. There is no clear winner. The aggregation, imputation, coin flip and probabilistic methods perform very similarly on all variables except sex assigned at birth, where the coin flip method significantly underperforms the others. The removal method performs noticeably better than the other methods on some variables but noticeably worse on others. Overall bias numbers are also very close, with probabilistic assignment and imputation narrowly outperforming the other methods.