compare 2 surveys:
1. 100% coverage, but response probability P[R = 1 | Y] differs a lot by Y
2. only 5% coverage, but P[R = 1 | Y] is roughly constant across Y
which would you use? both?
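a minimal simulation sketch of the tradeoff (all numbers invented): survey 1's nonresponse is nonignorable and biases the raw mean, while survey 2 behaves like a small random sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical population: binary Y with true mean 0.50
N = 100_000
y = rng.binomial(1, 0.5, N)

# survey 1: 100% coverage, but response depends strongly on Y
r1 = rng.binomial(1, np.where(y == 1, 0.30, 0.10))

# survey 2: only 5% coverage, but response is constant across Y
r2 = rng.binomial(1, 0.05, N)

print(y.mean())           # truth: ~0.50
print(y[r1 == 1].mean())  # survey 1 raw mean: ~0.75, badly biased
print(y[r2 == 1].mean())  # survey 2 raw mean: ~0.50, unbiased, just noisier
```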
we’ve focused on estimating means E[Y].
but say Y are open-ends ("describe how you feel about the candidate") and you want to read through a few draws from the population, not only survey responders.
what should you do?
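one natural answer (a sketch, not the only option): keep the survey weights and resample whole responses proportional to them, so what you read approximates population draws rather than responder draws. the answers and weights below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical open-ended answers and their survey weights
answers = ["likes his economy message", "distrusts him", "undecided", "angry"]
weights = np.array([0.5, 2.0, 1.0, 3.5])

# draw a few responses with probability proportional to weight
p = weights / weights.sum()
for text in rng.choice(answers, size=3, replace=True, p=p):
    print(text)
```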
so far we've talked about weights and MRP for E[Y], vote choice in the population overall.
but what if you want E[Y | V = 1], vote choice in the population of voters?
what are the weights, and how do you modify MRP?
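one standard answer, sketched: fit a turnout model alongside the outcome model and reweight each poststratification cell by its turnout probability (notation mine).

```latex
% theta_c = E[Y | cell c] from the outcome model
% pi_c    = P(V = 1 | cell c) from a turnout model
% N_c     = population count of cell c
E[Y \mid V = 1] \;\approx\; \frac{\sum_c N_c \, \pi_c \, \theta_c}{\sum_c N_c \, \pi_c}
```

the implied weights pick up an extra factor of pi_c: a respondent in cell c gets w proportional to N_c * pi_c / n_c.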
We are looking for a teammate with expertise in both LLM tools and statistical modeling.
Someone who clearly communicates assumptions, results, and uncertainty, with care and kindness.
typical machine learning loss looks at one individual at a time
but for MRP, we care about aggregates
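a sketch of the contrast (function names and shapes are mine, not a standard API):

```python
import numpy as np

# individual-level: average log-loss, one respondent at a time
def individual_log_loss(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# aggregate-level: error of the poststratified estimate itself,
# rolling cell predictions up with population counts N_c
def poststratified_error(cell_pred, cell_truth, N_c):
    w = N_c / N_c.sum()
    return np.sum(w * cell_pred) - np.sum(w * cell_truth)
```

a model can score well on the first and still miss on the second if its errors don't cancel across cells the way the population mix requires.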
you've got a survey collected by someone else, and they gave you weights.
how can you use those weights in MRP (Multilevel Regression and Poststratification)?
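one idea (a sketch under strong assumptions, not a settled recipe): treat the provided weight as another respondent-level covariate, e.g. bin the log-weights and use the bin as a grouping factor in the multilevel model; poststratification then needs each bin's population share, crudely approximated here by its share of total weight.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# hypothetical survey with provider-supplied weights w
df = pd.DataFrame({
    "y": rng.binomial(1, 0.5, 500),
    "w": rng.lognormal(0.0, 0.5, 500),
})

# bin the log-weights; the bin joins the MRP model as a grouping factor
df["w_bin"] = pd.qcut(np.log(df["w"]), q=5, labels=False)

# crude stand-in for each bin's population share: its share of total weight
pop_share = df.groupby("w_bin")["w"].sum() / df["w"].sum()
print(pop_share)
```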
you've done MRP.
someone asks you for survey weights.
how to get them?
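for pure poststratification (no pooling) the implied weight is just the cell's population share over its sample share; with partial pooling the exact weights come from linearizing the estimator, but this toy sketch gives the flavor.

```python
import pandas as pd

# hypothetical respondents assigned to poststratification cells
sample = pd.DataFrame({"cell": ["a", "a", "b", "c", "c", "c"]})
pop_counts = pd.Series({"a": 5000, "b": 3000, "c": 2000})  # e.g. census counts

n_c = sample["cell"].value_counts()
n, N = len(sample), pop_counts.sum()

# implied weight for a respondent in cell c: (N_c / N) / (n_c / n)
sample["w"] = sample["cell"].map((pop_counts / N) / (n_c / n))
print(sample)
```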
in midterms, voters tend to support the out party for balance
do polls still help predict midterms? yes
Basu's Bears is a lesson in:
1) using auxiliary information (pre-salmon-feasting weights)
2) how bad an unbiased estimator can be
statmodeling.stat.columbia.edu/2025/09/23/s...
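a tiny numerical version of both lessons (bears, weights, and design all invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 bears: pre-salmon weights known, post-salmon weights unknown
pre = rng.normal(200.0, 20.0, 50)
post = 1.3 * pre + rng.normal(0.0, 5.0, 50)  # each bear gains ~30%

# a Basu-style design: weigh bear 0 with prob 0.99,
# otherwise pick one of the other 49 uniformly at random
if rng.random() < 0.99:
    i, pi = 0, 0.99
else:
    i, pi = int(rng.integers(1, 50)), 0.01 / 49

# Horvitz-Thompson estimate of the herd total:
# exactly unbiased over the design, absurd in any single draw
ht_total = post[i] / pi

# ratio estimator using the auxiliary pre-weights: biased but sensible
ratio_total = (post[i] / pre[i]) * pre.sum()

print(ht_total, ratio_total, post.sum())
```

with probability 0.99 the HT estimate is roughly 1/50 of the true total (and a wild overestimate otherwise), yet it averages out exactly right over the design; the ratio estimator leans on the pre-weights and lands near the truth every time.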
we turned to a response instrument Z because random sampling is "dead"
but does this method still rely on starting with random sampling?
we want E[Y|X] but X can be missing
@lucystats.bsky.social @sarahlotspeich.bsky.social @glenmartin.bsky.social @maartenvsmeden.bsky.social et al. say:
random imputation should use Y
deterministic imputation shouldn't
statmodeling.stat.columbia.edu/2025/09/09/s...
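a sketch of the two rules, with an invented imputation setup (all coefficients made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# pretend we fit two imputation models for a missing covariate x:
# stochastic:    x | z, y ~ Normal(a1 + b1*z + c1*y, s**2)   (uses y)
# deterministic: x | z    ~ Normal(a0 + b0*z, ...)           (ignores y)
a1, b1, c1, s = 0.0, 1.0, 0.5, 1.0
a0, b0 = 0.0, 1.2

def impute_random(z, y):
    # draw from the conditional distribution given z AND y
    return a1 + b1 * z + c1 * y + rng.normal(0.0, s)

def impute_deterministic(z):
    # plug in the conditional mean given z only; y stays out
    return a0 + b0 * z
```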
split-plot designs are analogous to cluster sampling.
blocking is analogous to stratification.
featuring an experiment by Arjun Potter and colleagues at NM-AIST!
what are the problems with using LLMs as survey respondents?
how are these similar to problems with poststratification?
CC @tslumley.bsky.social
2 weeks ago we learned about the CES employer survey that produces the jobs count.
we asked: why use employment size in stratification but not nonresponse adjustment?
BLS responded!
statmodeling.stat.columbia.edu/2025/08/19/s...
in political surveys, we "logit shift" predictions to match known aggregates (e.g. total Democratic votes).
but what happens for multinomial outcomes?
a fun excuse to review IPF/raking 🍂
statmodeling.stat.columbia.edu/2025/08/12/s...
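in the binary case the logit shift is one line of root-finding: pick the constant delta that moves every prediction on the logit scale until the weighted mean hits the known total. a sketch (numbers invented):

```python
import numpy as np
from scipy.optimize import brentq

def logit_shift(p, w, target):
    """Add a common delta on the logit scale so the weighted
    mean of the shifted predictions matches a known aggregate."""
    lp = np.log(p) - np.log1p(-p)

    def gap(d):
        return np.average(1 / (1 + np.exp(-(lp + d))), weights=w) - target

    return 1 / (1 + np.exp(-(lp + brentq(gap, -10.0, 10.0))))

# e.g. shift predicted Democratic support to match a known 52% share
p = np.array([0.40, 0.55, 0.70])
w = np.array([1.0, 2.0, 1.0])
print(logit_shift(p, w, 0.52))
```

with K > 2 categories there is no single delta, and matching K known totals becomes an iterative adjustment, which is where IPF/raking enters.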
let's learn about the CES employer survey that produces the jobs count.
late reporting (a form of nonresponse) results in revisions.
my first (naive!) question: why use employment size in stratification but not nonresponse adjustment?
whether you respond to a survey (R) may depend on outcome (Y), even after controlling for covariates (X)
what if we can expand this set of X to include interest in politics?
so far we assumed response R is independent of outcome Y **within X**
but if R can depend on Y, what to do?
one idea: use a response instrument Z
statmodeling.stat.columbia.edu/2025/07/22/s...
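one common formalization from the nonignorable-nonresponse literature (a sketch; details vary by paper): Z predicts the outcome but is excluded from the response mechanism.

```latex
% exclusion: given X and Y, Z carries no extra information about response
P(R = 1 \mid X, Y, Z) = P(R = 1 \mid X, Y)
% relevance: Z still predicts the outcome within levels of X
Y \not\perp Z \mid X
```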
panel data includes repeated surveys of the same people over time.
this structure can be incorporated into models using person-level effects.
but misspecifying the person-level effects distribution can cause bias.
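a minimal version of the model in question (notation mine):

```latex
% person i, wave t: a person-level effect alpha_i shared across waves
y_{it} = \alpha_i + x_{it}^\top \beta + \varepsilon_{it},
\qquad \alpha_i \sim F
% the concern: F is typically taken normal and independent of x;
% if the true alpha_i are skewed or correlated with x_{it}, estimates can be biased
```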
With nonresponse worsening, we want to adjust for a lot of covariates.
This often means handling many missing covariates.
In theory, fit one big model for everything. But how can practitioners handle this?
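one practitioner workaround (not the one-big-model ideal): chained-equations imputation, which models each incomplete covariate given the others and cycles. a toy sketch with scikit-learn:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# hypothetical covariate matrix with ~20% scattered missingness
X = rng.normal(size=(200, 5))
X[rng.random(X.shape) < 0.2] = np.nan

# chained-equations style: regress each column on the others, iterate
imputer = IterativeImputer(sample_posterior=True, random_state=0)
X_complete = imputer.fit_transform(X)
print(np.isnan(X_complete).sum())  # 0: no missing values remain
```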
love the name: Structural Zero.
a structural zero is a cell that must be zero by logic, not by chance; no amount of extra sampling can fill it.
Alan Agresti's Categorical Data Analysis book offers a good explanation (which I'm sure the amazing authors at @hrdag.org will get into):