Maximum Likelihood Estimation: Categorical and Limited Dependent Variables
Course Description
This course focuses on models for the analysis of categorical and other kinds of non-continuous dependent variables, which are typically estimated via maximum likelihood methods. We will begin with logit and probit models for binary outcomes (e.g., individuals voting or abstaining in an election) and link those models to the broader “generalized linear model” framework. We’ll then cover maximum likelihood methods for estimating parameters in logit and probit (and many other) models, and discuss recent issues and problems in the specification, estimation and interpretation of these models. We’ll then move on to models for other common non-continuous dependent variables: ordinal outcomes (e.g., regimes being “repressive,” “semi-repressive,” or “free”); nominal outcomes with more than two unordered categories (e.g., voting for a “Left,” “Right,” “Populist” or “Green” party); count outcomes (e.g., the number of vetoes exercised by Presidents during a given term in office); and censored outcomes, where the values of observations above or below a certain point are unknown (e.g., “desired” campaign contributions where the observed amount is legally capped at a certain value). We will then consider models for categorical and limited variables in multilevel or longitudinal data structures (e.g., multiple observations over time for the same individuals, or multiple individuals observed in different countries). The final unit will cover methods that attempt to handle the problems of endogenous-regressor and sample-selection bias in order to estimate causal effects in models for non-continuous outcomes.
The goals of the class are for you to be able to use these methods in your own research, and to understand and critique published works in the discipline that make use of these techniques.
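As a rough illustration of what estimating a logit model by maximum likelihood involves, the sketch below fits a binary logit to simulated data using Newton-Raphson iterations. It is a minimal example under stated assumptions, not the course's software or any assigned text's implementation: the function name `fit_logit`, the simulated data, and the "true" coefficients are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Maximize the binary logit log-likelihood by Newton-Raphson.

    X is an (n, k) design matrix that already includes a column of ones;
    y is an (n,) array of 0/1 outcomes. Returns (beta_hat, log_likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))            # Pr(y = 1 | x)
        score = X.T @ (y - p)                          # gradient of the log-likelihood
        hessian = -(X * (p * (1.0 - p))[:, None]).T @ X  # second-derivative matrix
        step = np.linalg.solve(hessian, score)
        beta = beta - step                             # Newton-Raphson update
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return beta, loglik


# Hypothetical simulated data: one regressor plus an intercept.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.0])                      # assumed values for the simulation
prob = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = (rng.uniform(size=n) < prob).astype(float)

beta_hat, loglik = fit_logit(X, y)
print("estimated coefficients:", beta_hat)
print("log-likelihood at the maximum:", loglik)
```

With a sample of this size the estimates should land near the values used to simulate the data. Packaged routines perform the same maximization and also report standard errors and fit statistics.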
Texts
- Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Sage Publications, Inc.
- Long, J. Scott and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd Edition. Stata Press.
Supplemental References
- Allison, Paul D. 2009. Fixed Effects Regression Models. Sage Publications, Inc.
- Breen, Richard. 1996. Regression Models: Censored, Sample Selected or Truncated Data. Sage Publications, Inc.
- Liao, Tim Futing. 1994. Interpreting Probability Models: Logit, Probit and Other Generalized Linear Models. Sage Publications, Inc.
- Rabe-Hesketh, Sophia, and Anders Skrondal. 2012. Multilevel and Longitudinal Modeling Using Stata, Volume II: Categorical Responses, Counts, and Survival. 3rd Edition. Stata Press.
- Gill, Jeff and Michelle Torres. 2020. Generalized Linear Models: A Unified Approach, 2nd Edition. Sage Publications, Inc.
- Ward, Michael D. and John S. Ahlquist. 2018. Maximum Likelihood for Social Science: Strategies for Analysis. Cambridge University Press.
Course Requirements
Grades will be based on a 20-25 page research paper (40%), two homework exercises that relate to specific statistical methods and problems we will discuss (25% each), and an oral presentation of your research paper (10%). The paper will be a quantitative analysis, using methods from this course, of data that you collect or access from social science archives or other sources.
The paper should have some substantive interest to you or be relevant to your studies in the graduate program; ideally, you can think of it as the first draft of a convention paper or possible journal publication. The paper will discuss your basic theoretical framework, your hypotheses, statistical models, results, possible problems with the analysis and what you may have done to correct or account for these problems. It will conclude with a discussion of the relevance of your findings for the general topic and for future research.
Course Outline
The course is organized by units and then topics within units. We will maintain a certain amount of flexibility with the schedule, so that we can spend more time on some topics/units and scale back on others as circumstances warrant.
Unit 1: Models for Dichotomous Dependent Variables
1. Logit, Probit and the Generalized Linear Model
- Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Sage Publications, Inc. Chapters 1 and 3.
- Long, J. Scott and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd Edition. Stata Press. Chapter 4.
- Fox, John. 2008. Applied Regression Analysis and Generalized Linear Models. 2nd Edition. Sage Publications, Inc. Chapter 15, pp. 379–385.
2. Maximum Likelihood: Estimation and Interpretation
- Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Sage Publications, Inc. Chapters 2 and 4.
- Long, J. Scott and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd Edition. Stata Press. Chapters 3 and 5.
- Fox, John. 2008. Applied Regression Analysis and Generalized Linear Models. 2nd Edition. Sage Publications, Inc. Chapter 15, pp. 402–417.
3. Issues in the Estimation and Interpretation of Logit and Probit Models
- Liao, Tim Futing. 1994. Interpreting Probability Models: Logit, Probit and Other Generalized Linear Models. Sage Publications, Inc. Chapters 1–3.
- Long, J. Scott and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd Edition. Stata Press. Chapter 6.
- Hanmer, Michael J., and Kerem Ozan Kalkan. 2013. “Behind the Curve: Clarifying the Best Approach to Calculating Predicted Probabilities and Marginal Effects from Limited Dependent Variable Models.” American Journal of Political Science 57(1): 263–277.
- Breen, Richard, Kristian Bernt Karlson, and Anders Holm. 2018. “Interpreting and Understanding Logits, Probits, and Other Nonlinear Probability Models.” Annual Review of Sociology 44: 39–54.
- Rainey, Carlisle. 2016. “Dealing with Separation in Logistic Regression Models.” Political Analysis 24(3): 339–355.
Unit 2: Models for Ordered, Nominal, Count, and Censored Variables
1. Ordered Outcomes
- Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Sage Publications, Inc. Chapter 5.
- Liao, Tim Futing. 1994. Interpreting Probability Models: Logit, Probit and Other Generalized Linear Models. Sage Publications, Inc. Chapter 5.
- Long, J. Scott and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd Edition. Stata Press. Chapter 7.
- Fullerton, Andrew S. 2009. “A Conceptual Framework for Ordered Logistic Regression Models.” Sociological Methods & Research 38(2): 306–347.
2. Nominal Outcomes
- Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Sage Publications, Inc. Chapter 6.
- Liao, Tim Futing. 1994. Interpreting Probability Models: Logit, Probit and Other Generalized Linear Models. Sage Publications, Inc. Chapters 6–7.
- Long, J. Scott and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd Edition. Stata Press. Chapter 8.
3. Count Outcomes
- Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Sage Publications, Inc. Chapter 8.
- Liao, Tim Futing. 1994. Interpreting Probability Models: Logit, Probit and Other Generalized Linear Models. Sage Publications, Inc. Chapter 8.
- Long, J. Scott and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd Edition. Stata Press. Chapter 9.
- Rabe-Hesketh, Sophia, and Anders Skrondal. 2012. Multilevel and Longitudinal Modeling Using Stata, Volume II: Categorical Responses, Counts, and Survival. 3rd Edition. Stata Press. Chapter 13.
4. Censored Outcomes
- Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables. Sage Publications, Inc. Chapter 7, pp. 187–210.
- Breen, Richard. 1996. Regression Models: Censored, Sample Selected or Truncated Data. Sage Publications, Inc. Chapters 1–2.
Unit 3: Longitudinal and Multilevel Models
1. Panel and Multilevel Models for Dichotomous Outcomes
- Andreß, Hans-Jürgen, Katrin Golsch, and Alexander W. Schmidt. 2013. Applied Panel Data Analysis for Economic and Social Surveys. Springer. pp. 203–248.
- Allison, Paul D. 2009. Fixed Effects Regression Models. Sage Publications, Inc. Chapters 1–3.
- Rabe-Hesketh, Sophia, and Anders Skrondal. 2012. Multilevel and Longitudinal Modeling Using Stata, Volume II: Categorical Responses, Counts, and Survival. 3rd Edition. Stata Press. Chapter 10.
2. Panel and Multilevel Models for Ordinal, Nominal and Count Outcomes
- Allison, Paul D. 2009. Fixed Effects Regression Models. Sage Publications, Inc. Chapter 4.
- Rabe-Hesketh, Sophia, and Anders Skrondal. 2012. Multilevel and Longitudinal Modeling Using Stata, Volume II: Categorical Responses, Counts, and Survival. 3rd Edition. Stata Press. Chapters 11–13.
Unit 4: Endogenous Regressor and Sample Selection Models
1. Sample Selection, Endogenous Regressors and Endogenous Treatment Effects
- Angrist, Joshua D., and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press. Chapter 4, “Instrumental Variables in Action.”
- Breen, Richard. 1996. Regression Models: Censored, Sample Selected or Truncated Data. Sage Publications, Inc. Chapters 3–5.
- Maydeu-Olivares, Alberto, Dexin Shi, and Amanda J. Fairchild. 2020. “Estimating Causal Effects in Linear Regression Models With Observational Data: The Instrumental Variables Regression Model.” Psychological Methods 25(2): 243–258.
- Terza, Joseph V., Anirban Basu, and Paul J. Rathouz. 2008. “Two-stage Residual Inclusion Estimation: Addressing Endogeneity in Health Econometric Modeling.” Journal of Health Economics 27(3): 531–543.
- StataCorp. 2021. Stata Extended Regression Models Reference Manual: Release 17. College Station, TX: Stata Press. pp. 1–67.