Regression - Diagnostic - Plot - Influence Index

Create index plots of studentized residuals, hat values, and Cook's distance

Plots the studentized residuals, hat values, and Cook's distances for the observations in a regression model, with one panel per measure and the observation index on the horizontal axis. In each panel, the five most extreme observations are labeled with their observation numbers.

Example

The example below shows the output from running this diagnostic on a Poisson regression model that predicts days absent from school for a sample of school children in New South Wales, Australia.

Details

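As a rough guide, in a model with n observations and p coefficients (counting the intercept), studentized residuals larger than 2 in absolute value indicate possible outliers, hat values larger than 2*p/n indicate possible high-leverage observations, and Cook's distances larger than 4/(n-p) indicate possibly influential observations. These are rules of thumb, not formal tests: observations that exceed them deserve a closer look, but are not necessarily problematic.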
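The cutoffs above are simple to compute by hand. The following is a minimal sketch in plain JavaScript; the helper name influenceCutoffs and the values n = 100, p = 5 are made up for illustration and are not part of Q or the car package.

```javascript
// Hypothetical helper (not part of Q or car): rule-of-thumb cutoffs
// for the three diagnostics described above.
// n = number of observations
// p = number of coefficients, counting the intercept
function influenceCutoffs(n, p) {
    if (!(p > 0 && n > p))
        throw new Error("Expected n > p > 0");
    return {
        studentized: 2,    // compare against the absolute studentized residual
        hat: 2 * p / n,    // twice the average hat value, p/n
        cook: 4 / (n - p)  // Cook's distance cutoff
    };
}

// Illustrative values only: 100 observations, 5 coefficients.
var cutoffs = influenceCutoffs(100, 5);
console.log(cutoffs.hat);   // 0.1
console.log(cutoffs.cook);  // 4/95, a little over 0.04
```

With small samples the hat and Cook's distance cutoffs become large, so fewer observations are flagged; the studentized residual cutoff does not depend on n or p.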

Acknowledgements

Uses the influenceIndexPlot function from the car package.

References

Fox, J., & Weisberg, S. (2011). An R Companion to Applied Regression, Second Edition. Thousand Oaks CA: Sage.

Weisberg, S. (2014). Applied Linear Regression, Fourth Edition. Wiley.

Code

includeWeb("QScript R Output Functions");

main();

function main() {

    // The following 2 variables contain information specific to this diagnostic.
    var required_class = "Regression";
    var output_name_suffix = "influence.index";
    
    var item = checkSelectedItemClass(required_class);
    if (item == null)
        return false;
    var r_name = stringToRName(item.referenceName);

    // The following lines contain the R code to run
    var expression = "car::influenceIndexPlot(" + r_name + ", id = list(method = 'y', n = 5, cex = 1, location = 'lr'), vars = c('Studentized', 'hat', 'Cook'))";

    return createROutput(item, expression, output_name_suffix);
}
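The R expression in main() is an ordinary string, so the number of labeled observations could be made adjustable by building the string in a small helper. The sketch below is one possible way to do that; buildInfluenceExpression and the model name my.model are hypothetical and do not appear in the QScript above. It only assembles the same car::influenceIndexPlot call with a different n.

```javascript
// Hypothetical helper (not part of QScript): assembles the R call used above,
// with the number of labeled observations passed in as a parameter.
function buildInfluenceExpression(rName, nLabels) {
    if (!Number.isInteger(nLabels) || nLabels < 1)
        throw new Error("nLabels must be a positive integer");
    var id = "list(method = 'y', n = " + nLabels + ", cex = 1, location = 'lr')";
    var vars = "c('Studentized', 'hat', 'Cook')";
    return "car::influenceIndexPlot(" + rName + ", id = " + id + ", vars = " + vars + ")";
}

// 'my.model' is a placeholder R object name.
console.log(buildInfluenceExpression("my.model", 3));
```

Calling the helper with nLabels = 5 reproduces the expression that main() passes to createROutput.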