Test - Chi-Square Test of Independence

Test for independence between a pair of categorical variables

Tests for independence between a pair of categorical variables. Any non-categorical variables that are supplied will be treated as categorical, that is to say, cases with the same value are treated as being in the same category, and date variables are categorised by period.
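
As a rough illustration of this coercion, the sketch below uses base R only, with made-up data and illustrative variable names. It converts a numeric variable to categories with factor() and cross-tabulates it against a second variable. It is not the code Q or Displayr runs, only a small stand-alone example of the idea.

```r
# Made-up data: a numeric rating and a text variable (names are illustrative).
rating <- c(1, 2, 2, 3, 1, 3, 2, 1)
region <- c("North", "South", "North", "South", "North", "South", "South", "North")

# Cases with the same value fall into the same category.
rating.cat <- factor(rating)
region.cat <- factor(region)

levels(rating.cat)   # one category per distinct value

# Contingency table of the two categorical variables.
tab <- table(rating.cat, region.cat)
tab
```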

How to run this test

  1. In Displayr, go to Insert > More > Test > Chi-Square Test of Independence. In Q, go to Create > Test > Chi-Square Test of Independence.
  2. Specify the variables to use under Inputs > Input Variables
  3. Adjust the options (noted below)

Both inputs are treated as categorical for the purposes of running the test. If you use numeric, text, or date variables, each distinct value (or date period) becomes its own category, so variables with many distinct values will produce very large tables.

Example

An example output is shown below:

Options

INPUTS

Variable 1 The first variable to test.
Variable 2 The second variable, tested for independence from Variable 1.
Variable names Display Variable Names in the output, instead of Variable Labels.
More decimal places Display numeric values with 8 decimal places.

Additional Properties

When using this feature you can obtain additional information that is stored by the R code which produces the output.

  1. To do so, select Create > R Output.
  2. In the R CODE, paste: item = YourReferenceName
  3. Replace YourReferenceName with the name of your item (e.g. chi_square.test). You can find this by selecting the item and then going to Properties > General > Name in the object inspector on the right.
  4. Below the first line of code, you can paste in snippets from below or type in str(item) to see a list of available information.
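
The steps above might look like the following in practice. Because the real item only exists inside a Q or Displayr project, this sketch substitutes a hypothetical stand-in list with statistic, df and p.value elements (the values that the R code on this page passes to SignificanceTest). The element names of the actual stored object may differ, which is what str(item) is for.

```r
# Hypothetical stand-in for the output item. In Q or Displayr you would
# instead write: item = chi_square.test (using your own item's name).
item = list(statistic = 20 / 3,
            df = 1,
            p.value = pchisq(20 / 3, df = 1, lower.tail = FALSE))

# List everything stored in the item.
str(item)

# Pull out a single element by name (the name is an assumption here).
item$p.value
```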

Acknowledgements

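When a weight is applied, uses the svychisq function from the survey package to conduct the test. Without a weight, the chi-square statistic is taken from the summary of a table built with xtabs.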
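
For unweighted data, the R code on this page takes the statistic from summary() applied to an xtabs() table. A minimal base-R sketch of that calculation, using made-up data, is given here. The comparison with chisq.test(correct = FALSE) is added only as a cross-check and is not part of the original code.

```r
# Made-up data: 60 cases in a 2 x 2 layout
# (A/X = 20, A/Y = 10, B/X = 10, B/Y = 20).
dat <- data.frame(
    var1 = factor(rep(c("A", "A", "B", "B"), times = c(20, 10, 10, 20))),
    var2 = factor(rep(c("X", "Y", "X", "Y"), times = c(20, 10, 10, 20)))
)

# Pearson chi-square test of independence, as in the unweighted branch.
s <- summary(xtabs(~ var1 + var2, dat))
s$statistic   # chi-square statistic
s$parameter   # degrees of freedom
s$p.value

# Cross-check against chisq.test without the continuity correction.
check <- chisq.test(table(dat$var1, dat$var2), correct = FALSE)
all.equal(s$statistic, unname(check$statistic))
```

With every expected count equal to 15, the statistic works out to 4 × (5² / 15) = 20/3 on 1 degree of freedom.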

Code

// JavaScript: defines the input controls shown in the object inspector.
form.setHeading("Chi-Square Test of Independence");
form.dropBox({label: "Variable 1",
              types:["Variable: Numeric, Categorical, OrderedCategorical, Text, Date, Money"],
              name: "formVariable1", prompt: "Select the Variable containing the first sample"});
form.dropBox({label: "Variable 2",
              types:["Variable: Numeric, Categorical, OrderedCategorical, Text, Date, Money"],
              name: "formVariable2", prompt: "Select the Variable containing the second sample"});
form.checkBox({label: "Variable names", name: "formNames", default_value: false,
               prompt: "Display names instead of labels"});
form.checkBox({label: "More decimal places", name: "formDecimals", default_value: false,
               prompt: "Display numeric values with eight decimal places"});
# R: runs the test using the values selected in the controls above.
library(flipData)
library(flipFormat)
library(flipTransformations)
library(survey)

if (length(formVariable1) != length(formVariable2))
    stop("Variables 1 and 2 have different lengths. Please ensure that the variables are from the same data set or have the same length.")

dat.raw <- ProcessQVariables(data.frame(var1 = formVariable1, var2 = formVariable2, stringsAsFactors=FALSE))
dat <- dat.raw[QFilter, ]
dat$var1 <- factor(dat$var1)
dat$var2 <- factor(dat$var2)
 
if (is.null(QCalibratedWeight)) {
    s <- summary(xtabs(~ var1 + var2, dat))
    statistic.value <- s$statistic
    df <- s$parameter
    p.value <- s$p.value
    statistic.name <- "Chi-square"
} else {
    wgt <- QCalibratedWeight[QFilter]
    design <- WeightedSurveyDesign(dat, wgt)
    tryCatch(test <- svychisq(~ var1 + var2, design, statistic = "F"),
             error = function(e) {
                 if (grepl("system is computationally singular", conditionMessage(e), fixed = TRUE))
                     stop(paste("A weighted chi-square test could not be run using the selected variables.",
                                "Consider merging categories or removing the weight variable."))
                 else
                     stop(e)
             })
    statistic.value <- test$statistic
    df <- test$parameter[1]
    p.value <- test$p.value
    statistic.name <- "F"
}
decimal.places <- if (formDecimals) 8 else NULL
chi.sq.test <- list(statistic = statistic.value, df = df, p.value = p.value)
SignificanceTest(chi.sq.test, "Chi-Square Test of Independence", dat.raw, filter = QFilter,
                 show.labels = !formNames, decimal.places = decimal.places)
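
The weighted branch can be tried outside Q or Displayr with a sketch like the one below. It assumes the survey package is installed, and it uses survey::svydesign as a stand-in for WeightedSurveyDesign, whose exact behaviour is not shown on this page. The data and weights are made up.

```r
# Made-up data with a weight for each case.
set.seed(1)
dat <- data.frame(
    var1 = factor(sample(c("A", "B"), 200, replace = TRUE)),
    var2 = factor(sample(c("X", "Y", "Z"), 200, replace = TRUE)),
    wgt  = runif(200, 0.5, 2)
)

if (requireNamespace("survey", quietly = TRUE)) {
    # Stand-in for WeightedSurveyDesign: a weighted design with no clustering.
    design <- survey::svydesign(ids = ~1, weights = ~wgt, data = dat)

    # Rao-Scott second-order corrected test, reported as an F statistic.
    test <- survey::svychisq(~ var1 + var2, design, statistic = "F")
    result <- list(statistic = test$statistic,
                   df = test$parameter[1],   # numerator degrees of freedom
                   p.value = test$p.value)
    print(result)
}
```

As in the code above, only the first element of test$parameter (the numerator degrees of freedom) is kept for reporting.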