How to Automatically Stack a Data Set

From Q
Jump to navigation Jump to search

This page provides a guide and template to help you achieve automatic stacking your data set in Q using R. This process can save time if you need to stack data from the same study on a regular basis (e.g. for a tracking study). If you only intend to stack your data once (ie: it’s unlikely you will need to periodically stack data) then you are perhaps best to use the built-in tool (see: Stacking Data Files).

Data can be stacked when you import your data as an R Data Set. This requires you to write some R code to specify how the variables in the data set should be arranged to create the stacked data set. Below we include an example of a template that you can use in Excel to help build the stacking code. While the template can help, it is not a one-size-fits all tool, as there are many configurations that stacked data can take.

The example used here, File:Technology 2018.sav, is from a study on technology brands. The purpose of this stacking is to stack the data for all of the brands so that there is a single NPS score, and a single set of image variables for a driver analysis.

Creating a stacked data set

The process for adding a stacked data set to your project is as follows:

  1. Select File > Data Sets > Add to Project > From R.
  2. Enter the R Code which brings in the data and performs the stacking. Our example below is used to stack our example data file, File:Technology 2018.sav
  3. Modify part (1) of the code to specify the location of your file.
  4. Modify part (2) of the code to refer to the Variable Name of the variable which contains your respondent IDs.
  5. Modify part (3) of the code to specify sets of variables that are to be stacked.
  6. Enter a Name for the data set.
  7. Click the play icon.
  8. Click Add Data Set.

Your code window will look a bit like this.

Note that when developing the code for stacking your data, you may find it convenient to first run and test the code in an R Output.

Example Code

# (1) SPECIFY the file location - if using local file change \ to \\
location = "https://wiki.q-researchsoftware.com/images/3/35/Technology_2018.sav"

# (2) NOMINATE your ID (Case) variable
id.variable = "RESPNUM"

variables.to.stack = list(
# (3) REPLACE the below with a copy/paste from BLUE (first) column of Excel
    ## !!!! be sure to remove the last comma before the closing round bracket !!!! 
'Q3_01' = c('Q3_01', 'Q3_02', 'Q3_03', 'Q3_04', 'Q3_05', 'Q3_06', 'Q3_07', 'Q3_08', 'Q3_09', 'Q3_10', 'Q3_11', 'Q3_12', 'Q3_13'),
'Q4a_01' = c('Q4a_01', 'Q4a_02', 'Q4a_03', 'Q4a_04', 'Q4a_05', 'Q4a_06', 'Q4a_07', 'Q4a_08', 'Q4a_09', 'Q4a_10', 'Q4a_11', 'Q4a_12', 'Q4a_13'),
'Q4b_01' = c('Q4b_01', 'Q4b_02', 'Q4b_03', 'Q4b_04', 'Q4b_05', 'Q4b_06', 'Q4b_07', 'Q4b_08', 'Q4b_09', 'Q4b_10', 'Q4b_11', 'Q4b_12', 'Q4b_13'),
'Q4c_01' = c('Q4c_01', 'Q4c_02', 'Q4c_03', 'Q4c_04', 'Q4c_05', 'Q4c_06', 'Q4c_07', 'Q4c_08', 'Q4c_09', 'Q4c_10', 'Q4c_11', 'Q4c_12', 'Q4c_13'),
'Q4d_01' = c('Q4d_01', 'Q4d_02', 'Q4d_03', 'Q4d_04', 'Q4d_05', 'Q4d_06', 'Q4d_07', 'Q4d_08', 'Q4d_09', 'Q4d_10', 'Q4d_11', 'Q4d_12', 'Q4d_13'),
'Q4e_01' = c('Q4e_01', 'Q4e_02', 'Q4e_03', 'Q4e_04', 'Q4e_05', 'Q4e_06', 'Q4e_07', 'Q4e_08', 'Q4e_09', 'Q4e_10', 'Q4e_11', 'Q4e_12', 'Q4e_13'),
'Q4f_01' = c('Q4f_01', 'Q4f_02', 'Q4f_03', 'Q4f_04', 'Q4f_05', 'Q4f_06', 'Q4f_07', 'Q4f_08', 'Q4f_09', 'Q4f_10', 'Q4f_11', 'Q4f_12', 'Q4f_13'),
'Q4g_01' = c('Q4g_01', 'Q4g_02', 'Q4g_03', 'Q4g_04', 'Q4g_05', 'Q4g_06', 'Q4g_07', 'Q4g_08', 'Q4g_09', 'Q4g_10', 'Q4g_11', 'Q4g_12', 'Q4g_13'),
'Q4h_01' = c('Q4h_01', 'Q4h_02', 'Q4h_03', 'Q4h_04', 'Q4h_05', 'Q4h_06', 'Q4h_07', 'Q4h_08', 'Q4h_09', 'Q4h_10', 'Q4h_11', 'Q4h_12', 'Q4h_13'),
'Q4i_01' = c('Q4i_01', 'Q4i_02', 'Q4i_03', 'Q4i_04', 'Q4i_05', 'Q4i_06', 'Q4i_07', 'Q4i_08', 'Q4i_09', 'Q4i_10', 'Q4i_11', 'Q4i_12', 'Q4i_13')
)

# The following does the stacking for you
library(foreign)
datafile = suppressWarnings(read.spss(location, use.value.labels = FALSE, to.data.frame = TRUE))
library(reshape)

all.names = names(datafile)

variables.to.exclude = all.names[!all.names %in% c(unlist(variables.to.stack), id.variable)]
stacked.tech = reshape(data = datafile, 
    idvar = id.variable, direction = "long",
    drop = variables.to.exclude,
    varying = variables.to.stack)

Specifying the location of the unstacked data file

In the example above, the function read.spss from the R package foreign is used to read in the unstacked data file prior to the stacking procedure. The location in the example is a URL - a location on the web. You can also set the location to be a path on your computer. For example, the following location will obtain a file from my Desktop:

location = "C:\\Users\\Chris\\Desktop\\Cola Tracking - January to September 2017.sav"

library(foreign)
datafile = suppressWarnings(read.spss(location, use.value.labels = FALSE, to.data.frame = TRUE))

Note the use of double backslashes "\\" to separate the folders on my system.

Other functions should be used in place of read.spss() if you are using a different format. For example, read.csv is the equivalent if using a CSV file.

The respondent ID variable

You need to specify an ID variable so that the stacked cases can be identified with the respondents in the original, unstacked file. This enables matching between the stacked and unstacked data, but it is also just good housekeeping.

Specifying stacked variables

The main part of the code is the lines following variables.to.stack, which specify the groups of variables which are to be stacked.

For example,

'Q3_01' = c('Q3_01', 'Q3_02', 'Q3_03', 'Q3_04', 'Q3_05', 'Q3_06', 'Q3_07', 'Q3_08', 'Q3_09', 'Q3_10', 'Q3_11', 'Q3_12', 'Q3_13')

indicates that the variables with names Q3_01, Q3_02, to Q3_13 should be stacked in order to for a variable called Q3_01. In this example, these variables correspond to the Likelihood to recommend scores for the brands in our study.

Excel template for generating R Code

You can use the following Excel file to assist in generating your R Code: https://wiki.q-researchsoftware.com/images/a/a5/Stacking_code_generator.xlsx

To use this template:

  • Copy the list of variable names from the Variables and Questions tab. You need not copy every single variable, just the ones you wish to eventually stack. In the example data set, the variables Q1, Q2 and Rec_Age were omitted because these are not meant to be stacked. You can copy by selecting the cells in the Name column of the Variables and Questions tab, and pressing CTRL-C.
  • Paste them into Column C in the Stacking Code Generator tab.
  • In Columns D and on-going, rearrange your variable names as you would if you were using Tools > Stack SPSS file. In the worked example, this resulted in 13 observation columns, but you may not have that many columns (or you may have more).
  • Tip: use the “Transpose” paste feature in Excel (if you need to) and delete any unnecessary rows in the generator. Some stacking does not require transpose pasting.
  • Column A (the blue column) automatically generates some code for each resultant stacked variable. Copy all the cells in the blue column and paste it into the appropriate spot in the R Code template.

Important notes:

  • When pasting the code, be sure to remove the last comma before closing the round brackets.
  • Be careful using Excel to copy, paste, delete and transpose data. All of the above requires some common-sense using Excel.

Tidy the metadata

R data sets do not currently have the ability to include metadata, like variable labels and value labels. You will need to input this manually in the Variables and Questions tab for the stacked data set. You only need to do this once - updating the data set later on will preserve the metadata you have specified. See Setting Up Files With No Metadata for the typical workflow.

Data matching

This can be very handy if you ever want to cross-tabulate data between the unstacked datafile and the stacked data set. See Data File Relationship for more on creating relationships between data sets. You will create a One-to-Many relationship from the unstacked data set to the stacked one.

In the example data set, RESPNUM is used as the linking variable (ie: the Case ID variable). If you wanted to cross-tabulate between the stacked version of question 4 and the age of respondents (variable Rec_Age), having a One-to-Many datafile relationship will permit this (without needing to stack Rec_Age).

Updating

When the time comes to update your Q project with a newer version of the stacked data, you need to:

  1. Determine the file path or URL for your new unstacked file.
  2. Open your Q Project.
  3. Select File > Data Sets > Update and chose your stacked data set.
  4. (Optional) If the file path or URL has changed, modify your code to refer to the new file.
  5. Push the play button again. The new data set will be calculated, and when it is finished the new data will appear in the preview.
  6. Select Update Data Set. The old stacked data will now be replaced with the new version of the stacked data.


See Also

  1. Stacking a single data set in Q using the built-in feature Stacking Data Files
  2. Bringing in a data set using R (see R Data Sets)
  3. (recommended) setting up a data file relationship (see Data File Relationship)