Stacking Data Files
Related Online Training modules |
---|
Stacking data |
Generally it is best to access online training from within Q by selecting Help > Online Training. |
A data file is “stacked” when a single respondent’s data appears as multiple cases (i.e., multiple rows in the Data tab). Most commonly, this is because the respondent has provided data about multiple occasions (where each occasion is in a separate line) or about all the members of their household (where each household member is in a separate line).
Q can convert an un-stacked SPSS data file into a stacked SPSS data file, which can then be imported and analyzed within Q in the standard way.
Overview of the process
Stacking only works with SPSS data files (any file can be converted to an SPSS data file by selecting Tools > Save Data as SPSS/CSV File…). The basic process for stacking an SPSS data file is:
- Import the un-stacked SPSS data file into Q: start a new Q project, then go to File > Data Sets > Add to Project > From File. When prompted, make sure to select Use original data file structure.
- In the Variables and Questions tab, order variables by dragging and dropping so that variables you want to stack are adjacent and in the right order. When done, use Tools > Save Data as SPSS/CSV File. Re-use (and modify) this file when new data is obtained for trackers.
- Select Tools > Stack SPSS .sav File….
- Drag and drop variable names until the output file structure is as required. This is discussed in more detail in the online training tutorial.
- Delete any variables that you do not need. This is done by dragging variable names into the Omit box on the left side of the dialog box. Stacked data files are generally much, much larger than unstacked data files because they repeat data for many variables. This can slow down Q and, in some cases, make it prone to crashing (if your computer has insufficient memory). Consequently, the more variables that you can omit, the better.
- Set any missing values. For variables that cannot be stacked, Q will, by default, stack copies of a respondent’s values on top of each other. If you right-click and select Set as Missing, Q will replace all but the first observation for a respondent with missing values (NaN).
- Revise any variable names and labels, as required, using Override Name... and Override Label....
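The effect of Set as Missing can be pictured with a small R sketch. This is only an illustration of the behavior described in the steps above, not Q's own implementation; the data frame and the `income` variable are hypothetical.

```r
# Hypothetical stacked data: 'income' is a respondent-level variable that
# could not be stacked, so by default its value is copied onto every
# observation for the respondent.
stacked <- data.frame(
  original_case = c(1, 1, 2, 2),
  observation   = c(1, 2, 1, 2),
  income        = c(50, 50, 80, 80),
  age           = c(17, 12, 5, 3)
)

# Analogue of Set as Missing: keep the value on the first observation only
# and replace the repeated copies with NaN.
set_as_missing <- stacked
set_as_missing$income[set_as_missing$observation > 1] <- NaN

print(set_as_missing)

# Each respondent's income now contributes one non-missing value.
n_income_values <- sum(!is.na(set_as_missing$income))
```

With the repeats blanked out, summaries of the respondent-level variable are based on one value per respondent rather than one per stacked row.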
Worked example
A worked example of the process of stacking a data file is available in the Online Training tutorial Stacking data.
Stack SPSS File Dialog
Use this dialog to convert the loaded SPSS file to a new stacked data file.
The new data file will contain two new variables:
- original_case will record the case number from your original data file.
- observation will be 1 for the first column of stacked variables, 2 for the second column, and so on.
Omit: Variables dragged into this list will not be included in the output data (e.g., variables not needed for the stacked analysis).
Output file structure: Drag variables beside each other to stack them. Start by selecting one or more variables, then click and drag the selection to the right of the variables you want to stack them with. Variables that are only present in the first column will be automatically repeated in subsequent columns.
Another Example
If your original SPSS data file looked like this:
respid | age1 | sex1 | age2 | sex2 |
---|---|---|---|---|
123 | 17 | M | 12 | F |
456 | 5 | M | 3 | M |
then you might stack your variables like this:
Observation 1 | Observation 2 |
---|---|
respid | respid |
age1 | age2 |
sex1 | sex2 |
and then your output data file would look like:
respid | age | sex | original_case | observation |
---|---|---|---|---|
123 | 17 | M | 1 | 1 |
123 | 12 | F | 1 | 2 |
456 | 5 | M | 2 | 1 |
456 | 3 | M | 2 | 2 |
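For readers who find code easier to follow than tables, the same transformation can be written as a minimal R sketch. It reproduces the example above by hand and is not how Q performs the stacking internally; the helper name `make_block` is hypothetical.

```r
# The un-stacked example file: one row per respondent.
wide <- data.frame(
  respid = c(123, 456),
  age1 = c(17, 5), sex1 = c("M", "M"),
  age2 = c(12, 3), sex2 = c("F", "M"),
  stringsAsFactors = FALSE
)

# Build the rows for one observation column (1 or 2).
make_block <- function(obs) {
  data.frame(
    respid        = wide$respid,                  # repeated in every column
    age           = wide[[paste0("age", obs)]],
    sex           = wide[[paste0("sex", obs)]],
    original_case = seq_len(nrow(wide)),          # row number in the original file
    observation   = obs,
    stringsAsFactors = FALSE
  )
}

# Stack the blocks, then sort so each respondent's rows sit together.
stacked <- do.call(rbind, lapply(1:2, make_block))
stacked <- stacked[order(stacked$original_case, stacked$observation), ]
rownames(stacked) <- NULL

print(stacked)
```

The printed data frame has four rows, matching the output table above: two observations for each of the two original cases.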
Tips
- Stacking can produce very large data files. Omit variables that are not required in your stacked analysis.
- Q will try to intelligently choose new variable names and labels for the stacked variables. Right-click on the labels to override them.
- If you drag multiple groups of variables together then Q will try to arrange them side by side.
- If you do not want to automatically repeat a variable, right-click on the cell you want to blank and select Set as Missing.
Advanced and automated stacking using R
Sometimes the structure of the data file is too complex for the approach described above to work, or the stacking needs to be repeated regularly. In such cases, the stacking can be performed using R. The basic workflow for this is:
- Select File > Data Sets > Add to Project > From R.
- Write code that imports the data file. For example, if it is an SPSS data file, use foreign::read.spss.
- Write code to restructure the data file, for example using:
  - The stack function (see R help on stack).
  - The gather function in tidyr (superseded by pivot_longer in current tidyr releases).
  - The reshape function in base R (the stats package).
  - The melt function in data.table.
For a worked example of stacking using R, see How to Automatically Stack a Data Set.
See Stack Overflow for a worked example of most of these.
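As a rough illustration of this workflow, the sketch below wraps base R's reshape function in a reusable helper so the same stacking can be re-run whenever new data arrives. The function name `stack_wide`, the `stems` argument, and the file name in the commented-out import line are all hypothetical. It assumes the variables to stack are named as a stem followed by a number (age1, age2, ...) and already appear in the right order in the file.

```r
# Hypothetical import step; the file name is an assumption:
# wide <- foreign::read.spss("survey.sav", to.data.frame = TRUE)

# Stand-in for the imported file so the sketch runs on its own.
wide <- data.frame(
  respid = c(123, 456),
  age1 = c(17, 5), sex1 = c("M", "M"),
  age2 = c(12, 3), sex2 = c("F", "M"),
  stringsAsFactors = FALSE
)

# Stack every group of columns named <stem><number>, e.g. age1, age2.
stack_wide <- function(wide, stems) {
  wide$original_case <- seq_len(nrow(wide))
  # One character vector of column names per stem, in file order.
  varying <- lapply(stems, function(s) {
    grep(paste0("^", s, "[0-9]+$"), names(wide), value = TRUE)
  })
  long <- reshape(
    wide,
    direction = "long",
    varying   = varying,
    v.names   = stems,
    timevar   = "observation",    # 1 for the first column, 2 for the second, ...
    idvar     = "original_case"
  )
  long <- long[order(long$original_case, long$observation), ]
  rownames(long) <- NULL
  long
}

long <- stack_wide(wide, stems = c("age", "sex"))
print(long)
```

Variables that are not matched by any stem (here respid) are carried along and repeated on every stacked row, which mirrors the default behavior of the dialog.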