How to Randomly Select a Sub-Sample
Sampling from all respondents
- Create a new JavaScript Variable.
- In the dialogue that appears, select Access all data rows (advanced) and paste the code below into the Expression field.
- In the code, on the first line, change the value assigned to required_sample_size to your required sample size.
- Click OK to create the variable.
- On the Data tab, ensure that your unique ID variable has been selected in the Case IDs drop-down in the top-left.
- On the Variables and Questions tab, right-click on the new variable and select Copy and Paste Variable(s) > As Values.... This fixes (freezes) the sample drawn by the random sampling formula so that it does not change.
- Hide the original JavaScript variable by selecting the yellow H in the Tags column.
- Use the new variable as a Filter by selecting the yellow F in the Tags column.
var required_sample_size = 200;
// Generate an array of random numbers, one per case
var rnd = new Array(N);
var orig_rnd = new Array(N);
for (var i = 0; i < N; i++) {
    var r = Math.random();
    rnd[i] = r;
    orig_rnd[i] = r;
}
// Find the cut-off: sort numerically (the default sort compares values as strings)
rnd.sort(function(a, b) { return a - b; });
var cutoff = rnd[required_sample_size];
// Create the filter variable: true for cases whose random number is below the cut-off
var result = new Array(N);
for (var i = 0; i < N; i++)
    result[i] = orig_rnd[i] < cutoff;
result
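Note that this code samples without replacement: each case can be selected at most once.
In the above code, N is a reserved variable which gives the number of cases in the data file. If required_sample_size is greater than or equal to N, the cut-off is undefined and no cases are selected.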
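The cut-off approach above can also be tried outside Q. The following is a minimal sketch, not the code Q runs: sampleFlags is a hypothetical helper, and because N only exists inside Q, the number of cases is passed in as the parameter n. As an assumption of this sketch, a sample size of n or more selects every case.

```javascript
// Hypothetical helper (not part of Q): returns an array of n booleans,
// k of which are true, chosen at random without replacement.
function sampleFlags(n, k) {
    if (k >= n) {
        // Assumption of this sketch: with nothing to cut off, select every case.
        return new Array(n).fill(true);
    }
    var draws = [];
    for (var i = 0; i < n; i++) {
        draws.push(Math.random());
    }
    // Sort a copy numerically; the default sort compares values as strings.
    var sorted = draws.slice().sort(function (a, b) { return a - b; });
    // The (k+1)-th smallest draw: exactly k draws fall below it, barring ties.
    var cutoff = sorted[k];
    return draws.map(function (d) { return d < cutoff; });
}

var flags = sampleFlags(1000, 200);
var selected = flags.filter(Boolean).length;
console.log(selected);
```

The count printed should equal the requested sample size unless two random draws tie exactly at the cut-off, which is extremely unlikely.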
Sampling from Groups or Segments
Sometimes you need to take a random sample from each of two or more groups (segments) of respondents. The code below takes a random sample of 15 people from Group 1 and 5 people from Group 2. The group variable identifies which segment each respondent belongs to, and its values 1 and 2 denote Group 1 and Group 2. If you have more than two groups, scale the code up by adding new variables in the #CHANGE sections for each new group, then copying the GROUP 2 CODE section and replacing the 2s with 3, 4, 5, and so on.
- Create a new JavaScript Variable.
- In the dialogue that appears, select Access all data rows (advanced) and paste the code below into the Expression field.
- Find #CHANGE in the code and follow the instructions on modifying it for your specific use case.
- Click OK to create the variable.
- On the Data tab, ensure that your unique ID variable has been selected in the Case IDs drop-down in the top-left.
- On the Variables and Questions tab, right-click on the new variable and select Copy and Paste Variable(s) > As Values.... This fixes (freezes) the sample drawn by the random sampling formula so that it does not change.
- Hide the original JavaScript variable by selecting the yellow H in the Tags column.
- Use the new variable as a Filter by selecting the yellow F in the Tags column.
//////////////////////////////////
// Look for #CHANGE to see what inputs you need to change for your project
/////////////////////////////////
// #CHANGE replace group with the variable to reference for the sub-groups
var gp = group;
// #CHANGE replace 1 and 2 with the value denoting each group from your group variable above
var gp1val = 1;
var gp2val = 2;
// #CHANGE Select sample sizes for the two groups
var required_sample_size_gp1 = 15;
var required_sample_size_gp2 = 5;
//////////////////////////////////
// Identify respondents in each group
/////////////////////////////////
//create a function to find all rows of the table for each group
function getAllIndexes(arr, val) {
    var indexes = [], i = -1;
    while ((i = arr.indexOf(val, i+1)) != -1){
        indexes.push(i);
    }
    return indexes;
}
//Use getAllIndexes below to find the respondents to sample from for each group
//#CHANGE if you're sampling more than 2 groups add another similar line of code below for each additional group
var gp1rows = getAllIndexes(gp, gp1val);
var gp2rows = getAllIndexes(gp, gp2val);
//////////////////////////////////
// GROUP 1 CODE
/////////////////////////////////
//Generate random numbers to pick from the rows matching group 1
var rnd1 = new Array(gp1rows.length);
var orig_rnd1 = new Array(gp1rows.length);
for (var i = 0; i < gp1rows.length; i++){
    var r = Math.random();
    rnd1[i] = r;
    orig_rnd1[i] = r;
}
//Cut-off the random numbers at the sample size we want
//(sort numerically; the default sort compares values as strings)
rnd1.sort(function(a, b) { return a - b; });
var cutoff = rnd1[required_sample_size_gp1];
//Pull out the rows for Group 1 that are picked by the random numbers
//(start from an empty array so that push does not leave empty slots at the front)
var result1 = [];
for (var i = 0; i < gp1rows.length; i++) {
    if (orig_rnd1[i] < cutoff)
        result1.push(gp1rows[i]);
}
//Make final result change to 1 for group 1 sample
var finalresult = new Array(N).fill(0);
for (var i = 0; i < result1.length; i++){
    finalresult[result1[i]] = 1;
}
//////////////////////////////////
// GROUP 2 CODE
/////////////////////////////////
//Generate random numbers to pick from the rows matching group 2
var rnd2 = new Array(gp2rows.length);
var orig_rnd2 = new Array(gp2rows.length);
for (var i = 0; i < gp2rows.length; i++){
    var r = Math.random();
    rnd2[i] = r;
    orig_rnd2[i] = r;
}
//Cut-off the random numbers at the sample size we want for group 2
//(sort numerically; the default sort compares values as strings)
rnd2.sort(function(a, b) { return a - b; });
var cutoff = rnd2[required_sample_size_gp2];
//Pull out the rows for Group 2 that are picked by the random numbers
//(start from an empty array so that push does not leave empty slots at the front)
var result2 = [];
for (var i = 0; i < gp2rows.length; i++) {
    if (orig_rnd2[i] < cutoff)
        result2.push(gp2rows[i]);
}
//Make final result change to 1 for group 2 sample
for (var i = 0; i < result2.length; i++){
    finalresult[result2[i]] = 1;
}
//////////////////////////////////
// Return the final list of samples from each group
/////////////////////////////////
finalresult
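For more than two groups, copying a code section per group becomes repetitive. The sketch below shows one way the same idea might be generalised with a loop. It is an illustration under stated assumptions rather than the method used above: sampleByGroup, groups, and sizes are hypothetical names, the data is passed in as a plain array so the sketch runs outside Q, and instead of comparing against a cut-off it keeps the rows with the smallest random draws, which selects the same kind of sample. A group with fewer rows than its requested size is selected in full.

```javascript
// Hypothetical helper (not part of Q): groups is an array of group values,
// one per case; sizes maps each group value to its required sample size.
function sampleByGroup(groups, sizes) {
    var result = new Array(groups.length).fill(0);
    Object.keys(sizes).forEach(function (key) {
        // Collect the row indexes belonging to this group.
        // Object keys are strings, so compare against the string form of the value.
        var rows = [];
        for (var i = 0; i < groups.length; i++) {
            if (String(groups[i]) === key) rows.push(i);
        }
        // Pair each row with a random draw and sort numerically by the draw.
        var draws = rows.map(function (row) {
            return { row: row, r: Math.random() };
        });
        draws.sort(function (a, b) { return a.r - b.r; });
        // Keep the rows with the smallest draws (all rows if the group is small).
        draws.slice(0, sizes[key]).forEach(function (d) {
            result[d.row] = 1;
        });
    });
    return result;
}

// Example data: 40 cases in group 1, 20 in group 2, 10 in group 3.
var groups = [];
for (var i = 0; i < 70; i++) {
    groups.push(i < 40 ? 1 : (i < 60 ? 2 : 3));
}
var picked = sampleByGroup(groups, { 1: 15, 2: 5, 3: 4 });
```

In this example, picked contains a 1 for 15 cases from group 1, 5 from group 2, and 4 from group 3, and a 0 for every other case.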
See Also
- Filtering - Filters for Train-Test Split for a QScript which will split your data set into two samples for training and testing machine learning methods.
- Filtering - Filters for Train-Validation-Test Split for a QScript which will split your data set into three samples for training, validating, and testing machine learning methods.
- JavaScript Variables for detail on how to create new variables in the Variables and Questions tab using JavaScript.
- JavaScript for information about the JavaScript programming language.