How to Randomly Select a Sub-Sample


Sampling from all respondents

  • Create a new JavaScript Variable.
  • In the dialogue that appears, select Access all data rows (advanced) and paste the code below into the Expression field.
  • In the code, on the first line, change the value assigned to required_sample_size to your required sample size.
  • Click OK to create the variable.
  • On the Data tab, ensure that your unique ID variable has been selected in the Case IDs drop-down in the top-left.
  • On the Variables and Questions tab, right-click on the new variable and select Copy and Paste Variable(s) > As Values.... This locks in the cases that have been selected by the random sampling formula, so the sample no longer changes.
  • Hide the original JavaScript variable by selecting the yellow H in the Tags column.
  • Use the new variable as a Filter by selecting the yellow F in the Tags column.
var required_sample_size = 200;

//generating an array of random numbers
var rnd = new Array(N);
var orig_rnd = new Array(N);
for (var i = 0; i < N; i++){
    var r = Math.random();
    rnd[i] = r;
    orig_rnd[i] = r;
}
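//Finding the cut-off (sort numerically; the default sort compares values as strings)
rnd.sort(function(a, b) { return a - b; });
//If the requested sample is at least as large as the data file, select every case
var cutoff = required_sample_size < N ? rnd[required_sample_size] : Infinity;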
//creating filter variable
var result = new Array(N);
for (var i = 0; i < N; i++)
  result[i] = orig_rnd[i] < cutoff;
result

Note that this code samples without replacement: each case can be selected at most once.

In the above code, N is a reserved variable which gives the number of cases in the data file.
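The cut-off technique used above can also be sketched as a standalone function, which makes it easy to check outside Q that the filter selects the requested number of cases. This is only an illustrative sketch: the function name randomSampleFilter and the parameters n and k are hypothetical stand-ins for Q's reserved variable N and for required_sample_size, and are not part of Q.

```javascript
// Hypothetical helper, not part of Q: n stands in for the reserved
// variable N, and k for required_sample_size.
function randomSampleFilter(n, k) {
    // One random draw per case.
    var draws = [];
    for (var i = 0; i < n; i++) {
        draws.push(Math.random());
    }
    // Sort a copy numerically so the original draw order is preserved.
    var sorted = draws.slice().sort(function (a, b) { return a - b; });
    // The (k+1)-th smallest draw is the cut-off; if k >= n, keep every case.
    var cutoff = k < n ? sorted[k] : Infinity;
    // A case is selected when its own draw falls below the cut-off.
    return draws.map(function (d) { return d < cutoff; });
}

var filter = randomSampleFilter(1000, 200);
var selected = filter.filter(Boolean).length;
console.log(selected);
```

Barring tied random draws, which are vanishingly unlikely, selected equals the requested sample size, and no case can be counted twice because each case has exactly one entry in the filter.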

Sampling from Groups or Segments

Sometimes you need to take a random sample from two or more groups or segments of respondents. The code below takes a random sample of 15 people from Group 1 and 5 people from Group 2. The group variable is the variable that identifies each respondent's segment, and its values 1 and 2 denote Group 1 and Group 2 for sampling. If you have more than two groups, scale the code up by adding new variables in the #CHANGE sections for each new group, then copying the GROUP 2 CODE section and replacing the 2s with 3, 4, 5, and so on.

  • Create a new JavaScript Variable.
  • In the dialogue that appears, select Access all data rows (advanced) and paste the code below into the Expression field.
  • Find #CHANGE in the code and follow the instructions on modifying it for your specific use case.
  • Click OK to create the variable.
  • On the Data tab, ensure that your unique ID variable has been selected in the Case IDs drop-down in the top-left.
  • On the Variables and Questions tab, right-click on the new variable and select Copy and Paste Variable(s) > As Values.... This locks in the cases that have been selected by the random sampling formula, so the sample no longer changes.
  • Hide the original JavaScript variable by selecting the yellow H in the Tags column.
  • Use the new variable as a Filter by selecting the yellow F in the Tags column.
//////////////////////////////////
// Look for #CHANGE to see what inputs you need to change for your project
/////////////////////////////////
// #CHANGE replace group with the variable to reference for the sub-groups
var gp = group;

// #CHANGE replace 1 and 2 with the value denoting each group from your group variable above
var gp1val = 1;
var gp2val = 2;

// #CHANGE Select sample sizes for the two groups
var required_sample_size_gp1 = 15;
var required_sample_size_gp2 = 5;

//////////////////////////////////
// Identify respondents in each group
/////////////////////////////////
//create a function to find all rows of the table for each group
function getAllIndexes(arr, val) {
    var indexes = [], i = -1;
    while ((i = arr.indexOf(val, i+1)) != -1){
        indexes.push(i);
    }
    return indexes;
}

//Use getAllIndexes below to find the respondents to sample from for each group
//#CHANGE if you're sampling more than 2 groups, add another similar line of code below for each additional group
var gp1rows = getAllIndexes(gp, gp1val);
var gp2rows = getAllIndexes(gp, gp2val);

//////////////////////////////////
// GROUP 1 CODE
/////////////////////////////////

//Generate random numbers to pick from the rows matching group 1
var rnd1 = new Array(gp1rows.length);
var orig_rnd1 = new Array(gp1rows.length);
for (var i = 0; i < gp1rows.length; i++){
    var r = Math.random();
    rnd1[i] = r;
    orig_rnd1[i] = r;
}

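//Cut off the random numbers at the sample size we want (sort numerically, not as strings)
rnd1.sort(function(a, b) { return a - b; });
//If the requested sample is at least as large as the group, select the whole group
var cutoff = required_sample_size_gp1 < gp1rows.length ? rnd1[required_sample_size_gp1] : Infinity;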

//Pull out the rows for Group 1 that are picked by the random numbers
var result1 = []; //start empty: push() appends, so pre-sizing would leave undefined slots at the front
for (var i = 0; i < gp1rows.length; i++) {
    if(orig_rnd1[i] < cutoff) 
        result1.push(gp1rows[i]);
}

//Set the final result to 1 for each sampled row in group 1
var finalresult = new Array(N).fill(0);
for (var i = 0; i < result1.length; i++){
    finalresult[result1[i]] = 1;
}


//////////////////////////////////
// GROUP 2 CODE
/////////////////////////////////

//Generate random numbers to pick from the rows matching group 2
var rnd2 = new Array(gp2rows.length);
var orig_rnd2 = new Array(gp2rows.length);
for (var i = 0; i < gp2rows.length; i++){
    var r = Math.random();
    rnd2[i] = r;
    orig_rnd2[i] = r;
}

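//Cut off the random numbers at the sample size we want for group 2 (sort numerically, not as strings)
rnd2.sort(function(a, b) { return a - b; });
//If the requested sample is at least as large as the group, select the whole group
var cutoff = required_sample_size_gp2 < gp2rows.length ? rnd2[required_sample_size_gp2] : Infinity;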

//Pull out the rows for Group 2 that are picked by the random numbers
var result2 = []; //start empty: push() appends, so pre-sizing would leave undefined slots at the front
for (var i = 0; i < gp2rows.length; i++) {
    if(orig_rnd2[i] < cutoff) 
        result2.push(gp2rows[i]);
}

//Set the final result to 1 for each sampled row in group 2
for (var i = 0; i < result2.length; i++){
    finalresult[result2[i]] = 1;
}

//////////////////////////////////
// Return the final list of samples from each group
/////////////////////////////////
finalresult
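
For more than a couple of groups, the copy-and-paste pattern above could instead be written as a loop over the groups. The following is a minimal sketch under stated assumptions, not Q's own method: the function name sampleByGroup and its parameters groupValues (a plain array of each case's group code) and sizes (an object mapping a group code to its sample size) are hypothetical, and the number of cases is taken from the array length rather than from the reserved variable N.

```javascript
// Hypothetical helper, not part of Q: groupValues is a plain array holding
// each case's group code; sizes maps a group code to its sample size.
function sampleByGroup(groupValues, sizes) {
    var result = new Array(groupValues.length).fill(0);
    Object.keys(sizes).forEach(function (key) {
        // Collect the row indexes belonging to this group.
        var rows = [];
        for (var i = 0; i < groupValues.length; i++) {
            if (String(groupValues[i]) === key) rows.push(i);
        }
        // Draw one random number per row and find the cut-off for this group.
        var draws = rows.map(function () { return Math.random(); });
        var sorted = draws.slice().sort(function (a, b) { return a - b; });
        var k = sizes[key];
        // If the group is no larger than its sample size, keep the whole group.
        var cutoff = k < rows.length ? sorted[k] : Infinity;
        // Flag the rows whose draw falls below the cut-off.
        for (var j = 0; j < rows.length; j++) {
            if (draws[j] < cutoff) result[rows[j]] = 1;
        }
    });
    return result;
}

// Example data: 40 cases in group 1, 20 in group 2, 10 in group 3.
var groupValues = [];
for (var i = 0; i < 70; i++) {
    groupValues.push(i < 40 ? 1 : (i < 60 ? 2 : 3));
}
var flags = sampleByGroup(groupValues, { 1: 15, 2: 5, 3: 4 });
```

With this layout, adding a group means adding one entry to sizes rather than duplicating a code section. Cases whose group code is not listed in sizes are never selected.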

See Also