Filtering - Filters for Train-Test Split
Create 2 new filters based on a random 70%/30% split of the selected data
This QScript creates 2 new filters based on a random 70%/30% split of the selected data. These filters can then be applied to predictive models in order to separate a training data set from a test data set. The QScript can be amended to adjust the split ratio.
Example
The result of running this script is shown below. The first 2 variables are the new filters created.
Technical details
The value of trainPercentage in the QScript code below controls the split ratio. The script prompts for this value and offers 70 as the default, which means that 70% of the cases (rounded to the nearest whole number of cases) are selected by the Training sample filter and the remaining 30% are selected by the Testing sample filter. The random selection uses a fixed seed (set.seed(123)), so running the script again on the same data produces the same split.
By adjusting this value, as described below in Customizing the QScript, you can control the percentages in the training and testing filters.
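The arithmetic behind the split can be sketched in plain JavaScript. This is only an illustration, not the script itself: the function names trainTestSplit and mulberry32 are hypothetical, the small seeded generator does not reproduce the random stream of R's set.seed(123), and Math.round rounds exact halves up whereas R's round may round them to the even number, so counts can differ by one in those edge cases.

```javascript
// Small deterministic PRNG (mulberry32) so the sketch is repeatable.
// It does NOT reproduce the random numbers R generates after set.seed(123).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: returns two 0/1 indicator arrays of length n.
function trainTestSplit(n, trainPercentage, seed) {
  const random = mulberry32(seed);
  // Number of cases in the training filter, rounded to a whole number
  const trainCount = Math.round((trainPercentage * n) / 100);

  // Fisher-Yates shuffle of the case indices 0..n-1
  const indices = Array.from({ length: n }, (_, i) => i);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }

  // The first trainCount shuffled indices form the training filter
  const training = new Array(n).fill(0);
  for (const idx of indices.slice(0, trainCount)) training[idx] = 1;

  // The testing filter is the complement of the training filter
  const testing = training.map((flag) => (flag === 1 ? 0 : 1));
  return { training, testing };
}

const split = trainTestSplit(10, 70, 123);
const sum = (xs) => xs.reduce((a, b) => a + b, 0);
console.log(sum(split.training), sum(split.testing)); // 7 3
```

Because the testing filter is defined as the complement of the training filter, every case belongs to exactly one of the two filters, whatever percentage is chosen.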
How to apply this QScript
- Start typing the name of the QScript into the Search features and data box in the top right of the Q window.
- Click on the QScript when it appears in the QScripts and Rules section of the search results.
OR
- Select Automate > Browse Online Library.
- Select this QScript from the list.
Customizing the QScript
This QScript is written in JavaScript and can be customized by copying and modifying the JavaScript.
Customizing QScripts in Q4.11 and more recent versions
- Start typing the name of the QScript into the Search features and data box in the top right of the Q window.
- Hover your mouse over the QScript when it appears in the QScripts and Rules section of the search results.
- Press Edit a Copy (bottom-left corner of the preview).
- Modify the JavaScript (see QScripts for more detail on this).
- Either:
  - Run the QScript, by pressing the blue triangle button.
  - Save the QScript and run it at a later time, using Automate > Run QScript (Macro) from File.
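As one example of what could be changed at the Modify the JavaScript step, the R code that the script assembles as a string (see the JavaScript section) could be produced by a small helper so that the random seed is adjustable rather than fixed at 123. The helper name buildTrainingRCode and its seed parameter are hypothetical and are not part of the published script; in an edited copy, n would still come from dataFile.totalN.

```javascript
// Hypothetical helper: builds the R code for the training filter.
// percentage: training percentage (0 to 100)
// n: total number of cases (dataFile.totalN in the real script)
// seed: value passed to R's set.seed(); the published script uses 123
function buildTrainingRCode(percentage, n, seed) {
  return [
    "percentage <- " + percentage,
    "set.seed(" + seed + ") # The same seed gives the same split each time",
    "n <- " + n,
    "indices <- sample.int(n, round(percentage * n / 100))",
    "filter <- rep(0, n)",
    "filter[indices] <- 1",
    "filter",
  ].join("\n");
}

// Example: an 80% training sample of 500 cases with a different seed
console.log(buildTrainingRCode(80, 500, 2024));
```

In an edited copy of the QScript, the returned string would take the place of the RText value passed to dataFile.newRVariable.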
Customizing QScripts in older versions
JavaScript
// This script creates 2 new filters based upon a random split of the data.
includeWeb("QScript Selection Functions");
includeWeb("QScript Functions to Generate Outputs");

if (!main())
    log("QScript cancelled.");
else
    conditionallyEmptyLog("QScript finished.");

function main() {
    // Set percentage of data used for training set
    var trainPercentage = prompt("What percentage of the data set should be used as the training set?", 70);
    if (trainPercentage < 0 || trainPercentage > 100) {
        log("Invalid split. Please ensure that trainPercentage is between 0 and 100.");
        return false;
    }

    // Get the data: use the data file of the first selected question,
    // otherwise ask the user to choose a data file
    var dataFile;
    var selected_questions = project.report.selectedQuestions();
    if (selected_questions.length > 0)
        dataFile = selected_questions[0].dataFile;
    else
        dataFile = dataFileSelection()[0];

    // Create a training filter based on a random sample
    var RText = "percentage <- " + trainPercentage + " # Change this number to change the percentage in the training sample\n" +
                "set.seed(123) # This ensures that the randomization is identical each time\n" +
                "n <- " + dataFile.totalN + " # This is the total sample size\n" +
                "indices <- sample.int(n, round(percentage * n / 100))\n" +
                "filter <- rep(0, n)\n" +
                "filter[indices] <- 1\n" +
                "filter";
    var new_q_name = preventDuplicateQuestionName(dataFile, "Training sample");
    var train;
    try {
        train = dataFile.newRVariable(RText, preventDuplicateVariableName(dataFile, "training"), new_q_name, null);
    } catch (e) {
        log("Could not create train filter: " + e);
        return false;
    }
    train.needsCheck = false;

    // Create testing filter of the data not selected by the training filter
    // (the backticks allow characters such as hyphens in the data file name)
    RText = "as.numeric(!`" + dataFile.name + "`$Variables$" + train.name + ")";
    var test;
    try {
        test = dataFile.newRVariable(RText, preventDuplicateVariableName(dataFile, "testing"), "Testing sample", null);
    } catch (e) {
        log("Could not create test filter: " + e);
        return false;
    }
    test.needsCheck = false;

    // Combine the 2 new variables into a Pick-Any question
    var trainTest = dataFile.setQuestion(preventDuplicateQuestionName(dataFile, "Train test split"), "Pick Any", [train, test]);
    var suffix = trainTest.name.replace(/^Train test split/, "");
    trainTest.variables[0].label = "Training sample" + suffix;
    trainTest.variables[1].label = "Testing sample" + suffix;
    trainTest.needsCheckValuesToCount = false;
    trainTest.isFilter = true;
    insertAtHoverButtonIfShown(trainTest);
    reportNewRQuestion(trainTest, "Filter");
    return true;
}
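The range check in main() compares the prompted value directly with 0 and 100. If the prompt can return text that is not a number (an assumption here, since the return type of prompt is not stated on this page), both comparisons evaluate to false in JavaScript and the value passes the check. A stricter check could look like the following sketch; parseTrainPercentage is a hypothetical helper, not a function from the QScript libraries.

```javascript
// Hypothetical helper: returns a number between 0 and 100 inclusive,
// or null when the input is empty, not numeric, or out of range.
function parseTrainPercentage(input) {
  const text = String(input).trim();
  if (text === "") return null;
  const value = Number(text);
  if (!Number.isFinite(value) || value < 0 || value > 100) return null;
  return value;
}

console.log(parseTrainPercentage("70"));  // 70
console.log(parseTrainPercentage("abc")); // null
console.log(parseTrainPercentage(150));   // null
```

In an edited copy of the QScript, a null result could be handled in the same way as the existing invalid split branch, by logging a message and returning false.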
See also
- QScript for more general information about QScripts.
- QScript Examples Library for other examples.
- Online JavaScript Libraries for the libraries of functions that can be used when writing QScripts.
- QScript Reference for information about how QScript can manipulate the different elements of a project.
- JavaScript for information about the JavaScript programming language.
- Table JavaScript and Plot JavaScript for tools for using JavaScript to modify the appearance of tables and charts.