Create New Variables - Binary Variable(s)

From Q

Create a new binary variable from the selected variable

This QScript (a transformation in Displayr) creates a binary Pick Any or Pick Any - Grid question (a Binary - Multi or Binary - Grid variable set in Displayr) from user-selected questions (variable sets) that contain either Number (Numeric) or Categorical variables. It does this by reassigning which values are counted. For Number (Numeric) variables, positive values (greater than 0) are counted. For Categorical variables, half of the labels are counted (rounded down, ignoring Don't Know and missing-data categories). The transform starts with the top half of the labels and switches to the bottom half if any of those labels look negative (for example, labels containing "disagree", "dislike" or "not"). For example, if the categorical labels are "Yes" and "No", the "Yes" label is counted. In larger label structures it counts the positive statements: on a 5-point scale of "Strongly Disagree, Disagree, Neither, Agree and Strongly Agree", the binary-transformed question (variable set) counts "Agree" or "Strongly Agree" and does not count the other labels. More details of the input and output question (variable set) types are given in the Technical details section.
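The counting rules described above can be sketched in plain JavaScript. This is a simplified, hypothetical illustration rather than the QScript's own code: `countsNumeric` and `chooseCountedLabels` are made-up names, the negative-word pattern is copied from the script on this page, and the labels are assumed to be ordered from the lowest to the highest value with Don't Know and missing-data categories already removed.

```javascript
// Hypothetical helpers illustrating the counting rules; not part of the QScript API.

// Number (Numeric) variables: a value is counted when it is positive.
function countsNumeric(value) {
  return value > 0;
}

// Words suggesting a label is negative (same pattern as the QScript uses).
const NEGATIVE_WORDS = /disagree|dislike|hate|dont|don't|^no$|^not|unhappy|unsatisfied|dissatisfied/;

// Categorical variables: count half of the labels (rounded down).
// Assumes `labels` is ordered from lowest to highest value and already
// excludes Don't Know and missing-data categories.
function chooseCountedLabels(labels) {
  const k = Math.floor(labels.length / 2);
  const topHalf = labels.slice(labels.length - k);
  const looksNegative = topHalf.some(label => NEGATIVE_WORDS.test(label.toLowerCase()));
  // If the top half looks negative, count the bottom half instead.
  return looksNegative ? labels.slice(0, k) : topHalf;
}

console.log(chooseCountedLabels(["Strongly Disagree", "Disagree", "Neither", "Agree", "Strongly Agree"]));
// → ["Agree", "Strongly Agree"]
console.log(chooseCountedLabels(["Yes", "No"]));
// → ["Yes"]  (the top half "No" looks negative, so the bottom half is used)
console.log([3, 0, -2].map(countsNumeric));
// → [true, false, false]
```

The negative-word check is a heuristic: a scale whose positive labels happen to contain one of these words would be counted the wrong way round, which is one reason to inspect the output after running the transform.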

Example

In the example below, Pick One questions with ordinal scales (Ordinal variables in Displayr) are combined and transformed into a Pick Any question (a Binary - Multi variable set in Displayr). Note that the labels Strongly agree and Agree are counted in the binary-transformed output variable. For example, in the summary output shown below, the variable Like the look of phones has 24% for Strongly agree and 38% for Agree, which combine into a single binary value of 62% in the output variable.
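Because each respondent gives only one answer to a single-response question, the percentage for the binary variable is the sum of the percentages of the counted labels. A small check using the figures quoted above (`binaryPercentage` is a hypothetical name used only for illustration):

```javascript
// Percentages for "Like the look of phones" quoted in the example above.
const shares = { "Strongly agree": 24, "Agree": 38 };

// Hypothetical helper: sum the percentages of the labels that are counted.
function binaryPercentage(percentages, countedLabels) {
  return countedLabels.reduce((total, label) => total + percentages[label], 0);
}

console.log(binaryPercentage(shares, ["Strongly agree", "Agree"])); // 62
```

Since displayed percentages are rounded, a sum computed from them may occasionally differ by one percentage point from the value reported for the binary variable.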

(Images: Binary Transform Example in Q; Binary Transform Example in Displayr)

Technical details

You are required to select at least one question (variable set) of the types Number, Number - Multi, Number - Grid, Pick One or Pick One - Multi (in Displayr: Numeric, Numeric - Multi, Numeric - Grid, Nominal, Nominal - Multi, Ordinal or Ordinal - Multi). Pick One questions can have either nominal or ordinal scales. If multiple questions (variable sets) are selected, all of their variables must be the same type (either all Numeric or all Categorical), and if they are Categorical they must have the same label structure. To create a new Categorical Binary output variable from Numeric questions (variable sets), the following options are possible (with Displayr options shown in parentheses):

  1. Selecting one or more Number (Numeric) questions to produce a single Pick Any (Binary - Multi) output.
  2. Selecting one Number - Multi (Numeric - Multi) question to produce a single Pick Any (Binary - Multi) output.
  3. Selecting two or more Number - Multi (Numeric - Multi) questions to produce a single Pick Any - Grid (Binary - Grid) output.
  4. Selecting one and only one Number - Grid (Numeric - Grid) question to produce a single Pick Any - Grid (Binary - Grid) output.

To create a Categorical Binary output variable from one or more Categorical questions (variable sets), the following options are possible (again with Displayr options shown in parentheses):

  1. Selecting one or more Pick One (Nominal or Ordinal) questions to produce a single Pick Any (Binary - Multi) output.*
  2. Selecting one Pick One - Multi (Nominal - Multi or Ordinal - Multi) question to produce a single Pick Any (Binary - Multi) output.
  3. Selecting two or more Pick One - Multi (Nominal - Multi or Ordinal - Multi) questions to produce a single Pick Any - Grid (Binary - Grid) output.*

*If multiple categorical input questions (variable sets) are selected, they must have the same label structure: the same number of labels, in the same order.
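The input/output rules listed above can be summarized as a small decision function. This is a hypothetical sketch using Displayr's names, not the QScript's own code: `binaryOutputType` and `sameLabelStructure` are made-up names, and the sketch only accepts the combinations listed above (the real script is more permissive about mixing Numeric structures). The QScript additionally checks that every selected set has the same variable labels in the same order before it builds a grid, and produces a Binary - Multi otherwise.

```javascript
// Hypothetical sketch of the documented input/output rules (Displayr names).
// It covers only the combinations listed above.
function binaryOutputType(structures) {
  if (structures.length === 0)
    throw new Error("Select at least one variable set");
  // Nominal and Ordinal may be mixed, so treat both as "Categorical".
  const kinds = structures.map(s => s.replace(/^(Nominal|Ordinal)/, "Categorical"));
  if (kinds.some(kind => kind !== kinds[0]))
    throw new Error("Combination not covered by the documented rules");
  const kind = kinds[0];
  if (kind === "Numeric - Grid") {
    if (structures.length !== 1)
      throw new Error("Select one and only one Numeric - Grid");
    return "Binary - Grid";
  }
  // Two or more "- Multi" sets become a grid; everything else becomes a Binary - Multi.
  return kind.endsWith("- Multi") && structures.length >= 2 ? "Binary - Grid" : "Binary - Multi";
}

// Categorical inputs must share the same labels, in the same order.
function sameLabelStructure(labelSets) {
  return labelSets.every(labels =>
    labels.length === labelSets[0].length &&
    labels.every((label, i) => label === labelSets[0][i]));
}

console.log(binaryOutputType(["Numeric - Multi", "Numeric - Multi"])); // Binary - Grid
console.log(binaryOutputType(["Nominal", "Ordinal"]));                 // Binary - Multi
console.log(sameLabelStructure([["Yes", "No"], ["No", "Yes"]]));       // false
```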

How to apply this QScript

  • Start typing the name of the QScript into the Search features and data box in the top right of the Q window.
  • Click on the QScript when it appears in the QScripts and Rules section of the search results.

OR

  • Select Automate > Browse Online Library.
  • Select this QScript from the list.

Customizing the QScript

This QScript is written in JavaScript and can be customized by copying and modifying the JavaScript.

Customizing QScripts in Q4.11 and more recent versions

  • Start typing the name of the QScript into the Search features and data box in the top right of the Q window.
  • Hover your mouse over the QScript when it appears in the QScripts and Rules section of the search results.
  • Press Edit a Copy (bottom-left corner of the preview).
  • Modify the JavaScript (see QScripts for more detail on this).
  • Either:
    • Run the QScript, by pressing the blue triangle button.
    • Save the QScript and run it at a later time, using Automate > Run QScript (Macro) from File.

Customizing QScripts in older versions

  • Copy the JavaScript shown on this page.
  • Create a new text file, giving it a file extension of .QScript. See here for more information about how to do this.
  • Modify the JavaScript (see QScripts for more detail on this).
  • Run the file using Automate > Run QScript (Macro) from File.
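As an illustration of the kind of change you might make, the script on this page builds an R expression ending in `> 0` for Number (Numeric) inputs, and its generated comments note that this cut-off can be changed. The standalone sketch below mimics that step for a single variable with a configurable comparison. `buildBinaryExpression` and the variable name `q2_spend` are hypothetical, and the sketch omits the `data.frame` case the script uses when several variables are selected.

```javascript
// Hypothetical, simplified version of how the QScript builds its R code for
// a single Number (Numeric) variable. `comparison` replaces the default "> 0".
function buildBinaryExpression(rVariableName, sourceVariableName, comparison = "> 0") {
  return rVariableName + " <- " + sourceVariableName + "\n" +
         rVariableName + " " + comparison + "\n";
}

console.log(buildBinaryExpression("x", "q2_spend"));          // counts values above 0
console.log(buildBinaryExpression("x", "q2_spend", "> 50"));  // counts values above 50
```

In the script itself, the equivalent change is to edit the `" > 0\n"` string where the comparison is appended to `expression`.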

JavaScript

includeWeb("QScript Utility Functions");
includeWeb("QScript Selection Functions");
includeWeb("QScript Functions to Generate Outputs");
includeWeb("QScript R Output Functions");

function isInArray(value, arr) {
  return arr.indexOf(value) > -1;
}

function recodeCountsInArray(question_or_variable, one_array, zero_array) {
    var values = question_or_variable.uniqueValues;
    var num_vals = values.length;
    var attributes = question_or_variable.valueAttributes;
    for (var j = 0; j < num_vals; j++) {
        if (isInArray(attributes.getValue(values[j]), one_array))
            attributes.setCountThisValue(values[j], true);
        else if(isInArray(attributes.getValue(values[j]), zero_array))
            attributes.setCountThisValue(values[j], false);
    }
}

function getVariableOrQuestionLabel(variable) {
    if (/- Multi|- Grid/.test(variable.question.variableSetStructure))
        return variable.question.name;
    if (variable.label.length > 0)
        return variable.label;
    else
        return variable.name
}

function variablesToBinary(data_file, variables, is_displayr, questions) {
    var suitable_for_grid = suitableForGrid(questions);
    // Use logical && (not bitwise &) so make_grid is a boolean
    var make_grid = (questions.length > 1 && suitable_for_grid) ||
        (questions.length === 1 && suitable_for_grid && questions[0].questionType === "Number - Grid");
    if (variables[0].variableType === "Numeric") {
        var variable_labels = variables.map(function(v) {
            return v.label;
        });
        var duplicate_variable_labels = variable_labels.some(function(x) {
            return variable_labels.indexOf(x) !== variable_labels.lastIndexOf(x)
        });

        if (make_grid || duplicate_variable_labels) {
            variable_labels = variables.map(function(v, v_ind) {
                return(v.question.name + " - " + variable_labels[v_ind])
            });
        }

        var base_question_name = preventDuplicateQuestionName(data_file, variable_labels.filter(onlyUnique).join(" + "));
        var r_variable_name = variables.length === 1 ? "x" : "variable.set";
        var last_variable = getLastVariable(variables);
        var temp_var_name = randomVariableName(16); // temporary name, random to (almost) guarantee uniqueness
        var variable_names = variables.map(function(v) {
            return checkDuplicateVariable(v.name) ? generateDisambiguatedVariableName(v) : stringToRName(v.name);
        });

        // Simple assignment if single variable, otherwise data.frame
        var expression;
        if (variables.length === 1) {
            expression = r_variable_name + ' <- ' + variable_names[0] + '\n';
        } else {
            var df_assignments = [];
            for (var i = 0; i < variables.length; i += 1) {
                df_assignments[i] = stringToRName(variable_labels[i]) + " = " +  variable_names[i];
            }
            var def_prefix = r_variable_name + ' <- data.frame(';
            var white_spaces = " ".repeat(def_prefix.length);
            expression = def_prefix + df_assignments.join(",\n" + white_spaces) + ',\n' + white_spaces + 'check.names = FALSE)\n';
        }

        expression += r_variable_name + " > 0\n" +
                  "# If you wish to change the cut-off for the count (from > 0), modify the code above\n" + 
                  "# E.g. To count values larger than 50, change > 0 to > 50\n" +
                  "# E.g. To count values smaller than or equal to 25, change > 0 to <= 25\n";

        try {
            var question = data_file.newRQuestion(expression, base_question_name, temp_var_name, last_variable);
            question.questionType = make_grid ? "Pick Any - Grid" : "Pick Any";
            question.name = preventDuplicateQuestionName(data_file, variables.map(function(x) {
                return(getVariableOrQuestionLabel(x))
            }).filter(onlyUnique).join(" + ") + " > 0");
            question.needsCheckValuesToCount = false;
            insertAtHoverButtonIfShown(question);
        } catch (e) {
            var structure_name = getVariableNaming(is_displayr);
            log("The binary transform could not be computed for this " + structure_name + ": " + e);
            return false;
        }
        // Replace temporary variable names
        nameSequentialVariables(question.variables, "binary");
    } else {
        var question_name = variables.map(function(v) {
            return getVariableOrQuestionLabel(v);
        }).filter(onlyUnique).join(" + ");
        
        var new_variables = [];
        var mult_qs = questions.length > 1;
        questions.forEach(function (q) {
            q.variables.forEach(function (v) {
                var q_below = data_file.getVariableByName(v.name);
                var new_linked = preventDuplicateVariableName(data_file, v.name);
                // New JavaScript variable whose expression is the source variable's name (a copy of its values)
                var new_var = data_file.newJavaScriptVariable(v.name, false, new_linked, v.name, q_below);
                if (make_grid)
                    new_var.label = q.name + " - " + v.label;
                else if (mult_qs && q.variables.length > 1)
                    new_var.label = q.name + " " + v.label;
                else
                    new_var.label = v.label;
                new_var.variableType = "Categorical";
                // Copy the value labels, missing-data flags and values from the source variable
                v.uniqueValues.forEach(function(val) {
                    new_var.valueAttributes.setLabel(val, v.valueAttributes.getLabel(val));
                    new_var.valueAttributes.setIsMissingData(val, v.valueAttributes.getIsMissingData(val));
                    new_var.valueAttributes.setValue(val, v.valueAttributes.getValue(val));
                });
                new_variables.push(new_var);
            });
        });
       
        var output_type = make_grid ? "Pick Any - Grid" : "Pick Any";
        var new_question_name = preventDuplicateQuestionName(data_file, question_name);
        var question = data_file.setQuestion(new_question_name, output_type, new_variables);
        insertAtHoverButtonIfShown(question);
        var values = new_variables[0].uniqueValues;
        var attributes = new_variables[0].valueAttributes;
        var k = values.filter(function (x) {
            return !isDontKnow(attributes.getLabel(x)) && !isNaN(attributes.getValue(x)) && !attributes.getIsMissingData(x);
        }).length;
        var top_k = Math.floor(k / 2);
        var one_array = getTopOrBottomKNonMissingValues(question, top_k, false, {excludeDK: true});
        if (one_array == null)
            return false;
        
        var one_labs = getLabelsForValues(question, one_array);
        if (one_labs.some(function(x) {
            return /disagree|dislike|hate|dont|don't|^no$|^not|unhappy|unsatisfied|dissatisfied/.test(x.toLowerCase());
        })) {
            one_array = getTopOrBottomKNonMissingValues(question, top_k, true, {excludeDK: true});
            if (one_array == null)
                return false;
            one_labs = getLabelsForValues(question, one_array);
        }

        var zero_array = values.filter(function (x) {
            return isDontKnow(attributes.getLabel(x)) || (one_array.indexOf(x) < 0 && !isNaN(attributes.getValue(x)) && !attributes.getIsMissingData(x));  
        });
        var trailing_name = one_labs[0];
        for (var j = 1; j < one_labs.length; j++){
            trailing_name += " + " + one_labs[j];
        }

        recodeCountsInArray(question, one_array, zero_array);
        question.name = preventDuplicateQuestionName(data_file, question_name + " : " + trailing_name);
    }
    reportNewRQuestion(question, "Binary transformed question");
}


// Check that Categorical variable sets share the same label structure (Numeric sets need no check)
checkStructureAndLabels = function(questions, structure_name, is_displayr) {
    // Check all same type
    var variable_set_structures = questions.map(function(x){return(x.variableSetStructure)});
    // Labels unimportant for Numeric but need to be checked for Categorical
    if (!/^Numeric/.test(variable_set_structures[0])) {
        // Check labels
        var all_variables = getVariablesFromQuestions(questions);
        var all_labels = all_variables.map(function(x) {return(variableToLabels(x))});
        
        // Check lengths
        if (!all_labels.every(function(x) {return(x.length === all_labels[0].length)})) {
            userFeedback(all_variables, all_labels, "length", getVariableNaming(is_displayr));
            return false;
        }
        // Check equal elements in same order
        if (!all_labels.every(function (label_array) { return arraysEqual(label_array, all_labels[0]); }) ) {
            userFeedback(all_variables, all_labels, "not all equal", getVariableNaming(is_displayr));
            return false;
        }
    }
    return true;
}

userFeedback = function(all_variables, variable_labels, error_type, structure_name) {
    var idx = 0;
    var pre_message, post_message;
    if (error_type === "length") {
        // Find the first variable whose number of labels differs from the first variable's
        variable_labels.some(function(x, x_index) {
            if (x.length !== variable_labels[0].length) {
                idx = x_index;
                return true;
            }
            return false;
        });
        pre_message = "The number of labels should be the same for all selected variables. " +
            "However, the selected variables have different numbers of labels. ";
        post_message = " If a label was miscoded, consider excluding it from the analysis before running the binary transform again.";
    } else {
        // Find the first variable whose labels differ from the first variable's
        variable_labels.some(function(x, x_index) {
            if (!arraysEqual(x, variable_labels[0])) {
                idx = x_index;
                return true;
            }
            return false;
        });
        pre_message = "The labels from these " + structure_name + " do not match, and so they cannot be combined. ";
        post_message = " Note that the labels must be in the same order for all selected " + structure_name + " for the transform to occur.";
    }
    log(pre_message + "For example, the variable '" + getVariableOrQuestionLabel(all_variables[0]) + "' has " + variable_labels[0].length +
        " labels: " + printTypesString(variable_labels[0], " and ") + " while the variable '" + getVariableOrQuestionLabel(all_variables[idx]) +
        "' has " + variable_labels[idx].length + " labels: " + printTypesString(variable_labels[idx], " and ") + "." +
        post_message);
}

getVariableNaming = function(is_displayr) {
    return is_displayr ? "variable sets" : "questions";
}

differentTypeFeedback = function(variable_feedback, is_displayr, mixed_message) {
    var structure_name = getVariableNaming(is_displayr);
    var transformation_name = is_displayr ? "transformation" : "QScript";
    var first_var = variable_feedback[0];
    var remaining_vars = variable_feedback.filter(function(x){
        return x !== first_var;
    }).filter(onlyUnique);
    log("The selected " + structure_name + " include " + printTypesString([first_var].concat(remaining_vars), " and ") +
        ". These cannot be combined into a Binary output " + structure_name.slice(0, -1) + " with this " + transformation_name + mixed_message);
}

suitableForGrid = function(questions) {
    // Check each question has the same number of variables
    var qvar_names = questions.map(function(q) {return(q.variables.map(function(v) {return(v.label)}))});
    for (var i = 1; i < questions.length; i++) {
        if (qvar_names[i].length != qvar_names[0].length)
            return false;
        if (!arraysEqual(qvar_names[i], qvar_names[0]))
            return false;
    }
    return true;
}


if (!main())
    log("QScript cancelled.");
else
    conditionallyEmptyLog("QScript finished.");

function main() {
    var is_displayr = (!!Q.isOnTheWeb && Q.isOnTheWeb());
    var allowed_types = ["Nominal", "Nominal - Multi", "Numeric", "Numeric - Multi", "Numeric - Grid", "Ordinal", "Ordinal - Multi"];

    var selected_questions = selectInputQuestions(allowed_types);
    if (!selected_questions)
        return false;
    var data_file = getDataFileFromQuestions(selected_questions);

    // Grab all base variables from all selected items
    var all_variables = getVariablesFromQuestions(selected_questions);
    var variable_set_structures = selected_questions.map(function(x){return(x.variableSetStructure)});
    var variable_feedback = is_displayr ? variable_set_structures : selected_questions.map(function(x){return(x.questionType)});
    var mixed_message = ". The selected " + getVariableNaming(is_displayr) + " should also have the same structure, e.g. all Numeric variables or all Categorical " +
        "(mixing Ordinal and Nominal categorical variables is permissible for this transform so long as the label structure is the same).";

    // If only one question selected, do the Binary transform.
    if (selected_questions.length === 1) {
        variablesToBinary(data_file, all_variables, is_displayr, selected_questions);
        return true;
    } else if (variable_set_structures.every(function(x) {
        return /Numeric/.test(x);
    }) || variable_set_structures.every(function(x) {
        return /(Nominal|Ordinal)/.test(x);
    })) {
        if (!checkStructureAndLabels(selected_questions, getVariableNaming(is_displayr), is_displayr))
            return false;
        variablesToBinary(data_file, all_variables, is_displayr, selected_questions);
        return true;
    } else {
        differentTypeFeedback(variable_feedback, is_displayr, mixed_message);
        return false;
    }
}

See also