How non-cognitive skills are measured
iMentor measures non-cognitive skills using methods commonly used in the social sciences. To measure a non-cognitive skill, researchers develop a series of questions called a scale, collected from study participants through a survey. The scale represents multiple aspects of the overarching non-cognitive skill, and often asks similar questions to ensure responses are consistent throughout the survey. Researchers test these scales to determine whether the questions cohere with one another and whether the scale is reliable and valid. A scale is deemed reliable when it produces similar results across similar survey administrations, testing environments, and populations. A scale is valid when it measures what it is intended to measure. Each scale is tested for reliability and validity on thousands of respondents over multiple years. The beginning- and end-of-year surveys contain scales for all six of the non-cognitive skills targeted in our short-term outcomes.
At each survey administration students get a scale score for each non-cognitive skill. The score is an aggregation of students’ responses to the questions about that non-cognitive skill. Student responses are converted from a text response (i.e., strongly agree, sort of agree) to a numeric one (i.e., 4, 3). These responses are aggregated and expressed as a mean or a sum that represents that non-cognitive skill. To get a scale score for a non-cognitive skill, a student must answer at least half of the questions in that skill’s scale.
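The scoring procedure above can be sketched in Python. The half-answered threshold and the text-to-number conversion follow the description; the specific response labels and four-point values are illustrative assumptions, not iMentor's actual survey coding.

```python
# Illustrative Likert-to-numeric mapping (assumed labels and values,
# not iMentor's actual survey coding).
LIKERT = {"strongly agree": 4, "sort of agree": 3,
          "sort of disagree": 2, "strongly disagree": 1}

def scale_score(responses):
    """Mean scale score for one student on one skill's scale.

    `responses` holds the student's answer to every question in the scale,
    with unanswered items represented as None. Returns None if fewer than
    half of the questions were answered, per the scoring rule above.
    """
    answered = [LIKERT[r] for r in responses if r is not None]
    if len(answered) < len(responses) / 2:
        return None  # too many skipped items to score this skill
    return sum(answered) / len(answered)
```

For example, a student who answers three of four items with "strongly agree", "sort of agree", and "sort of agree" gets a mean score of (4 + 3 + 3) / 3 ≈ 3.33, while a student who answers only one of four items gets no score for that skill.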
Click on the links below to jump to a specific section.
- Assessing Change
- The Importance of Sample Size
- Interpreting Results
- Negative Change or No Change in Non-Cognitive Skill Growth
Assessing Change
Having a large amount of beginning- and end-of-year survey data is crucial for assessing whether there was a change across the program. Non-cognitive skill development can only be assessed for students who have scale scores at both time points. Beginning- and end-of-year scale scores are compared using a paired-samples t-test. This statistical analysis determines whether there has been a meaningful (statistically significant) change in the group average for that non-cognitive skill from the beginning of the year to the end of the year. When a result is statistically significant, it means there is only a small likelihood that the change was the result of chance. You can then infer that the change was the result of something students experienced over the course of the year.
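As a sketch of this analysis, the t statistic for a paired-samples test can be computed from each student's beginning-to-end difference (the data below are illustrative; in practice the p-value would come from the t distribution with n − 1 degrees of freedom, for example via `scipy.stats.ttest_rel`):

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired-samples t statistic for beginning- vs. end-of-year scale scores.

    Only students with scores at both time points appear in the lists,
    matched by position, per the rule above.
    """
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # t = mean of the differences / standard error of the differences
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Illustrative scores for four students at both time points:
t_stat = paired_t([3.0, 2.0, 3.0, 4.0], [3.5, 2.5, 3.0, 4.5])  # → 3.0
```

A larger |t| corresponds to a smaller p-value, i.e., stronger evidence that the group average truly changed between the two administrations.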
The Importance of Sample Size
Change in a non-cognitive skill can only be assessed at the group level. The minimum recommended sample size for a paired-samples t-test is 30 respondents. However, small sample sizes lead to volatile results: small samples can easily be biased or influenced by a confounding variable. Larger sample sizes minimize the impact of those biases on the overall analysis.
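The volatility of small samples can be illustrated by simulation: draw many groups of pure noise and compare how much the group averages swing at n = 10 versus n = 100 (an illustrative sketch, not iMentor's analysis; the score distribution is an arbitrary assumption):

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the illustration is reproducible

def spread_of_group_means(n, trials=500):
    """Standard deviation of group-average scores across many simulated
    groups of size n, where each student's score is random noise around
    an assumed true mean of 3.0 on a 1-4 scale."""
    group_means = [mean(random.gauss(3.0, 0.8) for _ in range(n))
                   for _ in range(trials)]
    return stdev(group_means)

small = spread_of_group_means(10)   # averages of 10 students swing widely
large = spread_of_group_means(100)  # averages of 100 students are stable
```

Because the spread of a group average shrinks like 1/√n, the n = 10 averages vary roughly three times as much as the n = 100 averages, so a single small-sample result is far more likely to look like a (spurious) change.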
Interpreting Results
The paired-samples t-test used in this analysis can have one of three results: (1) no statistically significant change in a non-cognitive skill, (2) a positive statistically significant change in a non-cognitive skill, or (3) a negative statistically significant change in a non-cognitive skill. A result obtained with these methods is not necessarily evidence that there was an impact (or no impact). A very important caveat to these analyses is that there is no comparison group. Without a comparison group, changes in a non-cognitive skill could have other explanations, such as students’ natural development, factors at play in the school, or other college success organizations in the school.
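The three possible outcomes can be expressed as a small helper that labels a test result from the group's mean change and the p-value (the 0.05 threshold is the conventional default, shown here as an assumption rather than iMentor's documented cutoff):

```python
def classify_change(mean_change, p_value, alpha=0.05):
    """Label a paired t-test result as one of the three possible outcomes.

    alpha is the significance threshold (0.05 is the conventional default,
    assumed here; substitute whatever level the analysis actually uses).
    """
    if p_value >= alpha:
        return "no statistically significant change"
    if mean_change > 0:
        return "positive statistically significant change"
    return "negative statistically significant change"
```

For example, a mean change of +0.20 with p = 0.40 is still classified as "no statistically significant change", since the change is too likely to be chance.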
When assessing programmatic impact on non-cognitive skills, it is important to remember that these scales and statistical methods are intended to assess change in the entire group. These tools are not designed to assess individuals. Large sample sizes are critical for reducing measurement bias in non-cognitive skill assessment, and attempting to assess an individual on non-cognitive skills using these methods reintroduces the potential for that bias. Therefore, it is not methodologically valid to state that an individual student has shown growth in a non-cognitive skill based on these statistical methods.
Negative Change or No Change in Non-Cognitive Skill Growth
When a t-test detects a negative statistically significant change, or no statistically significant change, in a non-cognitive skill, that does not necessarily mean that the program is ineffective. There are several potential explanations for this result.
Small sample size
It may not be possible to detect an effect on a non-cognitive skill because the sample size used in the measurement is too small. This caveat is particularly relevant for small programs and smaller demographic groups in our analyses.
Skill not emphasized in programming
Different programs, program managers, or grade levels may emphasize specific non-cognitive skills to differing degrees. If a skill isn’t explicitly called out in programming or the curriculum, or emphasized by a mentor, then it may not be reasonable to expect positive growth in that skill that year.
Duration of programming
Some skills may take more than one year of programming to effectively develop in a student. Research conducted by iMentor’s Research and Evaluation team has found that certain non-cognitive skills develop at different rates. Longitudinal growth of non-cognitive skills may get lost when looking at results within one program year.
Program engagement
Even when there is no overall growth in a non-cognitive skill, research conducted by iMentor’s Research and Evaluation team has found differing impacts on non-cognitive skills based on program engagement. For some non-cognitive skills, pairs who do not meet participation benchmarks show a negative change in that skill while pairs above participation benchmarks show no change. This pattern suggests that students who are above these benchmarks may experience some kind of protective effect from having a mentor.
Adolescent development and non-cognitive skills
Certain non-cognitive skills may experience declines or flat growth during certain periods of adolescence. In essence, it may be harder to impact a non-cognitive skill given the natural changes happening in this population at this time.
Ceiling effects
It is also possible that the measurement tools or the programming have reached the maximum detectable effect on a particular non-cognitive skill. Research conducted by iMentor’s Research and Evaluation team has found that there may be a ceiling effect in several of the skills covered in the iMentor program.
When a result of a non-cognitive skill assessment comes back as non-significant or as negative, we do not immediately interpret that result as a failure or non-impact of the program. The reasons documented above provide several potential explanations for these results that must be explored during any debrief of a non-cognitive skill analysis.