Is the Survey Valid and Reliable?

How do we know that the Practicing Faith Survey is valid and reliable? Does it really assess the things it claims to assess? This page explains the grounds for trusting the tool. It is more technical than other parts of this website, but we include it so that those who are interested can judge the grounds for confidence in this resource.

The Practicing Faith Survey (PFS) is designed to assess students’ faith formation with respect to what they do, not just what they say they believe. It was developed in six stages, following the recommendations of Gehlbach and Brinkworth (2011). These steps offer a thorough set of procedures for developing a new assessment tool.

1. Literature Review

We began with a review of related scholarly work. This included recent literature on assessment, faith formation, and vocation, but also a wide range of Christian sources on Christian learning practices stretching from recent popular titles back to early church fathers. This allowed us to determine whether similar instruments already existed and to clarify the outcomes we intended to measure.

Once our five dimensions of practice were established, we conducted additional literature searches to hone what each comprised. For instance, as we considered intellectual practices, we reviewed articles on intellectual humility and intellectual virtues. These dimension-specific reviews provided additional conceptual clarity.

2. Drafting

In the second step, we drafted items to measure each of the five dimensions of Christian practice. We paid careful attention to the readability of the items, keeping in mind the age of the students who would answer the questions. This yielded a bank of 149 items, which we would then pare down.

3. Focus Groups

As we began to reduce the number of items, we held 17 student focus groups in two U.S. Christian schools. Students from grades 5 through 12 responded to a subset of the items, and a member of the development team asked follow-up questions to assess their comprehension of the items and their thought process in arriving at a response. We also hosted a focus group of Christian school faculty to gather their reactions to the original items and their impressions of how the survey could be implemented. After these focus groups, we removed or revised survey items that students found difficult to comprehend or interpreted in unintended ways.

4. Expert Panel

We then convened an expert panel to review the items. A group of 10 individuals, including schoolteachers and university professors of psychology, sociology, political science, and education, systematically reviewed each item. During this process, panelists assessed each item’s readability and conceptual clarity, and we made further revisions and deletions based on their feedback. Drawing on both the expert panel and the student focus groups, we reduced the number of items to 56 across the five dimensions. At this point, we were ready to pilot the instrument.

5. Pilot

During our pilot phase, we administered the survey to 1,226 fifth- through twelfth-grade students across 8 Christian schools throughout the U.S. After securing parental consent for students to complete the survey and for the development team to analyze their responses for diagnostic purposes, schools administered electronic versions of the survey. Students took one of nearly 25 versions of the survey that included alternate wordings of items and different combinations of items to measure a variety of different outcomes.

6. Diagnostic Data Analysis

We conducted four different analyses to provide validity evidence for the survey. The goal here is to provide some empirical evidence that the survey measures the five dimensions we intended it to measure with fidelity and that it does so reliably. Here we only outline the procedures and sketch the results.
1. Structural Validity
  
  We first tested the structural validity of the PFS. Structural validity refers to the extent to which the items constituting each dimension measure one underlying factor. It would be problematic if the items within, say, relational practices measured aspects of introspective practices. The five dimensions may overlap in some ways, but we should observe items within a given dimension mostly capturing aspects of that dimension rather than others. A procedure called factor analysis is used to assess structural validity. This analysis confirmed the structural validity of the Practicing Faith Survey.
2. Convergent Validity
  
  We then examined correlations between measures of the five dimensions of our survey and other measures of related outcomes that have already been established in existing research. If the new survey measures the intended dimensions with fidelity, then its measures should be correlated with previous measures of related outcomes. For instance, the measure of relational practices on the PFS should exhibit some correlation with measures of prosocial behavior, compassion, or empathy. Testing such correlations provides evidence of convergent validity. In fact, we found ample evidence for convergent validity in the Practicing Faith Survey measures.
3. Discriminant Validity
  
  Next, we similarly examined correlations between measures of the five dimensions of our survey and other measures that should have no theoretical relationship to them. Logically, if the survey measures the intended dimensions with fidelity, then those measures should not turn out to be correlated with unrelated measures. For example, measures of relational practices should not exhibit correlations with measures of curiosity or openness to revise one’s views. A lack of correlation between measures on the PFS and conceptually unrelated measures provides evidence of discriminant validity. As was the case with convergent validity, we found good evidence of discriminant validity.
4. Reliability
  
  The final diagnostic test assessed whether the survey measures the five dimensions of Christian practice in a reliable way. That is to say, each item that contributes to the measure of a particular dimension of Christian practice should be measuring that dimension. In practice, this means that student responses to a set of items designed to measure the same dimension should be consistent and correlated with one another. Reliability is assessed using a statistic called Cronbach’s alpha, which typically ranges from 0 to 1. By a common rule of thumb, scales should have a Cronbach’s alpha of at least 0.7. Indeed, each of the measures of the five dimensions of Christian practice on the PFS exhibits a sufficient level of reliability.
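
For readers who want to see how the correlation and reliability checks described above work in principle, here is a minimal Python sketch using only the standard library. It is not the development team's analysis code. The function names and all of the response data are hypothetical, invented purely for illustration. Alpha is computed with the standard formula: alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items in the scale.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: one row per student, one column per item; all items
    are assumed to belong to the same dimension and response scale.
    """
    k = len(responses[0])
    item_vars = [variance(col) for col in zip(*responses)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical data: six students answering four items that are all
    # meant to measure one dimension (values invented for illustration).
    relational_items = [
        [4, 5, 4, 4],
        [2, 2, 3, 2],
        [5, 4, 5, 5],
        [3, 3, 2, 3],
        [1, 2, 1, 2],
        [4, 4, 5, 4],
    ]
    # Hypothetical scores for the same students on two other measures.
    related_measure = [4.2, 2.5, 4.8, 3.1, 1.9, 4.0]   # e.g. an empathy scale
    unrelated_measure = [3, 4, 3, 2, 3, 4]             # no expected relationship

    # Each student's scale score is the mean of their item responses.
    scale_scores = [mean(row) for row in relational_items]

    print("alpha:", round(cronbach_alpha(relational_items), 3))
    print("convergent r:", round(pearson(scale_scores, related_measure), 3))
    print("discriminant r:", round(pearson(scale_scores, unrelated_measure), 3))
```

With real data, each row would hold one student's responses to the items of a single dimension. A high correlation with the related measure would count toward convergent validity, and a correlation near zero with the unrelated measure would count toward discriminant validity.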

Conclusions

The PFS was developed with the most thorough process available in current assessment design, and that process has provided us with good empirical evidence for sufficient levels of validity and reliability.

This does not mean that the survey captures everything. The fullness of practicing faith as students within the life of a school cannot be captured by a mere survey. For instance, the list of items connected with the relational practices dimension does not reflect every conceivable way to embody neighborly love within the school community. The process of measurement inherently reduces the object being measured so that a number can be feasibly assigned to represent it. Measurement is a method to bring something hidden into the light, but it cannot bring the entirety of that object to the light. We caution educators, students, and parents against simply chasing ways to maximize scores on the Practicing Faith Survey. Instead, we encourage them to think of other ways in which the five dimensions of practice can be embodied and practiced. We hope to see students and their schools expanding their sense of faithful living, following the Apostle Paul’s exhortation to “live worthily of the calling with which [they] have been called” (Ephesians 4:1, NET). This survey offers some valid and reliable data points to help guide this process.

References

Gehlbach, H., & Brinkworth, M.E. (2011). Measure twice, cut down error: A process for enhancing the validity of survey scales. Review of General Psychology, 15(4), 380-387.
