What does validity mean? And other questions you have about the current state of school accountability
The Department of Education has released an avalanche of information over the past few weeks regarding the initial administration of the new Florida Standards Assessment (FSA) this past spring, and the steps remaining to complete the standard-setting process for those results and their use in pending accountability reports.
One of the most anticipated documents of this whole process was the independent study intended to assess the security and fairness of testing procedures in light of some concerns raised during administration, as well as the overall validity of the new FSAs for measuring standards-aligned achievement in each grade-level and subject area they cover. The report (available here: Executive Summary/Full Report) produced by Alpine Testing Solutions, a private test development and research organization based out of Utah, was widely intended to help clarify the path forward during what has become a contentious statewide transition in standards and assessments.
Rather than provide the clear direction many were hoping to get from it, the report instead has quickly become a political Rorschach test for both sides of Florida’s increasingly divided education community — offering results and recommendations just vague enough for any side to read what they want to read into it.
Nowhere was this more evident than at last week’s Senate Education Pre-K-12 Committee Meeting to review the report’s conclusions with the researchers and Commissioner Stewart. Perhaps in anticipation of some of the questions, the Department of Education (FLDOE) issued a press release prior to the meeting to try to address some specific concerns about the report’s findings and recommendations that had been already mounting statewide. Nevertheless, many lawmakers seemed to leave the meeting with about the same level of comfort with the results as they came in with — some supportive, some opposed, many confused in-between (see here, here, or here for more).
Given the amount of confusion still evident among many of those leading this process, it shouldn’t be a surprise if you’re feeling unclear about what’s happening as well. Below is a quick primer on how we got here, what we learned from the report, what happens next, and what it means for you.
How We Got Here:
In 2010, Florida adopted the Common Core State Standards (CCSS), a new set of learning standards developed by the Council of Chief State School Officers and the National Governors Association Center for Best Practices that were designed with a focus on college and career readiness, on streamlining expectations for students, and on enabling achievement comparisons across states. These standards were independently developed and voluntarily adopted by 42 states. The standards were to be fully adopted at all grade levels by 2014.
As the rollout date for the full transition grew closer, mounting public concern and confusion prompted a political response to withdraw Florida from a national testing consortium and adjust the standards into something slightly different, now known as the Florida Standards.
The timing of this decision left Florida scrambling to put together a new, independent statewide standardized assessment aligned to the new Florida Standards less than a year before FCAT 2.0 was phased out.
In March 2014, Florida hired American Institutes for Research (AIR) to develop the new Florida Standards Assessment (FSA). As part of their proposal, AIR planned to use items rented from a similar exam created for Utah (SAGE) due to the short turnaround time, while continuing to work on the development of a fully unique set of test items for Florida to be ready by 2015-16.
The FSA was administered in the spring of 2015, with technical glitches reported in some areas, along with renewed focus on the items rented from Utah’s SAGE test. This prompted lawmakers to take a renewed look at the FSAs and FLDOE’s plan for using its results in student, teacher and school accountability systems. One of the major resolutions of the 2015 Legislative Session was to commission an independent audit of the content and administration protocols of the FSA. This report — known to many as the “validity study” — was released earlier this month.
What we Learned from the Report:
Was the test “valid”?
In some ways this is a loaded question, as “valid” in casual language and statistical language may not always mean the same thing.
In the words of the Alpine researchers, they concluded that “the policies and procedures that were followed [in developing the FSA] are generally consistent with expected practices as described in the Test Standards and other key sources that define best practices in the testing industry.”
In other words, the way the test was developed was found to meet the generally accepted national best practice standards for developing a new test. Part of the confusion comes in because statisticians use the word “validity” differently than most people — i.e., parents, legislators, and so on — do.
When most of us ask if a test is valid, we typically mean something more like, “Does this test tell me exactly what I want to know about student academic performance?”
The answer to that is not as simple, and it is also more than any single test could ever tell you. To think about it another way, if you were interested in knowing whether a car you were about to buy was a “good” car, the answer depends a lot on what you mean by “good.” One report may be able to tell you if the car was manufactured in a way that meets all of the current standards for safety and performance for this type of vehicle, but that doesn’t necessarily mean it’s well suited for racing, towing, or commuting. How useful any given vehicle is, like any given test, depends greatly on what you want to use it for.
In this case, the report broadly endorsed the FSA as meeting the generally accepted standards of rigor for test development and cautiously endorsed the use of aggregate scores at the district and state levels, but stopped short of specifically commenting on the appropriateness of these results for the purposes of school grades or teacher evaluations in their current forms.
Did technical problems significantly impact student assessments?
While the report indicates that the “precise magnitude of the problems is difficult to gauge with 100% accuracy,” they use multiple data sources to try to estimate what they could.
Researchers report that statewide usage data puts the estimate somewhere between “1% - 5%” of students statewide experiencing disruptions in taking the computer-based tests, though their own surveys with district and school administrators put the numbers potentially much higher.
In an online survey of district assessment coordinators, Alpine reports that as many as 94% (Writing exam), 91% (Reading exam), and 91% (Math exam) of district coordinators reported experiencing some type of disruption in administering the computer-based tests. These reported disruptions included issues ranging from those deemed to have little or no impact on student test-taking, to more significant issues such as students being inadvertently logged out of the system or losing work.
Evidence of these issues played a large part in Alpine’s recommendation that the computer-based assessment scores from these assessments not be used as a sole determinant in student-level decisions such as promotion, retention or graduation. (Paper-and-pencil delivered exams were supported for use in student-level decisions.)
Because hard evidence of the issues indicated relatively smaller levels of impact at the statewide level, the report supported the use of these scores for reporting at the statewide level, but cautioned that impacts may vary significantly at the district or school levels.
Were the test items appropriate for Florida students?
The items rented from Utah for the 2014-15 assessment raised questions about whether they were aligned with the Florida Standards and whether they were appropriately field-tested for Florida students.
Results of the report indicate that test item alignment with Florida Standards varied by grade level, with anywhere from 6% (4th grade Math exam) to 33% (3rd grade English/Language Arts exam) of test items being found to not be aligned with the intended Florida Standards they were supposed to be measuring.
FLDOE has countered concerns about this by highlighting a broader finding that over 99% of test items (383 out of 386) were found to align with at least one of the Florida Standards (if not the one it was intended to).
This is not particularly useful from a measurement perspective, however, and the researchers strongly recommended replacing these items with questions aligned to the Florida Standards they are intended to measure before the 2015-16 administration.
In terms of the field testing, the report generally supported Florida’s approach given the conditions. For more information on why this approach is not uncommon among other states, see the comments posted below the article here.
What happens next?
FLDOE is currently in the process of establishing appropriate achievement level cut scores for the 2014-15 FSA results. This process involves a combination of gathering input from educators and other stakeholders on the content of the test items to determine appropriateness and difficulty of items at each grade level, as well as running statistical models to determine the impact of different cut scores on the 2014-15 data with respect to prior achievement and achievement on comparable tests.
This process has or will include input from the following groups:
Achievement Level Description Panel: April 28-May 1 (Tallahassee)
Educator Panel: August 31-September 4 (Orlando, FL)
Reactor Panel: September 10-11 (Orlando, FL)
Rule Development Feedback Workshops: September 15-17 (Mult. Cities)
Public Rule Development Feedback Forum: September 24 (Tallahassee)
Online Public Rule Development Feedback Survey: Open until October 15.
Once final cut scores are determined, they will be retroactively applied to last year’s test results to be used as the baseline results for performance on the new FSAs going forward. The results will also be used to produce school grades and teacher evaluation scores for 2014-15, which FLDOE expects to release sometime in or after December 2015.
FLDOE has emphasized that first year FSA results and any accountability scores (such as school grades) produced from them should be considered “informational baseline” reports only, and will not have any formal penalties associated with poor performance attached to them.
On Monday, the Florida Department of Education issued more information about its plans for this fall — namely, that they will be issuing results of the 2014-15 FSAs to parents and school districts in the form of T-scores and percentile ranks because (as they report) “the standards setting process will not be completed by the time scores are required to be reported.” These scores will tell parents and educators how individual students performed in relation to other students state-wide. This is not ideal (or arguably appropriate) for this type of test and is likely to cause more confusion in the short term. Read more about this latest development in this Tampa Bay Times story.
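To make the interim reporting concrete: a T-score is simply a student’s z-score (distance from the statewide mean, in standard deviations) rescaled to a mean of 50 and a standard deviation of 10, and a percentile rank tells you what share of students scored below a given score. The sketch below uses entirely made-up raw scores — not actual FSA data — just to illustrate how these two norm-referenced measures are computed and why they only describe performance relative to other students, not against a standard.

```python
# Illustration of T-scores and percentile ranks using made-up raw scores.
# These are NOT actual FSA scores; the point is only to show how
# norm-referenced measures compare students to each other.
from statistics import mean, pstdev
from bisect import bisect_left

raw_scores = [12, 15, 15, 18, 20, 22, 22, 25, 28, 30]  # hypothetical data

mu = mean(raw_scores)       # statewide mean of raw scores
sigma = pstdev(raw_scores)  # statewide standard deviation

def t_score(raw):
    """Rescale a raw score so the mean maps to 50 and one SD maps to 10."""
    return 50 + 10 * (raw - mu) / sigma

def percentile_rank(raw):
    """Percent of students who scored strictly below this raw score."""
    ordered = sorted(raw_scores)
    return 100 * bisect_left(ordered, raw) / len(ordered)

# A student exactly at the mean gets a T-score of 50, regardless of
# whether the whole group did well or poorly in absolute terms.
print(t_score(mu))            # 50.0
print(percentile_rank(25))    # 70.0 -- 7 of 10 students scored below 25
```

Note that if every student in the state improved by the same amount, every T-score and percentile rank would stay exactly the same — which is why these measures cannot say whether students met the standards, only how they compare to one another.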
What does it mean for you?
Student scores are likely to “drop”, but don’t panic. Based on results coming out of other states going through similar transitions, and projected impact results of the cut scores proposed so far, performance results across most schools and grade levels will likely go down compared to last year’s results. But thinking about these scores as directly comparable to previous scores is a major contributing factor to the overall confusion around these tests.
As a parent or a community member engaged in education, it is important to understand that these changes are expected as part of a transition to more difficult standards and assessments, and should be seen as a new measure of performance rather than a continuation of performance that should be directly compared to previous FCAT 2.0 performance.
School grades are likely to drop significantly, and the explanations may be confusing. We have advocated at length for why we believe school grades should not be released for the 2014-15 school year, but the state has made clear that they will be releasing 2014-15 “information only” school grades sometime in or after December. We’d encourage you to revisit why the current school grading formula, with an appropriate emphasis on student learning gains, becomes problematic during a transition year from one test to another. Keep this information in mind as you interpret with caution any new school grades information that comes out over the next few months.
Some of this will likely happen again next year — don’t forget why. Remember that it has always been the plan for AIR to have to replace a number of test items which were rented from Utah’s SAGE exam with items specific for Florida. A recommendation to do this was reiterated in the Alpine Testing Solutions report, but it has always been part of the plan. As we get closer to the Spring 2016 FSA administration and 2016 legislative session, there’s a good chance that following through on this process will be considered by some to be “more changes to the test” again, and become an issue.
If we are ever to get through this transition period, achieve some stability in our expectations, and have meaningful measurement standards for students, teachers, and schools, we need to remember the full scope of what this transition involves and see it all the way through without changing course again just before we get there.
Keep checking back here for updates as new information is released, and be sure to check our Get Involved page for information on upcoming community meetings and events around the new standards, tests, and accountability reports.
And plan to join us on October 19, when we’ll be discussing this and other issues at the ONE by ONE Public Education Forum. Register today!