Ideas to Consider: How to Make AYP Work Better for Students and Their Schools

There is a growing consensus that, as described in the previous article, the formula for determining whether a school has made adequate yearly progress (AYP) is deeply flawed. These flaws can produce misjudgments about a school's effectiveness and then, under NCLB, trigger consequences that impede, rather than support, school improvement. (For example, a school that is improving could be shut down or turned into a charter.) Here are ideas culled from the worlds of policy and research on a few ways to overcome some of AYP's main problems.


1. Establish ambitious, but attainable, goals for how much progress schools should make.

Goals for school progress mandated by an accountability system should be ambitious, but they also should be realistically attainable with sufficient effort. At the very least, there needs to be an existence proof. That is, there should be evidence that the goal set for all schools does not exceed what has been accomplished by those schools that have made the greatest progress. For example, you could identify the 10–20 percent of high-poverty schools that had the greatest average rate of increase over three to five years and establish that rate of improvement as the target. That would be a great challenge for the vast majority of schools, but it might be within reach. Such evidence-based performance goals are necessary under AYP—and would be necessary in a value-added system as well.

–Robert Linn,
Distinguished Professor of Education, University of Colorado at Boulder

2. Evaluate schools based on students' progress, not their status, using a genuine "value-added" system.

Many critics of No Child Left Behind's AYP provisions point to two related problems with the current system that teachers immediately understand: First, how well kids do in school depends partly on things that happen to them outside of school, especially at home. Second, schools shouldn't be judged simply on their students' absolute test scores (since so much of that depends on out-of-school factors); they should be judged by how much progress they help their students make—their "value added." Here's the view of one statistician.

Adequate yearly progress (according to NCLB) is primarily determined by the proportion of students who are "proficient" on the state's accountability test. It depends on the students' achievement status at a point in time. One concern with using achievement status for accountability is that it depends largely on factors unrelated to school quality (such as what a child already knows when he enters kindergarten, what he learns during the summer, etc.) and is highly correlated with family attributes such as wealth and parents' level of education. Critics of NCLB and AYP often complain that by focusing on achievement status, the law evaluates school quality based on factors that are unrelated to schooling.

A potential alternative that is gaining popularity among researchers, educators, and policymakers is to evaluate schools based on individual students' growth in achievement. This is an intuitively appealing idea because schools contribute to student learning and learning should be visible as growth in achievement, regardless of the students' initial status. Studies find that although growth can also be related to family and neighborhood attributes, it is much less related than achievement status is. Therefore, measuring growth is seen as a way to focus on schooling inputs to education and is often referred to as "value-added" assessment. Value-added assessment systems follow individual students' achievement over several years. The term should not be confused with other measures that compare successive cohorts of students (e.g., this year's fourth-graders to last year's), but do not report on how the same students do on tests over several years.

Proponents of value-added assessments argue that they more accurately portray the actual quality of schools. They point to schools with low achievement status but high growth, and schools with high achievement status but low growth, to argue that value-added measures can truly identify high- or low-performing schools, rather than merely identifying schools that serve high- or low-achieving populations. (For a hypothetical example, see the figure in "The AYP Blues.")
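A tiny numeric illustration of the status-versus-growth distinction (all scores are invented): a status measure ranks School B far above School A, while a growth measure, which follows the same students across grades, ranks them the other way.

```python
# Invented longitudinal scores for the same three students in grades 3-5.
# School A serves a low-achieving population but adds a lot of value;
# School B starts high, but its students barely grow.
school_a = {"grade3": [20, 25, 30], "grade4": [30, 36, 40], "grade5": [42, 47, 52]}
school_b = {"grade3": [70, 75, 80], "grade4": [71, 76, 81], "grade5": [72, 77, 82]}

def mean(xs):
    return sum(xs) / len(xs)

def status(school):
    """Achievement status: mean score in the latest grade only."""
    return mean(school["grade5"])

def growth(school):
    """Average per-student gain per year for the same cohort of students."""
    per_student = [(g5 - g3) / 2 for g3, g5 in zip(school["grade3"], school["grade5"])]
    return mean(per_student)

print(status(school_a), growth(school_a))  # low status, high growth
print(status(school_b), growth(school_b))  # high status, low growth
```

Note that both functions track the same students over time; comparing this year's fourth-graders with last year's, as McCaffrey cautions, would not be a value-added measure.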

Value-added assessments clearly have their benefits, but there are challenges to overcome before states build them into their accountability systems. First, although they are an improvement over achievement status measures, value-added assessments might not provide accurate measures of schools' contributions to learning, and methods to account for such errors are being developed. Second, the statistical procedures used—such as adjustments to minimize the effects of family and neighborhood attributes—are complex and lack the transparency that many people consider desirable in accountability systems. Third, measuring growth requires linking individual students' scores across grades and places great demands on the tests to ensure that growth is due to student learning, not differences in the tests. Many state tests and data systems do not currently meet these rigorous requirements. Fourth, in order for value-added assessments to contribute to an evaluation of school quality, standards for adequate growth need to be set. However, meeting a standard of growth will not ensure meeting a standard of proficiency. As I heard one policymaker say, "How do you explain to parents that their children's schools provided good value-added growth for the last 12 years, but their children still failed to pass the state's high school exit exam?" Possibly different standards of growth will be required for different schools. For these (and other) reasons, any move to value-added accountability systems should be paired with well-designed evaluations to give us a better understanding of the best uses of the measures and their potential limitations.

–Daniel McCaffrey,
Statistician, The RAND Corporation

3. Require statistical safeguards. But don't use them to take the focus off traditionally low-achieving groups.

As discussed in "Accountability 101: Tests Are Blunt Instruments," there are many reasons, unrelated to teaching and learning, why a school's test scores might not accurately reflect the school's quality. To help ensure that these inaccuracies don't lead to schools being misidentified, it's essential that states use statistical safeguards such as confidence intervals and multi-year averaging; not all states currently do so. These safeguards are critical to any accountability system, not just NCLB's, and are even more important when accountability is based on achievement growth.

The need for statistical safeguards is especially important when dealing with small groups, such as the subgroups identified by NCLB. Most states try to address this by establishing a minimum subgroup size—meaning that if a subgroup is too small, its progress won't count for AYP determination. Unfortunately, not only is this a rather crude statistical safeguard, but it also undermines NCLB's promise to "leave no child behind" by allowing states to avoid counting the test scores of small groups of disadvantaged children. Confidence intervals do a better job of ensuring statistical reliability by taking into account the statistical "margin of error" in the AYP calculation in the same way that public opinion polls include a margin of error. Because the margin of error widens as subgroups get smaller, large minimum subgroup sizes are not needed—and more subgroups can count for AYP determinations.
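The "widening margin" point can be made concrete with the standard formula for the margin of error of a proportion, the same plus-or-minus figure reported with opinion polls. The subgroup sizes below are hypothetical.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion-proficient estimate.

    Uses the normal approximation z * sqrt(p * (1 - p) / n): the smaller
    the group, the wider the margin.
    """
    return z * math.sqrt(p * (1 - p) / n)

# With 60 percent of a subgroup scoring proficient, the margin of error
# roughly doubles each time the subgroup shrinks by a factor of four.
for n in (400, 100, 25):
    print(n, round(margin_of_error(0.60, n), 3))  # 0.048, 0.096, 0.192
```

This is why a confidence-interval approach can include small subgroups in AYP determinations: the calculation itself adjusts for their size, rather than excluding them outright via a minimum-n rule.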

Several states are now "gaming" AYP: by establishing excessively large minimum subgroup sizes, especially for the special education subgroup, they seem focused primarily on reducing the number of subgroups counted in AYP determinations rather than on fairness and accuracy.

–Howard Nelson,
Senior Researcher, American Federation of Teachers

4. Complement the evidence provided by test scores with on-the-ground observations.

In Britain, where greater school accountability has also been introduced, an inspection system, in which trained inspectors observe schools, is used to complement the data produced by test scores. An aide to Britain's prime minister explains one of the reasons why.

Inspection enables a much more refined approach to dealing with school failure. Intervention in schools that are seriously underperforming—enabled by the development of accountability—has been hugely beneficial, but where the system depends purely on test results to determine school failure or success, it risks being far too crude. Our interventions in failing schools are driven by the inspection system. Where a team of inspectors judges a school to be failing ("in need of special measures," as the legal euphemism puts it), a second team of inspectors follows up shortly afterwards to corroborate the judgment. This process enables real analysis—not just of whether performance is poor, but also why. In addition, it enables the system to identify and tackle failure even where it is masked by temporarily reasonable test results.

Once a school is in special measures, the inspectors return three times a year. Often within a year or 18 months, they are able to give a school a clean bill of health. Our evidence suggests that in these circumstances, the expertise of the inspectors is hugely appreciated. For the principal and staff, these visits are simultaneously both challenging and beneficial. They provide an expert commentary to the school on what is happening. There is feedback on the impact of changes in leadership, standards of attendance and behavior, staff morale, and the systems in place for grading work, dealing with pupils with special educational needs, and so on. These changes are the lead indicators that point to improvements in test scores in the future. A system depending purely on test scores both for intervening and for deciding whether the intervention has worked has no such subtlety and can sometimes have destructive consequences.

–Michael Barber,
Head of the Prime Minister's Delivery Unit, United Kingdom

5. Distinguish among schools that are progressing substantially, schools that need to improve, and schools that desperately need to improve.

NCLB should replace its all-or-nothing AYP calculation with a more flexible approach. One might, for example, distinguish among schools that are making progress overall and in 90 percent or more of their demographic subcategories; those that are making progress overall but in less than 90 percent of categories; and those failing to make acceptable overall progress. Such a triage system would reduce the vast number of mostly okay schools that are now being labeled as "needing improvement." It would distinguish between those that are on the verge of succeeding and those that are catastrophically inadequate. It would enable states and districts to focus on repairing the latter.

–Chester Finn/Frederick Hess
President, Thomas B. Fordham Foundation/
Director, Education Policy Studies, American Enterprise Institute

The comment by Chester Finn and Frederick Hess is drawn from "On leaving no child behind" in the Fall 2004 issue of Public Interest. Michael Barber's comment is drawn from a lecture he gave at Boston University that is published in the current issue of the Journal of Education.

Related Articles

The "AYP" Blues
Low-Achieving Schools Will Fail—but They're Not the Only Ones
By Nancy Kober

Accountability 101: Tests Are Blunt Instruments

American Educator, Spring 2005