Our friend Mike Martin in Arizona sent this to us. You might remember Mike as the author of “Waiting for Superfraud,” the most popular article we have ever published (by far!). Here he takes on the use of student test scores as a measure of teacher effectiveness.
On Mar 9, 2011, at 8:24 AM, Michael Martin wrote:
Maybe it’s just me, but I keep hearing about how there are all these ineffective teachers in public schools that the unions are protecting. First of all, let’s ignore that unions are SUPPOSED to defend ineffective teachers until they are proven ineffective, just like lawyers are supposed to defend murderers until they are proven murderers.
It just seems to me there is no difference between the old “I have the names of 200 communists who are working for the State Department” and today’s almost identical “I have the names of 200 ineffective teachers that unions keep on the job.” It seems to be a witch hunt.
In both cases it is essentially fantasy wedded to fallacy. The fantasy is an implicit assumption that all students are uniform commodities and that there is only one way to teach them, along a single spectrum from bad to good. The fallacy is in ignoring that each classroom is instead a roomful of conundrums for which teachers attempt to tailor customized instruction, with varying levels of success. Taken logically, this single-spectrum fantasy implies that somewhere in America there is one great teacher and everyone else is less effective, or ineffective. But teachers actually function in an environment that is dynamic rather than static, and children are not uniform commodities. What works one day for one group won’t always work another day with another group.
Back in 1979 the Philadelphia Federal Reserve Bank utilized its vast computer research capabilities with the Philadelphia school board to research what characteristics were associated with the largest gains in fourth grade reading scores. The researchers were trained econometric experts, experienced in finding associations between economic characteristics and growth. The kind of analysis that econometric researchers use is very closely related to the value-added concept in testing.
Their findings were that the usual teacher characteristics were not strongly associated with reading gains. What I found most illuminating, however, was their finding that whether the teacher had a background in reading instruction was weakly associated with reading gains, but whether the principal had a background in reading instruction was strongly associated with reading gains. That means there was empirical research more than thirty years ago showing that student test scores were more strongly influenced by the principal than by the teacher. Not to mention that last year’s teacher has a significant impact on the success of this year’s teacher.
Plus, in 1991, some of the top testing experts, Daniel M. Koretz, then with the RAND Institute, Robert L. Linn of the University of Colorado, Stephen B. Dunbar of the University of Iowa, and Lorrie A. Shepard of the University of Colorado, collaborated on a paper in which they compared student scores on different tests of the same subject matter. They compared the school district’s existing test with a test the district had used previously. When the existing test was first introduced, the previously high test scores plummeted. Several years later, scores on the new test were high. So they administered both tests to the same students, and the old test now showed low scores. Their point was that two tests of the same subject, each of which had produced high scores in its day, showed entirely different scores depending on which test the teachers had prepared for.
Thus test scores do not actually measure subject knowledge; if they did, both test scores would be the same. To use test scores to measure teacher effectiveness is therefore both a fantasy and a fallacy. We know from empirical research that student test scores are more strongly influenced by the principal than by the teacher, and we know from empirical research that students with the same subject knowledge will produce entirely different test scores on different tests. Yet the people with the pitchforks want to use test scores to find ineffective teachers.
There is no denying that there are some ineffective teachers in classrooms; there is no denying that there were some communists in the State Department. It is unlikely that there were substantial numbers in either case, and there is no empirical evidence for either. It is a fantasy world of witch hunts, much like the witch hunts of Salem, which persecuted something we now know is fantasy. Witches cannot do what they were persecuted for doing.
When I first became a “boss” after college, I inherited an ineffective secretary who was unable to function except in the ways she had decided things had always been done. I was told my predecessor had the same problem. I struggled to force the secretary to change, going so far that she passed out on the job from “stress.” I shared this secretary with a more experienced colleague, who simply arranged for her to be transferred to a job where she could be more successful. In other words, a lot of the spectrum of effective and ineffective is based on matching the people to the job. The fantasy that all teachers are in the same job with the same students is the fallacy behind designating teachers as effective or ineffective, even if we did have a valid measure of effectiveness, which we don’t. My excuse is that I was young, inexperienced and stupid. What is Bill Gates’ excuse?
The National Academy of Sciences assigned the National Research Council to evaluate the Washington, D.C., school system. Their preliminary “Plan” report subtitled “From impressions to evidence” was released Monday and in it they state unequivocally “It is important to note that measuring teacher effectiveness is a complex endeavor for which there is no established consensus in the education research community.”
Well, d’oh.


Does it matter that the research community believes that “It is important to note that measuring teacher effectiveness is a complex endeavor for which there is no established consensus in the education research community”? Apparently not, because using test scores to measure teacher effectiveness is exactly the course being pursued by states across the nation. It all starts with money, of course. If you want any RttT (Race to the Top) dollars, you must comply whether it works or not. The time has come to stand up and speak out.
Mike Martin adds to his comments:
There is a crucial distinction to be made here. The Federal Reserve findings are empirical facts; they are not interpretations. In a December 20, 2010, Hechinger Report article titled “Improving teachers means improving principals, too,” Alan J. Borsuk wrote:
An analysis of a large body of education research conducted a few years ago concluded that a third of the effect that a school had on students came from how teachers do their jobs. But a quarter of the effect – the second largest factor – came from principals. An author of that study, Karen Seashore Louis of the University of Minnesota, said new research in which she has been involved sheds more light on that issue. “Principals have a very strong effect on student learning, but it’s primarily indirect and it’s primarily because of the way their behaviors encourage teachers to work together on improving their professional practice.”
The Hechinger Report is dealing with research findings, empirical facts, plus some interpretations. My point being that one must consider empirical facts when making interpretations, but not vice versa. Research producing empirical facts says principals have a major effect on student test scores. That is entirely distinct from interpretations or speculations based on that empirical fact.
You can argue about interpretations, you can dismiss speculation, you can wonder about or disbelieve opinions about facts, but facts have to be considered. I find over and over that a large part of the population does not understand the difference between facts and opinions. Opinions and interpretations are ephemeral; facts are not.
So, given the fact that principals have a large effect on student test scores, one necessarily must consider the principal when evaluating test scores. Consequently, to evaluate student test scores solely in terms of the teacher’s performance is invalid. In the case of the Federal Reserve study, it was found that a principal having a background in reading instruction strongly influenced student reading test scores. That set those principals apart from others who did not have a background in reading instruction. Therefore one cannot evaluate teacher performance in a school simply by saying the teachers all had the same principal. The facts state that the effect of the principal varies by subject matter taught.
I will admit when I first read the Philadelphia Federal Reserve study back in about 1980, my first response was “what does bozo in the front office have to do with student test scores?” However, it is crucial to consider why having a teacher who is knowledgeable in reading instruction was not as strongly associated with reading gains as having a knowledgeable principal. Presumably if the decisive factor was hiring the teacher, then it would have been having a teacher with a background in reading instruction that was crucial, and it was not.
What I believe is the crucial factor is providing teachers with the resources and leadership to implement a consistent reading program. It is the consistency that allows teachers to influence each other so that the third grade teacher implements reading instruction that complements the fourth grade reading program.
What I have concluded from the research I have reviewed since then is that the reason a principal with a background in reading has a strong influence on student reading scores is that the principal becomes involved in structuring the reading program. He/she does not leave the reading instruction to the whims of isolated teachers. The derivative point of this is as Harvard Professor Richard Elmore has stated: “This view derives from the assumption that learning is essentially a collaborative, rather than an individual, activity -- that educators learn more powerfully in concert with others who are struggling with the same problems -- and that the essential purpose of professional development should be the improvement of schools and school systems, not just the improvement of the individuals who work in them.”
So my conclusion is that mediocre teachers will outperform even the best teachers if the mediocre teachers work together to implement a consistent complementary education program while the best teachers work alone in isolated classrooms. In a sports analogy, it is why John Wooden could consistently produce incredible basketball teams, why Vince Lombardi could consistently turn a small market football team from Green Bay into national champions, and why when I went through U.S. Marine Corps training they constantly ridiculed “John Wayne”: complementary organization will outperform inconsistent superstars. You may notice this is essentially the same interpretation that Karen Seashore Louis gave in the Hechinger quote above.
Michael T. Martin