Well, d’oh.


Our friend Mike Martin in Arizona sent this to us. You might remember Mike as the author of “Waiting for Superfraud,” the most popular article we have ever published (by far!). Here he takes on the use of student test scores as a measure of teacher effectiveness.

On Mar 9, 2011, at 8:24 AM, Michael Martin wrote:
Maybe it’s just me, but I keep hearing about all these ineffective teachers in public schools whom the unions are protecting. First of all, let’s set aside the fact that unions are SUPPOSED to defend teachers until they are proven ineffective, just as defense lawyers are supposed to defend accused murderers until they are proven guilty.

It just seems to me there is no difference between the old “I have the names of 200 communists who are working for the State Department” and today’s “I have the names of 200 ineffective teachers that unions keep on the job.” It looks like a witch hunt.

In both cases it is essentially fantasy wedded to fallacy. There is an implicit assumption that all students are uniform commodities that can be taught only one way, with teachers ranked along a single spectrum from bad to good. This is fantasy. The fallacy lies in ignoring that each classroom is actually a roomful of conundrums, for which teachers attempt to tailor customized instruction with varying levels of success. Taken to its logical conclusion, this single-spectrum fantasy implies that somewhere in America there is one great teacher and everyone else is less effective, or ineffective. The fallacy lies in ignoring that teachers actually function in an environment that is dynamic rather than static, and that children are not uniform commodities. What works one day for one group won’t always work another day with another group.

Back in 1979 the Philadelphia Federal Reserve Bank teamed its vast computer research capabilities with the Philadelphia school board to study which characteristics were associated with the largest gains in fourth-grade reading scores. The researchers were trained econometricians, experienced in finding associations between economic characteristics and growth. The kind of analysis econometricians use is very closely related to the value-added concept in testing.

They found that the usual teacher characteristics were not strongly associated with reading gains. What I found most illuminating, however, was their finding that whether the teacher had a background in reading instruction was only weakly associated with reading gains, while whether the principal had a background in reading instruction was strongly associated with them. In other words, empirical research from more than thirty years ago showed that student test scores were more strongly influenced by the principal than by the teacher. Not to mention that last year’s teacher has a significant impact on the success of this year’s teacher.

Plus, in 1991, some of the top testing experts (Daniel M. Koretz, then with the RAND Corporation; Robert L. Linn of the University of Colorado; Steven B. Dunbar of the University of Iowa; and Lorrie A. Shepard of the University of Colorado) collaborated on a paper comparing student scores on different tests of the same subject matter. They compared the school district’s current test with the test the district had used previously. When the current test was introduced, the previously high scores plummeted; several years later, scores on the new test were high again. So they administered both tests to the same students, and the students scored low on the old test. Their point was that two tests of the same subject, each of which had produced high scores in its day, yielded entirely different scores depending on which test the teachers had prepared students for.

Thus test scores do not actually measure subject knowledge; if they did, the scores on both tests would be the same. And so using test scores to measure teacher effectiveness is both a fantasy and a fallacy. We know from empirical research that student test scores are more strongly influenced by the principal than by the teacher, and we know from empirical research that students with the same subject knowledge will produce entirely different scores on different tests. Yet the people with the pitchforks want to use test scores to find ineffective teachers.

There is no denying that there are some ineffective teachers in classrooms; there is no denying that there were some communists in the State Department. That there were substantial numbers in either case is unlikely, and there is no empirical evidence for either. It is a fantasy world of witch hunts, much like the Salem witch trials, which persecuted something we now know to be fantasy. Witches cannot do what they were persecuted for doing.

When I first became a “boss” after college, I inherited an ineffective secretary who could not function except in the ways she had decided were the way it had always been done. I was told my predecessor had the same problem. My struggle to force her to change went so far that she passed out on the job from “stress.” I shared this secretary with a more experienced colleague, who simply arranged for her to be transferred to a job where she could be more successful. In other words, much of the spectrum from effective to ineffective comes down to matching people to the job. The fantasy that all teachers are in the same job with the same students is the fallacy behind labeling teachers as effective or ineffective, even if we had a valid measure of effectiveness, which we don’t. My excuse is that I was young, inexperienced and stupid; what is Bill Gates’ excuse?

The National Academy of Sciences assigned the National Research Council to evaluate the Washington, D.C., school system. The council’s preliminary “Plan” report, subtitled “From impressions to evidence,” was released Monday, and in it the council states unequivocally: “It is important to note that measuring teacher effectiveness is a complex endeavor for which there is no established consensus in the education research community.”

Well, d’oh.