At 7:20 one recent morning, less than 24 hours after taking an essay test in my Advanced Placement English Language & Composition class, one of my top-performing students dropped by my classroom and asked, “Mrs. Nahirny, what did I get on my essay?”
“I don’t know yet,” I responded. “I’m in the middle of grading now. I’ll probably need another day or two to finish all 71 of them.”
“I’m just anxious to see if the essay impacted my grade; I wanted to see if I still have an A.”
Her disappointment at my inability to plow through several dozen multi-page essays overnight left me wondering whether I should’ve asked her if she wanted me to try a new automated essay-scoring program developed by EdX, the nonprofit enterprise founded by Harvard and the Massachusetts Institute of Technology, which will soon be available online at no cost. The new service will “grade” student essays instantaneously, providing immediate “feedback.” According to a New York Times article, “The software will assign a letter grade depending on the scoring system created by the teacher…and provide general feedback, like telling a student whether an answer was on topic or not.”
Should teachers use artificial intelligence technology to grade essays? Automated grading systems (like Scantron) have helped teachers grade multiple choice tests quickly for years. But even the College Board still uses educators to read and score students’ SAT and AP essays. (Disclosure: I score AP essays for the College Board every summer.)
Yet using automated technology to score certain essays might make sense in some cases. Think about the Florida Department of Education, which administers the FCAT Writing 2.0 test. The software created by the nonprofit EdX is free. If the DOE used it, that alone could save Sunshine State taxpayers millions of dollars by eliminating the costs of training and paying the temporary workers (who have no specialized training in the pedagogy of writing instruction anyway) hired by a for-profit contractor to score the essays.
It would save the state lots of time, too. Students won’t receive their FCAT scores until mid-May, nearly three months after composing their essays; students who take the SAT get their scores back online from the College Board in as few as 17 days. Using EdX software, Florida students could know how they fared within a day. And since the software, at least theoretically, provides “general feedback,” it would vastly improve on what students currently receive from the state: a numerical score from 1 to 6, with no comments or suggestions for improvement whatsoever.
But many educators believe automated systems can’t provide the feedback students need or value most. Even the Flagler school district’s new evaluation system now rates instructors, in part, on the feedback they provide to their students. In Domain 3, Component 3D, administrators must rank teachers as “1-unsatisfactory,” “2-needs improvement,” “3-effective” or “4-highly effective” based on whether the feedback they provide to students is accurate, substantive, constructive, specific and timely.
I view giving feedback to students as one of the most valuable and important things I do as a teacher. I take it very seriously. That’s because when I was a child, I couldn’t stand teachers who returned work with nothing more than a checkmark, a smiley face sticker or a numerical score atop the paper. Later, as a parent, I hated it when my children would work hours or days on a project and get nothing more on it than a letter grade. No comments. No critiques. Nothing to even show that the teacher had done more than check to see whether the assignment had merely been completed. As a teacher, therefore, I give students substantially more feedback than I’d gotten from most teachers, and certainly more than my own two children received when they were in school.
This year, I have about 175 students in seven class periods. Mathematically speaking, this means during a typical school day, I can devote two or three minutes, tops, to each student individually. So ultimately, the “feedback” I provide on their assignments defines the “quality time” I actually spend with each student.
This is where “individualized instruction” comes in. When students get essays back from me, they see that in addition to the grammar issues I’ve addressed, I’ve also commented on their style, diction, syntax, tone and much more. Typically, I’ll write individualized messages on each essay, ranging from a few sentences to a paragraph or more, about how to improve the essay by rewriting. Or maybe I’ll include some words of encouragement, or a personal message about how something they wrote reminded me of something in my own life: anything to make “connections” with a student and motivate them.
I was therefore somewhat taken aback when I got my own feedback, in the form of an evaluation from an administrator who recently observed me in my classroom, and received a “3” (effective) for “Component 3D: Providing feedback to students.” Although I received an overall evaluation of “highly effective,” as I have for many years, this one item bothered me so much that I asked the administrator to come back to my room, look more closely at the students’ portfolios, and read what I’d written on their work to gauge the quality of the feedback I so painstakingly provided. I also asked the evaluator to go through my emails, to witness the volume I read and respond to from my students each and every day of the week, even on weekends (more than two dozen just today, a Sunday, as I’m writing this), as teens seek assistance or clarification on assignments, or bounce proposed research topics off me.
What most stuck in my craw about receiving a “3” (effective) rather than a “4” (highly effective) was this: while I was out sick for three months earlier this school year, undergoing 42 radiation treatments and surgery for a cancer recurrence (documented in previous columns on FlaglerLive), I asked my substitute, though I wasn’t required to, to send me my students’ essays each week to grade. I read and scored hundreds of them as I recovered, providing my usual copious comments. Doing this primarily helped the overworked sub stay sane. But it also kept me from growing depressed over my medical situation and connected me to the students, allowing me to monitor their progress even when I couldn’t physically be in the classroom. When I finally got back, it almost felt like I’d never left, because I’d kept tabs on the students’ work. We’d kept “in touch,” and they continued to benefit from accurate, substantive, constructive, specific and timely feedback from me. No slacking off with a sub, as kids are wont to do.
When I asked the evaluator to explain how my student feedback wasn’t “highly effective,” I was told: “In Domain 2 (the classroom environment) and Domain 3 (instruction), I can’t base your score on anything that I didn’t actually see going on in the classroom while I was in there on the day I was observing you. The score can be based solely on what I saw that day.”
Huh? Geez, if I’d known that, I’d have asked the evaluator to do half my observation in the classroom while I was teaching, and half at my house, where I could have been observed in real-time, providing feedback to students until 10 or 11 most nights.
I decided instead to ask those who matter most: my students. Turning the tables, I told them, “I’ve been giving you feedback for the past eight months. Now that the AP exam is only a few weeks away, I’d like some feedback from you. What do you think of the feedback I’ve been giving you? Is it too much? Not enough? Take out a sheet of paper and write what you think; you can be anonymous, as long as you’re honest.”
Here’s a sample: “I can tell you actually read our work.” “You tell me the truth about everything I need to fix.” “The more you chew up my paper, the better I learn to write!” “The comments are ruthless, but all true. The comments are so in depth that it makes rewrites a lot easier. You are one of the few teachers I know who sends out emails to students and parents with information and updates. Plus you respond quickly to my questions when I email you.”
Ultimately, the evaluator changed the Component 3D score from “3” to “4” when provided with a preponderance of evidence (see page 3, Flagler County Teacher Evaluation, below). But my experiences and those of many of my colleagues illustrate that, much like the free EdX software, no matter how useful, the Flagler County teacher evaluation system still needs some “tweaking.” Teacher effectiveness can’t be gauged solely by what is happening during class time, because so much of what makes us effective with students takes place on our own time: before or after school, when students seek us out for extra help, or at night, when we’re providing feedback on student papers, answering student emails and calling parents. It would take many more hours than a one-period classroom visit and a couple of five-minute walk-throughs to even begin to measure the breadth and depth of what teachers do.
Just as an essay graded by a computer can’t possibly appreciate irony, the nuances of figurative language or the subtlety of mood, those who evaluate teachers, be they administrators, parents or politicians, don’t necessarily get the “big picture” or see what goes on behind the scenes, because much of it is immeasurable and unobservable.
Rubrics and summative scoring guides and complex mathematical formulas notwithstanding, just as with grading an essay, assessing teachers and their effectiveness simply needs a touch of humanity.
Jo Ann C. Nahirny, a 1985 graduate of Columbia University and a National Board Certified Teacher, teaches English at Matanzas High School in Palm Coast. Reach her by email here.