October 20, 2011 / Ben Chun

Automatic Grading

For the past couple of years, I’ve been using an automatic code evaluation system that I made for my AP class. The students write code according to a spec, and then I give them a precompiled class file that relies on that spec. It takes a bit of shuffling to get things in the right place, but JCreator will run the class file and I don’t have to give them the source. My goal was to cut down on my own workload, but I’ve also found that it speeds up the process by which students uncover their own errors and eventually learn to anticipate them well enough to write their own test cases.
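In outline, the arrangement looks something like the sketch below. The `Classroom` spec here is a simplified stand-in of my own invention, not the actual lab spec; the point is that students write the spec class, while only the compiled `Grader.class` is handed out.

```java
import java.util.ArrayList;
import java.util.List;

// Student-written class, implemented to match the published spec.
class Classroom {
    private final List<String> students = new ArrayList<>();
    public void add(String name)       { students.add(name); }
    public boolean remove(String name) { return students.remove(name); }
    public int size()                  { return students.size(); }
}

// Instructor-written grader, distributed only as a compiled .class file.
public class Grader {
    public static void main(String[] args) {
        Classroom room = new Classroom();
        room.add("Ada");
        room.add("Grace");
        room.remove("Ada");
        int score = (room.size() == 1) ? 1 : 0;
        // prints: Testing add, remove, and size... 1/1
        System.out.println("Testing add, remove, and size... " + score + "/1");
    }
}
```

Because the grader compiles against the spec, a student class with a missing or misnamed method fails to link, which surfaces interface mistakes immediately.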

Here’s an example of the output for a recent lab:


0	Checking for required classes... 1/1
1	Checking Classroom methods... 4/4
2	Testing add, remove, and size... 1/1
3	Testing createStudents... 2/2
4	Testing class average GPA methods... 2/2

TOTAL = 10/10

The grader uses the Java Reflection API to check for required classes, methods (including return type and parameter lists), and instance variables. It also captures the system output and input to allow me to simulate interaction with their code. This functionality is all in a base class. Then I write the specific test cases for a particular assignment in a subclass, as a series of scored tasks. Students get feedback via the scores, telling them which tasks succeed and which don’t. There’s also a facility for giving a hint when a task doesn’t get the maximum possible score.
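A minimal sketch of those two pieces of base-class functionality follows. The helper names (`hasMethod`, `captureOutput`) are illustrative, not taken from the actual grader, and `java.util.ArrayList` stands in for a student class:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.lang.reflect.Method;

public class GraderChecks {
    // Use reflection to confirm a class declares a method with the
    // expected name, return type, and parameter types.
    static boolean hasMethod(String className, String methodName,
                             Class<?> returnType, Class<?>... paramTypes) {
        try {
            Method m = Class.forName(className)
                            .getDeclaredMethod(methodName, paramTypes);
            return m.getReturnType().equals(returnType);
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    // Redirect System.out while a task runs, so whatever the student
    // code prints can be compared against an expected transcript.
    static String captureOutput(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer));
        try {
            task.run();
        } finally {
            System.setOut(original);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // ArrayList.size() exists and returns int, so this prints: true
        System.out.println(hasMethod("java.util.ArrayList", "size", int.class));
        // The captured text matches what the task printed, so this prints: true
        String out = captureOutput(() -> System.out.println("hi"));
        System.out.println(out.trim().equals("hi"));
    }
}
```

Simulated keyboard input works the same way in reverse: `System.setIn` can feed a prepared stream to code that reads from `System.in`.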

I’ve hesitated to share the code, knowing that the style is weird (and not very Java-ish), that there are probably better automated code assessment tools out there, and that anyone else who thought of this would probably want to write it themselves. On the other hand, maybe this will inspire someone or be a starting point for something else. I’d love it if anyone has input on how to improve or generalize this.

You can grab the code, including an example, from my repository. As the kids say these days: please fork my repo! I'd be happy to discuss ideas and get your pull requests.



Comments
  1. gasstationwithoutpumps / Oct 23 2011 8:07 am

    Autograding for CS classes is fairly common now, but it has resulted in a lot of “programmers” who’ve never had anyone read their code and give them feedback on the style. I have had grad students who said that I was the first person ever to give them feedback. It showed in their code, as they had no idea how variables should be named or documented. Their internal documentation was either absent or useless, as no one had ever checked to see if they had learned how to do it.

    Your method, where the students can test their code into a perfect grade, often results in code that is extremely fragile and does not work on values other than the test cases provided.

    At the very least, you need to test them for their grades on a different set of test cases than they used for debugging.

    Note: I’m not saying that it is a bad idea to provide students with test cases and automated testing tools—just a bad idea to then base their grades solely on that. It might be better to teach them how to use standard QA tools for testing programs, rather than rolling your own, since many will end up working QA jobs anyway.

  2. Ben Chun / Oct 23 2011 2:17 pm

    Thanks for reminding us all about the potential weaknesses of automatic grading. I should probably give a little more context for how I use this in my class, because as you point out things can go very wrong.

    Since this is a high school class, I’m with the students 5 hours a week and we spend a little over half that time with them writing code. So I have lots of opportunities to give feedback on things like style and conventions, and to point out weaknesses in one approach or another.

    I use this automatic grading software more like an acceptance test — they don’t get to see the test cases that go into it, and they don’t get to test against it until they’re done implementing. It’s an interesting idea to give them some kind of testing tools to use during development. So far, all of that kind of testing has been informal.

    • gasstationwithoutpumps / Oct 23 2011 3:27 pm

      Thanks for the extra info. I agree that acceptance testing on unknown test cases is an excellent thing to include when grading programs. I generally do that also, though I ended up using makefiles for my testing, as I was testing completed programs and I was more comfortable programming a makefile than learning a new QA tool.

      I always read the code as well, and provided detailed feedback on the documentation and style. Feedback while the students are working on the code is very good, but feedback on the finished product may be taken more seriously.

      My son’s high-school-level programming class did a lot of unit testing as a required part of homework, using check-expect in Scheme. I have not been requiring unit testing in my classes, generally because the development time they have is too short, and developing good tests would double the time it takes to do the assignments. Putting together crappy tests would be a waste of time. If I were teaching a course where learning to program was the main point (instead of a little extra on a course mainly about stochastic modeling and algorithms), I would probably do something more formal with unit testing.
