Saturday, 7 August 2010

Computer Adaptive Testing and the GMAT

Back around Christmas, I had a few weeks free and decided to prepare for the GMAT exam, a "computer-adaptive" standardized test required (or at least accepted) by business schools around the world. With university application deadlines looming, most of the testing sessions were already fully booked, but I managed to find a mid-January test date just a few hours' drive away.

I'll start by saying that I finished university eight years ago, so (ignoring a few intense weeks of German classes) it's been a while since I "studied". And it was about half my lifetime ago that I last wrote a standardized test: you know, with those sealed envelopes, bar-coded stickers, big machine-readable answer papers, and detailed instruction books reminding you to use a #2 pencil and "choose the BEST answer". If, like me, you haven't written one of these in a while, you may be surprised by how much has changed.

The GMAT, like a number of other admissions tests, is now administered exclusively by computer, and the test centres even have palm and iris scanners, which are used any time you enter or leave the room. Unlike most computer-based tests, though, which stray little from the well-worn paths of their pencil-and-paper siblings, the GMAT uses a computer-adaptive process undoubtedly conceived by a singularly sinister set of scholars and statisticians. The process is complex and has a number of interesting implications, but basically it works like this: when you get an answer wrong, the questions get easier; when you get an answer right, they get harder. The theory is that by adjusting the test to your ability, the computer can rate you more precisely against people of about the same level.
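
To make the idea concrete, here is a toy sketch of that adapt-as-you-go loop in Python. It is emphatically not the real GMAT algorithm (which is built on item response theory and a large calibrated question pool); the names, numbers, and scoring scale are all made up for illustration.

    # Toy sketch of an adaptive test: wrong answers make the next question
    # easier, right answers make it harder. Purely illustrative.
    def run_adaptive_section(questions, answered_correctly, num_questions=10):
        """questions: list of (difficulty, text) with difficulty in 0..1;
        answered_correctly(text) -> True/False for the test-taker's answer."""
        ability = 0.5                      # start from an average-ability guess
        step = 0.25                        # how far each answer moves the estimate
        asked = []
        for _ in range(num_questions):
            remaining = [q for q in questions if q not in asked]
            if not remaining:
                break
            # next question: the unused one closest to the current estimate
            difficulty, text = min(remaining, key=lambda q: abs(q[0] - ability))
            asked.append((difficulty, text))
            if answered_correctly(text):
                ability = min(1.0, ability + step)   # right: harder next time
            else:
                ability = max(0.0, ability - step)   # wrong: easier next time
            step *= 0.8                    # the estimate settles as the test goes on
        return ability                     # the final estimate becomes the score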

The material on the test is not really that hard. It's no walk in the park either, but it mostly limits itself to stuff you learned (then unfortunately forgot) in high school. Everything else about the GMAT, though, seems designed to maximize stress:

  • The test is long. Nearly four hours long. There are three sections of 60 to 75 minutes each, with a short break between them.

  • The test is timed. Ok, what test doesn’t have a time limit? But this one has a clock counting down in the corner of your screen, taunting you to pick up the pace. Worse than that: you'll probably actually need all the time because the questions keep getting harder as you get them right, remember? The challenge on this test is not usually solving the problems, but rather solving them in time.

  • The breaks are timed. Again, not surprising, I guess. But your break is only eight minutes long and, if you’re not back, the next section simply starts running without you! Inevitably you spend the entire break worrying about how many minutes you have left. Since you need to scan your palm and iris at the beginning and end of each break, your trip to the bathroom is not going to be leisurely.

  • Erasable notepads. No pens or paper are allowed, presumably so you can't smuggle out questions. Try working out math problems quickly on a laminated card with a dry-erase marker.

  • You can't skip a question. Remember that the questions get harder as you get them right and easier as you get them wrong. This means that the next question you see is largely determined by how you answer the current one. The computer needs your answer before it can choose the next question, so skipping simply isn't an option.

  • You can't go back. Similarly, since your current position is determined by your earlier answers, you can't go back and change them. So if you're used to finishing early and then checking over your work, you'd better start unlearning that habit.

  • You don't know the difficulty level of the questions. Is the test feeling easy because you really know your stuff or are you simply earning yourself easier questions by choosing a lot of wrong answers? The only saving grace here is that you're so busy madly answering questions that you don't have many brain cycles left to worry about this.

  • Some of the questions are "experimental". About 25% of the test is made up of new questions being tested on you to determine their difficulty level, but of course you don’t know which. That's right: that really hard question you just spent 5 minutes working on because you were sure you could solve it... doesn't count!

  • You are heavily penalized for not finishing. Right, ok, so you have a fixed time, you can't skip or come back, and you can't predict the difficulty of the remaining questions. But if you want a decent score, you still need to pace yourself to answer all of them. Remember that countdown clock? You have about two minutes per question, so keep an eye on that average. Oh, and the clock counts down while the question numbers go up, so you'd better get quick at subtracting your remaining time from 75 (you'll be recalculating your average time per question every few questions; a rough sketch of that mental arithmetic follows this list).

  • Data Sufficiency questions. These nasty little buggers are, I think, unique to the GMAT. Given a math problem and two statements, you are asked whether the problem can be solved using either statement alone, both statements together, or neither. You don't need to work out the answer to the problem, but you do need to partially solve it several times with different information and keep each attempt separate in your mind. Don't think that sounds tricky? Try searching for "sample gmat data sufficiency questions" and try a few. I think I got only about a quarter of these right on my first practice test.
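
Here, roughly, is the pacing arithmetic I mentioned above, sketched in Python. The numbers are only illustrative (a 75-minute section with, say, 37 questions); the point is just how much mental subtraction and division the countdown clock forces on you.

    # Rough pacing check, assuming an illustrative 75-minute, 37-question section.
    def on_pace(minutes_remaining, current_question,
                total_minutes=75, total_questions=37):
        minutes_used = total_minutes - minutes_remaining   # the clock counts DOWN
        questions_done = current_question - 1
        average_so_far = minutes_used / max(questions_done, 1)
        target = total_minutes / total_questions           # roughly 2 minutes each
        return average_so_far <= target

    # Example: 41 minutes left on the clock as question 18 appears.
    # 75 - 41 = 34 minutes used over 17 questions = 2.0 min/question: just on pace.
    print(on_pace(minutes_remaining=41, current_question=18))   # True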

You have to admire the brilliantly evil minds that came up with this thing. The experience for the test-taker is four hours of pure, non-stop stress. At least that was my experience: my brain literally didn’t stop whirring. The adaptive process pushes everyone to their limit, challenging them to keep their feet under them and ensuring that they're sweating right until the end.

The test designers have really optimized the experience around their own needs: the test is easy for them to grade, minimizes cheating, allows new questions to be evaluated automatically, and measures something in a pretty precise, consistent way. I'm not entirely certain what it measures, but I'm pretty confident that people who are generally smarter, better organized, faster to learn and adapt, and better at dealing with stress will obtain a better result.

It might seem odd that GMAC (the company that runs the test), as a business selling a product, can get away with optimizing the test for its own needs. But, although it may appear that the test is the product you're buying, I think what you're really buying is the report that is sent to the universities. The cost of this report just happens to be $250 + study time + four hours of stress. If GMAC had competitors, they might be forced to optimize for the test-taker but, as a virtual monopoly, the motivation just isn't there.

The challenge with the GMAT, I think, is really learning an entirely new test-taking strategy. I used a couple of books (Cracking the GMAT, by Princeton Review and The Official Guide for GMAT Review) to first understand the test and the differences in approach that were required and then to practice as many questions as possible of the specific types that appear. Doing computer-based practice exams is, of course, also essential given that what you’re learning is the test-taking strategy more than the material.

I emerged from the exam feeling absolutely drained but energized by the rush of tackling something so intense and coming out on top. In some ways it was fun but I have no intention of rewriting it any time soon. :)

Friday, 9 July 2010

Off to the races


This week I made my first ever trip to a race track. "The Races" are a perfect opportunity for class-based British society to strut its stuff, with innumerable options for different seating areas and "enclosures", each slightly more prestigious than the last. A number of people seemed to have paid extra for the ability to watch the races from the centre section in the middle of the course; I can't see that the seating was any better, but they did get to be seen by everyone else walking across the track before each race. Somebody even arrived by helicopter, parking their aircraft (and leaving it there all night; you can see it in the photo above) in the middle of the course!
It was fun though, with everyone in their posh summer dresses, running back and forth from the track to the tables to the betting windows. And we even managed to win enough to just about cover admission, food, and drinks. Not bad for an evening's entertainment.

Among other things I learned that the odds on the favourites shorten closer to the race as more people bet on them. So if you're betting on a favourite, you should do so early and with one of the betting houses near the track, which pay out based on the odds printed on your ticket. Conversely, if you're not betting on a favourite, you should place your bet at the last minute or with the betting windows, which pay out odds based on the final total of bets placed (they make a risk-free killing from a straight mathematical percentage off the top). Also, the "three-way" bets, which cost double but pay out 125% for a win and 25% for a 2nd- or 3rd-place finish, seem like a good idea for decently-ranked (but non-favourite) horses with odds between, say, 5-1 and 7-1.
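
For what it's worth, here is the payout arithmetic for that "three-way" bet as I understood it, sketched in Python with made-up numbers. The exact rules surely vary from bookmaker to bookmaker, so treat this as an illustration of the description above rather than a betting guide.

    # "Three-way" bet as described above: it costs double a plain win bet, pays
    # 125% of the plain win payout if the horse wins and 25% if it finishes 2nd
    # or 3rd. Numbers and rules are illustrative only.
    def three_way_result(stake, odds, finish):
        """stake: cost of a plain win bet (the three-way bet costs 2 * stake);
        odds: fractional odds, e.g. 6 for a 6-1 horse; finish: finishing place."""
        win_payout = stake * odds          # what a plain win bet would pay in profit
        cost = 2 * stake
        if finish == 1:
            return 1.25 * win_payout - cost
        if finish in (2, 3):
            return 0.25 * win_payout - cost
        return -cost

    # A 10-pound stake on a 6-1 horse:
    print(three_way_result(10, 6, 1))   #  55.0 (wins)
    print(three_way_result(10, 6, 3))   #  -5.0 (places)
    print(three_way_result(10, 6, 9))   # -20.0 (out of the money)
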
So of course, now that I have a fool-proof guaranteed system, we're going to have to head back more often to capitalize on it. Or... maybe I should quit while I'm ahead.

Friday, 2 July 2010

Seaside 3 "Release Candidate"

You could say it's been a long time coming.

Seaside 3.0 began ambitiously and grew from there. The goal (at least mine) was to clean up the architecture, revisiting each aspect and asking what could be simplified, clarified, or standardized. As functional layers were teased apart, pieces suddenly became unloadable and a repackaging effort got under way. From this we realized we could make the process of porting Seaside much less painful. Along the way, we lowered response times and reduced memory usage, increased the number of unit tests tenfold (1,467 at last count), standardized the code and improved its documentation, added jQuery support, and, oh, did you hear there's a book?

The result? This release runs leaner, runs on at least six Smalltalk platforms, and is, I think, easier to learn, easier to use, and easier to extend. Seaside 3.0 is the best platform out there for developing complex, server-side web applications. Is it perfect? No, but I'll come to that part in a moment. It is the result of literally thousands of hours of work by a small group of people across all six platforms. But this release also exists only due to the generosity of Seaside users who tried it, filed bugs against it, submitted patches for it, and eventually deployed it.

Deployed it?! Yeah, you see, not only have all the commercial vendors chosen to ship our alphas and betas, but our users have also used them to put national-scale commercial projects into production. I alluded last month to a conference session I attended, in which somebody made the statement that
The best way to kill a product is to publicly announce a rewrite. Customers will immediately avoid investing in the "old" system like the plague, starving the product of all its revenue and eventually killing it.
It was a shocking moment as I realized we'd attempted just that. At first we justified the long release cycle because we were "doing a major rewrite"; then we just had "a lot more work to do". Eventually there were "just too many bugs" and things "just weren't stable enough". And, finally, once we realized we desperately needed to release and move forward, we just ran out of steam (no quotes there—we really did).

I still think the original architectural work needed doing and I'm really happy about where we ended up, but here's what I've learned:
  • When your wonderful, dedicated users start putting your code into production, they're telling you it's ready to be released. Listen to them.
  • We don't have the manpower to carry out the kind of QA process that goes along with a Development, Alpha, Beta, RC, Final release cycle.
  • We need to figure out how to get more users actively involved in the project. This could be by writing code but probably more importantly by writing documentation, improving usability, building releases, managing the website, doing graphical design, or something else entirely. The small core team simply can't handle it all.
Trying to apply these lessons over the past month, I asked for help from a few people (thank you!) and we closed some final bugs, ran through the functional tests, developed a brand new welcome screen, and managed to bundle everything up. We're releasing this today as 3.0RC.

We're not planning a standard multi-RC process. The "Release Candidate" simply signifies that you all have one last chance to download it, try it, and let us know about any major explosions before we do a final release, hopefully at the end of the month. From there we'll be reverting to a simpler process, using frequent point releases to fix bugs. 3.1 will have a smaller, better-defined scope and a shorter cycle. I have some ideas but, before we start thinking about that, we all need a breather.

I also have some ideas about the challenges that potential contributors to the project may face. But I'd like to hear your thoughts and experiences. So, if you have any suggestions or you'd like to help but something is stopping you, send me an email or (better yet if you're there) pull me aside at Camp Smalltalk London or ESUG and tell me about it.

Ok, ok. You've waited long enough—thank you. Here's the 3.0RC one-click image, based on Pharo 1.1 RC3 and Grease 1.0RC (just the image here). Dale has promised an updated Metacello definition soon. Enjoy!

Friday, 25 June 2010

The Trouble with Twitter

The thing about Twitter is it's so easy. Sitting down to write a blog post takes time and effort. I want to develop a thesis, establish a reasonable structure, and edit the thing until it flows and becomes a pleasure to read. Ignoring the time spent in advance thinking about the topic, a well-written non-trivial blog post might take me an hour to write (some have taken longer). As a result, I find it increasingly tempting to just dash off 140 characters and toss the result out to the masses.

The trouble is, if you have something to say and you want people to spend their time reading it, you really ought to take the time to craft a proper argument; it seems only fair. I would much rather read a handful of well-written, thought-provoking blog posts than a hundred trivial tweets. And besides, I actually enjoy the writing process.

I'm pretty confident that some ideas are better suited to tweets and others to blog posts, but the line can be fuzzy. And the temptation of laziness persists, so I'm going to need to increase the temptation of effort to counter it. In the meantime, I'll be on Twitter throwing out undeveloped thoughts with everyone else.

Saturday, 12 June 2010

This week's events


The VASt Forum in Stuttgart this week was well attended, with maybe 40 attendees. Unfortunately, as the presentations were all running long and I had to leave before the social event, there was quite limited time for discussion; but it was clear that most people were either past or existing Smalltalk users (though not necessarily current VASt customers). This, combined with the increasing regularity of Pharo sprints and the more than forty people who have already signed up for Camp Smalltalk London, seems to be a very good indication of the enthusiasm and growth in the Smalltalk community these days.

Attendance at the Irish Software Show in Dublin has been lower than we expected. My informal counts suggest about 60-80 people in attendance each day. Of interest to me was Wicket, which I had never looked at before; I was quite surprised to see how similar it is to Seaside in some respects and how similarly Andrew Lombardi, who was giving the presentation, described the framework's benefits and his joy when using it.

The web framework panel discussion had about 30 people watching and we had some good discussion there. Attendance at my Seaside talk was probably closer to 10. It would have been nice to have attracted more of the Java developers at the conference (there were about 20 people at the Wicket session earlier in the day) but it was interesting to find out that the majority of those who came had at least played with Smalltalk before.

Other interesting highlights include Kevin Noonan's talk on Clojure (seqs are much like Smalltalk's collection protocol but available on more classes), Matthew McCullough's presentation on Java debugging tools (interesting to see their progress, and also a few ideas to look at ourselves), and Tim Berglund's overview of Gaelyk (it reminds me disturbingly of writing PHP, but the easy deployability and the integration of XMPP, email, and Google Auth are cool). The speakers' dinner at the Odessa Club last night was great and we had a number of good discussions there as well.

The above photograph was humorously hung over the urinals in a restroom here in Dublin. I would have thought the slightly disturbing visual association was accidental if there hadn't been five separate copies!

Sunday, 6 June 2010

Berlin, product management, and Smalltalk events

Beach bars, cuba libres, bircher müsli. I'd forgotten how classically German these things are but it only takes being away for a few months to make them stand out again.

Thanks to the official un-organizers of Product Camp Berlin, yesterday was a very successful day of discussions and networking. Some interesting points for me were:
  • Kill a feature every day. That way people get used to the process and don't scream so loudly when support for features and platforms needs to be removed. This reminds me of the concepts of constant refactoring and non-ownership in software development, which help ensure that people are similarly used to code going away.
  • The problem may be your pricing model. When products (in startups particularly) begin to flounder, there may be nothing wrong with the product itself. Sometimes a simple tweak of the pricing model can be the most effective solution.
  • The best way to unofficially kill a product is to publicly announce a "rewrite". Customers will avoid investing in the old system like the plague, rapidly starving the product of all its revenue.
  • It sounds like there are some interesting products on the way from Nokia.
  • This is my second conference since I actively started using Twitter. It was not as well used this time, but I still really like the technology for this sort of use case: it's great to see what you're missing, share your thoughts, and catch up with people after the event is over.
The weather was gorgeous in Berlin but has turned foul in southern Germany today. No big deal though as I've been slogging away indoors at my presentation for epicenter in Dublin on Thursday. I'm getting close with my slides and looking forward to the event but, before I can get that checked off my list, it's off to Stuttgart tomorrow evening for the VASt Forum.

[update: I've been offered 10 discount tickets for epicenter to give away; details here if you'd like to come see me in Dublin this week.]

For those wanting to attend the Camp Smalltalk London event on July 16-18, make sure you head over and sign up now. It's looking like we're going to fill up even our expanded capacity. If it's full by the time you get there, add yourself to the waiting list and we'll see what we can do.

Saturday, 29 May 2010

Camp Smalltalk is popular

When the UK Smalltalk User Group started planning the Camp Smalltalk London event a few weeks ago, we imagined we might get 20 people. After only four days, 30 have signed up and we're scrambling to figure out how many more people are interested and how many more we can handle. There are certainly worse problems to have!

If you're still interested in attending, please do us a favour and add yourself to the waiting list at cslondon2010.eventbrite.com.