As you say - using numbers to judge people almost always ends badly if not tempered with common sense and other factors.
Personally I'm not a fan of efficiency metrics like the ones you describe. Does it really matter if one person spent 40 hours and fixed 20 bugs while another spent 50 hours and fixed 20 bugs? If both fixed 20 bugs in a week and both are on salary, does it matter? And if some guy manages to fix 20 bugs in 30 hours, yet consumes 20 hours of other people's time while doing so, is that better or worse than the "less efficient" guy who worked totally alone and whose work always saves time for other people later on?
I have looked at the following, and loosely considered how much net value was being given to the team:
Development Metrics
how many features of what size is each person managing to finish in a phase or release. In waterfall, I'd be doing this phase by phase - high level design, detailed design, code/unit test - in Agile, it would be sprint by sprint. I usually have a general sense of features being small, medium or large. When using hindsight, I go with the size the feature turned out to be, vs. the size I thought it would be when we started.
how many times did each individual get a feature that grew on them? Some people are better than others at managing scope. Everyone gets a problem feature now and then that just can't be controlled, but some people seem to have the endlessly growing feature problem. If 9 out of 10 features grew from small to large for a single person, I start looking at whether that person is having problems controlling scope.
If possible, avoid getting into SLOC per developer at all costs. Not only is it a time sink, since it's really hard to figure out in many complicated projects - it's also measuring the wrong thing. 1 elegant line of code is worth 100 sloppy lines, and different features will require different approaches. SLOC per developer almost always drives you to a "more" = "better" stance, and that just ain't so. In many cases, I want to know that the developer took the code we already had and made it do something new with a minimum of rework and new development - that's what's going to make a great product in the long haul - and that's the guy who gets screwed by SLOC metrics. It takes serious time to elegantly add to a code base, and if you do it right, you add very few lines of code in the process.
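The scope-growth check above is easy to automate if your tracker records an estimated size and a final size per feature. A minimal sketch of the idea - the feature records and the small/medium/large scale here are hypothetical, not from any particular tool:

```python
# Hypothetical feature records: (developer, estimated size, final size).
# Sizes on an ordinal scale: small < medium < large.
SIZE_RANK = {"small": 0, "medium": 1, "large": 2}

features = [
    ("alice", "small", "small"),
    ("alice", "medium", "medium"),
    ("bob", "small", "large"),
    ("bob", "small", "large"),
    ("bob", "medium", "large"),
]

def scope_growth_rate(records):
    """Fraction of each developer's features that ended up larger than estimated."""
    grew, total = {}, {}
    for dev, estimated, final in records:
        total[dev] = total.get(dev, 0) + 1
        if SIZE_RANK[final] > SIZE_RANK[estimated]:
            grew[dev] = grew.get(dev, 0) + 1
    return {dev: grew.get(dev, 0) / total[dev] for dev in total}

print(scope_growth_rate(features))  # → {'alice': 0.0, 'bob': 1.0}
```

A rate near 1.0, like bob's here, is the "endlessly growing feature" pattern worth a closer look; the number itself doesn't say whether the estimates or the scope control were at fault.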
Bug Metrics
Personally, I think I hold more stock in bug-related metrics than development metrics. Bugs can make or break a product, so I want to know that:
we are finding as many bugs as possible with as little effort as possible
when bugs are fixed, they don't introduce new bugs that have to be found and fixed
we introduce as few bugs as possible when we add features to the product
no part of the product is so esoteric that only one guy can fix the bugs there
From that, I look at:
bugfix averages - preferably by type of bug - vs. individual averages. Is one guy only able to fix the softball, easy-to-see bugs? Is one guy only able to fix bugs in his area? Is some guy taking forever to fix seemingly easy bugs? Is that bad or good? Sometimes the guy who takes the longest is also the guy who has significantly fewer post-bug-fix bugs, because he cleaned up an entire area that would have had 10 other bugs that are gone now...
bug fix rejection rate - who's getting a lot of returned bugs because they claimed to have fixed them, but subsequent testing showed the bug wasn't fixed?
buggy areas of code - there's almost always a problem area, and you'll almost always know where it is, but check your metrics and see if it's REALLY the area with the most bugs. Where are the bugs coming from, and what are the most common types of bugs? This is looking at the bugs on a case-by-case basis and seeing if any of them are something the developer could have or should have dealt with prior to the test phase.
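All three of these fall out of a simple aggregation over whatever your bug tracker exports. A sketch assuming a hypothetical export format - the field names and sample records are made up for illustration:

```python
from collections import defaultdict

# Hypothetical bug-tracker export: one record per closed bug.
bugs = [
    {"fixer": "alice", "type": "ui", "hours": 2, "reopened": False},
    {"fixer": "alice", "type": "concurrency", "hours": 16, "reopened": False},
    {"fixer": "bob", "type": "ui", "hours": 3, "reopened": True},
    {"fixer": "bob", "type": "ui", "hours": 1, "reopened": True},
    {"fixer": "bob", "type": "ui", "hours": 2, "reopened": False},
]

def avg_fix_hours(records):
    """Average fix time per (developer, bug type) pair."""
    totals = defaultdict(lambda: [0, 0])  # (hours, count) per pair
    for b in records:
        t = totals[(b["fixer"], b["type"])]
        t[0] += b["hours"]
        t[1] += 1
    return {k: h / n for k, (h, n) in totals.items()}

def rejection_rate(records):
    """Fraction of each developer's fixes that came back reopened."""
    reopened, total = defaultdict(int), defaultdict(int)
    for b in records:
        total[b["fixer"]] += 1
        if b["reopened"]:
            reopened[b["fixer"]] += 1
    return {dev: reopened[dev] / total[dev] for dev in total}
```

Comparing avg_fix_hours across developers for the same bug type is the interesting slice; comparing raw averages across types mostly measures which bugs each person happened to be handed.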
I strongly suggest that you center your metrics, as much as you can, on the product and not on the developers. It's easier to mine the metrics out of your tools that way, since your CM and your bug management system are concerned with your product, not your people. And it forces you to do some interpretation when reviewing developers - at every point, when you go from a number to a person, you need to think: what does this number really reflect, and is it really an indication of a problem with the person, or with the way the work is done?