
What is an adequate amount of commentary on code?


Comments

  • Registered Users, Registered Users 2 Posts: 8,219 ✭✭✭Calina


    It's just that in a long(ish) career here and in the UK, I have rarely come across it being done systematically. One of the few places I saw it had terrible problems
    with it. Temper tantrums, dysfunctional behavior, the lot. It was down to a lack of coding standards, mainly, so everything was based on prejudice.

    Maybe with API & Framework based systems, it should be less contentious.

    One thing for sure, most environments would benefit hugely from code walk-throughs - just as long as I don't have to do it!

    I worked in a highly specialised environment which almost certainly contributed to the culture, plus it had been the culture in the place pretty much since day one. If it's all you've ever known, maybe you never take it personally. I suspect if you bring in code review, people start to feel defensive about errors being found and do take it personally.

    That being said, I can't see how you'd get an effective code review process working if there are no clear local standards; I guess the next problem is if you try to introduce them, that causes arguments as well.

    Interestingly, while there is a lot of discussion about commentary in college courses I've attended since I started programming, the question of coding standards isn't something that has come up so much.


  • Registered Users, Registered Users 2 Posts: 40,038 ✭✭✭✭Sparks


    Am I right in saying that code review is the one thing that has been empirically proven to improve quality/lower cost in development?
    It's more accurate to say that it's one of the only methods that currently has been studied properly -- but that doesn't mean it's the only thing that works nor that it doesn't come with caveats.
    You rarely come across it, though. I guess because if (when) it is badly managed, it causes a lot of problems.
    That's one of the caveats :D And I have seen some people who were obnoxiously, unacceptably awful at it (to the point where whole teams quit because of them). Mind you, the same's true for almost any method you encounter. Personally, I think it has a lot more to do with its IBM-esque image; people seem to think of it the way they think of Waterfall. But if done right, it's actually highly effective, if expensive (your effectiveness in code review drops off dramatically after an hour's work, so there's only so much code you can review at one time - not a surprising result if you look at projects like the Linux kernel and the sizes of patches they work with).


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    GreeBo wrote: »
    The relevance is that if I have complex code that is being changed, I want a confidence level that nothing unexpected/unnoticed has been broken.

    Unit tests give you this confidence level. I think there is always a benefit to having unit tests; sure, there may be a cost, but there is a cost with any code you write - error handling for example, logging another - but we accept that cost because the benefits outweigh it.

    My point is that the benefits of having a particular test do not always outweigh the costs of having that test.

    This is obvious to me.

    But here's an example. Let's say it's a fairly small 2-person project working on some math code, for a video game we are writing. You want to calculate the error of a prediction, using squared error.
    Let's say you know that code is only going to be called with double values in the range -1 to 1.
    public double getErrorInPredictions(double target, double prediction) {
        double distance = target - prediction;
        return distance * distance;
    }
    


    Should you write unit tests for this code?

    Maybe we should write a test that puts in (5,7) and checks we get approximately 4 back?
    And another that checks it works with whole number values? And that it works when target is negative? And when prediction is negative? And when both are negative?
    Maybe we should write a whole battery of tests like that?
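
    Spelled out, that battery might look something like this hypothetical JUnit 4 sketch (the test class name and tolerance are made up, and the method is copied in so the snippet stands alone):
    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class GetErrorInPredictionsTest {

        @Test
        public void squaredErrorOfWholeNumbers() {
            // (5 - 7)^2 = 4
            assertEquals(4.0, getErrorInPredictions(5, 7), 1e-9);
        }

        @Test
        public void squaredErrorWhenTargetIsNegative() {
            // (-0.5 - 1.0)^2 = 2.25
            assertEquals(2.25, getErrorInPredictions(-0.5, 1.0), 1e-9);
        }

        @Test
        public void squaredErrorWhenBothAreNegative() {
            // (-0.5 - (-0.2))^2 = 0.09
            assertEquals(0.09, getErrorInPredictions(-0.5, -0.2), 1e-9);
        }

        // Copied here so the sketch is self-contained.
        private double getErrorInPredictions(double target, double prediction) {
            double distance = target - prediction;
            return distance * distance;
        }
    }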

    In general, no, we shouldn't. Maybe write one unit test, but probably zero, as long as there's a functional test elsewhere that we're reasonably happy drives this code (e.g. in our game, we can see the monster that uses this code intercept the player)

    This is the same reason we don't write a comment like:
    //the distance is the difference between the target and the prediction.


    Even if the test is simple, and the code being tested is simple, tests add to the complexity of our codebase overall.

    You mention error handling.
    It's the same thing with error handling.

    It's very important to know when to leave error handling out.
    Newbie programmers do things like write error handling in every method they write, even when the method is only called by some other code they've just written - on the basis that, well, it could change in future.

    By the time you do this, you've covered your code in error handling; you can no longer see what the code actually does; and that costs you more than it benefits.
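
    As a purely hypothetical sketch of what that looks like, compare the lean two-line helper with a "defensive" version of the same helper once every imagined failure is guarded against:
    // Lean version: you can see at a glance what it does.
    double squaredError(double target, double prediction) {
        double distance = target - prediction;
        return distance * distance;
    }

    // "Defensive" version: two lines of work buried under guards for
    // situations that the only caller - which we also wrote - can never produce.
    double squaredErrorDefensive(double target, double prediction) {
        if (Double.isNaN(target) || Double.isNaN(prediction)) {
            throw new IllegalArgumentException("NaN input");
        }
        if (target < -1 || target > 1 || prediction < -1 || prediction > 1) {
            throw new IllegalArgumentException("input out of expected range");
        }
        double distance = target - prediction;
        return distance * distance;
    }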


    Everything should be done only after weighing the cost against the benefit.

    And, if you assert that every test's benefit outweighs its cost - or that every piece of error handling code's benefit outweighs its cost - or the same for comments - then I'm pretty sure you're wrong.


    Again, cost benefits - if you are writing the code for the last-ditch killer-asteroid intercept mission, then you do things differently than if you are writing for a video game - obviously.


  • Registered Users, Registered Users 2 Posts: 21,034 ✭✭✭✭Stark


    Very interesting points. I agree that 100% coverage is an unfeasible target for unit tests; there are always pieces of code where it's more hassle than it's worth to write tests around. I generally assign quite a high benefit-to-cost ratio to unit tests though (orders of magnitude higher than to comments), and I think 80%-90% line coverage is a good target, depending on the code.

    As for TDD, I quite like it myself; there is a focus on keeping things lean (you don't write "just in case" code). Strictly speaking, TDD wouldn't advocate writing spurious tests for input conditions that can never happen, as your example describes, so in that sense it's quite good. And by writing tests for tricky pieces of code up front, you take the redeploy out of the "deploy, test, rewrite, redeploy" dev cycle, which more than makes up for the cost of writing the test. Especially if your container is a pain to deploy to.

    I'm generally in favour of the test pyramid as well (high number of unit tests -> low number of system tests). It's a lot less painful to debug and fix an error when you catch it at source than when you're trying to debug something at a system level and aren't sure where the error's coming from. I don't think relying on a functional test like "does the monster move in the right direction" is a good way of verifying that the basic components of your physics engine work correctly; that sounds like a recipe for lots of hair pulling.

    Edit: Reading that "Making Software" book at the moment (God bless Safari Books). Quite wittily written:
    Regardless of the reporting quality of the TDD trials, a related question is raised: “Should the textbook definition of TDD be followed in all real-life cases?” Sometimes patients get better even with a half-sized or quarter-sized pill modified for their specific work context and personal style.

    Word. My personal belief is the correct TDD dosage is somewhat less than what a typical manager might prescribe.

    The book's overall conclusion seems to be that the evidence is inconclusive, though.


  • Registered Users, Registered Users 2 Posts: 27,253 ✭✭✭✭GreeBo


    fergalr wrote: »
    My point is that the benefits of having a particular test do not always outweigh the costs of having that test.

    This is obvious to me.

    But here's an example. Let's say it's a fairly small 2-person project working on some math code, for a video game we are writing. You want to calculate the error of a prediction, using squared error.
    Let's say you know that code is only going to be called with double values in the range -1 to 1.
    public double getErrorInPredictions(double target, double prediction) {
        double distance = target - prediction;
        return distance * distance;
    }
    


    Should you write unit tests for this code?

    Maybe we should write a test that puts in (5,7) and checks we get approximately 4 back?
    And another that checks it works with whole number values? And that it works when target is negative? And when prediction is negative? And when both are negative?
    Maybe we should write a whole battery of tests like that?

    In general, no, we shouldn't. Maybe write one unit test, but probably zero, as long as there's a functional test elsewhere that we're reasonably happy drives this code (e.g. in our game, we can see the monster that uses this code intercept the player)

    This is the same reason we don't write a comment like:
    //the distance is the difference between the target and the prediction.


    Even with simple tests, even if the test is simple, and the code being tested is simple, the tests add to the complexity of our codebase overall.

    You mention error handling.
    Its the same thing with error handling.

    Its very important to know when to leave error handling out.
    Newbie programmers do things like write error handling in every method they write, even when the method is only called by some other code they've just written - on the basis that, well, it could change in future.

    By the time you do this, you've covered your code in error handling; you can no longer see what the code actually does; and that costs you more than it benefits.


    Everything should be done only when assessing the cost and the benefit.

    And, if you assert that every test's benefit outweighs its cost - or that every piece of error handling code's benefit outweighs its cost - or the same for comments - then I'm pretty sure you're wrong.


    Again, cost benefits - if you are writing the code for the last-ditch killer-asteroid intercept mission, then you do things differently than if you are writing for a video game - obviously.

    Honestly I think your example is flawed; why would you test that subtraction works? Unit tests should be testing your (complicated) logic to confirm changes haven't broken it, not that the compiler or runtime still works as expected.

    With error handling, if my code needs to do something in the case of an error then I handle it; that might mean throwing an exception or maybe closing resources etc., but it's the job of my code, not something to just rely on fixing later when it's a problem, so I disagree with you strongly here tbh.

    Even with cost/benefit, it doesn't have to be mission critical - for your company it is critical. You can't just trot out the old "no one died" when you discover a bug in production...


  • Registered Users, Registered Users 2 Posts: 21,034 ✭✭✭✭Stark


    GreeBo wrote: »
    With error handling, if my code needs to do something in the case of an error then I handle it; that might mean throwing an exception or maybe closing resources etc., but it's the job of my code, not something to just rely on fixing later when it's a problem, so I disagree with you strongly here tbh.

    I agree strongly with fergalr here. Pedantically handling every possible error case can make code unreadable and slow down development time dramatically. You can't ignore error cases and always program for the happy path either but there is a balance to be struck. The reason we have exceptions in the first place is so we can bubble error conditions up the stack when we don't want to handle everything at a low level.
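
    A minimal sketch of what I mean, with hypothetical names: the low-level code declares the exception and lets it propagate, and one high-level caller decides what to do about it.
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    class ConfigReader {
        // Low level: no try/catch here, just declare it and let it bubble up.
        List<String> readLines(String path) throws IOException {
            return Files.readAllLines(Paths.get(path));
        }
    }

    class Application {
        // High level: the one place that knows how to react to the failure.
        void start(ConfigReader reader) {
            try {
                reader.readLines("config.txt");
            } catch (IOException e) {
                System.err.println("Could not load config, using defaults: " + e.getMessage());
            }
        }
    }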
    GreeBo wrote: »
    Even with cost/benefit, it doesn't have to be mission critical - for your company it is critical. You can't just trot out the old "no one died" when you discover a bug in production...

    You can't catch every single bug in a product. If it takes 3 months to get something out that's 95% bug free vs 6 months to get something out that's 100% bug free, it's generally better to get it out in 3 months and use the revenue gained to do follow up fixes. A single bug in production? Yeah I'd happily trot out "no-one died".

    I would disagree somewhat with fergalr on the unit testing strategy. In my experience, an experienced project team can turn out code with good coverage as quickly as code without coverage, by leaning on those solid foundations to build up the codebase quickly. So the cost is actually low, if there is one at all. Of course it depends on developer experience. Inexperienced developers often write bad tests the first time they attempt unit testing, and that comes with a cost, just as writing bad production code comes with a cost. But any good developer should learn to write good tests.


  • Registered Users, Registered Users 2 Posts: 40,038 ✭✭✭✭Sparks


    Stark wrote: »
    You can't catch every single bug in a product.
    Technically, you can (effectively at least). However it's expensive so few do it. But "you can't" and "you don't want to pay for it" are different things.

    As to not handling errors... folks, please don't ever write a network stack or anything else low-level? M'kay? Life's hard enough for the rest of us as it is.


  • Subscribers Posts: 4,076 ✭✭✭IRLConor


    Stark wrote: »
    I agree that 100% coverage is an unfeasible target for unit test coverage, there are always pieces of code where it's more hassle than it's worth to write tests around.

    IMHO, that's a sign that you might have a design failure. Code that's hard to test is a big red flag for me.

    Yes, some languages will force you to write code in such a way that you end up with lots of hard-to-test corners but again that's less of an excuse for low coverage and more of a reason to avoid that language in the future.


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    GreeBo wrote: »
    Honestly I think your example is flawed, why would you test that subtraction works?

    The example is of a method that I would not test.

    Because the complexity added by a test outweighs the benefit of testing this code.

    Once we all agree that such examples occur, we are then just arguing about where to draw the line. And that's my point - whether or not to test - much like whether or not to comment - is a judgement call, which depends on the context.


    (Secondly: the method calculates squared error, not subtraction.)


    GreeBo wrote: »
    Unit tests should be testing your (complicated) logic to confirm changes haven't broken it, not that the compiler or runtime still works as expected.

    Sure - but there's no clear and bright line between those two things.

    There's a sliding scale of method complexity. I think it's obvious that the method I wrote doesn't need to be tested - i.e. that, as long as the compiler is correct, it's going to work (or I can be sure enough of that that I don't need to test).

    But, as I add lines to the method, I'll eventually reach a point of complexity where that's not the case, and where it makes sense to introduce a test.
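
    For instance (a purely hypothetical extension, not code from the project), once the method grows branching or clamping logic like the version below, the judgement call starts to tip towards writing a test:
    public double getErrorInPredictions(double target, double prediction) {
        double distance = target - prediction;
        double error = distance * distance;
        // Clamp extreme errors so one bad prediction can't dominate the scoring.
        return Math.min(error, 1.0);
    }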

    I should not test before that point, and I should test afterwards.

    The exact point will depend on context - what I'm building, the cost of bugs, the cost of complexity, etc.
    GreeBo wrote: »
    With error handling, if my code needs to do something in the case of an error then I handle it; that might mean throwing an exception or maybe closing resources etc., but it's the job of my code, not something to just rely on fixing later when it's a problem, so I disagree with you strongly here tbh.

    If you've a method A that is only called from one other place in your video game, which you've just written, do you always write error handling in method A for the case where it's given bad parameters? I often don't, if it's a small self-contained project.


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    GreeBo wrote: »
    Even with cost/benefit, it doesn't have to be mission critical - for your company it is critical. You can't just trot out the old "no one died" when you discover a bug in production...

    This is a silly thing to say.

    Of course you can trot out the 'no one died' reason when you discover a bug in production.

    We would write our code VERY DIFFERENTLY if production bugs were likely to result in deaths.

    Formal verification, intense code review, formal specification, formal QA, checking multiple independent implementations for agreement - that sort of thing. Right? Hopefully this is obvious?


    We don't use those techniques making our video games, or our CRUD apps or whatever, because the benefits do not outweigh the costs.



    But of course, if you only had 8 hours to write the asteroid intercept code, or everyone died, then you'd be back to cutting corners again. Because, again, cost/benefit. Which depends on context.


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    Sparks wrote: »
    Technically, you can (effectively at least). However it's expensive so few do it. But "you can't" and "you don't want to pay for it" are different things.

    You can increase your confidence, but you can't ever be sure - which you are acknowledging when you say 'effectively at least' - so agreed there.
    Sparks wrote: »
    As to not handling errors... folks, please don't ever write a network stack or anything else low-level? M'kay? Life's hard enough for the rest of us as it is.

    It depends, it depends... depends on why you are writing it, what's the context... is it a proof of concept? For a new network analysis startup? Or the prototype new netcode that has to ship in a week? For an academic paper? Or is it going to be embedded into firmware? Maybe it's the low-level code that safeguards the nukes?

    You can't be dogmatic about this stuff. Maybe sometimes you want 'defence in depth', where every part of the system distrusts every other part and checks the input it gets; other times that'll screw up the code so that it's hard to see the logic of what your system is doing.

    What is considered best practice changes all the time - SaaS has changed a lot - you can fix an internal bug almost instantly, you don't have to ship CDs.

    It's like, when Nintendo used to ship a game on the SNES, they had to have found almost all the big bugs; now you can patch your game after release. Changes like that in your deployment have got to impact how you handle and respond to errors, if you are doing things right.


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    IRLConor wrote: »
    Stark wrote: »
    I agree that 100% coverage is an unfeasible target for unit test coverage, there are always pieces of code where it's more hassle than it's worth to write tests around.

    IMHO, that's a sign that you might have a design failure. Code that's hard to test is a big red flag for me.

    I don't even agree with this way of thinking about 'coverage'. What does '100% coverage' even mean? It might mean that every line of code is exercised by a test - but it doesn't mean that you have tested 100% of the possible states the system can be in.

    States are what we care about, not SLOC. And we can't test all the states, in any non-trivial stateful system.*

    I mean, you can always move to a functional setup; then you can actually start to think about those things. Maybe you are saying not using a functional setup is a design failure, Conor? If so, I have a badge you can have...


    *Famous CS paper: "Out of the Tar Pit" explains this stuff in great detail - it's not so hot on the second part of the paper, explaining functional relational programming, but the intro to the problem is excellent, including a great discussion of why tests only get you so far. http://shaffner.us/cs/papers/tarpit.pdf

    Even just read page 4 - so nice.


  • Closed Accounts Posts: 4,436 ✭✭✭c_man


    I was thinking of this thread today. A bug has brought me back to a piece of code I wrote when I was pretty green :) and for some reason I remember it pretty well. Anyway, it's been made a further balls of since then, as I found when reviewing it... As a recent skeptic-turned-believer in UT and quasi-TDD, I applied myself to rewriting the bad boy! Grand, got a nice change going and had tests covering the one case that was reported failing and a few of the more common states it would encounter. So with the production code changes done I started cranking out test cases for all the other states... a bit in, I gave up. I couldn't be arsed. I don't think these states need to be covered.

    I just know this is going to be commented upon in code review, and I have no good answer bar that I don't think it's worth covering the other combos :)


  • Subscribers Posts: 4,076 ✭✭✭IRLConor


    fergalr wrote: »
    I don't even agree with this way of thinking about 'coverage'. What does '100% coverage' even mean? It might mean that every line of code is exercised by a test - but it doesn't mean that you have tested 100% of the possible states the system can be in.

    States are what we care about, not SLOC. And we can't test all the states, in any non-trivial stateful system.*

    I know. That's why I look at 80% line coverage as "at least 20% of this code is entirely untested". 100% line coverage is a starting point, not a goal.

    It's also why I'm more interested in testable code rather than the tests themselves. After all, you're going to find bugs in your code and you're going to have to write some sort of tests to isolate and understand them so you may as well help yourself up front. More testable code leads to shorter time to resolution.
    fergalr wrote: »
    I mean, you can always move to a functional setup; then you can actually start to think about those things. Maybe you are saying not using a functional setup is a design failure, Conor? If so, I have a badge you can have...

    I'm not a member of the Haskell Justice League, thanks. :p

    You can write functional code in damn near any language. In very many cases it's the correct decision too. I just don't like being forced to do it.


  • Registered Users, Registered Users 2 Posts: 27,253 ✭✭✭✭GreeBo


    Testing branches is far more important than testing lines of code: branches imply business logic, lines of code imply nothing.
    You could have 60% line coverage and not be testing anything bar your compiler / runtime.
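
    A hypothetical sketch of what I mean: a single happy-path test executes nearly every line of this method, so line coverage looks healthy, but the branch carrying the actual business rule is never exercised.
    double finalPrice(double price, boolean loyalCustomer) {
        double total = price * 1.23;      // add VAT
        if (loyalCustomer) {
            total = total * 0.9;          // 10% loyalty discount - the untested branch
        }
        return total;
    }

    // One happy-path test: most lines covered, only half the branches tested.
    // assertEquals(12.3, finalPrice(10.0, false), 1e-9);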


  • Registered Users, Registered Users 2 Posts: 21,034 ✭✭✭✭Stark


    IRLConor wrote: »
    IMHO, that's a sign that you might have a design failure. Code that's hard to test is a big red flag for me.

    Yes, some languages will force you to write code in such a way that you end up with lots of hard-to-test corners but again that's less of an excuse for low coverage and more of a reason to avoid that language in the future.

    Case in point: I had a logger initialised in a static block in a class, and a call to that logger as the only line in a method (it was part of an error handling interface). I was in a pedantic mood that day and I wanted 100% coverage, so I ended up adding PowerMock as a dependency and using its static-mocking functionality just for that one method. Successfully achieved full coverage, but it was a silly waste of time and an increase in complexity for the sake of chasing a magic number really.


  • Registered Users, Registered Users 2 Posts: 40,038 ✭✭✭✭Sparks


    fergalr wrote: »
    You can increase your confidence, but you can't ever be sure
    Just on that point, we don't actually know that yet (and most of the Formal Methods research field would disagree with you, they just can't prove their case yet either :D )


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    Sparks wrote: »
    Just on that point, we don't actually know that yet (and most of the Formal Methods research field would disagree with you, they just can't prove their case yet either :D )

    We start to get into tricky territory, going too far down this road - which I think is what you are alluding to.

    It's like the physicists estimating the risk of a black hole from the LHC: they put it at some extremely low number - let's say 1 in 10^50 or something - but someone else points out that we can't use that as a lower bound, because it's now much more probable that the lower bound on our certainty is determined by things like [all the physicists being wrong about their paradigm] or [some subtle mistake that has never been spotted in the theoretical framework] rather than [the actual figure provided by the theoretical framework].

    And you can never get away from things like that; it's kind of a bootstrapping problem...


  • Registered Users, Registered Users 2 Posts: 40,038 ✭✭✭✭Sparks


    True for physics where we're guessing at the rules and checking our guesses and the system wasn't our invention. Formal methods goes the other way - we invented the system, proved the fundamentals and every step thereafter which is proven is basically irrefutable (there's no error bars involved - a mathematical proof is an all-or-nothing sort of thing for various reasons relating to the nature of maths and follow that rabbit hole into metaphysical argument only if you have nothing better to be doing with your time and have a bout of insomnia). Since FM's basic thesis is to find a way to create a mathematical expression of a given program to prove its correctness, if you could pull it off, you could then state that there are no bugs within the program without any qualifiers implied or explicit.

    Of course, FM hasn't yet developed methods or tools that let them do that on anything past a fairly trivial level yet and it's an open question if it's even possible, but thus far nobody's been able to prove that it cannot be done, only that it's desperately difficult (so was landing people on the moon in 1950, but we did it faster than someone in 1950 could produce a new voter, so saying something's impossible just because it's difficult should be something we've all learnt to avoid doing by now :D )


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    Sparks wrote: »
    True for physics where we're guessing at the rules and checking our guesses and the system wasn't our invention. Formal methods goes the other way - we invented the system, proved the fundamentals and every step thereafter which is proven is basically irrefutable (there's no error bars involved - a mathematical proof is an all-or-nothing sort of thing for various reasons relating to the nature of maths and follow that rabbit hole into metaphysical argument only if you have nothing better to be doing with your time and have a bout of insomnia). Since FM's basic thesis is to find a way to create a mathematical expression of a given program to prove its correctness, if you could pull it off, you could then state that there are no bugs within the program without any qualifiers implied or explicit.

    Hmm - still not sure I agree here.

    Even if you prove something correct, there's still going to be gaps between [your mathematical model of the world] and [the world itself]. As long as the program has to interface with something external to itself, you'll have the question of bugs, even with very strong formal methods.


    I mean, also, at a practical level, people rarely *really* prove anything. Even mathematicians. I know machine-checked proofs and so on attempt to eventually get away from this - but it's hard to do anything other than the most simple stuff when working at that level of abstraction. Maybe some day, with strong AI, we'll be able to apply that sort of technology to useful problems. Then we'll just have the problem above, of the gaps, to address. But until then, it's all just about sufficiently convincing proof sketches, which have their problems, as shown in this excellent example:

    http://googleresearch.blogspot.ie/2006/06/extra-extra-read-all-about-it-nearly.html
    It is not sufficient merely to prove a program correct; you have to test it too. Moreover, to be really certain that a program is correct, you have to test it for all possible input values, but this is seldom feasible. With concurrent programs, it's even worse: You have to test for all internal states, which is, for all practical purposes, impossible.


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    Here is Mr. Rails himself talking about tests damaging designs etc.; presumably now it'll be much easier to get people to believe testing isn't free, so that's good:

    http://david.heinemeierhansson.com/2014/test-induced-design-damage.html


  • Registered Users, Registered Users 2 Posts: 11,986 ✭✭✭✭Giblet


    fergalr wrote: »
    Lets say you know that code is only going to be called with positive double values in the range between -1 and 1.
    public double getErrorInPredictions(double target, double prediction) {
        double distance = target - prediction;
        return distance * distance;
    }
    


    You say you know it's going to be called with a range of -1 and 1 but you don't guard against this. A simple unit test would have helped you write this method better. It's a public method, so you don't know the caller.

    It may sound obvious, but the error will probably crash the game, not return a glitchy result.

    getErrorInPredictions(Double.MinValue,1)

    I would argue that you probably don't need to write the tests first, but you need to refactor in some tests at some point.

    DHH above makes a point about TDD in rails, and controllers. I agree with him, in ASP MVC I do not unit test controllers either (bear in mind he still suggests integration tests!), but only because my controllers don't do anything that hasn't been covered already, so to test them is to retest what I've already tested and testing private methods doesn't make sense in a lot of cases.


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    Giblet wrote: »
    You say you know it's going to be called with a range of -1 and 1 but you don't guard against this.

    Think about what you just wrote for a second.

    Of course you shouldn't guard against things you know.


    It's the same reason you don't write a test to check integer assignment.


    You should guard against assumptions you have that might be wrong, where the chance of something violating the assumption, combined with the impact of the assumption being violated, outweighs the cost of checking the assumption. Sure.

    But if you know the method is only going to be called with a certain range, you don't guard against it.


    Programmers reading this might be thinking "Oh, haha, well, that's naive. You haven't worked in the real world. In the real world, your assumptions get violated all the time, because Johnny who inherits your code won't know your assumptions. blah blah"


    Well, it's naive to be absolutist about these things, in either direction. It depends on the situation. In my example, it was a 2-person project shipping a game. So, you are either the programmer writing the code that calls that method, or someone you work closely with is. And your game is going to ship or not ship soon, and probably that method will be thrown away next week anyway.

    So, if you write error handling that guards everything against everything, you just take too long to develop anything, and you've made the wrong cost/benefit tradeoff, and you are being a suboptimal developer.

    Again, the point I've been making, is that you'd make different decisions for the code you write for the asteroid intercept mission, and different decisions again, for the same mission, but if your launch window is closing in 24 hours.

    Giblet wrote: »
    A simple unit test would have helped you write this method better. It's a public method, so you don't know the caller.

    Public and private are just ways of formalising, in code, what you do and don't know about who'll call your code. They're a way of putting boundaries around your assumptions, so you don't drown in error checking and documentation of interfaces.

    But it's an imperfect capture of what programmers actually know in their heads. It can't precisely model those risk boundaries, which have to do with things like organisational structure, which programmers are writing which code, the passage of time, which code is more fertile ground for bugs, etc.

    So it's wrong to say 'oh, it's a public method, so you don't know the caller'. The knowledge of whether or not I know the caller comes first. The public method is just an expression of that - an attempt to capture it. You've got the causality mixed up - your tail is wagging your dog.

    Giblet wrote: »
    It may sound obvious, but the error will probably crash the game, not return a glitchy result.

    getErrorInPredictions(Double.MinValue,1)

    Except it won't crash the game, because that won't happen.


    It's like The Karate Kid. Best block: not be there.

    Best way to handle errors is to not have errors.


    In practice you can't always do this, OF COURSE.
    There are interfaces you don't have control over, or that interact with some fluid part of the system, or where you can't trust what comes into the interface.

    For everything else, there's [just assuming things are correct].

    int x = 50;
    int y = 20;
    
    
    if ((x*y) > 50000) {
        //handle error
    }
    

    See the code above in a CRUD app? Fire the programmer.

    See the code above in your micro-satellite code? Well, you probably want to have a better discussion about how you handle cosmic radiation interference, but it's a discussion, not a termination interview.

    Giblet wrote: »
    I would argue that you probably don't need to write the tests first, but you need to refactor in some tests at some point.

    Yes, generally 'some tests at some point' is very easy to get behind. I agree.
    Giblet wrote: »
    DHH above makes a point about TDD in rails, and controllers. I agree with him, in ASP MVC I do not unit test controllers either (bear in mind he still suggests integration tests!), but only because my controllers don't do anything that hasn't been covered already, so to test them is to retest what I've already tested and testing private methods doesn't make sense in a lot of cases.

    You see - it's all about knowledge boundaries. It's not about public and private, or about testing everything, or about tests being free. It's about cost vs benefit, and basic risk analysis.


  • Registered Users, Registered Users 2 Posts: 40,038 ✭✭✭✭Sparks


    fergalr wrote: »
    Think about what you just wrote for a second.
    Of course you shouldn't guard against things you know.
    Well, strictly the problem there isn't the words you highlighted, but the one I've underlined. And the example you've given to say it's not always so is incredibly esoteric btw - two people working on a small chunk of code with a one-shot-and-forget ship is a really astonishingly rare sort of project. The vast vast majority of coding doesn't work that way. And frankly, even if there was just one other coder on the project, I'm still sticking an assert in there because paranoia's a job skill :D


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    Sparks wrote: »
    Well, strictly the problem there isn't the words you highlighted, but the one I've underlined.

    If you know a method will only be called with certain parameters, then that's all there is to it.

    It doesn't matter what other people know; that's a second-order thing that's fully captured, for the purposes of the decision about whether to write a guard, in your estimate of the chance of the interface being violated.

    Sure, you might be wrong, but then you didn't really know what you thought you did. But that's all you need to capture in order to reason correctly about the decision probabilistically. What you should pay to avoid the risk is the cost of it happening times the estimated probability of it happening.

    Might be fun to formalise all that, in this context, but someone has probably done it already.

    Sparks wrote: »
    And the example you've given to say it's not always so is incredibly esoteric btw - two people working on a small chunk of code with a one-shot-and-forget ship is a really astonishingly rare sort of project.

    You are overstating that. It's certainly not 'incredibly esoteric' for two people to work on a small chunk of code that might be shipped, or canned, etc.

    It mightn't be the bulk of programming, but it happens a lot. It even happens within big organisations.


    Now, all the big projects you work on might have started out as badly designed small projects which everyone thought would be deleted next year, but now power the economies of nations, and, uh-oh.

    But that doesn't mean that most small projects are going to go on to power the economies of nations (that's the prosecutor's fallacy, of course). And you can't set out to write small projects as if that's what they were all likely to do - not if you want them to succeed with the resources they have.
    Sparks wrote: »
    The vast vast majority of coding doesn't work that way. And frankly, even if there was just one other coder on the project, I'm still sticking an assert in there because paranoia's a job skill :D

    It's a balance though - the best choice isn't the paranoid one, in a lot of circumstances. Yes, that totally depends on the organisation, the context, etc., but that's my point - I'm not talking about extreme edge cases.


  • Technology & Internet Moderators Posts: 28,820 Mod ✭✭✭✭oscarBravo


    fergalr wrote: »
    It's a balance though - the best choice isn't the paranoid one, in a lot of circumstances. Yes, that totally depends on the organisation, the context, etc., but that's my point - I'm not talking about extreme edge cases.

    What would it cost to stick an "assert" in there, though? If you know for a fact that the parameters will always be in range, then there are no side effects of the assertion, and you've covered yourself if what you know for a fact ever changes in the future.

    I've made assumptions in code because I knew for a fact what the bounds of the function arguments were. Those assumptions have come back to bite me five years later when, due to the scope of the overall project having moved on, the assumptions were no longer true. I'd rather have to deal with an "AssertionError at line x" than a subtle "wat" situation that requires extensive debugging. And I'm speaking as someone who has been a solo developer for the best part of a decade; I only have my own assumptions to deal with.
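
    To make that concrete with the earlier example (a sketch, assuming Java-style asserts enabled with -ea): the assertion costs two lines, has no side effects when the assumption holds, and turns a silently broken assumption into an immediate AssertionError.
    public double getErrorInPredictions(double target, double prediction) {
        assert target >= -1 && target <= 1 : "target out of expected range: " + target;
        assert prediction >= -1 && prediction <= 1 : "prediction out of expected range: " + prediction;
        double distance = target - prediction;
        return distance * distance;
    }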


  • Registered Users, Registered Users 2 Posts: 27,253 ✭✭✭✭GreeBo


    fergalr wrote: »
    If you know a method will only be called with certain parameters, then that's all there is to it.

    I don't understand how you can state that you "know" it won't be called with negative values?
    It's public... in 4 years' time it could be living as some compiled jar somewhere that someone else starts to use... how do you know what's going to happen in the future?

    Honestly I think your previous answer of "you haven't worked in the real world" is correct; in the real world anyone can call a public method with anything, sometimes 10 years after you originally wrote it...


  • Registered Users, Registered Users 2 Posts: 21,034 ✭✭✭✭Stark


    GreeBo wrote: »
    I don't understand how you can state that you "know" it won't be called with negative values?
    It's public... in 4 years' time it could be living as some compiled jar somewhere that someone else starts to use... how do you know what's going to happen in the future?

    Honestly I think your previous answer of "you haven't worked in the real world" is correct; in the real world anyone can call a public method with anything, sometimes 10 years after you originally wrote it...

    Ah here now. The public modifier only means that the method is callable by methods outside the class. It doesn't mean "accessible by anyone anywhere". Most of the public methods in our codebases are only called from private methods within classes in the same project, not accessible from outside the project at all. And if someone does decide to hack around things and instantiate that class directly and start calling methods on it, well they deserve what they get.

    Now I'm not advocating that we should be flippant about making assumptions, but making a judgement call as to how low level or high level a method is purely based on the visibility modifier is a bit silly. You can't know at what level the method is going to be used without more context. If the method is a high level method, yes, put in guards. If the method is a low level method and there are guards higher up the stack, then there's no point in repeating those guards.

    Eg:
    class UserService {
       public User getUser(String userId) {
          checkNotNull(userId);
          return userDao.getUser(userId);
       }
    }
    
    class UserDao {
       public User getUser(String userId) {
          checkNotNull(userId);    // Do we really need it again!?
          return blahBlahBlah();
       }
    }
    


  • Registered Users, Registered Users 2 Posts: 11,986 ✭✭✭✭Giblet


    I never ever use public to mean "the actual public"...

    Callable by anyone of course assumes it's part of a library being consumed, even internally, and the call is not encapsulated.


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    oscarBravo wrote: »
    What would it cost to stick an "assert" in there, though? If you know for a fact that the parameters will always be in range, then there are no side effects of the assertion, and you've covered yourself if what you know for a fact ever changes in the future.

    Sure, 'asserts' are often very cheap, and, as such, worth putting in.

    Still not free, though.

    If you've data coming through your system, and you assert the same thing about that data at 5 different points as it goes down through your system, then, if you want to change something about the data, you've got 5 asserts to change.

    E.g. if the range you want to support changes from [-1 to 1] to [-10 to 10]


    Of course, that depends on what you chose to assert - did you assert the incoming number was between [-1 and 1] (the domain-defined allowable values), or maybe a bigger range: the largest range of values for which the function that's the limiting factor could still work? Or maybe you chose something between the two - the largest range that you could ever see making sense, but that is still well within the range for which the function will work correctly?

    The more generous your assert is, in terms of what it allows through, the more scope you'll have to change the behaviour of the inner functions without having to change your asserts. Of course, the more generous the assert is, the less it is doing.
    oscarBravo wrote: »
    I've made assumptions in code because I knew for a fact what the bounds of the function arguments were. Those assumptions have come back to bite me five years later when, due to the scope of the overall project having moved on, the assumptions were no longer true. I'd rather have to deal with an "AssertionError at line x" than a subtle "wat" situation that requires extensive debugging. And I'm speaking as someone who has been a solo developer for the best part of a decade; I only have my own assumptions to deal with.

    Yeah, we've all done this. That doesn't mean we should put guards or asserts everywhere though. It depends.

    This is similar to the example Stark gave above, but, you know, let's say you are writing the UI code: do you check that the username coming from the database is in canonical form? Or do you assume the username will have been canonicalised before it goes in the database?

    If you do the former, you're slightly 'safer' in some sense - it's more 'defence in depth'. You are certainly safer if someone violates the assumptions for the database.
    But if you put such checks everywhere, at every level - and especially if you try and handle the error, rather than just asserting (sometimes 'asserting' doesn't make sense for your production system, but that's another discussion) - then you can end up with code that has a mess of unnecessary error handling.

    Whether you should do that depends on how likely you think it is the code will be running in 5 years, how likely it is you'll have to change all your asserts, the costs each way, etc.


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    GreeBo wrote: »
    I don't understand how you can state that you "know" it won't be called with negative values?
    It's public... in 4 years' time it could be living as some compiled jar somewhere that someone else starts to use... how do you know what's going to happen in the future?

    Honestly I think your previous answer of "you haven't worked in the real world" is correct; in the real world anyone can call a public method with anything, sometimes 10 years after you originally wrote it...

    Stark's post answers this really well, in my opinion.

    Like, 'knowing X' is believing that the probability of X being true is very high. Depending on how high that probability is, and the costs either way, you make your judgement call.

    If you get dogmatic about it in either direction, you get bad code.

    One way, you end up with fragile code that isn't robust against unanticipated user input, say.
    The other way, you end up with code that checks the username is correct at every single step in the system; if you have enough such checks to do, when you look at your code, instead of simple clear methods, you see 20 lines of error checking and handling for a method that then does 3 lines of actual work, and you see this for all of your methods everywhere, and the code can't be easily changed.


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    Giblet wrote: »
    I never ever use public to mean "the actual public"...

    Callable by anyone of course assumes it's part of a library being consumed, even internally, and the call is not encapsulated.

    Not 100% clear on what you are saying there.

    In practice, in real systems, methods that are 'public' are often encapsulated by the rest of the system, and aren't exposed to people who will (e.g. adversarially) abuse their implicit interfaces.

    But you generally shouldn't write every 'public' method as if it was a library call, with the robust level of error handling, checking and documentation that that would involve.

    I mean, this might not be the case if you were writing code that you knew wasn't going to change over the course of your project, where you knew all the work of making your code library quality made sense, and where defence in depth mattered a lot because of your domain, etc.


  • Registered Users, Registered Users 2 Posts: 21,034 ✭✭✭✭Stark


    A lot of this discussion reminds me of the arguments around checked exceptions vs unchecked exceptions:

    http://tutorials.jenkov.com/java-exception-handling/checked-or-unchecked-exceptions.html

    In short: the case for checked exceptions is that everything's explicit; the case against is that they pollute method signatures all the way up the stack with throws declarations and make it difficult to change your code later.
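
    A rough sketch of the two styles (hypothetical names): the checked version forces every caller up the stack to catch or re-declare the exception; the unchecked version keeps signatures clean but makes the failure mode implicit.
    import java.io.IOException;
    import java.io.UncheckedIOException;

    class CheckedStyle {
        String load(String path) throws IOException {    // callers must catch or re-declare
            throw new IOException("not found: " + path);
        }
    }

    class UncheckedStyle {
        String load(String path) {                        // nothing in the signature
            throw new UncheckedIOException(new IOException("not found: " + path));
        }
    }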


  • Registered Users, Registered Users 2 Posts: 40,038 ✭✭✭✭Sparks


    fergalr wrote: »
    If you've data coming through your system, and you assert the same thing about that data at 5 different points as it goes down through your system, then, if you want to change something about the data, you've got 5 asserts to change.
    Well, yes, that's the point.
    (a) Changing the data is changing the design and that's not free so you expect a cost; and (b) the point of the asserts is that you change the code 5 times, not 4!
    Whether you should do that depends on how likely you think the code will be running in 5 years, how likely it'll be you have to change all your asserts, the costs each way, etc.
    Well, yes, but games programming aside, not many of us have the luxury of assuming our code has a short lifespan and won't ever get repurposed, relinked, have its internal APIs abused by direct access, or worse yet, ctrl-c-ctrl-v'd into something else along with the lovely comment identifying us as the original author but not the idiot who made a patchwork function without knowing the original assumptions.


  • Registered Users, Registered Users 2 Posts: 21,034 ✭✭✭✭Stark


    Sparks wrote: »
    Well, yes, that's the point.
    (a) Changing the data is changing the design and that's not free so you expect a cost; and (b) the point of the asserts is that you change the code 5 times, not 4!

    Why exactly? Good code should have separation of concerns and be easy to change.

    If the business, for example, specifies that the largest payment a payment system should be able to process is say 500 euro, to safeguard against an administrator making a mistake and bankrupting the company, then that magic 500 number should appear in at most a couple of places in the code. If in the future the business decides, due to inflation, that that number should be 750, the programmer shouldn't be turning around and saying "sorry, we deliberately made the cost of changing that validation high, we'll have to touch every part of our codebase and regression-test everything". It shouldn't be your concern if someone down the line takes your code and decides to make a payment system that processes 10,000 euro payments out of it. Your concern is making code that works well for its current purpose and is easy to change if requirements change in the future.
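
    As a sketch of what I mean (hypothetical names), the rule lives in one constant behind one guard, so the change from 500 to 750 is a one-line edit:
    import java.math.BigDecimal;

    class PaymentValidator {
        // The business rule lives here and only here.
        static final BigDecimal MAX_PAYMENT_EUR = new BigDecimal("500.00");

        void validate(BigDecimal amount) {
            if (amount.compareTo(MAX_PAYMENT_EUR) > 0) {
                throw new IllegalArgumentException("payment exceeds limit of " + MAX_PAYMENT_EUR);
            }
        }
    }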


  • Registered Users, Registered Users 2 Posts: 40,038 ✭✭✭✭Sparks


    Stark wrote: »
    Why exactly? Good code should have separation of concerns and be easy to change.
    Yes, it should and that's why we have things like APIs and interfaces and so on; and we're talking about changing those interfaces here, not tweaking a magic number inside a module (which is a whole other can of rotting maggots).


  • Registered Users, Registered Users 2 Posts: 21,034 ✭✭✭✭Stark


    Sparks wrote: »
    with the lovely comment identifying us as the original author but not the idiot who made a patchwork function without knowing the original assumptions.

    Tbh, this is a good argument against having silly comments to say who wrote a piece of code. The original author of a piece of code often isn't the maintainer and it's wasteful to comment the code with audit information when we have SCM systems to indicate who wrote a file or changed a file.


  • Registered Users, Registered Users 2 Posts: 40,038 ✭✭✭✭Sparks


    Stark wrote: »
    Tbh, this is a good argument against having silly comments to say who wrote a piece of code.
    I don't write such comments, I leave it to version control to do that.
    I'm talking about our V%d[Esc]:bnp'er writing a comment to say "based on X's code here:" or something slimily equivalent before jumping ship (yes, it's happened to me).


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    Sparks wrote: »
    Well, yes, that's the point.
    (a) Changing the data is changing the design and that's not free so you expect a cost;

    Absolutely - but what I'm saying is that there is a benefit to minimising the cost of changes, and a cost to anything that increases the cost of changes.

    In many situations, where the codebase is dynamic, certain tests/guards/etc. have a cost outweighing the benefit.

    Other tests/guards/etc are even more valuable when the codebase is dynamic.
    Sparks wrote: »
    and (b) the point of the asserts is that you change the code 5 times, not 4!

    And asserts are often great, for just that reason. I'm not saying 'never use asserts'. I'm saying 'think about whether the particular guard is worthwhile'.


    Again, the overarching point I'm trying to make, is that:

    Bad: "Its the right/best thing to do to write guards on any public method, so thats why I'm going to write a guard here."

    Good: "Ok, I've wrote this public method. Does it make sense to write a guard in this case? Is it likely to be worth doing here, given what I know about how this is likely to be used?"
    Sparks wrote: »
    Well, yes, but games programming aside, not many of us have the luxury of assuming our code has a short lifespan and won't ever get repurposed, relinked, have its internal APIs abused by direct access, or worst yet, ctrl-c-ctrl-v'd into something else along with the lovely comment identifying us as the original author but not the idiot who made a patchwork function without knowing the original assumptions.

    Sure, if you are in a situation where the code is going to be reused a lot, you do things differently.

    Still, though, even in enterprise settings, as a project evolves, it'd be interesting to know what percentage of the code that's written actually survives a long time. I remember working in enterprise settings where a surprising amount of projects and code was discarded or deleted within a few months of being written.


  • Registered Users, Registered Users 2 Posts: 40,038 ✭✭✭✭Sparks


    fergalr wrote: »
    I remember working in enterprise settings where a surprising amount of projects and code was discarded or deleted within a few months of being written.
    Yeah, but on the other hand I work with two programmers - out of a team of five, and they're not fresh from college - who aren't as old as some of the code they've worked on. The rule for things like this is that the longer something's been around, the further away its end-of-life is.


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    Sparks wrote: »
    The rule for things like this is that the longer something's been around, the further away its end-of-life is.

    :)

    Actually, kind of a technical point, but that's not actually incompatible with thinking that most code being written is going to be thrown away soon.
    Might seem paradoxical, but lots of things are - totally logically - like that.

    This is making me curious about whether people have studied these things - the distribution of the lifetime of a line of code, etc.


  • Technology & Internet Moderators Posts: 28,820 Mod ✭✭✭✭oscarBravo


    fergalr wrote: »
    E.g. if the range you want to support changes from [-1 to 1] to [-10 to 10]

    Just zooming in on this: the only reason I'd assert a range of values for a parameter is because the function's logic makes assumptions based on that contract. If the function is written in such a way as not to make any difference - the example function you provided is a case in point - then there's nothing to assert, because it will just work the same way.

    I worked on some code this week that, inter alia, does duration calculations on time values. If I can be confident that the first argument will always be earlier than the second, I can take some shortcuts in how the code is written. But if the arguments are ever the other way around, those shortcuts break the code.

    I'm making the case that in such circumstances, the correct approach is to assert the parameter ordering at the very start of the function. Then, in two years' time, when I decide to call the function with arbitrary times (forgetting the assumptions), instead of getting some really strange results that I have to hunt down, I get an assertion error, which reminds me of the need to either (a) pass the parameters in the correct order, or (b) remove the assertion, and rewrite the function to handle more cases.
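
    Something like this sketch is what I have in mind (hypothetical method, java.time used for illustration): the shortcut relies on start never being after end, and the assert states that assumption at the one entry point.
    import java.time.Duration;
    import java.time.Instant;

    Duration durationBetween(Instant start, Instant end) {
        assert !end.isBefore(start) : "expected start <= end, got " + start + " / " + end;
        return Duration.between(start, end);   // shortcut: no handling of reversed arguments
    }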


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    oscarBravo wrote: »
    Just zooming in on this: the only reason I'd assert a range of values for a parameter is because the function's logic makes assumptions based on that contract. If the function is written in such a way as not to make any difference - the example function you provided is a case in point - then there's nothing to assert, because it will just work the same way.

    I worked on some code this week that, inter alia, does duration calculations on time values. If I can be confident that the first argument will always be earlier than the second, I can take some shortcuts in how the code is written. But if the arguments are ever the other way around, those shortcuts break the code.

    I'm making the case that in such circumstances, the correct approach is to assert the parameter ordering at the very start of the function. Then, in two years' time, when I decide to call the function with arbitrary times (forgetting the assumptions), instead of getting some really strange results that I have to hunt down, I get an assertion error, which reminds me of the need to either (a) pass the parameters in the correct order, or (b) remove the assertion, and rewrite the function to handle more cases.

    That all makes total sense.


    What about the case where you take the times from the database, and the SQL query returns them in sorted order?

    Then you pass them through a helper function that increments them all by one.

    Finally, you run the method you mentioned that relies on them being sorted.

    Do you assert that they are sorted in every method? Check that they are still sorted when they come back from every other method?
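
    Spelled out, that literal reading would look something like this (invented function names and plain integer timestamps, purely to make the duplication visible):

        #include <algorithm>
        #include <cassert>
        #include <vector>

        static bool isSorted(const std::vector<long>& times)
        {
            return std::is_sorted(times.begin(), times.end());
        }

        std::vector<long> incrementAll(std::vector<long> times)
        {
            assert(isSorted(times));  // guard on the way in
            for (long& t : times)
                ++t;                  // order-preserving...
            assert(isSorted(times));  // ...so this guard is pure duplication
            return times;
        }

        long totalSpan(const std::vector<long>& times)
        {
            assert(isSorted(times));  // guarded yet again before use
            return times.empty() ? 0 : times.back() - times.front();
        }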


    Sometimes that would make sense - if defense in depth is important.

    But other times, it's a bad idea - your code just ends up looking like Microsoft in this picture:

    http://usingapple.com/wp-content/uploads/2011/06/Organizational-Chart-for-Apple-Amazon-Facebook-Google-Microsoft-and-Oracle.jpg


  • Registered Users, Registered Users 2 Posts: 8,219 ✭✭✭Calina


    fergalr wrote: »
    Still, though, even in enterprise settings, as a project evolves, it'd be interesting to know what percentage of the code that's written actually survives a long time. When I worked in enterprise settings, a surprising amount of projects and code was discarded or deleted within a few months of being written.

    I guess it depends on your enterprise context. I have worked on code that is older than myself on a codebase that I believe even now is relatively stable.

    On a vaguely related note, what might be more interesting is the underlying "why" stuff gets ripped out so quickly. Ultimately, I had a conversation on older code with someone lately - I tend to be of the opinion that older code that's still in place doing its job was well designed and specified at the outset; he made the point that anything which wasn't tended to get ripped out very quickly so by definition, if it survived that long...

    Maybe it's seen as normal, but it strikes me as a waste of resources if quite a bit of work is getting discarded or deleted within a few months. I'd say efficiency gains might be possible there. That's why I'd be more interested in the underlying reasons than the quantity.


  • Technology & Internet Moderators Posts: 28,820 Mod ✭✭✭✭oscarBravo


    fergalr wrote: »
    That all makes total sense.


    What about the case where you take the times from the database, and the SQL query returns them in sorted order?

    Then you pass them through a helper function that increments them all by one.

    Finally, you run the method you mentioned that relies on them being sorted.

    Do you assert that they are sorted in every method? Check that they are still sorted when they come back from every other method?


    Sometimes that would make sense - if defense in depth is important.

    I guess I'd just summarise by saying that if a function makes assumptions about its parameters that it relies on in order to reliably produce a result, the function should assert those assumptions. If it's not asserting the assumptions, it shouldn't rely on them.

    But other times, it's a bad idea - your code just ends up looking like Microsoft in this picture:

    http://usingapple.com/wp-content/uploads/2011/06/Organizational-Chart-for-Apple-Amazon-Facebook-Google-Microsoft-and-Oracle.jpg

    Heh, I like that one :)


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    Calina wrote: »
    I guess it depends on your enterprise context. I have worked on code that is older than myself on a codebase that I believe even now is relatively stable.

    On a vaguely related note, what might be more interesting is the underlying "why" stuff gets ripped out so quickly. Ultimately, I had a conversation on older code with someone lately - I tend to be of the opinion that older code that's still in place doing its job was well designed and specified at the outset; he made the point that anything which wasn't tended to get ripped out very quickly so by definition, if it survived that long...

    Maybe it's seen as normal, but it strikes me as a waste of resources if quite a bit of work is getting discarded or deleted within a few months. I'd say efficiency gains might be possible there. That's why I'd be more interested in the underlying reasons than the quantity.

    I strongly disagree with the mindset behind that post. (Note - nothing personal here, and I don't mean to be antagonistic - I just disagree with the way of thinking.)


    You are basically arguing that it'd be more efficient if everything was written in a more bulletproof way at the start.

    Like, as a developer, I have sympathy for that worldview. But I think it's wrong.

    In practice, code mostly doesn't get thrown away because it's not high enough quality. Instead it gets thrown away because it's solving the wrong problem; because it doesn't meet the actual business needs; because it gets outdated when things change; because it takes too long to ship and the project gets cancelled.


    People write a lot of crap about 'agile'. But the key idea is that if you do things with a bit less process, and get earlier versions out a bit quicker, you'll spend less time writing stuff that's wrong.

    And that's a really good idea.

    Sacrificing code 'quality' in return for speed of development (even short-term speed of development) makes a lot of sense, because you get to test the risky assumptions sooner.



    This is even true in a purely coding sense, even without shifting business requirements: you often don't know what the right way to build something is when you start, so it makes a lot of sense to 'hack' a prototype; you'll probably do several such prototypes, and throw away and refactor a lot of parts of a hard project, before you really figure out how to build it. Filling your code with tests and guards while you go through that process slows you down, and makes things less efficient overall.
    Calina wrote: »
    On a vaguely related note, what might be more interesting is the underlying "why" stuff gets ripped out so quickly. Ultimately, I had a conversation on older code with someone lately - I tend to be of the opinion that older code that's still in place doing its job was well designed and specified at the outset; he made the point that anything which wasn't tended to get ripped out very quickly so by definition, if it survived that long...

    Again, I think the causality usually goes the other way around. Usually, if something has survived a long time, and is still being used, it's because it's doing something important enough that it makes sense to maintain it and make it robust.

    It's not because it was particularly easy to maintain or robustly designed up front. That is a factor too, but it's rarer. At least that's my understanding.


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    oscarBravo wrote: »
    I guess I'd just summarise by saying that if a function makes assumptions about its parameters that it relies on in order to reliably produce a result, the function should assert those assumptions. If it's not asserting the assumptions, it shouldn't rely on them.

    In general, that's a rule most people would agree with.

    But you didn't answer the more specific scenario I gave.
    If you take that general rule and apply it to a system where, e.g., user input gets passed down 10 levels of abstraction (i.e. 10 method calls) into the system, and you call the helper functions to canonicalise/escape/security-check/check-non-null/guard that user input at every one of the 10 steps, rather than just at some external system boundary, then you are probably doing things wrong.

    It's a sensible general rule, but if you apply it everywhere without regard for cost, your code will be suboptimal.
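
    In other words, the shape you usually want is more like this (hypothetical names, just to show a single boundary doing the work):

        #include <cassert>
        #include <stdexcept>
        #include <string>

        // The one external boundary: validate and canonicalise untrusted input here.
        std::string canonicaliseUsername(const std::string& raw)
        {
            if (raw.empty() || raw.size() > 64)
                throw std::invalid_argument("bad username");
            // ...escaping, character whitelisting, etc. would also live here...
            return raw;
        }

        // Internal layers trust what the boundary handed them; at most a cheap
        // assert documents the contract instead of re-running all the checks.
        void renderProfile(const std::string& username)
        {
            assert(!username.empty());
            // ...
        }

        void loadProfile(const std::string& username)
        {
            renderProfile(username);
        }

        void handleRequest(const std::string& rawUsername)
        {
            loadProfile(canonicaliseUsername(rawUsername));
        }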


  • Registered Users Posts: 240 ✭✭mrkite77


    c_man wrote: »
    I'm of the minimalist school of thought when it comes to commenting

    Same. I tend to comment more on what's *not* in the code. Explaining that I, too, thought of that super awesome optimization for this part of code, but it won't work because of xxxx.

    It's especially useful for Future Me, because otherwise I'll look at that old code and have the exact same idea for improving it, and not remember that I had already tried that.

    There are some things I'll always comment though.

    I'll comment any complex regexes, especially since things like negative lookbehind assertions tend to be confusing and require people to stop and logic through them in order to understand what's going on.
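
    For example, something along these lines (a made-up Qt snippet, nothing to do with the zip code below):

        #include <QRegularExpression>

        // Match a bare number like the "100" in "100 units", but not the "100" in "$100":
        //   (?<!\$)  negative lookbehind: the preceding character must not be '$'
        //   \b\d+\b  a whole run of digits on its own word boundaries
        QRegularExpression bareNumber(QStringLiteral("(?<!\\$)\\b\\d+\\b"));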

    I'll also comment any magic numbers used. I never use a constant for a magic number that's only used once.. that just obfuscates things and spreads your code out.

    Here's an example:
    	//locate end of central directory record
    	qint64 ziplen=f.size();
    	qint64 maxECDlen=0xffff + 2 + 20; //max comment + comment len + ECD
    	if (maxECDlen>ziplen) //zip is shorter?
    		maxECDlen=ziplen;
    
    	f.seek(ziplen-maxECDlen); //ECD must be after this
    
    	QByteArray data=f.read(maxECDlen);
    	const quint8 *p=(const quint8 *)data.constData();
    
    	bool found=false;
    	//now scan this data for the ECD signature
    	for (qint64 i=0;i<maxECDlen-20;i++,p++)
    	{
    		if (p[0]==0x50 && p[1]==0x4b && p[2]==0x05 && p[3]==0x06)
    		{
    			found=true;
    			break;
    		}
    	}
    	if (!found) //no ecd found, probably not a zip
    	{
    		f.close();
    		return false;
    	}
    
    	//awesome, now to find the central directory
    ....
    

    This is part of code that reads a .zip file. I comment roughly what I'm doing, but I don't go into too much detail because if you're going to modify this code, you damn well better have the .zip specification open in front of you.


  • Technology & Internet Moderators Posts: 28,820 Mod ✭✭✭✭oscarBravo


    fergalr wrote: »
    If you take that general rule and apply it to a system where, e.g., user input gets passed down 10 levels of abstraction (i.e. 10 method calls) into the system, and you call the helper functions to canonicalise/escape/security-check/check-non-null/guard that user input at every one of the 10 steps, rather than just at some external system boundary, then you are probably doing things wrong.
    That's the sort of situation where, rather than assuming and asserting, I'd stop assuming and start checking instead.
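
    Roughly, the difference being (another made-up sketch, not anyone's real code):

        #include <cassert>
        #include <optional>

        // Assuming + asserting: callers are trusted, and a violated contract is a bug.
        long durationAssumed(long start, long end)
        {
            assert(start <= end);  // documents the contract; compiled out in release builds
            return end - start;
        }

        // Checking: the input isn't trusted, so an out-of-order pair is a normal
        // case for the caller to handle, not a bug to abort on.
        std::optional<long> durationChecked(long start, long end)
        {
            if (start > end)
                return std::nullopt;
            return end - start;
        }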


  • Registered Users, Registered Users 2 Posts: 1,922 ✭✭✭fergalr


    oscarBravo wrote: »
    That's the sort of situation where, rather than assuming and asserting, I'd stop assuming and start checking instead.

    That doesn't change the core issue.

    Either
    1) you do your checking in every one of your functions, which is consistent with your stated policy of each function guarding its input, but means every function becomes full of duplicate error checking/assertions/[your-guard-method-of-choice]

    or

    2) you only check at some other sort of boundary - e.g. system boundaries, trust boundaries - in which case you admit your earlier principle is not applied absolutely in practice.

