What is an adequate amount of commentary on code?

oscarBravo · 03-05-2014 7:55pm

fergalr wrote: »

That doesn't change the core issue.

Either
1) you do your checking in every one of your functions, which is consistent with your stated policy of each function guarding its input; but which means every function becomes full of duplicate error checking/assertions/[your-guard-method-of-choice]

or

2) you only check at some other sort of boundary - e.g. system boundaries, trust boundaries - in which case you admit your earlier principle is not applied absolutely in practice.

Well, no. The point I made is that if a function makes an assumption about its parameters, it should assert the assumption. If it's not going to assert the assumption, it shouldn't make it.

You gave the example of the function that makes the assumptions about its parameters being ten calls deep from the initial point at which the values were derived, and you seem to be suggesting that every function has to assert the order of the parameters on behalf of the one that makes the assumption. I don't know where you're getting that: I'm talking about one single function that relies on the assumption of its parameter ordering, and that's the function that needs to assert that assumption. The other nine function calls in the stack above it don't make any assumptions, because what they're doing to the parameters doesn't rely on any such assumption. Or, if it does, then they should assert it too.

Back to my original example: there is one function that assumes the two time values it is passed are in a particular order. That one function relies on that assumption in order to take some shortcuts in how it does its calculations, and should assert that assumption.

The function that calls that one makes no such assumption. It doesn't care about the parameter order, and will work in precisely the same way whether they are in ascending or descending order. Why should it assert the sequence? That doesn't make any sense.

I never suggested that every function should guard its inputs; I said that if the function relies for correct operation on an assumption about its inputs, then it's a good idea for that function to assert the assumption. That's all.

fergalr · 03-05-2014 8:22pm

oscarBravo wrote: »

Well, no. The point I made is that if a function makes an assumption about its parameters, it should assert the assumption. If it's not going to assert the assumption, it shouldn't make it.

You gave the example of the function that makes the assumptions about its parameters being ten calls deep from the initial point at which the values were derived, and you seem to be suggesting that every function has to assert the order of the parameters on behalf of the one that makes the assumption. I don't know where you're getting that: I'm talking about one single function that relies on the assumption of its parameter ordering, and that's the function that needs to assert that assumption. The other nine function calls in the stack above it don't make any assumptions, because what they're doing to the parameters doesn't rely on any such assumption. Or, if it does, then they should assert it too.

Back to my original example: there is one function that assumes the two time values it is passed are in a particular order. That one function relies on that assumption in order to take some shortcuts in how it does its calculations, and should assert that assumption.

The function that calls that one makes no such assumption. It doesn't care about the parameter order, and will work in precisely the same way whether they are in ascending or descending order. Why should it assert the sequence? That doesn't make any sense.

I never suggested that every function should guard its inputs; I said that if the function relies for correct operation on an assumption about its inputs, then it's a good idea for that function to assert the assumption. That's all.

Lets say you are building a system that works with data, that typically has a certain constraint.

The constraint could be 'the data points are in increasing order', like you mention.
Equally, it could be 'there are no cycles in this graph data', which is a little more computationally costly to check.

Lets say you have something you want to do on this data: a complex transform, that relies on the use of many algorithms, each expressed as a separate function, each of which relies on that constraint, and each of which preserves that constraint, that you want to run in sequence on this data.

Sometimes, it makes more sense to just check the constraint once, at the system boundary, and then run all the functions, each blindly assuming the constraint is true.
The more complex or otherwise expensive the constraint is to check/assert, or the more checks I have to do, the more likely it is that this makes sense.

Even if I was building a system where the constraint was just that the data points are in sorted order, where I had many functions that were going to rely on this constraint, I could definitely see myself checking the constraint once at the system boundary, and then just relying on it thereafter, without asserting/checking this at every function boundary. Especially if it was something a bit more costly to check.

If someone agrees that, yes, this often makes sense to do, for reasons of computational efficiency, code clarity, system design, etc, then we are in agreement.

If they dogmatically say that each of the functions that rely on this constraint should check it, pretty much regardless, then we are not in agreement.

In extremis, I'll be able to build systems that sometimes complete the task in the required computational time, while they won't.

In more prosaic cases, I'll be able to write systems which are coded in a much clearer way inside the system boundary.

If I take this too far, and screw up, I'll not do error checking when I should, and I'll have a function who is making an assumption that is being violated.

But in many case that risk makes sense to take.

oscarBravo · 03-05-2014 11:22pm

If the assumptions about the parameters form an intrinsic part of the system's design, then yes: it's probably fair to check them only once, because if the assumptions are that intrinsic, then there isn't going to be a reason to suddenly change that assumption mid-system.

So yes, for the edge case you've managed to contrive, it would make sense not to bother asserting your assumptions.

Ironically, this whole conversation started from a situation where you described a function as always having values between -1 and 1, then provided code that didn't care about those constraints, and finally gave sample argument values that weren't even in that range! So I guess it's doubly true that you shouldn't assert your assumptions if those assumptions are invalid to start with.

fergalr · 04-05-2014 12:57am

oscarBravo wrote: »

If the assumptions about the parameters form an intrinsic part of the system's design, then yes: it's probably fair to check them only once, because if the assumptions are that intrinsic, then there isn't going to be a reason to suddenly change that assumption mid-system.

So yes, for the edge case you've managed to contrive, it would make sense not to bother asserting your assumptions.

I don't think its a contrived edge case. I think its really common to check or santise things at boundaries, and then write code inside the boundary that relies on those assumptions; I think that often is the best way to write things.

oscarBravo wrote: »

Ironically, this whole conversation started from a situation where you described a function as always having values between -1 and 1, then provided code that didn't care about those constraints, and finally gave sample argument values that weren't even in that range! So I guess it's doubly true that you shouldn't assert your assumptions if those assumptions are invalid to start with.

I constructed the function where you know its input is between -1 and 1 as hypothetical example to make the point clearly that you shouldn't always check things.

This might occur in practice because your parameters have been mapped into that range elsewhere in the system.
You seem to have agreed that in that case, you don't need to check/assert/guard them again, so no further disagreement there.

I agree that there's benefit to having your 'inner' code be robust, and fail predictably, if the assumptions are violated.

But there's a cost to writing the extra code to check those assumptions, to maintaining it, and to looking at it in your codebase.

On the other hand, if you don't write it, then you've got to remember, if you change the outer stuff, to make sure to check whether the inner stuff needs changing too. There are costs both ways, and tradeoffs to make.

When I write code, I try not to really determinedly follow rules. I think about whether it makes sense to do these things as I do them. It'll depend on what the code is for, how fluid it is, how tricky it is, if this method is a part of the system I think will last a long time, if I'm planning on re-writing, how likely the system is to be canned, the impact of errors on the system, the cost of errors etc. Its a quick and heuristic process. I'm sure I make mistakes, but I'm pretty convinced its the right way to go about these things.

Sparks · 04-05-2014 11:10am

fergalr wrote: »

I don't think its a contrived edge case. I think its really common to check or santise things at boundaries, and then write code inside the boundary that relies on those assumptions; I think that often is the best way to write things.

With the possible exception of writing code that nobody else will ever work on (and I'm including Future-You-Who-Hasn't-Seen-This-Codebase-In-Over-Six-Months in the "other people" bracket), I think that's not a very good way to write code.

That's not saying that nobody does it, but people who write this way seem to create a lot of work six months down the line for everyone else on the team, or worse, people not on the team who are using their libraries/products.

The paranoid people who write asserts (or whatever equivalent they have) in even their smallest functions will at least trigger panics or traps or coredumps or whatever mechanism is in place for error handling in their system and that will at least let people know there's an error; those who don't may teach everyone yet another valuable painful lesson about what the words "undefined behaviour" in the language standard really mean.

Whether or not they write test suites to exercise those asserts, that's an arguable point (no, that one really is), but what you're describing is similar in concept to open loop control (which is usually introduced with the example of driving down the M50 in a car while the windows are painted black and you have a map, a stopwatch and a speedometer to navigate by - everything's fine until the moment something unexpected happens or you make even the smallest of mistakes).

oscarBravo · 04-05-2014 12:52pm

One other thought occurred to me: if the parameters that are being passed ten calls deep into the system while potentially being altered at every step must retain various constraints - such as forming a DAG - then I'd almost certainly either find or write a class that encapsulates that behaviour and enforces it itself, and pass an instance of that class as the parameter.

fergalr · 07-05-2014 10:04pm

oscarBravo wrote: »

One other thought occurred to me: if the parameters that are being passed ten calls deep into the system while potentially being altered at every step must retain various constraints - such as forming a DAG - then I'd almost certainly either find or write a class that encapsulates that behaviour and enforces it itself, and pass an instance of that class as the parameter.

That's often a good idea - leverage the type system etc.

Anyway, I've nothing more to add - I'll be resting my case, so to speak - interesting to discuss though, thanks.

lemon_remon · 08-05-2014 12:12am

I didn't read the full discussion (it was long sorry!) but it sounds to me like you guys are arguing "Look Before You Leap" vs "It's Easier to Ask Forgiveness Than Permission". I thought this was a good blog post on the matter: http://oranlooney.com/lbyl-vs-eafp/

GreeBo · 08-05-2014 8:25am

Sparks wrote: »

....everything's fine until the moment something unexpected happens or you make even the smallest of mistakes).

In this case, someone introduces a new entrypoint OR your internals get exposed as a package/jar/library/whatever and used in all manner of other ways by new and interesting people.

Stark · 08-05-2014 10:46am

And your code achieves sentience and becomes Skynet or <insert other unlikely event>. Is the codebase really the best place to have the fight if people in your organisation are actively looking to hack and sabotage your code? Or if it's simply a case of incompetence and too many cooks, consider a move to a services architecture, ie: instead of giving someone a jar, give them a REST service or whatever that they need to use, thus isolating the entry points in your code. If you're constantly in fire fighting mode, frantically putting in defence after defence while the project crumbles then it's probably time to step back and reevalute the architecture/team structures from a different perspective.

GreeBo · 08-05-2014 10:52am

Stark wrote: »

And your code achieves sentience and becomes Skynet... Is the codebase really the best place to have the fight if people in your organisation are actively looking to hack and sabotage your code?

Who said anything about a fight?
Code gets re-used all the time, sometimes in-situ via a new entrypoint (that maybe have completely different rules about range of params) and sometimes its picked up and deployed somewhere else.

If I look at a method called calculatesSomething(int a, int b) and that something is a thing that I want to calculate then I will call that method rather than create my own.

If I see assertions guarding the params then I know how/when to use it.

If on the otherhand something funky happens sometimes for some values I probably will discover that when I get a phone call at stupid o'clock some morning about production problems.

Stark · 08-05-2014 10:56am

If someone is poking around in your code, does a copy and paste and then breaks something through using it incompetently, why are you the one getting the blame? Just saying it sounds like there are organisational/cultural issues at play that may need to be addressed. I think as a development team you need to set clear directives on what code is intended to be used as a service/reusable components with all the validations you expect from such code and what code is meant for private consumption. Otherwise things become like the funny Microsoft drawing that fergalr posted.

GreeBo · 08-05-2014 11:10am

Stark wrote: »

If someone is poking around in your code, does a copy and paste and then breaks something through using it incompetently, why are you the one getting the blame? Just saying it sounds like there are organisational/cultural issues at play that may need to be addressed. I think as a development team you need to set clear directives on what code is intended to be used as a service/reusable components with all the validations you expect from such code and what code is meant for private consumption. Otherwise things become like the funny Microsoft drawing that fergalr posted.

"Copy and Paste"?

Thats not re-use, thats barely recycling!
I'm talking about someone calling *your* code. Once your code is in the JVM and public it can be called from anywhere, I would argue that its your responsibility to make sure that it behaves as expected OR if not, it throws a defined wobbler when its not used as expected/defined.

Sparks · 08-05-2014 12:15pm

Someone accessing your library internals in an unplanned way? Why, that would never happen! Sorry Google, what did you say about undocumented DirectX calls being used all the time in games over the last decade or so? And applications doing the same thing with the windows API to the point where large amounts of code was being written with every release by microsoft to allow compatibility with those undocumented calls?

c_man · 09-05-2014 9:28am

Am on a code review at the mo. Noticed that there's a rather large block of comments already existing in one of the files, outlining that the next block should be removed "implemented elsewhere correctly, TODO remove". The code is still being hit... I asked the reviewee about it, he didn't know. Took a look with SVN blame but damn you repo move a year ago! So the comment will stay, probably forever.

Colonel Panic · 12-05-2014 9:52pm

I'm doing some silly things with C++ lambdas and C style callbacks to sqllite and thought I should probably comment what I'm doing. So, I lay my commentary here for judgement. (Sorry Sparks, this is the kind of C++ perversion you'll HATE!)

bool Database::Execute(std::string const& query, RowHandler handler)
{
  // Create a closure that matches the callback signature sqlite3_exec wants.
  // Handler param passed as user data, so cast that to the correct type and call it.
  // Explicitly setting return type as int isn't 100% necessary as it can be deduced 
  // from return type of RowHandler.
  auto wrapper = [](void* userData, int columnCount, char **columnValues, char **columnNames) -> int 
  {
    auto handler = reinterpret_cast<RowHandler*>(userData);
    return (*handler)(columnCount, columnValues, columnNames);
  };

  return sqlite3_exec(db, query.c_str(), wrapper, &handler, /* error handling can wait */ nullptr);
}

Pastebin link -> http://pastebin.com/e76ikkcT

What is an adequate amount of commentary on code?

Comments