Why object references are confusing, and what to do about it

A recent blog post from my old friend Phil ((well, vaguely recent: I really should get with this whole RSS thing!)) discussed some of the gotchas of parameter passing in object-oriented languages – or I suppose specifically in partially OO languages, since the problem in this case was a combination of objects and structs in C#.

It seems to me there is a genuine problem here, beyond programmer fallibility – the old distinction between “pass by value” and “pass by reference” is no longer useful in such languages, and someone needs to design something better.

To re-summarise, the problem is that in a lot of languages, variables that we call “objects” are actually always “object references”, so a parameter “passed by value” – or even referenced from a non-object structure that was passed by value – is not copied, and can still cause changes to the original instance of the object.
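To make that concrete, here is a minimal sketch in JavaScript, which treats objects the same way. The names (rename, reassign, wrapper and so on) are made up for illustration; the point is only that changing the object through a parameter is visible to the caller, while pointing the parameter at a new object is not.

```javascript
// Hypothetical names throughout; this only illustrates
// "object references passed by value".
function rename(person) {
  // Mutates the object that the caller also refers to.
  person.name = "Bob";
}

function reassign(person) {
  // Rebinds only the local variable; the caller's reference is untouched.
  person = { name: "Carol" };
  return person;
}

const original = { name: "Alice" };

rename(original);
console.log(original.name); // "Bob"

reassign(original);
console.log(original.name); // still "Bob"

// The same applies when the reference sits inside a structure that was copied:
// the wrapper is duplicated, but both wrappers point at the same object.
const wrapper = { owner: original };
const wrapperCopy = { ...wrapper }; // shallow copy of the wrapper
wrapperCopy.owner.name = "Dave";
console.log(original.name); // "Dave"
```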

This can be extremely confusing – I remember similar confusion in Java, where int, float, boolean, and char (among others) are primitive types, but Strings are objects, so every time you pass a String into a function, what that function receives is a reference to the original string. ((As it happens, Java’s String objects are immutable, so the function can’t actually edit the caller’s string, and you’re supposed to use a different class, such as StringBuilder, for manipulation; but a mutable object such as a StringBuilder or an array, passed in exactly the same way, can be changed by the function.)) This really highlights the arbitrary distinction underlying the “object reference” model: it’s perfectly reasonable to argue that a string is a complex type, and in C you would use an explicit pointer; but in, say, PHP, assigning an existing string to a new variable behaves as though the contents of that string were copied – the underlying mechanics are part of the abstraction of the language.

Now the reason for the distinction is, I presume, that cloning an object is not always trivial – its internal state may be tied to some outside resource, or require some manual allocation, or just be fundamentally a singleton. So the problem is that objects have to share languages with “POD” types, like integers, which are much simpler to manipulate as raw data. Purely OO languages, where everything is an object – take Javascript, for instance – don’t have this problem; but they have the opposite problem: they can’t pass by value for any type!
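A small sketch of why cloning is not always trivial, again in JavaScript and with invented names (account, sharedLog, cloneAccount). Here sharedLog stands in for some outside resource; a naive copy duplicates only the top level, and a hand-written clone has to decide field by field what “copy” should mean.

```javascript
// Hypothetical example: an object whose state is partly tied to
// something outside itself.
const sharedLog = []; // stands in for a file handle, socket, etc.

const account = {
  id: 1,
  history: ["opened"],
  log: sharedLog,
};

// A shallow copy duplicates the top-level fields only.
const shallowCopy = { ...account };
shallowCopy.id = 2;                 // independent: id is a primitive
shallowCopy.history.push("closed"); // shared: both point at the same array

console.log(account.id);                 // 1
console.log(account.history.join(", ")); // "opened, closed"

// A hand-written clone decides, field by field, what "copy" means.
function cloneAccount(source) {
  return {
    id: source.id,
    history: [...source.history], // copy the data
    log: source.log,              // deliberately keep sharing the resource
  };
}

const customCopy = cloneAccount(account);
customCopy.history.push("reopened");
console.log(account.history.length);    // 2
console.log(customCopy.history.length); // 3
```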

Before I go on, let me admit that I find references confusing at the best of times. I’ve done very little programming in C, but I note that at least there the level of indirection is constantly visible, even if you do end up with rather a lot of it – “pointer to a pointer to a pointer to a char” and so on. In PHP, the =& operator makes a variable a reference to another variable, but what does that actually mean? If I set $foo =& $bar, what happens if I then say $bar =& $baz, or unset($bar)? The answers will be entirely logical, I’m sure, but it can be pretty hard to keep track of what a particular assignment is actually assigning to. You can even subvert a function’s calling convention by passing in a reference to a non-reference parameter – the de-referencing is implicit, so the function unknowingly modifies the caller’s variable. ((e.g.

function foo($value)
{
      $value++;
      return $value;
}
$a = 1; foo($a); echo $a; // 1
$b = 1; foo(&$b); echo $b; // 2! (call-time pass-by-reference: deprecated in PHP 5.3, removed in PHP 5.4)

))

An interesting question is why you should be able to write to parameters that have been passed by value at all. Pascal’s descendants, such as Delphi, let you define “const parameters”, which make such modification illegal. This also allows the compiler to make some hefty optimisations, although as soon as you start passing object references it becomes much less useful – you can still make calls that modify the underlying object instance, you just can’t assign a brand new instance over the top of the variable.
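JavaScript has no const parameters, but its const declarations and Object.freeze show the same limitation, so here is a rough sketch using them instead (settings is an invented example object). The binding can be locked and one level of the object can be frozen, yet anything reachable through a nested reference is still open to modification.

```javascript
// `const` fixes the binding, not the object behind it.
// (Writing `settings = {}` after this line would throw a TypeError.)
const settings = { theme: "dark", fonts: { size: 12 } };

settings.theme = "light"; // allowed: the object itself is still mutable
console.log(settings.theme); // "light"

// Object.freeze makes the object read-only, but only one level deep.
Object.freeze(settings);
try {
  settings.theme = "blue"; // silently ignored, or a TypeError in strict mode
} catch (err) {
  // Strict-mode code and ES modules end up here.
}
console.log(settings.theme); // still "light"

settings.fonts.size = 20; // the nested object was never frozen
console.log(settings.fonts.size); // 20
```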

Solution

OK, I’m not going to say I’ve suddenly invented a language construct that will revolutionise programming, but I wonder if something can be done to make this all a bit less obscure. Recursively cloning an object is hard enough, let alone recursively but context-sensitively marking it as immutable, so this may all be impractical. OTOH, some of this is probably already available in current OOP languages, but it’s consistency I’m after.

- Make all types objects, and all variables object references – even if the underlying language optimises, say, an int object by only introducing indirection in memory when more than one reference is needed.
- Include a simple method for duplicating object instances, like PHP’s clone operator, so that assign-by-value becomes something like $x = clone $y;.
- Allow classes ((or the equivalent in a prototype-based language, etc)) to define custom clone mechanics, or – crucially – declare that they are “uncloneable”.
- Force additional levels of indirection to be explicit – that is, a variable of type “reference to String” cannot be passed to a function expecting a variable of type “String” without explicit de-referencing.
- Most importantly, replace the out-dated “pass by value” vs “pass by reference” convention with something more appropriate for such a variable system:
  - “reference” parameters – the most common option; modifications to the object will affect the original instance, and assignments to the variable will overwrite the original object reference
  - “clone” parameters – effectively a true pass by value: the object is cloned automatically, and only the clone is visible to the called function; passing an object that cannot be cloned is illegal
  - “const” parameters – the local variable is read-only (you can’t overwrite it to reference a different object instance) and the underlying object is marked immutable, recursively, within that scope
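To get a feel for how the three kinds of parameter might behave, here is a rough approximation in today’s JavaScript. Everything in it is hypothetical: cloneArg, constArg, the UNCLONEABLE marker and the clone() method convention are names invented for this sketch, and the caller applies them by hand, whereas the proposal would have the language do so based on the function’s declaration. The “const” version uses a Proxy so that the original object is only read-only when seen through the parameter.

```javascript
// Hypothetical marker a class could set to declare itself "uncloneable".
const UNCLONEABLE = Symbol("uncloneable");

// "clone" parameter: hand the function a recursive copy, or refuse.
// (No cycle handling; a sketch only.)
function cloneArg(value) {
  if (value === null || typeof value !== "object") return value;
  if (value[UNCLONEABLE]) throw new TypeError("argument cannot be cloned");
  if (typeof value.clone === "function") return value.clone(); // custom mechanics
  const copy = Array.isArray(value)
    ? []
    : Object.create(Object.getPrototypeOf(value));
  for (const key of Object.keys(value)) copy[key] = cloneArg(value[key]);
  return copy;
}

// "const" parameter: a read-only view that wraps nested objects on access,
// leaving the original object itself untouched and still mutable elsewhere.
function constArg(value) {
  if (value === null || typeof value !== "object") return value;
  const refuse = () => {
    throw new TypeError("const parameter is read-only");
  };
  return new Proxy(value, {
    get: (target, key) => constArg(target[key]),
    set: refuse,
    defineProperty: refuse,
    deleteProperty: refuse,
  });
}

// A function that modifies whatever it is given.
function addItem(cart) {
  cart.items.push("extra");
}

const cart = { items: ["apple"] };

addItem(cart); // "reference": the caller's object changes
console.log(cart.items.length); // 2

addItem(cloneArg(cart)); // "clone": only the copy changes
console.log(cart.items.length); // 2

let constError = null;
try {
  addItem(constArg(cart)); // "const": the modification is rejected
} catch (err) {
  constError = err;
}
console.log(constError instanceof TypeError, cart.items.length); // true 2

// An object tied to an outside resource can opt out of cloning.
const connection = { socket: "open", [UNCLONEABLE]: true };
let cloneError = null;
try {
  cloneArg(connection);
} catch (err) {
  cloneError = err;
}
console.log(cloneError instanceof TypeError); // true
```

The Proxy approach has limits of its own (it will not cope with built-ins such as Map that keep internal state), which rather supports the point that this is easier to get right in the language than in a library.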

3 Comments

Julian Burgess
18 September, 2010 at 23:17

In Ruby you only have objects, so you can only pass by value (of object reference type). There is a good post here on the matter

http://javadude.com/articles/passbyvalue.htm

Rowan (Post author)
18 September, 2010 at 23:49

Yeah, there’s definitely something to be said for languages that are fully OO and don’t have any “primitives”.

Although of course, in Ruby, you can not only unexpectedly change someone else’s object, you can change someone else’s class and affect all their instances of it! ;)

Christoph M. Becker
31 January, 2015 at 00:10

> Now the reason for the distinction is, I presume, that cloning an object is not always trivial […]

I don’t think this is the real reason. Consider, for instance, PHP 4, where all objects had value semantics (assignment, pass-by-value). This simply doesn’t fit the mental model a programmer has of entities (as opposed to value objects). Explicitly passing and assigning by reference is possible, but cumbersome and error-prone.

> Purely OO languages, where everything is an object – take Javascript, for instance – […]

It is a misconception that everything in JavaScript is an object. JavaScript has primitive types, such as number, which have value semantics. Cf.

var foo = 42;
function bar(baz) {
    baz = 17;
}
bar(foo);
window.alert(foo); // => 42
