Values and objects in programming languages (1982) [pdf] (acm.org)
80 points by jgrodziski on Jan 9, 2023 | 21 comments


Really interesting to compare such early thinking to where we are today.

> 4. Values and objects in programming languages

> Most languages confuse them.

Indeed. All programmers have an intuition of (primitive, immutable) values and (mutable) objects, but e.g. Dan Abramov claims the intuition is often lacking w.r.t. how a specific language actually works in practice. His book Just JavaScript explicitly describes the full mental model for one language, and I found this approach valuable when I taught a JS bootcamp. https://justjavascript.com/

For example, JS strings are superficially similar to an array of characters, but their semantics such as equality are completely different because strings happen to be primitives while arrays are not.

> Names should be fixed.

Here, they are advocating the use of const/final (single-assignment) variables, which has become a best practice but wasn't widely available back then. This is orthogonal to the value vs. object distinction, though (pure FP requires both const bindings and immutable values).
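To make the "orthogonal" point concrete, here is a minimal JS sketch (variable names are mine, purely for illustration): const fixes the name, not the value, and Object.freeze is one way to get shallow immutability of the value.

```javascript
// const fixes the name-to-value binding, not the value itself.
const list = [1, 2];
list.push(3);                 // allowed: the array object is still mutable
console.log(list.length);     // 3

let rebindError = null;
try {
  list = [];                  // not allowed: the name cannot be rebound
} catch (err) {
  rebindError = err;
}
console.log(rebindError instanceof TypeError); // true

// Object.freeze makes the value itself (shallowly) immutable.
const frozen = Object.freeze([1, 2]);
let pushError = null;
try {
  frozen.push(3);             // throws: a frozen array cannot grow
} catch (err) {
  pushError = err;
}
console.log(pushError instanceof TypeError);   // true
console.log(frozen.length);                    // 2
```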

However, further in conclusions:

> We have shown that objects correspond to real world entities, and hence exist in time, are changeable, have state, are instantiated, and can be created, destroyed, and shared.

I think this is the naive OO hype of the time (1982) talking: whether something should be modelled as a mutable object in a program is orthogonal to real world entities.


> For example, JS strings are superficially similar to an array of characters, but their semantics such as equality are completely different because strings happen to be primitives while arrays are not.

How do the semantics change? They are arrays, they literally have pointers to their characters.


> How do the semantics change? They are arrays, they literally have pointers to their characters.

  console.log("hello" == "hello")
  console.log([1,2,3] == [1,2,3])
These produce different results despite the surface similarity between the two expressions. If strings were just arrays, then both of these expressions should result in true or both in false, not different values.
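For reference, a small sketch of what those two lines evaluate to, plus the same-reference case (the variable names are mine):

```javascript
// Strings are primitives: == compares their contents.
const sameStrings = "hello" == "hello";
console.log(sameStrings);   // true

// Arrays are objects: == compares identity, and each literal creates a new array.
const sameLiterals = [1, 2, 3] == [1, 2, 3];
console.log(sameLiterals);  // false

// An array is only equal to itself, i.e. to the same reference.
const a = [1, 2, 3];
const b = a;
const sameReference = a == b;
console.log(sameReference); // true
```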


I don't think you've chosen a good example.

A better one would have been that you can't reassign individual characters because the string is immutable, but it's still a list of characters.
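A rough sketch of that immutability difference, assuming strict mode inside the helper (tryToMutate is a made-up name; in sloppy mode the write to the string is silently ignored instead of throwing):

```javascript
// Writing to an index of a string never changes the string.
function tryToMutate(str) {
  "use strict"; // in strict mode the failed write throws instead of being ignored
  try {
    str[0] = "J";
    return "no error";
  } catch (err) {
    return err.constructor.name;
  }
}

const word = "hello";
const outcome = tryToMutate(word);
console.log(outcome, word);    // TypeError hello

// The same write on an array of characters succeeds.
const letters = ["h", "e", "l", "l", "o"];
letters[0] = "J";
console.log(letters.join("")); // Jello
```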

Anyway, the author doesn't think about objects in the OOP sense; quite the contrary, he thinks any representation of a value in a computer (or even on a piece of paper) is an object.


> I don't think you've chosen a good example.

This comment makes zero sense. Here's the part that you quoted from the OP in this thread:

>> For example, JS strings are superficially similar to an array of characters, but their semantics such as equality are completely different because strings happen to be primitives while arrays are not. [emphasis added]

Here's your response:

> How do the semantics change? They are arrays, they literally have pointers to their characters. [emphasis added]

My example provides an answer to your question, precisely as it relates to the statement you quoted. How is that possibly a bad example when it answers your exact question?


Because JS strings are obviously arrays (or if you prefer lists) of characters.

> The String type is the set of all ordered sequences of zero or more 16-bit unsigned integer values (“elements”) up to a maximum length of 2^53 - 1 elements.

Every time the computer has to compare two strings it will have to compare the lengths and check whether each character is identical.

The fact that a choice was made to make strings immutable, or how comparison is implemented for arrays, doesn't take away from strings being arrays of 16-bit chars.


As far as I can tell, reading the spec, you're totally wrong. Specifically, I believe it is not true that every time the computer compares two strings it needs to compare lengths. Whether or not any strings are "interned" is completely implementation-dependent.

To elaborate, you could make a compliant ECMAScript interpreter wherein ALL strings, as they are constructed, are compared (presumably by hashing) against a global table of all already-constructed strings, and if the new string already exists, its in-memory representation is a reference to the One True Instance of that string, such that equality comparison can always be done by reference equality. Every string construction is slow, and every string comparison is constant time. Would this be smart? No. But it is consistent with the semantics that the ES spec puts on strings, which is that they are not arrays, they are primitive immutable sequences of 16-bit code units.
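A toy sketch of that interning scheme, written in JS itself rather than inside an engine. Everything here (internTable, intern, the handle objects) is hypothetical and only illustrates the idea; it says nothing about what real engines do.

```javascript
// Hypothetical intern table: one canonical handle object per distinct string content.
const internTable = new Map();

function intern(text) {
  // The Map lookup hashes/compares the contents, so the cost is paid at construction time.
  let handle = internTable.get(text);
  if (handle === undefined) {
    handle = Object.freeze({ text });
    internTable.set(text, handle);
  }
  return handle;
}

const first = intern("hel" + "lo");
const second = intern("hello");

// Equality of interned handles is a plain identity check.
console.log(first === second);          // true
console.log(first === intern("world")); // false
console.log(internTable.size);          // 2
```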

You could also, as you perhaps are suggesting in your first parenthetical aside, implement a compliant ES interpreter where the characters of a string are not stored in contiguous memory, being perhaps stored in some kind of complicated tree or linked list or something. I cannot see any upside to that one. Well, I guess on very particular types of data you could save a lot of space, but wow that would be atypical.
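For what a tree-shaped string could look like, here is a toy rope. The names leaf, concat, charAt and flatten are made up for this sketch, and it assumes nothing about how any actual engine stores strings.

```javascript
// Hypothetical rope: a string kept as a tree of fragments instead of one contiguous buffer.
function leaf(text) {
  return { length: text.length, text };
}

function concat(left, right) {
  // Concatenation just builds a node; no characters are copied.
  return { length: left.length + right.length, left, right };
}

function charAt(node, index) {
  // Walk down the tree until we reach the leaf that holds the index.
  while (node.text === undefined) {
    if (index < node.left.length) {
      node = node.left;
    } else {
      index -= node.left.length;
      node = node.right;
    }
  }
  return node.text[index];
}

function flatten(node) {
  return node.text !== undefined ? node.text : flatten(node.left) + flatten(node.right);
}

const rope = concat(concat(leaf("hel"), leaf("lo ")), leaf("world"));
console.log(rope.length);                     // 11
console.log(charAt(rope, 4));                 // o
console.log(flatten(rope) === "hello world"); // true
```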

More broadly, and more of a subjective opinion, I think you're completely making the case of the top-level commenter. If someone comes into a language assuming that they know which values are values and which are references, they're going to make a lot of errors. They're going to misunderstand what equality checking does, they're going to unnecessarily make defensive copies of immutable values to avoid aliasing problems, they're going to try to mutate things, etc.


My bad, I shouldn't have said the semantics are completely different. I suppose I meant to say the semantics are defined separately. I also should have clarified that I talked of JS arrays and not the general concept of arrays or lists.


It was pretty clear you were comparing JS strings and JS arrays to everyone else. They just seem to want to argue.


They’re the same example. The reason two strings with the same contents are equal, while the same isn’t true of two arrays (in JS), is precisely that the latter might change and the former can’t. That’s exactly the semantics of values in the language: immutable things with equal state are equal (notwithstanding NaN, which has special equality semantics), and nothing mutable is equal to anything except the same reference, to which it will always be equal.
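A quick sketch of the NaN exception and the same-reference rule mentioned above (variable names are mine):

```javascript
// NaN is the one primitive value that is not equal to itself under == and ===.
const nanStrict = NaN === NaN;
const nanSameValue = Object.is(NaN, NaN);
console.log(nanStrict);             // false
console.log(nanSameValue);          // true

// Array methods differ in which equality they use.
console.log([NaN].includes(NaN));   // true  (SameValueZero)
console.log([NaN].indexOf(NaN));    // -1    (strict equality)

// A mutable object stays equal to itself even after it changes.
const shared = { n: 1 };
const alias = shared;
alias.n = 2;
console.log(shared === alias);      // true
console.log(shared.n);              // 2
```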


What are the actual semantics here? Is JS automatically interning strings as symbols these days, or is the == operator comparing strings character by character, and arrays only by pointer (aka "object identity")?


The result of == on strings is defined as comparison UTF-16 code unit by UTF-16 code unit (non-normalised). The result of == on what the spec calls "Object values" (including arrays) is by identity.

Here's the spec: https://tc39.es/ecma262/#sec-samevaluenonnumber

> 7.2.12 SameValueNonNumber ( x, y )

[...]

> 5. If x is a String, then

> a. If x and y are exactly the same sequence of code units (same length and same code units at corresponding indices), return true; otherwise, return false.

[...]

> 8. If x and y are the same Object value, return true. Otherwise, return false.
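A hand-written sketch of step 5a, just to show what "code unit by code unit, non-normalised" means in practice. sameCodeUnits is my own helper, not anything from the spec or the standard library.

```javascript
// Sketch of step 5a: same length and same code units at corresponding indices.
function sameCodeUnits(x, y) {
  if (x.length !== y.length) return false;
  for (let i = 0; i < x.length; i++) {
    if (x.charCodeAt(i) !== y.charCodeAt(i)) return false;
  }
  return true;
}

const composed = "\u00e9";     // "é" as a single code unit
const decomposed = "e\u0301";  // "e" followed by a combining acute accent

// They render the same, but the code unit sequences differ, so they are not equal.
console.log(sameCodeUnits(composed, decomposed));       // false
console.log(composed === decomposed);                   // false

// Normalising both to the same form makes them equal.
console.log(composed.normalize("NFD") === decomposed);  // true
```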


Is it okay if I snickered? Because I did snicker.


Sure, although trying to decipher the spec more typically makes me want to cry. (Where do they define the "sameness" of "Object values"?)


The semantics are that primitives (string, number, bigint, boolean, undefined, symbol) can be compared by value and everything else (object, except null) is a reference type with pointer identity regardless of the value. This might change in a meaningful way for collections if the records/tuples proposal succeeds. But even then those types will be considered primitives and won’t be allowed to contain reference types either.

Strings are primitives in JS, not just these days. They’re always value-equal unless they’re boxed (e.g. new String, which nobody ever does unless they’re doing something really weird). === would work the same way.
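A short sketch of the boxed case, in case anyone wants to see it (variable names are mine):

```javascript
const plain = "hi";
const boxedA = new String("hi");
const boxedB = new String("hi");

console.log(typeof plain);               // string
console.log(typeof boxedA);              // object

// Two boxed strings are two distinct objects, so identity comparison fails.
console.log(boxedA == boxedB);           // false

// == unboxes the object when compared with a primitive; === does not.
console.log(boxedA == plain);            // true
console.log(boxedA === plain);           // false
console.log(boxedA.valueOf() === plain); // true
```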

Edit: I guess more to the point about how strings differ from arrays of string bytes… the semantics are that they’re only superficially similar. They’re both a sequential collection of things that might be a string character (or part of one), and they might have some getter names in common with similar behaviors, but that’s where their similarities end.
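Roughly what I mean by "superficially similar", as a sketch (variable names are mine):

```javascript
const str = "abc";
const arr = ["a", "b", "c"];

// The shared surface: length, indexing, and a few method names.
console.log(str.length === arr.length);  // true
console.log(str[1] === arr[1]);          // true
console.log(str.slice(1));               // bc
console.log(arr.slice(1).join(""));      // bc

// But a string is not an array.
console.log(Array.isArray(str));         // false
console.log(typeof str, typeof arr);     // string object

// And one element of a string may be only part of a character.
const emoji = "\u{1F600}";               // a single emoji outside the BMP
console.log(emoji.length);               // 2 (two UTF-16 code units)
console.log([...emoji].length);          // 1 (iteration goes by code point)
```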


> Strings are primitives in JS, not just these days.

Right. What I meant is whether they're interned - i.e. stuffed in a global table, in which strings (or at least string constants) are automatically looked up on parsing. If they were interned, both instances of "hello" in the expression "hello" == "hello" would parse into the same object, i.e. two pointers referencing the same character array, and thus could be compared in O(1) by comparing pointer addresses, instead of checking individual characters.

This is how symbols (a distinct type from strings) are implemented in Lisp languages. I've heard some languages do that automatically as optimization for small strings. I vaguely recall JavaScript was supposed to get symbols too, a while ago.

Anyway, per your comment and the sibling's, I see the difference between strings and arrays is effectively determined by how the == operator treats them.


JavaScript has a symbol type but I think it's different.
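A small sketch of how JS symbols behave, as far as I understand them. The closest thing to Lisp-style interning is the global registry behind Symbol.for; plain Symbol() always creates a fresh, unique value.

```javascript
// Each Symbol() call creates a new unique symbol, even with the same description.
const s1 = Symbol("id");
const s2 = Symbol("id");
console.log(typeof s1);   // symbol
console.log(s1 === s2);   // false

// Symbol.for looks the key up in a global registry, so equal keys share one symbol.
const r1 = Symbol.for("id");
const r2 = Symbol.for("id");
console.log(r1 === r2);   // true
console.log(Symbol.keyFor(r1));  // id
console.log(Symbol.keyFor(s1));  // undefined (not in the registry)
```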

I don't see the JS spec saying anything about interning or time complexity.

Here's some talk about how JS implementations intern compile-time constants and nothing else: https://stackoverflow.com/questions/5276915/do-common-javasc...


I agree with the sibling comment that there is nothing in the spec saying whether or not strings are interned, and commend the reference there.

Also, you probably meant the === operator, because discussing the == operator is a whole extra bag of worms.
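A few examples of that bag of worms, as a sketch (results collected in variables only so they are easy to check):

```javascript
// For two strings, == and === agree.
const looseStrings = "hello" == "hello";
const strictStrings = "hello" === "hello";
console.log(looseStrings, strictStrings);   // true true

// With mixed types, == coerces and === does not.
const looseMixed = "1" == 1;
const strictMixed = "1" === 1;
console.log(looseMixed, strictMixed);       // true false

console.log(null == undefined);             // true
console.log(null === undefined);            // false
console.log(0 == "");                       // true

// An array compared with a string is first converted to a string.
const arrayVsString = [1] == "1";
console.log(arrayVsString);                 // true
```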



Today we'd talk about value objects[1], small objects that represent simple entities whose equality is not based on identity. Some languages include built-in types for things other than numbers, such as Timestamp and Duration. Most modern languages allow defining new types, which can be made immutable (barring language features like reflection that "go through the back door"). For example, a Point that has x and y values assigned at creation and unchangeable. Any Point{x,y} is equal to any other Point{x,y} if x and y are the same, so there is no need for identity. In practice, the runtime may keep different instances of points, just as there may be two int values, i and n, which are equal but stored in different memory locations.

1 https://martinfowler.com/bliki/ValueObject.html
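To tie this back to the JS discussion elsewhere in the thread, here is a minimal sketch of such a Point as a value object. The class and its equals method are my own illustration; JS has no built-in value equality for objects, so equality has to be a method.

```javascript
// A value object: fields are fixed at creation and equality is based on state.
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
    Object.freeze(this); // no changes after construction
  }

  equals(other) {
    return other instanceof Point && this.x === other.x && this.y === other.y;
  }
}

const p = new Point(1, 2);
const q = new Point(1, 2);

// Two separate instances in memory, so identity comparison says "different"...
console.log(p === q);             // false
// ...but as values they are the same point.
console.log(p.equals(q));         // true
console.log(p.equals(new Point(3, 4))); // false
console.log(Object.isFrozen(p));  // true
```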


This is a very nomenclature-focused essay. But the names are still relatively salient for modern use. Relatively.



