
I think everyone is missing the big point here: the current implementation was in an untyped language, so refactoring was just too hard. Use languages with good type systems and then refactoring is easy, and always the correct technical choice.

Let me say that again: the killer app of type systems is that no mistake can "total" the code.



The lack of static typing does not make refactoring incredibly hard. In fact, I would argue it's even easier, due to the way dynamically typed languages pass data around. Really, as long as you follow some basic rules and don't propagate too much complexity through your system, refactoring is a breeze. When I do it, the types are almost inconsequential.

The thing I do worry about when refactoring is developer-induced complexity. E.g. "This random corner case in this function returns a null, but I don't see the consumer handling nulls." At that point, it's hard to tell the intention, and whether that null blowing up is handled much further down the line in some unrelated function as part of the normal execution flow. Sometimes that behavior is expected, and my "fixing" it would introduce a bug. That is what causes regression bugs when refactoring, not the types. Sure, one or two "type typos" do slip through the cracks, but that's inevitable, and one quick run-through of the system picks them up relatively easily.

Trust me, once you spend a non-insignificant amount of time developing, refactoring, and debugging software in a dynamically typed language, you start realizing that static types are more of a crutch for the compiler than an aid to the developer. Especially in the more object-oriented languages with complex types and inheritance hierarchies: think C#, Java, et al., with their abstract base classes, interfaces, interface inheritance, virtual override methods, method overloading, type casting, etc.


Any halfway decent static type system would capture that nullability and prevent accidentally using a null value without checking, so you seem to have slightly undermined your own argument here.


Static type systems don't necessarily prevent accidental null value usage. Not sure where you get that impression?

However, for initialized variables, nulls (or None) were just the example I used, as that is the common one in Python and conveys the point without getting into language details. All of the "decent static type" systems I'm aware of have the same issue with undefined values that break behavior, e.g. zeros as integers or empty strings.

Heck, even in C#, where you have a Nullable type, it gets abused more often than used legitimately and is seen as a nuisance by most developers. Not only that, but even with the static type system you mention, uninitialized variables are a big problem. Hence all the null-reference exceptions that are all too common.


> Static type systems don't necessarily prevent accidental null value usage.

The better ones do.

Some languages with static type systems, notably C and its descendants, have reference or pointer types that are nullable by default. With the wisdom of hindsight, that design decision is regrettable; Tony Hoare himself famously called inventing null references his “billion-dollar mistake”.

There are safer alternatives. For example, you can have a type that makes optionality explicit, so it contains either nothing or a single value of some known type. Before you can work on the contained value, if there is one, you must deliberately extract it; the type system will prevent you from accidentally using the optional value in place of the contained value. In Haskell, this type is called Maybe a. Rust has Option<T>. In OCaml, it's 'a option.
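To make that concrete, here's a minimal Haskell sketch (the nickname lookup is just an illustrative example):

    import qualified Data.Map as Map

    -- Map.lookup returns a Maybe value, so the compiler rejects any
    -- code that treats the result as a plain String without first
    -- matching on the Nothing case.
    greet :: Map.Map String String -> String -> String
    greet nicknames user =
      case Map.lookup user nicknames of
        Nothing       -> "Hello, " ++ user
        Just nickname -> "Hello, " ++ nickname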

> All of the "decent static type" systems I'm aware of have the same issue with undefined values that break behavior, e.g. zeros as integers or empty strings.

Again, with a sufficiently expressive type system, you can encode properties such as a list being non-empty in your types. This lets you prevent illogical actions like trying to take the head of a list with nothing in it. You sometimes see these techniques if you're working on high-reliability systems with formal verification.
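For instance, Data.List.NonEmpty in GHC's base library bakes the non-emptiness invariant into the type; a minimal sketch, with firstScore as an illustrative name:

    import Data.List.NonEmpty (NonEmpty)
    import qualified Data.List.NonEmpty as NE

    -- NE.head is total: a NonEmpty list cannot be empty by
    -- construction, so there is no runtime empty-list failure.
    firstScore :: NonEmpty Int -> Int
    firstScore = NE.head

    -- Callers prove non-emptiness up front with NE.nonEmpty,
    -- which turns an ordinary list into Maybe (NonEmpty a).
    firstScoreOf :: [Int] -> Maybe Int
    firstScoreOf xs = fmap firstScore (NE.nonEmpty xs)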

You can also handle edge cases safely by replacing a partial function that is undefined for certain inputs, such as dividing by zero or taking the head of an empty list, with a total function that gives you back an optional value as described above.
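In Haskell, that might look something like this (safeHead and safeDiv are conventional names for these helpers, not Prelude functions):

    -- Prelude's head and div are partial: head [] and x `div` 0
    -- crash at runtime. The total versions return Maybe instead,
    -- forcing the caller to handle the awkward input.
    safeHead :: [a] -> Maybe a
    safeHead []    = Nothing
    safeHead (x:_) = Just x

    safeDiv :: Int -> Int -> Maybe Int
    safeDiv _ 0 = Nothing
    safeDiv x y = Just (x `div` y)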


"This random corner case in this function returns a null, but I don't see the consumer handling nulls."

Modern type systems tend to incorporate null handling directly, though.

I can recommend looking at Crystal. You will find that the overhead of providing sufficient types is pretty small.


I mainly work in languages with good type systems. But I've also contributed a bunch to Meson (https://github.com/mesonbuild/meson/pulls?q=is%3Apr+author%3...), and I can assure you that despite all the unit tests, refactoring just takes way, way longer.


More expressive type systems might make refactoring safer, but they don’t necessarily make it easier.

The former happens because it’s harder to change something encoded in the types accidentally.

For the latter, it should be easier to change something encoded in the types deliberately, but often the opposite is true.


Having a type system enables zero-risk automated refactorings, even complex ones, and makes them a no-brainer. Not having that builds a reluctance to do even simple refactorings.

A good example is the simplest possible refactoring: renaming things. I was doing this in PyCharm on a simple Python project the other day, and it proposed modifying just about all dependencies on the classpath because it couldn't tell apart things that were in scope and out of scope of the refactoring. I've seen similar things happen on JavaScript and Ruby codebases. Renaming things is a PITA in those languages. Not safe at all.

On any Kotlin or Java codebase, I do this all the time without thinking twice. I rename stuff, move stuff, extract variables, auto-fix things, etc. It just happens. A rename is a complete non-issue; it doesn't matter if it's a local variable or the package name of your entire codebase. You can trivially modify thousands of lines of code with a keystroke without breaking stuff.


It seems to me that what you’re talking about there has more to do with having clear rules in the language for scope and modularity than to do with the type system.


Those clear rules are called the type system. The fact that it's static means the same information the compiler uses to tell what is what can also be used to build syntax trees that facilitate transforming your code base from one valid state to another. It's impossible to do that with dynamically typed languages; at best you get some partial guarantees combined with some string replacing.


A crude but correct algorithm for renaming all instances of an identifier that refer to the same entity (variable, function, type, etc.) could be something like:

1. Locate all occurrences of that identifier in your code base.

2. In each case, determine whether this is the place the underlying entity is defined or a reference to an entity defined elsewhere. If defined elsewhere, locate that definition.

3. Change all occurrences of the identifier that relate to the same definition as the one you started with.

If you have clear rules for things like the scope of an identifier and how identifiers may be imported and exported across modules, there is nothing in that algorithm that is necessarily specific to static or dynamic types.
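To make that concrete, here is a toy version of that algorithm for a tiny lambda-calculus-style language, written in Haskell. It's a hedged sketch: a real tool would also check that the new name isn't captured by an inner binder.

    -- A toy expression language with lexical scope.
    data Expr
      = Var String          -- a reference to an identifier
      | Lam String Expr     -- \x -> body, which defines x
      | App Expr Expr
      deriving Show

    -- Rename occurrences of `old` that refer to the enclosing
    -- definition. Under a binder that reuses the name, inner
    -- occurrences refer to a different entity and are left alone.
    rename :: String -> String -> Expr -> Expr
    rename old new e = case e of
      Var x
        | x == old  -> Var new
        | otherwise -> Var x
      Lam x body
        | x == old  -> Lam x body       -- shadowed: stop here
        | otherwise -> Lam x (rename old new body)
      App f a       -> App (rename old new f) (rename old new a)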

It’s true that knowing the types statically can make a difference in some cases. A common example would be object.method notation, because there the context matters: the method being identified depends on what type of object you have. If you can’t identify the type of the object in some way, via a static type system or otherwise, then maybe you can’t identify the method and its underlying definition either.

However, it’s worth noting that in these sorts of late-binding environments, the operation of renaming all occurrences of an identifier that relate to the same definition probably isn’t well-defined anyway. Before you can automate a refactoring operation, you need to specify exactly what it means, and in a situation like this, the specification is ambiguous.


No, it's really easy: change a definition's type, and then fix each type error. The errors give you a guided tour of the codebase that's impossible to get without static type checking.
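As a small, hypothetical Haskell illustration of that workflow: add a field to a record, and the compiler enumerates every construction site that now needs attention.

    -- Old definition:  data User = User { name :: String }
    data User = User { name :: String, email :: String }

    -- Every place that builds a User stops compiling until it
    -- supplies the new field; those errors are the guided tour.
    guest :: User
    guest = User { name = "guest", email = "guest@example.com" }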


You still need to fix all of those type errors, though. As we encode more information within our types, the effort to maintain them naturally tends to increase as well.

Your profile indicates that you work with Haskell. In Haskell, we often encode possible effects explicitly in types, while many other languages do not restrict effects to the same degree. This provides a degree of safety in Haskell that those other languages lack. However, it also means that if refactoring moves the place where some effect can be caused, that may require a change to the types that propagates widely through the system.

For example, suppose we have a system where the high level code is wrapped in some logging monad. We decide to refactor so that some of the log writes move to a much lower level, perhaps so we can then add more detailed information to the logged messages. At this point, the entire call chain down to where the logging will be done is infected by the logging monad. This is perfectly correct in terms of type safety. It is also work that would be entirely unnecessary if we were performing an equivalent refactoring in a language that did not encode so much information in its types in the first place.
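A minimal sketch of that propagation, using a hypothetical Logger class rather than any particular logging library:

    -- A stand-in for a logging effect.
    class Monad m => Logger m where
      logMsg :: String -> m ()

    -- Before the refactoring, only the top level logged, so only
    -- it needed the constraint:
    processAll :: Logger m => [Int] -> m [Int]
    processAll = mapM processOne

    -- Moving a log write into processOne forces the Logger
    -- constraint onto it as well, and onto everything on the call
    -- chain between the top level and the new log site.
    processOne :: Logger m => Int -> m Int
    processOne x = do
      logMsg ("processing " ++ show x)
      pure (x * 2)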


Yes, I work in Haskell. As you say, we have lots of types, and lots of type errors. But refactoring is still insanely easier. I'm kinda infamous for bundling a refactor with every feature, in fact, because the extra work is just so minimal I cannot help myself.



