An Objective Definition for Strong vs. Weak Typing
by: TheHans255
May 5, 2024
This post is adapted from a post I originally wrote on the Language Dev Stack Exchange.
A common debate in the world of programming languages is whether a programming language is "strongly" or "weakly" typed - that is, how much a program enforces good rules. Unfortunately, there is not a whole lot of consensus as to what these terms mean - some might describe C, for instance, as being strongly typed because it requires you to declare the types of your variables, while some might describe C as weakly typed because it lets you cast pointers willy nilly or lets you store past the end of a list. One blog entry goes as far as to give a pretty snarky definition of these terms so as to show how useless they are:
- Strong typing: A type system that I like and feel comfortable with
- Weak typing: A type system that worries me, or makes me feel uncomfortable
I don't claim to have the authoritative answer on these terms (much less the programming language debate), but I do remember learning a definition for them in my time at the University of Washington that I find to be reasonable, and while not the end-all answer for this debate, is objective, and something I believe should be an essential ingredient in talking about the desirability of a programming language's type system.
Essentially, a "weakly typed" programming language is one in which there exists one or more ways, within the language's spec, to circumvent its type system and cause undefined behavior. A "strongly typed" programming language is simply defined as having no such holes - the type system, no matter what it is, follows its stated rules.
I should first explain what a "type system" even is. Basically, a type system is a set of rules in a programming language that ensures that your program will behave a certain way, and acts by assigning a "type" to each entity in your program, including code, variables, inputs, and outputs. The "rules" themselves can be anything you want, and type systems have been built to variably enforce a great number of rules over the years:
- When two numbers are added together, those two numbers have the same
binary format (e.g. 32-bit two's complement integers, or 64-bit IEEE 754
floating point numbers), thus ensuring that the result makes sense.
If the two numbers are different, either:
- One number has to be explicitly converted to the other number's format, or
- The number whose format has less range is converted to the number whose format has more range, such that no data is lost.
- Whenever you print some text to the screen (via a
print()
call or something), the thing you are printing is a valid stream of text. If you try to print something that isn't text, the program will first apply a rule to convert it to text. - A program either cannot divide by zero, or will crash and refuse to proceed past an attempt to divide by zero.
- In an object-oriented language, when you call a method (e.g.
dog.bark()
), the method exists on the object. (For instance, in Java, you are guaranteed that when you call a method, the object you're calling it on is eithernull
or has that method exactly as described). - If you do a "look up a thing in a list by index" operation, the thing you're looking up the other thing in is actually a list.
- If you look up a thing in a list by index, the index is within the bounds of the list.
- If you have a reference to an value in memory, the value actually exists, in the expected format, at that location in memory (i.e. it has not been replaced with something else).
Not all type systems have all of these rules, even in new, production-level languages. Different languages enforce these rules differently as well. The main point is that the rules exist, and provide some assurance for how each part of your program is going to behave.
Within this context, if a type system is "strong", the rules are ironclad - this does mean that there are some valid, interesting programs that cannot be written in the language, but that you can be assured that the rules that the type system is trying to enforce are actually being applied to your program. A type system is "weak", on the other hand, if there exist ways to circumvent the rules. (This definition is closely related to the idea of a type system being "sound" or "unsound", meaning that the rules of the type system make coherent sense in the first place).
Note that this is distinct from "static typing" vs. "dynamic typing", which are better defined, and have more to do with how the rules of the type system are enforced. A statically typed language is one in which the types of its values can be determined at compile time, usually accompanied by an explicit requirement to declare the types of local/global variables, function arguments, and return values. In a dynamically typed language, the type information is handled internally within the variables at runtime, with the language not concerning itself with the value's type until it's necessary to do so.
JavaScript, for instance, is dynamically but strongly typed. Variable types
are kept internally and there's no way to declare which ones are what
(and you can end up with variables with a type you don't want at runtime),
but the type system has well defined rules for all eventualities -
as long as no FFI code is involved, you can follow the code from the beginning,
where each and every variable is instantiated, and use the language rules
to know exactly what that code is going to do. Yes, you get an error if you
try to call a method that doesn't exist on an object, but because the
JavaScript runtime has exact rules on what happens when you look up
a property that doesn't exist (getting the special value undefined
)
and what happens when you run the call operator on a value (such as undefined
)
that doesn't support it (you get a TypeError
), the type system is still strong.
Contrast that with C, which is statically but weakly typed. All variables have
a specific type associated with them, which you have to declare
(unless you're using a variant that lets you use auto
to declare a variable
if its type can be exactly inferred from the incoming expression).
Most expressions have specific rules for what types can be used together
and when type casts occur (including implicit type casts).
However, because dereferencing an invalid pointer value causes undefined behavior,
and arbitrary integers can be bit-cast to pointers (including invalid ones),
it is possible to circumvent C's type system - hence, C is weakly typed.
Rust is an interesting case - safe Rust (barring any design flaws) is strongly
and statically typed, while unsafe Rust is weakly and statically typed.
In both cases, the type system has well-defined rules (including rules with
lifetimes, which are part of the value types for each variable and expression),
but in unsafe Rust, some actions that break the type system and cause
undefined behavior are possible, and it is up to the programmer to ensure
that they are keeping the well-defined rules within the confines
of the unsafe
blocks. (Rust's type system also has some other interesting features -
in addition to the operations that are allowed on a value, the type of a value
in Rust also defines how long that variable lives, and whether the
code holding onto it has the right to change it and/or throw it away.)
And for a few other edge cases of type systems:
- Java and C# rely on both static and dynamic typing for their type systems.
Both of them feature explicitly declared variable types and object-oriented
type checking, but also keep track of the type of each object at runtime,
so that things like typecasts still work. Both languages' arrays also store
which object type they're allowed to store and will throw an
ArrayStoreException
if they store the wrong type. - Lambda calculus is statically and strongly typed, even though it doesn't have any type declarations (and is often considered "typeless"). This is because every single value has the same type: a function that takes one argument and returns one value (each of which is also a function that takes one argument and returns one value, and so on). The only allowed operations on these values are either calling them or composing them, which makes sense for these kinds of values but says virtually nothing else about how the program behaves.
- Assembly code is an interesting case, since it enforces so few rules to begin with. All values that assembly code deals with are sized chunks and ranges of bits, which are assumed to have a given meaning and format when taken as input to a specific instruction (or read as an instruction). On top of this, most of the behavior of an assembly instruction is defined by the platform it's running on top of, such as what hardware devices the memory addresses are connected to. However, the rules that assembly code does enforce tend to be enforced strongly, with the exception of don't care terms in connected hardware (or illegal opcodes in the processor itself) that lead to undefined behavior and processor states when executed.
Now, with these things said, there are several features in a language that I would certainly consider bad/unpleasant to work with in most circumstances, but would not necessarily consider to be features of a "weak" type system if the rules for when these things happen are well-defined:
- Implicit type coercion can certainly lead to frustrating errors, but does not make a type system weak. For instance, the language might go "oh, look at me! I'm going to cast this int to a float without telling you!" but as long as it defines why it thinks it's OK to cast the int to a float, and how exactly it's going to go about it, it's still strongly typed.
- Direct bit casting (i.e. interpreting the bits of a given value in a given format)
is also not reason enough to declare a type system weak
as long as 1) the values you're casting to and from
have well-defined bit patterns, and 2) the value you're casting to doesn't have
any invalid values (or the language at least has a well-defined rule for how to
deal with these invalid values). It is very easy for direct bit casting to
lead to a weak type system (such as with C's ability to arbitrarily cast a pointer
to one type to a pointer to another type), but if bit casting is offered in
a controlled way, such as with Java's
doubleToLongBits
function, it can still make for a strong type system. - Silent exceptional values (e.g.
null
for a non-existent object,NaN
for a floating point error,undefined
for missing a property lookup) that are baked into the type are not indicators of a weak type system either - as long as the language has well-defined rules for how those errors appear and how they behave. Including these possibilities simply represents a rule that the type system refuses to enforce, likely to its peril - computer science veteran Tony Hoare is famous for calling null references, a feature that he himself implemented in ALGOL back in 1965, a billion dollar mistake. - Even
goto
instructions to arbitrary lines of code do not make for a weak type system - assembly code, after all, has jump instructions. GOTO, however, does weaken a lot of the guarantees that you may want out of a type system, since anything you do at runtime to ensure that a value has a certain type, such as setting it or comparing it, can get skipped.
If "strong" or "weak" under this definition, then, does not keep out these "undesirable" features of a programming language, what does it do for us? While it enforces nothing about the job that the type system is going to do for us, knowing that a type system is strong at least enforces that its job is going to be done correctly, and any failures to do that are going to be in its implementation. In particular, it is much easier for a program with a strong type system to reliably undergo a security audit, since each of its parts, with their well-established and strongly maintained types, can be examined individually to ensure that the program is specified correctly (and if it is, but is still behaving improperly, to know that it's the compiler or runtime's fault). A weak type system cannot reliably undergo such an audit, since any other part of the program may have broken its rules, and must therefore be audited at a lower level with fewer but more strongly maintained guarantees (such as the assembly code below it).
Beyond that, though, "strength" and "weakness" don't really have anything to do with what the programmer might want out of a type system. I would definitely argue that implicit casts, reckless reinterpret casting, silent failure values, and arbitrary GOTOs are bad, but that opinion is ultimately subjective - some other programmer, at some other time, wanted those features in their language and defined rules for how they should work. The fault in this case is not in the objective ability for the type system to do its job, but in the job the type system is being asked to do.
And we can still call that "strong" or "weak" if we want - indeed, the definitions I have shared for "strong" and "weak" can probably just be called "sound" or "unsound" to make them more specific. We just can't be objective if that's what we decide to do, since everything in the wonderful world of type systems, at least at the reasons they were invented, had reasons for existing as they are.