Lately, I’ve been wrestling with the question, “What is a Programming Language?” Now, I’m not talking about what computer science is. As I am not a Computer Scientist myself, I will not hazard an answer to that one; feel free to ask your local CS major. I’m concerned with the question of what a Programming Language is.
My programming journey began with QBASIC on an old Windows 3.1 operating system. Later in life, I discovered VBA in Excel and SQL for querying databases. Along the way, I dabble in Groovy, struggle with MATLAB, suffer with C++, get lost in C#, and ultimately land on F#. The only reason I am attracted to F# is the Units of Measure feature. I create financial models of supply chains, and keeping track of units can consume significant time. I consistently run into bugs where units are not correctly converted. With F#’s Units of Measure feature, I am confident that my algebra is correct, and I never look back. At this point, I won’t seriously consider another language unless it has some facilities for verifying algebra at compile time.
I started out as a Chemical Engineer (ChemE) but quickly learned that I didn’t want to babysit tools in a clean room for the rest of my life. During my internships, I see that companies are good at keeping their tools running but struggle to schedule their facility to maximize efficiency. For this reason, I return to school to study Industrial Engineering (IE). I remember one of my ChemE professors chiding me, saying, “Matthew, you know we call IE Imaginary Engineering, right?” Despite my ChemE professor’s low opinion of IEs, I decide to continue. Now armed with a degree in IE, I quickly find that the companies I work with lack the staffing to build the scheduling tools they need. I know the algorithm to solve their problem, but they don’t have a programmer to implement it. Undaunted, I buy some books and start hacking away.
As I keep building new solutions and evolving my skills, I start thinking about the design of programming languages. I’ll be honest; when I learned F#, I had no idea what a lambda function was or why someone should care. It’s funny now how indifferent I was to all the features many people rave about when it comes to F#. Higher-Order Functions? Whatever. Immutability? Shrug. I had no idea why F# worked so well for my problems; I just knew that it did, and that was enough for me.
Complexity Drives Growth
Over time I begin tackling more complex problems. I am forced to grow my understanding of how F#, and programming languages in general, work. I start to appreciate the beauty of F#. I realize why libraries are formed the way they are and how these functions and types interact. I am learning about Higher-Order Functions, Functors, Monads, and the beautiful world of Category Theory.
In my search, I come across Bartosz Milewski and his excellent book “Category Theory for Programmers.” I think I have come to the holy land. Category Theory is the answer! If we have an expressive enough type system, we can solve all our problems. I form the opinion that a programming language is a means of expressing a domain and the transformations of that domain. Programs are wrong because we lack the facilities to express these relationships. A good programming language allows us to fully express the problem domain in a way that makes incorrect states unrepresentable. I join the bandwagon of F# developers who call for adding Higher-Kinded Types and Type Classes to F#.
To my dismay, these requests are turned down. I begin to form a grudge against F#. I don’t say anything because I make my living writing F#, and you don’t poison the well you drink from. I believe that F# is an inferior programming language, and I begin looking at Haskell wistfully. “My life would be so much better if I could just work in a ‘real’ programming language,” I think.
I spend several years in this state, begrudging the “limitations” of F# but still being wildly productive despite my frustrations. I consistently run into roadblocks with the type system. If the F# Slack had enough history, you could go and see some of my frustrations spill over. Time and again, I am forced to engineer around the “shortcomings” of F#.
Performance, the Harsh Tutor
At this point, I have a high opinion of myself. I view myself as a competent F# developer with some scars from having to build and maintain some large, complex pieces of code. I think that I’m decent at my job and can deliver features in a reasonable amount of time. I mentor several junior developers and see them become successful developers in their own right. I must be a real Senior developer at this point, right?
I have always been a performance junkie, but most of my efforts have been focused on the performance of supply chains and manufacturing facilities. I have never turned my gaze to the performance of programming languages and computers. I know some languages are slow (Python, Ruby, VBA), and some are fast (C, C++, FORTRAN). But I have never dug into why this is the case. I have never run into a situation where the performance of my code is a problem. Almost all the performance issues I have encountered have been because of slow SQL queries or too many network calls. Then one day, I am working on a problem where a service suddenly sees the time to complete an analysis double. No one on the team seems to know why, so I volunteer to diagnose the issue.
I begin benchmarking the service, trying to find the root cause. After a day, I realize that something is happening when the number of items exceeds a certain threshold. I find that odd, so I start profiling. Eventually, I find that our code spends much of its time adding items to a dictionary. I continue pulling on the thread and find that .NET has some interesting behavior regarding GetHashCode and equality. It turns out all of the items being added to the Dictionary are hashing to the same bucket, so our Dictionary has effectively turned into a List. O(1) operations are now O(n), which is why our service performance is blowing up.
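The failure mode is not specific to .NET, so here is a minimal sketch of it in Python rather than F#. This is an illustration only, not the code from the service: the `Key` class and the `count_comparisons` helper are hypothetical names invented for the example. It counts how many equality checks a hash table performs when every key reports the same hash code, compared with keys whose hash codes are spread out.

```python
class Key:
    """Hypothetical key type; `spread` selects a good or a degenerate hash."""

    eq_calls = 0  # class-level counter of equality comparisons

    def __init__(self, ident, spread):
        self.ident = ident
        self.spread = spread

    def __hash__(self):
        # A degenerate hash reports the same value for every key,
        # so all keys compete for the same place in the table.
        return hash(self.ident) if self.spread else 0

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.ident == other.ident


def count_comparisons(n, spread):
    """Insert n distinct keys into a dict; return the number of __eq__ calls."""
    Key.eq_calls = 0
    table = {}
    for i in range(n):
        table[Key(i, spread)] = i
    return Key.eq_calls


if __name__ == "__main__":
    for n in (100, 200, 400):
        good = count_comparisons(n, spread=True)
        bad = count_comparisons(n, spread=False)
        print(f"n={n}: spread hash -> {good} comparisons, constant hash -> {bad}")
```

With a constant hash, each insert has to be compared against the keys already stored, so the total work grows roughly quadratically with the number of items. That is the same shape as a Dictionary degrading into a List: the data structure still returns correct answers, it just stops being fast.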
How can something as simple as hash codes and equality cause such a performance difference? Little did I know that I was beginning to tumble down the rabbit hole after Alice. I have no idea how deep a hole I have stepped into. F# has fantastic defaults which guide you into the pit of success. I could be a successful developer without understanding how things worked. Until now, I was so confident in my grasp of types, functions, and Category Theory that I thought I could tackle most problems. What I have failed to grasp up to this point is that a programming language is not just a tool for expressing a problem domain; it is a means of communicating with a computer. It seems so obvious in retrospect that it is embarrassing. I laugh at myself now for being so profoundly naïve. Until now, the hardware was wholly abstracted away, so I did not even need to think about it. Now that I see the implications, I cannot unsee them.
By seeking to understand how F# works, I am suddenly exposed to a whole world of design that I never thought about. I dive into how CPUs operate, and I see that they are data factories, not unlike the manufacturing facilities that I am spending all my time optimizing. Data moves into a register, operations are applied to the data, and then it is returned to memory. You can think of products being manufactured in a facility similarly. Raw material moves onto a machine, the machine applies a transformation, and the updated material is then moved to a new operation.
A new paradigm for thinking about programming languages emerges, “This is just a tool for expressing operations over data.” I am mad at myself for being so thoroughly entranced by the ideas of Category Theory. All this abstraction is just obfuscating what I am trying to do. I have data; I want to transform that data and then present the result. Nothing more, nothing less. How much time have I wasted trying to create the perfect Monad? Who cares whether some types form a Bifunctor or not? I need to transform some data and get on with my life.
Synthesis
I now disdain my years of loving Category Theory and being entranced by the idea of eliminating bugs through expressive type systems. I am now obsessed with performance and how to extract the most performance from a computer. I take pride in the fact that my simulations are several orders of magnitude faster than other industry-leading tools. I read every resource I can on game engines and how C developers extract the most performance possible from a CPU. I am still working in F# but writing a very different F#. I am keenly aware of when I am passing a value or a reference. I see where memory is being copied and where allocations are happening. I stay off of the Heap as much as possible. I am thinking about registers and the L1 cache. I am measuring branch mispredictions and cache misses. I view any inefficiency as a failure, but at the end of the day, I’m exhausted.
Becoming aware of how CPUs work is both a blessing and a curse. It is a blessing that you now know how to make your code orders of magnitude faster. It is a curse in that you see inefficiency everywhere. At the beginning of my degree in IE, my graduate advisor Dr. Kim warned me, “Matthew, once you learn how to design an efficient facility, you are going to start seeing inefficiency everywhere in the world…and it will drive you mad.” Well, Dr. Kim, you were right. I see it. I see it in supply chains and in my code, and yes, I have gone mad.
I have come to a new place now. I no longer need every line of code I write to be the most efficient thing possible. I see Category Theory’s merits and how it helps us write correct code. At the same time, I have to synthesize that with my understanding of how a CPU works. When I think about the original question, “What is a programming language?” I have to take a step back. On the one hand, it is a tool of thought for expressing a problem domain. A well-formed set of types and functions can make it easy to solve a problem and maintain robust code. On the other hand, a programming language is a tool for us to harness the power of computers. It allows us to transform data to answer questions.
A great programming language allows us to do both. We can accurately describe our domain so that other humans can understand our code, and computers can still efficiently compute the results. It is not either/or; it’s both/and. No language is perfect. All languages are a product of compromises. So I continue to learn to thread the needle between elegant domain representations and efficient code. It is staggering how far programming languages have come in the last few decades. I, for one, look forward to the beautiful things that may come.
Thoughts? Email me at hi@fastfsharp.com to continue the conversation 😊.