The Golden Rule of C Programming

The C Programming Language is my all-time favorite programming language. Not only do I write as much of it as possible, I have yet to come across any problem that C is incapable of solving - which is simply not the case for languages such as JavaScript or PHP. But I would be an absolute liar if I were to claim that C is perfect. The high-level ease yet low-level control that C hands to the programmer is often criticised as not hiding enough details away from the user. Now, I like it this way, but it does have some unfortunate consequences. For instance, in Python, to get user input, all you have to do is the following:

string = input()
print("Hello " + string)

Literally a five-second job.

In C, this is made rather difficult by the fact that you need:

  • Variably sized inputs
  • Dynamic string allocation
  • Dynamic string appending

If you’re wondering, the full equivalent in C (with any length of string being supported) is the following:

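#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    printf("Enter name: ");
    fflush(stdout);

    size_t alloc = 5;   /* bytes currently allocated */
    size_t len = 0;     /* characters stored so far */
    char *buf = malloc(alloc);
    if (buf == NULL)
        return EXIT_FAILURE;

    int in;             /* int, not char, so EOF is distinct from every real character */
    while ((in = getchar()) != '\n' && in != EOF) {
        /* Grow when the next character plus the terminator would not fit */
        if (len + 2 > alloc) {
            alloc += 5;
            char *bigger = realloc(buf, alloc);
            if (bigger == NULL) {
                free(buf);
                return EXIT_FAILURE;
            }
            buf = bigger;
        }

        buf[len++] = (char)in;
    }

    buf[len] = '\0';    /* %s needs a terminated string */
    printf("Hello %s\n", buf);

    free(buf);
    return 0;
}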

Quite a mouthful. Now, this C code can handle as many characters as the operating system is willing to give us memory for, and it will carry on reading characters until it hits a newline or standard input ends. So, basically, this does roughly what Python is likely doing under the hood. At first glance this might make you think that C is an absolutely awful programming language that you should never use. But, in reality, this is a perfect example of The Golden Rule of C Programming:

Where exactly is this memory coming from?

Ethan James Marshall - 2021

Unlike in languages such as Python, memory in C is shown directly to the user as it is: a big, long list of bytes, some of which we can only read from and others which we can also write to. You need to find space in this big, long list which can be written to safely and which your program can keep track of. In many schools, C isn’t taught like this. Instead, teachers tie themselves in knots trying to explain pointers with “move semantics” and “pass by reference” stuff. In reality, you could avoid all this confusion by just explaining how C and memory work in computing. In addition to this, if more people understood how memory works internally, we would have far fewer people asking the following:

“Why can’t C just have a function that returns an inputted string for us?”

By this, they mean something like this:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char *string = input("This is a prompt"); /* This is a fictitious function */

    printf("%s was the string entered\n", string);
}

Obviously, this wouldn’t compile, because the function “input” is fictitious. However, you may ask yourself: how are they wrong? Why can’t the C library just return us the string the user entered? Well, that’s because the C library can’t just conjure memory out of nowhere. Where would that memory come from?

Let’s think about where we could possibly store this data, if we were the designers of the C standard library. It can’t be stored statically in the executable, because we don’t know the size at compile time. We can’t make it a string literal, because those are stored in the constant data section (and are also a fixed size, for that matter). We can’t use the stack, because that is tied to the lifetime of the function call (and is also usually a fixed size - again). So, our only real option is to use the heap. That sounds fine, but the only problem is that the code above now contains a memory leak: it would need to explicitly free the returned string pointer before we could say it is safe. That means that a function call to the standard library may (and probably will) cause a memory leak - or at the very least introduce a great deal of complexity for the programmer to deal with. Not good. On the whole, it seems best that the C standard library works with just fixed-size data that can be worked with totally safely, and lets the programmer choose whether he wishes to add a layer of complexity on top of that.
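As a rough sketch of that trade-off, suppose we wrote the fictitious function ourselves. The names input_line and input_fixed are hypothetical and not part of the C standard library. The first one takes its memory from the heap, so every caller has to remember to free it. The second is only a thin wrapper around fgets and follows the standard library's habit of making the caller say where the memory comes from.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-backed reader. Returns a malloc'd line without its
 * newline, or NULL if allocation fails or there is no input left.
 * The caller owns the result and must free() it. */
char *input_line(FILE *stream)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = fgetc(stream)) != EOF && c != '\n') {
        if (len + 1 >= cap) {           /* keep one byte spare for '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {         /* nothing was read at all */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

/* Caller-supplied memory: reads at most size - 1 characters into buf
 * and strips the newline if one was read. Returns buf, or NULL when
 * there is no input left. Nothing to free afterwards. */
char *input_fixed(char *buf, int size, FILE *stream)
{
    if (fgets(buf, size, stream) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

With input_line(stdin), forgetting the matching free is a leak. With input_fixed(name, sizeof name, stdin) and a char name[64] on the caller's stack, the memory cleans itself up when the function returns, at the cost of cutting long input short.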

Now, of course, C++ doesn’t have this problem, because it can tie heap memory to the lifetime of a scope, which is not possible in core C, where heap memory lives until it is explicitly freed. A C++ class can simply allocate as much dynamic memory as it wants and release it in a destructor, which is called when the scope is exited - kind of like scope-oriented heap memory.

So, essentially, whenever you find yourself getting angry at C for not making your life easy, just ask yourself: where do I expect this memory to come from? How do I expect the C language to give me this information? If the answer is just “magic”, maybe have a think about how to program without asking miracles of the C standard library. After all, K&R were geniuses, but they weren’t miracle workers.

Ethan Marshall

A programmer who, to preserve his sanity, took refuge in electrical engineering. What an idiot.


First Published 2021-06-26
