char_magic - My learnings on writing my own string library in C

Currently, I am trying to get deeper into software development and the origins of higher-level languages, like Java, JavaScript, Python, etc.

I was frustrated by the lack of a real string type in C, so I tried to implement my own. The result is called char_magic.

It has already been done multiple times

I am not the only one who has done this. There must be thousands of libraries out there that deal with exactly that. So don't get me wrong: I did not try to create something that brings value to production.

It was rather an experiment to get a grasp of how things work internally, and of what it means when we talk about a plain char array with a fixed size. It was a great experience, and I would encourage every developer to try it.

What was the trigger to start this side project?

In Python, JavaScript, Java, you name it, it's quite easy to split a string. Just look at this JavaScript example:

const str = "Hello World"

str.split(" ");
// will return ["Hello", "World"]

Well, it's not that easy to achieve in C.
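
For comparison, the closest thing the C standard library offers is strtok. Here is a minimal sketch of how it could be used. The helper name split_in_place is made up for this post and is not part of char_magic. Note that it cuts the buffer up in place, so it needs a writable copy of the string and destroys the original.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of char_magic: splits `text` in place on
 * any character in `delims` and stores up to `max_parts` pointers to the
 * pieces in `parts`. Returns the number of pieces stored. */
size_t split_in_place(char *text, const char *delims,
                      char **parts, size_t max_parts) {
    size_t count = 0;
    /* strtok overwrites each delimiter with '\0', so `text` must be writable. */
    for (char *tok = strtok(text, delims);
         tok != NULL && count < max_parts;
         tok = strtok(NULL, delims)) {
        parts[count++] = tok;
    }
    return count;
}

/* Prints each word of "Hello World" on its own line. */
void split_demo(void) {
    char text[] = "Hello World"; /* an array, not a read-only string literal */
    char *parts[8];
    size_t n = split_in_place(text, " ", parts, 8);
    for (size_t i = 0; i < n; i++) {
        printf("%s\n", parts[i]);
    }
}
```

Even this short version has pitfalls: strtok silently skips empty fields, keeps hidden state between calls, and the caller has to guess how many parts to make room for.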

What is a string at all?

A string is not just an array of chars. Strings in modern languages support a lot of things that I stopped thinking about:

  • getting the length of a string

  • concatenating strings

  • sorting

  • splitting a string

  • checking if it's equal to another

  • toLowerCase() and toUpperCase()

  • ...

To a lot of programmers (me included), a string looks like a primitive type at first, but it isn't. Here is what else I learned.

What I've learned

Arrays are not lists/vectors.

You might be thinking, ok then... Just create an array of chars and append the new ones. Maybe some for loops here and there, and finished.

Well, you might have guessed it with this phrasing... It's not that simple.

In C, an array has a fixed size that is set when the array is declared. It can't grow dynamically out of the box.

For that reason, you need to rely on malloc and realloc.

I created a structure called cm_string_builder and gave it the following properties.

typedef struct {
    char* string;
    int length;
    int capacity;
} cm_string_builder;

As you can see, the actual string remains a char*. On top of that, the struct tracks a length (how many chars are in use) and a capacity (how many chars are allocated). The latter is required to let the array grow.
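
To make the struct a bit more concrete, here is a rough sketch of how such a builder could be created and destroyed. The names builder_new and builder_free are assumptions for illustration only. The real library may do this differently.

```c
#include <stdlib.h>

typedef struct {
    char* string;
    int length;
    int capacity;
} cm_string_builder;

/* Hypothetical constructor (name and signature are assumptions):
 * allocates a builder with room for `initial_capacity` chars.
 * Returns NULL if an allocation fails. */
cm_string_builder *builder_new(int initial_capacity) {
    if (initial_capacity < 1) {
        initial_capacity = 1; /* a capacity of 0 could never be doubled */
    }
    cm_string_builder *builder = malloc(sizeof *builder);
    if (builder == NULL) {
        return NULL;
    }
    builder->string = malloc((size_t)initial_capacity);
    if (builder->string == NULL) {
        free(builder); /* don't leak the struct if the buffer fails */
        return NULL;
    }
    builder->length = 0;
    builder->capacity = initial_capacity;
    return builder;
}

/* Hypothetical destructor: releases the buffer and the builder itself. */
void builder_free(cm_string_builder *builder) {
    if (builder != NULL) {
        free(builder->string);
        free(builder);
    }
}
```

Everything that comes from malloc has to be handed back with free at some point, which is exactly the bookkeeping that higher-level languages hide from you.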

Whenever I want to append something to the string, which most languages do by just putting a + sign between two strings, I have to call a function, cm_string_builder_append.

Don't be afraid when reading it. I will explain the details right after.

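void cm_string_builder_append(cm_string_builder* builder, char c) {
    if (builder->length == builder->capacity) {
        // Double the capacity; start at 1 so that 0 * 2 does not stay 0.
        int new_capacity = builder->capacity > 0 ? builder->capacity * 2 : 1;
        // Keep the old pointer until realloc has succeeded.
        char* grown = realloc(builder->string, (size_t)new_capacity);
        if (grown == NULL) {
            return; // out of memory: leave the builder unchanged
        }
        builder->string = grown;
        builder->capacity = new_capacity;
    }
    builder->string[builder->length++] = c;
}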

It takes a pointer to a cm_string_builder and the char to append.

The if clause checks whether the string has used up the allocated char array, i.e. whether the length has reached the capacity. If so, the array gets reallocated with double the previous capacity.

realloc returns a pointer to the new location of the buffer, which may differ from the old one, or NULL if the allocation failed. That is why the result has to be stored back into the string property.
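
One detail that is easy to get wrong: if realloc fails, it returns NULL but leaves the old block alive, so assigning the result straight to the only pointer you have leaks the memory. A small sketch of the usual pattern follows. grow_buffer is a made-up helper name and not part of the library.

```c
#include <stdlib.h>

/* Hypothetical helper: grows `*buffer` to `new_size` bytes (new_size > 0).
 * Returns 1 on success and 0 on failure. On failure `*buffer` is untouched
 * and still has to be freed by the caller. */
int grow_buffer(char **buffer, size_t new_size) {
    char *grown = realloc(*buffer, new_size);
    if (grown == NULL) {
        return 0; /* the old block is still valid */
    }
    *buffer = grown; /* may or may not be the same address as before */
    return 1;
}
```

The existing contents are carried over to the new block, so after a successful call the caller only needs to keep using the updated pointer.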

printf will play games with you

const char myStr[] = "Hello World";

printf("%s", myStr);

This will output Hello World. But how does printf know how long your string is?

It prints everything until it reaches a so-called null byte, or null terminator (\0). You can read more about it on the web.

That means if I concatenate two strings and accidentally copy the null byte in between, I'm fucked and get to hunt a bug for over an hour. Guess what happened :).

That's why you will find checks for this NULL byte in the code multiple times.
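
Here is a small sketch of that trap in plain C, independent of char_magic. The concat helper is hypothetical: it copies only the visible characters of both inputs and writes a single terminator at the very end.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: concatenates `a` and `b` into `out`.
 * `out` must have room for strlen(a) + strlen(b) + 1 bytes. */
void concat(char *out, const char *a, const char *b) {
    size_t a_len = strlen(a); /* strlen does not count the '\0' */
    size_t b_len = strlen(b);
    memcpy(out, a, a_len);         /* no terminator copied here */
    memcpy(out + a_len, b, b_len);
    out[a_len + b_len] = '\0';     /* exactly one terminator, at the end */
}

void null_byte_demo(void) {
    char broken[] = "Hello\0World"; /* a '\0' sneaked into the middle */
    char fixed[12];
    concat(fixed, "Hello ", "World");

    printf("%s\n", broken);                /* prints "Hello" */
    printf("%zu of %zu bytes visible\n",
           strlen(broken), sizeof broken); /* prints "5 of 12 bytes visible" */
    printf("%s\n", fixed);                 /* prints "Hello World" */
}
```

The "World" part of the broken string is still sitting in memory. printf and strlen just never get that far.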

Splitting the string - Finally

This part is not really a learning, and I think you are better off reading the code yourself.

The final example of how to use this library looks as follows:

int main(void) {
    cm_string_builder* builder = cm_string_builder_from_char_pointer("Alexander,Panov,C,Arrays,Suck");
    cm_string_builder_append_string_view(
        builder, cm_string_view_from_char_pointer(",Very,Hard")
    ); // create a string builder and append a string to it

    cm_string_view view = cm_string_builder_build(builder); // string builder to string view, frees the memory of string_builder
    cm_string_view args = cm_string_view_from_char_pointer(view.string); // actual char* with null byte

    cm_string_view* views = cm_string_view_split(args, ',');

    for (int i = 0; i != 100; i++) { // safety cap of 100 views
      cm_string_view current_view = views[i]; // index with i, not 0
      if (current_view.string == NULL)
        break;
      printf("%s\n", current_view.string);
    }
    return 0;
}

Get a deeper understanding

If you want to learn a bit more about this tiny, instantly shelved project, check out the GitHub repository:

https://github.com/IJustDev/char_magic/

Huge shout out to YouTube channels that taught me a lot about this recently: