Implementation of strtok C Library Function

Hello dear reader,

Today we will talk about the strtok function and how it is implemented.

How the function works

The C library function strtok breaks a string into tokens depending on the delimiter given.

Example: "Hope you are doing fine" broken down by the delimiter " " gives the tokens Hope, you, are, doing, fine. The short code below shows how it is used.

#include <string.h>
#include <stdio.h>

int main(void)
{
    char string[] = "Hope you are doing fine";
    char *delimiter = " ";
    char *token;

    printf("%s\n", string);

    token = strtok(string, delimiter);
    printf("%s\n", token);

    token = strtok(NULL, delimiter);
    printf("%s\n", token);

    token = strtok(NULL, delimiter);
    printf("%s\n", token);

    token = strtok(NULL, delimiter);
    printf("%s\n", token);

    token = strtok(NULL, delimiter);
    printf("%s\n", token);

    return (0);
}
code : example

Go ahead and run the above code with gcc main.c -o main && ./main and you get the tokenized output below.

Hope you are doing fine
Hope
you
are
doing
fine

For more explanation on how strtok works, see its manual page (man 3 strtok).
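
The five back-to-back calls above can also be folded into a loop, because strtok returns NULL once no tokens are left. Here is a minimal sketch of that pattern. The helper name count_tokens is made up for this post and is not part of the C library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints every token in `text` and returns how many
   were found. `text` must be writable because strtok modifies it. */
size_t count_tokens(char *text, const char *delimiter)
{
    size_t count = 0;
    char *token = strtok(text, delimiter);   /* first call: pass the string */

    while (token != NULL)
    {
        printf("%s\n", token);
        count++;
        token = strtok(NULL, delimiter);     /* later calls: pass NULL */
    }
    return (count);
}
```

Called on a writable array holding "Hope you are doing fine" with " " as the delimiter, it prints the same five tokens and returns 5.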

Implementation of strtok

The implementation is taken from the glibc project. I will post the code and then go through it piece by piece.

char *_strtok(char *s, const char *delim)
{
    static char *olds;
    char *token;

    if (s == NULL)
        s = olds;

    /* Scan leading delimiters.  */
    s += strspn(s, delim);

    /* If s points to the null byte \0, we have
        reached the end of the string, so we
        return NULL.
    */
    if (*s == '\0')
    {
        olds = s;
        return (NULL);
    }

    /* Find the end of the token.  */
    token = s;
    s = strpbrk(token, delim);
    if (s == NULL)
        /* This token finishes the string.  */
        olds = strchr(token, '\0');
    else
    {
        /* Terminate the token and make OLDS point past it.  */
        *s = '\0';
        olds = s + 1;
    }
    return (token);
}
code : 0
  • First we declare a static variable olds that remembers, between calls to _strtok, where the search for the next token should resume.

Static variables are like global variables in that they keep their value even after the function in which they are declared has returned; they do not live in the function's stack frame.

We also declare a variable token to store our tokenized string.

static char *olds;
char *token;
code: 1
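
To see a static local variable keeping its value between calls, here is a small sketch that is separate from strtok. The function name call_count is only an illustration.

```c
/* Hypothetical helper: returns how many times it has been called so far. */
int call_count(void)
{
    static int calls = 0;   /* initialized once, survives between calls */

    calls++;
    return (calls);
}
```

The first call returns 1, the second returns 2, and so on. olds relies on the same behaviour to remember its position in the string.
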
  • Next we check whether the string passed into the function is NULL. If so, s is made to point to olds, which is where the previous call left off.

More about this later. This should make sense if you know how strtok works.

if (s == NULL)
        s = olds;
code: 2
  • Then we skip the leading delimiters.

These are the delimiters that come before the next valid token.

The whole string may even start with delimiters, so we must skip them to get to the start of the next token.

strspn is a C library function that returns the length of the initial segment of the first string that consists only of characters from the second string.

Read more about strspn in its manual page (man 3 strspn).

Adding that length to s moves s past the delimiters, so it now points to the start of the next token.

    /* Scan leading delimiters.  */
    s += strspn(s, delim);
code: 3
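
A quick way to get a feel for strspn is to wrap the same expression in a tiny function and try it on a few strings. The name skip_delims below is hypothetical and exists only for this sketch.

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to the first character of `s`
   that is not in `delim`, or to the terminating '\0' if there is none. */
const char *skip_delims(const char *s, const char *delim)
{
    return (s + strspn(s, delim));
}
```

For example, strspn("  ,Hope", " ,") returns 3, so skip_delims("  ,Hope you", " ,") points at "Hope you".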

  • If s now points to a null byte \0, we have reached the end of the string and there is nothing left to tokenize.

We just make olds point to s (which points at \0) and return NULL.

The string may in fact contain only delimiters, and this check takes care of that case too.

/* If s points to the null byte \0, we have
        reached the end of the string, so we
        return NULL.
*/
    if (*s == '\0')
    {
        olds = s;
        return (NULL);
    }
code: 4
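
The delimiter-only case can be checked against the standard strtok with a short sketch. The helper has_token is an assumption made for illustration, not a library function.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `text` contains at least one token,
   0 if it is empty or made up of delimiters only. */
int has_token(char *text, const char *delim)
{
    return (strtok(text, delim) != NULL);
}
```

With text holding ",,," and "," as the delimiter, the very first call to strtok already returns NULL, so has_token returns 0.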
  • Next we need to find the end of the present token using strpbrk.

Read more about strpbrk in its manual page (man 3 strpbrk).

We first make token point at s, which is the beginning of the new token.

We do this so that we still have a reference to the start of the token after s is moved.

/* Find the end of the token.  */
    token = s;
    s = strpbrk(token, delim);
code: 5
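
As a rough illustration of what strpbrk gives us here, the sketch below measures how long the current token is. The helper token_length is hypothetical and is not part of the glibc code.

```c
#include <string.h>

/* Hypothetical helper: length of the token that starts at `token`,
   i.e. the number of characters before the first delimiter or '\0'. */
size_t token_length(const char *token, const char *delim)
{
    const char *end = strpbrk(token, delim);

    if (end == NULL)              /* no delimiter left: token runs to the end */
        return (strlen(token));
    return ((size_t)(end - token));
}
```

For "Hope you" with " " as the delimiter it returns 4, and for "fine", where no delimiter follows, it also returns 4.
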
  • Next we check whether s is now NULL or whether it points to a delimiter.

strpbrk returns a pointer to the first character in s1 that matches any of the characters in s2, or NULL if there is no such character. Here s1 is token and s2 is delim.

If it returns NULL, no delimiter follows the token, so this is the last token in the string. We use the strchr function to make olds point to the terminating null byte, so the next call finds nothing left and returns NULL.

If it returns a non-NULL value, s points to the delimiter that ends the current token. We terminate the token by setting *s = '\0', then set olds = s + 1 so that it points just past the token, which is where the search for the next token will resume.

if (s == NULL)
        /* This token finishes the string.  */
        olds = strchr(token, '\0');
    else
    {
        /* Terminate the token and make OLDS point past it.  */
        *s = '\0';
        olds = s + 1;
    }
code: 6
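
Because the token is terminated by writing '\0' over the delimiter, the caller's buffer is changed in place. Here is a rough sketch to make that visible, using the standard strtok. The helper count_nul_bytes_after_split is made up for this post.

```c
#include <string.h>

/* Hypothetical helper: tokenizes `buf` completely, then returns the number
   of '\0' bytes found in its first `len` bytes. */
size_t count_nul_bytes_after_split(char *buf, size_t len, const char *delim)
{
    size_t i;
    size_t nuls = 0;
    char *token = strtok(buf, delim);

    while (token != NULL)             /* consume every token */
        token = strtok(NULL, delim);

    for (i = 0; i < len; i++)         /* inspect the modified buffer */
        if (buf[i] == '\0')
            nuls++;
    return (nuls);
}
```

For a buffer declared as char buf[] = "a,b,c" and called with sizeof(buf) and ",", both commas are overwritten, so the helper counts three null bytes: the two replaced delimiters plus the original terminator.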

  • Finally we return token, which points to the start of the null-terminated token.

return (token);

code: 7

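When _strtok is called again with NULL as its first argument, olds points either to the rest of the string, where the next token will be found, or to the terminating null byte, in which case NULL is returned. This repeats until the string is exhausted.

It takes a bit of practice to get used to, but I hope it makes a little sense.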

Feel free to ask your questions. See you next time.

Bye.