strtok implementation in C

Hello dear reader,

Today we will talk about the strtok function and how it is implemented.

How the function works

The C library function strtok breaks a string into a sequence of tokens, using any of the characters in a given delimiter string as separators.

Example: breaking "Hope you are doing fine" on the delimiter " " (a single space) gives the tokens Hope, you, are, doing and fine. The short program below shows how it is used.

#include <string.h>
#include <stdio.h>

int main(void)
{
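    /* strtok writes into the string, so it must be a modifiable array,
       not a pointer to a string literal */
    char string[] = "Hope you are doing fine";
    const char *delimiter = " ";
    char *token;

    printf("%s\n", string);

    /* First call: pass the string to tokenize */
    token = strtok(string, delimiter);
    printf("%s\n", token);

    /* Later calls: pass NULL so strtok continues where it stopped */
    token = strtok(NULL, delimiter);
    printf("%s\n", token);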

    token = strtok(NULL, delimiter);
    printf("%s\n", token);

    token = strtok(NULL, delimiter);
    printf("%s\n", token);

    token = strtok(NULL, delimiter);
    printf("%s\n", token);

    return (0);
}


Save the code above as main.c, then compile and run it with gcc main.c -o main && ./main. The program prints the original string followed by one token per line:

Hope you are doing fine
Hope
you
are
doing
fine

For more explanation on how strtok works, check out here

Implementation of strtok

This implementation is adapted from the GNU C Library (glibc). I will post the code first and then go through it line by line.

#include <string.h> /* strspn, strpbrk, strchr */

char *_strtok(char *s, const char *delim)
{
    static char *olds; /* where the next call should resume */
    char *token;

    if (s == NULL)
        s = olds;

    /* Scan leading delimiters.  */
    s += strspn(s, delim);

    /* if *s points to the null byte \0, that means
        we have reached the end of the string and
        we return NULL
    */
    if (*s == '\0')
    {
        olds = s;
        return (NULL);
    }

    /* Find the end of the token.  */
    token = s;
    s = strpbrk(token, delim);
    if (s == NULL)
        /* This token finishes the string.  */
        olds = strchr(token, '\0');
    else
    {
        /* Terminate the token and make OLDS point past it.  */
        *s = '\0';
        olds = s + 1;
    }
    return (token);
}


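First we declare a static variable olds that remembers, between calls to _strtok, where the next search should resume.

A static local variable has static storage duration: like a global variable, it is not destroyed when the function returns, so it keeps its value from one call to the next. Unlike a global, it is only visible inside the function. It is also initialized only once, here to a null pointer, since no initializer is given.

We also declare a variable token that will point to the start of the token we return.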

static char *olds;
char *token;


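Next we check whether the string passed in is NULL. A NULL argument means "continue with the same string", so s is set to olds, the position where the previous call stopped.

This will make more sense once we see how olds is updated at the end of the function. It matches how strtok is called: the string on the first call, NULL on every call after that.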

if (s == NULL)
        s = olds;


Then we skip the leading delimiters, that is, any delimiters that come before the next token.

The string may even start with delimiters, or the previous token may be followed by several delimiters in a row, so we must skip all of them to reach the start of the next token.

strspn is a C library function that returns the length of the initial segment of its first argument that consists only of characters from its second argument. Adding that length to s moves s past the delimiters.

    /* Scan leading delimiters.  */
    s += strspn(s, delim);

If s now points to the null byte \0, we have reached the end of the string and there is nothing left to tokenize.

We set olds to s (so it points at the \0) and return NULL.

This also covers a string made up entirely of delimiters: strspn skips all of them and s lands on the terminating \0.

    /* if *s points to the null byte \0, that means
       we have reached the end of the string and
       we return NULL
    */
    if (*s == '\0')
    {
        olds = s;
        return (NULL);
    }


Next we find the end of the current token. We save its start in token and use strpbrk to search for the first delimiter after it.

    /* Find the end of the token.  */
    token = s;
    s = strpbrk(token, delim);

strpbrk is a C library function that returns a pointer to the first character in its first argument that matches any character in its second argument, or NULL if there is no such character.

So after the call, s is either NULL or it points at the delimiter right after the current token.

If s is NULL, no delimiter follows, so this token runs to the end of the string. We set olds to point at the terminating \0 (strchr(token, '\0') finds it), so that the next call returns NULL, and we still return this last token.

If s is not NULL, it points at the end of the token. We overwrite that delimiter with \0 to terminate the token in place, and set olds to s + 1, just past it, which is where the next call will start searching.

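    if (s == NULL)
        /* This token finishes the string.  */
        olds = strchr(token, '\0');
    else
    {
        /* Terminate the token and make OLDS point past it.  */
        *s = '\0';
        olds = s + 1;
    }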

Finally we return token, which points to the start of the token, now terminated by a null byte.

    return (token);

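When _strtok is called again with NULL, it resumes from olds. olds points either just past the previous token (any extra delimiters there are skipped by strspn) or at the terminating \0, in which case the call returns NULL. The same steps repeat on every call until the string runs out of tokens.

It takes a bit of practice getting used to, but I hope it makes a little more sense now.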

Feel free to ask your questions. See you next time.

Bye.
