Thursday, April 16, 2015

Source code for splitting a string into substrings at a specified delimiter in C (line parsing)

While writing a program, we may run into a situation where we need to read a string and separate it at a certain symbol or delimiter. Here I have written a simple program that parses a string at the newline character.

Requirement:

Let's say I have a string such as "this \n is  \n a \n test \n program" and I need to read it and split it into its parts. To achieve this, string.h provides the strtok function and its reentrant variant strtok_r, which is the one this program uses.

Source code:
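#include <stdio.h>
#include <string.h>

int main(void)
{
    char *token, *remstr = NULL;
    char str[] = " this \n is \n the \n test\n program";

    /* First call: pass the string itself. */
    token = strtok_r(str, "\n", &remstr);

    while (token != NULL)
    {
        printf("here i=%s\n", token);
        /* remstr may be NULL after the last token on some platforms,
           so never pass it to %s unchecked. */
        printf("remstr=%s\n", remstr != NULL ? remstr : "(null)");

        /* Later calls: pass NULL to continue with the same string. */
        token = strtok_r(NULL, "\n", &remstr);
    }

    return 0;
}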

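The strtok_r function takes the string, the set of separator characters, and the address of a pointer (remstr here) in which it keeps its position between calls. It scans for the separator, overwrites it with a null character, and returns a pointer to the separated substring, which we store in token. After each call, remstr points to the part of the string that has not been parsed yet. The first call passes the string itself; every later call passes NULL so that parsing continues where the previous call stopped. The loop repeats as long as token is not NULL. Once the last substring has been returned, the next call returns NULL, and this NULL marks the end of the string.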

Note: After the last token has been returned, remstr holds a null pointer on Mac, whereas on Ubuntu/Linux it points to an empty string. Portable code should therefore not depend on the value of remstr at that point.

Similarly, we can split a string into substrings at any other separator the program requires, using the strtok or strtok_r function declared in the string.h header.


In a large program, we may need to accumulate the separated tokens or substrings into an array of strings and return that array to the calling function, so that it can perform a specific task for each substring.

To accumulate the substrings, I have used the singly linked list (GSList) from the glib library.

Source code:
#include <ctype.h>   /* isspace */
#include <stdio.h>
#include <string.h>
#include <glib.h>

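/* Trim leading and trailing whitespace in place and return a pointer
   to the first non-space character inside the original buffer. */
char *col_trim_whitespace(char *str)
{
  char *end;

  // Trim leading space (the cast avoids undefined behavior for negative char values)
  while(isspace((unsigned char)*str)) str++;

  if(*str == 0)  // All spaces?
    return str;

  // Trim trailing space
  end = str + strlen(str) - 1;
  while(end > str && isspace((unsigned char)*end)) end--;

  // Write new null terminator
  *(end+1) = 0;

  return str;
}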


/* Split str at newlines and append each non-empty, trimmed line to list.
   The list stores pointers into str, so str must outlive the list. */
GSList* line_parser(char *str, GSList* list)
{
    char *token, *remstr = NULL;

    token = strtok_r(str, "\n", &remstr);

    while(token != NULL)
    {
        /* Strip surrounding whitespace and skip lines that contained
           nothing but whitespace. */
        token = col_trim_whitespace(token);
        if(*token != '\0')
            list = g_slist_append(list, token);

        token = strtok_r(NULL, "\n", &remstr);
    }

    return list;
}

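int main(void)
{
    char str[] = " this";
    GSList *list = NULL;
    GSList *node;

    list = line_parser(str, list);
    // printf("The list is now %u items long\n", g_slist_length(list));

    /* Walk the list node by node; calling g_slist_nth() in a loop
       would rescan the list from the head on every iteration. */
    for(node = list; node != NULL; node = node->next)
        printf("string = %s\n", (char *)node->data);

    /* Frees the list nodes only; the strings themselves live in str. */
    g_slist_free(list);
    return 0;
}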
