How to split a string in C/C++, Python and Java?
Last Updated: 18 Apr, 2023
Splitting a string by some delimiter is a very common task. For example, we have a comma-separated list of items from a file and we want individual items in an array.
Almost all programming languages provide a function to split a string by some delimiter.
In C:
// Splits str[] according to the given delimiters
// and returns the next token. It needs to be called
// in a loop to get all tokens. It returns NULL
// when there are no more tokens.
char *strtok(char *str, const char *delims);
C
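#include <stdio.h>
#include <string.h>

int main()
{
    /* strtok() modifies its input, so str must be a writable array. */
    char str[] = "Geeks-for-Geeks";

    /* The first call receives the string; later calls receive NULL. */
    char *token = strtok(str, "-");
    while (token != NULL)
    {
        printf("%s\n", token);
        token = strtok(NULL, "-");
    }
    return 0;
}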
Output: Geeks
for
Geeks
Time complexity: O(n)
Auxiliary Space: O(1), since strtok() tokenizes the buffer in place
In C++:
Note: The main disadvantage of strtok() is that it works only on C-style strings.
Therefore a C++ string must first be copied into a modifiable char array.
Many programmers are unaware that C++ offers other approaches that are more elegant
and work directly with C++ strings.
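As a rough illustration of the conversion the note describes, the sketch below copies a C++ string into a writable buffer before handing it to strtok(). The helper name split_with_strtok and the choice to return a std::vector are assumptions made for this example; they are not part of the article's code.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (name chosen for this example): tokenizes a C++ string
// with strtok() and returns the tokens.
std::vector<std::string> split_with_strtok(const std::string& s, const char* delims)
{
    assert(delims != nullptr);  // strtok() requires a valid delimiter string

    // strtok() overwrites delimiters with '\0', so it needs a writable,
    // null-terminated copy of the input.
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

Note that strtok() treats a run of delimiters as a single delimiter, so calling this helper on "a,,b" with "," yields two tokens rather than three. It also keeps hidden internal state between calls, so it should not be nested or shared across threads.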
Method 1: Using stringstream API of C++
Prerequisite: stringstream API
A stringstream object can be initialized with a string object, and extracting from it tokenizes the string on whitespace. Just like the "cin" stream, a stringstream lets you read a string as a stream of words. Alternatively, the getline() function can tokenize the string on any single-character delimiter.
Some of the most commonly used functions of stringstream:
clear(): resets the stream's error state flags (it does not empty the buffer).
str(): gets or sets the contents of the stream as a C++ string object.
operator <<: pushes a string object into the stream.
operator >>: extracts a word from the stream.
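To see the four functions listed above working together, here is a minimal sketch that reuses one stream for several lines and collapses the whitespace in each. The function name normalize_lines is hypothetical and chosen only for this illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name chosen for this example): rewrites each line
// with exactly one space between words.
std::vector<std::string> normalize_lines(const std::vector<std::string>& lines)
{
    std::stringstream in;  // reused for every line
    std::vector<std::string> result;
    for (const std::string& line : lines) {
        in.clear();    // reset eofbit/failbit left over from the previous line
        in.str(line);  // replace the stream contents with the new line

        std::ostringstream out;
        std::string word;
        bool first = true;
        while (in >> word) {             // operator>> skips whitespace between words
            if (!first) out << ' ';      // operator<< appends to the output stream
            out << word;
            first = false;
        }
        result.push_back(out.str());     // str() returns the accumulated text
    }
    return result;
}
```

Calling clear() before str() matters when a stream is reused: reading to the end of one line leaves the stream in a failed state, and str() alone does not reset it.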
The code below demonstrates it.
C++
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Splits s on whitespace and prints one word per line.
void simple_tokenizer(const string& s)
{
    stringstream ss(s);
    string word;
    while (ss >> word) {
        cout << word << endl;
    }
}

// Splits s on the single character del and prints one token per line.
void adv_tokenizer(const string& s, char del)
{
    stringstream ss(s);
    string word;
    // Test getline() itself rather than eof(), so a failed read is never printed.
    while (getline(ss, word, del)) {
        cout << word << endl;
    }
}

int main()
{
    string a = "How do you do!";
    string b = "How$do$you$do!";
    simple_tokenizer(a);
    cout << endl;
    adv_tokenizer(b, '$');
    cout << endl;
    return 0;
}
Output:
How
do
you
do!

How
do
you
do!
Time Complexity: O(n)
Auxiliary Space: O(n)
Where n is the length of the input string.
Method 2: Using C++ find() and substr() APIs.
Prerequisite: find function and substr().
This method is more robust: it can split a string on any delimiter, including multi-character ones, not just spaces (in the code below, the delimiter defaults to a single space). The logic is easy to follow from the code.
C++
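#include <iostream>
#include <string>
using namespace std;

// Prints each token of s, using del as a (possibly multi-character) delimiter.
void tokenize(const string& s, const string& del = " ")
{
    string::size_type start = 0, end;
    // find() returns string::npos when del does not occur again.
    while ((end = s.find(del, start)) != string::npos) {
        cout << s.substr(start, end - start) << endl;
        start = end + del.size();
    }
    cout << s.substr(start) << endl; // last token, after the final delimiter
}

int main()
{
    string a = "How$%do$%you$%do$%!";
    tokenize(a, "$%");
    cout << endl;
    return 0;
}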
Output: How
do
you
do
!
Time Complexity: O(n)
Auxiliary Space: O(1)
Where n is the length of the input string.
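In practice the tokens are usually collected into a container rather than printed, as in the comma-separated list example from the introduction. A minimal sketch of the same find() and substr() loop that returns a vector follows; the function name split_by_delimiter is an assumption made for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (name chosen for this example): splits s on every
// occurrence of del and returns the pieces, including empty ones.
std::vector<std::string> split_by_delimiter(const std::string& s, const std::string& del)
{
    // An empty delimiter matches at every position, so the loop would never advance.
    assert(!del.empty());

    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type end;
    while ((end = s.find(del, start)) != std::string::npos) {
        tokens.push_back(s.substr(start, end - start));
        start = end + del.size();  // continue just past the delimiter
    }
    tokens.push_back(s.substr(start));  // remainder after the last delimiter
    return tokens;
}
```

Unlike strtok(), this version keeps empty tokens between adjacent delimiters, which is usually what is wanted for comma-separated data with missing fields. Storing the returned tokens takes O(n) extra space.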
Method 3: Using temporary string
If the delimiter is known to be a single character, you can simply build each token in a temporary string while scanning the input once. This avoids the repeated find() and substr() calls of Method 2.
C++
#include <iostream>
#include <string>
using namespace std;

// Prints the tokens of str separated by spaces, splitting on the character del.
void split(const string& str, char del)
{
    string temp = ""; // token being built
    for (string::size_type i = 0; i < str.size(); i++) {
        if (str[i] != del) {
            temp += str[i];
        }
        else {
            cout << temp << " ";
            temp = "";
        }
    }
    cout << temp; // last token
}

int main()
{
    string str = "geeks_for_geeks";
    char del = '_';
    split(str, del);
    return 0;
}
Output: geeks for geeks
Time complexity: O(n)
Auxiliary Space: O(n)
In Java:
In Java, split() is a method of the String class.
// regex is the delimiting regular expression;
// a positive limit is the maximum number of strings returned
// (the last string then holds the unsplit remainder)
public String[] split(String regex, int limit);
// We can also call split() without a limit
public String[] split(String regex);
Java
public class Test
{
    public static void main(String[] args)
    {
        String str = "Geeks-for-Geeks";

        // At most 2 strings: the remainder stays unsplit.
        for (String val : str.split("-", 2))
            System.out.println(val);

        System.out.println("");

        // No limit: split at every "-".
        for (String val : str.split("-"))
            System.out.println(val);
    }
}
Output:
Geeks
for-Geeks

Geeks
for
Geeks
Time complexity: O(n)
Auxiliary Space: O(n), for the returned array of substrings
In Python:
The split() method in Python returns a list of strings after breaking the given string by the specified separator.
# sep is the delimiter string (a plain string, not a regular expression);
# if it is omitted or None, the string is split on runs of whitespace
# maxsplit limits the number of splits to be made (-1 means no limit)
str.split(sep=None, maxsplit=-1)
Python3
line = "Geek1 \nGeek2 \nGeek3"
print(line.split())
print(line.split(' ', 1))
Output:
['Geek1', 'Geek2', 'Geek3']
['Geek1', '\nGeek2 \nGeek3']
Time Complexity: O(N), since it traverses the string once looking for separators.
Auxiliary Space: O(N), for the list of substrings that is returned.
This article is contributed by Aarti_Rathi and Aditya Chatterjee.