Sometimes, while working with Python tuples, we can have a problem in which we need to perform concatenation of records from the similarity of initial element. This problem can have applications in data domains such as Data Science. Let’s discuss certain ways in which this task can be performed.
Input : test_list = [(5, 6), (5, 7), (5, 8), (6, 10), (7, 13)]
Output : [(5, 6, 7, 8), (6, 10), (7, 13)]
Input : test_list = [(5, 6), (6, 7), (6, 8), (6, 10), (7, 13)]
Output : [(5, 6), (6, 7, 8, 10), (7, 13)]
Method #1 : Using loop
This is brute way in which this task can be done. In this, we create new tuple, if we find no occurrence of similar tuple values previously. Slicing is used to add rest of elements to created tuple.
Python3
test_list = [( 5 , 6 ), ( 5 , 7 ), ( 6 , 8 ), ( 6 , 10 ), ( 7 , 13 )]
print ( "The original list is : " + str (test_list))
res = []
for sub in test_list:
if res and res[ - 1 ][ 0 ] = = sub[ 0 ]:
res[ - 1 ].extend(sub[ 1 :])
else :
res.append([ele for ele in sub])
res = list ( map ( tuple , res))
print ( "The extracted elements : " + str (res))
|
Output :
The original list is : [(5, 6), (5, 7), (6, 8), (6, 10), (7, 13)]
The extracted elements : [(5, 6, 7), (6, 8, 10), (7, 13)]
Time complexity: O(n) where n is the number of tuples in the list.
Auxiliary space: O(n) where n is the number of tuples in the list.
Method #2 : Using defaultdict() + loop
The combination of above functions can be used to solve this problem. The advantages it offers from above method are, that it reduces one check of initializing new key, and also works well even if similar elements are not consecutive.
Python3
from collections import defaultdict
test_list = [( 5 , 6 ), ( 5 , 7 ), ( 6 , 8 ), ( 6 , 10 ), ( 7 , 13 )]
print ( "The original list is : " + str (test_list))
mapp = defaultdict( list )
for key, val in test_list:
mapp[key].append(val)
res = [(key, * val) for key, val in mapp.items()]
print ( "The extracted elements : " + str (res))
|
Output :
The original list is : [(5, 6), (5, 7), (6, 8), (6, 10), (7, 13)]
The extracted elements : [(5, 6, 7), (6, 8, 10), (7, 13)]
Time Complexity: O(n)
Auxiliary Space: O(n)
Method #3: Using for loop
Python3
test_list = [( 5 , 6 ), ( 5 , 7 ), ( 6 , 8 ), ( 6 , 10 ), ( 7 , 13 )]
print ( "The original list is : " + str (test_list))
res = []
x = []
for i in test_list:
if i[ 0 ] not in x:
x.append(i[ 0 ])
for i in x:
p = []
p.append(i)
for j in test_list:
if i = = j[ 0 ]:
p.append(j[ 1 ])
res.append(p)
print ( "The extracted elements : " + str (res))
|
Output
The original list is : [(5, 6), (5, 7), (6, 8), (6, 10), (7, 13)]
The extracted elements : [[5, 6, 7], [6, 8, 10], [7, 13]]
Time complexity: O(n^2), where n is the length of the input list test_list.
Auxiliary space: O(n), where n is the length of the input list test_list.
Method #4: using dictionary and list comprehension
step-by-step algorithm for the given approach
- Initialize an empty dictionary temp_dict to store the unique initial elements as keys and a list of corresponding second elements as values.
- Iterate through each tuple x in the given list test_list.
- If the initial element of x already exists as a key in temp_dict, then append the second element of x to the corresponding value list.
If the initial element of x does not exist as a key in temp_dict, then add the initial element as a new key and a list containing the second element as its value.
- After all tuples are processed, create a new list res by iterating through the key-value pairs of temp_dict, and joining each key with its corresponding list of values.
- Return the final list res as the result.
Python3
test_list = [( 5 , 6 ), ( 5 , 7 ), ( 6 , 8 ), ( 6 , 10 ), ( 7 , 13 )]
print ( "The original list is : " + str (test_list))
temp_dict = {}
for x in test_list:
temp_dict[x[ 0 ]] = temp_dict.get(x[ 0 ], []) + list (x[ 1 :])
res = [(k,) + tuple (v) for k, v in temp_dict.items()]
print ( "The extracted elements : " + str (res))
|
Output
The original list is : [(5, 6), (5, 7), (6, 8), (6, 10), (7, 13)]
The extracted elements : [(5, 6, 7), (6, 8, 10), (7, 13)]
Time Complexity:
The time complexity of the above algorithm is O(n), where n is the length of test_list. It requires one traversal of test_list and temp_dict.
Auxiliary Space:
The auxiliary space complexity of the above algorithm is O(n), where n is the length of test_list. It requires temp_dict to store the values extracted from test_list. The size of temp_dict will be proportional to the number of unique keys in test_list.
Method #5: Using itertools.groupby()
Step-by-step approach:
- Import groupby from itertools.
- Loop through the groups in test_list generated by groupby function. Group the tuples based on their first element (key=lambda x: x[0]).
- Extract the values of each group and store them in a list (values).
- Append a new tuple to the result list res that contains the key and all the values.
- Print the result.
Below is the implementation of the above approach:
Python3
from itertools import groupby
test_list = [( 5 , 6 ), ( 5 , 7 ), ( 6 , 8 ), ( 6 , 10 ), ( 7 , 13 )]
print ( "The original list is : " + str (test_list))
res = []
for k, g in groupby(test_list, key = lambda x: x[ 0 ]):
values = [v for _, v in g]
res.append((k, * values))
print ( "The extracted elements : " + str (res))
|
Output
The original list is : [(5, 6), (5, 7), (6, 8), (6, 10), (7, 13)]
The extracted elements : [(5, 6, 7), (6, 8, 10), (7, 13)]
Time complexity: O(n log n), where n is the length of test_list because we are sorting the input list by the first element before grouping.
Auxiliary space: O(n), where n is the length of test_list because we are creating a new list to store the result.
Method #6: Using pandas
Import the pandas library.
Create a DataFrame from the list test_list, with column names “A” and “B” for the two elements in each tuple.
Group the DataFrame by column “A” and aggregate column “B” as a list for each group.
Reset the index of the resulting DataFrame.
Convert the DataFrame to a list of tuples.
Python3
test_list = [( 5 , 6 ), ( 5 , 7 ), ( 6 , 8 ), ( 6 , 10 ), ( 7 , 13 )]
import pandas as pd
df = pd.DataFrame(test_list, columns = [ "A" , "B" ])
grouped = df.groupby( "A" )[ "B" ]. apply ( list )
res = [ tuple ([k] + v) for k, v in grouped.items()]
print ( "The extracted elements : " + str (res))
|
OUTPUT :
The extracted elements : [(5, 6, 7), (6, 8, 10), (7, 13)]
Time complexity: O(n log n), where n is the length of test_list.
Auxiliary space: O(n), where n is the length of test_list.
Method #7 : Using recursion
Step-by-step approach:
Define a recursive function that takes the input list and an index as parameters.
Inside the function, check if the index is equal to the length of the list minus 1. If it is, return the list.
Otherwise, compare the initial element of the tuple at the current index with the initial element of the tuple at the next index.
If they are the same, concatenate the tuples and recursively call the function with the updated list and index.
If they are different, recursively call the function with the original list and the next index.
Call the recursive function with the input list and index 0.
Print the extracted elements.
Python3
def join_tuples(test_list, index):
if index = = len (test_list) - 1 :
return test_list
elif test_list[index][ 0 ] = = test_list[index + 1 ][ 0 ]:
test_list[index] + = test_list[index + 1 ][ 1 :]
test_list.pop(index + 1 )
return join_tuples(test_list, index)
else :
return join_tuples(test_list, index + 1 )
test_list = [( 5 , 6 ), ( 5 , 7 ), ( 6 , 8 ), ( 6 , 10 ), ( 7 , 13 )]
print ( "The original list is : " + str (test_list))
res = join_tuples(test_list, 0 )
print ( "The extracted elements : " + str (res))
|
Output
The original list is : [(5, 6), (5, 7), (6, 8), (6, 10), (7, 13)]
The extracted elements : [(5, 6, 7), (6, 8, 10), (7, 13)]
The time complexity of this method is O(n^2) in the worst case, where n is the length of the input list.
The auxiliary space complexity is O(1) since the modifications are done in-place.
Please Login to comment...