Convert multiple JSON files to CSV Python
In this article, we will learn how to convert multiple JSON files to CSV file in Python. Before that just recall some terms :
- JSON File: A JSON file may be a file that stores simple data structures and objects in JavaScript Object Notation (JSON) format, which may be a standard data interchange format. It is primarily used for transmitting data between an internet application and a server.
- CSV File: A CSV may be a comma-separated values file, which allows data to be saved during a tabular format. CSVs appear as if a garden-variety spreadsheet but with a .CSV extension. CSV files are often used with almost any spreadsheet program, like Microsoft Excel or Google Spreadsheets.
To form a CSV file from multiple JSON files, we have to use nested json file, flatten the dataframe or to load the json files into the form of dataframe, concatenate/merge/join these to form one dataframe (at least one column should be same in all json files) and at last convert this dataframe into CSV file. This full procedure of the given task can be understood with the help of examples which are given below :
Example 1: If all columns match
In this example, we will load two json files, concatenate one to another and convert to a CSV file. The json files used for this are :
file1.json
{ "ID":{ "0":23, "1":43, "2":12, "3":13, "4":67, "5":89 }, "Name":{ "0":"Ram", "1":"Deep", "2":"Yash", "3":"Aman", "4":"Arjun", "5":"Aditya" }, "Marks":{ "0":89, "1":97, "2":45, "3":78, "4":56, "5":76 }, "Grade":{ "0":"B", "1":"A", "2":"F", "3":"C", "4":"E", "5":"C" } }
file2.json
{ "ID":{ "0":90, "1":56, "2":34, "3":96, "4":45 }, "Name":{ "0":"Akash", "1":"Chalsea", "2":"Divya", "3":"Sajal", "4":"Shubham" }, "Marks":{ "0":81, "1":87, "2":100, "3":89, "4":78 }, "Grade":{ "0":"B", "1":"B", "2":"A", "3":"B", "4":"C" } }
Step 1: Load the json files with the help of pandas dataframe.
Step 2 : Concatenate the dataframes into one dataframe.
Step 3: Convert the concatenated dataframe into CSV file.
The complete code with the result is shown below :
Code:
Python3
# importing packages import pandas as pd # load json file using pandas df1 = pd.read_json( 'file1.json' ) # view data print (df1) # load json file using pandas df2 = pd.read_json( 'file2.json' ) # view data print (df2) # use pandas.concat method df = pd.concat([df1,df2]) # view the concatenated dataframe print (df) # convert dataframe to csv file df.to_csv( "CSV.csv" ,index = False ) # load the resultant csv file result = pd.read_csv( "CSV.csv" ) # and view the data print (result) |
Output:
ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A 2 12 Yash 45 F 3 13 Aman 78 C 4 67 Arjun 56 E 5 89 Aditya 76 C ID Name Marks Grade 0 90 Akash 81 B 1 56 Chalsea 87 B 2 34 Divya 100 A 3 96 Sajal 89 B 4 45 Shubham 78 C ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A 2 12 Yash 45 F 3 13 Aman 78 C 4 67 Arjun 56 E 5 89 Aditya 76 C 0 90 Akash 81 B 1 56 Chalsea 87 B 2 34 Divya 100 A 3 96 Sajal 89 B 4 45 Shubham 78 C ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A 2 12 Yash 45 F 3 13 Aman 78 C 4 67 Arjun 56 E 5 89 Aditya 76 C 6 90 Akash 81 B 7 56 Chalsea 87 B 8 34 Divya 100 A 9 96 Sajal 89 B 10 45 Shubham 78 C
Example 2: If some columns match
In this example, we will load two json files, merge these and convert to a CSV file. The json files used for this are :
file3.json
{ "ID":{ "0":23, "1":43, "2":12, "3":13, "4":67, "5":89 }, "Name":{ "0":"Ram", "1":"Deep", "2":"Yash", "3":"Aman", "4":"Arjun", "5":"Aditya" }, "Marks":{ "0":89, "1":97, "2":45, "3":78, "4":56, "5":76 } }
file4.json
{ "ID":{ "0":23, "1":43, "2":12, "3":67, "4":89 }, "Name":{ "0":"Ram", "1":"Deep", "2":"Yash", "3":"Arjun", "4":"Aditya" }, "Grade":{ "0":"B", "1":"A", "2":"F", "3":"E", "4":"C" } }
Step 1: Load the json files with the help of pandas dataframe.
Step 2: Merge the dataframes by different methods as inner/outer/left/right joins.
Step 3: Convert the merged dataframe into CSV file.
The complete code with the result is shown below :
Code:
Python3
# importing packages import pandas as pd # load json file using pandas df1 = pd.read_json( 'file3.json' ) # view data print (df1) # load json file using pandas df2 = pd.read_json( 'file4.json' ) # view data print (df2) # use pandas.merge method df_inner = pd.merge(df1, df2, how = 'inner' , left_on = [ 'ID' , 'Name' ], right_on = [ 'ID' , 'Name' ]) df_outer = pd.merge(df1, df2, how = 'outer' , left_on = [ 'ID' , 'Name' ], right_on = [ 'ID' , 'Name' ]) df_left = pd.merge(df1, df2, how = 'left' , left_on = [ 'ID' , 'Name' ], right_on = [ 'ID' , 'Name' ]) df_right = pd.merge(df1, df2, how = 'right' , left_on = [ 'ID' , 'Name' ], right_on = [ 'ID' , 'Name' ]) # convert dataframe to csv file df_inner.to_csv( "CSV_inner.csv" , index = False ) df_outer.to_csv( "CSV_outer.csv" , index = False ) df_left.to_csv( "CSV_left.csv" , index = False ) df_right.to_csv( "CSV_right.csv" , index = False ) # load the resultant csv file result_inner = pd.read_csv( "CSV_inner.csv" ) result_outer = pd.read_csv( "CSV_outer.csv" ) result_left = pd.read_csv( "CSV_left.csv" ) result_right = pd.read_csv( "CSV_right.csv" ) # and view the data print (result_outer) print (result_inner) print (result_left) print (result_right) |
Output:
ID Name Marks 0 23 Ram 89 1 43 Deep 97 2 12 Yash 45 3 13 Aman 78 4 67 Arjun 56 5 89 Aditya 76 ID Name Grade 0 23 Ram B 1 43 Deep A 2 12 Yash F 3 67 Arjun E 4 89 Aditya C ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A 2 12 Yash 45 F 3 13 Aman 78 NaN 4 67 Arjun 56 E 5 89 Aditya 76 C ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A 2 12 Yash 45 F 3 67 Arjun 56 E 4 89 Aditya 76 C ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A 2 12 Yash 45 F 3 13 Aman 78 NaN 4 67 Arjun 56 E 5 89 Aditya 76 C ID Name Marks Grade 0 23 Ram 89 B 1 43 Deep 97 A 2 12 Yash 45 F 3 67 Arjun 56 E 4 89 Aditya 76 C
Example 3: If nested json file is given
In this example, we will load nested json file, flatten it and then convert into CSV file. The json file used for this is :
file5.json
{ "tickets":[ { "Name": "Liam", "Location": { "City": "Los Angeles", "State": "CA" }, "hobbies": [ "Piano", "Sports" ], "year" : 1985, "teamId" : "ATL", "playerId" : "barkele01", "salary" : 870000 }, { "Name": "John", "Location": { "City": "Los Angeles", "State": "CA" }, "hobbies": [ "Music", "Running" ], "year" : 1985, "teamId" : "ATL", "playerId" : "bedrost01", "salary" : 550000 } ], "count": 2 }
Step 1: Load the nested json file with the help of json.load() method.
Step 2: Flatten the different column values using pandas methods.
Step 3: Convert the flattened dataframe into CSV file.
Repeat the above steps for both the nested files and then follow either example 1 or example 2 for conversion. To convert a single nested json file follow the method given below.
The complete code with the result is shown below :
Code:
Python3
# importing packages import pandas as pd import json # load json file using json.load with open ( 'file5.json' ) as file : data = json.load( file ) # view data print (data) # form the dataframe df = pd.DataFrame(data[ 'tickets' ]) # view dataframe print (df) # flatten the dataframe and remove unnecessary columns for i, item in enumerate (df[ 'Location' ]): df[ 'location_city' ] = dict (df[ 'Location' ])[i][ 'City' ] df[ 'location_state' ] = dict (df[ 'Location' ])[i][ 'State' ] for i, item in enumerate (df[ 'hobbies' ]): df[ 'hobbies_{}' . format (i)] = dict (df[ 'hobbies' ])[i] df = df.drop({ 'Location' , 'hobbies' }, axis = 1 ) # view dataframe print (df) # convert dataframe to csv file df.to_csv( "CSV.csv" , index = False ) # load the resultant csv file result = pd.read_csv( "CSV.csv" ) # and view the data print (result) |
Output:
{‘tickets’: [{‘Name’: ‘Liam’, ‘Location’: {‘City’: ‘Los Angeles’, ‘State’: ‘CA’}, ‘hobbies’: [‘Piano’, ‘Sports’], ‘year’: 1985, ‘teamId’: ‘ATL’, ‘playerId’: ‘barkele01’, ‘salary’: 870000}, {‘Name’: ‘John’, ‘Location’: {‘City’: ‘Los Angeles’, ‘State’: ‘CA’}, ‘hobbies’: [‘Music’, ‘Running’], ‘year’: 1985, ‘teamId’: ‘ATL’, ‘playerId’: ‘bedrost01’, ‘salary’: 550000}], ‘count’: 2}
Location Name hobbies playerId \
0 {‘City’: ‘Los Angeles’, ‘State’: ‘CA’} Liam [Piano, Sports] barkele01
1 {‘City’: ‘Los Angeles’, ‘State’: ‘CA’} John [Music, Running] bedrost01
salary teamId year
0 870000 ATL 1985
1 550000 ATL 1985
Name playerId salary teamId year location_city location_state \
0 Liam barkele01 870000 ATL 1985 Los Angeles CA
1 John bedrost01 550000 ATL 1985 Los Angeles CA
hobbies_0 hobbies_1
0 Piano Music
1 Sports Running
Name playerId salary teamId year location_city location_state \
0 Liam barkele01 870000 ATL 1985 Los Angeles CA
1 John bedrost01 550000 ATL 1985 Los Angeles CA
hobbies_0 hobbies_1
0 Piano Music
1 Sports Running