Wednesday, September 16, 2015

Transposing csv data using Python

Here is the case:

  • You have this original csv data:

name,course,number
A,Genius,12
A,Super,13
B,Goddess,15
C,The Man,17
C,The Woman,10
C,Harvard,11
C,StanFord,18

  • and you want to create another csv file that uses name's values as columns and group the course's value as rows:

A,B,C
Genius,Goddess,The Man
Super,'',The Woman
'','',Harvard
'','',StanFord


Here is what I did to achieve that:

1. Read the original csv file and convert it into a list of dicts

2. Transpose the list by grouping the name column:

  • Get all the name's value

group_cols = []
for o in original_list:
if o[group_col_name].strip() not in group_cols:
group_cols.append(o[group_col_name].strip())

  • For each of the name, loop through the original list and append the according course's value to a tmp list (with the name as the first item), then add all of the tmp lists to a new list:
thelist = []
for c in group_cols:
line = []
line.append(c)
for o in original_list:
if o[group_col_name].strip() == c:
line.append(o[data_col_name].strip())
thelist.append(line)

=> thelist = [['A', 'Genius', 'Super'], ['B', 'Goddess'], ['C', 'The Man', 'The Woman', 'Harvard', 'Stanford']]
  • Equalize all the child lists's length (here):
thelist = equalize_lists_length(thelist)

=> thelist = [['A', 'Genius', 'Super', '',''], ['B', 'Goddess','','',''], ['C', 'The Man', 'The Woman', 'Harvard', 'Stanford']]
  • Transpose the new list using itertools.izip:
thelist = list(izip(*thelist))


=> thelist = [['A', 'B', 'C'], ['Genius', 'Goddess', 'The Man'],...]

3. Write the new list to the csv file

Here is the full code:



Usage:

>>> transposed_list_to_csv('original_csv_path', 'new_csv_path', 'name', 'course')