Thursday, December 5, 2013

Python - A hack to process a long list of data faster

I learned a great hack from a colleague for processing a long list of data faster. Here is the case:


Assume I have a list of thousands of users, and for each one I need to send an email and do some other work.

1. First, as I loop through the users, I convert each username to a number with:

def str_to_num(input_str):
    return_num = 0
    for ch in input_str:
        return_num += ord(ch)

    return return_num




2. Get the result of:

num = translated_username % 10


3. If the result of the previous calculation is one of the condition numbers I pass in when I run the script, process that user (send the email, ...):


if num in condition_number:
    process()
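
As a quick check of the three steps above, here is a small sketch that assigns a few made-up usernames to their numbers. The usernames and the `bucket_of` helper are hypothetical; `str_to_num` does the same thing as the function in step 1. A deterministic function like this matters because Python 3's built-in `hash()` returns different values for the same string in different processes by default, so it could not be used to split the work across separate runs.

```python
def str_to_num(input_str):
    # Sum of the character codes, as in step 1
    return sum(ord(ch) for ch in input_str)


def bucket_of(username):
    # Hypothetical helper: steps 1 and 2 combined
    return str_to_num(username) % 10


# Made-up usernames, for illustration only
for name in ["alice", "bob", "carol"]:
    print(name, str_to_num(name), bucket_of(name))

# alice 510 0
# bob 307 7
# carol 529 9
```

So a run started with the condition `0,1` would process alice and skip bob and carol.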



The main function:

...

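import csv
import sys


def get_dict_data_from_csv_file(csv_file_path):
    # newline='' lets the csv module handle line endings itself (Python 3)
    with open(csv_file_path, newline='') as csv_file:
        # Guess the delimiter (tab, comma or semicolon) from the first 10000 characters
        sniffdialect = csv.Sniffer().sniff(csv_file.read(10000), delimiters='\t,;')
        # Rewind so the reader starts at the header row
        csv_file.seek(0)
        dict_reader = csv.DictReader(csv_file, dialect=sniffdialect)
        # One dict per row, keyed by the header names
        dict_data = list(dict_reader)

    return dict_data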
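
To make the return value concrete, here is a minimal sketch of the same sniff-then-read approach run on an in-memory sample instead of a file. It assumes Python 3; the column names and rows are made up, and the only column the script actually relies on is `username`.

```python
import csv
import io

# Made-up sample data; a real file would come from sys.argv[1]
sample = "username;email\nalice;alice@example.com\nbob;bob@example.com\n"

csv_file = io.StringIO(sample)
dialect = csv.Sniffer().sniff(csv_file.read(10000), delimiters='\t,;')
csv_file.seek(0)  # rewind so DictReader sees the header row
records = list(csv.DictReader(csv_file, dialect=dialect))

print(dialect.delimiter)
print([record['username'] for record in records])
```

Each record is a dict keyed by the header row, which is why the main loop can read `dat['username']`.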



def main_function(src_data, condition_number):

    for dat in src_data:
        num = str_to_num(dat['username']) % 10
        if num in condition_number:
            process()
...


if __name__ == '__main__':
    src_data = get_dict_data_from_csv_file(sys.argv[1])


    # Build a real list: in Python 3, map() returns a one-shot iterator
    condition_number = [int(n) for n in sys.argv[2].split(',')]


    main_function(src_data, condition_number)




Now I can split the whole list into 10 parts, one for each possible value of the condition number from 0 to 9, and process them simultaneously:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9

For example, I can run the script in 5 different terminal windows, each with a different group of condition numbers:

Terminal #0:

$ python myscript /home/trinh/Documents/data.csv 0,1

Terminal #1:

$ python myscript /home/trinh/Documents/data.csv 2,3

Terminal #2:

$ python myscript /home/trinh/Documents/data.csv 4,5

Terminal #3:

$ python myscript /home/trinh/Documents/data.csv 6,7

Terminal #4:

$ python myscript /home/trinh/Documents/data.csv 8,9
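
One thing worth checking before launching the terminals is that the groups cover every value from 0 to 9 exactly once, so that no user is skipped or emailed twice. Here is a small sketch of that check, with the five groups taken from the commands above and a made-up list of usernames:

```python
def str_to_num(input_str):
    return sum(ord(ch) for ch in input_str)


# The condition groups passed to the five terminals above
groups = ["0,1", "2,3", "4,5", "6,7", "8,9"]
parsed = [[int(n) for n in g.split(',')] for g in groups]

# Every number 0..9 must appear in exactly one group
all_numbers = sorted(n for group in parsed for n in group)
assert all_numbers == list(range(10))

# Made-up usernames: each one is picked up by exactly one terminal
usernames = ["alice", "bob", "carol", "dave"]
for name in usernames:
    num = str_to_num(name) % 10
    owners = [i for i, group in enumerate(parsed) if num in group]
    print(name, "-> terminal", owners)
    assert len(owners) == 1
```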


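This means the job can finish up to about 5 times faster than a single run, as long as the usernames spread evenly across the groups and the machine (and whatever the script talks to, such as the mail server) can keep up with 5 processes at once.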
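
How close the real speedup gets to 5 depends on how evenly the usernames fall into the groups, because the slowest terminal decides when the whole job is done. Here is a rough way to estimate it, assuming every user takes about the same time to process. The `estimated_speedup` helper is hypothetical and the usernames are generated purely for illustration:

```python
from collections import Counter


def str_to_num(input_str):
    return sum(ord(ch) for ch in input_str)


def estimated_speedup(usernames, groups):
    # Count how many users fall into each number 0..9
    buckets = Counter(str_to_num(name) % 10 for name in usernames)
    # Work per terminal = total users matching that terminal's numbers
    loads = [sum(buckets[n] for n in group) for group in groups]
    # The busiest terminal finishes last, so it bounds the speedup
    return len(usernames) / max(loads)


groups = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
usernames = ["user%d" % i for i in range(1000)]  # made-up data
print(estimated_speedup(usernames, groups))
```

The result can never exceed the number of terminals, and it drops when one group ends up with far more users than the others.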

Awesome!