Monday, October 3, 2016

COUB popular songs analysis

I've written a simple Python script to fetch all of the best-of-the-week coubs.
The first version of the script was really slow: it took almost 5 minutes to fetch all 6084 coubs.
So I rewrote it using the multiprocessing module for parallelism and got an almost 6x speedup: the multiprocessing version took just half a minute to fetch all 6084 coubs.
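
To give an idea of the parallel approach, here is a minimal sketch rather than the actual script. The names fetch_page and fetch_all are hypothetical, and fetch_page only returns fake titles where a real version would download and parse one page of results.

```python
import multiprocessing


def fetch_page(page):
    """Hypothetical stand-in for one HTTP request; returns fake titles."""
    # A real version would download and parse page number `page` here.
    return ["song %d-%d" % (page, i) for i in range(3)]


def fetch_all(pages, workers=4, pool_factory=multiprocessing.Pool):
    """Fetch every page in parallel and flatten the results into one list."""
    # pool.map splits the page numbers between the workers and
    # returns the per-page results in the original page order.
    with pool_factory(workers) as pool:
        per_page = pool.map(fetch_page, pages)
    return [title for titles in per_page for title in titles]
```

When run as a script, the call (for example fetch_all(range(1, 11))) should sit under an `if __name__ == "__main__":` guard, because worker processes may re-import the main module. Since the work is mostly waiting on the network, the best number of workers is something to find by experiment.
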
Then I used Counter from collections to find the most common song titles.
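
Counting with Counter can look roughly like this. The helper name most_common_titles and the sample titles are made up for illustration.

```python
from collections import Counter


def most_common_titles(titles, n=10):
    """Return the n most frequent titles as (title, count) pairs."""
    return Counter(titles).most_common(n)


# Made-up sample data; the real input is the list of fetched song titles.
sample = ["Song A", "Song B", "Song A", "Song C", "Song A", "Song B"]
print(most_common_titles(sample, 2))  # [('Song A', 3), ('Song B', 2)]
```
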
I also refined the most common titles using an algorithm that finds similar (but not identical) titles by replacing some of their symbols.
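
One possible way to merge such near-duplicate titles is sketched below. This is an assumption about how it could be done, not necessarily what the published script does: the replacement table, normalize and merge_similar are all hypothetical names and choices.

```python
from collections import Counter, defaultdict

# Assumed symbol replacements; the original script may use a different set.
REPLACEMENTS = {"\u2013": "-", "_": " "}


def normalize(title):
    """Build a comparison key: lowercase, symbols replaced, spaces collapsed."""
    key = title.lower()
    for old, new in REPLACEMENTS.items():
        key = key.replace(old, new)
    return " ".join(key.split())


def merge_similar(counts):
    """Merge counts of titles that share a key, keeping the top spelling."""
    groups = defaultdict(Counter)
    for title, count in counts.items():
        groups[normalize(title)][title] += count
    merged = Counter()
    for variants in groups.values():
        # Report the group under its most frequent original spelling.
        best_title, _ = variants.most_common(1)[0]
        merged[best_title] = sum(variants.values())
    return merged
```

With this sketch, titles that differ only in case, dash type, or underscores versus spaces end up counted together under one spelling.
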
Here are the statistics I got:

I've published the sources on my GitHub: https://github.com/delimitry/coub_utils
