sort is very slow for big files
By John Peck
I want to sort a file (removing duplicates) that contains a wordlist of almost 25GB. I am using the sort command in an Ubuntu terminal, but it takes hours to produce the sorted file. The command is:
sort -u input.txt > output.txt
Is there an alternative, more efficient way to do the same?
2 Answers
From my personal experience: if you want unique lines, make sure you also use
LC_ALL=C sort -u
I see speed improvements by a factor of 10, but it probably depends on the characters in the file (I often have to use it for translations, so I have accented characters in the file).
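To make the locale suggestion concrete, here is a minimal sketch on a tiny made-up file. The file contents are only an assumption for illustration; input.txt and output.txt are the names from the question. It shows that LC_ALL=C switches sort to plain byte comparison, which is where the speedup comes from, and that the resulting order can differ from a locale-aware sort (uppercase letters come before lowercase ones).

```shell
# Hypothetical sample data standing in for the real 25GB wordlist.
printf 'pear\napple\nZebra\napple\nbanana\n' > input.txt

# LC_ALL=C applies only to this one command: sort compares raw bytes
# instead of using the locale's collation rules. -u drops duplicate lines.
LC_ALL=C sort -u input.txt > output.txt

# Show the deduplicated result in byte order.
cat output.txt
```

With GNU sort (the one shipped on Ubuntu), options such as -S (buffer size), --parallel (number of threads) and -T (directory for temporary files) may help further on a file this large, but the right values depend on the machine, so it is worth measuring rather than assuming.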
A really incredible tool is xsv. Running it on a file with 3173959 rows:
xsv sort input.txt > output.csv
I have the output in 3 seconds.