sort is very slow for big files

By John Peck

I want to sort a file (removing duplicates) that contains a wordlist of almost 25 GB. I am using the sort command in the Ubuntu terminal, but it takes hours to produce the sorted file. The command is:

sort -u input.txt>output.txt

Is there an alternative, more efficient way to do the same?

2 Answers

From my personal experience: if you want unique lines, make sure you also set LC_ALL=C:

LC_ALL=C sort -u input.txt > output.txt

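I see speedups of roughly a factor of 10, but this probably depends on the characters in the file (I often use it for translations, so my files contain accented characters). The gain comes from comparing raw bytes instead of applying the locale's collation rules; the trade-off is that the output is in byte order rather than dictionary order.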
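A minimal sketch of what LC_ALL=C changes, assuming a POSIX shell and sort; the helper name c_sort_unique and the temporary sample file are hypothetical and exist only for illustration. It sorts a tiny sample containing an accented letter so you can see the byte-order result.

```shell
#!/bin/sh
# Hypothetical helper: unique sort in the C locale.
# LC_ALL=C makes sort compare raw bytes instead of using the locale's
# collation rules; -u keeps one copy of each identical line.
c_sort_unique() {
    LC_ALL=C sort -u "$1"
}

# Tiny sample; the octal escapes \303\211 write the UTF-8 bytes of "É"
# regardless of how this script file itself is encoded.
sample=$(mktemp)
printf 'b\na\n\303\211\ne\nB\na\n' > "$sample"

c_sort_unique "$sample"
# Byte order puts uppercase ASCII "B" (0x42) before lowercase letters
# (0x61-0x7A), and the multi-byte "É" (first byte 0xC3) after all ASCII:
# B
# a
# b
# e
# É

rm -f "$sample"
```

In a UTF-8 locale such as en_US.UTF-8, sort typically interleaves upper- and lowercase letters and places É near E instead; the exact order depends on the installed locale data, which is why that result is not shown here.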

A really incredible tool is xsv, a command-line toolkit for CSV files. Running it on a file with 3,173,959 rows:

xsv sort input.txt > output.csv

I get the output in 3 seconds.
