
Calculating the percentage of my data in different number ranges [closed]

By Sarah Scott

I am trying to find out how to calculate what percentage of my data falls into different number ranges. My data looks like this:

0.81761
0.255319
0.359551
0.210191
0.374046
0.188406
0.179487
0.265152
0.207792
0.202614
0.150943

...and I have these ranges:

0-0.3
0.3-0.7
0.7-1

I want to know what percentage of my data falls into each range. For example:

0-0.3 -> 72.7%
0.3-0.7 -> 18.18%
0.7-1 -> 9.09%

Does anybody know how to do this calculation?


2 Answers

Using awk:

awk '
  # Count occurrences in each range
  {
    if ($1 < 0.3) a++
    else if ($1 > 0.7) c++
    else b++              # 0.3 <= $1 <= 0.7
  }
  # Print each count as a percentage of NR (number of records)
  END {
    printf "< 0.3: %.2f%%\n", a/NR*100
    printf ">= 0.3 and <= 0.7: %.2f%%\n", b/NR*100
    printf "> 0.7: %.2f%%\n", c/NR*100
  }
' file
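If awk isn't available, the same counting logic can be sketched in plain Python. This is a minimal illustration, not part of the original answer: the function name range_percentages is hypothetical, and it follows the awk script's boundary rules (below 0.3 is the low range, above 0.7 the high range, and everything else, including exactly 0.3 and 0.7, the middle range).

```python
# Python sketch of the awk answer's logic: values below 0.3 go to the first
# range, values above 0.7 to the last, and everything else (including 0.3
# and 0.7 themselves) to the middle range.


def range_percentages(values):
    """Return (low, mid, high) percentages for the three ranges."""
    if not values:
        raise ValueError("no data")
    low = mid = high = 0
    for v in values:
        if v < 0.3:
            low += 1
        elif v > 0.7:
            high += 1
        else:
            mid += 1
    n = len(values)
    return (low / n * 100, mid / n * 100, high / n * 100)


if __name__ == "__main__":
    data = [0.81761, 0.255319, 0.359551, 0.210191, 0.374046, 0.188406,
            0.179487, 0.265152, 0.207792, 0.202614, 0.150943]
    low, mid, high = range_percentages(data)
    print(f"< 0.3: {low:.2f}%")
    print(f">= 0.3 and <= 0.7: {mid:.2f}%")
    print(f"> 0.7: {high:.2f}%")
```

For the sample data this gives 72.73%, 18.18% and 9.09% (8, 2 and 1 of the 11 values). To read one number per line from a file instead, something like values = [float(line) for line in open("file") if line.strip()] would work, where "file" is the same placeholder name the awk command uses.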

You can use the histogram function from NumPy. For example:

$ python
Python 2.7.12 (default, Nov 12 2018, 14:36:49)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import numpy as np
>>>
>>> data = np.loadtxt('datafile')
>>> hist = np.histogram(data,[0,0.3,0.7,1.0])
>>> print 100.0 * hist[0]/sum(hist[0])
[ 72.72727273 18.18181818 9.09090909]
>>>
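The session above is Python 2. The sketch below is a dependency-free Python 3 illustration of the same idea for any list of sorted bin edges, using the standard bisect module. It assumes the bin rules numpy.histogram documents: every bin is half-open [a, b) except the last, which is closed [a, b], and values outside the outer edges are not counted. The function name histogram_percentages is hypothetical, and the percentages are taken relative to the binned values, as dividing by sum(hist[0]) does in the session.

```python
import bisect


def histogram_percentages(values, edges):
    """Percentage of values in each bin, using numpy.histogram-style bins.

    Bins are [edges[i], edges[i+1]) except the last, which is closed
    [edges[-2], edges[-1]]. Values outside [edges[0], edges[-1]] are
    ignored, and percentages are relative to the values that were binned.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside every bin, so it is not counted
        if v == edges[-1]:
            i = len(counts) - 1  # the right-most edge belongs to the last bin
        else:
            # bisect_right finds the first edge greater than v; the bin
            # index is one less than that position
            i = bisect.bisect_right(edges, v) - 1
        counts[i] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total * 100 for c in counts]


if __name__ == "__main__":
    data = [0.81761, 0.255319, 0.359551, 0.210191, 0.374046, 0.188406,
            0.179487, 0.265152, 0.207792, 0.202614, 0.150943]
    for pct in histogram_percentages(data, [0, 0.3, 0.7, 1.0]):
        print(f"{pct:.2f}%")
```

Binary search over the edges keeps each lookup at O(log k) for k bins, so adding more ranges only means extending the edges list. Note that with these rules a value of exactly 0.3 lands in the middle bin and exactly 0.7 in the last bin, which differs slightly from the awk answer's boundaries.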

See, for example, "NumPy - Histogram Using Matplotlib" (of course, you don't have to plot the result).
