statistics.median_grouped(data, interval = 1)
The statistics.median_grouped function calculates the median of grouped data, treating each element of data as the midpoint of a class of width interval (a parameter that defaults to 1).
The code for this function (simplified) is as follows:
data = sorted(data)
n = len(data) # Number of points
x = data[n//2] # Value at the midpoint (centre of the interval)
L = x - interval / 2 # Lower limit of the median interval
l1 = _find_lteq(data, x) # Position of leftmost occurrence of x in data
l2 = _find_rteq(data, x) # Position of rightmost occurrence of x in data
cf = l1 # Number of points before the first occurrence of x
f = l2 - l1 + 1 # Number of occurrences of x
return L + interval * (n / 2 - cf) / f # Interpolation
As can be seen, after ordering the data and obtaining the number of elements (variable n), the value that occupies the central position (variable x) is calculated. This value matches the result returned by the statistics.median_high function.
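As a quick check of this claim, the following sketch compares the element at index n // 2 of the sorted data with the result of statistics.median_high, for one odd-length and one even-length list. The sample lists and the helper name midpoint_value are made up for illustration.

```python
import statistics

def midpoint_value(data):
    """Return the element at index n // 2 of the sorted data (hypothetical helper)."""
    ordered = sorted(data)
    return ordered[len(ordered) // 2]

# Illustrative samples, not taken from the article
odd_sample = [7, 1, 5, 5, 2]
even_sample = [4, 1, 3, 2]

for sample in (odd_sample, even_sample):
    # Both expressions select the same element of the sorted data
    assert midpoint_value(sample) == statistics.median_high(sample)
```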
Once the median of the data is obtained, the limits of the interval to which it belongs are calculated. In practice only the lower limit of this interval is needed, and it is stored in the variable L:
L = x - interval / 2
Next, the lowest and highest positions of the median in data are obtained (variables l1 and l2). From these values, the number of occurrences of the median in data is calculated:
l1 = _find_lteq(data, x)
l2 = _find_rteq(data, x)
f = l2 - l1 + 1
The _find_lteq and _find_rteq functions can be implemented in a simple (although not very efficient) way with the following code:
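def _find_lteq(data, x):
    return data.index(x)  # Index of the leftmost occurrence of x

def _find_rteq(data, x):
    # Search a reversed copy so that the caller's list is left untouched
    return len(data) - data[::-1].index(x) - 1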
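Because data is already sorted, the same two positions can also be found by binary search instead of a linear scan. The sketch below uses the standard bisect module. The names find_leftmost and find_rightmost are hypothetical, and both functions assume that x actually occurs in data.

```python
from bisect import bisect_left, bisect_right

def find_leftmost(data, x):
    """Index of the first occurrence of x in sorted data (assumes x is present)."""
    return bisect_left(data, x)

def find_rightmost(data, x):
    """Index of the last occurrence of x in sorted data (assumes x is present)."""
    return bisect_right(data, x) - 1

sample = [1, 2, 5, 5, 7]  # illustrative sorted data
left = find_leftmost(sample, 5)
right = find_rightmost(sample, 5)
print(left, right, right - left + 1)  # prints: 2 3 2
```

Each lookup takes logarithmic rather than linear time, which only matters for long lists.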
Finally, the value to return is interpolated based on the size of the interval:
return L + interval * (n / 2 - cf) / f
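Putting the steps together, a self-contained version of the simplified function might look like the sketch below. The name grouped_median_sketch is hypothetical, the positions are found with bisect rather than with the helper functions described above, and there is no handling of empty input. The sample list is illustrative.

```python
import statistics
from bisect import bisect_left, bisect_right

def grouped_median_sketch(data, interval=1):
    """Simplified grouped median; a sketch with no handling of empty input."""
    data = sorted(data)
    n = len(data)
    x = data[n // 2]                  # value at the midpoint
    L = x - interval / 2              # lower limit of the median interval
    cf = bisect_left(data, x)         # points before the first occurrence of x
    f = bisect_right(data, x) - cf    # number of occurrences of x
    return L + interval * (n / 2 - cf) / f

sample = [1, 3, 3, 5, 7]  # illustrative data
for width in (1, 2):
    # The sketch should agree with the standard-library function
    assert grouped_median_sketch(sample, width) == statistics.median_grouped(sample, width)
```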
- data: Sequence or iterable of values for which the grouped median is calculated.
- interval: Optional argument. Width of the interval to consider (defaults to 1).
The statistics.median_grouped function returns a real number.
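As a small illustration of the parameters and the return value, the sketch below passes a generator (to show that any iterable works) and omits interval so that the default is used. The input values are made up for the example.

```python
import statistics

values = (v for v in [7, 5, 1, 5, 2])  # a generator: any iterable is accepted

# interval is omitted, so the default width of 1 is used
result = statistics.median_grouped(values)
print(result)        # prints: 4.75
print(type(result))  # prints: <class 'float'>
```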
Suppose we start from the following iterable of five elements:
...and that the size of the interval is 2:
The median of our list is 5, so the interval to which it belongs is [4, 6) (closed on the left and open on the right), and this value appears twice in the list. That is:
L = 4 (lower limit of the interval to which the median belongs)
n = 5 (number of elements in the iterable)
cf = 2 (number of values less than the median)
f = 2 (number of occurrences of the median)
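These four quantities can also be derived programmatically. The sketch below assumes a hypothetical list, [1, 2, 5, 5, 7], chosen because it is consistent with the values above (five elements, a median of 5 that appears twice, and two smaller values); it is not necessarily the list used in this example.

```python
from bisect import bisect_left, bisect_right

data = sorted([1, 2, 5, 5, 7])  # hypothetical data consistent with the values above
interval = 2

n = len(data)
x = data[n // 2]                  # median value
L = x - interval / 2              # lower limit of the interval [4, 6)
cf = bisect_left(data, x)         # number of values less than the median
f = bisect_right(data, x) - cf    # number of occurrences of the median

print(L, n, cf, f)  # prints: 4.0 5 2 2
```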
If we apply these values to the expression
L + interval * (n / 2 - cf) / f
...we obtain:
4 + 2 * (5 / 2 - 2) / 2 = 4.5
...value that matches the one returned by the function:
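The comparison can be reproduced with a short script. The list below is a hypothetical one that fits the description in this example (five elements, median 5 appearing twice, two smaller values), so treat it as an assumption rather than the original data.

```python
import statistics

data = [1, 2, 5, 5, 7]  # hypothetical list matching the description in this example
interval = 2

manual = 4 + interval * (5 / 2 - 2) / 2   # L + interval * (n / 2 - cf) / f
library = statistics.median_grouped(data, interval)

print(manual, library)  # prints: 4.5 4.5
```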