Skip to content

Firionus/FastRunningMedian.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

86 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FastRunningMedian.jl

This Julia Package allows you to calculate a running median - fast.

Getting Started

In Julia, install with:

]add FastRunningMedian

Example Usage:

julia> using FastRunningMedian

julia> running_median([1,9,2,3,-9,1], 3)
6-element Vector{Float64}:
 1.0
 2.0
 3.0
 2.0
 1.0
 1.0

High-level API

# FastRunningMedian.running_medianFunction.

running_median(input, window_length, tapering=:symmetric; kwargs...) -> output

Run a median filter of window_length over the input array and return the result.

If the input array is multidimensional, the median will be run only over the first dimension, i.e. over all columns indepedently.

Taperings

The tapering decides the behaviour at the ends of the input. The available taperings are:

  • :symmteric or :sym: Ensure that the window is symmetric around each point of the output array by always growing or shrinking the window by 2. The output has the same length as the input if window_length is odd. If window_length is even, the output has one element less.
  • :asymmetric or :asym: Always adds or removes one element when calculating the next output value. Creates asymmetric windowing at the edges of the array. If the input is N long, the output is N+window_length-1 elements long.
  • :asymmetric_truncated or :asym_trunc: The same as asymmetric, but truncated at beginning and end to match the length of :symmetric.
  • :none or :no: No tapering towards the ends. If the input has N elements, the output is only N-window_length+1 long. Equivalent to "roll" from RollingFunctions.
  • :beginning_only or :start: At the beginning, always grow the window by one but do not taper the end. This is equivalent to asymmetric but truncated at the end such that the output length matches the input length. Equivalent to "run" from RollingFunctions.

If you choose an even window_length, the elements of the output array lie in the middle between the input elements on a continuous underlying axis.

With the exception of :beginning_only, all taperings are mirror symmetric with respect to the middle of the input array.

Keyword Arguments

  • nan=:include: By default, NaN values in the window will turn the median NaN as well. Use nan = :ignore to ignore NaN values and calculate the median over the remaining values in the window. If there are only NaNs in the window, the median will be NaN regardless.
  • output_eltype=Float64: Element type of the output array. The output element type should allow converting from Float64 and the input element type. The exception is odd window lengths with taperings :no or :sym, in which case the output element type only has to allow converting from the input element type.

Performance

The underlying algorithm should scale as O(N log w) with the input length N and the window_length w.

source

Taperings Visualized

Each data point is shown as a cross and the windows are visualized as colored boxes, the input is grey.

Tapering Examples

Performance Comparison

Benchmark Comparison

For large window lengths, this package performs even better than calling runmed in R, which uses the Turlach implementation written in C. For small window lengths, the Stuetzle implementation in R still outperforms this package, but the overhead from RCall doesn't seem worth it. Development of a fast implementation for small window lengths is ongoing, see the corresponding issues for details.

In contrast to this package, SortFilters.jl supports arbitrary probability levels, for example to calculate quantiles.

You can find the Notebook used to create the above graph in the benchmark folder. I ran it on an i7-2600K with 8 GB RAM while editing and browsing in the background.

Mid-level API

You can take control of allocating the output vector and median filter with a lower-level API. This is useful when you have to calculate many running medians of the same window length.

# FastRunningMedian.running_median!Function.

running_median!(mf::MedianFilter, output, input, tapering=:sym; nan=:include) -> output

Use mf to calculate the running median of input and write the result to output.

For all details, see running_median.

Examples

input = [4 5 6;
         1 0 9;
         9 8 7;
         3 1 2;]
output = similar(input, (4,3))
mf = MedianFilter(eltype(input), 3)
for j in axes(input, 2) # run median over each column
    # re-use mf in every iteration
    running_median!(mf, @view(output[:,j]), input[:,j])
end
output

# output
4×3 Matrix{Int64}:
 4  5  6
 4  5  7
 3  1  7
 3  1  2

source

Stateful API

The stateful API can be used for streaming data, e. g. to reduce RAM consumption, or building your own high-level API.

# FastRunningMedian.MedianFilterType.

MedianFilter([T=Float64,] window_length) where T <: Real

Construct a stateful running median filter, taking values of type T.

Manipulate with grow!, roll!, shrink!, reset!. Query with median, length, window_length, isfull.

Examples

julia> mf = MedianFilter(Int64, 2)
MedianFilter{Int64}(MutableBinaryHeap(), MutableBinaryHeap(), Tuple{FastRunningMedian.ValueLocation, Int64}[], 0, 0)

julia> grow!(mf, 1); median(mf) # window: [1]
1

julia> grow!(mf, 2); median(mf) # window: [1,2]
1.5

julia> roll!(mf, 3); median(mf) # window: [2,3]
2.5

julia> shrink!(mf); median(mf) # window: [3]
3

source

# FastRunningMedian.grow!Function.

grow!(mf::MedianFilter, val) -> mf

Grow mf with the new value val.

If mf would grow beyond maximum window length, an error is thrown. In this case you probably wanted to use roll!.

The new element is pushed onto the end of the circular buffer.

source

# FastRunningMedian.roll!Function.

roll!(mf::MedianFilter, val) -> mf

Roll the window over to the next position by replacing the first and oldest element in the ciruclar buffer with the new value val.

source

# FastRunningMedian.shrink!Function.

shrink!(mf::MedianFilter) -> mf

Shrinks mf by removing the first and oldest element in the circular buffer.

Will error if mf contains only one element as a MedianFilter with zero elements would not have a median.

source

# FastRunningMedian.reset!Function.

reset!(mf::MedianFilter) -> mf

Reset the median filter mf by emptying it.

source

# FastRunningMedian.medianFunction.

median(mf::MedianFilter; nan=:include)

Determine the current median in mf.

NaN Handling

By default, any NaN value in the filter will turn the result NaN.

Use the keyword argument nan = :ignore to ignore NaN values and calculate the median over the remaining values. If there are only NaNs, the median will be NaN regardless.

Implementation

If the number of elements in MedianFilter is odd, the low_heap is always one element bigger than the high_heap. The top element of the low_heap then is the median.

If the number of elements in MedianFilter is even, both heaps are the same size and the median is the mean of both top elements.

source

# Base.lengthFunction.

length(mf::MedianFilter)

Returns the number of elements in the stateful median filter mf.

This number is equal to the length of the internal circular buffer.

source

# FastRunningMedian.window_lengthFunction.

window_length(mf::MedianFilter)

Returns the window_length of the stateful median filter mf.

This number is equal to the capacity of the internal circular buffer.

source

# FastRunningMedian.isfullFunction.

isfull(mf::MedianFilter)

Returns true when the length of the stateful median filter mf equals its window length.

source

Sources

W. Hardle, W. Steiger 1995: Optimal Median Smoothing. Published in Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 44, No. 2 (1995), pp. 258-264. https://doi.org/10.2307/2986349

(I did not implement their custom double heap, but used two heaps from DataStructures.jl)

Keywords

Running Median is also known as Rolling Median or Moving Median.