> I think Transducers are a fundamental primitive that decouples critical logic from list/sequence processing, and if I had to do Clojure all over I would put them at the bottom.
>
> – Rich Hickey
Transducers are an ergonomic and extremely memory-efficient way to process a data source. Here “data source” could mean an ordinary Table, but also potentially large files or generators of infinite data.
Transducers…

- allow the chaining of operations like `map` and `filter` without allocating memory between each step.
- aren't tied to any specific data type; they need only be implemented once.
- vastly simplify “data transformation code”.
- are a joy to use!
Looking for Transducers in other Lisps? Check out the Emacs Lisp and Common Lisp implementations!
Originally invented in Clojure and later adapted to other Lisps, Transducers are an excellent way to think about - and efficiently operate on - collections or streams of data. Transduction operations are strict and don’t involve “laziness” or “thunking” in any way, yet only process the exact amount of data you ask them to.
This library consists of only a single module, so it’s simple to vendor into your own projects.
```fennel
(local t (require :transducers))

;; Keep the first three values and sum them.
(t.transduce (t.take 3) t.add [1 2 3 4 5])
;; => 6
```
```fennel
;; The fundamental pattern.
(t.transduce <transducer-chain> <reducer> <source>)
```
Data processing largely has three concerns:
- Where is my data coming from? (sources)
- What do I want to do to each element? (transducers)
- How do I want to collect the results? (reducers)
Each full “transduction” requires all three. We pass one of each to the `transduce` function, which drives the process. It knows how to pull values from the source, feed them through the transducer chain, and wrap everything together via the reducer.
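For example, squaring the elements of a small table and summing the results makes each role explicit (a minimal sketch using the `map` and `add` operations covered below):

```fennel
(t.transduce (t.map (fn [n] (* n n))) ;; Transducer: square each element.
             t.add                    ;; Reducer: sum the results.
             [1 2 3 4 5])             ;; Source: an ordinary table.
;; => 55
```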
- Typical transducers are `map`, `filter`, and `take`.
- Typical reducers are `add`, `count`, and `fold`.
- Typical sources are tables and files.

Generators are a special kind of source that yield infinite data. Typical generators are `repeat` and `cycle`.
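Because a transduction only processes what you ask for, generators are safe to consume as long as something like `take` bounds the work. A small sketch, assuming `cycle` accepts the table whose elements it should repeat:

```fennel
;; `cycle` would yield 1 2 3 1 2 3 ... forever, but `take` stops the
;; transduction after ten values, and `count` tallies how many got through.
(t.transduce (t.take 10) t.count (t.cycle [1 2 3]))
;; => 10
```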
Let’s sum the squares of the first 1000 even integers:
```fennel
(t.transduce
  (t.comp (t.filter #(= 0 (% $1 2)))  ;; (2) Keep only even numbers.
          (t.take 1000)               ;; (3) Keep the first 1000 filtered evens.
          (t.map (fn [n] (* n n))))   ;; (4) Square those 1000.
  t.add       ;; (5) Reducer: Add up all the squares.
  (t.ints 1)) ;; (1) Source: Generate all positive integers.
```
Two things of note here:

- `comp` is used here to chain together different transducer steps. Notice that the order appears “backwards” from usual function composition. It may help to imagine that `comp` is acting like the `->>` macro here. (The sketch after these notes swaps two of the steps to show how the order changes the result.)
- The reduction via `add` is listed as Step 5, but really it's occurring throughout the transduction process. Each value that makes it through the composed transducer chain is immediately added to an internal accumulator.
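As a quick illustration of why the order of the chain matters, here is a variant of the example above with `take` and `filter` swapped: it examines only the integers 1 through 1000 and sums the squares of the 500 evens among them, a different result from before.

```fennel
;; Swapping `take` and `filter`: only the integers 1 through 1000 are
;; examined, and the 500 even ones among them are squared and summed.
(t.transduce
  (t.comp (t.take 1000)
          (t.filter #(= 0 (% $1 2)))
          (t.map (fn [n] (* n n))))
  t.add
  (t.ints 1))
```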
Explore the other transducers and reducers to see what’s possible!
As a convenience, this library also exposes a simple interface for reading and writing streams of CSV data.
To sum the values of a particular field:
```fennel
(local t (require :transducers))

(t.transduce (t.comp (t.filter-map #(. $1 :Age))
                     (t.filter-map tonumber))
             t.add
             (t.csv-read "foo.csv"))
```
To reduce the file to certain fields and write the data back out:
```fennel
(local t (require :transducers))

(t.transduce t.pass
             (t.csv-write "out.csv" ["Name" "Age"])
             (t.csv-read "in.csv"))
```
Summing a numeric field in a 45MB CSV file:

| Runtime | Average Time (sec) |
|---------|--------------------|
| LuaJIT  | 1.38               |
| Lua 5.4 | 2.56               |
| Lua 5.2 | 3.03               |
The associated code can be found in the examples folder, alongside a hand-written version using only Fennel primitives. Interestingly, this hand-written version performs slightly worse, implying that the overhead from Transducers themselves is minimal.
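For a sense of what “only Fennel primitives” means, here is a rough sketch, not the code from the examples folder: it sums the second column of a plain, unquoted CSV file using `io.lines` and Lua string patterns. The column position, the lack of quoting, and the function name are all assumptions for illustration.

```fennel
;; Hypothetical hand-written sum: read each line, grab the second
;; comma-separated field, and accumulate any value that parses as a number.
;; The header row is skipped naturally, since tonumber("Age") is nil.
(fn sum-second-field [path]
  (var total 0)
  (each [line (io.lines path)]
    (local n (tonumber (line:match "^[^,]*,([^,]*)")))
    (when n (set total (+ total n))))
  total)

(print (sum-second-field "foo.csv"))
```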