If you have been using Scala, or even Java 8, you have probably seen the concept of a Stream
.
A Stream
is a lazy collection. This means a Stream
can been seen as a list where each element is computed/evaluated as needed. Several other properties make Streams
different from lists, sets and other common collections. You can see those differences in detailed both on Java Stream docs and Scala Stream docs.
But are Scala Streams really lazy?
According to the official scala documentation: The class Stream implements lazy lists where elements are only evaluated when they are needed.
.
But is it really true?
While working on a side project I came across an unexpected exception due to eager evaluation of a stream. And so I started to dig a bit more and found out this can be easily reproduced:
scala> Stream(1/0)
java.lang.ArithmeticException: / by zero
... 28 elided
But this could be just because the argument is being passed by value and not by name. And it’s a valid point. But look at this more complex expression:
scala> Stream.continually(1).partition(_ % 2 == 0)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.Stream$$$Lambda$1035/560897187.get$Lambda(Unknown Source)
at java.lang.invoke.LambdaForm$DMH/589431969.invokeStatic_L_L(LambdaForm$DMH)
at java.lang.invoke.LambdaForm$MH/1747585824.linkToTargetMethod(LambdaForm$MH)
at scala.collection.immutable.Stream$.$anonfun$continually$1(Stream.scala:1236)
at scala.collection.immutable.Stream$$$Lambda$1035/560897187.apply(Unknown Source)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1169)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1159)
at scala.collection.immutable.Stream.filterImpl(Stream.scala:503)
at scala.collection.immutable.Stream.filterImpl(Stream.scala:201)
at scala.collection.TraversableLike.filter(TraversableLike.scala:259)
at scala.collection.TraversableLike.filter$(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at scala.collection.immutable.Stream.partition(Stream.scala:586)
... 19 elided
This OutOfMemoryError
indicates that there is something happening (i.e., being computed), while the Stream
should be lazy.
The evil Cons
So what’s going on here? Are streams lazy or not?
In fact they are. But Streams are built upon a structure called Cons, that is kinda of a pair: A lazy cons cell, from which streams are built.
But Cons
is a pair that’s not lazy (it is eager) on the first element. This means that when you instantiate a Cons
, you will evaluate the first element, even if the second it’s not evaluated immediately. And Streams are a sequence of Cons
. So when you create a Stream
, you are defining a lazy list of Cons
, where the first element (the Stream head) needs to be evaluated. And this is why we get these errors.
Solution
But how hard can it be to solve this problem? Well, there are good news: Scala 2.13 (to be released somewhere in 2018) solves this problem, not on the Stream
class, but by providing a new lazy collection called LazyList as part of the work on scala collections.
Conclusion
While Streams are a natural structure for lazy collections one must have in mind the detail on the eagerness of Cons
to avoid unexpected errors. If this is a no go for you, you should look for alternatives while Scala 2.13 is not released: