
Add Clojure's seq abstraction #32

Open
gilch opened this issue Feb 27, 2017 · 8 comments
Labels
feature New feature or request

Comments

@gilch
Member

gilch commented Feb 27, 2017

We're running into occasional annoying problems because the ubiquitous Python iterators are mutable. Clojure doesn't have this problem because it uses the immutable seq abstraction instead. A seq can still have the laziness of some iterables, but getting the next item won't change what the first seq means.
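For example, anything holding a shared iterator sees it change out from under it:

>>> it = iter([1, 2, 3])
>>> next(it)
1
>>> next(it)
2
>>> list(it)   # whoever else holds `it` only gets what's left
[3]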

I've been playing around with a Python implementation of seq. Here's a rough attempt. This could easily be translated to Hy.

from collections.abc import Iterable

class LazyCons:
    # A cons cell whose tail is computed on demand and cached thereafter.
    __slots__ = ['car', 'cdr']
    def __init__(self, car, cdr):
        self.car = lambda: car
        def lazy_cdr():
            # The first call computes the tail, then replaces self.cdr with a
            # constant function so later calls return the cached result.
            res = cdr()
            self.cdr = lambda: res
            return res
        self.cdr = lazy_cdr

class Seq(LazyCons, Iterable):
    # An immutable, lazily realized view over any Python iterable.
    __slots__ = ()
    def __init__(self, iterable):
        it = iter(iterable)
        car = next(it)  # the head is realized eagerly
        super().__init__(car, lambda: Seq(it))
    def __iter__(self):
        # Each call returns a fresh iterator, so iterating one of them
        # never changes what the Seq itself means.
        def iterator():
            seq = self
            while 1:
                yield seq.car()
                seq = seq.cdr()
        return iterator()

And a function to demonstrate the laziness.

def verboseRange(*stop_startstopstep):
    for i in range(*stop_startstopstep):
        print('yielding', i)
        yield i

And the demonstration:

>>> spam = Seq(iter(verboseRange(5)))  # note iter() call
yielding 0
>>> spam.car()
0
>>> spam.car()
0
>>> spam.cdr()
yielding 1
<__main__.Seq object at 0x0000019C4D773D68>
>>> spam.cdr().car()
1
>>> list(spam)
yielding 2
yielding 3
yielding 4
[0, 1, 2, 3, 4]
>>> spam.car()
0

That iter() call isn't required, but it demonstrates converting a mutable iterator into an immutable seq (provided nothing else has a reference to that iterator).

Some more ideas:

Clojure often converts collections to seqs automatically. Rather than add that to a lot of Hy functions, we might instead have a short reader macro to do it, although (seq xs) is not that long to begin with.

We could also change the __repr__ of a Seq to print up to the first 20 items or something. This would make seqs a lot friendlier in the REPL, and lazy Python iterables too, especially when combined with the reader macro.
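A rough sketch of that repr idea (the helper name is hypothetical; Seq.__repr__ could delegate to it):

from itertools import islice

def preview_repr(iterable, limit=20):
    # Hypothetical helper: render up to `limit` items of a (possibly lazy) iterable.
    items = list(islice(iterable, limit + 1))
    more = ', ...' if len(items) > limit else ''
    return 'Seq({}{})'.format(', '.join(map(repr, items[:limit])), more)

Previewing a Seq this way realizes (and caches) at most limit + 1 items; pointing it at a raw iterator would consume them, so it belongs on Seq rather than on arbitrary iterables.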

@Kodiologist
Member

How does this proposal compare to @tuturto's hy.contrib.sequences?

@gilch
Member Author

gilch commented Feb 27, 2017

They look pretty different to me. They both cache the values they've calculated, but @tuturto's sequences seem like more of an alternative to generators. My seq abstraction would probably make that obsolete, since you could just use seq on a normal yield generator (or genexpr) to make it cache its values. My seq works on any iterable or iterator.

@tuturto

tuturto commented Feb 27, 2017

They're pretty different. Mine is a (possibly recursive) sequence defined in terms of its index. Think of fibonacci:

=> (defseq fibonacci [n]
...    (if (= n 0) 0
...        (= n 1) 1
...        (+ (get fibonacci (- n 1))
...           (get fibonacci (- n 2)))))

=> (get fibonacci 6)
8

One can iterate over it, but the iterator is mutable (because it's just a normal iterator):

=> (setv fibs (iter fibonacci))
=> (first fibs)
0
=> (first fibs)
1
=> (first fibs)
1
=> (first fibs)
2

gilch's approach takes any iterator and creates a new immutable iterator:

=> (setv fibs (Seq fibonacci))
=> (first fibs)
0
=> (first fibs)
0
=> (first fibs)
0

However, since it doesn't define the __getitem__ magic method, indexed access is not supported.
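For illustration, indexed access could be emulated with a small (hypothetical) helper that walks the cdrs:

def seq_get(seq, n):
    # Walk n cdrs, then take the car; assumes the underlying iterable has more than n items.
    for _ in range(n):
        seq = seq.cdr()
    return seq.car()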

So they're solving related, but slightly different problems.

@gilch
Member Author

gilch commented Feb 27, 2017

gilch's approach takes any iterator and creates a new immutable iterator:

More precisely, it takes any iterable and makes a new lazy, immutable iterable from it. Iterables are the things you can call iter() on to get an iterator. It so happens that in Python, all iterators are themselves iterables (their __iter__() method returns self), so you can do things like this:

>>> tuple([1,2,3,4,5])
(1, 2, 3, 4, 5)
>>> tuple(iter(iter(iter([1,2,3,4,5]))))
(1, 2, 3, 4, 5)
>>> spam = iter([1,2,3])
>>> iter(spam) is spam
True
>>> iter(iter(iter(spam))) is spam
True

A better-optimized Seq implementation would similarly avoid creating a new Seq when it's constructed from an existing Seq, or even from an iterator returned by a Seq.
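The first part of that could be a small factory function (a sketch; the lowercase seq name is hypothetical):

def seq(iterable):
    # Don't re-wrap something that's already a Seq.
    if isinstance(iterable, Seq):
        return iterable
    return Seq(iterable)

Handling iterators returned by a Seq would additionally require those iterators to remember which Seq produced them.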

You can use a Seq to convert a (mutable) iterator to an immutable iterable and still use it in any Python function that needs an iterable. So the Python interop is pretty seamless.

If you have an iterator but want to use it more than once, the usual way of handling that in Python is to realize the iterator into a tuple, which is immutable but still iterable. This doesn't work well if the iterator is very large. The itertools.tee() function is a lazy alternative, but the iterators it returns are still mutable.
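For example:

>>> import itertools
>>> a, b = itertools.tee(iter(range(5)))
>>> next(a), next(a)
(0, 1)
>>> list(a)   # a has been advanced; only the rest is left
[2, 3, 4]
>>> list(b)   # b is independent of a, but it's still consumed as you iterate it
[0, 1, 2, 3, 4]
>>> list(b)
[]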

I think we could also add a cdrs() method to Seq that gets a Seq of all the cdrs, like how {}.items() gives you a different view of the data.

See Clojure's seq abstraction for what else we could do with this. The LazyCons class is inspired by Clojure's lazy-seq.

@tuturto

tuturto commented Feb 28, 2017

I would really love to see this in Hy. While I like using iterators, I know firsthand that some cases are hard to work with and can introduce subtle bugs. We should come up with good naming, though, that separates these two from each other. Since @gilch's is closer to what Clojure's sequences are, I would propose renaming my solution to something different.

@refi64

refi64 commented Mar 6, 2017

This would be pretty great, and it seems like it wouldn't cause any trouble with Python integration.

@refi64

refi64 commented Mar 6, 2017

Actually, there is one thing that worries me: how is the memory usage here? Often in Python, iterators are used when processing large data sequences, and I'm just not sure how the storage of numerous attached functions and such would work out...

@gilch
Member Author

gilch commented Mar 6, 2017

We'd have to design it carefully. My proof of concept might have a bit of a problem there. I haven't tested it enough (and have already uncovered other problems). But in principle, if you don't keep a reference to the head, the garbage collector is free to reclaim any lazy cons cells you've already passed. So besides a small fixed overhead for the current value, you're not taking up any more memory than any other lazy iterator.

I think Python can reuse the function bytecode and only change the values in the closures, so I'm not too worried about the functions taking up too much memory. They're also generated on the fly, so it's still lazy.
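A quick way to check that in CPython: closures made from the same lambda share one code object, and only the closure cells differ.

>>> def make(x):
...     return lambda: x
...
>>> f, g = make(1), make(2)
>>> f.__code__ is g.__code__
True
>>> (f(), g())
(1, 2)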
