This module provides a few trivial wrappers for iterable objects that make iterators run in parallel easily. It is named futureutils because it brings the concepts of futures and promises to Python iterators and generators.
It works on Python 2.5 and later, and has been tested on CPython 2.5+, PyPy 1.4+, and IronPython 2.6+. (Unfortunately, there is no plan to support Python 3 yet; see also PEP 3148.)
The easiest way to install futureutils is with pip or easy_install:
```sh
$ pip install futureutils
# or
$ easy_install futureutils
```
You can also install it from the source code in the Mercurial repository, if you prefer:
```sh
$ hg clone https://bitbucket.org/dahlia/futureutils
$ cd futureutils/
futureutils$ python setup.py install
```
There are only two functions in the module: promise() and future_generator(). The former is a lower-level but more general interface. The latter is a higher-level, decorator-style interface that works only with generator functions, not with arbitrary iterators.
If your iterator is a generator created by a generator function, just use the future_generator() decorator:
```python
import lxml.html
from futureutils import *

@future_generator
def list_hrefs(url):
    html = lxml.html.parse(url)
    for href in html.xpath('//a[@href]/@href'):
        href = href.strip()
        # skip empty links and in-page anchors
        if href and not href.startswith('#'):
            yield href
```
Iterators created by the decorated generator function then run in parallel automatically and yield items just like normal iterators. Whether or not you apply the future_generator() decorator, your generator function behaves the same: the decorator does not change its semantics, only its efficiency.

If your generator function is already parallelized, however, its semantics could change, so be careful in that case.
If your iterator is not a generator, you can use the promise() function instead. What it does is very simple: it takes an iterable object and returns an iterator that wraps it.
```python
import lxml.html
from futureutils import *

def list_hrefs(url):
    html = lxml.html.parse(url)
    for href in html.xpath('//a[@href]/@href'):
        href = href.strip()
        if href and not href.startswith('#'):
            yield href

iterator = list_hrefs('http://dahlia.kr/')
parallelized_iterator = promise(iterator)
```
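To make the mechanism concrete, here is a minimal sketch of how a promise()-style wrapper can work. It is written for Python 3 (futureutils itself targets Python 2) and it is not the futureutils source. A daemon thread drains the iterable into a bounded queue.Queue, and the returned iterator reads from that queue. The names promise_sketch, _DONE and _ERROR and the default buffer size of 10 are hypothetical.

```python
import queue
import threading

# Hypothetical sentinel flags: end of iteration, and "an exception follows".
_DONE = object()
_ERROR = object()


def promise_sketch(iterable, buffer_size=10):
    """Iterate over `iterable` in a background thread and return an iterator
    yielding the same items. Illustrative sketch, not futureutils itself."""
    buf = queue.Queue(maxsize=buffer_size)  # bounded: producer blocks when full

    def produce():
        try:
            for item in iterable:
                buf.put((None, item))
        except Exception as exc:
            buf.put((_ERROR, exc))  # forward the error to the consumer
        else:
            buf.put((_DONE, None))  # signal normal exhaustion

    # Daemon thread, so an abandoned infinite producer does not block exit.
    threading.Thread(target=produce, daemon=True).start()

    def consume():
        while True:
            flag, value = buf.get()  # waits until the producer has something
            if flag is _DONE:
                return
            if flag is _ERROR:
                raise value  # re-raise in the consumer's thread
            yield value

    return consume()


if __name__ == "__main__":
    import itertools
    print(list(promise_sketch(range(5))))  # [0, 1, 2, 3, 4]
    print("".join(itertools.islice(promise_sketch(itertools.cycle("ab")), 5)))  # ababa
```

Because the thread starts as soon as promise_sketch() is called, the source begins producing before the first next() call, which is what lets its latency overlap with the consumer's work. The bounded queue provides backpressure, so an infinite source such as itertools.cycle() only runs a fixed number of items ahead.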
See the API reference below for details.
Each promised iterator has its own internal buffer queue, and every queue has a maximum size. This is intended to avoid unbounded memory use with infinite iterators.
This constant is the default queue size.
Flag constants for internal use only.
Promises the given iterable object and returns its future iterator.
```python
>>> import time, datetime
>>> def myiter():
...     for x in xrange(5):
...         yield x
...         time.sleep(0.5)
...
>>> it = promise(myiter())
>>> time.sleep(2)
>>> start = datetime.datetime.now()
>>> list(it)
[0, 1, 2, 3, 4]
>>> delta = datetime.datetime.now() - start
>>> delta.seconds
0
>>> delta.microseconds > 500000
True
```
It is useful for simple parallelization of IO-bound iterable objects.
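The benefit for IO-bound sources can be measured with a small Python 3 experiment. In this sketch (not futureutils code), time.sleep() stands in for blocking IO, and the DELAY and N values and helper names are hypothetical. Consuming the slow source serially costs roughly 2 * N * DELAY, while prefetching it in a thread lets the source's waits overlap with the consumer's processing, bringing the total closer to (N + 1) * DELAY.

```python
import queue
import threading
import time

DELAY = 0.1  # simulated latency per item, in seconds (an assumption)
N = 5        # number of items (an assumption)


def slow_source():
    """Stands in for an IO-bound iterator such as a network reader."""
    for i in range(N):
        time.sleep(DELAY)  # sleeping, like blocking IO, releases the GIL
        yield i


def process(item):
    """Stands in for per-item work done by the consumer."""
    time.sleep(DELAY)
    return item


def run_serial():
    start = time.monotonic()
    results = [process(x) for x in slow_source()]
    return results, time.monotonic() - start


def run_prefetched():
    buf = queue.Queue(maxsize=N)
    done = object()  # end-of-stream marker

    def produce():
        for x in slow_source():
            buf.put(x)
        buf.put(done)

    start = time.monotonic()
    threading.Thread(target=produce, daemon=True).start()
    results = []
    while True:
        x = buf.get()
        if x is done:
            break
        results.append(process(x))
    return results, time.monotonic() - start


if __name__ == "__main__":
    serial, t_serial = run_serial()
    prefetched, t_prefetched = run_prefetched()
    print(f"serial: {t_serial:.2f}s, prefetched: {t_prefetched:.2f}s")
```

Threads help here because sleeping, like waiting on a socket or a file, releases the GIL; for CPU-bound iterators the gain in CPython would be small.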
Like a normal iterator, it also propagates exceptions raised during iteration:
```python
>>> def pooriter():
...     yield 1
...     raise Exception('future error')
...
>>> it = promise(pooriter())
>>> it.next()
1
>>> it.next()
Traceback (most recent call last):
  ...
Exception: future error
```
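Exceptions raised in a Python thread are not delivered to other threads automatically; an uncaught one is only reported by the thread that raised it. A wrapper like promise() therefore has to catch the error in the background thread and re-raise it on the consuming side. The following Python 3 sketch shows that hand-off in isolation; call_in_thread is a hypothetical helper, not part of futureutils.

```python
import queue
import threading


def call_in_thread(func, *args):
    """Run func(*args) in a worker thread and return its result to the caller,
    re-raising in the caller any exception the worker raised."""
    box = queue.Queue(maxsize=1)  # carries exactly one (ok, value) outcome

    def worker():
        try:
            box.put((True, func(*args)))
        except Exception as exc:
            # Without this, the exception would only be reported by the
            # worker thread and the caller would wait forever on box.get().
            box.put((False, exc))

    threading.Thread(target=worker).start()
    ok, value = box.get()  # block until the worker reports back
    if ok:
        return value
    raise value  # re-raise on the consumer's side


if __name__ == "__main__":
    print(call_in_thread(pow, 2, 10))  # 1024
    try:
        call_in_thread(int, "not a number")
    except ValueError as exc:
        print("caught:", exc)
```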
It can deal with infinite iterators as well:
```python
>>> import itertools
>>> it = promise(itertools.cycle('Hong Minhee '))
>>> ''.join(itertools.islice(it, 23))
'Hong Minhee Hong Minhee'
```
Every future iterator has its own internal buffer queue that stores the iterator's results, and every queue has a maximum size. This is intended to avoid unbounded memory use with infinite iterators. You can tune the buffer size with the buffer_size option.
```python
>>> import itertools
>>> def infloop():
...     i = 0
...     while True:
...         print i
...         yield i
...         i += 1
...
>>> list(itertools.islice(promise(infloop(), buffer_size=5), 5))
0
1
2
3
4
[0, 1, 2, 3, 4]
```
Returns: a promised future iterator
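The effect of a bounded buffer can be observed directly. This Python 3 sketch (not futureutils code; count_prefetched and its parameters are hypothetical) starts an infinite producer that feeds a queue.Queue of a given size, never consumes anything, and counts how many items the producer generated. With a blocking put(), the producer in this sketch stops after buffer_size + 1 items: buffer_size waiting in the queue plus the one it is blocked trying to put.

```python
import queue
import threading
import time


def count_prefetched(buffer_size, wait=0.2):
    """Feed an infinite iterator into a bounded queue that nobody reads, then
    return how many items the iterator produced after `wait` seconds.
    buffer_size must be positive: queue.Queue treats maxsize <= 0 as unbounded."""
    produced = 0
    lock = threading.Lock()

    def infinite():
        nonlocal produced
        i = 0
        while True:
            with lock:
                produced += 1  # count each item as it is generated
            yield i
            i += 1

    buf = queue.Queue(maxsize=buffer_size)

    def produce():
        for item in infinite():
            buf.put(item)  # blocks once buffer_size items are waiting

    threading.Thread(target=produce, daemon=True).start()
    time.sleep(wait)  # give the producer time to fill the buffer
    with lock:
        return produced


if __name__ == "__main__":
    print(count_prefetched(5))
```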
A decorator that makes the decorated generator function promise its result, so that calling it returns a future iterator.
It is a simple decorator wrapper around promise() for generator functions.
```python
>>> import time, datetime
>>> @future_generator
... def mygenerator():
...     for x in xrange(5):
...         yield x
...         time.sleep(0.5)
...
>>> it = mygenerator()
>>> time.sleep(2)
>>> start = datetime.datetime.now()
>>> list(it)
[0, 1, 2, 3, 4]
>>> delta = datetime.datetime.now() - start
>>> delta.seconds
0
>>> delta.microseconds > 500000
True
```
- Parameters: function (callable object): a generator function to turn into a future generator
- Returns: a future generator function
- Return type: callable object
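A future_generator()-style decorator can be sketched as a thin functools.wraps wrapper that passes the freshly created generator object to a promise-like helper. The Python 3 sketch below bundles a compact version of such a helper so that it runs on its own; the names future_generator_sketch, _promise, _DONE and _ERROR and the default buffer size are hypothetical, and this is not the futureutils implementation.

```python
import functools
import queue
import threading

_DONE = object()   # hypothetical end-of-stream marker
_ERROR = object()  # hypothetical "exception follows" marker


def _promise(iterable, buffer_size=10):
    """Drain `iterable` in a daemon thread; return an iterator over its items."""
    buf = queue.Queue(maxsize=buffer_size)

    def produce():
        try:
            for item in iterable:
                buf.put((None, item))
        except Exception as exc:
            buf.put((_ERROR, exc))  # forward the error to the consumer
        else:
            buf.put((_DONE, None))

    threading.Thread(target=produce, daemon=True).start()

    def consume():
        while True:
            flag, value = buf.get()
            if flag is _DONE:
                return
            if flag is _ERROR:
                raise value
            yield value

    return consume()


def future_generator_sketch(function):
    """Make calls to the generator function `function` return promised iterators."""
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        # Create the plain generator, then hand it to the background thread.
        return _promise(function(*args, **kwargs))
    return wrapper


@future_generator_sketch
def squares(n):
    """Yield the first n square numbers."""
    for i in range(n):
        yield i * i


if __name__ == "__main__":
    print(list(squares(5)))  # [0, 1, 4, 9, 16]
```

functools.wraps copies the generator function's name and docstring onto the wrapper, so the decorated function still looks like the original when inspected, and calling it yields the same items in the same order.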