Pipeline processing of data can be done in either of two ways:

using generators only

Example generator in pipeline:

import re

def grep(source, pattern):
   # Compile once, then lazily yield only the lines that match.
   patternObject = re.compile(pattern)
   for line in source:
      if patternObject.search(line):
         yield line

or

using coroutines only

Example coroutine in pipeline:

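def coroutine(func):
   # Assumed definition of the priming decorator: advance to the first yield.
   def start(*args, **kwargs):
      cr = func(*args, **kwargs)
      next(cr)
      return cr
   return start

@coroutine
def grep(pattern):
   print("Looking for %s" % pattern)
   while True:
      line = (yield)   # wait for a line pushed in with send()
      if pattern in line:
         print(line, end="")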
Generators can be used to multiplex data from multiple sources, using threads.
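One way this multiplexing could look is sketched below, assuming one thread per source and a shared `queue.Queue`. The names `multiplex`, `numbers`, and `letters` are hypothetical, and the order in which items from different sources interleave is not deterministic.

```python
import queue
import threading

def multiplex(sources):
    """Yield items from several iterables as they arrive, one thread per source."""
    merged = queue.Queue()
    done = object()  # sentinel marking the end of one source

    def drain(source):
        for item in source:
            merged.put(item)
        merged.put(done)

    threads = [threading.Thread(target=drain, args=(s,), daemon=True)
               for s in sources]
    for t in threads:
        t.start()

    remaining = len(threads)
    while remaining:
        item = merged.get()
        if item is done:
            remaining -= 1  # one source is exhausted
        else:
            yield item

def numbers():
    yield from (1, 2, 3)

def letters():
    yield from ("a", "b")

merged_items = list(multiplex([numbers(), letters()]))
```

`merged_items` contains all five items; items from the same source keep their relative order, but the interleaving between sources depends on thread scheduling.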

Generators can be used to broadcast data to multiple consumers.
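A rough sketch of broadcasting, assuming the consumers are plain callables (they could just as well be coroutines receiving items through `send()`). The names `broadcast` and `keep_errors` and the sample log lines are made up for illustration.

```python
def broadcast(source, consumers):
    """Pull each item from `source` once and hand it to every consumer."""
    for item in source:
        for consume in consumers:
            consume(item)

errors = []
everything = []

def keep_errors(line):
    # Consumer that keeps only error lines.
    if "ERROR" in line:
        errors.append(line)

# A generator expression stands in for any generator-based source.
log_lines = (line for line in ["INFO start", "ERROR disk full", "INFO done"])
broadcast(log_lines, [keep_errors, everything.append])
```

Each consumer sees every item, and the source is iterated only once.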

Generators are equally powerful in concurrent programming.


Question:

Why do we need coroutines that use send() (for example, in Python)? When should a coroutine be used instead of a generator?

user1787812
  • I think that to start a generator, the source must be readily available. To start a coroutine, the source does not need to be available up front; data can be supplied later with `send(data)`. The source can be data, a socket connection, or anything else. – user1787812 Jul 11 '17 at 21:05

1 Answer

Coroutines are generators - they just have the extra ability to receive arguments after invocation.
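To make "receive arguments after invocation" concrete, here is a small sketch using only the built-in generator protocol (`next()` and `send()`). The function name `running_average` is hypothetical.

```python
def running_average():
    """Yield the average of all values sent in so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        # Hand back the current average, then wait for the next value.
        value = yield average
        total += value
        count += 1
        average = total / count

averager = running_average()
next(averager)               # prime: advance to the first yield
first = averager.send(10)    # 10.0
second = averager.send(20)   # 15.0
third = averager.send(30)    # 20.0
```

The generator object is created with no input at all; every value arrives later through `send()`, and each `send()` call also returns whatever the generator yields next.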

Data pipelines (ETL stuff) are a bad example to show off their strengths. Pipelines rarely need the full power of coroutines - the data is static and unchanging, and does not need dynamic modification outside the context of your data and the encapsulating function.

However, there are certainly good reasons to use a coroutine. Two examples:

  1. Dynamic injection of new contexts: You have multiple incoming server requests to handle, and want to avoid a process-per-request model a la Apache. Interleaving coroutines on top of each other and passing in the context of the request (cookies, sessions, etc.) at arbitrary points allow you to get the benefit of an N:M process-per-request model and dynamically evaluate requests using the new configuration that can be supplied at ad-hoc intervals.

  2. Inter-generator communication: You've organised your architecture to be peer-driven, where each generator must talk to the other in some way. Maybe you're trying to coordinate across coroutines to save on resources, for example, or perhaps you want to use Lamport vector clocks rather than a call to time.time(). A pure generator could only return values, but not accept new configurations or input values - a coroutine can.

Whenever you need this ability, coroutines are useful!
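As a toy illustration of supplying new configuration to a running generator, the sketch below changes a counter's step size mid-stream. The name `ticker` and its `step` parameter are assumptions made for this example only.

```python
def ticker(step):
    """Count upward by `step`; a value passed to send() replaces the step."""
    current = 0
    while True:
        new_step = yield current
        if new_step is not None:  # send(x) was used rather than next()
            step = new_step
        current += step

counter = ticker(step=1)
values = [next(counter), next(counter), next(counter)]  # 0, 1, 2
values.append(counter.send(10))  # step becomes 10, so the next value is 12
values.append(next(counter))     # 22
```

A plain `next()` leaves the configuration untouched, while `send()` injects a new one without restarting the generator or losing its state.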

Akshat Mahajan
  • When you say *input values - a coroutine can*, is that not simply the flip side of a generator, and nothing more? – user1787812 Jul 11 '17 at 21:08
  • That is the technical definition of a coroutine, yes. A coroutine in theory should only _accept_ data, and not return it. In practice, people write functions that both accept dynamic inputs and yield dynamic outputs, so it's very easy to mix the concept of a coroutine and a generator, and then call the resulting combination a coroutine. It happens, it's not a bad thing, and it makes the resulting code very powerful. – Akshat Mahajan Jul 11 '17 at 21:14
  • @user1787812 [Here's an implementation of a coroutine that mixes the two concepts](https://stackoverflow.com/questions/19302530/python-generator-send-function-purpose), and calls the result a coroutine. Note the _two_ yields - this function cycles between being a generator and being a coroutine at different points in its execution. – Akshat Mahajan Jul 11 '17 at 21:17
  • Mixing is not recommended – user1787812 Jul 11 '17 at 21:18
  • @user1787812: Why? – Robert Harvey Jul 11 '17 at 21:20
  • @user1787812 Recommendations are _guidelines_. They're best to follow except when a performance requirement must be met, or when the alternatives are worse. Python's `asyncio` and Pypy, for instance, abstract mixed coroutines heavily to give you what are effectively green threads without userspace scheduling. Mixing the two concepts may lead to deadlock, of course, but the benefits outweigh the drawbacks in the majority of cases, where you only want simple interactions. – Akshat Mahajan Jul 11 '17 at 21:24