How to design for an ordered list of unrelated events

Question

This is a bit of an invented example but I think it best illustrates my question: Say I'm creating a chess replay event API. Say I have a lot of different "events" I want to keep track of, in a specified order. Here might be some examples:

A move event — this contains the previous and new square.
A timer event — this contains the timestamp at which the timer was toggled between players.
A chat message event — this contains the player ID, the message and time sent

...etc. The point is that the data model for each event is very different — there isn't much of a common interface.

I want to design an API that can store and expose essentially a List<Event> to a client who can choose to process these different events as they wish. We don't know what clients will do with this information: perhaps one client needs to do text analysis on the ChatMessageEvents, and another consumes and replays these events in the UI. The challenge is that ordering between events must be preserved, so I can't separate by methods like getMoveEvents and getTimerEvents, since a TimerEvent can happen between move events and the client may need that information.

I could expose a visitor to allow clients to handle each event type differently in the list, but I'm wondering if there's a better way to handle a situation like this.

Edit: I want to design this with one main priority: provide clients with an easy and flexible way to iterate through these events. In an ideal scenario, I would envision the end user writing handlers to the event types they care about, and then be able to iterate through without casting based on the runtime type.

Don't be afraid of creating an enum that identifies the "event type". These events can inherit from a base event type that exposes that enum as a read-only property. A visitor, or a C# pattern-matching statement (which can be an `if` or a `switch`) can be used. Moreover, LINQ can perform filtering for you, so that events can be filtered on event type, or a more complex conditional statement where the event enum is one of the input. — rwong, Dec 16 '20 at 06:12
In some places, C# generics can be used. This needs to be carefully designed, though, so that the benefits outweigh the complexity and the cost of imposed restrictions. — rwong, Dec 16 '20 at 06:13
@rwong thank you. Regarding the enum approach, if I did it that way in Java (and didn't use a visitor), my understanding is I'd need to do an explicit cast to the associated event class type in order to access any of the event-specific data. So for example, `if (event.getEventType() == TIMER_EVENT) { processTime((TimerEvent) event); }` Would that be considered an antipattern, or is this an acceptable approach? — rb612, Dec 16 '20 at 06:34
"I want to cut this board in half. What kind of hammer should I use?" Don't limit yourself by trying to solve every problem with a design pattern (or with OOP, for that matter). Rather, find the most elegant solution to the problem, and then if parts of that solution happen to include design patterns, feel free to use that language to describe it. — Ray, Dec 16 '20 at 15:12
You might be looking for defining `Event` as an [algebraic data type](https://en.wikipedia.org/wiki/Algebraic_datatype). How exactly to write that would depend on the concrete programming language, though. — Bergi, Dec 16 '20 at 16:53
Like @yoozer8 said: add a timestamp to your event. Make an interface for that which you include in all your events: bam you can make a list which you can order. — Pieter B, Dec 18 '20 at 11:25

Doc Brown · Accepted Answer · 2020-12-18T06:22:53.120

I am under the strong impression you are overthinking this.

The challenge is that ordering between events must be preserved, so I can't separate by methods like getMoveEvents and getTimerEvents

Then simply don't offer such methods in your API. Let the client filter out the events they need, and do not implement anything in your API which could become error prone.

I could expose a visitor to allow clients to handle each event type differently in the list

This sounds overengineered. You described the requirement as getting something like a List<Event>, containing recorded events. For this, a simple method List<Event> getEvents() would be totally sufficient (maybe an IEnumerable<Event> would be enough). For reasons of efficiency, it may be necessary to offer some methods for restricting the result set to certain conditions.

but I'm wondering if there's a better way to handle a situation like this

Asking for a "better" (or "best", or "correct") approach is way too unspecific when you don't have any criteria for what you actually mean by "better". But how do you find criteria for what is "better"? The only reliable way I know for this problem is:

Define some typical use cases for your API!

Do this in code. Write down a short function which tries to use your API, solving a real problem you know for sure the clients will encounter (even if the API does not exist or is not implemented yet).
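As an illustration of what such a use-case function might look like, here is a minimal Java sketch (assuming Java 16 or later). The names `ReplayApi`, `getEvents`, `ChatUseCase` and the event records are hypothetical; they only stand in for whatever the real API ends up offering.

```java
import java.util.List;

// Hypothetical event types; none of these names come from a real API.
interface Event {}
record MoveEvent(String from, String to) implements Event {}
record TimerEvent(long toggledAtMillis) implements Event {}
record ChatMessageEvent(String playerId, String message, long sentAtMillis) implements Event {}

// Hypothetical API surface under discussion: one ordered list of events.
interface ReplayApi {
    List<Event> getEvents();
}

class ChatUseCase {
    // Use case: a client wants the chat messages, in order, for text analysis.
    static List<String> chatTranscript(ReplayApi api) {
        return api.getEvents().stream()
                .filter(e -> e instanceof ChatMessageEvent)
                .map(e -> ((ChatMessageEvent) e).message())
                .toList();
    }

    public static void main(String[] args) {
        // A stub stands in for the API, which does not need to exist yet.
        ReplayApi stub = () -> List.of(
                new MoveEvent("e2", "e4"),
                new ChatMessageEvent("white", "good luck", 1_000L),
                new TimerEvent(1_500L),
                new ChatMessageEvent("black", "you too", 2_000L));
        System.out.println(chatTranscript(stub)); // prints [good luck, you too]
    }
}
```

Writing even this much already raises concrete questions about the API, for example whether the client would prefer a filter parameter over fetching the full list.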

It may turn out the client will need something like a property to distinguish event types. It may turn out the client needs something to get only the events from the last hour, or the last 100 events, since always providing a full copy of all former events may not be efficient enough. It may turn out the client needs to get a notification whenever a new event is created.

You will only be able to decide this when you develop a clear idea of the context in which your API will be used.
If you add some code to this function which verifies the API's result, and place this code into the context of a unit testing framework, then you are doing "Test Driven Development".
But even if you don't want to use TDD or don't like TDD, it is best to approach this from the client's perspective.
Don't add anything to your API when you have doubts whether there will ever be a use case for it. Chances are high no one will ever need that kind of function.

If you don't know enough about the use cases of the API to use this approach, you will probably have to do some more requirements analysis first, and that is something we cannot do for you.

Let me write something to your final edit, where you wrote

and then be able to iterate through without casting based on the runtime type.

Casting based on the runtime type isn't necessarily an issue. It becomes only a problem when it makes extensions to the Event class hierarchy harder, because existing Client code would be forced to change with each extension.

For example, let's say there is client code handling all chat events by a type test plus a cast for ChatEvent. If a new event type is added which is not a chat event, existing code will still work. If a new chat-like event is added, as a derivation of ChatEvent, existing code will also still work as long as the ChatEvent type conforms to the LSP. For specific chat events, polymorphism can be used inside the ChatEvent part of the inheritance tree.
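A minimal Java sketch of that scenario follows. The class hierarchy is hypothetical: `EmoteEvent` and `DrawOfferEvent` are invented here to play the role of event types added after the client code was written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hierarchy, for illustration only.
class Event {}
class MoveEvent extends Event {}
class ChatEvent extends Event {
    private final String text;
    ChatEvent(String text) { this.text = text; }
    String text() { return text; }
}
// Added later: a chat-like event that still honours ChatEvent's contract (LSP).
class EmoteEvent extends ChatEvent {
    EmoteEvent(String emote) { super(":" + emote + ":"); }
}
// Added later: an event unrelated to chat.
class DrawOfferEvent extends Event {}

class ChatCollector {
    // Client code written before EmoteEvent and DrawOfferEvent existed.
    // It needs no change when those types are introduced.
    static List<String> chatTexts(List<Event> events) {
        List<String> texts = new ArrayList<>();
        for (Event e : events) {
            if (e instanceof ChatEvent) {        // type test
                ChatEvent chat = (ChatEvent) e;  // cast
                texts.add(chat.text());
            }
        }
        return texts;
    }
}
```

The type test only becomes a maintenance burden in code that has to enumerate every event type, because such code must be revisited for each new subclass.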

So instead of superstitiously avoiding type tests and casts under all circumstances because you have read in a textbook that "this is generally bad", reflect on why and when this really causes problems. And as I wrote above, writing some client code for some real use cases will help you get a better understanding of this. It will also allow you to validate what happens when your list of events gets extended afterwards.

It's usually also a good idea to look at other popular APIs that have similar problems and see what they did. What immediately springs to mind here is Roslyn which has a similar problem (the "events" are nodes representing syntax or semantical information about code, admittedly it's not necessarily linear). Beware to still not fall into the overengineering trap - a solution for a compiler used by millions can have different non-functional requirements ;-) — Voo, Dec 16 '20 at 13:27
@Voo: it is usually a good idea to look at other popular APIs - **if they are aiming at comparable use cases**. — Doc Brown, Dec 16 '20 at 13:30
@DocBrown thank you for the insight. I realized my usage of “pattern” was erroneous, I meant it as a synonym for architecture/interface rather than meaning one of the GoF design patterns, so I’ve edited my question. Following up on the point of just exposing the list of events, my question is more at, “Is this an acceptable way to expose a list of events for consumers who handle the underlying events based on type?” From my understanding, a consumer in a language like Java would need to switch on the type and do an explicit cast to the underlying, which is what I feel should be avoidable. — rb612, Dec 16 '20 at 18:35
An ideal consumer would be able to a) implement handling of only the event types they need such that client code doesn’t need to change on adding a new event, and b) allowing them to iterate without having to check “is instance of” and cast to the appropriate type. Seems like the only way about this is a visitor, but that doesn’t satisfy (a). So this question is about providing the end consumer a clean way to iterate, whether that be a client-side utility or organized as part of the API. — rb612, Dec 16 '20 at 18:44
@rb612: a switch on the individual event type is not as bad as lots of people think, especially for the purpose of filtering. For example, handling all chat events: if a new event type is added which is not a chat event, existing code will still work. If a new chat-like event is added, as a derivation of the existing chat event type, existing code will also still work as long as your ChatEvent type conforms to the LSP. Type testing only becomes a problem when you are writing code which has to deal with *every* possible type of event. — Doc Brown, Dec 16 '20 at 19:04
... and though your edit invalidated parts of my answer, my main recommendation still holds: I would write some tests or prototyping code which represents a consumer and some real use cases of those consumers - that is the only reliable way of making sensible API decisions. — Doc Brown, Dec 16 '20 at 19:10

score 7 · Answer 2 · answered Dec 16 '20 at 09:33

Instead of concentrating on the data, try to think more about what it is supposed to do.

So, an Event is supposed to record something that happens in the game. I imagine the only thing you would really want from an Event is to replay it (I know you have other use-cases, just hear me out :). That would mean something like:

public interface Event {
   void replayOn(Game game);
}

Note, you can "preserve ordering", because you don't have to know the exact type of event you're trying to replay. You don't have to have an enum or any other "properties" to distinguish between different types of events. These would be anti-patterns anyway.

However, you still have to define the Game. This is where you describe things that can happen in your definition of a chess game:

public interface Game {
   void move(...);
   void toggleClock(...);
   void sendText(...);
}

Now, if you want to analyze chats, you would make an implementation of the Game interface that ignores all methods other than sendText() for example, and you let all the events replay on this implementation. If you want to replay on the UI, you create an implementation of a Game for that. And so on.
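A sketch of such a chat-only implementation might look like the following (assuming Java 16 or later for records). The parameter lists of `move`, `toggleClock` and `sendText` are assumptions, since the interface above leaves them elided, and `ChatLog` and the two event records are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Parameter lists are assumptions; the answer leaves them elided.
interface Game {
    void move(String from, String to);
    void toggleClock(long timestampMillis);
    void sendText(String playerId, String message);
}

interface Event {
    void replayOn(Game game);
}

// Hypothetical concrete events: each one knows how to replay itself.
record MoveEvent(String from, String to) implements Event {
    @Override public void replayOn(Game game) { game.move(from, to); }
}
record ChatMessageEvent(String playerId, String message) implements Event {
    @Override public void replayOn(Game game) { game.sendText(playerId, message); }
}

// A "game" that only records chat and ignores everything else.
class ChatLog implements Game {
    final List<String> messages = new ArrayList<>();

    @Override public void move(String from, String to) { /* not relevant for chat analysis */ }
    @Override public void toggleClock(long timestampMillis) { /* not relevant for chat analysis */ }
    @Override public void sendText(String playerId, String message) {
        messages.add(playerId + ": " + message);
    }
}
```

A client would then call `event.replayOn(chatLog)` for every event in order and read `chatLog.messages` afterwards. Note the trade-off: adding a new kind of event means adding a method to `Game`, which touches every implementation.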

Also note, that in this scenario you don't have to expose a List<Event> structure, just an Event. An Event can contain multiple "atomic" events if it wants to, since it is just defined in terms of what it does, not what it contains.

So for example this is a valid event:

public final class Events implements Event {
   private final List<Event> events;
   ...
   @Override
   public void replayOn(Game game) {
      events.forEach(event -> event.replayOn(game));
   }
}

As for what "pattern" this is, it doesn't really matter. One could argue it is a form of event-sourcing, since the state of the game is built from state transitions. It is also almost doing double-dispatching/visitors, except it is not using types to do the second step, but real domain-relevant methods.

It is certainly object-oriented though, because at no point is data pulled out of an object.

score 4 · Answer 3 · answered Dec 16 '20 at 11:25

I agree with the posted answer that you are overengineering your approach. Additionally, there are several options here, and you've been quite light on details and considerations that would help decide between those options.

But I happen to have worked on a similar problem not too long ago, so I wanted to give you a real world example of how your issue can be tackled.

Backend

In our case, we were returning a series of events of all types (user created, user updated, ...) but it had to be a single list, without specific filters (other than pagination).

Because there were myriad event types, and due to considerations they were kept as minimal as possible, we opted to serialize the event data and store it this way. This means that our data store didn't have to be updated every time a new event was developed.

A quick example. These were the captured events:

public class UserCreated
{
    public Guid UserId { get; set; }
}

public class UserDeleted
{
    public Guid UserId { get; set; }
}

Note that our events were truly kept minimal. You'd end up with more data in here, but the principle remains the same.

And instead of storing these directly in a table, we stored their serialized data in a table:

public class StoredEvent
{
    public Guid Id { get; set; }
    public DateTime Timestamp { get; set; }
    public string EventType { get; set; }
    public string EventData { get; set; }
}

EventType contained the type name (e.g. MyApp.Domain.Events.UserCreated), EventData contained the serialized JSON (e.g. { "UserId" : "1c8e816f-6126-4ceb-82b1-fa66e237500b" }).

This meant that we wouldn't need to update our data store for each event type that was added, instead being able to reuse the same data store for all events, since they were part of a single queue anyway.

Since these events did not need to be filtered (which is also one of your requirements), this meant that our API never had to deserialize the data to interpret it. Instead, our API simply returned the StoredEvent data (well, a DTO, but with the same properties) to the consumer.

This concludes how the backend was set up, and it directly answers the question you're posing here.

In short, by returning two properties (i.e. the serialized event data and the specific type of event), you are able to return a large variation of event types in a single list, without needing to update this logic whenever a new event type would be added. It's both future-proof and OCP friendly.

The next part focuses on the particular example of how we chose to consume this feed in our consumer applications. This may or may not match with your expectations - it's just an example of what you can do with this.

How you design your consumers is up to you. But the backend design discussed here would be compatible with most if not all ways you could design your consumers.

Frontend

In our case, the consumer was going to be another C# application, so we developed a client library that would consume our API, and would deserialize the stored events back into their own respective event classes.

The consumer would install a Nuget package we made available, which contained the event classes (UserCreated, UserDeleted, ...) and an interface (IHandler<TEventType>) that the consumer would use to define how each event needed to be handled.

Internally, the package also contains an event service. This service would do three things:

Query the REST API to fetch the events
Convert the stored events back to their individual classes
Send each of these events to their registered handler

Step 1 is nothing more than an HTTP Get call to our endpoint.

Step 2 is surprisingly simple, when you have the type and data:

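// EventType is stored as a string, so resolve it to a Type first (may need an assembly-qualified name).
var originalEvent = JsonConvert.DeserializeObject(storedEvent.EventData, Type.GetType(storedEvent.EventType));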

Step 3 relied on the consumer having defined handlers for each type they're interested in. For example:

public class UserEventHandlers : IHandler<UserCreated>, IHandler<UserDeleted>
{
    public void Handle(UserCreated e)
    {
        Console.WriteLine($"User {e.UserId} was created!");
    }

    public void Handle(UserDeleted e)
    {
        Console.WriteLine($"User {e.UserId} was deleted!");
    }
}

If a consumer wasn't interested in a specific event type, they would simply not create a handler for that type and therefore any events of that type would effectively be ignored.

This also kept things backwards compatible. If a new event type was added tomorrow, but this consumer wouldn't be interested in it, then you could keep this consumer untouched. It wouldn't break because of the new event type (it would just ignore those new types), and it wouldn't force you to redeploy your application.

The only real cause for redeployment would be if a change was made to the event types that the consumer was actually interested in, and that's logically inevitable.
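The answer does not show how step 3 finds the right handler for a deserialized event. One way to sketch it, in Java rather than C# and assuming Java 16 or later, is a registry keyed by event class. `EventDispatcher`, `register` and `dispatch` are hypothetical names and not part of the package described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical dispatcher: maps an event class to the handler registered for it.
class EventDispatcher {
    private final Map<Class<?>, Consumer<Object>> handlers = new HashMap<>();

    // The Class token lets the cast happen once, inside the library,
    // so consumer code never casts.
    <T> void register(Class<T> type, Consumer<? super T> handler) {
        handlers.put(type, event -> handler.accept(type.cast(event)));
    }

    // Events without a registered handler are silently ignored.
    void dispatch(Object event) {
        Consumer<Object> handler = handlers.get(event.getClass());
        if (handler != null) {
            handler.accept(event);
        }
    }
}

// Hypothetical Java counterparts of the C# event classes above.
record UserCreated(String userId) {}
record UserDeleted(String userId) {}

class DispatchDemo {
    public static void main(String[] args) {
        EventDispatcher dispatcher = new EventDispatcher();
        dispatcher.register(UserCreated.class,
                e -> System.out.println("User " + e.userId() + " was created!"));
        dispatcher.dispatch(new UserCreated("u1")); // handled
        dispatcher.dispatch(new UserDeleted("u2")); // no handler registered, ignored
    }
}
```

This sketch looks up the exact runtime class only, so a handler registered for a base type would not receive subclass instances. Whether that matters depends on how the event classes are organized.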

Thank you! I think the end result for the client seems like a very nice solution with the way you have it set up to consume on the frontend. Could you please elaborate more on step 3, how you provided the bridge between the handlers and consuming the heterogeneous collection? That’s where I think the core logic of my design for this should lie - making it easier for consumers to handle these events of different types. — rb612, Dec 16 '20 at 18:49
@rb612: When you convert the event, you obviously know its type. So at that point, it's quite easy to look for the correct handler (based on that event type you know) among the registered handlers, and send your event to that handler. As to how you register those handlers, you could either automatically find them all using reflection, or you could manually register them (it's situational). In either case, once you have that list, it's fairly easy to do something like `myHandlers.Single(h => h is IHandler)` (or `Where`, if you allow multiple handlers). — Flater, Dec 16 '20 at 20:46
@rb612: Due to our situation (which I omitted as it's not relevant here), we went a quite complex route using reflection and automatic instantiation. I suggest keeping it simple. By doing manual registration, you can do something along the lines of `myHandlerDictionary.Add(typeof(UserCreated), new UserCreatedHandler())` so you've already instantiated the handlers and you can easily retrieve the handlers later on by looking through the dictionary. As an aside, I know that some DI libraries could be used for this, but opinions on whether that is a clean approach are divided as far as I've seen. — Flater, Dec 16 '20 at 20:51

score 1 · Answer 4 · answered Dec 19 '20 at 07:37

In an ideal scenario, I would envision the end user writing handlers to the event types they care about, and then be able to iterate through without casting based on the runtime type.

I can empathize with the sentiment here: surely there must be some other way to do this because looking at the type is a code smell, right? We've probably all seen code that does hairy things by taking in an object and doing some poor type checking on it, leading to some anti-patterns.

Let's look at another section of your question:

The challenge is that ordering between events must be preserved, so I can't separate by methods like getMoveEvents and getTimerEvents since a TimerEvent can happen between move events and the client may need that information.

Extending this logic - if we are looking at a truly generic handler we are saying that:

It could care about handling any single type
Or it could care about handling multiple types
Different types might not be independent of each other
It needs the items of different types in interleaved order

Basically, this boils down to saying that we don't know the interdependencies in processing logic, only that it must be time ordered. This means we can't write single-type handlers, and if we wrote something like "get all the items of type A and B and C, and dispatch them using handler A and B and C" we might find that handler A and B needed to work together to do the processing - which complicates things enormously. Is there something simpler but still flexible?

Well, how have programmers historically solved this type of problem? First, I think it's worth pointing out that there are lots of interrelated terms that show up in the comments and answers here that point to basically the same solution: "algebraic data types" and "sum types", and I'll add a few as well - "discriminated union", "tagged union", and "variant". There might be some differences here, but the theme is that they can all look very much like your description of Event - they're subtypes that can carry data, but they should be distinct from, say, the more generic object. Another related term mentioned is "pattern-matching", which ties into how you work with your discriminated unions.

As you may have guessed from the many names in use above, this is indeed a recurring problem; this tends to be a recurring solution to it across languages. These constructs are typically implemented at a language level - or emulated when the language doesn't support it. It's also not just something from the distant past or fully replaced by another construct - for example, C# 8.0 is expanding on pattern matching from C# 7.0 as of 2019.

Now, I'm afraid if you haven't seen it before - you may not like what this time-honored solution looks like. Here's the older C# 7.0 code example from the link above:

Fruit fruit = new Apple { Color = Color.Green };
switch (fruit)
{
  case Apple apple when apple.Color == Color.Green:
    MakeApplePieFrom(apple);
    break;
  case Apple apple when apple.Color == Color.Brown:
    ThrowAway(apple);
    break;
  case Apple apple:
    Eat(apple);
    break;
  case Orange orange:
    orange.Peel();
    break;
}

Or a Swift example:

    switch productBarcode {
    case let .upc(numberSystem, manufacturer, product, check):
        print("UPC : \(numberSystem), \(manufacturer), \(product), \(check).")
    case let .qrCode(productCode):
        print("QR code: \(productCode).")
    }
    // Prints "QR code: ABCDEFGHIJKLMNOP."

And if you clean this up, you can get something like this in F#:

let getShapeWidth shape =
    match shape with
    | Rectangle(width = w) -> w
    | Circle(radius = r) -> 2. * r
    | Prism(width = w) -> w

And so we have come back full circle, at least if we squint a bit. The widely recurring solution has some smarts and syntactic sugar, but ... it looks like a more type safe version of a switch case!

Does the language you are working in have some version of this concept?
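For Java, which the question appears to be written in, a comparable construct exists in newer versions: sealed interfaces combined with pattern matching in `switch`. The following is a minimal sketch assuming Java 21 or later; the event records and the `Describe` class are hypothetical names.

```java
// Hypothetical event model; requires Java 21 for pattern matching in switch.
sealed interface Event permits Move, TimerToggle, ChatMessage {}
record Move(String from, String to) implements Event {}
record TimerToggle(long timestampMillis) implements Event {}
record ChatMessage(String playerId, String text) implements Event {}

class Describe {
    static String describe(Event event) {
        // No default branch: because Event is sealed, the compiler checks
        // that every permitted type is covered.
        return switch (event) {
            case Move m -> "move " + m.from() + "-" + m.to();
            case TimerToggle t -> "clock toggled at " + t.timestampMillis();
            case ChatMessage c -> c.playerId() + " says: " + c.text();
        };
    }
}
```

A consumer that only cares about some event types can add a `default` branch instead, at the price of losing the compile-time exhaustiveness check.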

score 0 · Answer 5 · answered Dec 16 '20 at 20:55

Consider using sequence numbers. However, I think it's worth looking at your requirements first:

I would envision the end user writing handlers to the event types they care about, and then be able to iterate through without casting based on the runtime type.

This is directly in opposition with

I want to design an API that can store and expose essentially a List<Event> to a client who can choose to process these different events as they wish.

You literally cannot do both. You either expose the information in a typed form or in a generic form. But an API that outputs it in a generictyped (or typedgeneric) form isn't really possible. You either erase the information or you don't.

As a solution, we can relax one of your rules

I can't separate by methods like getMoveEvents and getTimerEvents since a TimerEvent can happen between move events and the client may need that information.

Consider this as a solution: Every event in the system is assigned a unique "sequence number" which starts at 1 and counts upwards (I like to start at 1 so that 0 can be "invalid sequence number"). That sequence number is stored in the Event objects.

Now you can have getMoveEvents(), which returns an ordered list of all MoveEvents, and a getTimerEvents(), which returns an ordered list of all TimerEvents. Any algorithm which needs to understand the interplay between events of different types can look at the sequence number. If I have [Move(seqnum=1), Move(seqnum=3)] and [Timer(seqnum=2)], it is quite easy to see that the order of events was Move, Timer, Move.

The logic here is that your user knows the type of data they wish to operate on (such as MoveEvents). It's reasonable then for them to know a type-specific function to call to get a list.

The user can then merge the events in whatever way they please. As an example, consider an algorithm which looks at MoveEvents and TimerEvents, and nothing else. It could have an API like:

enum EventType {
    MOVE,
    TIMER
};
bool        moveNext(); // returns true if there's another event to move to
EventType   getCurrentType();
MoveEvent   getCurrentMoveEvent();  // error if current type is TIMER
TimerEvent  getCurrentTimerEvent(); // error if current type is MOVE

It then merely needs to iterate through each list, find which list has the lower sequence number at its head, and that's the next event. Note that I did no casting, and the enumeration is algorithm-specific: a different algorithm can maintain its own list of enumerated Events to consider.
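The merge itself can be sketched in Java as follows (assuming Java 16 or later for records). The record fields and the `MergedWalk` class are hypothetical; the only thing taken from the description above is that each typed list is already ordered and every event carries a global sequence number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical typed events, each carrying its global sequence number.
record MoveEvent(long seq, String from, String to) {}
record TimerEvent(long seq, long timestampMillis) {}

class MergedWalk {
    // Walks both typed lists in sequence-number order and describes each step.
    // No casting is needed because each list is already typed.
    static List<String> walk(List<MoveEvent> moves, List<TimerEvent> timers) {
        List<String> out = new ArrayList<>();
        int m = 0;
        int t = 0;
        while (m < moves.size() || t < timers.size()) {
            // Take a move if the timers are exhausted, or if the next move
            // has the lower sequence number.
            boolean takeMove = t >= timers.size()
                    || (m < moves.size() && moves.get(m).seq() < timers.get(t).seq());
            if (takeMove) {
                MoveEvent move = moves.get(m++);
                out.add(move.seq() + ": move " + move.from() + "-" + move.to());
            } else {
                TimerEvent timer = timers.get(t++);
                out.add(timer.seq() + ": clock toggled at " + timer.timestampMillis());
            }
        }
        return out;
    }
}
```

With moves numbered 1 and 3 and a timer event numbered 2, the walk visits them in the order move, timer, move.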

If you see a sequence number jump (by more than 1), then you know that there were events that occurred of a type you aren't handling. It's up to your algorithm to decide if that's an error, or if you can simply ignore unrecognized events. Typically it's pretty obvious which.

If your Event class has something other than a sequence number, you might also expose List<Event> with all events as a way to walk through them. One could always find the sequence number of an event of interest, then seek it out in the typed events that it knows of. However, if you expose no additional information, there's no need for this List<Event>. We know the order that the event sequence numbers proceed in: 1, 2, 3, 4...

An example algorithm that could use this pattern: Assign each move a range of times at which the move could have taken place. If you scan only the MoveEvent and TimerEvent lists, you can find the two TimerEvents whose sequence number bounds each MoveEvent. Because you know the events happen in sequence number order, you know that the move must have taken place between the timestamp on the first TimerEvent and the second.

score 0 · Answer 6 · answered Dec 17 '20 at 00:26

While your example source code is heavily Java inspired, what you're asking for is a sum type: a type formed as a union of other types.

In your example above, in a language like Rust:

struct Move {
    previous: (u8,u8),
    new: (u8,u8)
}

struct GameTimer {
    timestamp: i64,
    new_player_id: i64,
}

struct Message {
    timestamp: i64,
    player_id: i64,
    message: String
}

enum Event {
  Move(Move),
  Timer(GameTimer),
  ChatMessage(Message)
}

fn length_all_chats(events: Vec<Event>) -> usize {
    events.iter().fold(0, |sum, event| 
        sum + match event {
            Event::Move(_) => 0,
            Event::Timer(_) => 0,
            Event::ChatMessage(Message{message: msg, ..}) => msg.len(),
        }
    )
}

The above length_all_chats returns the sum of the lengths of all chat messages in the list of events.

If a new event type was introduced, the consumers would need to update in order to compile (or provide a catch-all pattern). This is a different way of implementing runtime polymorphism, allowing for more powerful patterns, like multiple dispatch where you can call a different function based on the types of two (or more) arguments.
