2

I have some classes being serialized via frugal (this is an open source extension of Thrift). The primary reason this is relevant is that the serialization process is out of my control and needs to support multiple languages, so I cannot easily modify it.

Imagine my classes look like:

class response_component:
    value = 'string value'
    another_value = 4
    another_string = 'another string'

class the_response:
    contained = response_component

This gets serialized into json similar to:

{"response": {"the_response": {"value": "string value", "another_value": 4, "another_string": "another string"}}}

This can be interpreted and recreated by the underlying messaging framework into classes on the recipient end. What I am interested in doing is having middleware make modifications to underlying attributes.

For example, I may wish to change the "another_string" value in the response example here to "hello."

The difficulty I have is understanding how to denote this reference in such a way that is meaningful within the json context.

I can do something which feels hokey like articulating the absolute path through the classes, such as RESPONSE.contained.another_string and then in the deserialization process unpacking this (or, in python, doing a series of getattr until I find the last value. This might look like, in Python:

r = the_response()
print(r.contained.another_string)
path = 'the_response.contained.another_string'

v = None
paths = path.split('.')
for p in paths[1:-1]:
    if not v: v = r
    print(p)
    v = getattr(v, p)


setattr(v, paths[-1], 'hello')

print(r.contained.another_string)

communicated via json such as:

{"modified_value": {"key": "the_response.contained.another_string", "val": "hello"}}

However, I really dislike the path approach and am hoping for a different and better way to convey where to modify the values.

enderland
  • 12,091
  • 4
  • 51
  • 63

1 Answers1

1

If you have a JSON document and want to modify specific keys, there's no good way around specifying absolute paths. This changes when you can deserialize into classes under your control, because you can apply the composite/visitor pattern.

In the pure-JSON case where entries are not self-describing beyond their key names, you only have the key names available to navigate the data structure. Therefore, no solution but using absolute paths is possible.

There is a RFC 6901 “JSON Pointers” which provides a syntax for these paths, with Python implementations available. This is approximately XPath-ish, but can only address a single element – there are no wildcards. This makes JSON Pointers unsuitable e.g. when you want to change every entry in an array. It may therefore be sensible to implement your own, more flexible syntax, as you sketched out in your question.

With custom classes, you can add behaviour to these classes that simplifies the search of entries to be changed significantly. This requires two changes:

  • All of these classes implement the Visitor Pattern so that an externally defined operation can be applied to the instance, but only to a specific type.
  • Each class knows which of its members are also visitable, and distributes the operation to its children – an instance of the Composite Pattern.

If you dislike patterns, all of this can be also implemented on a good-enough basis using reflection features.

This requires your data model to have a clear concept of types so that changes can be restricted to the correct types.

The Visitor/Composite solution would look something like this:

class response_component:
  ...
  def accept_visitor(self, visitor):
    visitor.visit_response_component(self)

class the_response:
   ...
   def accept_visitor(self, visitor):
     visitor.visit_the_response(self)
     # distribute the operation to relevant childs
     self.contained.accept_visitor(visitor)

class SetAnotherString(object):
   def __init__(self, value):
     self.value = value

   def visit_the_response(self, r): pass
   def visit_response_component(self, c): c.another_string = self.value

json_data.accept_visitor(SetAnotherString('hello'))

As your actions seem to be rather dynamic, it might make sense to replace the visitor's method dispatch with a dictionary lookup:

class response_component:
  ...
  def accept_visitor(self, visitor):
    visitor(self, 'response_component')

class the_response:
  ...
  def accept_visitor(self, visitor):
    visitor(self, 'the_response')
    # distribute the operation to relevant childs
    self.contained.accept_visitor(visitor)

class SetValues(object):
  def __init__(self, changes_for_class: Dict[str, Dict[str, Any]]):
    self.changes_for_class = changes_for_class

  def __call__(self, instance, classname):
    try: changes = self.changes_for_class[classname]
    except KeyError: return
    for k, v in changes.items():
      setattr(instance, k, v)

json_data.accept_visitor(SetValues({
    'response_component': dict(another_string='hello'),
}))

Compared with the double-dispatching visitor, these updates are easier to represent in a language-agnostic manner. But compared with JSON Pointers, both visitor approaches need to search the complete document to find all instances that must be updated, whereas the Pointer only resolves a single element. It may be possible to create an ad-hoc query language that combines both of these approaches, e.g. using a JSON Pointer to specify a prefix of the data structure where the visitor-based search should start.

amon
  • 132,749
  • 27
  • 279
  • 375