
I need advice on an approach for logging raw request/response data from a few web apps that I host, covering all operations and hits on all APIs accessed via HTTP methods (mostly HTTP POSTs).

Objective: capture the raw response body at the end of the request-response life cycle. The apps are built using Python/Java, and nginx is the web server fronting them.

Approach 1: Use the Lua logging extensions to capture the raw response body at nginx at the end of the request-response cycle, either by parsing the response body and logging it to a flat file while relaying the response back to the requester, or by relaying it to a remote logger in a non-blocking way.

Advantages:

  • Very straightforward and easy to achieve.
  • Additional logic coded in Lua runs in a very jailed environment, so safety is by design.

Disadvantages:

  • Considerable hit on performance, since the buffer sizes at the web server's end are limited and huge response bodies cause multiple buffer re-read operations.
  • Programming is limited to Lua, which might not appeal to everyone.
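The "non-blocking" hand-off mentioned in Approach 1 is not specific to Lua. As a rough illustration of the idea only (this is Python, not nginx/Lua code, and the logger name `body_capture` and the helper `make_nonblocking_logger` are made up for this sketch), the request path just puts a record on an in-memory queue and a background thread does the slow write:

```python
import logging
import logging.handlers
import queue


def make_nonblocking_logger(target_handler):
    """Return (logger, listener).

    Calls on the logger only enqueue a record; the listener's background
    thread passes records to target_handler (a file or remote handler).
    """
    log_queue = queue.Queue(-1)  # unbounded queue; an assumption of this sketch
    logger = logging.getLogger("body_capture")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    logger.addHandler(logging.handlers.QueueHandler(log_queue))

    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return logger, listener
```

Calling `listener.stop()` at shutdown drains the queue before returning. An unbounded queue trades memory for latency, so a real deployment would probably want a bound and a drop policy.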

Approach 2: Use a Python/Java-based middleware that relays the request from the web server to the web app and handles any manipulation of the request/response on its way to or from the web app, built on something like Twisted or Tornado.

Advantages:

  • Not too difficult to implement.
  • Filtering/logging logic can be coded in multiple high-level languages/frameworks.
  • The web server does not have to carry business logic or any additional dependency.

Disadvantages:

  • Operational overheads associated with maintaining a different middleware
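To make Approach 2 more concrete, here is a minimal sketch of the capture step. It assumes the app is served over WSGI and uses an in-process wrapper rather than a separate Twisted/Tornado relay. The class name `BodyLoggingMiddleware`, the logger name `raw_http`, and the `enabled` switch are all hypothetical, and this is one possible shape rather than a recommended implementation.

```python
import io
import logging

log = logging.getLogger("raw_http")  # hypothetical logger name


class BodyLoggingMiddleware:
    """Wrap a WSGI app and log raw request and response bodies."""

    def __init__(self, app, enabled=True):
        self.app = app
        self.enabled = enabled  # lets operators turn capture off

    def __call__(self, environ, start_response):
        if not self.enabled:
            return self.app(environ, start_response)

        # Read the request body once, then give the app a fresh stream.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        request_body = environ["wsgi.input"].read(length) if length else b""
        environ["wsgi.input"] = io.BytesIO(request_body)

        result = self.app(environ, start_response)
        return self._tee(result, environ, request_body)

    def _tee(self, result, environ, request_body):
        """Yield response chunks unchanged while keeping a copy to log."""
        chunks = []
        try:
            for chunk in result:
                chunks.append(chunk)
                yield chunk
        finally:
            if hasattr(result, "close"):
                result.close()
            # Logged once the response has been fully streamed.
            log.info(
                "%s %s request=%r response=%r",
                environ.get("REQUEST_METHOD"),
                environ.get("PATH_INFO"),
                request_body,
                b"".join(chunks),
            )
```

Usage would be along the lines of `application = BodyLoggingMiddleware(application)`. Note that this sketch buffers the whole response in memory and does not capture data sent through the legacy `write()` callable returned by `start_response`, so large bodies would need a size cap or streaming to disk.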

Which of these do you feel is the more elegant design? The apps can be written in either Python or Java, but the web server would be nginx, on Linux.

ksvrgh
  • What is the purpose you need this kind of logging for? – Bart van Ingen Schenau Mar 10 '17 at 11:19
  • Whatever approach is chosen, I would recommend having the ability to turn it on/off. If everything is running normally, logging requests/responses is just a resource hit to the system, which may be acceptable in development or testing environments. If you have auditing requirements, then there may be a need to leave it on. – Jon Raynor Mar 10 '17 at 15:07
  • The main objective is to stash and store the raw request and response bodies for statistical analysis later. Auditing is one of the byproducts, but not the main aim. – ksvrgh Mar 13 '17 at 10:01
  • @JonRaynor : "having the ability to turn it on/off." very much! – ksvrgh Mar 13 '17 at 15:30

1 Answer


My suggestion, if you want to log the raw HTTP traffic, is not to do this in code at all and instead to use a web server frontend to do it for you.

Both Apache and Nginx have the capability to log incoming and outgoing traffic on separate servers. You can set them up as a reverse proxy to your application servers, which also lets you take advantage of other features and benefits, such as serving static web content so that the application servers are not burdened with those requests (assuming you are not already using a CDN). Furthermore, you can take advantage of additional security and authentication modules and SSO at the web layer.

maple_shaft
  • The objective here being logging entire request and response bodies, would it be a good idea to do it at the web-server end? Or would it be better handled at the middleware/web-app level? Considering that if raw request response body needs to be logged at nginx, one would be limited to using lua. Active code gets mixed with configuration, which might not be an elegant way of doing things. Also, extensibility and penalty on performance would matter. The latter, since response streams would have to be read off the nginx buffer over multiple iterations which could impact performance. – ksvrgh Mar 13 '17 at 15:26
  • @ksvrgh I don't see why logging the raw requests and responses at nginx would constrain your middleware. I also don't see how this would hurt performance unless your middleware needed to do analysis in real time. It would only help middleware performance. You could realistically just stream log files over to your middleware and process them there when you need to run analysis jobs. – maple_shaft Mar 13 '17 at 18:09
  • I ran a few tests where I logged raw response bodies at the nginx level, then replayed the same requests without logging to measure and compare response times. The results are at https://www.pastiebin.com/58c77417d3130 So I am exploring the possibility of a better approach that doesn't impact performance to that level. I am not bothered about the middleware's performance; I am bothered about nginx's (the web server's) performance, since it would be proxying a lot of other apps as well (which would not need logging of request/response bodies) behind it, along with the app in question. – ksvrgh Mar 14 '17 at 04:42
  • @ksvrgh Ahhh I see! I guess my answer is not so helpful then. – maple_shaft Mar 14 '17 at 16:05