2

I need to create a very small service which will programmatically make a database insertion once a day. As our stack primarily uses Node.js, we’re going to write a JS script responsible just for making the insertion and then ending execution. We will then cron that script for daily execution. However, the script itself, while still theoretically useful on its own, doesn’t really fulfill a major need without the scheduling component.

In addition, we try to set up all our services so they can deploy themselves automatically in the event of an environment move or rebuild, so the scheduling aspect of our application needs to be captured in that automation. The question is how?

In my opinion, the service should also be responsible for scheduling itself. My inclination is to handle this in the package.json file, which would have a build script that adds the job to the crontab. This way, I think we still achieve a separation of concerns: the script itself, which does its insertion, and the build command, which is responsible for scheduling the script.
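A minimal sketch of that idea, under several assumptions that are not part of the question: a hypothetical file `scripts/schedule.js` wired to an npm script such as `"schedule": "node scripts/schedule.js --install"`, a hypothetical entry script `insert.js` at the package root, a 03:00 daily run, and a `crontab` that accepts a new table on stdin via `crontab -`. Our own line is recognised by the script path it contains, so running the command twice does not duplicate the entry.

```javascript
// scripts/schedule.js (hypothetical name): registers this service's own cron entry.
const { execSync } = require('child_process');
const path = require('path');

// Build the crontab line for this service: every day at 03:00 (assumed time).
function cronLine(scriptPath, nodePath = process.execPath) {
  return `0 3 * * * ${nodePath} ${scriptPath}`;
}

// Return the crontab text with exactly one entry for scriptPath.
// Any older line mentioning the same script is dropped first.
function withEntry(currentCrontab, scriptPath, line) {
  const trimmed = currentCrontab.replace(/\n+$/, '');
  const kept = (trimmed === '' ? [] : trimmed.split('\n'))
    .filter((l) => !l.includes(scriptPath));
  return [...kept, line].join('\n') + '\n';
}

// Read the current user's crontab, merge our entry, and write it back.
function install() {
  const scriptPath = path.resolve(__dirname, '..', 'insert.js'); // assumed layout
  let current = '';
  try {
    current = execSync('crontab -l', {
      encoding: 'utf8',
      stdio: ['ignore', 'pipe', 'ignore'],
    });
  } catch (err) {
    // `crontab -l` exits non-zero when the user has no crontab yet.
  }
  execSync('crontab -', {
    input: withEntry(current, scriptPath, cronLine(scriptPath)),
  });
}

// Only touch the real crontab when explicitly asked to.
if (process.argv.includes('--install')) install();
```

Keeping `withEntry` free of side effects means the merge logic can be checked without touching a real crontab; only `install` talks to the system.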

However, I’m not sure this is the best way, just the best way I can immediately think of without going overboard on what is an otherwise very simple script. Thoughts?

user3781737

3 Answers

1

Pretty much anything that aligns with the build/deploy cycle should be made part of it. In the absence of any other considerations, I see no reason to require separate management of such things. I find your approach reasonable if we consider the scheduling to be part of the build.

I'm a little unsure why you would have something like this in a design in the first place, and I associate the use of cron jobs with production issues and vulnerabilities. But given this design, I don't see a problem with your plan other than what Ewan mentions in his answer about potential scaling issues.

JimmyJames
  • Well I didn't go into detail for fear of people giving me opinionated solutions that would require me to bend over backwards for very little practical difference, but the use case is that we need to make daily data from an external API available to a reporting engine. For reasons related to network security, auth complexity, efficiency, maintenance, and reliability, it makes far more sense for us to replicate the new data into our own DB via a script than it would be to build a persistent interface between our reporting engine and this external API. – user3781737 Oct 18 '21 at 21:14
0

Create a script which creates all your cron jobs, idempotently. You deploy it just the same way you deploy your services. Now all of your schedule is in one place: if you want to change the schedule for anything, you change it there and deploy it. You don't have to change the code or config of any service, which you might not want to do just to change the schedule.

The schedule for your (periodic) services is a separate concern from the services themselves, and should be created/managed separately.
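As a rough sketch of such a schedule script, assuming a hypothetical file `deploy-schedule.js`, placeholder job commands and times, and a `crontab` that accepts a new table on stdin via `crontab -`: all managed jobs live between two marker comments, and each run replaces that block wholesale, so applying it any number of times gives the same result while hand-written entries outside the block are left alone.

```javascript
// deploy-schedule.js (hypothetical name): owns the whole schedule in one place.
const { execSync } = require('child_process');

// Hypothetical jobs; the commands, paths, and times are placeholders.
const JOBS = [
  { schedule: '0 3 * * *', command: '/usr/bin/node /srv/daily-insert/insert.js' },
  { schedule: '*/15 * * * *', command: '/usr/bin/node /srv/other-service/poll.js' },
];

const BEGIN = '# BEGIN managed schedule';
const END = '# END managed schedule';

// Render the managed block, markers included, as an array of lines.
function renderBlock(jobs) {
  return [BEGIN, ...jobs.map((j) => `${j.schedule} ${j.command}`), END];
}

// Drop any previous managed block and append a fresh one.
// Lines outside the markers are kept as they are.
function applySchedule(currentCrontab, jobs) {
  const kept = [];
  let inside = false;
  for (const line of currentCrontab.split('\n')) {
    if (line === BEGIN) { inside = true; continue; }
    if (line === END) { inside = false; continue; }
    if (!inside) kept.push(line);
  }
  // Trim trailing blank lines so repeated runs do not accumulate them.
  while (kept.length > 0 && kept[kept.length - 1] === '') kept.pop();
  return [...kept, ...renderBlock(jobs)].join('\n') + '\n';
}

// Only touch the real crontab when explicitly asked to.
if (process.argv.includes('--apply')) {
  let current = '';
  try {
    current = execSync('crontab -l', {
      encoding: 'utf8',
      stdio: ['ignore', 'pipe', 'ignore'],
    });
  } catch (err) {
    // No crontab installed yet for this user: start from empty.
  }
  execSync('crontab -', { input: applySchedule(current, JOBS) });
}
```

Changing a schedule then means editing `JOBS` and running `node deploy-schedule.js --apply`; no service is rebuilt or redeployed.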

(Just as the health check that ensures your services are always running is a separate concern from your services, and you create/manage that separately. Even though each service provides its own heartbeat/health check URL, it does not, and cannot, ensure that it stays healthy itself.)

davidbak
  • If by "all your cron jobs" you mean every single cron I already have running on this box, then I think this breaks separation of concerns. If I ever need to break this box in half or deploy this script on its own to a different environment, then that master schedule-creation script is suddenly in charge of way too much, no? – user3781737 Oct 18 '21 at 18:29
  • Scripts can be modularized just like services. And this particular script is _simple_. If you have to "break this box in half" or "deploy this script on its own to a different environment", you have a small thing to consider, reconstruct, rewrite, whatever. You have to weigh your _present_ concerns against your _future_ flexibility, and also the _likelihood_ that any of your scenarios will actually _happen_, as is usual in design. – davidbak Oct 18 '21 at 19:26
  • My question then, is what is the advantage of your solution over a "build" option in the application which does the same thing, just more granularly? – user3781737 Oct 18 '21 at 20:34
  • The advantage is: no _build_. You may want to change the schedule for a service without redeploying the service, much less rebuilding it. Consider your development/deployment process - including whatever you do for testing. Consider _who_ is doing the build/deployment: the dev team, the ops team, who? Consider _when_ the build/deployment is needed: when your primary dev team is at work, or in the middle of the night when it's up to the on-call? There are a _lot_ of advantages to separating the schedule from the build - you can think about it and decide what applies to _you_. – davidbak Oct 18 '21 at 20:38
  • I think my frame of reference is very different from yours, so apologies where I'm misunderstanding. Maybe my use of the word "build" implies it's doing more than it actually is, but imagine I modularized your solution and simply made the relevant module the target of my `npm build` command for this service. This command would do nothing other than `node ./this-module.js`. No additional re-deployment, etc. What's the difference from that, and just including the `this-module.js` file in the application? – user3781737 Oct 18 '21 at 21:01
  • You asked a nice general question about where scheduling functionality should go in a multi-service system where you have different deployment scenarios. I answered that the best way I could. _Now_ you say you're only interested in this one specific scenario you mention. Ok, I answered that _too_ (in the comments above): "you have to weigh your present concerns against your future flexibility, and also the likelihood that any of your scenarios will actually happen, as is usual in design." And you've done that! You've decided you don't see a benefit in this approach. There you go! – davidbak Oct 18 '21 at 22:43
-2

I always feel that whenever you have to do something every X time there is a design fail.

Either you aren't capturing triggering events, or you have missed a way of calculating the event after the fact.

However, if you have to, then I would advise having a service that runs 24/7 and does its own scheduling rather than using a scheduler to run a one-off task every X time units.

This removes the problem of "setup the schedule" and gives you more control over things like:

  • I've installed to two boxes, how do I make sure I don't get twice as many events?

  • The scheduler crashed and I didn't notice until after the event should have happened

  • The task took longer than the scheduled interval, should I start it again or wait for it to finish?

  • Various other failure scenarios

  • I need to run more often than once a minute
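A minimal sketch of a service that does its own scheduling, assuming a daily run at a fixed UTC time and a hypothetical async `task` function supplied by the caller. The next run is only scheduled once the current one has finished, which settles the "start again or wait" question by always waiting. It does not address the two-boxes problem; that would need some shared lock, which is outside this sketch.

```javascript
// Milliseconds from `now` until the next occurrence of hour:minute in UTC.
function msUntilNext(now, hour, minute) {
  const next = new Date(now);
  next.setUTCHours(hour, minute, 0, 0);
  // If today's slot has already passed (or is exactly now), use tomorrow's.
  if (next <= now) next.setUTCDate(next.getUTCDate() + 1);
  return next - now;
}

// Run `task` every day at hour:minute UTC for as long as the process lives.
// `task` is a caller-supplied async function (hypothetical here).
function start(task, hour, minute) {
  const scheduleNext = () => {
    setTimeout(async () => {
      try {
        await task();
      } catch (err) {
        // A failed run is logged and does not stop future runs.
        console.error('scheduled run failed:', err);
      }
      // Re-arm only after the run has finished, so runs never overlap.
      scheduleNext();
    }, msUntilNext(new Date(), hour, minute));
  };
  scheduleNext();
}

// Example wiring (not executed here):
//   start(insertDailyRows, 3, 0); // insertDailyRows is a hypothetical function
```

The process still has to be kept alive and restarted on crashes by something else (a process supervisor, for instance), so the scheduling concern moves rather than disappears.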

Ewan