
I recently started moving a monolithic application to a microservices architecture using Docker containers. The general idea of the app is:

scraping data -> format the data -> save the data to MySQL -> serve data via REST API.
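
For concreteness, the scraping step boils down to a Kafka producer. A rough sketch of what I mean, assuming kafka-python and requests (the topic name, URL and fields here are made up):

```python
# Sketch of the scraper step: fetch raw data and publish each record to Kafka.
# Topic name, URL and record fields are illustrative only.
import json

import requests
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="kafka:9092",
                         value_serializer=lambda v: json.dumps(v).encode("utf-8"))

def scrape_once():
    response = requests.get("https://example.com/listings")
    response.raise_for_status()
    for raw_item in response.json():
        # The formatter service picks these raw records up downstream.
        producer.send("raw-items", raw_item)
    producer.flush()

if __name__ == "__main__":
    scrape_once()
```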

I want to split each of the steps into a separate service. I think I have two choices; what is the best practice in microservices architecture here?

Option one

  • Scraper service - scrapes and publishes to Kafka
  • Formatter service - consumes messages from Kafka and formats them
  • API service - consumes Kafka messages, updates MySQL and exposes a REST API

Drawback: if I'm not mistaken, Docker containers should preferably run only one process per container, and this API service would need both a Kafka listener and a web server (see the sketch below).
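
A rough sketch of what that Option one API service would look like (assuming kafka-python, Flask and PyMySQL; topic name, table and credentials are invented), i.e. a Kafka consumer loop and a web server in one process:

```python
# Sketch of Option one's API service: one container running both a Kafka
# consumer (background thread) and a Flask server. Names are illustrative.
import json
import threading

import pymysql
from flask import Flask, jsonify
from kafka import KafkaConsumer

app = Flask(__name__)

def db_connection():
    return pymysql.connect(host="mysql", user="app", password="secret", database="scraped")

def consume_formatted_items():
    # Background loop: read formatted records from Kafka and upsert into MySQL.
    consumer = KafkaConsumer("formatted-items",
                             bootstrap_servers="kafka:9092",
                             value_deserializer=lambda m: json.loads(m.decode("utf-8")))
    for message in consumer:
        item = message.value
        conn = db_connection()
        try:
            with conn.cursor() as cur:
                cur.execute("REPLACE INTO items (id, payload) VALUES (%s, %s)",
                            (item["id"], json.dumps(item)))
            conn.commit()
        finally:
            conn.close()

@app.route("/items/<int:item_id>")
def get_item(item_id):
    conn = db_connection()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT payload FROM items WHERE id = %s", (item_id,))
            row = cur.fetchone()
    finally:
        conn.close()
    return jsonify(json.loads(row[0])) if row else ("not found", 404)

if __name__ == "__main__":
    # Two responsibilities in one container: the Kafka listener and the web server.
    threading.Thread(target=consume_formatted_items, daemon=True).start()
    app.run(host="0.0.0.0", port=5000)
```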

Option two

  • Scraper service - scrapes and publishes to Kafka
  • Formatter service - consumes messages from Kafka and formats them
  • Saving-to-DB service - receives the formatted data and just updates MySQL (runs as a Python process; see the sketch below)
  • API service - exposes a REST API that serves requests with Python Flask

Drawback: two services connect to the same DB, which is supposedly not recommended because they would not be decoupled.
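
The Saving-to-DB service in this option would be just a single-purpose consumer, roughly like this (same assumptions and made-up names as in the sketch above), with the API service left as a plain Flask app that only reads:

```python
# Sketch of Option two's Saving-to-DB service: a Kafka consumer whose only job
# is writing formatted records into MySQL. Names are illustrative.
import json

import pymysql
from kafka import KafkaConsumer

def main():
    consumer = KafkaConsumer("formatted-items",
                             bootstrap_servers="kafka:9092",
                             value_deserializer=lambda m: json.loads(m.decode("utf-8")))
    conn = pymysql.connect(host="mysql", user="writer", password="secret", database="scraped")
    for message in consumer:
        item = message.value
        with conn.cursor() as cur:
            cur.execute("REPLACE INTO items (id, payload) VALUES (%s, %s)",
                        (item["id"], json.dumps(item)))
        conn.commit()

if __name__ == "__main__":
    main()
```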

What is the best practice here? Should I go with option one and run the Flask server and the Kafka listener in the same container?

Thanks!

rogamba
  • Surely if you are using a REST API you should be able to create (POST) new resources as well as read (GET) existing resources; wouldn't a single REST service satisfy both save and serve? – AChampion Sep 24 '16 at 04:39
  • Yes, but given the high volume of data in the scraping stream I don't think that would be the ideal way of doing it; the server would saturate with those requests. – rogamba Sep 28 '16 at 22:28

2 Answers


I would suggest something along the following lines.

  • Scraper: scrapes the data and publishes to Kafka
  • Formatter/Persistence: Reads from Kafka, sends data to the storage layer
  • Storage: one "real" database where you perform writes. Replicate this DB to as many read-only copies as you need.
  • API: Accesses only the read-only replicas to serve the data.

The concept of eventual consistency comes into play here. You can spin up as many replicas and API containers as you need to meet demand, at the cost of them sometimes returning different (old) data. At some point the replica DBs get refreshed and the API starts serving the newest data. This way, writing new data doesn't bottleneck the response times of your reads.
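
As a rough illustration of the read side (assuming Flask and PyMySQL; host names, table and credentials are invented), the API only ever opens connections against a replica, while the persistence service writes to the primary:

```python
# Sketch of an API service that reads exclusively from a read-only replica.
# Host names, table and credentials are illustrative only.
import json

import pymysql
from flask import Flask, jsonify

app = Flask(__name__)

def replica_connection():
    # The persistence service writes to the primary (e.g. "mysql-primary");
    # this API only ever talks to a replica, which may lag slightly behind.
    return pymysql.connect(host="mysql-replica", user="reader",
                           password="secret", database="scraped")

@app.route("/items/<int:item_id>")
def get_item(item_id):
    conn = replica_connection()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT payload FROM items WHERE id = %s", (item_id,))
            row = cur.fetchone()
    finally:
        conn.close()
    return jsonify(json.loads(row[0])) if row else ("not found", 404)
```

Heavy ingestion then only loads the primary, and occasionally stale reads are the trade-off you accept.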

RubberDuck

Without any doubt it's option two, and the drawback you mention applies just as much to option one, since there you would have one service (the "API service") with two really different responsibilities (save to DB + expose via API) grouped in one deployment package.

These two services (save to DB and expose via API) could share a common DAO layer, though, duplicated in both services. Or, since the "expose via API" service is read-only, they would be fully independent services even though they interact with the same DB.
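
For example, a small DAO module along these lines (hypothetical names, PyMySQL assumed) could be duplicated in both deployment packages, with the API service only ever calling the read method:

```python
# Hypothetical shared DAO module (e.g. scraped_items_dao.py), duplicated in the
# "save to DB" service and the read-only API service. Names are illustrative.
import json

import pymysql

class ScrapedItemsDAO:
    def __init__(self, host, user, password, database="scraped"):
        self._conn = pymysql.connect(host=host, user=user,
                                     password=password, database=database)

    def save(self, item):
        # Used by the "save to DB" service.
        with self._conn.cursor() as cur:
            cur.execute("REPLACE INTO items (id, payload) VALUES (%s, %s)",
                        (item["id"], json.dumps(item)))
        self._conn.commit()

    def get(self, item_id):
        # Used by the read-only API service.
        with self._conn.cursor() as cur:
            cur.execute("SELECT payload FROM items WHERE id = %s", (item_id,))
            row = cur.fetchone()
        return json.loads(row[0]) if row else None
```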

UPDATE: in case you need to see that sharing a database between two microservices is not an anti-pattern: http://microservices.io/patterns/data/shared-database.html

Tristan
  • I wouldn't say it's not an anti-pattern, just that it's an option. If you have to share a database, you just have a distributed monolith. – mgw854 Mar 12 '17 at 17:23
  • I would caution against this kind of dogmatic commitment. You will often have different access patterns around the same data (i.e. streaming ingestion and RPC, or even different groups of RPC endpoints) such that it makes sense to run distinct processes that access the same data. The monolith smell starts to creep in when services that have wholly separate responsibilities and datasets are sharing a DB. Several microservices can (and should!) be cooperating components of the same subsystem with the same dataset. Don't stick your users table on it, but other scraping components are fine. – closeparen Jun 10 '17 at 18:14
  • My employer recognizes `services` as containing one or more `applications`, typically cut from a monorepo. Each application has its own entrypoint, but the expectation is that applications of a service share code and data. So in this case you have a scraping service with crawler, formatter, ingestion, and API components. – closeparen Jun 10 '17 at 18:16