Let's assume I have access to two rest endpoints, /persons
and /products
. If I need to retrieve 100K persons with their respective products, what is the best way to do that?
How can I join them efficiently ? Implementing temporary table in DB, joining them in memory? For example, If I do it in memory, then I found myself doing RDBMS work, when the join condition is only on some ID key it can be acceptable, but if they are more complex join conditions using web services, in this case, seems to be very inefficient.
Join with WS:
- HTTP GET 100K persons in JSON format
- HTTP GET all the products filtered by the person ID
- Create some array structure of products indexed with the person ID
- Loop over persons then adding into person their products when person.id = products.person_id
- Use the data
Join with some table dump:
- FTP get persons.sql
- FTP get (linked) products.sql
- Load them into temporary table in DB (doing the same with some JSON or XML need transformations)
- Run SQL query
- Use the data
So in the WS simple scenario, the disadvantages are the massive use of memory and the complexity of doing the RDBMS (join not even talking about some other case like ordering grouping if needed) work and in this case, there are only two entities.
(The idea behind this reflexion is to study the possibility to completely remove EAI in enterprise for SOA/ESB architecture)