
At the past two companies I've been at, REST APIs exist for querying data from the web app - i.e. instead of having the web app do SQL directly, it calls a REST API that does the SQL and returns the result.

My question is - why is this done?

If it was going to be exposed to third parties I could understand it. Better to expose a limited REST API than the full DB. But at both of these companies that's not the case.

It's been suggested to me that these REST APIs make it easier to switch between DBMSes. But isn't that the point of a database abstraction layer (DBAL)? Maybe you use an ORM as your DBAL, or maybe you just write raw SQL and have your DBAL translate the DB-specific stuff where appropriate (e.g. translate LIMIT for MySQL into TOP for MSSQL).
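To make concrete what I mean by the DBAL handling dialect differences, here's a minimal sketch; the function is hypothetical, not something from either company's codebase:

```python
# Hypothetical sketch of a DBAL translating a row-limiting clause per dialect.
def paginate(query, limit, dialect):
    """Append/insert the dialect-appropriate row-limiting clause."""
    if dialect == "mysql":
        return f"{query} LIMIT {int(limit)}"
    if dialect == "mssql":
        # SQL Server uses SELECT TOP n instead of LIMIT
        return query.replace("SELECT", f"SELECT TOP {int(limit)}", 1)
    raise ValueError(f"unsupported dialect: {dialect}")

print(paginate("SELECT id, name FROM users", 10, "mysql"))   # ... LIMIT 10
print(paginate("SELECT id, name FROM users", 10, "mssql"))   # SELECT TOP 10 ...
```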

Either way it seems unnecessary to me, and I think it makes diagnosing issues more difficult as well. If a report in the web app is giving the wrong numbers, you can't just dump out the SQL query - you have to dump the REST URL, then go into the project that serves the REST API and pull the SQL out of that. So it's an extra layer of indirection that slows down the diagnostic process.

neubert
  • It sounds like you've only worked with apps that are basically CRUD - some users enter data with forms and other users read the data with the same forms or with report printouts? If you've never worked with a system that needs a complex and sophisticated domain model, then I can see how you would have that particular mindset. Many applications require that extra layer of indirection to do stuff. – RibaldEddie Mar 30 '15 at 03:44
  • I've worked with APIs (not necessarily REST) that (among other things) performed calculations on the parameters passed to them. Maybe a DBMS is used in those calculations, but presumably a lot of the logic doesn't live in the DB. However, the internal APIs at the companies I've worked at don't do this. They just query a DBMS and spit out the results of the SQL query verbatim. It just seems to me that REST APIs are often (not always - often) written to be trendy, not to be practical. – neubert Mar 30 '15 at 05:15
  • There are definitely quirks to REST API design that make it difficult to design a complex domain well - most developers I've met over the years don't care about design. They want to churn out code as fast as possible so that their bosses will love them and think that they are rock stars. When you combine that with a trend like REST, you get a trendy but impractical spaghetti API. It's got nothing to do with REST itself. – RibaldEddie Mar 30 '15 at 05:30
  • Ever wonder how some web companies report that their entire user records were stolen by a hacker? Ever wonder how the hacker did it? When a web server has a direct connection to the DB, you realise that once the web server is hacked the attacker has full and unrestricted access to select anything from the DB he likes. Stick it behind a middle tier and the attacker can then only call methods on the middle tier. I won't say that's instant security, but it's significantly better. – gbjbaanb Jun 17 '15 at 09:21
  • @gbjbaanb: The web server must be able to access all relevant data through the REST API anyway, otherwise it wouldn't work, so I don't see how the data is fundamentally better protected behind a REST tier. – JacquesB Jun 15 '16 at 06:46
  • @JacquesB no, the web server acts as a gateway to the rest of the system; it should not be the system. If you have a web server that acts as a "client-by-proxy" you have a much more secure system, as the hacker has to hack first the web server and then, from that position, hack the actual server. This is *much* harder than exploiting an unpatched web server using a pre-packaged script. Just think of it like this when designing your systems: the web server is the client. – gbjbaanb Jun 15 '16 at 07:31
  • @gbjbaanb: My point is that the web server can access the data through the REST server, so if the web server is hacked, the attacker can also access the data through the REST server without having to hack the REST server. – JacquesB Jun 15 '16 at 08:57
  • @JacquesB no they can't - or at least they cannot access whatever they like; all they can do is query the REST server. E.g. they will not be able to dump the entire user table and crack passwords at leisure. If the REST server is also secured and requires auth, then they will have to fake the login to that server too, which may be even more difficult for them. Assuming no auth on the REST server, all an attacker can do is call its API, which is just a more complicated way of simply using the web interface! – gbjbaanb Jun 15 '16 at 09:31
  • @gbjbaanb In your comments you seem to have moved the focus of the whole discussion onto one point, i.e. security, but let's say that is not the concern and performance is. Then what? – sactiw Nov 19 '16 at 14:51
  • @sactiw Performance can be increased by using a 3-tier model. OK, one user will not see a perf increase, but if you have a great many, then a 3-tier model will let you scale significantly more, so performance will increase over a monolithic single-server design, even though a developer may think the code path has lengthened. (Besides, I've seen plenty of single-server designs with many layers of DALs and DI and so on that make for a very long code path anyway!) PS: security is always important for web servers - who cares about performance when you've lost all your data?! – gbjbaanb Nov 19 '16 at 22:40
  • @gbjbaanb "Performance can be increased by using a 3-tier model." only as long as their is sharing of data and query results are good candidates for caching. Say the CRUD operations being performed are on separate set of data (rows) and their is no sharing of data in true sense? – sactiw Nov 21 '16 at 06:35
  • @sactiw you're over-thinking it. This has nothing to do with caching. I recommend reading up on the scalability benefits of the n-tier model; there are plenty of online resources to explain it to you. – gbjbaanb Nov 21 '16 at 10:27
  • @gbjbaanb Okay, is there something (link/doc/book) off the top of your head? Something that explains it in an easy-to-understand way that can be applied in practice? (Trying to save time finding something good :)) – sactiw Nov 21 '16 at 16:31
  • Google Firebase is a good example that highlights that per-entity REST APIs are not necessary. Security is enforced by the database. Store the data, then have a Cloud Function trigger from the necessary data condition. See https://softwareengineering.stackexchange.com/questions/343991/where-to-put-business-logic-if-using-firebase/390910#390910 and https://stackoverflow.com/questions/54994228/how-to-minimise-firebase-function-latency – Kind Contributor Aug 09 '20 at 11:19
  • The REST APIs do use an ORM internally. All application integration is done through message passing. First, there is a mandate to have a standardized message format, and then there are three modes of message processing: a) real-time, b) near-real-time, c) batch. APIs are used mostly in the real-time mode, where the client system wants to see the data as stateful entities in JSON format. The other two, near-real-time and batch processing of messages, use message queues and ETL workflows, respectively. – Subhadeep Ray Nov 24 '21 at 13:47

7 Answers


If you allow a client to access the database directly - which they would do, even with a database abstraction layer - then:

  • You get a coupling between their code and yours - particularly, there is a very strong coupling between your database structure and their code;
  • Your client may do some pretty undesirable stuff on your database - whether it be updating data that they should not, writing a query that takes too much time, deadlocking something because they do not acquire locks cleanly...
  • If you have made some less-than-optimal choice in your database structure, then moving away from that choice may be very hard, especially if you do not have a good way to make your clients migrate over to the new structures.

That is to say, I am not touching on the REST part at all - isolating your database behind an API is simply the more sensible choice when the team that maintains the database and the teams that use it are not in sync, as it allows those parts to evolve at their own pace.

jhominal

You are right: there is no clear benefit to introducing a REST API layer between a web app and a database, and it has a cost in complexity and performance overhead.

The reason you are getting contradictory answers is confusion about what the 'client' is in your architecture.

In your architecture (if I understand it correctly), you have browsers interacting with a single web app, which in turn interacts with the database. Introducing a REST API layer between the web app and the database has no benefit. All the stated benefits (caching, isolation of the database, etc.) can be achieved with data access layer(s) in code.
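For example, a data access layer inside the web app's own process can look something like this (a minimal sketch with hypothetical names):

```python
# Minimal sketch of an in-process data access layer (hypothetical names).
import sqlite3

class UserRepository:
    """Keeps all SQL behind methods the web app calls directly, in-process."""
    def __init__(self, conn):
        self._conn = conn

    def find_by_email(self, email):
        row = self._conn.execute(
            "SELECT id, email FROM users WHERE email = ?", (email,)
        ).fetchone()
        return {"id": row[0], "email": row[1]} if row else None

# The web app talks to the repository, never to SQL directly:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
repo = UserRepository(conn)
print(repo.find_by_email("a@example.com"))
```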

But there are some other architectures where a REST API makes sense:

  • If you have multiple clients accessing the database - that is, not a single web app but multiple independent web apps accessing the same database - it may be beneficial to create a common REST interface to allow sharing of the data model, caching, etc. Sure, you can get some of the benefit by sharing the same DAL libraries, but that won't work if the apps are developed in different languages and on different platforms. This is common in enterprise systems.

  • If you have multiple desktop apps accessing the database directly. This is the classic "two-tier" architecture, which has fallen out of favor compared to web apps. Introducing a REST layer allows you to centralize data access logic and, in particular, allows tighter control of security, since it is risky to have multiple distributed clients accessing the same database directly.

  • If you have JavaScript code which directly fetches data from the server, then you need something like a REST API in any case.

JacquesB
  • I liked your answer but I have a few more queries that come with it. What about the performance loss from introducing another layer of abstraction? Also, doesn't it become a single point of failure (if this goes down, everything else goes down) and a possible bottleneck (each app waiting for a DB connection from the pool)? – sactiw Nov 19 '16 at 14:48
  • @sactiw: I don't understand exactly what you are asking, can you be more specific? Are you asking about a single point of failure with or without a REST tier? – JacquesB Nov 20 '16 at 10:24
  • The extra layer can be useful if you have more than one app communicating with it – Ewan Nov 20 '16 at 14:20
  • @Ewan: Yes, this is what I state in the first bullet point. – JacquesB Nov 20 '16 at 16:19
  • @JacquesB Assume multiple web apps share the same DB but not the same data, i.e. each does CRUD operations on a separate set of data within that DB; basically there is no sharing of data in the true sense. Does putting the apps behind a RESTful persistence framework make any sense then (assume also that the DB provides a good level of concurrency for queries)? Moreover, won't that framework be a bottleneck as well as a single point of failure for the many web apps interacting via it? – sactiw Nov 21 '16 at 06:32
  • @sactiw: If the data is completely separate (that is, the schema and business logic are also different) for each app, then you don't get the benefit of a shared data model; you would need separate REST services for each app. But in a CRUD app, the bottleneck and single point of failure is the database. Obviously, though, introducing an additional tier introduces new ways the app can fail. – JacquesB Nov 22 '16 at 09:29

Warning: big post, some opinions, vague 'do what works best for you' conclusion

Generally, this is done as a means of implementing 'hexagonal architecture' around your database. You can have web applications, mobile applications, desktop applications, bulk importers, and background processing all consume your database in a uniform way. Certainly you could accomplish the same thing to some extent by writing a rich library for accessing your database and having all of your processes use that library. And indeed, if you're in a small shop with a very simple system, that's probably the better route to take; it's a simpler approach, and if you don't need the advanced capabilities of a more complicated system, why pay for the complexity? However, if you're working with a large, sophisticated set of systems that all need to interact with your database at scale, there are a lot of benefits to putting a web service between your applications and your data:

Platform independence & maintenance

If you have a database, and you write a Python library to interact with that database, and everybody pulls in that library to interact with the database, that's great. But let's say suddenly you need to write a mobile app, and that mobile app now needs to talk to the database as well. And your iOS engineers don't use Python, and your Android engineers don't use Python. Maybe the iOS guys want to use Apple's languages and the Android engineers want to use Java. Then you'd be stuck writing and maintaining your data access library in 3 different languages. Maybe the iOS and Android devs decide to use something like Xamarin to maximize the code they can share. Perfect, except you're probably still going to have to port your data access library to .NET. And then your company purchases another company whose web application is a disparate but related product, and the business wants to integrate some of the data from your company's platform into the newly acquired subsidiary's platform. Only there's one problem: the subsidiary was a start-up and decided to write the bulk of their application in Dart. Plus, for whatever reasons (reasons probably beyond your control), the mobile team that was piloting Xamarin decides it's not for them, and that they'd rather use the tools and languages specific to the mobile devices they'll be developing for. But while you were in that phase, your team had already delivered a large portion of your data access library in .NET, and another team in the company was writing some crazy Salesforce integration stuff and decided to do all of that in .NET, since there was already a data access library for .NET and it seemed like a good idea because mobile was initially planning to use .NET as well.

So now, because of a very realistic turn of events, you have your data access library written in Python, .NET, Swift, Java, and Dart. They're not as nice as you'd like them to be, either. You couldn't use an ORM as effectively as you'd like, because each language has different ORM tools, so you've had to write more code than you would have liked. And you haven't been able to devote as much time to each incarnation as you would have wanted, because there are 5 of them. And the Dart version of the library is especially hairy, because you had to roll your own transactional stuff for some of it - the libraries and support just weren't really there. You tried to make the case that, because of this, the Dart application should have had only read-only access to your database, but the business had already made up their minds that whatever features they were planning were worth the extra effort. And it turns out there's a bug in some of the validation logic that exists in all of these incarnations of your data access library. Now you have to write tests and code to fix this bug in all of these libraries, get code reviews for your changes to all of these libraries, get QA on all of these libraries, and release your changes to all of the systems using all of these libraries. Meanwhile, your customers are displeased and have taken to Twitter, stringing together combinations of vulgarities you never would have imagined could be conceived, let alone targeted at your company's flagship product. And the product owner decides not to be very understanding about the situation at all.

Please understand that in some environments, the above example is anything but contrived. Also take into consideration that this sequence of events may unfold over the course of a few years. Generally, when you get to the point where architects and business people start talking about hooking up other systems to your database, that's when you're going to want to get 'putting a REST API in front of the database' onto your roadmap. Consider if, early on, when it was clear that this database was going to start being shared by a few systems, a web service/REST API had been put in front of it. Fixing your validation bug would be a lot quicker and easier, because you're doing it once instead of 5 times. And releasing the fix would be a lot easier to coordinate, because you're not dependent on several other systems releasing in order to get your change out there.

TLDR; It's easier to centralize the data access logic and maintain very thin HTTP clients than it is to distribute the data access logic to each application that needs to access the data. In fact, your HTTP client may even be generated from metadata. In large systems, the REST API lets you maintain less code.
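To give an idea of how thin such a client can be, here is a rough sketch; the endpoint, class name, and field names are hypothetical, and it assumes the `requests` library:

```python
# Sketch of a thin HTTP client: the real data access logic lives behind the
# REST API, so the per-language wrapper stays tiny.
import requests

class OrdersClient:
    def __init__(self, base_url, session=None):
        self.base_url = base_url.rstrip("/")
        self.session = session or requests.Session()

    def get_order(self, order_id):
        resp = self.session.get(f"{self.base_url}/orders/{order_id}", timeout=5)
        resp.raise_for_status()
        return resp.json()

# client = OrdersClient("https://data-api.internal.example.com")
# order = client.get_order(42)
```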

Performance and scalability

Some people may believe that talking to the database directly instead of going through a web service first is faster. If you have only one application, that's certainly true. But in larger systems, I disagree with the sentiment. Eventually, at some level of scale, it's going to be very beneficial to put some kind of cache in front of the database. Maybe you're using Hibernate, and want to install an Infinispan grid as an L2 cache. If you've got a cluster of 4 beefy servers to host your web service separate from your applications, you can afford to have an embedded topology with synchronous replication turned on. If you try to put that on a cluster of 30 application servers, the overhead of turning on replication in that setup will be too much, so you'll either have to run Infinispan in a distributed mode or in some kind of dedicated topology, and suddenly Hibernate has to go out over the network in order to read from the cache. Plus, Infinispan only works in Java. If you have other languages, you'll need other caching solutions. The network overhead of having to go from your application to your web service before reaching the database is quickly offset by the need to use much more complicated caching solutions that generally come with overhead of their own.

Additionally, the HTTP layer of your REST API provides another valuable caching mechanism. The servers for your REST API can put caching headers on their responses, and those responses can be cached at the network layer, which scales exceptionally well. In a small setup, with one or two servers, your best bet is to just use an in-memory cache in the application when it talks to the database, but on a large platform with many applications running on many servers, you want to leverage the network to handle your caching, because, when properly configured, something like Squid or Varnish or nginx can scale out to insane levels on relatively small hardware. Hundreds of thousands or millions of requests per second of throughput is a lot cheaper to serve from an HTTP cache than from an application server or a database.
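As a rough illustration of what those caching headers look like in practice, here is a sketch; Flask is just an assumed example framework here, not something prescribed by this answer:

```python
# Sketch: a REST endpoint marking its response cacheable at the HTTP layer.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/products/<int:product_id>")
def get_product(product_id):
    response = jsonify({"id": product_id, "name": "example"})
    # Any intermediate HTTP cache (Varnish, nginx, Squid) may serve this
    # response for 60 seconds without touching the app or the database.
    response.headers["Cache-Control"] = "public, max-age=60"
    return response
```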

On top of that, having a ton of clients all pointed at your database, instead of having them all pointed at a few servers which in turn point to the database, can make tuning the database and connection pooling a lot harder. In general, most of the actual workload on an application server is application stuff; waiting for data to come back from the database is often time consuming, but generally not very computationally expensive. You may need 40 servers to handle your application's workload, but you probably don't need 40 servers to orchestrate fetching the data from the database. If you dedicate that task to a web service, the web service will probably be running on far fewer servers than the rest of the application, which means you'll need far fewer connections to the database. Which is important, because databases generally don't perform as well when they're servicing tons of concurrent connections.
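To make the connection-count point concrete, here is a sketch of the data service owning a small, fixed pool; SQLAlchemy, the connection URL, and the numbers are all assumptions for illustration:

```python
# Sketch: the data service owns one small, fixed connection pool, no matter
# how many application servers sit in front of it. Assumes SQLAlchemy and a
# Postgres driver are installed; the URL and numbers are made up.
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://app:secret@db.internal/example",
    pool_size=10,     # steady-state connections held open
    max_overflow=5,   # extra connections allowed during short bursts
    pool_timeout=30,  # seconds to wait for a free connection
)
# 40 app servers each with their own pool could mean hundreds of DB
# connections; one data service with this engine caps it at 15.
```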

TLDR; It's easier to tune, scale and cache your data access when it's something that happens inside of a single dedicated web service than it is when it's something that happens across many different applications using different languages and technologies

Final thoughts

Please don't come away from this thinking "Oh wow, I should always be using REST APIs to get my data" or "This idiot is trying to say we're doing it wrong because our web app talks to the database directly, but our stuff works fine!". The major point I'm trying to make is that different systems and different businesses have different requirements. In a lot of cases, putting a REST API in front of your database really doesn't make sense: it is a more complicated architecture, and that complexity has to be justified. But when the complexity is warranted, there are a ton of benefits to having the REST API. Being able to weigh the different concerns and choose the right approach for your system is what makes a good engineer.

Additionally, if the REST API is getting in the way of debugging, there's likely something wrong or missing in that picture. I don't believe having that added abstraction layer intrinsically makes debugging harder. When I work with large, n-tier systems, I like to make sure I have a distributed logging context. For example, when a user initiates a request, generate a GUID for that request and log the username of that user and the request they made. Then pass that GUID on as your application talks to other systems. With proper log aggregation and indexing, you can query your entire platform for the user reporting the issue and have visibility into all of their actions as they trickle through the system, to quickly identify where things went wrong. Again, it's a more complicated architecture, so you should probably have more complicated infrastructure in place to support it.
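For instance, a minimal sketch of such a logging context might look like this; the header name and helper function are hypothetical:

```python
# Sketch: tag every request with a correlation id and pass it downstream.
import logging
import uuid

import requests

log = logging.getLogger("webapp")

def handle_user_request(user, downstream_url):
    request_id = str(uuid.uuid4())
    log.info("request_id=%s user=%s action=load_report", request_id, user)
    # Forward the same id so the REST API's logs can be correlated with ours.
    resp = requests.get(downstream_url,
                        headers={"X-Request-Id": request_id}, timeout=5)
    log.info("request_id=%s downstream_status=%s", request_id, resp.status_code)
    return resp.json()
```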

Sources: http://alistair.cockburn.us/Hexagonal+architecture https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing

Dogs

If I understand correctly what a DBAL is, then the answer is that a REST interface allows you to use any language for its clients, while a DBAL is a library that allows you to use a single language for its clients.

This, in turn, can be an advantage in a company where there are many development teams and not all of them are proficient in the same language. Allowing their software to query the DB directly would be equivalent in functionality, but as you say, it's "better to expose a limited REST API than the full DB".

In more abstract terms, you yourself are answering the question:

So it's an extra layer of indirection that slows down the diagnostic process

... since there is this famous aphorism that states: "All problems in computer science can be solved by another level of indirection". :)

logc

Just because you're inside the same company doesn't mean you should expose everything to everyone. REST APIs are a way to define a limited consumer/provider relationship between teams in a company, with a clear contract. Amazon has been a pioneer in this form of organization.

APIs also provide a layer of abstraction, allowing you to use a specific set of idioms -- you don't necessarily want to talk to your consumers in the same terms that are used in your database. You also don't necessarily want to talk to each consumer the same way.
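For example, the representation you expose can use your consumers' vocabulary rather than your column names (a hypothetical mapping):

```python
# Sketch: the REST representation need not mirror the database row.
def to_api_representation(row):
    """Map an internal orders row onto the vocabulary consumers expect."""
    return {
        "orderNumber": row["ord_no"],
        "placedAt": row["crt_ts"],
        "totalAmount": {"value": row["amt_cents"] / 100, "currency": row["ccy"]},
    }

print(to_api_representation(
    {"ord_no": "A-1001", "crt_ts": "2015-03-30T03:00:00Z",
     "amt_cents": 1999, "ccy": "USD"}
))
```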

guillaume31

You are thinking that REST is for database queries, and it is not. REST represents the state of something at a given moment. Using REST changes or retrieves a representation, but that is all. Whether that state is backed by a database doesn't matter and no one cares, because HOW that representation comes to be is not part of REST, and neither are database queries.

Rob
  • I'm not suggesting that database queries == REST. Certainly REST is capable of being much more than a database abstraction layer, but at the past two companies that I've worked for, that is essentially all it is - a database abstraction layer. It doesn't do anything *other* than translate HTTP requests into DB queries. And if that's all you're doing, it seems to me that you'd be better served by a DBAL. Indeed, it seems to me that the only reason some people are using REST these days is because it's trendy - not because it's the best solution out there for the task at hand. – neubert Mar 30 '15 at 03:01
  • @neubert does a DBAL work directly over the internet like REST does? – Rob Mar 30 '15 at 03:30
  • Sure. You can tell MySQL to use an IP address / domain name / port that belongs to another computer on the internet. You can use SSH tunneling as well as (I believe) SSL auth. Presumably other DBMSes work similarly. – neubert Mar 30 '15 at 05:18
  • @neubert: in that case, the REST API *is* a DBAL, isn't it? – RemcoGerlich Jun 17 '15 at 08:33
  • @RemcoGerlich - sure, but using a REST API as your DBAL may be adding a middle layer that's unnecessary and that hinders the diagnosis of problems. I mean, if you're going to use a sufficiently broad definition of DBAL then you could consider the Google SERPs to be DBALs. You just have to parse HTML to get the paginated data from Google's servers... – neubert Aug 26 '15 at 01:48
  • Accessing table data directly from a MySQL server over the internet seems to me like a very bad idea. APIs allow a security gateway to be established through web tokens or API keys, which can be disabled instantly to cut off a misbehaving client. Not only that, they allow for the implementation of data access permissions. Security should always be the first priority when developing any internet-facing communication, no matter how "internal" the software is. Listen to an episode of *Security Now!* and you'll understand why. – Nick Bedford Nov 06 '18 at 00:07

A DBAL is concerned only with accessing data, but in most applications you aren't only concerned with accessing data; you also need to perform all the side effects of an operation.

For example, take an e-commerce system. When an order comes into the system, you'd want to email the customer their receipt, you'd want to connect to your accounting system to update the balance sheet, you'd want to connect to the payment processor to charge the credit card, you'd want to submit the order details to a fraud detection service to check whether the payment is likely to become a chargeback (common with fraudulent orders), you'd want to connect to the warehouse system to dispatch the order, you'd want to connect to your own suppliers to order more inventory when your supplies are running low, you'd want to run machine learning algorithms to provide cross-selling product recommendations, you'd want to connect to your CRM to raise a flag about orders that failed, you'd want to log the order to your analytics service for reporting and alerting, etc., etc.

Most of these operations need to be done on a trusted system, i.e. on the server, and these side operations are not database related and are often too complex to build into the DBAL layer.
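As a sketch, the server-side operation ends up looking less like a query and more like an orchestration (all the service names here are hypothetical placeholders):

```python
# Sketch: placing an order is mostly orchestration of side effects, not just
# data access. The injected services are placeholders for real integrations.
def place_order(order, db, mailer, payments, fraud, warehouse):
    if fraud.score(order) > 0.9:                          # fraud detection
        raise ValueError("order flagged as likely fraudulent")
    charge = payments.charge(order.card, order.total)     # payment processor
    db.save_order(order, charge_id=charge.id)             # the only DB step
    warehouse.dispatch(order)                              # warehouse system
    mailer.send_receipt(order.customer_email, order)       # customer email
    return charge.id
```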

Lie Ryan