It is reasonable to write non-functional requirements like this. But your partners are right that this isn't trivial to measure.
It is important to keep in mind how networks work. To make a REST API call, you may need to perform a DNS lookup, perform a TLS handshake, transmit the request, wait until the request is processed, and receive the response. All of those steps take time. Aside from the server's processing time, these delays depend largely on the network latency (colloquially, the “ping”) and to some degree on the available bandwidth.
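To make these phases concrete, here is a rough sketch that times each step of a single HTTPS request separately, using only the Python standard library. The phase boundaries are simplified, and a real measurement would use a dedicated HTTP client or benchmarking tool:

```python
import socket
import ssl
import time

def timed_request(host: str, path: str = "/") -> dict:
    """Time each phase of a simple HTTPS GET request separately."""
    timings = {}

    t0 = time.perf_counter()
    addr = socket.getaddrinfo(host, 443)[0][4][0]    # DNS lookup
    timings["dns"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    raw = socket.create_connection((addr, 443), timeout=10)  # TCP handshake
    timings["tcp_connect"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    ctx = ssl.create_default_context()
    conn = ctx.wrap_socket(raw, server_hostname=host)        # TLS handshake
    timings["tls_handshake"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    conn.sendall(request.encode())
    conn.recv(1)                           # block until the first response byte
    timings["time_to_first_byte"] = time.perf_counter() - t0

    conn.close()
    return timings

print(timed_request("example.com"))
```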
For the purpose of a requirement, we can now define a test scenario where those variables are fixed. E.g. we can assume:

- a pretty atrocious round-trip ping of 400ms,
- transmitted data of negligible size, and
- a connection that has already been established.

We can then phrase a requirement such as “On a network connection with a round-trip time of no more than 400ms, the time to first byte for any request is less than 450ms”. That leaves 50ms for the server software to do its work.
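Expressed as code, the budget in that requirement is a simple sum. This is a minimal sketch with made-up constant names, just to show how the 450ms decomposes:

```python
# Hypothetical constants taken from the assumptions above.
RTT_BUDGET_MS = 400.0      # assumed worst-case round-trip time
SERVER_BUDGET_MS = 50.0    # time left for the server to do its work

def meets_requirement(time_to_first_byte_ms: float) -> bool:
    """Time to first byte must stay within the network plus server budget."""
    return time_to_first_byte_ms < RTT_BUDGET_MS + SERVER_BUDGET_MS
```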
Alternatively, we can assume that the timings are taken directly in front of the target server, so that network effects are negligible. The response times could then be collected continuously and displayed on a dashboard. We could phrase this requirement as “Ignoring any network effects, the server responds to any request within 50ms”.
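As a sketch of what such server-side collection could look like, here is a hypothetical Python decorator that records the processing time of each request. The handler and the in-memory list are placeholders; a real service would typically use middleware and a metrics library instead:

```python
import time
from functools import wraps

response_times_ms: list[float] = []  # fed into a dashboard in a real setup

def timed(handler):
    """Record the server-side processing time of a request handler."""
    @wraps(handler)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return handler(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            response_times_ms.append(elapsed_ms)
    return wrapper

@timed
def handle_request(payload: dict) -> dict:
    ...  # actual request processing would go here
    return {"status": "ok"}
```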
Sometimes, exceptional circumstances occur, for example packet loss. So it is not reasonable to demand that this requirement is met for every connection, just for the vast majority of connections. An average response time is not sufficient because many requests may still experience very long response times that get averaged away. Instead, it is common to use a percentile, typically the 95th or 99th. Since this is a statistical metric, you need a sampling window, for example 1 minute. We could now clarify the requirement:
“Over any time window of 60 seconds, the 95th percentile response time of the server must be below 50ms. This measurement ignores any network effects such as latency.”
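A minimal sketch of how such a check could be implemented, assuming response times are recorded server-side. The nearest-rank percentile used here is only one of several common definitions:

```python
import time
from collections import deque

WINDOW_SECONDS = 60        # sampling window from the requirement
PERCENTILE = 0.95          # 95th percentile
THRESHOLD_MS = 50.0        # required response time budget

samples: deque = deque()   # (timestamp, response_time_ms) pairs

def record(response_time_ms: float) -> None:
    """Store one response time and evict samples older than the window."""
    now = time.monotonic()
    samples.append((now, response_time_ms))
    while samples and samples[0][0] < now - WINDOW_SECONDS:
        samples.popleft()

def requirement_met() -> bool:
    """True if the 95th percentile over the current window is below 50ms."""
    times = sorted(t for _, t in samples)
    if not times:
        return True  # no traffic, nothing to violate
    # Nearest-rank percentile: the value at the 95% position of the sorted list.
    idx = min(int(PERCENTILE * len(times)), len(times) - 1)
    return times[idx] < THRESHOLD_MS
```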
Note that the number of requests over that time window must be high enough for the chosen percentile to be meaningful. E.g. if the window only sees 5 requests, then the 95th percentile response time is more or less the same as the worst response time, and that metric is very sensitive to outliers and thus unsuitable. You want more than a handful of events above the chosen percentile for this metric to be meaningful. To get this you can either increase the sampling window to collect more events or decrease the chosen percentile, but both actions reduce the sensitivity of the metric.
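A quick illustration of the small-window problem, with hypothetical numbers: with only 5 samples, the nearest-rank 95th percentile is simply the worst sample, so a single outlier dominates the metric.

```python
def percentile(data: list[float], p: float) -> float:
    """Nearest-rank percentile of a sample."""
    values = sorted(data)
    return values[min(int(p * len(values)), len(values) - 1)]

# Five hypothetical response times (ms) in one window; one request was slow.
window = [12.0, 14.0, 11.0, 13.0, 800.0]
print(percentile(window, 0.95))  # -> 800.0: the outlier alone sets the p95
```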
Specifying measurable response times is important because it makes it possible to determine whether there's degraded service or an outage. E.g. response times of 80 seconds are absolutely unacceptable for most use cases: while the service might technically be working, it does not satisfy your needs. When the service fails to meet the required response time, that is downtime that would count against your agreed-upon uptime / service level agreement.