How to structure your URIs?

Question

I am making a web UI and an HTTP API for editing JSON documents in collaboration (role and versioning system).

There are several types of JSON documents. Each type is described by a JSON schema, let us say:

schema_a, schema_b

Each user is assigned a role for editing a JSON document, among:

editor_1, editor_2, reviewer

Besides the "initial" JSON document, each revision of a JSON document is stored, and only one can be marked as "final":

initial, rev_1, rev_2, rev_3, …, final

In the web UI, the user first selects a schema (which then displays the list of documents following that schema), then a document, then a role (which then displays the list of revisions for that role), then a revision (among "initial", "rev_1", "rev_2", "rev_3", …, "final"). In this order. Then the user loads the selected revision in the editor. He works on it and eventually saves his work, which creates a new revision with the current revision number + 1. Before saving, he can mark his revision as "final", in which case the new revision is saved as "final" instead.
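The save rule described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `next_revision_name` is hypothetical, "initial" is treated as revision 0, and what should happen when the loaded revision is already "final" is not specified in the question, so the sketch simply refuses that case.

```python
import re


def next_revision_name(current: str, mark_final: bool = False) -> str:
    """Return the name under which the edited revision would be saved."""
    if mark_final:
        # Marking the work as final overrides the numbering.
        return "final"
    if current == "initial":
        # Assumption: "initial" behaves like revision 0.
        return "rev_1"
    match = re.fullmatch(r"rev_(\d+)", current)
    if match is None:
        # Assumption: anything else (for example "final") has no successor.
        raise ValueError(f"cannot derive a successor for revision {current!r}")
    return f"rev_{int(match.group(1)) + 1}"
```

The sketch does not check whether the computed name already exists (for example when an older revision is loaded and saved again); how to handle that collision is left open.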

What is the best URI structure for this hierarchical model?

Here are the two structures that come to mind (notice the trailing slashes, denoting collection resources, as opposed to item resources):

Structure 1

In this structure, path segments are organized in a sequence of collection resource–item resource pairs:

/
/schemas/
/schemas/{schema}
/schemas/{schema}/documents/
/schemas/{schema}/documents/{document}
/schemas/{schema}/documents/{document}/roles/
/schemas/{schema}/documents/{document}/roles/{role}
/schemas/{schema}/documents/{document}/roles/{role}/revisions/
/schemas/{schema}/documents/{document}/roles/{role}/revisions/{revision}

With this structure I would allow GET on all the resources, PUT and DELETE on all the item resources and POST only on this collection resource: /schemas/{schema}/documents/{document}/roles/{role}/revisions/.
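As a rough sketch of that rule for Structure 1, the allowed methods can be derived from the path alone. The function name `allowed_methods` is hypothetical, and the sketch assumes the path has already been matched against one of the templates listed above (it does not validate unknown paths).

```python
def allowed_methods(path: str) -> set[str]:
    """Methods allowed on a Structure 1 path, following the rule stated above."""
    segments = [s for s in path.split("/") if s]
    is_collection = path.endswith("/")  # trailing slash denotes a collection

    methods = {"GET"}  # GET is allowed on every resource
    if not is_collection:
        # Item resources can be replaced or deleted.
        methods |= {"PUT", "DELETE"}
    elif len(segments) == 7 and segments[0::2] == [
        "schemas", "documents", "roles", "revisions"
    ]:
        # Only the revisions collection accepts POST.
        methods.add("POST")
    return methods
```

For example, `allowed_methods("/schemas/")` contains only GET, while the revisions collection of a given schema, document and role also contains POST.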

Examples. — I have omitted the headers to simplify.

Request 1:

GET /schemas/ HTTP/1.1

Response 1:

HTTP/1.1 200 OK

["/schemas/schema_a", "/schemas/schema_b"]

Request 2:

GET /schemas/schema_a HTTP/1.1

Response 2 (I use JSON Schema):

HTTP/1.1 200 OK

{
  "type": "object",
  "properties": {
    "x": {"type": "number"},
    "y": {"type": "boolean"},
    "z": {"type": "string"}
  },
  "required": ["x", "y"]
}

Structure 2

In this structure, all path segments denote collection resources except the last one, which denotes an item resource when the path is complete (the longest URIs):

/
/documents/
/documents/{schema}/
/documents/{schema}/{document}
/revisions/
/revisions/{schema}/
/revisions/{schema}/{document}/
/revisions/{schema}/{document}/{role}/
/revisions/{schema}/{document}/{role}/{revision}
/schemas/
/schemas/{schema}

With this structure I would allow GET on all the resources, PUT and DELETE on all the item resources and POST only on this collection resource: /revisions/{schema}/{document}/{role}/.

Examples. — I have omitted the headers to simplify.

Request 1:

GET /documents/ HTTP/1.1

Response 1:

HTTP/1.1 200 OK
            
["/documents/schema_a/", "/documents/schema_b/"]

Request 2:

GET /documents/schema_a/ HTTP/1.1

Response 2:

HTTP/1.1 200 OK

["/documents/schema_a/document_foo", "/documents/schema_a/document_bar"]

Request 3:

GET /documents/schema_a/document_foo HTTP/1.1

Response 3:

HTTP/1.1 200 OK

{
  "x": 48,
  "y": true
}

@RobertHarvey My document `{id}` has 2 components: `{schema}` and `{document}`. — Géry Ogam, May 01 '19 at 14:58
@RobertHarvey `{schema}` is globally unique (the name of the schema), `{document}` is *not* (the name of the document), and `{schema}/{document}` *is* (the identifier of the document). — Géry Ogam, May 01 '19 at 15:02
I think your scheme is more complex than it needs to be. But you haven't provided much background about your application, so it's hard to tell. — Robert Harvey, May 01 '19 at 15:04
Portions of a URL should be *organizational* in nature, i.e. you should think of them as "drawers" or "folders." A schema is not a folder; it's just a type of document. It's metadata. Revision is not a folder either; it's metadata. Role doesn't even have anything to do with the document. — Robert Harvey, May 01 '19 at 15:06
And the decision to require "two forms of identification" on a document is going to complicate your life. — Robert Harvey, May 01 '19 at 15:09
@RobertHarvey Okay, but what would be a "folder" then? Do you have an example? To me a "folder" is a *namespace*. Since 2 documents can have the same name but 2 different structures (schemas), a schema is a "folder" for them. — Géry Ogam, May 01 '19 at 15:12
OK, so all of the information in `/schemas/{schema}/documents/{document}/roles/{role}/revisions/{revision}` ... What is it used for? — Robert Harvey, May 01 '19 at 15:16
@RobertHarvey No they can't have the same `id`, that's why we need a schema *namespace* to identify them: `id_1 = schema_a/foo` and `id_2 = schema_b/foo`. — Géry Ogam, May 01 '19 at 15:17
Alright. Well, to answer your obvious question, you can organize your URLs any way you want. There isn't any right or wrong, and the ones you've chosen are as good as any other. I wouldn't design the IDs that way, but it's your call to make. — Robert Harvey, May 01 '19 at 15:18
@RobertHarvey All the information is used to identify the revision. — Géry Ogam, May 01 '19 at 15:18
@RobertHarvey Here `{revision}` is a version number (`1`, `2`, `3`, `final`), not a global identifier like a UUID or a hash. That is why I need the context, that is to say the namespace. Version numbers are unique only in a particular context (= for a specific schema, document and role here). — Géry Ogam, May 01 '19 at 15:23
Makes sense to me. Hopefully you still have a way in the system to provide a single {id} that identifies each unique schema/document/role/revision. — Robert Harvey, May 01 '19 at 15:26
@RobertHarvey Well, the single `{id}` of a revision that you're talking about is actually the concatenation of `{schema}/{document}/{role}/{revision}` (you can remove the slashes or hash it to have another form, but it's already unique so it's already an identifier). — Géry Ogam, May 01 '19 at 15:31
@RobertHarvey But my question was which of the 2 URI structures is the more appropriate? What are the advantages/drawbacks of each? — Géry Ogam, May 01 '19 at 15:32
I don't think there's enough information in your question about your application to make that determination. You seem to have already found a scheme that works for you. — Robert Harvey, May 01 '19 at 15:33
@RobertHarvey Okay, I am adding some context (the HTTP methods that I want to use). — Géry Ogam, May 01 '19 at 15:34
Based on my experience, I: A) always give each 'entity' (or 'item') a globally unique ID (typically a UUID). B) Use 'nested' entity references for lists only, e.g., the departments of a business might be `/business/123/departments/`. C) Individual entity details are always retrieved directly, never from a nested context, so to get a department, you would use `/departments/321/`. This gives you flexibility in the future if your relations change. I have never found it necessary or useful to nest more than one level deep, though that's less firm in my mind. — zanerock, May 01 '19 at 15:36
Thanks for the additional information, but I kinda already knew how the gets and the posts would play out. You're focusing on technical details; *does your URI scheme satisfy the behavior you want from your application? Does it meet your software's specific requirements?* — Robert Harvey, May 01 '19 at 15:47
Are the representations of the documents for each schema the same? Or does `schema_a` have different fields from `schema_b`? — Jeff Lambert, May 01 '19 at 17:10
@JeffLambert No, `schema_a` describes a particular document structure, while `schema_b` describes another one. So a document following `schema_a` does not have the same fields as a document following `schema_b`, but they *might* have the same name `foo`. To identify each document: `schema_a.foo` and `schema_b.foo` (`foo` alone is a name, not an identifier). — Géry Ogam, May 01 '19 at 21:51
@zanerock Interesting, so as Robert Harvey would say, you follow the [Law of Demeter](https://en.wikipedia.org/wiki/Law_of_Demeter) (a resource should only know its immediate neighbors, not its neighbors' neighbors)? — Géry Ogam, May 01 '19 at 21:55
@Maggyero I had not heard that coining before, but yes. I found that having the resource URL reflect deeper structure might be aesthetic, but it doesn't really buy you any concrete advantage and can greatly complicate refactoring. — zanerock, May 02 '19 at 16:13
@Maggyero I was in the middle of proofreading my (real) answer and I had what I think is an important question. Are these documents which are being published being *made available* to people in particular roles or is the `role` trying to capture some security context of the publisher? If {1}, then AHA! If {2}, then...that seems like you need to provide a better description of why you are trying to put `role` and `revision` in this hierarchy in this way. From what you are describing, I'm thinking that proper content negotiation might serve your system a world of good. — K. Alan Bates, May 03 '19 at 02:19
The documents are made available to people in particular roles, so {1}. In the web UI, the user first selects a *schema* (which then displays the list of documents following that schema), then a *document*, then a *role* (which then displays the list of revisions for that role), then a *revision* (among "initial", "rev1", "rev2", "rev3", etc., "final"). In this order. — Géry Ogam, May 03 '19 at 08:50
… Then the user loads the selected revision in the editor. He works on it and eventually saves his work, which creates a new revision named with the current revision number + 1. Before saving he can mark his revision as "final", in which case the new revision is saved as "final" instead. So there is a clear hierarchy, and even if there wasn't ({2}) I would still be interested in how you handle a hierarchy problem ({1}), like the classic `/artists/{artist}/albums/{album}/songs/{song}` case. — Géry Ogam, May 03 '19 at 08:57
… But since you talked about it, I am also curious about how you would have handled a non-hierarchical problem like {2} with content negotiation. So if you could also briefly talk about {2} in your answer (without focussing on it too much as the real thing is {1}, so maybe as a footnote), it would be even more awesome. @K.AlanBates — Géry Ogam, May 03 '19 at 10:10
@RobertHarvey I have added the user scenario in the web UI, as well as request and response examples for the 2 URI structures, so that hopefully it's more understandable. — Géry Ogam, May 03 '19 at 10:56
@K.AlanBates I feel that you are preparing something exceptional =). — Géry Ogam, May 04 '19 at 09:44
@Maggyero Oh gosh; no pressure lol... I was going to finish it up over lunch yesterday, but had a meeting. I will definitely make progress towards wrapping it up this morning, but I will say that I'm trying to keep it focused and concise. — K. Alan Bates, May 04 '19 at 12:36
@K.AlanBates Fantastic, I just can't wait =). By the way, here is what I wrote yesterday in the chat with @JeffLambert below about my thoughts on this hierarchical problem (but using the `/artists/{artist}/albums/{album}/songs/{song}` example): https://chat.stackexchange.com/transcript/message/50149592 I am linking this so that you can take it into account in your answer, in case it can shed some light. — Géry Ogam, May 04 '19 at 19:20
Hi @K.AlanBates! Do you plan to publish your answer? I am still very interested. — Géry Ogam, Apr 03 '20 at 08:55

Jeff Lambert · Answer 1 · 2019-05-01T18:09:11.770

5

One issue, I think, is that it would be too easy to have duplicate data in your requests. If I understand your design correctly, and a document has 3 fields (schema_id, title, and last_modified), I could create one with this request:

POST /documents/schema_a/

{
  "schema_id": "schema_a",
  "title": "A fancy title",
  "last_modified": "2019-05-01"
}

What if instead a client made this request:

POST /documents/schema_a/

{
  "schema_id": "schema_b",
  "title": "A fancy title",
  "last_modified": "2019-05-01"
}

What schema would you expect that document to be in after this request? Would the resource server raise an error, or would it just silently make a default choice? Would that choice be the same choice the client would expect it to make? If the server makes a choice, what's the purpose of having the one it didn't choose?
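If the server were to reject the mismatch rather than pick a side silently, the check might look like the following Python sketch. The function name `check_schema_consistency` and the choice of status codes (400 for the conflict, 201 on success) are assumptions for illustration, not something the question prescribes.

```python
def check_schema_consistency(path_schema: str, body: dict) -> tuple[int, str]:
    """Compare the schema named in the URI with the one named in the body."""
    body_schema = body.get("schema_id")
    if body_schema is not None and body_schema != path_schema:
        # The same fact is stated twice and the two statements disagree.
        return 400, (
            f"schema_id {body_schema!r} in the body conflicts with "
            f"{path_schema!r} in the URI"
        )
    # Either the body omits schema_id or it agrees with the URI.
    return 201, "created"
```

With this check, the first request above would be accepted and the second would be refused with an explicit message, so the client never has to guess which value won.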

My suggestion is to break your URIs into 4 resources: schemas, documents, revisions, and roles. You would then have these resource URIs available for listings:

GET /schemas
GET /documents
GET /roles
GET /revisions

And these URIs available to fetch individual entities:

GET /schemas/{id}
GET /documents/{id}
GET /roles/{id}
GET /revisions/{id}

And these URIs for updating/deleting:

PUT /documents/{id}
DELETE /documents/{id}
PUT /revisions/{id}
DELETE /revisions/{id}
(... etc)

And whatever URIs you need for creating:

POST /documents
POST /roles

All of the data you're trying to put in the URI IMHO belongs either in the POST/PUT body, or as a query parameter. For instance, in your question you have this URI: /schemas/{schema}/documents/ Just looking at it, I would expect this URI to return all documents in the given schema. You can just as easily accomplish this using query parameters instead:

GET /documents?schema={schema}
GET /documents?schema={schema}&role={role}
GET /documents?role={role}&schema={schema}

The last example shows that query parameters used this way are commutative, whereas data placed in the path of the URI is not. This has the benefit that you can mix and match different query parameters without having to add an entirely new route to your application. For example, your current list of routes cannot handle this query:

GET /revisions?schema={schema}
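A minimal Python sketch of why the order of query parameters does not matter, using only the standard library. The helper names `filters_from` and `select`, and the sample records in `REVISIONS`, are hypothetical and only serve the illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def filters_from(uri: str) -> dict[str, str]:
    """Turn the query component of a URI into a filter dictionary."""
    return dict(parse_qsl(urlsplit(uri).query))


def select(records: list[dict], filters: dict[str, str]) -> list[dict]:
    """Keep the records whose fields match every filter."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


# Hypothetical in-memory data standing in for the real storage.
REVISIONS = [
    {"schema": "schema_a", "document": "foo", "role": "editor_1", "revision": "rev_1"},
    {"schema": "schema_b", "document": "foo", "role": "reviewer", "revision": "final"},
]

a = filters_from("/documents?schema=schema_a&role=reviewer")
b = filters_from("/documents?role=reviewer&schema=schema_a")
print(a == b)  # both orders produce the same filters
print(select(REVISIONS, filters_from("/revisions?schema=schema_a")))
```

One generic filter function serves every combination of parameters, which is the flexibility a fixed path hierarchy does not give for free.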

This organization to me is much more REST-like and treats each resource equally. It also doesn't take nearly as much inside knowledge of the organizational structure to consume. I can know nothing about how documents and schemas and revisions are related, start consuming, and infer the relationships just based on the data returned. If you include linked actions in your result set data (as suggested by HATEOAS), then I don't need to infer anything at all: I can just start consuming your data, and you get to tell me everything I can do with that data in the data itself.
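To make the idea of links embedded in the data concrete, here is a small Python sketch of a document representation that carries its own links. Everything in it is an assumption for illustration: the function name `document_representation`, the `_links` key (borrowed from HAL-style conventions), and the `/revisions?document={id}` filter, which does not appear in the question.

```python
def document_representation(doc: dict) -> dict:
    """Return the document's data together with links to related resources."""
    doc_id = doc["id"]
    return {
        **doc,
        "_links": {
            # Where this document itself lives.
            "self": {"href": f"/documents/{doc_id}"},
            # The schema the document follows.
            "schema": {"href": f"/schemas/{doc['schema_id']}"},
            # Hypothetical filter listing the revisions of this document.
            "revisions": {"href": f"/revisions?document={doc_id}"},
        },
    }
```

A client that receives such a representation can follow `_links` without knowing in advance how documents, schemas and revisions are related.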

edited May 01 '19 at 18:09

answered May 01 '19 at 18:04

Jeff Lambert

Excellent. This is what I was trying to explain in my comments but had trouble articulating. – Robert Harvey May 01 '19 at 18:26
Thanks for this thorough answer! To answer your first paragraph, no I don't allow POST requests on `/documents/{schema}/`, only on `/revisions/{schema}/{document}/{role}/` (see my answer). In my suggested URI structures, the representations of collection resources (such as `/documents/{schema}/`) consist only of links to child resources (such as `/documents/{schema}/{document}`), they have no own data. If you want to access or modify a schema you use this URI: `/schemas/{schema}`. – Géry Ogam May 01 '19 at 18:39
@Maggyero So to fully request an entire list of documents, you have to make 1 request to get the links, and then N requests to fetch the data for each document in the list for a total of [N+1](https://www.infoq.com/articles/N-Plus-1) requests? One problem people seem to always have when designing APIs is they attempt too much to predict how a client will use them. A client will use it as it is required to, and requirements change over time. Your API should be flexible enough to not require a ton of changes every time a requirement gets added to the client. – Jeff Lambert May 01 '19 at 18:43
Yes, but it is the same with your URI structure: `GET /documents/` returns the links, as I understand it. – Géry Ogam May 01 '19 at 18:47
@Maggyero Sorry, that's not how I had understood it. I would suggest returning the actual collection of data and not just links to the data for that very reason. If the data set is potentially large, you can paginate the results on the server using either page-based or cursor-based pagination. – Jeff Lambert May 01 '19 at 18:48
Okay, why not, but I think it is an orthogonal design decision since I can do the same with my 2 URI structures (make collection resources return the child data instead of the child links). And since we are talking about that, the REST article on Wikipedia suggests returning the links (URIs) for collection resources, not the data: https://en.wikipedia.org/wiki/Representational_state_transfer#Applied_to_Web_services – Géry Ogam May 01 '19 at 18:55
@Maggyero You're right it does, but I still disagree with what that table is saying (or we're both misunderstanding what it's trying to convey). One of the biggest performance issues in REST APIs is the overhead of the HTTP request itself, in my mind it makes no sense to do something in N+1 requests that you can just as easily handle in one, _especially_ if you start having thousands of documents. Multiply that by even just a few concurrent users and you're easily flooding your API with requests. – Jeff Lambert May 01 '19 at 18:59
About query parameters, I have thought about using them, but the [URI RFC 3986](https://tools.ietf.org/html/rfc3986#section-3.3) states that the path component should be used for hierarchical data while the query component should be used for non-hierarchical data. And I thought that schema, document, role and revision are hierarchical data, since a revision is produced by a particular role which has been assigned for a particular document which follows a particular schema. In other words, each resource has a unique identifier *within the scope of its parent resource* (no global identifier). – Géry Ogam May 01 '19 at 19:17
@Maggyero Do your documents have a single primary key, or do they have composite primary keys? If you separate the data you have out of your path, it is no longer hierarchical in nature. You're filtering or _querying_ your data set, but all that depends on how you decide to look at it. I answered another question [here](https://softwareengineering.stackexchange.com/questions/368117/what-are-the-best-practice-to-manage-related-resource-when-designing-rest-api/368131#368131) that is somewhat related. There are tons of heavily used APIs out there that do that sort of thing through query parameters. – Jeff Lambert May 01 '19 at 19:23
My documents have a name, which is not necessarily unique across schemas. So you can see that as a composite primary key yes (schema_name + document_name is unique). – Géry Ogam May 01 '19 at 21:35
@Maggyero My standard practice is to give everything a primary key. Everything is UUID. This has saved tons of headaches because every single piece of data is uniquely identifiable. – Jeff Lambert May 01 '19 at 21:53
"Your list of routes e.g. currently cannot handle this query: `GET /revisions?schema={schema}`" That's a great point! But does it mean one should never use hierarchical data in URIs (the path component is made for that according to the RFC on URI)? – Géry Ogam May 01 '19 at 22:12
But there is one problem with *UUID*-based URIs compared to *name*-based URIs: with name-based URIs, to get the names of the resources, let's say the schemas, I only need to send a single `GET /schemas/` request and I get the names in the response payload body (names are part of the URIs): `["/schemas/a", "/schemas/b", "/schemas/c"]`. But with UUID-based URIs, `GET /schemas/` would return (UUIDs are part of the URIs): `["/schemas/{uuid_1}", "/schemas/{uuid_2}", "/schemas/{uuid_3}"]`, so then I would have to send 3 `GET /schemas/{uuid}` requests to get the names embedded in each representation. – Géry Ogam May 01 '19 at 23:31
Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/93154/discussion-between-jeff-lambert-and-maggyero). – Jeff Lambert May 02 '19 at 16:43
@JeffLambert I have added the user scenario in the web UI, as well as request and response examples for the 2 URI structures, so that hopefully it's more understandable. – Géry Ogam May 03 '19 at 10:58
