1

I read a lot about using ongoing numbers vs UUIDs as the primary key and had an idea how it might be possible to combine both and profit from their advantages, without their disadvantages.

The table would have an id column that is auto-incremented and another column that contains a random string [A-Za-z0-9].

A blog article URL would look like this:

https://myexampleblog.com/article/236424.As5df8

This way the number is used for lookup and the string is used to check if it is someone trying to randomly guess URLs by incrementing the id.

Is this worth implementing, are there any flaws that I am missing?

Mointy
  • 119
  • 4
  • 3
    I'm a bit confused about your goals. The main value of UUIDs is that they can be assigned independently without a central database. The potential problem with auto-increment IDs is that they are predictable. Without further checks, this can lead to unauthorized Direct Object Reference vulnerabilities. It can also leak the total number of items per the German Tank Problem. In many contexts like here on Stack Exchange, those problems aren't relevant. If you want to get rid of the information leak, use sufficiently long random IDs. To ensure authorization, require users to log in. – amon Apr 10 '21 at 08:14
  • @amon Using UUIDs would make it impossible to access an URL that hasn't explicitly shared with you because of the randomness, but it is way slower than auto-increment IDs. In my way, I still have IDs for lookup and enough randomness to prevent URL guessing. But you are right, it still leaks the total number of items. – Mointy Apr 10 '21 at 13:01
  • 2
    Yes, UUID-4 (fully random) may be suitable for a [capability-based security model](https://en.wikipedia.org/wiki/Capability-based_security) where having the link indicates your right to access the resource. Speed doesn't matter in many applications. Of course a 128-bit UUID is larger than a database ID (often 31 to 64 bits), but UUID-4 can be indexed by the database just as well and most blogs don't have enough IDs for concerns of scale to apply. So not really slower in any meaningful way. – amon Apr 10 '21 at 13:12

3 Answers3

1

The main flaw is complexity. The first concern should be what you are protecting by making part of the URL random. If it's confidential or otherwise valuable information you want a whole lot of randomness in there, so that it's completely infeasible, even with a large bot net, to guess even a single one. The second concern is why the order of the URLs is relevant. For a site like Stack Overflow, I believe the reason for combining IDs with names is to avoid collisions while making it trivial to rename URLs - the unique ID has nothing to do with hiding something.

I don't see why your solution achieves the advantages of both without either of their disadvantages, and complexity is a significant disadvantage.

l0b0
  • 11,014
  • 2
  • 43
  • 47
  • My idea was not to use UUIDs because of performance while preventing anyone to guess an URL that hasn't explicitly shared with that person. I agree, it is not long enough to be completely infeasible and still has the disadvantage of using auto-increment IDs. I guess using auto-increment IDs is the way to go for my project. – Mointy Apr 10 '21 at 13:41
  • 1
    @Mointy: not using UUIDs for performance reasons is premature optimization. Unless you are creating ten's of thousands of IDs per second, performance is a non-factor. And even then, how much does some extra hardware cost compared to paying an engineer to essentially rebuild UUIDs from scratch? – Greg Burghardt Apr 11 '21 at 12:57
1

TL;DR: Depends on what you want to achieve

I read a lot about using ongoing numbers vs UUID's as the primary key and had an idea how it might be possible to combine both and profit from their advantages, without their disadvantages.

Have you also read, why people use UUID's as external ids? Stating the goals of this would greatly help answering this question:

Is this worth implementing, are there any flaws that I am missing?

I can see two reasons:
a.) Make it impossible access an URL that hasn't explicitly shared with you. For example if you have are image hoster that allows you to share short URLs in a chat, you don't want people to be able to guess the URL and allow anyone to see the pictures people privately share. There have been cases where this had been done wrong, and people's nudes where accessible.

b.) If you have are a business you also want to avoid leading information. Let's say you are a candy store and you have incrementing ids for your users, orders and types of candies. If you expose those ids externally an attacker can easily get the number of users, types of candies and orders by looking where you have valid ids, or even check the total number of orders every day and see what you sold each day.

You solution will mostly partially with situation a.) but not really address b.: You might make it harder to iterate, but even with knowing a single order number from your own booking, you can get information.

When I say that partially, I mean that any mechanism that relies on secrets in a url isn't 100% since a lot of people have browsers and web analytics tools that treat URLs as public knowledge and are happy to share it with the world:
https://www.fullcontact.com/blog/2016/04/29/never-put-secrets-urls-query-parameters/ https://neilmadden.blog/2019/01/16/can-you-ever-safely-include-credentials-in-a-url/

If you really want to keep your URL hidden from people you didn't explicit allow to see your URL you should find a better way to do authorization.

Going back to the use case you mentioned:

A blog article URL would look like this:

https://myexampleblog.com/article/236424.As5df8

I don't really see why you would want to keep your blog post a seceret anyways. Most bloggers like their URL to be indexed and available to as many people possible. You can still add a paywall if you want to monetize your content.

What you commonly see is URLs that have an ID followed by a human readable name:

Better than ongoing integer and uuid as primary key

But this is usually done for SEO reasons (not sure whether it still matters to google) and leaving out the last part will often still give you the same page.

Helena
  • 797
  • 5
  • 10
  • The blog was just an example, but you are still right. I know that my solution still leaks information, but it should help in situation a). But I guess it would still be bypassed by the smallest botnet as long as it is not a complete UUID. – Mointy Apr 10 '21 at 13:20
  • You can avoid the botnet issue by throttling by id. – Helena Apr 10 '21 at 13:58
0

I agree entirely with your approach.

Internally within the database, auto-incrementing integers work just fine.

But ...

Externally-visible tokens should be UUIDs, or something similar. (Your random-string is fine.) It's a so-called "moniker," or "handle." These values should be stored in a separate field with a UNIQUE index. In true "third normal form" parlance, they should contain no embedded information, and they should be "un-guessable." A purely-arbitrary, intrinsically meaningless string.

External [URL...] references should always reference this field. You should never publish "actual database primary keys" to the outside world.

Mike Robinson
  • 1,765
  • 4
  • 10
  • 1
    What about this URL? https://softwareengineering.stackexchange.com/a/425298 - Does it really matter that it contains an ID which is almost certainly the exact same ID that StackExchange uses internally for storing this post in their DB? Does it matter that anyone can enumerate this field in the URL to find other questions? – Ben Cottrell Apr 10 '21 at 00:48
  • @BenCottrell only really ... people would do that - trying to *guess* what items exist when this information completely public in sitemap... not really practical. And it does not really matter how external party constructs urls for DOS... The case OP brought up is totally different - they want to hide content rather than push it out. (Whether post ID is actually PK - open question. I don't see why it could have been one at some point... or forever) – Alexei Levenkov Apr 10 '21 at 02:05