0

Moved

I originally posted this on SoftwareEngineering because that's where this related question was; but having looked into the Help in detail, I think my question is more on-topic for stackoverflow, and I'm moving it there. Will probably delete this question once the move is complete. Sorry I can't move the comments, which have been helpful, along with the question.

Original question

I'm trying to modify a SQLite query (in Android) to return its results in pseudorandom order. As in this question, the order needs to be stable over time (e.g. paging, screen rotation, etc.), so I can't just use ORDER BY RANDOM(). Instead I want to use a hash function that depends on a couple of input values that provide stability and sufficient uniqueness. One of these values is ROWID of the table, which is a set of integers fairly close together; the other value is more like a session ID, that remains invariant within this query.

According to this well-researched answer, FNV-1 and FNV-1a are simple hash functions with few collisions and good distribution. But as simple as they are, FNV-1 and FNV-1a both involve XOR operations, as well as looping over the bytes of input.

Looping within each row of a query is pretty awkward. One could fake it by unrolling the loop, especially if only a few bytes are involved. I could make do with two bytes, combining LSBs from the two input values (val1 & 255 and val2 & 255).

XOR isn't supported directly in SQLite. I understand A ^ B can be implemented as (A | B) - (A & B). But the repetition of values, combined with the unrolling of the loop, starts to get unwieldy. Could I just use + (ignoring overflow) instead of XOR? I don't need very high quality randomness. The order just needs to look random to a casual observer over small-integer scales.

So I'm wondering if anyone has already implemented such a thing. Given how widely used this hash function is, it seems like there would likely already be an implementation for this situation.

Here's my attempt at implementing FNV-1a:

SELECT ..... ORDER BY (((fnvbasis + val1 & 255) * fnvprime) + val2 & 255) * fnvprime;

I'm ignoring the fact that in FNV, the XOR operation (which I've replaced with +) is only supposed to affect the lowest 8 bits of the hash value. I'm also ignoring any overflow (which I hope just means the upper bits that I don't care about are lost).

For fnvbasis I'll use 16777619, and for fnvprime I'll use 2166136261. These are the specified values for 32 bit input, since I don't see a specified value for 16 bit input.

So is this a reasonable way to approximate FNV-1a in a SQLite query? Is there a better, existing implementation? I.e. will it actually produce an ordering that looks pretty random to a casual user, despite my mutilating the operations of the real FNV-1a?

LarsH
  • 154
  • 2
  • 9
  • 3
    Downvoter, please comment on how the question could be improved. – LarsH Mar 20 '20 at 21:19
  • 3
    (Disclaimer: I am **not** the downvoter and I **don't** know why they did.) (1) `ROWID` can change (for various operations other than CRUD). It is better to use some other columns as input to your hash function. (2) It is easier for your application to compute the FNV1a using the input data and include it as part of the insert/update query. (3) The "extensible" way for SQLite to compute a function that it doesn't natively support would require calling into C. See [sqlite3_create_function](https://sqlite.org/c3ref/create_function.html) – rwong Mar 20 '20 at 21:23
  • @rwong: Thanks for your comments. (1) Can you elaborate on how ROWID can change? (A link is fine.) That will help me evaluate whether it can change at a time that could cause a problem for this use case. I do have another ID I could use, but since it's a string, I would probably have to hash it. – LarsH Mar 20 '20 at 21:31
  • 2
    Wouldn't it be simpler to add a column and assign a random number to it on insert. Then pull back the results in that order? No need for a hash here. – GrandmasterB Mar 20 '20 at 21:32
  • (2) I could do that, but due to the structure of the db it may take a lot longer because there are far more rows to deal with than in the query above. (But it would only have to be done at the beginning of each session...) – LarsH Mar 20 '20 at 21:33
  • (3) I'm not sure if creating a sqlite function in C is even possible on Android; at the least, it opens up a whole new non-trivial development environment and build process. – LarsH Mar 20 '20 at 21:34
  • @GrandmasterB: It could be done, but I would like the order to change for each session. So this would have to be done again at the beginning of each session. See (2) above. – LarsH Mar 20 '20 at 21:36
  • 2
    @LarsH Is Sqlite being used as a temporary ("session") database that will be destroyed and re-created often? If so, I suppose the ROWID change wouldn't be a problem, since it's going to change anyway. On Android it will require compiling Sqlite with NDK (Android Native Development Kit). Basically, if you can do the hashing **outside** Sqlite, it will be easier to implement. – rwong Mar 20 '20 at 21:54
  • @rwong No, the database won't be destroyed and re-created. But I could add an integer primary key autoincrement column to the table and use that instead of using ROWID directly. Regarding NDK, it adds a lot of complexity to the development process (and I don't know whether even then it would allow me to add new functions in SQLite -- see e.g. https://stackoverflow.com/questions/6716822/create-user-defined-function-in-sqlite-android). That's why this question asks how to implement the hash function directly in SQLite. – LarsH Mar 20 '20 at 22:05
  • Actually, based on both of your suggestions, I could precompute the first part of the hash, `(fnvbasis + ROWID & 255) * fnvprime`, or even `(fnvbasis + RANDOM() & 255) * fnvprime`, as column `fnv1a1` for each row. This value would never have to be changed. Then when I need to sort pseudorandomly for the current session, `ORDER BY (fnv1a1 + sessionId & 255) * fnvprime`. That would eliminate half the performance cost of FNV-1a per session. – LarsH Mar 21 '20 at 01:09
  • I originally posted this on SoftwareEngineering because that's where https://softwareengineering.stackexchange.com/a/145633/8465 was; but having looked into the Help in detail, I think my question is more on-topic for stackoverflow, and I'm moving it there. Will probably delete this question once the move is complete. Sorry I can't move the comments, which have been helpful, along with the question. – LarsH Mar 23 '20 at 13:06

1 Answers1

1

I would do this: "calculate the pseudo-random order separately, and store it separately." Now you can JOIN by that table of randomly-ordered keys and thus obtain the selected sequence any number of times. The order will remain stable until you change it by recalculating the content of the randomization-table.

You could also in this way easily have an arbitrary number of separate generations of the random ordering, just by adding a new column to the table.

Mike Robinson
  • 1,765
  • 4
  • 10
  • Thanks for the suggestion. This seems similar to rwong's and GrandmasterB's suggestions to precompute and store the order value, except that you store the value "separately", by which you seem to mean that it's in a separate table. I don't see what you gain by putting it in a separate table and then having to do a join. Maybe can you elaborate? – LarsH Mar 23 '20 at 15:06
  • P.S. I had a comment but I've moved it to the top of the question: I've moved this question to https://stackoverflow.com/q/60816104/423105 – LarsH Mar 23 '20 at 15:10
  • @LarsH since you want to do this per session (which I assume means each time the app is loaded), you may be able to make that 2nd table a faster in-memory one that gets wiped when the app closes. – GrandmasterB Mar 23 '20 at 16:12
  • @GrandmasterB In the case of this app, a session means each time the user (re)takes a quiz. This can happen multiple times during a single app run. – LarsH Mar 23 '20 at 17:26