I have around 200 million new objects coming in per day and a 90-day retention policy, so that leaves me with roughly 200 million × 90 ≈ 18 billion records to be stored in the form of key-value pairs.
Both the key and the value will be strings. The store is basically a mapping from an object's unique identifier in the application to its unique identifier in the actual object storage.
There is an application that loads objects into a Web OS. For each object it loads, it creates a 16-character string key, say DataID. The Web OS itself creates a 40-character string key, say ObjectID. So what I'm trying to do is maintain a DataID -> ObjectID mapping for 18 billion objects. I don't know the mechanism used to create the IDs.
I will have to deal with:
write(key,value)
read(key)
delete(key,value)
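To pin down the contract, here is a minimal sketch of the semantics I need, modeled with an in-memory map (at 18 billion entries this would obviously have to be sharded across machines; the class and method names are just illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: the DataID -> ObjectID contract, modeled in memory.
public class ObjectIdStore {
    private final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();

    // write(key, value): store the 16-char DataID -> 40-char ObjectID mapping
    public void write(String dataId, String objectId) {
        map.put(dataId, objectId);
    }

    // read(key): return the ObjectID, or null if absent
    public String read(String dataId) {
        return map.get(dataId);
    }

    // delete(key, value): remove the entry only if it still maps to objectId
    public void delete(String dataId, String objectId) {
        map.remove(dataId, objectId);
    }
}
```

The two-argument delete maps naturally onto ConcurrentHashMap.remove(key, value), which removes the entry only when it currently holds that value.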
I am looking for ideas for an optimal way to implement this. It should be optimized for reads and writes; space optimization is secondary.
I know Hadoop/NoSQL is one way to go, and a distributed hash table (sketched below) would probably be another, but a few more options would help me decide on the best solution. A relational database is not an option, as we don't have an existing RDBMS in the current environment.
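To make the distributed-hash-table option a bit more concrete: since the IDs are opaque, hashing the DataID should spread reads and writes evenly across nodes. A minimal routing sketch, assuming fixed shards (the shard count of 256 is a placeholder, not a recommendation):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: route each DataID to one of N shards; each shard owns its
// slice of the DataID -> ObjectID map.
public final class Partitioner {
    private static final int SHARDS = 256; // hypothetical shard count

    static int shardFor(String dataId) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(dataId.getBytes(StandardCharsets.UTF_8));
            // Fold the first 4 digest bytes into an int, then into [0, SHARDS)
            int h = ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                  | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
            return Math.floorMod(h, SHARDS);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is available on every JVM", e);
        }
    }

    public static void main(String[] args) {
        // Example with a made-up 16-char DataID
        System.out.println(shardFor("ABCD1234EFGH5678"));
    }
}
```

With fixed shards like this, the 90-day retention would still require per-entry deletes; stores with native TTL support (e.g. Cassandra or HBase) could handle that expiry automatically.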