I am coming into a project that uses an Amazon S3 Bucket for a process.

For my application, I need to know (within a reasonable time - on the order of a few minutes, at most) whether a new file has been added to the bucket, and then download that file.

Currently, an SQS queue already serves this purpose for another application, but I believe I cannot read from that queue, since doing so would steal the notifications the other application depends on.

The naive approach would be to set up a duplicate queue and a duplicate event notification, but that is not allowed.

The correct solution, I believe, if the constraints allowed it, would be fan-out, as suggested by Amazon: create an SNS topic and subscribe a queue per consumer.

Unfortunately, I am not allowed to interfere with the current process - it must stay exactly as-is, without the topic.

So, is there another way to learn whether new files have been added to the bucket? I am considering a polling approach, if that is at all possible, but I cannot find any good answers. Every piece of guidance I find tells me how to set up an SQS queue, which is clearly the correct solution to this problem if you don't have my current constraints.

SRNissen

1 Answer

Your constraints make this a bit challenging, but there are a few possible ways to achieve it, although they may not be as efficient as a second SQS queue.

1. Polling:

You could indeed poll the S3 bucket periodically to check for new files. Using the AWS SDK, call the ListObjectsV2 API operation (AWS recommends it over the older ListObjects) and compare the current list of keys with the list from the previous poll to detect new files. Each call returns at most 1,000 keys, so a large bucket needs several paginated requests per poll; restricting the listing to a prefix helps if new files arrive under a known path. Polling too frequently increases cost, since S3 charges per list request, and on a bucket with many objects a full listing may be too slow to meet your notification-time requirement.
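As a rough illustration of the polling idea, here is a minimal sketch using boto3. The bucket name, the 60-second interval, and the helper names `list_keys`, `poll_for_new_keys` and `main` are assumptions made for this example, not anything taken from your setup. It keeps the set of seen keys in memory, so a real deployment would probably need to persist that state somewhere.

```python
import time


def list_keys(s3_client, bucket, prefix=""):
    """Return the set of all object keys under `prefix`, following pagination."""
    keys = set()
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page when nothing matches.
        for obj in page.get("Contents", []):
            keys.add(obj["Key"])
    return keys


def poll_for_new_keys(s3_client, bucket, seen, prefix=""):
    """Return keys that were not in `seen`, and record them in `seen`."""
    current = list_keys(s3_client, bucket, prefix)
    new_keys = sorted(current - seen)
    seen.update(new_keys)
    return new_keys


def main():
    import boto3  # imported here so the helpers above work with any client object

    s3 = boto3.client("s3")
    bucket = "example-bucket"  # hypothetical bucket name
    seen = list_keys(s3, bucket)  # baseline: existing objects count as handled
    while True:
        time.sleep(60)  # assumed interval, well inside "a few minutes"
        for key in poll_for_new_keys(s3, bucket, seen):
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder keys
            s3.download_file(bucket, key, key.rsplit("/", 1)[-1])
```

Calling `main()` starts the loop. Note that this only detects new keys; an existing key that is overwritten would need a comparison of `LastModified` or `ETag` as well.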

2. Lambda to Push Notifications:

Another solution could be to create an AWS Lambda function that is triggered when a new file is added to the S3 bucket. The function could then call an API to notify your application directly, or write to a second SQS queue that your application has access to. If this is permissible, it leaves the existing queue and its consumer untouched while still giving you near-real-time notifications. Be aware that S3 rejects notification configurations whose prefix/suffix filters overlap for the same event type, so a Lambda trigger on the same events may collide with the existing queue notification.
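To make the Lambda option more concrete, the following is a sketch of a handler that forwards object-creation events to a second queue. The environment variable `SECOND_QUEUE_URL`, the helper `extract_objects`, and the message format are hypothetical choices for this example; only the event fields read here come from the standard S3 notification payload.

```python
import json
import os
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key) pairs for created objects in an S3 event payload."""
    pairs = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 notifications (a space becomes "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    sqs = boto3.client("sqs")
    queue_url = os.environ["SECOND_QUEUE_URL"]  # hypothetical variable name
    objects = extract_objects(event)
    for bucket, key in objects:
        sqs.send_message(
            QueueUrl=queue_url,
            MessageBody=json.dumps({"bucket": bucket, "key": key}),
        )
    return {"forwarded": len(objects)}
```

Your application would then read from the second queue exactly as the other application reads from the first, without competing for its messages.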

3. S3 Event Notifications to another Service:

If other AWS services can be used, you could enable Amazon EventBridge (the successor to CloudWatch Events) delivery on the bucket and create a rule that forwards object-creation events to your application, or have an S3 event notification publish to an Amazon SNS topic that your application subscribes to.
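For the EventBridge variant, a rule could look roughly like the sketch below. The rule name, target ID, and the helper names `object_created_pattern` and `create_rule` are made up for illustration. It also assumes that EventBridge delivery has already been switched on for the bucket and that the target queue's policy allows EventBridge to send to it. Switching on delivery goes through the same bucket notification configuration as the existing queue notification, and writing that configuration replaces it as a whole, so the existing entries would have to be carried over unchanged.

```python
import json


def object_created_pattern(bucket_name):
    """Build an EventBridge event pattern matching new objects in one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket_name]}},
    }


def create_rule(rule_name, bucket_name, target_queue_arn):
    """Create a rule for the pattern above and point it at an SQS queue."""
    import boto3  # imported here so the pattern helper has no AWS dependency

    events = boto3.client("events")
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(object_created_pattern(bucket_name)),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "new-object-queue", "Arn": target_queue_arn}],  # Id is arbitrary
    )
```

Because the rule lives in EventBridge rather than in the bucket's queue/topic/function notification list, it does not compete with the existing SQS notification for the same events.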

4. S3 Access Logs:

If the other methods are not viable or allowed, another option is to enable S3 server access logging, which delivers log files to a target bucket that you specify, recording the requests made against the source bucket. You can parse these logs periodically to detect object-creation operations. However, log delivery is best-effort and can lag by hours, so this may not meet your few-minutes requirement, and parsing the logs is more involved than the other options.

Remember to consider the additional costs and latency of each method. Having S3 event notifications deliver to SQS directly is the most efficient approach, but if your constraints rule that out, these methods should work as alternatives.
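If you do go down the access-log route, parsing could look something like this sketch. It only reads the leading space-separated fields of each record (bucket owner, bucket, time, remote IP, requester, request ID, operation, key, request URI, HTTP status). The set of operation names treated as "object created" is an assumption you should check against your own logs, and the function name `created_keys` is invented for the example.

```python
import re

# Leading fields of a server access log record, in order.
LOG_PREFIX = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'(?:"[^"]*"|\S+) (?P<status>\S+)'
)

# Assumption: plain uploads and completed multipart uploads. Extend as needed.
CREATE_OPERATIONS = {"REST.PUT.OBJECT", "REST.POST.UPLOAD"}


def created_keys(log_text):
    """Return keys of successful create operations, as written in the log."""
    keys = []
    for line in log_text.splitlines():
        match = LOG_PREFIX.match(line)
        if not match or match["operation"] not in CREATE_OPERATIONS:
            continue
        if match["status"].startswith("2"):  # keep only successful requests
            keys.append(match["key"])
    return keys
```

The keys are returned exactly as they appear in the log, where they may be URL-encoded, so they might need decoding before you use them to download the object.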

tomasantunes