I have the following requirement:
I need to implement a cloud-based solution to process files ranging in size from a few hundred GB to a few hundred TB (possibly even larger). The solution should read these large files (already uploaded to the cloud by a user), perform operations such as sorting, filtering, and deduplication, and then write the results back to the cloud. I specifically want to use AWS (Amazon Web Services) for this; the files will be stored in Amazon S3. Which technologies or tools should I use, or, put simply, how can this scenario be built? The solution must be fully automated.
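For the sort, filter, and dedup steps, a common technique when data does not fit in memory is an external merge sort. You sort memory-sized chunks, spill each one to disk as a sorted "run", then k-way merge the runs; after the merge, duplicate records sit next to each other and can be dropped with a single comparison. The Python sketch below illustrates that idea on one machine. It assumes newline-free text records and a caller-supplied `keep` predicate for filtering; `sort_filter_dedup`, `keep`, and `chunk_size` are hypothetical names chosen for illustration. It does not read from or write to S3. At hundreds of TB, the same sort/merge pattern has to run across many machines, which is the kind of job a framework such as Hadoop is built for.

```python
import heapq
import itertools
import os
import tempfile
from typing import Callable, Iterable, Iterator, List


def _write_sorted_run(records: List[str]) -> str:
    """Sort one in-memory chunk and spill it to a temp file; return the path."""
    records.sort()
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for r in records:
            f.write(r + "\n")
    return path


def _read_run(path: str) -> Iterator[str]:
    """Stream a sorted run back one record at a time."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def sort_filter_dedup(
    records: Iterable[str],
    keep: Callable[[str], bool],  # hypothetical filter predicate
    chunk_size: int = 100_000,    # records held in memory at once (assumption)
) -> Iterator[str]:
    """External merge sort with filtering and de-duplication (sketch).

    Assumes each record is a string without embedded newlines.
    """
    run_paths: List[str] = []
    try:
        # 1. Filter, then cut the stream into chunks and spill sorted runs.
        chunk: List[str] = []
        for r in records:
            if keep(r):
                chunk.append(r)
                if len(chunk) >= chunk_size:
                    run_paths.append(_write_sorted_run(chunk))
                    chunk = []
        if chunk:
            run_paths.append(_write_sorted_run(chunk))

        # 2. k-way merge of all runs; equal records become adjacent,
        #    so groupby collapses each group of duplicates to one record.
        readers = [_read_run(p) for p in run_paths]
        try:
            for record, _dups in itertools.groupby(heapq.merge(*readers)):
                yield record
        finally:
            for rd in readers:
                rd.close()  # closes any file still open in a reader
    finally:
        # 3. Clean up the temporary run files.
        for p in run_paths:
            os.remove(p)


if __name__ == "__main__":
    data = ["b", "a", "c", "a", "skip", "b"]
    print(list(sort_filter_dedup(data, keep=lambda r: r != "skip", chunk_size=2)))
    # prints ['a', 'b', 'c']
```

The small `chunk_size` in the usage example forces several runs so the merge step is exercised; a real value would be sized to the available memory.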
Also, please let me know whether any existing cloud-based solutions are available for a scenario like this. Please feel free to contact me.
Thanks in advance,
Software/Hardware used: Amazon Web Services, Hadoop, Ab Initio, shell, Python