I have the following requirement:
I need to implement a cloud-based solution to process files ranging in size from a few hundred GB to a few hundred TB (maybe even bigger). The solution should read these large files (already uploaded to the cloud by a user), perform operations such as sorting, filtering, and deduplication, and then write the results back to the cloud. I specifically want to use AWS (Amazon Web Services) for this; the files will be stored in Amazon S3. Which technologies/tools should I use, or more simply, how can this scenario be built? The solution should be fully automated.
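To make the operations concrete, here is a minimal Python sketch of what I mean by filtering, sorting, and dedup. It is only an illustration on a tiny in-memory sample, not an AWS solution: the function names (`sorted_chunks`, `process`), the `keep` predicate, and the record format are all hypothetical. It sorts in small chunks and merges them, which is roughly the shape I assume a large-scale version would need, since the real files will not fit in memory.

```python
import heapq
from itertools import groupby, islice


def sorted_chunks(records, chunk_size):
    """Yield sorted lists of at most chunk_size records.

    Each list stands in for a sorted spill file in a real external sort.
    """
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield sorted(chunk)


def process(records, keep, chunk_size=3):
    """Filter, sort, and dedup records without sorting everything at once."""
    # Filtering: drop records the (hypothetical) predicate rejects.
    filtered = (r for r in records if keep(r))
    # Sorting: sort small chunks, then k-way merge the sorted chunks.
    chunks = list(sorted_chunks(filtered, chunk_size))
    merged = heapq.merge(*chunks)
    # Dedup: after the merge, duplicates are adjacent, so groupby collapses them.
    return [key for key, _ in groupby(merged)]


# Made-up sample records; the empty string plays the role of a bad row.
sample = ["b,2", "a,1", "", "c,3", "a,1", "b,2", "d,4"]
result = process(sample, keep=lambda r: r != "")
print(result)
```

In the real scenario the chunks would be spilled to disk or distributed across machines rather than held in a list, and the input and output would be objects in S3.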
Also, please let me know if there are any existing cloud-based solutions similar to the above scenario. Please feel free to contact me.
Thanks in advance,
Software/Hardware used: Amazon Web Services, Hadoop, Ab Initio, Shell, Python