Writing 100 GB of data into Elasticsearch within 5 minutes

Tags:
API
Elasticsearch
Gigabyte
Hi Folks,
I need an urgent solution.
I have around 100 GB of data that is generated across multiple files (around 250-280) over a period of 5 minutes. I need this data inserted into Elasticsearch (that is what I'm using) and then made available for search through the API. All of this has to be achieved within 5 minutes.
I am currently using Kafka and Elasticsearch; are they good enough to handle this?
If yes, how exactly should the architecture look? How many ES instances, etc.?
How do I achieve the reads and writes of this much data within a window of every 5 minutes?
Regards,
Satya
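
For scale, 100 GB every 5 minutes works out to roughly 340 MB/s of sustained ingest, so the bulk API and parallel writers are essentially mandatory. Below is a minimal sketch of a Kafka-to-Elasticsearch bulk consumer, assuming the kafka-python and Elasticsearch 7.x Python clients; the topic, index, and host names are placeholders rather than anything stated in the question.

    import json

    from elasticsearch import Elasticsearch, helpers
    from kafka import KafkaConsumer

    es = Elasticsearch(["http://localhost:9200"])

    consumer = KafkaConsumer(
        "ingest-topic",                              # placeholder topic name
        bootstrap_servers=["localhost:9092"],
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="latest",
    )

    def actions():
        # Wrap each Kafka record as a bulk "index" action.
        for record in consumer:
            yield {"_index": "ingest-current", "_source": record.value}

    # helpers.bulk batches the actions into _bulk requests and keeps running
    # until the consumer stops; run several of these consumers in the same
    # consumer group to spread the load across topic partitions.
    helpers.bulk(es, actions(), chunk_size=5000, request_timeout=120)
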
Discuss This Question: 4 Replies

 
  • TheRealRaven
    What hardware is involved? What hosts ES? (Local or cloud?) What OS is used? Describe the environment and the networking.

    If you're asking someone to architect/design a data center to handle it all, it's likely a fairly significant contract. What is the budget?
  • Subhendu Sen
    More information is needed. Does the data change with every cycle, or is it fixed and never changed? The best approach is to store an index per data pull; 100 GB is not a problem for Elasticsearch. Whichever server you choose, always pick one that supports at least 16 GB of RAM.
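
A rough sketch of the "index per data pull" idea, assuming the Elasticsearch 7.x Python client; the index naming scheme, shard/replica counts, and mapping below are illustrative assumptions, not recommendations from the reply.

    # Create one index per 5-minute pull, sized for heavy write traffic.
    from datetime import datetime, timezone

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])

    def create_pull_index():
        # e.g. ingest-20190101-1205 for the pull starting at 12:05 UTC
        name = "ingest-" + datetime.now(timezone.utc).strftime("%Y%m%d-%H%M")
        es.indices.create(
            index=name,
            body={
                "settings": {
                    "number_of_shards": 6,      # spread the write load across nodes
                    "number_of_replicas": 0,    # add replicas after the load finishes
                    "refresh_interval": "30s",  # fewer refreshes while bulk loading
                },
                "mappings": {"properties": {"payload": {"type": "text"}}},
            },
        )
        return name
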
  • ToddN2000
    Can you tell us more about the hardware being used, including your current network setup? The task seems fairly simple, but there could be bottlenecks if the hardware is not sufficient. Another possible boost would be to use SSDs.
  • TheRealRaven
    @ToddN2000: It doesn't seem particularly simple if the basic specs are anywhere close to accurate. Handling 250-280 files every five minutes, with up to 100 GB of data volume, in a way that makes the data practically instantly searchable can't be very simple. In a single day that means adding roughly 25 TB of data, or potential growth of around 500 TB of searchable data every 20 working days.

    If nothing else, some highly parallel processes seem called for. Beyond raw data transfer rates, there is a lot of indexing and other file processing to do (create, open, close, delete), not to mention that a lot of complex queries must be accommodated at the same time; otherwise, why do the storage and indexing at all? Backups/replication, etc., may also need to be near-real-time and live.

    But if it's truly "urgent", a general IT forum doesn't seem like a good choice. A more specific vendor support forum seems a far better fit.
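
Following up on the parallelism point, one way to push the bulk load in parallel while still making the data searchable soon afterwards is sketched below, assuming the Elasticsearch 7.x Python client; the file location, index name, and thread/chunk counts are placeholders. Refresh and replication are relaxed for the duration of the load and restored at the end.

    import glob
    import json

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["http://localhost:9200"])
    INDEX = "ingest-current"

    def docs_from_files():
        # Assume each of the 250-280 incoming files holds one JSON document per line.
        for path in glob.glob("/data/incoming/*.json"):
            with open(path) as fh:
                for line in fh:
                    yield {"_index": INDEX, "_source": json.loads(line)}

    # Relax refresh and replication while loading.
    es.indices.put_settings(index=INDEX,
                            body={"refresh_interval": "-1", "number_of_replicas": 0})

    # parallel_bulk fans the _bulk requests out across worker threads.
    for ok, item in helpers.parallel_bulk(es, docs_from_files(),
                                          thread_count=8, chunk_size=5000):
        if not ok:
            print("failed:", item)

    # Restore normal settings and make the new documents searchable.
    es.indices.put_settings(index=INDEX,
                            body={"refresh_interval": "1s", "number_of_replicas": 1})
    es.indices.refresh(index=INDEX)
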
