Terracotta's Real-Time Big Data Blog


May 16, 2013  4:34 PM

Boosting Hibernate Performance with In-Memory Data Management



Posted by: Karthik Lalithraj
Big Data, BigMemory, Ehcache, Hibernate, in-memory, In-memory data management, Real-time Big Data, Uncategorized

One of the great benefits of persistence frameworks like Hibernate is that they allow architects and developers to manage data in ultra-fast machine memory, or RAM. By default, a first-level cache, at the Hibernate session level, is always enabled. A second-level cache, at the session-factory tier, is optional, but can result in huge performance gains at scale. Additionally, Hibernate allows for query-level caching.

Terracotta BigMemory (Ehcache) is the default query- and second-level cache for Hibernate, and it can keep terabytes of data in memory with as few as two changes to a configuration file. Understanding how BigMemory works with Hibernate makes designing your enterprise applications much easier, so in this post I’ll share tips and best practices for using Terracotta BigMemory as a query cache and a second-level Hibernate cache.

1. How do I get started?

Documentation on using BigMemory with Hibernate is here: http://ehcache.org/documentation/user-guide/hibernate

Enabling the second-level cache and the query cache requires only a single line of config each in your hibernate.cfg.xml file:

<property name="hibernate.cache.use_second_level_cache">true</property>
<property name="hibernate.cache.use_query_cache">true</property>

I typically use the Ehcache Singleton factory:

<property name="hibernate.cache.region.factory_class">net.sf.ehcache.hibernate.SingletonEhCacheRegionFactory</property>

 

2. How do I know my query is hitting the second-level cache?

The simplest and safest way is to set “show_sql” to “true” in your Hibernate property file. When you query the database, if the SQL query prints to the console, it is probably not using your second-level cache. In addition, you can use the Terracotta Monitoring Console (provided as part of the BigMemory Enterprise kit) or any Hibernate profiler (http://www.hibernatingrhinos.com/products/hprof) and look at the hits and misses against your cache.
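
As a minimal sketch of the settings involved, the fragment below turns on SQL logging in the same Hibernate configuration file as the cache properties. The second property is an assumption beyond what this post covers: Hibernate's built-in statistics, which expose second-level cache hit and miss counters without an external profiler.

```xml
<!-- Print every SQL statement Hibernate sends to the database -->
<property name="hibernate.show_sql">true</property>
<!-- Assumption: also collect cache hit/miss counters via Hibernate statistics -->
<property name="hibernate.generate_statistics">true</property>
```

Statistics collection adds some overhead, so it is probably best kept to test environments.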

 

3. How do I specify custom cache regions?

By default, Hibernate always points to the default ehcache.xml and the default cache region. This implies that Hibernate manages the cache regions for you.

Let’s take an example. Say you have two Hibernate objects, Account and Customer. By default, the settings of the default cache will be applied to these objects. Hibernate will create a cache named after the fully qualified class name (e.g. com.company.domain.Account will be the name of the cache).

For more control, you can also specify custom cache regions. You can do this in two different ways:

  1. Specify this as a cache region in your Hibernate cfg or using Hibernate annotations

e.g. @Cache(usage = CacheConcurrencyStrategy.READ_WRITE, region = "Account")

  2. Specify the cache region in your Hibernate domain configuration

Note that if you use query cache, Hibernate creates two caches internally for its purpose:

a)    org.hibernate.cache.StandardQueryCache

b)    org.hibernate.cache.UpdateTimestampsCache

The StandardQueryCache has the query that is executed as part of the key itself. The UpdateTimestampsCache is used to track the timestamps for updates to particular tables.

Note that if you want to cluster your query cache, you will need to add the above as two separate cache regions in your ehcache.xml and cluster them using the terracotta tag.
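
To make the region names concrete, here is a hedged sketch of an ehcache.xml that declares one custom entity region and the two query cache regions. The sizes and expiry values are illustrative assumptions, not recommendations, and the query cache class names are the ones quoted in this post; they may differ in other Hibernate versions.

```xml
<ehcache>
  <!-- Fallback settings for any region without its own entry -->
  <defaultCache maxEntriesLocalHeap="10000" eternal="false"
                timeToIdleSeconds="120" timeToLiveSeconds="120"/>

  <!-- Custom region matching a class annotated with region = "Account" -->
  <cache name="Account" maxEntriesLocalHeap="5000" eternal="false"
         timeToLiveSeconds="600"/>

  <!-- Region holding cached query results -->
  <cache name="org.hibernate.cache.StandardQueryCache"
         maxEntriesLocalHeap="1000" eternal="false" timeToLiveSeconds="300"/>

  <!-- Region tracking table update timestamps; kept eternal so that
       entries do not expire before the query results that depend on them -->
  <cache name="org.hibernate.cache.UpdateTimestampsCache"
         maxEntriesLocalHeap="5000" eternal="true"/>
</ehcache>
```

To cluster the query cache, each of the two query regions would additionally carry a <terracotta/> child element.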

 

4. My application is not using Hibernate, so why do I get the error java.lang.NoClassDefFoundError: org/hibernate/cache/CacheKey ?

You probably have another application that uses Hibernate loaded by the same classloader. Separate your CacheManager config/ehcache.xml as follows:

a) All Hibernate-related objects that require Ehcache as second-level cache should be defined in ehcache.xml.
b) Plain old Ehcache objects will be defined in ehcache-nonHibernate.xml. Use new CacheManager("ehcache-nonHibernate.xml") to get a reference to this CacheManager.

 

5. How do I evict specific cache regions when I execute my HQL?

Hibernate allows you to specify a synchronize tag within the class. This lets you specify the table(s) you are updating, and it will only clear the cache for the specified table(s). If you do not specify any table(s), it will clear the cache for all tables.

Here is a link and an example on the Hibernate forums on how this is accomplished:

https://forum.hibernate.org/viewtopic.php?t=959949&view=previous&sid=6032f5caaa1dea9d05c75587e400228d

<class name="Summary">
<subselect>
select item.name, max(bid.amount), count(*)
from item
join bid on bid.item_id = item.id
group by item.name
</subselect>
<synchronize table="item"/>
<synchronize table="bid"/>
<id name="name"/>
</class>

This mapping shows that item and bid are accessed, but since the subselect is native SQL, Hibernate has no idea which tables/entities are being touched. Synchronize informs Hibernate so it can deal with possible flush initiation, caching, etc., depending on how Summary is being used.

 

6. How do I use Hibernate criteria, and what is the Native SQL “gotcha”?

To take advantage of the second-level cache, you will need to use Hibernate criteria. To take advantage of the query cache, you can use HQL.

If you run a stored procedure or issue an executeUpdate or execute Native SQL, there are two side effects of which you should be aware:

1. The second-level cache will not be used

2. The second-level cache will be completely purged in certain circumstances

Whenever a Query.executeUpdate() is run, for example, Hibernate invalidates affected cache regions (those corresponding to affected database tables) to ensure that no stale data is cached. This should also happen whenever stored procedures are executed.

Furthermore, if you run Native SQL through Hibernate, your entire second-level cache will be purged.
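
One way to picture the distinction above is with named queries in a Hibernate mapping file. The following is a hedged sketch rather than a recommended setup: the query names, the Account entity and the account table are hypothetical, and it assumes an hbm.xml mapping in which the <query> element accepts a cacheable attribute and the <sql-query> element accepts <synchronize> children. Declaring the table a native statement touches is intended to let Hibernate invalidate only the matching region instead of everything.

```xml
<hibernate-mapping>
  <!-- HQL query whose results are eligible for the query cache -->
  <query name="Account.findActive" cacheable="true">
    from Account a where a.active = true
  </query>

  <!-- Native SQL update: the synchronize element names the affected table -->
  <sql-query name="Account.deactivateUnused">
    <synchronize table="account"/>
    update account set active = 0 where last_login is null
  </sql-query>
</hibernate-mapping>
```

It is worth verifying the invalidation behavior against your Hibernate version with show_sql and cache statistics before relying on it.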

 

7. What about object lifecycles and read/write modes?

Hibernate likes to control the entire lifecycle of the object, from inception to destruction.

In read/write mode, Hibernate considers itself the owner of data, and tries to provide high-consistency guarantees. Without getting too technical here, let’s just say that Hibernate takes charge of all lock management and transaction management in this mode. As a result, if you need to enable rejoin/non-stop, you can only do so under non-strict read/write mode.
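
In a mapping file, the mode is chosen per class through the <cache> element. A minimal sketch, assuming an hbm.xml mapping and a hypothetical Account entity with id and balance properties:

```xml
<class name="com.company.domain.Account" table="account">
  <!-- Relaxed consistency mode, the one compatible with rejoin/non-stop -->
  <cache usage="nonstrict-read-write" region="Account"/>
  <id name="id"/>
  <property name="balance"/>
</class>
```

Switching the usage value to read-write would hand lock and transaction management back to Hibernate, as described above.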

 

8. How do I cluster and enable BigMemory when using Hibernate?

This is simple! It takes just two lines of config to cluster using Terracotta:

  1. Specify the Terracotta config URL (where the Terracotta Server is deployed)
  2. Specify which cache regions need to be clustered (use the <terracotta/> tag)
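
The two steps above can be sketched in ehcache.xml as follows. The host and port (localhost:9510), the region name and the sizing values are assumptions for illustration only.

```xml
<ehcache>
  <!-- Step 1: where the Terracotta Server is deployed (host:port assumed) -->
  <terracottaConfig url="localhost:9510"/>

  <defaultCache maxEntriesLocalHeap="10000" eternal="false"
                timeToLiveSeconds="120"/>

  <cache name="Account" maxEntriesLocalHeap="5000" eternal="false"
         timeToLiveSeconds="600">
    <!-- Step 2: mark this region as clustered -->
    <terracotta/>
  </cache>
</ehcache>
```

Regions without the <terracotta/> element stay local to each JVM.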

You can continue to use ARC with Hibernate: http://ehcache.org/documentation/2.5/arc/index

 

9. How do I cluster cache regions across multiple Hibernate session factories?

You don’t need to do anything. Just cluster the cache region as above and you will be set. Having separate Hibernate session factories should not matter. For a cache region to be clustered, it should belong to the same cacheManager-cacheRegion combination.

 

10. Can I use writebehind with Hibernate?

While you cannot configure a cacheWriter to work with Hibernate (due to the transaction semantics identified above), you can configure this using Ehcache putWithWriter and Ehcache writebehind. Use Hibernate as part of your CacheWriter interface implementation to define your persistence strategy. More documentation is here: http://ehcache.org/documentation/apis/write-through-caching

I hope these tips contribute to making your experience with Hibernate and BigMemory easier and more fruitful. If you have additional questions, please post them to the comments.

May 14, 2013  2:59 PM

Too Big to Scale? Why Financial Services can Benefit from In-Memory Data Management



Posted by: Gagan Mehra
BigMemory, Financial services, In-memory data management, Uncategorized

Financial services organizations make money by managing a variety of tradeable assets. These assets rely on data from multiple sources to make the business of managing money run efficiently. With the recent growth in data volumes and data velocity, the wheels of this financial engine are slowing down due to pressure on existing applications to a) continue to meet the performance requirements, and b) process large data volumes in real-time for the business to succeed.

These challenges can be overcome by using in-memory data management (IMDM) and integrating it with existing applications. IMDM provides several benefits that are unmatched by any other technology:

1. Ability to scale linearly with predictable performance: Most technologies support small data sets with millisecond or microsecond performance times but the performance degrades as the size of the data set grows. IMDM offers predictable performance regardless of data volume as it has been architected to scale linearly. As the data sets grow from hundreds of gigabytes to hundreds of terabytes, the IMDM setup can scale by adding more memory without worrying about scalability or performance.

2. Ability to integrate with almost any application: Whether the application under pressure handles investment banking, fraud management, settlement, compliance or something else, IMDM can integrate with it regardless of the function or the use case. This flexibility extends to IMDM’s ability to work with several different technologies. For example, slow transactions can be offloaded from a mainframe to IMDM to make the end-to-end process run faster; IMDM can be set up as a private cloud instance that multiple applications integrate with using basic URLs; or IMDM can run in the same process as your application to give its performance a huge boost.

3. Short deployment timeframe: IMDM is a great technological innovation, and it is also easy to deploy. It only takes a few weeks to implement IMDM in your environment. All you need is to clearly define the use cases that are causing pain, and IMDM can be set up to resolve those pain points. Before deploying the IMDM solution, it is important to keep your strategic needs in mind, mostly from a business growth perspective, to ensure the setup can support your business for the next several years.

If you are interested in learning more about the power of IMDM either leave a comment below or contact me at gagan@terracotta.org.


May 14, 2013  2:59 PM

Mainframe Offload with In-Memory Data Management



Posted by: Steve Yellenberg
Big Data, Big Data Analytics, BigMemory, Financial services, in-memory, In-memory data management, mainframe, Manufacturing, Media, Real-time Big Data, Retail & E-commerce, Telecommunication, Uncategorized

Tech pundits sometimes talk about mainframes as if they disappeared along with leisure suits, punk rock, and the Deutschmark. But as CIOs and technology architects know all too well, mainframes are quite alive — if not altogether well — in a surprising number of today’s big enterprises. In fact, their tendency to become bottlenecks is now a hot topic, and in-memory data management is quickly becoming the solution of choice for offloading mainframe demand.

Over the last decade, as enterprises deployed more products and services through Web, mobile, and API distribution channels, the easiest way to do that was to grab data from existing mainframe services. (Sound familiar?) Unfortunately, mainframe applications for customer service, reservations, and commerce weren’t typically built to handle millions of simultaneous customers, hundreds of thousands of transactions per second, or the kind of instant access to data that’s necessary for real-time Big Data intelligence. The result is that mainframes are increasingly inhibitors of performance, costing millions of dollars for scale-outs to address spikes in demand. Let’s put it this way: If IBM sales reps are camped outside your office waiting for a purchase order for the extra MIPS you’ll need to get through the holidays, it’s time to think about another way.

BigMemory, Terracotta’s in-memory data management platform, allows enterprises to reduce  mainframe loads, deliver incredible performance, and reduce costs — without big investments in infrastructure. With BigMemory, enterprises use commodity hardware to keep up to hundreds of terabytes of mission-critical data instantly available in ultra-fast machine memory, or RAM. And many enterprises enjoy huge benefits from BigMemory with far smaller volumes. (Read about just one of our mainframe offload success stories here: Top Online Travel Service Takes Off with BigMemory).

How does it work? Offloading the mainframe with BigMemory can happen in one of four ways, all resulting in data access that is orders of magnitude faster than directly querying a mainframe:

  • Batch offload mainframe data into BigMemory
  • Write transactions simultaneously to the mainframe and BigMemory
  • Apply results of mainframe queries to BigMemory
  • Use BigMemory as an in-memory middle tier, in front of the database, for frequently accessed data

To learn more about how Terracotta BigMemory can transform your mainframe bottleneck into a source of real-time Big Data performance that delights customers and removes the need for expensive scale-outs, contact Terracotta sales.

 


May 1, 2013  3:07 PM

Eliminate Telecom Fraud using In-Memory Data Management



Posted by: Gagan Mehra
Big Data, Big Data Analytics, BigMemory, in-memory, In-memory data management, Telecommunication, Uncategorized

Fraud impacts all organizations. Most organizations are unable to fully eliminate fraud and hence have to live with it as the cost of doing business. A recent survey estimates global telecom fraud losses at $40.1 billion USD, or approximately 1.88% of revenue. This number is based only on known fraud transactions; adding the unknown fraud, i.e. the fraud that could not be identified, could easily double the losses. The good news is that these billions of dollars in fraud losses can be reduced to almost zero by using in-memory data management (IMDM) technologies.

IMDM enables real-time processing of transactions to detect fraud by maintaining all data in-memory. This results in faster detection of fraud and faster action to eliminate fraud losses. IMDM easily integrates with different applications to manage several types of telecom fraud:

1. Toll Fraud or Compromised PBX/Voicemail Systems Fraud

Compromising a PBX/Voicemail system to call toll numbers is the top fraud loss category. Within minutes operators can lose hundreds of thousands of dollars by calling toll numbers that charge $5/minute or higher. IMDM allows analyzing call detail records (CDRs) in real time using a dynamic set of rules to minimize & eliminate this type of fraud.

2. Bypass Fraud

Bypass fraud manifests in many different forms but essentially involves unauthorized traffic in an operator’s network. This can be difficult to detect, but IMDM makes it easier by maintaining all real-time transactions in-memory to review the source of the transactions, the destination number, the cost of the call, etc. IMDM holds streaming data in-memory to prevent a denial-of-service attack that could have led to an operator’s network becoming totally unresponsive to their customers.

3. Credit Card Fraud

Telecom organizations get hit with several different kinds of credit card fraud related to charge backs, returned checks, card holder not present, etc. IMDM enables maintaining real-time and historical transactions in-memory to quickly analyze the root cause and prevent fraud. Leading credit card organizations are already using IMDM solutions to manage their fraud.

If you are interested in learning more, please leave a comment below or email me at gagan@terracotta.org.

 


April 29, 2013  7:45 AM

Rapidly Adding In-Memory Speed with BigMemory



Posted by: Fabien Sanglier
Big Data, Big Data Analytics, BigMemory, in-memory, In-memory data management, Real-time analytics, Real-time Big Data, Uncategorized

It’s always amazing to me how quickly teams get up and running with in-memory data management using Terracotta BigMemory. The only tough part is that, because so many of our deployments create incredible competitive advantage for our clients, they like to keep them secret! So it’s really great that Mansour Raad, Senior Software Architect at Esri, just blogged about how easy it was to build his real-time mapping proof of concept with BigMemory.

Mansour was asked to put together a “proof of concept implementation of a very fast interactive dynamic density map generation on 11 million records for a webmap application” and realized that moving all of his data into RAM was the only way to get the performance he wanted. Mansour offers a valuable step-by-step guide for anyone looking to do the same — check it out!

READ: Big Data: Terracotta BigMemory and ArcGIS Webmaps

 


April 16, 2013  7:05 AM

In-Memory Data Management: Solving Telecom’s Big Data Pains



Posted by: Gagan Mehra
Analysis, Big Data, Big Data Analytics, Big-Memory-Hadoop, BigMemory, Hadoop, In-, In-memory data management, Real-time Big Data, Telecommunication, Uncategorized

Telecom is one of the leading verticals reeling from big data pains. These pains started showing after the iPhone was launched in 2007. Before the iPhone, the Telecom business mostly had customers that were on pay-per-use data plans resulting in few scalability challenges and infrequent network outages. Then came the iPhone that made the all-you-can-eat data plans popular making choked networks and outages everyday news. Over the last few years Telecoms have invested a lot to upgrade their networks to support this insatiable hunger for data, going from 3G to 4G to 4G LTE, but the back-end applications continue to run into challenges, as they were not originally designed to handle large data sets.

The solution to these challenges is In-memory data management (IMDM). IMDM integrates with existing back-end applications to help them scale, perform well, overcome big data challenges and meet customer expectations thus opening up a totally new set of possibilities for Telecoms. Here are a few areas where IMDM can provide value for Telecoms:

1. Increase Average Revenue Per User (ARPU)
ARPU is the key metric for Telecoms. IMDM helps increase ARPU by enabling real-time personalization and targeted cross-sell / up-sell offers by processing large data sets in-memory.

2. Meet Service Level Agreements (SLAs)
Back-end applications that are struggling to meet SLAs because of big data challenges can take a sigh of relief as IMDM integrates seamlessly to improve application performance and meet or even beat the defined SLAs.

3. More Self-Service
If your applications are running slow, customers are unlikely to use them to resolve their issues. Integrating IMDM with applications results in improved performance leading to greater customer adoption of the self-service functionality.

4. Reduced Operational Costs
Customer records are the most accessed data entity in a Telecom’s operations and they are, usually, maintained across multiple applications leading to inefficient use of time, money and resources. IMDM aggregates customer records in memory for faster data access. This allows various departments, like call centers, to support customers faster and reduce the overall operational costs.

5. Improved Network Management
Telecoms run large networks and a delay in reacting to an event can result in loss of revenue, loss of service or both. IMDM allows all network events to be managed in-memory enabling quick action to resolve issues faster.

6. Enable Big Data Analytics
Large data sets are not easy to analyze as every query or job can take hours. IMDM comes to the rescue by holding terabytes of data in-memory to make real-time big data analytics a reality. IMDM (like Terracotta’s BigMemory) also integrates with Hadoop clusters to make the Hadoop jobs execute faster. Faster analytics leads to faster decisions.

If you are interested in learning more, please leave a comment below or email me at gagan@terracotta.org.


April 9, 2013  5:02 PM

Showing off Terracotta In-Genius – 2013 AFCEA San Diego Plugfest Participation



Posted by: Fabien Sanglier
AFCEA, Analysis, Big Data, BigMemory, ESRI, Google, Humanitarian Assistance and Disaster Relief, In-Genius, in-memory, In-memory data management, JackBe Presto, Nokia, Real-time Big Data, San Diego State University, SimTable, Software AG, Uncategorized, webMethods

A couple of weeks ago (end of January 2013), 2 colleagues and I participated (under our company banners, Terracotta and Software AG) in a government “plugfest”…and we won first place! Check out this other article in CTO Vision that also talks about our win - and if you’re in a rush, jump directly to the short 3 minute video below.

What is that, you may ask? As explained on the AFCEA website, a plugfest is a “collaborative competitive” challenge where industry vendors, academic, and government teams work towards solving a specific set of “challenges” strictly using the RI2P industrial best practices (agile, open standard, SOA, cloud, etc.) for enterprise information system development and deployment.

The idea is to “plug” technologies together (technologies provided by the various players, not necessarily within your team) as opposed to rebuilding everything from scratch. And indeed, “plugging” is almost mandatory since the scenario is only announced 24 hours before the event, giving the teams a mere 72 hours to create something based on the scenario provided.

Overall, it’s a government effort to encourage more interoperability and reuse of IT components across projects and even agencies.

This particular January 2013 plugfest was about solving a Humanitarian Assistance and Disaster Relief (HADR) use case problem where technology:
  • Helps track in real-time what’s happening on the ground (data streams about hazardous materials, first responders, sensors, injured civilians, etc…) and report it in an actionable, geospatially-enabled, format
  • Provides real-time decision support based on pre-defined emergency protocols
  • Correlates various “BigData” streams (sensors, social feeds, etc…) to perform real-time analytics in order to predict movements and/or identify “flash mobs” / criminal hotspots taking advantage of the confusion.

The end result of what we put together was a real-time “map” dashboard that shows everything that’s happening on the ground and provides contextual highlights to help decision support.

Here is a short 3 minute video showing the nuts and bolts of that demo:

What you particularly see in the video demo:
  • Moving actors on the disaster zone (first responders, plumes of toxicity, drones, etc…). Each of these actors is “broadcasting” its current geolocation (lat, long + metadata) at various time intervals using Nirvana Universal Messaging (1000s of messages per second)
  • Terracotta’s Complex event processing (CEP) engine performing continuous “geo” queries identifying in real-time the distance, speed and direction of the hazardous plumes in comparison to the various red-cross shelters on the map. The CEP engine automatically generates alerts if hazardous plumes are indeed forecasted to impact shelters…providing critical decision support to the commander in charge.
  • All events and metadata are stored in-memory, using Terracotta BigMemory for faster, micro-second access and analytics.
  • The ability to drill into the moving drones, planes and responders to see a ground view in real-time.
  • Triage-based casualty tracking and available blood supply.
  • Availability of shelters, red cross centers, blood banks, and other supporting organizations (DOD types).
List of what we “plugged”:

Thanks to everyone who organized this event. It’s been a blast to participate as part of the Terracotta team, and I’m looking forward to participating in the next “plugfest” event!


April 2, 2013  4:22 PM

Hadoop + BigMemory: Run, Elephant, Run!



Posted by: Gagan Mehra
Analysis, Big Data, Big-Memory-Hadoop, BigData, BigMemory, Financial services, Hadoop, in-memory, In-memory data management, Real-time analytics, Real-time Big Data, Terracotta

When many of us hear the term Big Data, we think Hadoop. That’s only natural as Hadoop has helped countless organizations overcome huge Big Data challenges, at relatively low cost.  No surprise, therefore, that analysts expect global demand for Hadoop to grow to $13.95 billion by 2017.

That said, Hadoop isn’t a complete answer to Big Data. While Hadoop is great for batch processing and storage of very large data sets, it can take hours to produce results. Then, once you gather insights from Hadoop, it can take even longer to share those insights with your enterprise apps. Every second your apps can’t see the latest insights is time that your Hadoop-derived intelligence could be delivering value, but isn’t.

To address this challenge, Terracotta recently announced the BigMemory-Hadoop Connector, a game-changing solution that lets Hadoop jobs write data directly into BigMemory, Terracotta’s in-memory data management platform. This enables downstream applications to get instant access to Hadoop results by reading from BigMemory. Hadoop jobs also execute faster, as they can now write to memory instead of disk (HDFS). The result can be a significant boost in competitive advantage and enterprise profitability.

(For those familiar with Hadoop: the BigMemory-Hadoop Connector also lets you read streaming output from Hadoop, allowing apps to get Hadoop results even faster.)

If you run a Hadoop project and you’d like to see what the BigMemory-Hadoop Connector can do, download our early access version at:  http://www.terracotta.org/downloads/hadoop-connector

Of course, if you’d like to learn more about how BigMemory can help your organization make the most of Hadoop, contact me directly at gagan@terracotta.org, or post in the comments.


April 1, 2013  3:59 PM

3 Maxims for Customer Delight



Posted by: Gagan Mehra
Big Data, BigData, BigMemory, Customer Loyalty, Ecommerce, Expedia, In-memory data management, Nordstrom, Real-time analytics, Real-time Big Data, Retail & E-commerce, ROI, Uncategorized, Zappos

These are challenging times for online retailers. Customer loyalty is short lived and just one unpleasant experience can lead to losing the customer for life. Customers not only expect a great experience across all touchpoints but also demand the best price and an accurate understanding of their needs.

Get this mix right, though, and loyalty pays huge dividends: a higher customer lifetime value, lower acquisition costs through referrals, and a positive brand association that creates new business opportunities.

Online retailers, like Zappos, Nordstrom and Expedia, are leading examples of a customer-focused culture that has helped them attract & retain customers and further grow their business via word-of-mouth marketing. The best way to enrich customer relationships is by making customers feel special in every interaction they have with your company. This can be accomplished by following these three maxims for customer delight:

1. Personalize the shopping experience
This includes not only personalizing the products offered to customers but also specially tailored content and promotions. Research shows that personalization can deliver five to eight times the ROI on marketing spend and lift sales 10% or more.

2. Acknowledge and reward customer loyalty
Send customers thank you notes with special offers to let them know that you appreciate their loyalty. This will drive repeat purchases, reduce customer churn, grow word-of-mouth marketing and increase customer lifetime value.

3. Invest in the right technology solutions
In-Memory Data Management (IMDM) solutions are the perfect fit to add velocity to your customer experience while enabling personalization, loyalty based rewards, fraud management and predictive customer analytics, all in real-time. Choose technology that can seamlessly integrate with your existing environment and make the job of enriching your customer relationships a lot easier.

If you are interested in learning more, please email me at gagan@terracotta.org or leave a comment below.

Learn more about Terracotta BigMemory.

