Feed: SingleStore Blog.
What are real-time analytics? The industry is still coming to terms with a standard name for this use case. It’s sometimes called operational data warehousing, or operational analytics — and more recently, folks have started to use the term analytical applications.
In my view, these are all different names for very similar use cases. They all describe applications that need to run low latency analytical queries over fresh data. The most common pattern is an application designed to enable reporting, alerting or taking an action based on analysis of a live stream of data (click streams, application telemetry, data from IoT devices, equity or market data, logistical data, etc.). The workload combines reasonably complex analytic query shapes with the latency, concurrency and data freshness requirements more common in OLTP or operational applications.
At a high level, real-time analytics take an analytical workload (duh?), meaning reasonably complex queries over reasonably large datasets (100s of GBs to 10s of TBs), and apply some or all of these additional operational requirements to it:
- Low latency streaming data ingestion. Data is loaded and queryable within seconds of generation.
- Higher query concurrency requirements than a typical data warehousing workload (1000s of analytical queries a second). The apps are often user facing, and can experience spikes in usage.
- Low query latency requirements or strict latency SLAs (100s of ms or less is common) to be interactive.
The query workloads are also typically more precise than classical data warehousing. They often analyze or aggregate data for one (or a small number of) users or objects, instead of showing aggregate information over a large portion of the data set.
To make this discussion more concrete, here are some examples of real-time analytics among SingleStore’s customers:
- Energy & Utilities: Analysis of sensor data from oil wells to detect maintenance issues early, guide the drilling process and do profitability analysis. A similar use case applies to smart power meter telemetry for an electrical company.
- IoT & Telematics: Analysis of cell tower telemetry for a large cell phone carrier to detect phone call quality issues as early as possible.
- Gaming & Media: Behavioral analysis on the click traffic from web games or streaming video services to optimize end-user experience (like providing more personalized recommendations) and monitor quality of service.
- Marketing & Adtech: Market segmentation and ad targeting based on application telemetry, geospatial data and clickstream data from various sources.
- Retail & eCommerce: Low latency dashboards or `fastboards` to provide a live, 360-degree view of key company metrics.
- Fintech: Low latency stock portfolio analytics based on fresh market data for a large financial institution’s high net worth customers.
- FinServ: Credit card fraud detection over a stream of purchase data and other telemetry.
- Cybersecurity: Security threat detection and analysis over device telemetry data.
These applications all involve analytical query shapes and yet, they’re use cases that traditional data warehouses do not handle very well. These workloads have technical requirements that set them apart from other analytics use cases:
- Low latency streaming data ingestion. Data should be ingested continuously as it is generated, and be immediately indexed and queryable. Batch data loading is not good enough.
- Flexible indexing to enable low latency data access in a variety of scenarios (selective queries, full-text search queries, geospatial queries, etc.).
- Good support for complex queries via ANSI SQL that matches top-tier data warehouses at data sizes from 100s of GBs to 10s of TBs.
- Separation of storage and compute for improved elasticity and lower costs. Applications don’t need to give up elasticity to get low latency ingest and query capabilities.
- Strong high availability support to keep applications online in the face of hardware failures, as well as during management operations such as database upgrades and schema changes.

Now, let’s look at important capabilities in SingleStoreDB for real-time analytics in more detail.
SingleStoreDB also supports low latency, high-throughput SQL INSERT, UPDATE, DELETE and LOAD DATA statements, so applications can import data in a manner that suits their specific needs. This feature may not seem exceptional for a single-box SQL database (like MySQL or Postgres), but data warehouses don’t have good support for these types of small writes, specifically updates, upserts and deletes. All data warehouses today are columnstores, and supporting point write queries against a columnstore is challenging because columnstores keep many rows tightly compressed together, making it slow and inefficient to modify only a few rows at a time. This is why SingleStoreDB employs a log-structured merge (LSM) tree design with support for secondary indexes to enable efficient point write queries, including upserts (ON DUPLICATE KEY UPDATE), over data in columnstore layout. There are more details on SingleStoreDB’s storage layout in the next section.
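To make the point-write problem concrete, here is a toy Python sketch of the general LSM idea: new writes land in a small mutable buffer, flushed data lives in immutable segments, and an upsert marks the old copy as deleted instead of rewriting a compressed segment. The class name `ToyColumnstore`, its fields and the flush threshold are all invented for illustration; this is not SingleStoreDB's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumnstore:
    """Toy LSM-style table (hypothetical): mutable buffer + immutable segments."""
    flush_threshold: int = 2                      # assumed tiny value for the demo
    buffer: dict = field(default_factory=dict)    # key -> row, newest writes
    segments: list = field(default_factory=list)  # each segment: sorted list of (key, row)
    deleted: list = field(default_factory=list)   # per-segment set of deleted row offsets
    index: dict = field(default_factory=dict)     # key -> (segment_no, offset) of flushed rows

    def upsert(self, key, row):
        # If the key already lives in an immutable segment, mark that copy as
        # deleted rather than modifying the segment in place.
        loc = self.index.pop(key, None)
        if loc is not None:
            seg_no, offset = loc
            self.deleted[seg_no].add(offset)
        self.buffer[key] = row
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffer out as a new sorted, immutable segment and index it.
        seg_no = len(self.segments)
        rows = sorted(self.buffer.items())
        self.segments.append(rows)
        self.deleted.append(set())
        for offset, (key, _) in enumerate(rows):
            self.index[key] = (seg_no, offset)
        self.buffer.clear()

    def get(self, key):
        # Newest data first: check the buffer, then the key index.
        if key in self.buffer:
            return self.buffer[key]
        loc = self.index.get(key)
        if loc is None:
            return None
        seg_no, offset = loc
        return self.segments[seg_no][offset][1]

    def live_row_count(self):
        flushed = sum(len(s) - len(d) for s, d in zip(self.segments, self.deleted))
        return flushed + len(self.buffer)


table = ToyColumnstore(flush_threshold=2)
table.upsert("a", 1)
table.upsert("b", 2)   # buffer reaches the threshold and is flushed to segment 0
table.upsert("a", 10)  # old copy of "a" is only marked deleted
```

The key index is what keeps the upsert cheap in this sketch: finding the existing row is a dictionary lookup, and the flushed segment itself is never touched.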
| Index type | Query shapes it accelerates |
| --- | --- |
| Primary or unique keys | Equality queries (col = value), upserts (ON DUPLICATE KEY UPDATE) |
| Secondary keys (inverted indexes) | Equality queries (col = value) |
| In-memory secondary keys (lock-free skiplists) | Range queries (col > value) over in-memory data only |
| Sort keys | Range queries (col > value), order bys, group bys and joins (merge joins) |
| Full-text keys | Full-text filters: MATCH (column) AGAINST ("term") |
| Geospatial keys | Point in polygon, nearest neighbor and other geospatial queries |
| Shard keys | Equality queries and push down joins and group bys (joins and groups on the shard key avoid the data movement of a reshuffle or broadcast) |
| Zone maps (min/max indexing) | Range queries (col > value). Created on every column by default |
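As an illustration of how min/max indexing helps range queries, the following Python sketch keeps a (min, max) pair per segment and skips any segment whose maximum cannot satisfy `col > value`. The function names and the sample data are made up for this example; real zone maps live alongside compressed column segments rather than Python lists.

```python
def build_zone_map(segments):
    """Record the (min, max) of the values stored in each segment."""
    return [(min(seg), max(seg)) for seg in segments]


def scan_greater_than(segments, zone_map, threshold):
    """Return values > threshold and the number of segments actually read."""
    matches, segments_read = [], 0
    for seg, (_lo, hi) in zip(segments, zone_map):
        if hi <= threshold:
            continue  # no value in this segment can satisfy col > threshold
        segments_read += 1
        matches.extend(v for v in seg if v > threshold)
    return matches, segments_read


# Hypothetical column data split into three segments.
segments = [[1, 5, 9], [10, 14, 18], [20, 25, 30]]
zone_map = build_zone_map(segments)
matches, segments_read = scan_greater_than(segments, zone_map, 18)
```

With this data, the filter `col > 18` reads only the last segment; the first two are eliminated from their metadata alone.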
I won’t go into each of these indexing features in detail, but I do want to call out a few important capabilities that other analytical systems often don’t support:
- Shard keys: Joins and group-bys on the shard key columns can be executed without data movement (no need for a reshuffle or a broadcast). This is very important for high concurrency workloads. Shard keys also enable read queries that filter on the shard key to run on a subset of the nodes in the cluster (only those that own the shards the query is filtering for).
- Unique keys for easy deduplication: Most analytical databases don’t support unique keys that can do efficient row-level locking over terabytes of data. Unique keys enable high concurrency deduplication on data loading.
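The shard key behavior described above can be sketched in a few lines of Python. Rows are placed by hashing the shard key, so a filter on that key touches one partition and a group-by on it needs no cross-partition merge. The partition count, the CRC32 hash and the `(user_id, amount)` row shape are assumptions for the demo, not SingleStoreDB's actual placement scheme.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_for(shard_key: str) -> int:
    """Map a shard key value to a partition with a stable hash."""
    return zlib.crc32(shard_key.encode("utf-8")) % NUM_PARTITIONS


def load(rows):
    """Place each (user_id, amount) row on the partition that owns its user_id."""
    partitions = defaultdict(list)
    for user_id, amount in rows:
        partitions[partition_for(user_id)].append((user_id, amount))
    return partitions


def total_for_user(partitions, user_id):
    """Filter on the shard key: only the owning partition is consulted."""
    part = partitions.get(partition_for(user_id), [])
    return sum(amount for uid, amount in part if uid == user_id)


def totals_by_user(partitions):
    """Group by the shard key: each partition aggregates on its own.

    A given user_id lives on exactly one partition, so the per-partition
    results are disjoint and can simply be concatenated (no reshuffle).
    """
    result = {}
    for part in partitions.values():
        local = defaultdict(int)
        for uid, amount in part:
            local[uid] += amount
        result.update(local)
    return result


rows = [("u1", 5), ("u2", 7), ("u1", 3), ("u3", 1)]
partitions = load(rows)
```

If the table were sharded on some other column, `totals_by_user` would have to combine partial sums for the same user from several partitions, which is the data movement the shard key avoids.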
Use of a blob store for separated storage is often not a strict requirement for real-time analytics. The data sizes involved are typically small enough to use local disks in a classic, shared-nothing design that ties compute and storage together. That said, the core requirements of the use case for low latency streaming ingest can still be met by making proper use of the cloud storage hierarchy. Hot data should be kept in memory, cooler data on local disks and cold data in blob storage, as SingleStoreDB does. There is no reason to give up the benefits of separated storage when doing real-time analytics.
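One simple way to picture the storage hierarchy is as a placement rule keyed on how recently data was written. The sketch below is only that: the age-based policy, the two thresholds and the tier names are assumptions chosen for illustration, and a real system would also weigh access patterns and cache capacity.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; tune to the workload.
HOT_WINDOW = timedelta(minutes=5)
WARM_WINDOW = timedelta(days=7)


def tier_for(last_write: datetime, now: datetime) -> str:
    """Pick a storage tier from the age of the data (assumed policy)."""
    age = now - last_write
    if age <= HOT_WINDOW:
        return "memory"      # freshly ingested, queried most often
    if age <= WARM_WINDOW:
        return "local_disk"  # cooler data cached close to compute
    return "blob_store"      # cold data in cheap, durable storage


now = datetime(2023, 1, 8, 12, 0, 0)
fresh = tier_for(now - timedelta(seconds=30), now)
recent = tier_for(now - timedelta(days=2), now)
old = tier_for(now - timedelta(days=90), now)
```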
5. Stronger availability requirements than typical data warehouses. Real-time analytics use cases often have availability requirements closer to those of operational applications than of a data warehouse. These use cases are not typically a “system of record,” but they are part of the serving layer for an application and are often customer facing so downtime can’t be tolerated. This includes no downtime for operational and maintenance work, and is why SingleStoreDB has features including online upgrades, online cluster expansion/shrink and online schema changes. SingleStoreDB has a robust set of availability features for cross-AZ and cross-region high availability, as well as support for full and incremental backups and point-in-time recovery (PITR) to ensure customers can meet their availability and durability goals.
Hopefully this quick tour through SingleStore features gave you a good intuition as to why so many companies have adopted SingleStoreDB for their analytical applications or real-time analytics use cases. We believe we are likely among the market leaders (as far as revenue share) for this small but growing market segment. Even so, we continue to add new features and improvements to make doing real-time analytics on SingleStoreDB easier and more efficient.