Enhancing Education Applications with Real-Time Analytics

Feed: MemSQL Blog.
Author: Mike Boyarski.

Real-Time Insights in the Classroom

Improving the classroom experience is a complex task because of budget constraints and the pace of technology innovation. Today, most students have access to high-speed internet, iPads, and Chromebooks as primary learning tools in the classroom and at home. Ubiquitous compute and bandwidth have raised expectations that information will be always on and always up to date. These expectations are why Curriculum Associates, a company focused on making classrooms better places with innovative technology, set out to upgrade its infrastructure to deliver faster, on-demand analytics for its students and teachers.

The Architectural Journey for Real-Time Analytics

The existing architecture was built on a traditional batch-based extract, transform, and load (ETL) model. Every night, a series of batch jobs extracted data into the enterprise data warehouse, so reports were refreshed only once every 24 hours. To address new analytic requirements, the team would deploy specialized data marts to provide faster or more up-to-date reporting. The team wanted a new architecture that could deliver analytics at any time without impacting the operational application. That goal led to the design of a new real-time architecture using Kafka, AWS S3, and MemSQL.

Figure: Previous Analytics Architecture

Designing a “Human Time Channel”

Early on, the Curriculum Associates team knew they did not need trading-system-level latency. To avoid internal debates over terminology, the team agreed to call the new response-time requirement the Human Time Channel (HTC). The goal was for the HTC to return responses in roughly 600-700 milliseconds for the majority of queries. The system had to support roughly 5 million students and 600 million rows, and run a range of aggregations and row-level detail reports in about a second.
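The HTC target can be expressed as a simple check over measured query latencies. The sketch below is illustrative only. The 700 ms budget is taken from the upper end of the stated goal, "majority" is read as more than half of all queries, and the function name is hypothetical.

```python
from statistics import median

# Hypothetical budget: the upper end of the stated 600-700 ms HTC goal.
HTC_BUDGET_MS = 700.0


def meets_htc(latencies_ms: list[float], budget_ms: float = HTC_BUDGET_MS) -> bool:
    """Return True if more than half of the queries finished within the budget."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    within = sum(1 for t in latencies_ms if t <= budget_ms)
    return within * 2 > len(latencies_ms)


# Example: four of five sample queries finish within 700 ms.
samples = [420.0, 510.0, 680.0, 950.0, 610.0]
print(meets_htc(samples), median(samples))  # prints: True 610.0
```

Under this reading, a single slow outlier does not break the target. A stricter team might check a high percentile instead of the majority.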

Understanding Sharding

Sharding the data is an early step in moving to a real-time system. In a distributed database such as MemSQL, sharding spreads rows across nodes so that queries can be processed in parallel. The Curriculum Associates team designed the data layout to maximize parallel processing while avoiding query conflicts and data redundancy, so that every node carried its full share of the work.
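The following is a minimal sketch of hash-based sharding, assuming each row is assigned to a partition by hashing its shard key. MemSQL places rows according to a table's shard key. However, the SHA-256 hash, the partition count of 8, and the student-ID keys below are hypothetical stand-ins. They are not MemSQL's internals or Curriculum Associates' actual schema.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical; a real cluster's partition count depends on its setup


def partition_for(shard_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a shard key to a partition with a stable hash (SHA-256 as a stand-in)."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical shard keys: one per student.
student_ids = [f"student-{i}" for i in range(10_000)]
counts = Counter(partition_for(sid) for sid in student_ids)
print(sorted(counts.items()))
```

With a high-cardinality key such as a student ID, rows spread roughly evenly across partitions. All rows for one student also land on the same partition, so a per-student report touches only one partition.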

Using Multiple Storage Engines

To maximize query performance, the Curriculum Associates team combined row and columnar storage into a set of logical tables. Volatile data is kept in in-memory rowstore tables, while non-volatile data is stored in columnstore tables on disk and in memory. Queries combine the two with a SQL UNION ALL, so a single query covers all live and historical data.
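The row/column split can be illustrated with two physical tables behind one logical view. This sketch uses Python's built-in sqlite3 module purely to show the query shape. SQLite has no rowstore/columnstore distinction, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-in for the in-memory rowstore holding recent, volatile rows.
    CREATE TABLE scores_live (student_id INTEGER, lesson TEXT, score REAL);
    -- Stand-in for the columnstore holding older, non-volatile rows.
    CREATE TABLE scores_history (student_id INTEGER, lesson TEXT, score REAL);
    -- One logical table: every query reads both stores through UNION ALL.
    CREATE VIEW scores AS
        SELECT student_id, lesson, score FROM scores_live
        UNION ALL
        SELECT student_id, lesson, score FROM scores_history;
    """
)
conn.executemany(
    "INSERT INTO scores_history VALUES (?, ?, ?)",
    [(1, "fractions", 72.0), (2, "fractions", 88.0)],
)
conn.executemany("INSERT INTO scores_live VALUES (?, ?, ?)", [(1, "decimals", 91.0)])

# Aggregate across live and historical data in a single query.
rows = conn.execute(
    "SELECT student_id, COUNT(*), AVG(score) FROM scores "
    "GROUP BY student_id ORDER BY student_id"
).fetchall()
print(rows)  # prints: [(1, 2, 81.5), (2, 1, 88.0)]
```

UNION ALL is used rather than UNION because it skips duplicate elimination. That choice is only safe if each row lives in exactly one of the two tables, for example when aged rows are moved from the live table to the history table rather than copied.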

Adding Kafka Message Queues

The HTC uses Kafka instead of batch ETL to capture changes in application data in real time. The team uses MemSQL Pipelines to move application transactions into MemSQL. Each event is stored as JSON and automatically split into calculated fields that give it structure for analysis. As messages arrive, calculations can run on them immediately.
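Splitting a JSON event into calculated fields can be sketched as follows. In the described system this happens inside MemSQL as events arrive through Pipelines; the Python below only mimics the idea. The event schema, the `PASS_MARK` threshold, and the derived fields are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime

PASS_MARK = 70.0  # hypothetical threshold for the derived "passed" field


@dataclass(frozen=True)
class LessonEvent:
    """Structured fields derived from one raw JSON event (hypothetical schema)."""
    student_id: int
    lesson: str
    score: float
    passed: bool      # calculated field: not present in the raw event
    event_date: str   # calculated field: date part of the event timestamp


def to_fields(raw: str) -> LessonEvent:
    """Parse a raw JSON event and derive its calculated fields."""
    doc = json.loads(raw)
    ts = datetime.fromisoformat(doc["ts"])
    score = float(doc["score"])
    return LessonEvent(
        student_id=int(doc["student_id"]),
        lesson=doc["lesson"],
        score=score,
        passed=score >= PASS_MARK,
        event_date=ts.date().isoformat(),
    )


raw = '{"student_id": 17, "lesson": "fractions", "score": 82.5, "ts": "2018-05-01T14:03:00"}'
print(to_fields(raw))
```

Deriving the fields at ingest time means downstream aggregations can filter and group on typed columns instead of re-parsing JSON in every query.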

Integrating with an AWS S3 Data Lake

The design follows a Lambda-style architecture by ensuring that every event that reaches MemSQL is also stored in AWS S3. Confluent-based Kafka messages are converted into flat files and then stored in S3. This highly available storage system serves as a long-term analysis repository and provides the source data for periodic rebuilds of MemSQL. S3 also offers an easy way to manage data growth, which is currently projected at 200-250 percent year over year.
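The conversion of Kafka messages into flat files for S3 might look like the sketch below. The article does not say which tool performs this step or which file layout is used. The newline-delimited JSON format, the hour-based key layout, and the function name are therefore assumptions, and the actual upload to S3 is omitted.

```python
import json
from collections import defaultdict
from datetime import datetime


def to_flat_files(events: list[dict], prefix: str = "events") -> dict[str, str]:
    """Group events by hour and render each group as newline-delimited JSON.

    Returns a mapping of hypothetical S3 object keys to file bodies.
    """
    batches: dict[str, list[str]] = defaultdict(list)
    for event in events:
        ts = datetime.fromisoformat(event["ts"])
        # Hypothetical time-partitioned key layout: prefix/YYYY/MM/DD/HH/...
        key = f"{prefix}/{ts:%Y/%m/%d/%H}/part-0000.jsonl"
        batches[key].append(json.dumps(event, sort_keys=True))
    return {key: "\n".join(lines) + "\n" for key, lines in batches.items()}


events = [
    {"student_id": 17, "lesson": "fractions", "ts": "2018-05-01T14:03:00"},
    {"student_id": 18, "lesson": "decimals", "ts": "2018-05-01T14:59:00"},
    {"student_id": 17, "lesson": "decimals", "ts": "2018-05-01T15:10:00"},
]
files = to_flat_files(events)
for key in sorted(files):
    print(key)
```

Partitioning keys by time makes it cheap to select a specific range of events when replaying S3 data to rebuild MemSQL.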

Figure: Real-Time Analytics Architecture

Delivering real-time analytics on top of an existing batch-based application architecture is achievable when the requirements, data architecture, and technology tools are aligned. The result can be significant for end users and for technology companies that compete on data performance. Curriculum Associates' solution now delivers faster analytics on the latest data without rewriting or changing its core application, a winning formula for both customers and the engineering team.
