MemSQL Live: Nikita Shamgunov on the Data Engineering Podcast

Feed: MemSQL Blog.
Author: Floyd Smith.

MemSQL CEO Nikita Shamgunov was recently featured on the Data Engineering Podcast with host Tobias Macey. Tobias asked Nikita about MemSQL’s origins, our exciting present, and where we’re going in the future. We share a few highlights here, but we also urge you to listen to the full episode.

The podcast covered a wide variety of topics. In this blog post, we’ll highlight Nikita’s comments on MemSQL’s origins; setting up and running the MemSQL database; typical use cases; and the future of MemSQL.

How MemSQL Started

Nikita grew up and was educated in Russia, where he earned his PhD in computer science and became involved in programming competitions, winning several. (Several MemSQL engineers have shared the same passion.) Shortly after graduation, Nikita was hired by Microsoft. He flew to Seattle to join the SQL Server team and spent five years there.

Nikita describes his early days as a “trial by fire,” followed by a “death march”: testing and debugging SQL Server 2005, five years in the making. The SQL Server effort was a “very, very good training ground.” It was also where the seeds of MemSQL were planted.

Internally, Microsoft worked on a scalable transactional database – but never shipped it. Nikita and his Microsoft colleague Adam Prout, who later co-founded MemSQL with Nikita and became its chief architect, found that “incredibly frustrating.”

After Microsoft, Nikita joined Facebook, where “it became very apparent to me what the power and the value of such a system is.” So Nikita co-founded MemSQL, and the team built the first widely usable distributed transactional system.

Early in MemSQL’s history, most of the workloads that fit the new system were in analytics on live data. MemSQL can handle “a high volume of inserts, updates, deletes hitting the (transactional) system, but at the same time actually drive insights, drive analytics.”

But, notes Nikita, “the amount of technology that goes into a product like this is absolutely enormous.” That meant it would take time to complete the full vision: a “fully scalable, elastic transactional system that can support transactions and analytics in one product.”

So MemSQL started out as an in-memory system. This “allowed us to onboard various high value workloads” and start building a business. Later, MemSQL added disk-based columnstore capability, making it a very flexible, memory-led database.

Note: The photograph below comes from a talk by Nikita at the SQL NYC group in 2017. The talk is available on YouTube.

[Photo: Nikita Shamgunov describes how a distributed and scalable converged database can optimize transactional and analytical workloads.]
Caption: MemSQL CEO Nikita Shamgunov speaks on database convergence at SQL NYC, 2017

Setting Up and Running MemSQL

Tobias asks Nikita about setting up MemSQL. Nikita replies: “Setting up and deploying is trivial. Once you commit to a certain size of the cluster, it’s basically a push-button experience, and then MemSQL will be running on that cluster.”

The main obstacle to implementing MemSQL is “understanding the distributed nature of the system.” However, “once you realize the value, then not only do we have passionate fans in the enterprise – they tend to get promoted too.”

One key innovation is MemSQL Pipelines. “If you have data coming from Kafka or you have data in S3 or HDFS, you can literally type one command, and this data is flowing into MemSQL in real-time with arbitrary scale.”

With the addition of the Pipelines to Stored Procedures feature, you can apply much more processing to data as it is ingested, still at high speed. With Pipelines, says Nikita, “We pulled in data ingest as a first-class citizen into the engine… We have customers that are moving data into MemSQL tables at multiple gigabytes a second.”
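To make the “one command” remark concrete, here is a minimal sketch of what creating a Kafka pipeline might look like. It assumes the `CREATE PIPELINE ... AS LOAD DATA KAFKA ... INTO TABLE ...` and `START PIPELINE` statements described in the MemSQL documentation. The pipeline name, broker address, topic, and table are hypothetical, and the helper function is illustrative only, not part of any MemSQL client library.

```python
def kafka_pipeline_statements(pipeline: str, broker_and_topic: str, table: str) -> list[str]:
    """Build the SQL statements that create and then start a Kafka pipeline.

    Identifiers are interpolated directly for illustration only; do not
    build SQL this way from untrusted input.
    """
    create = (
        f"CREATE PIPELINE {pipeline} AS "
        f"LOAD DATA KAFKA '{broker_and_topic}' "
        f"INTO TABLE {table}"
    )
    start = f"START PIPELINE {pipeline}"
    return [create, start]


# Hypothetical names: a "clicks" topic on kafka-host:9092 feeding a "clicks" table.
for statement in kafka_pipeline_statements(
    "clicks_pipeline", "kafka-host:9092/clicks", "clicks"
):
    print(statement + ";")
```

Because MemSQL speaks the MySQL wire protocol, statements like these can be sent from any MySQL-compatible client. Note that creating the pipeline and starting it are separate statements in this sketch.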

MemSQL is notable for combining scalability and a very high level of performance – which were, for a time, only available from NoSQL databases – with ANSI SQL support. “It gives the user the ability to have as much structure as the user needs and wants, but at the same time it doesn’t sacrifice performance.” The advantages include “compatibility with BI tools… you can easily craft visualizations, and it simplifies building apps as well.”

Typical Use Cases

Tobias asks Nikita about typical use cases for MemSQL. Nikita replies in detail: “At the very, very high level, we support general purpose database workloads that [require] scale. We give you unlimited storage and unlimited compute and in one product. In an elegant way, we can scale your applications.”

Nikita cites real-time dashboards, Web 2.0 and mobile applications, portfolio analysis, portfolio management, and real-time financial reporting as some specific applications that need “to deliver a great user experience and lower latencies,” thus requiring nearly limitless compute capability – that is, a fully scalable system.

For data warehousing, with MemSQL, “you don’t need to extract your data and put it into the data warehouse; you do it on the data that sits in the database. So the advantage is really simplicity.” There are also “use cases by industry.” MemSQL started out being “most successful in financial services and media,” with “health care coming more and more into the platform.”

The Future of MemSQL

Tobias presses Nikita on the need for MemSQL developers to choose row-store, in-memory tables vs. column-store, disk-based tables for different data. In the podcast, Nikita explains the trade-offs in detail, then goes further. “In the future, I believe databases will get to a place where there are no choices, you just create tables. You have a cloud service, you have a SQL API to it, and that SQL API has endless capacity. You pay for better latencies and better throughputs.”

Nikita continues, “On the way towards that vision, we’re investing in managed services and Kubernetes integration. We are constantly strengthening the engine itself, making it more elastic, making it more robust, having the ability to run on multiple data centers, and improving the developer experience.”

Also, “AI and ML workloads are rising exponentially, but they need to store data somewhere, and there is no clear choice for that just yet. That’s an exciting world, and we are absolutely going to be part of that.”

Conclusion

While these highlights are intriguing, there’s much more technical depth and industry-related breadth in the full podcast episode. We encourage you to listen to the episode and subscribe to the Data Engineering Podcast, which is available on iTunes, Downcast, Instacast, Podgrasp, and other platforms.

