Flexible Parallelism in SingleStoreDB Self-Managed

Veteran SingleStoreDB users take our elasticity for granted. SingleStoreDB is a true distributed database. It’s easy to scale up, so you can handle increasing demands to process queries concurrently — for both transactions and analytics. 

Many application developers are still using single-box database systems and sharding data manually at the application level. That’s not necessary with SingleStoreDB, which can save you many person-years of development time — letting you focus on the app, and not the data platform.

Although we can stretch and grow with your workload, we’re pushing to make our elastic scale easier and more powerful. With the latest release of SingleStoreDB, we’ve taken a big step toward our ultimate goal of transparent elasticity with our flexible parallelism (FP) feature. FP is the premier feature in this release, and is part of our long-term strategy to make SingleStoreDB the ultimate elastic relational database for modern, data-intensive applications.

Flexible parallelism

SingleStoreDB uses a partitioned storage model. Every database has a fixed number of partitions, defined when you create the database. Historically, SingleStoreDB has done parallel query processing with one thread per partition. This works great when you have an appropriate partitioning level compared to the number of cores, such as one core for every one or two partitions. 
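
For reference, here is a minimal sketch of how the partition count is fixed at database creation time; the database name and the count of 8 are illustrative values, not taken from the examples later in this post:

-- Minimal sketch: the partition count is fixed when the database is created.
create database example_db partitions 8;

-- The partition count can be checked afterward in information_schema:
select DATABASE_NAME, NUM_PARTITIONS
from information_schema.distributed_databases
where DATABASE_NAME = 'example_db';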

With prior releases, when you scaled up, you couldn’t use the extra cores to run one query faster. You could only benefit from them by running more queries concurrently. FP changes that.

Flexible parallelism works by internally adding a sub-partition id to each columnstore table row. Then, query execution decides a set of sub-partitions for each thread to scan on the fly. The use of sub-partitions allows shard key matching queries to work. To make this efficient, the sub-partition id is added internally to the beginning of the sort key.

Figure 1. Allocation of three threads to scan two partitions in parallel.

Figure 1 illustrates an example division of two partitions into three subsets of roughly equal size, to be scanned concurrently by three threads, so that all three threads finish about the same time and don’t idle. This allows the fastest possible processing of all the data using three threads.

Performance example

Here’s a simple example of FP in action. My hardware has eight cores, and I’m using one leaf node. Hyperthreading is enabled, so I have 16 logical cores (vcpus). I created a database db1 with eight partitions and FP disabled, so queries will use eight threads. I created a table t(a int) with 50 million rows, loaded with serial numbers 1…50,000,000. Table t is sharded on column a, so data is evenly distributed across partitions. I ran this simple query, which does a cast (:>) and string LIKE operation, 30 times in a loop to burn a little CPU and make the measurement easier to see:

select a into x from t where a:>text like '1000000';

This loop takes 33.23 sec. Now, I copy the data to a database fp with two partitions that has FP enabled. The partitioning for db1 and fp is as follows: 

select DATABASE_NAME, NUM_PARTITIONS, NUM_SUB_PARTITIONS
             from information_schema.distributed_databases;

+---------------+----------------+--------------------+
| DATABASE_NAME | NUM_PARTITIONS | NUM_SUB_PARTITIONS |
+---------------+----------------+--------------------+
| db1           |              8 |                  0 |
| fp            |              2 |                128 |
+---------------+----------------+--------------------+

NUM_SUB_PARTITIONS is zero for db1, which indicates that FP is turned off for its tables. For database fp, it’s non-zero, so FP is enabled.

Now, if I copy table db1.t to fp.t, switch to database fp and run the same loop again, the result is returned in about the same amount of time: 34.20 sec.

This is a snapshot of the CPU meter during the above run on database fp:

It shows that all the cores are almost fully working on this query.

The loop executions run almost equally fast with all cores engaged, even though database fp has only two partitions. See Appendix 1 for a full script for loading the data and running the loops. 

To show the contrast with the non-FP approach to query processing, I disable FP by running this:

set query_parallelism_per_leaf_core = 0;

(The next section will discuss FP configuration in more detail.)

Now, I run the loop again, and it takes 1 min, 31.59 sec. It’s not exactly four times slower, but it’s in that vicinity. Moreover, the CPU meter looks like this while it’s running:

The key thing to observe is that only about a quarter of available CPU time is being used, since each query only uses two cores. The queries may be dispatched on different cores, one after the other, and the graph has some time-delay averaging. That’s why it doesn’t just drive two cores to 100% in the picture.

FP configuration

Flexible parallelism is on by default for customers using SingleStoreDB Cloud. It has sensible defaults, so most users won’t have to configure it. For SingleStoreDB Self-Managed, it’s off by default (but you can choose to turn it on). To use FP, you must enable it before you create the database you want to query.

These variables are available to configure flexible parallelism if the defaults aren’t what you want:

  • sub_to_physical_partition_ratio: This is a global variable. It specifies the number of sub-partitions created per physical partition in newly created databases. For SingleStoreDB Cloud, this defaults to 16. That’s often a good choice if you’re self-hosting, too.

  • query_parallelism_per_leaf_core: This is a session variable. It specifies the query parallelism to use in the session. It’s a ratio between 0.0 and 1.0. A value of 0.0 means flexible parallelism is disabled. For SingleStoreDB Cloud, it defaults to 1.0. This means all the cores will normally be used by one query for a parallel operation.

  • expected_leaf_core_count: This is a global variable. It should be set to the number of CPU cores on the leaves. It defaults to the number of leaf cores on SingleStoreDB Cloud.
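
For self-managed clusters, a minimal configuration sketch looks like the following. It mirrors the settings used in Appendix 1, assuming a 16-vcpu leaf as in the earlier example; adjust the values to your own hardware:

-- Minimal sketch: run these before creating the database you want FP to apply to.
set global sub_to_physical_partition_ratio = 16;   -- sub-partitions per physical partition
set global query_parallelism_per_leaf_core = 1.0;  -- let one query use all the cores
set global expected_leaf_core_count = 16;          -- CPU cores on each leaf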

If you want to use FP for SingleStoreDB Self-Managed, use the default values discussed. That’ll normally be sufficient. Only consider changing these values if you want to achieve a specific goal, like reducing parallelism to allow more predictability under a heavy concurrent load.

You can check if flexible parallelism is enabled for a database by querying information_schema as follows:

select DATABASE_NAME, NUM_PARTITIONS, NUM_SUB_PARTITIONS
             from information_schema.distributed_databases;

An example of this was given earlier. Again, a zero value for NUM_SUB_PARTITIONS means FP is disabled for tables from the corresponding database.

Understanding, troubleshooting and tuning FP

With the introduction of FP, when troubleshooting a query performance issue, you may now want to check if FP took effect for the query. You can do that by checking the parallelism_level used in the operators in the query plan. There are three possible levels:

  1. partition 

  2. sub_partition 

  3. segment 

The partition level uses a thread per partition, sub_partition applies a thread per sub-partition and segment applies a thread per segment (typically a million-row chunk of a columnstore table). EXPLAIN and PROFILE plans now describe the parallelism level of relevant query plan operators; to see it, look for the parallelism_level symbol in the plans. If you see parallelism_level:sub_partition or parallelism_level:segment in the plan, then FP is being used for the query. Here’s an example:

singlestore> explain select count(*) from t join t2 on t.a=t2.b;


Project [CAST(COALESCE($0,0) AS SIGNED) AS `count(*)`] est_rows:1                            
Aggregate [SUM(remote_0.`count(*)`) AS $0]                                                   
Gather partitions:all est_rows:1 alias:remote_0 parallelism_level:sub_partition              
Project [`count(*)`] est_rows:1 est_select_cost:2                                            
Aggregate [COUNT(*) AS `count(*)`]                                                           
HashJoin                                                                                     
|---HashTableProbe [t.a = r0.b]                                                              
|   HashTableBuild alias:r0                                                                  
|   Repartition [t2.b] AS r0 shard_key:[b] parallelism_level:partition est_rows:1            
|   TableScan db.t2 table_type:sharded_rowstore est_table_rows:1 est_filtered:1              
ColumnStoreFilter [<after per-thread scan begin> AND <before per-thread scan end>]
ColumnStoreScan db.t, KEY ... table_type:sharded_columnstore est_table_rows:1 est_filtered:1 

This means that the Gather operation and everything following it uses sub_partition level parallelism, unless otherwise specified.  The Repartition uses partition level parallelism because table t2 is a rowstore, and FP doesn’t apply to rowstores. Rowstores are always processed using partition-level parallelism — however, after rowstore operations are processed, the resulting rows can be used by other operations that use FP.

The expressions “after per-thread scan begin” and “before per-thread scan end” (showing up as part of ColumnStoreFilter) represent the filters added internally by FP query execution. They filter the rows being processed by each thread, so that each thread only handles rows that belong to the set of sub-partitions assigned to it.

Another important aspect of troubleshooting and tuning FP is to check the settings of the three variables mentioned, and also to make sure you know whether FP is enabled for your database. This can be found by querying information_schema.distributed_databases. If NUM_SUB_PARTITIONS is 0, then FP is disabled for that database.
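
Both checks can be run from any SQL session, for example:

-- Check the FP-related variable settings:
select @@sub_to_physical_partition_ratio,
       @@query_parallelism_per_leaf_core,
       @@expected_leaf_core_count;

-- Check whether FP is enabled per database (NUM_SUB_PARTITIONS = 0 means FP is off):
select DATABASE_NAME, NUM_PARTITIONS, NUM_SUB_PARTITIONS
from information_schema.distributed_databases;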

Cases that fall back to fixed parallelism

Some operations fall back to fixed parallelism, which is one thread per partition. These include:

  • Reading from rowstore or temporary tables. Processing only falls back for parts of the query — the scans on those tables, and any shard key matching operations on top of them.

  • Write queries. For INSERT .. SELECT, we only fall back for the INSERT part.

  • Queries inside a multi-statement transaction

  • Single-partition queries and queries with a shard-key-matching IN-list (see the sketch after this list)

  • Queries that use the PARTITION_ID built-in function.
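
As a simple illustration of the single-partition and shard-key-matching IN-list cases, using the table t from the earlier example (sharded on column a):

-- Single-partition query: the shard key is matched to one value, so only one
-- partition is touched and the query runs with fixed parallelism.
select * from t where a = 42;

-- Shard-key-matching IN-list: each value maps to a specific partition,
-- so this also falls back to fixed parallelism.
select * from t where a in (1, 2, 3);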

Optimizations disabled by flexible parallelism

Some optimizations related to ordering or sorting are disabled by FP. Some cases of ordered scans on the columnstore sort key are incompatible with the internal table schema change made for flexible parallelism, including:

  • ORDER BY “sort key”

  • Merge joins between sharded and reference columnstore tables

The aggregator GatherMerge operation is currently unsupported for queries using flexible parallelism. It falls back to Gather, followed by a sort on the aggregator.

Segment elimination — the skipping of whole segments based on applying filters to min/max column metadata for the segment — still works with sub-partitioning, but it may be less effective for some applications.

Other limitations and future work

A limitation of FP is that partition split (done with the SPLIT PARTITIONS option of the BACKUP command) is blocked if it would cause the number of partitions to become larger than the number of sub-partitions. This is a rare occurrence — it’s unlikely to affect many users, since the default sub-partition-to-partition ratio is 16 on SingleStoreDB Cloud. You’d have to split more than four times to experience this.

You’ll want to make sure that the number of sub-partitions is large enough that you can split one or more times, and still have plenty of sub-partitions in each partition. If you expect several splits, use more sub-partitions by setting sub_to_physical_partition_ratio to a larger value, like 64. In the future, we can overcome this limitation by increasing the sub-partition count during splits.

There is currently no command to enable or disable flexible parallelism on existing databases. Making a database capable of supporting FP can only be done at the time you create the database. If you want to make FP available for an existing database, you’ll need to create a new database with FP enabled and copy the tables from the original database to the new one. One way to do this is with CREATE TABLE LIKE and INSERT…SELECT.
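
Here is a minimal sketch of that copy, using illustrative database names (olddb and newdb_fp) and assuming the FP-related variables are already set as described earlier:

-- Create the new database after FP is enabled, so it gets sub-partitions.
create database newdb_fp;
use newdb_fp;

-- Copy the table definition and the data from the original database.
create table t like olddb.t;
insert into t select * from olddb.t;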

Should I use FP?

If you are using SingleStoreDB Cloud, it’s likely you’ll want to scale up quickly, given the tendency of new cloud applications to start small and expand. Since flexible parallelism is automatically enabled for all new cloud deployments, you can take advantage of improved scalability without any changes to your application. 

If you’re Self-Managed, are building a new application and think you will expand your hardware by adding nodes or increasing node sizes over the next few years, you should use FP. If you’re Self-Managed and have an existing application that is performing well, but you do think you will expand nodes and hardware, you may want to migrate to FP — but note that this requires reloading your database for now, since there’s currently no way to add sub-partitioning in place. You’ll also need to test your application to see whether the loss of some of the optimizations discussed earlier will impact it seriously if you use FP. The impact will likely be less than the benefit, but you’ll need to verify that.

If you’re certain you know the size of your workload and data for the next several years and don’t think it will expand much, then you don’t really need FP. Your up-front number of partitions will likely be the right number. And, you won’t be subject to the limitations we’ve discussed. 

The future of elasticity in SingleStoreDB

Flexible parallelism is a major step forward in elastic scalability for SingleStoreDB. You don’t have to think so hard about how many partitions your database should have anymore, because even with few partitions per leaf, you’ll still be able to take advantage of all your cores to run individual queries fast.

We see many areas where we can enhance elastic scalability even further, including: 

  • New ways to share a single database among multiple compute clusters 

  • Creating dynamic, online repartitioning 

  • Automatically scaling computing resources up and down in the cloud to provide a great experience during peak load and save money during slack time

  • Removing the limitations of FP, and more 

SingleStoreDB is an incredibly elastic RDBMS. And it’s only getting more limber! 

Try FP in 7.8. And stay tuned for more dynamic, flexible data power.

Appendix 1: Load Script For FP Demo

-- create a standard db without sub-partitioning

set global sub_to_physical_partition_ratio = 0;

create database db1;

use db1;

-- insert serial numbers 1..50,000,000 in a table

delimiter //

create or replace temporary procedure tp(SIZE int) as
declare
  c int;
begin
  drop table if exists t;
  create table t(a int, shard(a));
  insert t values (1);
  select count(*) into c from t;
  while (c < SIZE) loop
    insert into t select a + (select max(a) from t) from t;
    select count(*) into c from t;
  end loop;
  delete from t where a > SIZE;
  echo select format(count(*),0) from t;
end 
//
delimiter ;
call tp(50*1000*1000);

--  run this
delimiter //
do
declare x int;
begin
for i in 1..30 loop
  select a into x from t where a:>text like '1000000';
end loop;
echo select 'done';
end
//
delimiter ;

-- the above loop completes in 33.23 sec

-- turn on FP for new DBs:
set global sub_to_physical_partition_ratio = 64;
set global query_parallelism_per_leaf_core = 1.0;
set global expected_leaf_core_count = 16;

-- leave session and start a new one
exit;
singlestore -p

-- show my FP variable settings
select @@query_parallelism_per_leaf_core, @@expected_leaf_core_count, @@sub_to_physical_partition_ratio \G

/*
@@query_parallelism_per_leaf_core: 1.000000
       @@expected_leaf_core_count: 16
@@sub_to_physical_partition_ratio: 64
*/

create database fp partitions 2;
use fp;

create table t like db1.t;

insert t select a from db1.t;

--  run this again:
delimiter //
do
declare x int;
begin
for i in 1..30 loop
  select a into x from t where a:>text like '1000000';
end loop;
echo select 'done';
end
//
delimiter ;

-- the result was returned in 34.20 sec 

-- the time is about the same, indicating all the cores are 
-- being used, even though there are only 2 partitions

-- now turn off FP:
set query_parallelism_per_leaf_core = 0;

--  run this again:
delimiter //
do
declare x int;
begin
for i in 1..30 loop
  select a into x from t where a:>text like '1000000';
end loop;
echo select 'done';
end
//
delimiter ;

-- the result was returned in 1 min 31.59 sec 

Security on SingleStoreDB Cloud

From development to delivery, SingleStoreDB Cloud ensures that security is considered, designed, reviewed and implemented so that the data of our customers — and their customers — is safeguarded as if it were our own. We’ve built security into all of our products: those hosted by customers on their own infrastructure, and those we host on our customers’ behalf.

SingleStoreDB automatically manages encryption, authentication, access and monitoring so you can focus your efforts on your data and the value it adds.  We maintain a holistic approach to information protection, combining a set of controls that help businesses meet their compliance objectives — while always ensuring our customers’ data is secure.

Figure 1: Security aspects of SingleStoreDB Cloud

As shown in Figure 1, SingleStoreDB Cloud ensures an end-to-end security posture for all of your critical enterprise workloads. This requires securing customer data with defense in depth that addresses three security concerns:

  1. How is my data stored in SingleStoreDB Cloud protected?

  2. How can someone connect to SingleStoreDB Cloud?

  3. How do I prevent the wrong person from accessing data?

Data Encryption

As a SaaS offering, it’s critical that customers trust that housing their data in SingleStoreDB Cloud is safe, and have peace of mind that their data will not be compromised. To ensure customer data is always secure, SingleStoreDB Cloud offers multi-layer security for data storage and retrieval, encrypting customer data both in motion and at rest. Here are the key controls and capabilities that SingleStoreDB Cloud provides to ensure customers’ data is always safeguarded:

  • Data security-in-motion. SingleStoreDB Cloud guarantees that all connections to the database are enabled with TLS 1.2, which ensures data-in-motion is always encrypted. In addition, each client connection is checked to ensure that valid certificates are being used, and that the connection to SingleStoreDB Cloud is secure.
  • Data security-at-rest. In accordance with industry best practices, all data stored in SingleStoreDB Cloud is encrypted at rest using an AES 256-bit encryption key. Encryption scope includes both EBS and S3 for AWS, Premium SSD and Blob store for Azure, and Persistent Disks and Google Cloud Storage for GCP.
  • Customer-managed encryption keys (stored in the cloud KMS). An additional layer of security for customer data can be added using keys stored in a customer’s cloud key management service (KMS). Additionally, separate keys can be used for data backups and their associated bucket(s). This feature is only supported for the dedicated edition of SingleStoreDB Cloud.

Figure 2: Customer Managed Encryption Key (CMEK)

Read the whitepaper: SingleStoreDB Cloud Security

Network Security

Since SingleStoreDB Cloud is a cloud-based (remote) service, properly granting access to a SingleStoreDB Cloud workspace is the first line of defense. Network security in SingleStoreDB Cloud ensures that only properly configured resources can gain access to a database and its data. The controls available to a customer include:

  • Accessing data using private networking (inbound and outbound). Customers that want secure connectivity from inside their VPC — but who do not want to connect to SingleStoreDB Cloud over the public internet — can use AWS PrivateLink, Azure Private Link and Google Private Service Connect to connect to AWS, Azure and GCP deployments, respectively. This private connectivity applies to both inbound and outbound traffic, and is available to customers regardless of the SingleStoreDB Cloud edition they use (Standard, Premium or Dedicated). Private networking can also be used to create outbound connectivity to object storage (blob storage) for backups and/or customer-initiated copies of data.
  • IP allowlisting. Traffic to SingleStoreDB Cloud travels over the internet when IP allowlisting is used. Customers can restrict access to a SingleStoreDB Cloud workspace group to a specific set of IP addresses based on a corporate network policy. Having a known client location and IP address helps prevent unauthorized access to your SingleStoreDB Cloud deployment. If an application is hosted outside of your VPC, it’s critical to define the restrictive set of IP addresses required to successfully run the application. However, as a best practice, you should always choose private connectivity over IP allowlisting to connect to SingleStoreDB.

Access Control

SingleStoreDB Cloud provides a variety of access control tools and capabilities that govern user login, permissions and data access, including:

  • Native password authentication. Customers can use native password authentication both for SingleStoreDB Cloud and database connectivity. Password complexity is flexible and can be based on your corporate policy.
  • Single Sign-On (SSO). As the number of users accessing the database increases, customers can also connect to SingleStoreDB Cloud (both the data plane and the portal) using identities defined in an Identity Provider (IdP) of their choice (Azure AD, Okta, Ping). This allows database user identities to be managed from a central location, which simplifies access management at scale. Customers may also enable multi-factor authentication (MFA) to provide an additional layer of security.
  • JWT/JWKS authentication. To eliminate the use of passwords, customers can authenticate clients via JSON Web Tokens (JWTs). JWTs are useful for both authorization (the most common scenario for using JWTs) and information exchange (where information can be securely transmitted between parties). JWTs can be created both by the SingleStoreDB Cloud portal and by customer-run identity providers, and used to authenticate users to database clusters. Additionally, JSON Web Key Sets (JWKS), which contain public keys, can be used to validate the signature of a signed JWT. Also, if you use SingleStoreDB native drivers to connect to the database, you can use JWT authentication directly in the drivers.

  • Role-Based Access Control (RBAC). For more granular security, SingleStoreDB Cloud offers Role-Based Access Control (RBAC): a reduced-privilege, role-separated environment where users and groups can be granted access to databases, tables, views and other objects based on role (a minimal SQL sketch follows this list).
  • Row-level security. Similar to RBAC, row-level security in SingleStoreDB Cloud can be used to dictate which roles have access to specific rows in a table.
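
The sketch below shows the general shape of granting role-scoped, read-only access using standard SQL statements. The names are illustrative, and the full RBAC and row-level security syntax is covered in the SingleStoreDB documentation:

-- Minimal sketch with illustrative names; see the SingleStoreDB docs for the
-- complete RBAC and row-level security commands.
CREATE USER 'analyst'@'%' IDENTIFIED BY '<redacted>';
GRANT SELECT ON reporting_db.* TO 'analyst'@'%';   -- read-only access to one database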

Conclusion

The inherent security capabilities in SingleStoreDB Cloud allow you to build your next cloud-based, mission-critical application knowing that it will run on a secure infrastructure. Data security is a fundamental value with SingleStoreDB, and we continually follow industry guidelines and have adopted best practices to ensure that your data remains secure throughout its lifecycle.

In addition to the best practices, policies, process and procedures we’ve implemented, we continually invest in new measures and certifications to ensure that we — and our products — remain at the forefront of data security.

SingleStoreDB Cloud operates within a shared-responsibility model. Our company, our customers and our service providers share the responsibility for identifying and preventing compromises in our respective infrastructures and/or data. If you have questions regarding data security in SingleStoreDB Cloud, please contact us at security@singlestore.com.

Try SingleStoreDB free

SingleStoreDB on Google Cloud: The Backbone of Your Data Infrastructure

Founded in 2012, SingleStoreDB is a real-time distributed SQL database that offers ultra-fast, low-latency access to large datasets — simplifying the development of modern enterprise applications. And by unifying transactional (OLTP) and analytical (OLAP) workloads, SingleStoreDB introduces new efficiencies into your data architecture.

In a time defined by faster analytics, higher availability and lower latency, it’s more important than ever before that your technology infrastructure meets increasing data demands. Whether you take a multi-cloud or hybrid cloud approach to your data infrastructure, running SingleStoreDB on Google Cloud helps you better manage workloads across both cloud and on-prem systems.

Why Choose SingleStoreDB for Google Cloud?

SingleStore has emerged as a quickly growing database (growing 192% in 2021 on Google Cloud), and is leveraged by global brands like Uber, who recognize the positive impact of using Cloud Marketplaces:

“It’s a huge opportunity organizationally though to be able to move faster through that buy cycle because now, you can make that decision in a much cleaner, much faster environment.” — Sarah Keller, Director, Technology and Supply Chain at Uber. 

SingleStoreDB is built on Universal Storage, a hybrid combination of in-memory rowstore and on-disk columnstore managed dynamically across three tiers of storage (memory, a persistent SSD cache and cloud object storage, such as Google Cloud Storage). This offers virtually unlimited storage and the separation of storage and compute.

Customers choose SingleStoreDB with Google Cloud to power their modern, data-intensive applications and deliver fast real-time analytics — meeting even the toughest service level agreements.

Bringing It All Together

Google Cloud offers some of the most advanced and innovative cloud data technologies on the market, so why introduce another cloud database? SingleStoreDB is perfect for those scenarios that just don’t quite fit with BigQuery, Bigtable or Spanner alone.

For example, SingleStoreDB can be used to supplement your full-scale BigQuery data warehouse with real-time insights on highly active data, providing the ability to ingest and handle updates and deletes to the tune of millions of rows per second, and serve complex analytic queries on the same data immediately. Real-time interactive dashboards serving large volumes of concurrent users are SingleStoreDB’s bread and butter.

Google Cloud global infrastructure delivers the highest level of performance and availability in a secure, sustainable way. Enterprise customers increasingly choose to run SingleStoreDB on Google Cloud to access their Global Network Backbone. Google Cloud owns its global fiber network, providing customers with strong security, networking and access to industry-leading AI/ML solutions.

Success on Google Cloud Marketplace With SingleStoreDB

Customers choose SingleStoreDB with Google Cloud to power data-intensive and real-time analytics use cases, maintaining scale and concurrency and meeting the toughest service level agreements, including:

  • Ultra-fast ingest: Process millions of events per second (up to 10M upserts per second) for immediate availability

  • Super-low latency: Sub-second latencies with immediate consistency — as low as 50ms for fraud detection

  • High concurrency: Millions of real-time queries, across tens of thousands of users

Leading organizations adopt SingleStoreDB to augment their data warehouses (like BigQuery) and modernize legacy datastores (like Hadoop).

What our customers say

To give you a better idea of how our customers are benefiting, here are some recent comments about running SingleStoreDB on Google Cloud and transacting through Google Cloud Marketplace:

“Our goal has and will always be to build our platform in a way that makes it feel like an on-premise solution. Speed of data processing and delivering are critical to providing this within our solutions. Both Google Cloud and SingleStoreDB have helped us achieve this.” — Benjamin Rowe, Cloud & Security Architect, Arcules
“The SingleStoreDB solution on Google Cloud allows marketing teams to make more informed data decisions by organizing data in one system and allowing for self-serve analytics” — Praveen Das, Co-Founder, Factors.ai

Our customer experiences and the business impacts they achieve with our solutions are our most critical KPIs. By partnering with Google Cloud, we have unlimited potential to improve our services to power the next generation — and deliver advantages to our customers, including:

  • Customers can easily transact and deploy via Marketplace, providing that true Software-as-a-Service feel for SingleStoreDB

  • Transacting through Marketplace counts toward customers’ Google Cloud commit, which helps partners like us tap into budget that customers are already setting aside for Google Cloud spend

  • Billing and invoicing are incredibly intuitive and efficient, since SingleStoreDB compute is delivered on customers’ unified Google invoice through Google Cloud Marketplace

  • Many of our customers have previously signed the Google Cloud User License Agreement (ULA), which further expedites procurement by speeding up legal reviews

We are also able to co-sell more effectively with the Google Cloud sales team, as they receive credit for customer sales. Even more, customers receive 100% of the SingleStore transaction amount against any existing “commit” they have with Google Cloud.

Discover how SingleStoreDB can transform your business on the Google Cloud Marketplace.

Guest Blog: Simply Go Faster!

Every organization strives to process more data faster to be able to react in real time to customers, manufacturing, logistics, pricing and other business decisions. This requires optimizing two related data functions.

The first is to make data broadly accessible. Legacy data systems were purpose built for applications and specific application performance. The advances in flash data storage and networking have provided the technology foundation for modern, distributed data platforms that can serve multiple applications and data types effectively. Companies are simplifying data management by removing data silos and natively supporting multiple data types.

The second is to reduce the overhead of legacy data pipelines. The traditional data cycle was transaction, extraction, transformation, combination and then eventually… analytics. Real-time decision making requires analytics queries very close to the data creation. New frameworks that include AI inference and Spark can only be effective when they have the data.

SingleStoreDB meets these needs by unifying data-intensive applications in a real-time distributed SQL database. It provides fast ingest, and supports multiple data types and queries.

At a customer’s request, IBM Storage had the opportunity to test SingleStoreDB with IBM Spectrum Scale. It is natural to want to deploy a highly scalable database (SingleStoreDB) on the leader in scalable, distributed storage. The combined solution is more impressive and complementary than we originally anticipated.

Our customers adopt IBM Spectrum Scale to provide a Global Data Platform on which they had deployed multiple databases, NoSQL data stores, HDFS and related applications. The IBM storage solution eliminates data movement and provides fast, multi-protocol access for different teams and applications. It also has superior data storage economics through consolidation and data tiering.

SingleStoreDB combines transactional and analytical workloads, which is one of the key reasons our client is intending to deploy SingleStoreDB to consolidate and simplify. They wanted fast, scalable data access that would be flexible in data ingest, but also able to tap their existing data lake on HDFS (on IBM Spectrum Scale).

The combined solution provides deployment simplicity, independent scaling of compute and data, and enhanced data agility. We used OpenShift Kubernetes to deploy IBM Spectrum Scale and SingleStoreDB across multiple servers. IBM Spectrum Scale presents as a local filesystem, though it is distributed across the SingleStoreDB servers. The combined solution performs as if each server were independent, yet it provides a consolidated storage platform. The consolidated storage platform eliminates over-provisioning and increases utilization.

It also has the intrinsic benefit for data ingest and data extraction into a common location. This is important to our customer who already takes advantage of IBM Spectrum Scale’s native storage protocols to write data directly from some applications. Each node in SingleStoreDB will have easy access to this data to ingest the raw data into the database.

Our thanks to the technical team at our customer who initiated this work, and to those at SingleStore and IBM Storage who quickly tested and demonstrated the solution.

More to come!

How to Migrate From PostgreSQL to SingleStoreDB

The global Postgres community boasts tens of thousands of users and several thousand enterprises. That must say something good about the technology, right? 

Postgres has become very attractive to developers, thanks to benefits like its strong support for transactions, ability to handle JSON natively and ecosystem extensibility. Moreover, the database is widely adopted — given that it is open source and available as an option on popular services like AWS RDS. SaaS applications like Instagram, FlightAware and Reddit even use Postgres. 

Postgres developers looooooooove Postgres, so much so that it’s the #4 database on TrustRadius and on the DB-Engines ranking!

So, What’s the Catch?  

The data-intensive world we live in today has introduced new, far more complex customer demands than ever before. SaaS providers have gone from simply providing a service to providing data as a service, even if it wasn’t originally part of their business plan. For example, take Strava:  once an app to help athletes track their runs and bike rides, Strava now has a value-add analytics product to compare today’s workout to historical ones (and they can charge for it, too!). 

As these demands for in-app analytics continue, developers managing these services have become hard-pressed to scale Postgres (and other databases, like MySQL). Analytics over trillions of records become quite slow on single-node systems that are not built for large scale aggregations, window functions and filters with tight SLAs. 

Several providers saw an opportunity to scale Postgres to be a distributed SQL database  — like CitusDB. Users could then take advantage of distributing their database across several machines and be able to partition their data cleanly. It turns out that this approach works fine if all you need is more compute for transactional queries, but the band-aid still falls off when it comes to the complex analytics. Not to mention, there have been several anecdotal reports of reliability issues with distributed Postgres.

The Good News? There’s Another Way

As discussed so far, the demands of data-intensive applications are centered around support for transactions and analytics. SingleStoreDB is the only database that supports both transactions and analytics, while still maintaining all of the great features of open source databases like Postgres! SingleStoreDB is a multi-model, real-time distributed SQL database with strong JSON support, ecosystem integration through our MySQL wire compatibility and robust support for analytics with our patented Universal Storage — and SingleStoreDB is the #1 database on TrustRadius.

Many organizations have migrated their applications from Postgres to SingleStoreDB. The majority of these migrations were completed in weeks — some in just days. Here’s a great example:

Foodics is a leader in the restaurant and point-of-sale industry. One huge differentiator for Foodics’ business is their ability to provide advanced analytics on inventory, menus and overall restaurant operations. As Foodics added more analytics features to their offering, they experienced database-level challenges including: 

  • Ongoing service instability 

  • Constant re-balancing of data 

  • Low concurrency that only supported 200 users 

Foodics came to SingleStore looking to improve their analytics performance on transactional data, and engineers from both teams collaborated on a two-week sprint to change their database destiny. Some of the tests included loading 10 billion rows, using complex queries with wide date ranges and leveraging SingleStoreDB’s dbbench to simulate large concurrency loads. Data loads via S3 were seamless thanks to SingleStoreDB Pipelines. After experimenting with a few different shard keys and sort keys, Foodics saw fantastic results:
  • A performant analytics engine (with Columnstore) to democratize data access

  • High concurrency to support a large number of reports being generated simultaneously 

  • A fully managed database with low TCO that freed up engineering teams 

So How Do I Do It?

Bulk data load 

In this example, we’ll use a table we’ve created in AWS Aurora MySQL and migrate it to SingleStoreDB. 

AWS RDS Postgres Table: 

CREATE TABLE `scan` (
  `scan_id` bigint(20) NOT NULL,
  `package_id` bigint(20) NOT NULL,
  `loc_id` int(11) DEFAULT NULL,
  `Loc_name` char(5) DEFAULT NULL,
  PRIMARY KEY (`package_id`,`scan_id`)
) ;

Scan table in Postgres:

select count(*) from scan;
7340032

Simple export of data as a CSV:

SELECT * from scan
INTO OUTFILE s3 's3://data-bucket-pb/Jan13/scan.csv' 
FORMAT CSV
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'; 

Create a database and a table. Note the addition of a SHARD KEY and a COLUMNSTORE KEY in the SingleStore DDL. These will enable optimal distribution and organization of data to ensure lightning fast queries. SingleStoreDB Documentation offers advice on how to select these keys for your tables: 

create database mem_0113;
use mem_0113;

create table scan (
 scan_id BIGINT NOT NULL,
 package_id BIGINT NOT NULL,
 loc_id INT,
 loc_name CHAR(5),
 KEY (scan_id) USING CLUSTERED COLUMNSTORE,
 SHARD(package_id) );

Create SingleStore Pipeline to get data from S3. This is a super simple way to get data from several external sources:

CREATE PIPELINE pipe1_scan
AS LOAD DATA S3 'data-bucket-pb/Jan13/scan.csv.part_00000'
CONFIG '{REDACTED}'
CREDENTIALS '{REDACTED}'
INTO TABLE mem_0113.scan
FORMAT CSV FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
(scan_id,package_id,loc_id,loc_name);

Start SingleStore Pipeline:

start PIPELINE pipe1_scan;

Check the table for records, and we have the same number of rows as in the source table:

select count(*) from scan; --7340032

SingleStoreDB Replicate Tool

SingleStoreDB offers lightweight migration tooling for your bulk data load needs in initial migrations. This can also be used for incremental CDC after the initial load of data. These two features allow users to test out their workload on SingleStoreDB, and then have a zero-downtime cutover when moving to production. Let’s look at another example of a table in RDS Postgres, which covers the bulk data load:

AWS RDS Postgres Table:

CREATE TABLE `scan` (
  `scan_id` bigint(20) NOT NULL,
  `package_id` bigint(20) NOT NULL,
  `loc_id` int(11) DEFAULT NULL,
  `Loc_name` char(5) DEFAULT NULL,
  PRIMARY KEY (`package_id`,`scan_id`)
) ;
 
select count(*) from scan;
7340032

The scan table includes 7.3 million records.

Configuration File: 

To configure the connectivity between RDS Postgres and SingleStoreDB, we simply populate two configuration files pointing to the respective databases. Below are example YAML configuration files for the source (RDS Postgres) and the target (SingleStoreDB):

type: POSTGRESQL

host: demo.cynodgz9a7ys.us-east-1.rds.amazonaws.com
port: 5432

database: my_pg_db
username: <redacted>
password: <redacted>

max-connections: 30
max-retries: 10
retry-wait-duration-ms: 1000

replication-slots:
  io_replicate:
    - wal2json
  io_replicate1:
    - wal2json

-----------------------------------------------------------

type: SINGLESTORE

host: svc-1732741a-f499-467c-a722-9887d73150c1-ddl.aws-virginia-2.svc.singlestore.com
port: 3306

username: <redacted>
password: <redacted>

#credential-store:
#  type: PKCS12
#  path: #Location of key-store
#  key-prefix: "memsql_"
#  password: #If password to key-store is not provided then default password will be used

max-connections: 30

max-retries: 10
retry-wait-duration-ms: 1000

Execute Replicate Command:

Now, we execute the REPLICATE command based on the configuration file previously populated.

./bin/replicant snapshot conf/conn/postgres.yaml conf/conn/singlestore.yaml

Verify that databases and tables exist in SingleStoreDB:

select count(*) from scan; --7340032

Summary

As you can see, there are a few different ways to easily migrate your existing Postgres database to SingleStoreDB. This simple migration will elevate your database from single-node and slow, to distributed and lightning fast. Ingest data more easily, make your queries faster and improve support for concurrency with one of these migration options today. 

SingleStoreDB Cloud offers $500 in free credits to get started with just a few clicks. The SingleStoreDB Cloud Engineering Team that contributed to this blog is always standing by to assist you with your Postgres migration, or any subsequent questions you may have about the platform. 

Laravel and SingleStoreDB Quickstart Guide

In this guide, we’ll show you how to get up and running with Laravel and SingleStoreDB in a matter of minutes. You can view the full example repository at github.com/singlestore-labs/start-with-laravel or follow along here.

If your machine is not already set up with Composer, we recommend reading their getting started documentation. You may also consult Laravel’s getting started documentation.

To get started, create a fresh Laravel project.

# Create a new Laravel project.
composer create-project laravel/laravel singlestoredb-example
cd singlestoredb-example

Then require the SingleStoreDB database driver for Laravel.

# Require the SingleStoreDB driver for Laravel.
composer require singlestoredb/singlestoredb-laravel

Now we need to update your config/database.php to use this new SingleStoreDB driver. Here, we’re going to create a new connection named singlestore in the connections key of the configuration. Notice that it’s almost an exact copy of the MySQL configuration, but with the driver set to singlestore.

'singlestore' => [
    'driver' => 'singlestore',
    'url' => env('DATABASE_URL'),
    'host' => env('DB_HOST'),
    'port' => env('DB_PORT'),
    'database' => env('DB_DATABASE'),
    'username' => env('DB_USERNAME'),
    'password' => env('DB_PASSWORD'),
    'unix_socket' => env('DB_SOCKET'),
    'charset' => 'utf8mb4',
    'collation' => 'utf8mb4_unicode_ci',
    'prefix' => '',
    'prefix_indexes' => true,
    'strict' => true,
    'engine' => null,
    'options' => extension_loaded('pdo_mysql') ? array_filter([
        PDO::MYSQL_ATTR_SSL_CA => env('MYSQL_ATTR_SSL_CA'),
        PDO::ATTR_EMULATE_PREPARES => true,
        PDO::ATTR_PERSISTENT => true
    ]) : [],
],

Over in your .env file, change the DB_CONNECTION to point to the new singlestore connection. You’ll also need to set the rest of the DB_* variables to point to your SingleStoreDB instance.

DB_CONNECTION=singlestore
DB_HOST= # SingleStoreDB URL
DB_PORT=
DB_DATABASE=
DB_USERNAME=
DB_PASSWORD=

There are a few default Laravel migrations that you’ll need to modify before you can run php artisan migrate. Mainly, we are changing unique constraints to plain indexes to be compatible with SingleStoreDB’s sharding strategy. If you need to enforce uniqueness, you can either change these migrations further or enforce uniqueness at the application layer.

First, publish the Sanctum migrations so that we’re able to modify them.

php artisan vendor:publish --tag=sanctum-migrations

Then make the following changes:

  • Open 2014_10_12_000000_create_users_table.php and change the unique key on the email column to index.

  • Open 2019_08_19_000000_create_failed_jobs_table.php and change the unique key on the uuid column to index.

  • Open 2019_12_14_000001_create_personal_access_tokens_table.php and change the unique key on the token column to index.

That’s it! You can now run php artisan migrate and you should see all of your tables in SingleStoreDB.

You’re fully up and running with Laravel and SingleStoreDB.

For further reading, you can check out the SingleStoreDB documentation, or the SingleStoreDB Laravel driver documentation.

The Data [r]evolution Is Coming

Whether you’re a developer, engineer, IT or business leader, the ability to supercharge real-time customer experiences is more important than ever before.

On July 13, SingleStore is diving into all things real time with our Summer 2022 launch event, [r]evolution. This free, virtual event will highlight next-level innovations in SingleStoreDB — the #1 database for unifying transactions and analytics.

From first-look product demos to industry deep dives with leaders across fintech, cybersecurity, IoT and more, [r]evolution 2022 unlocks unprecedented access and insight to all things real-time data. 

The event will kick off with a main session hosted by SingleStore CEO Raj Verma on how we’ve crossed the real-time Rubicon. You’ll also hear from engineers on our Launch Pad team as they demonstrate new features in SingleStoreDB including Workspaces, Code Engine powered by Wasm, Data APIs and more.

After the main session concludes, you’ll have the opportunity to choose your own adventure and join one of two breakout tracks. Here’s more info on what you’ll find in each: 

Breakout Session A: Developers and Engineers

Host: Adam Prout, CTO at SingleStore

This session for developers, architects and engineers will take you on an under-the-hood guided tour of SingleStoreDB and its architectural design. This hour includes everything from building a database for real-time applications to a deeper dive at the newest product features in SingleStoreDB, and a partner showcase with MindsDB CEO Jorge Torres.

Breakout Session B: IT and Business Leaders 

Host: Oliver Schabenberger, CIO at SingleStore 

 This session for IT and business leaders focuses on how to build a framework for measuring digital maturity and resilience. During this hour, you’ll get a look at data-intensive applications in action, customer and partner showcases with IBM, Siemens and Impact.com, and a fireside chat with John Foley, founder and editor of the Cloud Database Report. 

Additionally, all attendees will be entered into a raffle for a chance to win a pair of Apple Airpods Pro, SingleStore swag and a grand prize summer vacation package!

Can’t make the live event? Register anyway, and we’ll share on-demand content with you as soon as it’s available.

A Moment (or Two) to Celebrate as SingleStore Makes a Big Announcement

Amidst an otherwise challenging year due to world events, I am thrilled that SingleStore’s results thus far have been outstanding, and we now have even more to celebrate.

Today, we announce a new round of funding in the amount of $116M led by the beacon of financial institutions, Goldman Sachs Asset Management. We also welcome Sanabil Investments, who joined this round as a new investor, and we’re thrilled to have participation from existing investors whose continued investment speaks to their belief in the company’s value and growth. With this funding, we have raised $278 million over the last 20 months, and our valuation is upwards of $1.3 billion. We are truly at unicorn status!

We continue to strengthen our executive team as well, and I am very pleased with the tremendous talent and experience we’ve added to SingleStore’s leadership. Earlier this year, we announced that Shireesh Thota joined as SVP of Engineering, alongside Yatharth Gupta, VP of Product Management — two very impressive hires for us. Our new chief financial officer, Brad Kinnish, came on board at the beginning of June, joining us as we finalized the details of this latest round. And today, we announced that Meaghan Nelson joins us as General Counsel. I cannot emphasize enough how strategic these new leaders are for SingleStore, and how excited I am to see what we can accomplish together.

It really has been a year of accomplishment for SingleStore, and it’s only July. We kicked off 2022 by partnering with industry leaders IBM and SAS to accelerate insights for real-time, data-intensive applications and reduce total cost of ownership (TCO) for our joint customers.

And our claims are validated. We conducted a TCO study with GigaOm that revealed our unparalleled performance in the industry. Results show that SingleStoreDB delivers a 50% lower TCO against the combination of MySQL and Snowflake and a 60% lower TCO compared to the combination of PostgreSQL and Redshift.

While we’ve received many accolades and industry awards, there is one in particular I want to call out. In May 2022, we were honored to win four top-rated categories from TrustRadius, the verified review site where technology users share their experiences. We won because our users reviewed us and shared amazing stories about how much they love using our technology. Thank you to all who shared your excitement about what you can accomplish with SingleStore.

All the above speaks to our upward trajectory in the database space even as we notice broader shifts in the industry. In the last few weeks, we’ve seen database companies like Snowflake and MongoDB make announcements about plans to update their capabilities to support transactional and analytical data to keep up with market demands. We can certainly understand why these worthy competitors want to head in this direction. We think it’s great that they understand the shifts in the industry, and we appreciate the renewed attention their announcements put on the importance of having both transactional and analytic capabilities in one database, something that we embraced from day one.

It’s pretty hard to take a gasoline powered car and turn it into an electric one by strapping on a new engine. It doesn’t work, at least not well. The Tesla had to be designed as electric from the get go. The fact is, SingleStore has realized from the beginning that one database for both transactions and analytics is the only way to achieve fast results. It’s what we were designed for. The future demands real time, and we have empowered customers to do real-time streaming, operational and analytical processing using a single, general-purpose, modern, cloud-native database for several years running now. We have a fantastic product, and we are already leading the way in real-time analytics.

But we are not resting on our laurels. We will invest time and resources to innovate more and even faster — like our collaboration with Intel. Working together, we will optimize the performance of SingleStoreDB on current and future Intel architectures so our customers can handle new levels of data intensity with real-time analytical and transactional workloads.

If you are curious to learn more about what we’re up to, please join us on July 13 when we host [r]evolution 2022, our launch event that includes customer use cases involving data intensity, live product demos and much more. See you there.

SingleStoreDB Adds New Features to Accelerate Application Development and Power Real-Time Capabilities

The world’s fastest cloud database has added key features to accelerate enterprise adoption for data applications. Workspaces allow organizations to deploy and scale isolated workloads across shared data, and Code Engine allows developers to push computation directly into the database for performance and agility.

Now, we are announcing the addition of new capabilities which enhance the performance, scalability and manageability of SingleStoreDB, making it easier than ever to build modern data applications.

With the introduction of Workspaces, SingleStoreDB delivers isolated compute instances which can operate across shared data. This allows organizations to deliver rich customer facing applications, power real-time analytics workloads and operationalize machine learning models at scale.

SingleStoreDB now also provides Code Engine — Powered by Wasm. Code Engine is an embedded code execution engine which allows developers to write user defined functions, application logic or machine learning modules and execute them directly within the database. This delivers increased performance, simplifies development and exposes new opportunities for generating insights using operational data.

Check out a full list of the new features and capabilities available today in SingleStoreDB to help you build the next generation of modern data applications:

Workspaces

Workspaces are isolated and scalable compute deployments which provide ultra low-latency access across shared data. Each database can be attached to one or more workspaces concurrently, allowing simultaneous operation of multiple workloads on shared data. 

This enables organizations to run customer facing applications, real-time analytics and operational machine learning across shared data, without managing complex data movement and ETL, or deploying multiple database solutions to handle operational and analytical workloads.

Management API

Our Management API is a standardized REST API that allows you to deploy, manage and operate SingleStoreDB at the largest scales. Some of our largest customers are managing tens of thousands of deployments concurrently, which is only possible with robust and scalable management interfaces. 

The SingleStoreDB Management API is a critical tool in meeting the needs of modern and real-time data applications, and beginning today it is available to all of our customers.

Security Enhancements

Security is critical for organizations large and small, and SingleStoreDB provides advanced features to ensure end-to-end protection for your company’s most critical data. With modern access controls including SSO and passwordless authentication, end-to-end encryption, compliance certification and advanced access controls, SingleStoreDB powers even the most secure financial and healthcare workloads.

Code Engine — Powered by Wasm

Code Engine is a powerful tool for developers, allowing them to write code once and deploy it directly in the database to create powerful user defined functions, apply machine learning algorithms, or provide application specific data transformations. This provides incredible performance, simplicity and extensibility for powering modern data applications.

Data API

Building and scaling serverless applications is easier than ever using the Data API. Providing stateless dynamic connections, the Data API enables developers to securely connect applications to the fastest and most scalable database for operational and analytic workloads.

Vector Functions

Vector Functions in SingleStoreDB enable developers to build and deploy image recognition models, sophisticated financial calculations and other advanced learning algorithms.

Leveraging Single Instruction/Multiple Data (SIMD) to deliver unmatched speed, and with built-in vector functions such as dot product, Euclidean distance, JSON array pack/unpack, vector math (add, subtract, multiply, scalar multiply, sum, etc.) and vector manipulation (sort, slice, get-element), vector functions make it easier than ever for developers to create modern, intelligent applications.
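
As a rough sketch of what this looks like in SQL, using an illustrative table of packed feature vectors (JSON_ARRAY_PACK encodes a JSON array into the binary vector format, and DOT_PRODUCT scores similarity):

-- Minimal sketch with an illustrative table of packed feature vectors.
CREATE TABLE product_embeddings (id BIGINT, features BLOB);
INSERT INTO product_embeddings VALUES (1, JSON_ARRAY_PACK('[0.1, 0.8, 0.2]'));

-- Rank rows by similarity to a query vector using the built-in DOT_PRODUCT function.
SELECT id, DOT_PRODUCT(features, JSON_ARRAY_PACK('[0.9, 0.1, 0.3]')) AS score
FROM product_embeddings
ORDER BY score DESC
LIMIT 5;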

Flexible Parallelism

Flexible Parallelism allows workloads to scale evenly across compute resources, delivering increased performance and better resource utilization as applications grow. This means you can develop your application on SingleStoreDB with the peace of mind that as your user base grows, your database will effortlessly scale with you.

Power BI Connector

The Microsoft-certified connector for Power BI enables business intelligence running directly on the data in SingleStoreDB, without complex integrations or custom connectors. Supporting both DirectQuery for real-time dashboards and Import Mode for point-in-time snapshots with additional support for Custom SQL, the Power BI Connector makes it easier than ever to enable data-driven decision making.

dbt Adapter

The dbt Adapter for SingleStoreDB provides true data ops, including CI/CD data pipelines, with the power of SQL for enrichment and transformation. And with the ability to run real-time workloads, the dbt Adapter for SingleStoreDB makes it easier than ever to manage data pipelines, schema evolution and transformation.

Try SingleStoreDB

SingleStoreDB offers a completely free trial of the cloud database-as-a-service to get started. The trial allows you to load data, connect your application, and experience the performance and scalability of a real-time Distributed SQL database.
You can also take a look at our demonstration of SingleStoreDB “running a trillion rows per second,” and try it for yourself to see how SingleStoreDB delivers the fastest performance and lowest query latency, deployed on your public cloud of choice.

ICYMI: Highlights From SingleStore [r]evolution 2022

A real conversation about real time. In this thought-provoking keynote, SingleStore CEO Raj Verma explains his vision for real time — why real time matters in our lives, what makes a system real time and the technological convergences that make such systems possible. He describes how SingleStore plays a role in creating real-time experiences for all kinds of customers across multiple industries. Raj shares how SingleStore plans to use our latest $116M round of funding to further advance our product roadmap across analytics, distributed SQL and the developer experience. Finally, Raj shares his stance on the issues affecting our world today, and SingleStore’s mission and dedication to making things better.

[r]evolution Summer 2022: Wasm Space Program


For some time now I have wanted to build a universe simulation. The idea of millions of spaceships each making their own real-time decisions in a massive universe sounds compelling.

This idea isn’t new — many video games and AI competitions have explored the same topic. But to my knowledge, none of them have tried to run the entire simulation within a database. So, I made this my goal and started building.

To complete this project, I would need a unique database. To start with, it needs to support large volumes of transactions and analytics at the same time on the same data. This is known in the database community as “HTAP” (Hybrid transactional/analytical processing). Next, it must be able to handle a highly concurrent read and write workload to support thousands of clients. Finally, the database needs to be extensible with custom logic to run the custom AI powering each spaceship.
Luckily for me, SingleStoreDB 7.9 satisfies all of these requirements and more. A bit of elbow grease and some long nights later, I put the finishing touches on what I call the Wasm Space Program.

Simulation Walkthrough

The Wasm Space Program is available online, so I suggest opening it up in another tab to follow along while reading the rest of this blog post. You can access it here: Wasm Space Program

After clicking to enter, you will see a screen full of solar systems.

Each of the orange/yellow circles on your screen is a unique solar system. You can expand the “Universe Stats” view in the top-left corner to see how many solar systems there are, as well as other details about the simulation. Let’s click on a solar system to warp to it and see what is inside.

Within each solar system there are spaceships and energy nodes. Energy nodes look like asteroids covered in green crystals. You can think of energy nodes like phone chargers — spaceships need to sit on top of them to absorb energy. Since spaceships need energy to survive, energy nodes are a critical resource in the universe.

You will also see spaceships moving around. Each spaceship is driven by a program written in Rust which implements that spaceship’s behavior. There are a number of different behaviors so far, ranging in aggression and movement strategies. Spaceships can choose to fight by occupying the same square as another ship. This will lead to combat, which results in the losing side being destroyed — leaving all of its remaining energy on the board for the victor to absorb.

I encourage you to click on different entities and inspect their statistics in the info panel which pops up. You can also open up the information screen by clicking “Information” in the top left corner.

So, how does this work?

The Wasm Space Program takes advantage of three major features in SingleStoreDB 7.9 and SingleStoreDB Cloud: Code Engine — Powered by Wasm, Workspaces and the Data API.

Code Engine — Powered by Wasm

To avoid needing a separate backend service to run each spaceship’s AI, we use SingleStoreDB Code Engine. This feature supports creating functions using code compiled to WebAssembly (Wasm). In the case of the Wasm Space Program, each spaceship’s AI is written in Rust.

This allows me to take advantage of the developer ergonomics and library ecosystem of Rust, while ending up with a function that I can run directly within my SQL queries. In addition to powering the spaceships, I am also using Wasm for other utility functions that are easier to write in Rust than in SQL. All of the Rust code used in this simulation is available here.

Workspaces

In SingleStore 7.9, we have released the ability for multiple compute clusters (called Workspaces) to share access to the same database. One of the workspaces can read and write, while the rest of the workspaces can only read from the database. This feature is extremely useful when you want to isolate portions of your workload from one another.

For the Wasm Space Program, it’s extremely important that each turn takes less than one second to ensure the game runs in real time. To achieve this, we run the write workload in a dedicated workspace. Meanwhile, we need to power thousands of game clients running read queries against the universe — so we use multiple read only workspaces to ensure we can handle any scale.

Data API

The final piece of the puzzle is how to connect directly from the game client running in the browser (JavaScript) to SingleStoreDB. Historically, SingleStoreDB only supported the MySQL protocol, which sadly cannot be used directly from a browser. So, you would need to build a backend API service to act as the middleman between the browser and SingleStoreDB. This added complexity to the application architecture, as well as a lot of boilerplate code that didn’t provide much business value.

Well, I’m happy to say that the Data API provides a solution to this problem. With the Data API, you can connect directly from the browser to SingleStore over normal HTTP(s) connections. In addition, query results come back in JSON which makes them trivially easy and fast to handle in Javascript.

The Wasm Space Program uses the Data API to power the game client. Everything you see on the screen is the result of running SQL queries directly from your browser to SingleStore with no backend service or API in the middle. This results in performance improvements, development agility and deployment simplicity for the application.

Looking forward

I have worked at SingleStore for over nine years and can’t emphasize enough how revolutionary SingleStoreDB 7.9 is. The ability to extend the database with custom logic and run low-latency transactional and analytical workloads on the same storage completely changes how I think about building applications.

The Wasm Space Program will be undergoing some changes in the upcoming weeks, culminating in an open competition to see who can develop the best spaceship AI. Stay tuned for rules and instructions on how to join in on this experience.

In the meantime, please check out the code on Github and try running it on your machine. You can even build your own spaceship behavior by adding strategies here. Take a look at how the other strategies are organized and injected into SingleStore by searching for their names throughout the codebase. If you encounter any issues or want to send a pull request, all contributions are welcome.

Thanks for playing along, and have a fantastic day!

Webinar Recap: Supercharging SaaS Applications With Foodics


SaaS innovator Foodics revolutionizes the way companies and restaurants handle real-time reporting, inventory management, order submission and employee operations.
Headquartered in Riyadh and available across 17 countries, Foodics is growing rapidly around the world — meaning their need to process data at faster speeds and deliver real-time insights is growing, too. Using their application dashboard, users can quickly glean insights on how their business is performing. This performance can be analyzed per branch, payment methods, products and more. And, users can easily apply filters, group different metrics together and change analysis dimensions to find the exact metrics they’re looking for.
We sat down with Foodics to hear more about their database journey in our latest webinar, “Supercharging SaaS Applications: The Foodics Story.”

Foodics’ Journey to SingleStoreDB

As Foodics grew, so did its user base — effectively moving the company into a new SaaS category: they were now dealing with a data-intensive application. Unfortunately, Foodics’ data architecture at the time wasn’t able to meet those requirements.

“Like any tech company, we started with MySQL,”  said Mohammed Radwan, head of engineering at Foodics. “We started with MySQL because it was compatible with what we had, and was easy to use. The thing is, with MySQL, it lacked analytics capabilities. It’s more of a transactional database, not for analytics. It did fill its purpose for a while, but then we needed to grow and expand — and that’s why we chose PostgreSQL.”

It’s a journey not unfamiliar to many SaaS companies and app providers — starting with single-node, open-source databases to get their platform off the ground. Yet as Foodics soon discovered, their existing service provider came with a series of challenges that directly impacted their end users, including: 

  • The inability to scale up and scale out 

  • Ongoing service instability 

  • A lack of data integrity 

  • Constant re-balancing of data 

  • Low concurrency that only supported 200 users 

  • A lack of AWS ecosystem support

Foodics knew that to effectively and efficiently accelerate their SaaS app performance, they needed a database that eliminated these challenges while setting them up for future success. 

“When we faced these things we decided, okay, it’s time to move on,” says Radwan. “We need to look for something that is more reliable, that allows us to grow and that we can rely on for the next five, 10 or 20 years moving forward.”

That realization led the company to SingleStoreDB.

From Subpar to Supercharged 

“We started asking ourselves some questions: what do we need?” says Radwan. Foodics’ list of database criteria included:

  • Placing all analytics-related data into a single, unified data store. 

  • A performant analytics engine (with Columnstore) to democratize data access

  • Real-time and near real-time analytics 

  • High concurrency to support a large number of reports being generated simultaneously 

  • A fully managed database with low TCO that freed up engineering teams 

  • A scalable solution that supported ongoing growth 

  • High availability with little-to-no downtime

Foodics’ search led them to an article written by Jack Ellis, co-founder at Fathom Analytics, detailing how SingleStoreDB had replaced MySQL, Redis and DynamoDB for the website analytics company. SingleStoreDB checked the boxes for what Foodics wanted in a database — however, the engineering team needed to run their own stress tests: “Once we settled and once we agreed on how we’re going to do things, we started some POCs — proof of concepts. We wanted to make sure that what we see or what we hear is real, because it was almost impossible to be real,” says Radwan.

Radwan and the Foodics engineering team started pushing SingleStoreDB to the limits with stress-testing that included:

  • Creating around 10 billion rows 

  • Using complex queries (conditional aggregation, sub-queries, etc.)

  • Inputting wide date ranges 

  • Simulating concurrent reads and writes 

  • Simulating heavy loads with concurrent connections

For Foodics, the rest is history. SingleStoreDB proved its ability to stand up to growth, data size and ingest requirements, truly cementing its position as the only database with the speed and scale built for data-intensive applications.

“What was the impact for all of that? First of all, what we’re focused on is the customer satisfaction,” says Radwan. “So you can imagine if you have a customer complaining all the time about reports…eventually they’ll get tired and churn. So this is something we were looking not to have.”

“Thankfully, we were able to improve the performance of our queries and performance of our reports by more than 60% — some reports were executed within the range of 400 milliseconds, now they’re executed within 60 milliseconds.”

“Now, we’re able to scale without worrying, knowing that we will have more concurrent users because we get more customers every day. And the offer we got from SingleStore was great — for that, we have 10x the power, performance and cost efficiencies.”

Get the Full Foodics Story

To hear more about how Foodics powers their business with SingleStoreDB, check out our latest webinar, “Supercharging SaaS Applications: The Foodics Story.”

[r]evolution Summer 2022: Bring Application Logic to Your Data With SingleStoreDB Code Engine for Wasm


SingleStoreDB Cloud now supports user-defined functions written in C, C++ and Rust with our new Code Engine — Powered by Wasm.

Application developers can now use libraries of existing code, or code from their applications, in database functions and call them from SQL. That can tremendously reduce the need to move data from the database to the application to analyze it, speeding things up and simplifying development. We illustrate the power of Code Engine for Wasm with a story about a developer who finds a new way to create an important sentiment analysis report — easier and faster.

You’re an application developer for an eCommerce company, and you’ve been asked several times to produce reports that do sentiment analysis on the review comments for products your company has sold. It is getting mighty tedious because your sentiment analysis code lives in a Rust program, so every report requires you to export data to the app and score it there. Then, you have to write query processing-style logic in your app to create the report. It’s honestly kind of fun, but you — and your boss — think your time would be better spent elsewhere.

That’s when you learn from your DBA that your database, SingleStoreDB, has a new feature called Code Engine — Powered by Wasm that lets you extend the database engine with user-defined functions (UDFs) in C, C++ or Rust, with other languages coming soon. It dawns on you that you can take the sentiment analysis code from your application and move it into the database as a Wasm UDF. That means those reports people have been asking for can be created using a pretty straightforward SQL SELECT statement. Better yet, you can teach the analysts to write the queries themselves. Then, you don’t even need to hear about it!

We’ll hear more of this story later!

What Is Wasm?

Wasm is short for WebAssembly. It’s a machine-independent instruction format and compilation target. Key points to remember about Wasm are:

  • It supports many languages. Backends exist for a set of different source languages that can generate compiled .wasm files, which are like dynamic link libraries (DLLs) containing Wasm instructions. Wasm was created to enable developers to write code that can run in a web browser, using just about any language, not just the pervasive JavaScript language. This lets them build on libraries of existing code in those other languages.
  • It’s fast. Compiled .wasm files can run 30x or more faster than JavaScript. In some cases, they can be within 10% of the speed of C code compiled to native machine instructions.

  • It’s safe. To make this happen, the Wasm community has mastered the challenge of allowing code to run safely in the browser using robust sandboxing. In the browser environment, the compiled code (whether C, C++, Rust, Go or some other language) cannot escape the sandbox, period. No starting process. No opening files. No writing network messages. No system calls of any kind.

The Wasm world began in the browser. But it has spread to the server! Wasm runtimes like Wasmtime can now be embedded in many kinds of apps, not just browsers. SingleStoreDB embeds a Wasm runtime environment, enabling you to write UDFs in C, C++, Rust and (soon) more languages.

Benefits of Wasm Extensibility in SingleStoreDB

At the beginning of our story, our application developer suffered from having to export data to the application to get a sentiment score for it (slow), and the need to write application code to do query-processing-like things (labor intensive). SingleStoreDB’s Code Engine — Powered by Wasm makes these sore points go away.

Benefits of Wasm extensibility include:

  • Faster performance by moving the computation to the data, instead of moving data to the applications.

  • It’s not necessary to write your own query processing-like logic in the application; you can rely on the SingleStoreDB SQL processor to do it for you.

Sentiment Analysis in SQL With Wasm

Let’s return to our developer tasked with creating sentiment analysis reports. The first report asks to find the five comments with the highest sentiment score. Let’s suppose this is the schema:

create database demo;
use demo;

create table products(id int, name varchar(70), category varchar(70));
create table comments(id int, ts datetime, user varchar(30), pid int, comment_text text);

And this is some sample data:

insert into products values
  (1,"running shoe","sporting goods"),
  (2,"soccer ball","sporting goods"),
  (3,"cotton balls","cosmetics");

insert into comments values
  (1, "2022-06-25 22:11:25", "joe",1,"fantastic shoe"),
  (2, "2022-06-25 22:58:01", "sue",2,"ball has poor bounce"),
  (3, "2022-06-25 22:59:00", "amy",2,"amazingly durable ball, and it looks great"),
  (4, "2022-06-25 23:05:10","mila",3,"cotton balls are nice and fluffy -- love them!"),
  (5, "2022-06-25 23:06:37","joao",3,"cotton balls were not fluffy; I don't like this brand");

Our hero realizes that if there were a function “sentiment()”, then they could write a simple SQL query to get the report data — instead of writing reams of code to pull the comments into the app and implement a non-trivial top-five calculation. That looks something like this:

select id, comment_text, sentiment(comment_text) as s
from comments 
order by s desc 
limit 5;

And gets these results:

+------+-------------------------------------------------------+---------------------+
| id   | comment_text                                          | s                   |
+------+-------------------------------------------------------+---------------------+
|    4 | cotton balls are nice and fluffy -- love them!        |  0.8069730414548824 |
|    3 | amazingly durable ball, and it looks great            |  0.6248933269389457 |
|    1 | fantastic shoe                                        |  0.5573704017131537 |
|    5 | cotton balls were not fluffy; I don't like this brand | 0.20746990495811898 |
|    2 | ball has poor bounce                                  | -0.4766576055745744 |
+------+-------------------------------------------------------+---------------------+

This sentiment() function combined with the power of SingleStore Code Engine for Wasm and SingleStoreDB’s distributed and parallel query performance allows this calculation to happen many times faster than if you did it in the app, while also saving development time. 

And of course, once you have the ability to do sentiment scoring in SQL, you can vary your reports, using conventional SQL structures intermixed with sentiment analysis. For example, if you want to calculate the top three highest-sentiment comments for sporting goods posted on 2022-06-25, you can do this:

select c.id, c.user, p.name, 
  c.comment_text, sentiment(c.comment_text) as s
from comments c, products p
where p.id = c.pid
and (c.ts :> date) = "2022-06-25" 
and p.category = "sporting goods"
order by s desc
limit 3;

And get this result:

+------+------+--------------+--------------------------------------------+---------------------+
| id   | user | name         | comment_text                               | s                   |
+------+------+--------------+--------------------------------------------+---------------------+
|    3 | amy  | soccer ball  | amazingly durable ball, and it looks great |  0.6248933269389457 |
|    1 | joe  | running shoe | fantastic shoe                             |  0.5573704017131537 |
|    2 | sue  | soccer ball  | ball has poor bounce                       | -0.4766576055745744 |
+------+------+--------------+--------------------------------------------+---------------------+

See the appendix for source code for the sentiment function, and how to create a Wasm UDF for it in the database.

Customer Validation

Our customers are excited about the possibilities of running application functions in SQL to get more value from their data more easily. Abel Mascarenhas, IT Unit Manager at Millennium BCP, the largest bank in Portugal, says:

“The Code Engine for Wasm in SingleStoreDB is a catalyst for extracting value from our data faster and cheaper by leveraging our enterprise code base in real-time SQL.”

Another one of our financial customers has a trading application that stores 4K-byte packets, which are messages sent by their application, in BINARY fields of records in SingleStoreDB. They’d like to implement a function to simply convert the binary packet to text so they can pattern match against it with LIKE filters, allowing people to read the contents of the field more easily in the output of queries.
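
A sketch of what the query side could look like once such a UDF exists (the packet_to_text function, table name and match pattern here are hypothetical placeholders, not the customer’s actual schema):

-- hypothetical Wasm UDF that decodes a 4K binary packet into readable text
select id, ts
from trade_messages
where packet_to_text(payload) like '%ORDER REJECTED%';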

Summary 

The real beauty of SingleStoreDB Code Engine — Powered by Wasm is that it brings the power of existing code, whether from libraries or your applications, into SQL. That allows you to apply the logic of this code in a parallel and distributed fashion, close to the data and benefit from all the power of SQL. What logic will you move close to your data?

Experience Code Engine — Powered by Wasm for yourself. Try SingleStoreDB free. 

Appendix: Code for Sentiment Function

Source Code

The source code for the sentiment analysis UDF is as follows:

/* file examples/sentimentudf/src/lib.rs */
wit_bindgen_rust::export!("sentiment.wit");
struct Sentiment;
impl sentiment::Sentiment for Sentiment {

    fn sentiment(input: String) -> f64 {
        lazy_static::lazy_static! {
            static ref ANALYZER: 
     vader_sentiment::SentimentIntensityAnalyzer<'static> =
                 vader_sentiment::SentimentIntensityAnalyzer::new();
        }

        let scores = ANALYZER.polarity_scores(input.as_str());
        scores["compound"]
    }
}

WIT File

Create the Wasm Interface Types (WIT) file sentimentudf/sentiment.wit with the following contents:

sentiment: func(input: string) -> float64

Building the Code

To build it, use a Cargo.toml file with the following contents:

# Cargo.toml
[package]
name = "sentiment"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
wit-bindgen-rust = { git = "https://github.com/bytecodealliance/wit-bindgen.git" }


vader_sentiment = { git = "https://github.com/ckw017/vader-sentiment-rust" }
lazy_static = "1.4.0"

[lib]
crate-type = ["cdylib"]

And run:

cargo wasi build --lib --release 

Uploading the Code to Cloud Storage

Then, upload the sentiment.wasm file from target/wasm32-wasi/release/ under your VS Code folder for the sentiment UDF project.

There are many different ways to make your sentiment function accessible to SingleStoreDB Cloud. One way is to upload it to an AWS S3 bucket. As an example you could:

  • Create a bucket mybucket

  • Upload the sentiment.wasm file

  • Then navigate to the properties page for the file

  • And in the upper right corner, choose “Object Actions”

  • Then “Share with a presigned URL”

  • Enter a number of hours to share the file, say 12, and click the button to complete the action

  • Then, on the upper right corner, choose “Copy presigned URL”

  • Then paste the URL into a create function statement similar to the one in the script below, between the single quotes after wasm from http

  • Repeat the above steps, but for the sentiment.wit file, except paste the URL between the single quotes after the wit from http clause

use demo;

create function sentiment as 
wasm from http 'https://mybucket.s3.us-west-2.amazonaws.com/sentiment.wasm?response-content-disposition=inline&X-Amz-Security-Token=IQ<redacted>b7d8'
with wit from http 'https://hansonpublic.s3.us-west-2.amazonaws.com/sentiment.wit?response-content-disposition=inline&X-Amz-Security-Token=IQ<redacted>61ec'
;

Then, run the script. Now, you can run the DDL and SELECT statements from the section Sentiment Analysis in SQL with Wasm.

Sentiment Analysis with a Wasm TVF

For another variation of sentiment analysis that uses a Wasm table-valued function (TVF), a capability still in preview, clone this github project:

The TVF version outputs not just an overall score, but a single row with four values: the compound, positive, negative and neutral scores.

After building and uploading the sentimentable function from this project, you can use it like this:

select c.comment_text, 
  format(s.compound,3) cpd, format(s.positive,3) pos, 
  format(s.negative,3) neg, format(s.neutral,3) ntrl
from comments c, sentimentable(c.comment_text) s;

There’s an implicit lateral join (cross apply) between c and the sentimentable function. Here are the results:

+-------------------------------------------------------+--------+-------+-------+-------+
| comment_text                                          | cpd    | pos   | neg   | ntrl  |
+-------------------------------------------------------+--------+-------+-------+-------+
| fantastic shoe                                        | 0.557  | 0.783 | 0.000 | 0.217 |
| cotton balls were not fluffy; I don't like this brand | 0.207  | 0.185 | 0.000 | 0.815 |
| amazingly durable ball, and it looks great            | 0.625  | 0.406 | 0.000 | 0.594 |
| ball has poor bounce                                  | -0.477 | 0.000 | 0.508 | 0.492 |
| cotton balls are nice and fluffy -- love them!        | 0.807  | 0.510 | 0.000 | 0.490 |
+-------------------------------------------------------+--------+-------+-------+-------+

So you can do sentiment analysis with a single simple score with a UDF if that suits your needs, or get all four components of sentiment analysis with a TVF if you need them.

Making Real Time a Reality: 5 Things You Missed at SingleStore [r]evolution 2022


Life moves in real time, and the modern world expects real-time insights and action.

The foundation for making real-time real is your choice of database. The more complex the data plumbing, the longer it takes for you to ingest, analyze and act on data.

During our [r]evolution 2022 event that we recently hosted, we talked about how SingleStoreDB was built ground-up for real-time use cases, explained what it means to be real time, detailed innovations that enable data evolution, referenced research highlighting the performance and savings evolution benefits that are now possible, noted why businesses need (and some already have) a database for the data-intensive era and outlined the best path forward to achieving true real-time operations.

All the sessions are now available to view on demand at your own pace, but I thought it might be helpful to showcase five key takeaways from [r]evolution 2022.

One: It’s time to get real about real time.

As SingleStore CEO Raj Verma explained, there is a lot of noise going around in our market about real time. Raj helped to clear the air so that people can more easily differentiate real, real-time database companies from businesses that talk the real-time talk but cannot walk the walk.

Shireesh Thota, SingleStore’s Senior Vice President of Engineering, explained that the modern data stack is broken. The complexity of these transformations, he said, leads customers to build a lot of siloed enterprise architectures, and what ends up happening is that the time to insight is much longer. So there’s a need to shorten the time between the moment a transactional piece of data enters the system and when the business can gain insights.

Real time needs fast technology. But what makes a database fast is not really query optimization, as Raj noted — it’s the storage architecture. Our three-tier storage architecture — memory, disk and object store — empowers customers with response times within milliseconds. Our unification of different data models (relational, key-value, time-series, geospatial, full-text search, document, etc.) allows customers to do most work using a single platform rather than having to stitch together databases. And SingleStore pioneered the unification of transactions and analytics in one engine enabling ultra-fast ingest and time to insights. All that provides SingleStore customers with speed, scale, savings, resilience and reliability that are beyond compare.

Two: SingleStore continues to innovate.

Despite our leadership position, we are not resting on our laurels. SingleStore continues to innovate. At [r]evolution 2022, we detailed our expanded SingleStoreDB capabilities, which are based on three pillars: enterprise scale, developer experience and real-time analytics.
Workspaces is among the recently released SingleStoreDB features. It enables enterprise scale by allowing our customers to build isolated compute workspace environments that can be attached to the same database on demand. That means a company could, for example, build BI dashboards, point-in-time sales and other applications all on the same data — without having to move the data and without introducing latency or adding cost.
Developer experience is also near and dear to our hearts, and we understand that WebAssembly (Wasm) is popular for building highly efficient applications and has amazing characteristics such as portability and safety. So, we have deeply integrated Wasm into our Code Engine. Now developers don’t have to move data and worry about latency and other challenges that data movement creates.
In our latest real-time analytics innovations, we introduced flexible parallelism, which enables queries to take advantage of all available hardware and run on all cores. This is very useful for complex queries. Some customers have benefited from performance increases approaching 4X from this feature. Our Power BI Connector is now certified by Microsoft, so SingleStoreDB users don’t need any third-party extensions. And our dbt adapters help developers to build workflow environments.

Three: A unified database improves performance and yields savings.

Not only does a unified database enable real-time speed, but it also delivers major savings.

You don’t have to take our word for it, as Raj said. Just look at the stellar results that SingleStore achieved in the recent performance and total cost of ownership benchmark study by GigaOm.

The independent report by the respected tech analyst firm found that SingleStoreDB is 50% more cost-effective over three years than the Snowflake-MySQL stack. GigaOm said that organizations can achieve 60% savings over three years using SingleStoreDB as opposed to a Redshift-PostgreSQL stack. And GigaOm said that we are up to 100% faster than Redshift for TPC-H workloads. What’s more, SingleStoreDB is one of the only databases that can run TPC-H and TPC-DS competitively with cloud data warehouses, and the TPC-C benchmark at a very reasonable scale.

Four: Data intensity is here to stay — you can succumb to it or use it to thrive.

Almost all applications depend one way or the other on data, and the data requirements of applications are becoming more and more rigorous. As SingleStore Chief Innovation Officer Oliver Schabenberger explained during [r]evolution 2022, that’s a sign of data intensity.

Data intensity depends on many factors, including (1) data size, (2) speed of ingestion, (3) latency requirements, (4) complexity and (5) concurrency. Such factors lead many companies to add one database after another to address these considerations and add functionality. Unfortunately, that leads to highly complex systems, based on a multitude of technologies, that are difficult to maintain and nearly impossible to scale. But there is a better way.

SingleStore’s general-purpose, cloud-native database for transactional workloads, with multi-model capabilities and built for speed, allows organizations to address data intensity without complexity — and that is a measure of digital maturity. Leading companies such as Uber, Comcast and ARMIS use SingleStore to solve data-intensive problems every day.

Visit SingleStore.com and use our data-intensity calculator to get your data intensity score, based on the five dimensions we’ve noted.

Five: Leading companies are evolving and succeeding — and you can, too.

SingleStore helps businesses thrive in today’s data-intensive, real-time environment. Just look at what SingleStore customers and [r]evolution speakers Siemens and Impact have achieved:

“With the stack we were in before, as much as we tried to optimize data, structures and SQL itself, we couldn’t get report run times to go under two seconds,” said Mauricio Aristizabal, principal data architect at Impact, “whereas with SingleStore we’ve been able to address all the different reporting needs in sub-second times. So, that was the main requirement – to be able to deliver a positive human experience and delight our customers. Now that we’ve adopted SingleStore as our full data warehouse, we are able to serve all other workloads. Being able to have data in one place and serve all kinds of different tenants and workloads is just wonderful.”

Christoph Malassa, head of analytics and intelligence solutions at Siemens, said this about his SingleStore experience: “We now have the ability to do analytics on data volumes that we couldn’t do in the past. We are now doing more and more advanced analytics live. The analytics that you can now do give you way more insight. This is tremendous for the business. And the process improvements that SingleStore enables makes it a lot easier for us to innovate faster.”

Uber is another leading company that uses SingleStore to solve data-intensive problems every day. It employs SingleStoreDB to get real-time analytics on riders and drivers 24/7.

It makes me immensely proud that, in our own small way, SingleStore is helping deliver great customer experiences and timely insights that improve people’s lives.

See if SingleStoreDB is right for you by trying it for free. Check out our customer reviews on Gartner Peer Insights and TrustRadius. And of course, explore the SingleStore [r]evolution sessions on demand here.

PostgreSQL vs MySQL


Learn more about these open-source databases including advantages, disadvantages and uses.

PostgreSQL and MySQL are well-known and often-used relational database managers worldwide. They provide 24/7 support and are considered very stable for the services they offer. They’re designed to create scalable databases and handle everything related to stored procedures, functions, triggers and automatic jobs, among other features.

PostgreSQL vs. MySQL: What Are Their Histories?

MySQL was developed by the Swedish company MySQL AB in 1995. The platform’s developers were Michael Widenius, David Axmark and Allan Larsson. Their idea was to create an efficient platform to handle data for businesses — as well as normal, everyday people who needed to handle extensive data.

MySQL was initially developed in a mix of C and C++, but over time everything was migrated to MySQL Server and MySQL Cluster using only the C++ language.

Web programmers use MySQL to make changes to websites in a simple way. Since it allows for quick and easy changes, programmers also don’t have to modify the web code. When combined with PHP, MySQL becomes a powerful tool for building applications that require a fast, secure and powerful database. WordPress, for example, uses MySQL, and all of its processes are managed through that database.

The history of PostgreSQL begins in 1986 with a project by Professor Michael Stonebraker and a team of developers from the University of California, Berkeley. The product’s original name was POSTGRES.

In 1996 the name changed to PostgreSQL, and the project returned to the original version sequence with the release of version 6.0. The latest major version, 14, was released in 2021 and is stable.

An interesting feature of PostgreSQL is multi-version concurrency control, or MVCC. This method gives each transaction its own snapshot of the database state, which allows consistent reads without blocking concurrent writers and offers significant performance advantages.
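
A rough sketch of the behavior (hypothetical accounts table and values; the second read assumes the REPEATABLE READ isolation level):

-- Session A
begin;
select balance from accounts where id = 1;   -- returns 500, snapshot taken

-- Session B, running concurrently
update accounts set balance = balance - 100 where id = 1;
commit;

-- Session A, still inside its transaction
select balance from accounts where id = 1;   -- still sees 500 under REPEATABLE READ, with no read locks
commit;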

PostgreSQL vs. MySQL: What Are the Advantages and Disadvantages?  

MySQL Advantages

  1. MySQL software is Open Source.

  2. Speed (at low scales).

  3. Thanks to its low resource consumption, it can run on low-powered machines, which keeps the cost of using it down.

  4. Easy installation and configuration.

MySQL Disadvantages

  1. Limited documentation for many of the features it offers.

  2. When the database structure must be modified, there may be slight errors.

  3. It is not as intuitive as other programs, such as Microsoft Access.

PostgreSQL Advantages

  1. Stability: Postgres is categorized as one of the most stable database managers worldwide.

  2. Power and Robustness: It is designed in such a way that it does not allow interference between each of the parallel processes that we can execute.

  3. Extensibility: It can be used from the vast majority of programming languages, since connectors exist for nearly all of them.

PostgreSQL Disadvantages

  1. For small databases, it is not recommended as it is slow for inserts.

  2. Official support: There is no quick-access official support.

  3. Because of its complexity, learning the language, using it and maintaining it can be difficult.

SingleStoreDB

SingleStoreDB is the world’s fastest database for data-intensive applications and real-time analytics. SingleStoreDB unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications.

Built for developers and architects, SingleStoreDB is based on a distributed SQL architecture, delivering 10-100 millisecond performance on complex queries — all while ensuring your business can effortlessly scale.

SingleStoreDB is MySQL wire compatible and offers the familiar syntax of SQL, but is based on modern underlying technology that allows far higher speed and scale versus MySQL. This is one of the many reasons that SingleStore is the #1 top-rated relational database on TrustRadius.

For more information on how SingleStore is related and can turbocharge your open-source databases, visit our pages on MySQL or PostgreSQL.

MySQL Error Command “error 1016 can’t open file”— What Is This Error? How Do You Typically Resolve It?


The error describes a situation where access to a required file is blocked. This can happen when a dump (backup) was not created properly, or when read/write access to the output location is restricted.

Here is the error format, returning the appropriate error number:

Error: 1016 SQLSTATE: HY000 (ER_CANT_OPEN_FILE)

Message: Can’t open file: ‘%s’ (errno: %d)

MySQL Error 1016 — What Are the Causes?

This error is caused by the following:

  1. Permission issues on a required file, such as dump files created as backups. The error is thrown when restricted dump files are accessed.

  2. Read/write access blocks inside the installation location. This can occur if the server loses access to the installation directory or to the database files, and it can also lead to a server crash.

MySQL Error 1016 — Solutions

The following are possible solutions to the problem based on each situation:

  1. If the issue arises from backup/dump files, one solution is to lock tables while creating the dump, as the generated files can become corrupted when table volumes are excessive.

  2. Check version compatibility.

  3. Also check the following permissions (a quick check sketch follows after this list):

    1. Permission for directory where backup/dump files are placed.

    2. MySQL user access to the directory and files for backup/dump

  4. If all permissions are valid, check the MySQL server status. If it’s not functional, then check error logs.
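
As a rough starting point for the permission checks above (output locations vary by installation), you can ask the server where it keeps its files and what the current account may do, then verify the filesystem permissions on those paths:

-- where the server keeps its data files, and where it is allowed to read/write dump files
select @@datadir, @@secure_file_priv;

-- what the connected account is allowed to do
show grants for current_user();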

Conclusion

MySQL error 1016 arises from flawed maintenance of resources, often due to a lack of maintenance scripts that check the health of the system. Upgrades or patch installations can also trigger it. To avoid serious issues and loss of data, proper precautions should be taken in addition to careful configuration.

SingleStoreDB

SingleStoreDB is a real-time, distributed SQL database that unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications.

Built for developers and architects, SingleStoreDB delivers 10-100 millisecond performance on complex queries — all while ensuring your business can effortlessly scale.

SingleStoreDB is MySQL wire compatible and offers the familiar syntax of SQL, but is based on modern underlying technology that allows far higher speed and scale versus MySQL. This is one of the many reasons that SingleStore is the #1 top-rated relational database on TrustRadius.

For more information on how SingleStore is related and can turbocharge your MySQL, visit our MySQL page.

Distributed SQL “Workspaces” Power Modern Applications


We designed SingleStoreDB to be the world’s fastest distributed SQL database, and currently it powers some of the largest mission-critical applications at enterprises such as Uber, Comcast, Disney and Sirius XM.
These organizations are leveraging ultra-fast ingest, low latency queries, and the ability to support extreme concurrency to drive data-intensive SaaS, telemetry and operational machine learning workloads.

Many of our customers start with SingleStoreDB powering a single workload, but as companies leverage data across the organization to inform decision making and introduce new products and services, the need for shared data grows. Traditionally, companies copy data across various storage solutions and applications, building a complex web of data and applications that results in convoluted data pipelines between disparate data silos. This introduces cost, complexity and latency, resulting in critical applications operating on stale data.

Solution: Workspaces

Workspaces is our newest feature, which enables customers to run multiple workloads on isolated compute deployments, while providing ultra low-latency access to shared data. This is possible because of the unique SingleStoreDB architecture, leveraging our native internal data replication engine to ensure applications are always operating on fresh data.

Users can create and terminate workspaces directly using the cloud portal, or through our scalable Management API. Databases are created and attached to one or more workspaces concurrently, allowing simultaneous operation of multiple workloads on shared data. Databases can be attached and detached from workspaces on-the-fly, allowing organizations to manage and meet rapidly changing needs.

Because workspaces are stateless, they can be created and terminated at will, making it easy to run reporting or custom telemetry applications on the fly. When a workspace is terminated it no longer incurs charges, offering simple cost optimization of any workload while ensuring data is retained as long as needed.

Unique Design

SingleStoreDB is the only real-time HTAP database designed on a modern distributed SQL architecture. This means that compute can be scaled out using the native clustered architecture, rather than simply scaling up using larger machines. Workspaces further enhance this distributed architecture by freeing databases from the confines of a single cluster, delivering the true value of separate compute and storage.

Some enterprise data warehouses offer a similar separation of compute and storage, but because they are only designed for analytic workloads, they sacrifice latency to enable this flexibility. This is because writes are forced to go to object storage, which introduces latency and causes queries to return stale data if changes haven’t been propagated completely across the storage stack.

SingleStoreDB is designed to power modern applications, where real-time access to data and low latency query responses are just as important as scalability and concurrency. To meet this need, SingleStoreDB workspaces are designed to provide low latency data access to databases across every workspace deployed within a group (a logical tool for organization of workspaces). Each separate application running on an independent workspace can be scaled up or down, while still ensuring fast access to fresh data.

Use Cases

Impact.com has been running SingleStoreDB to power their customer-facing applications for some time, but now with the introduction of Workspaces they have also moved their reporting and internal analytics workloads to SingleStoreDB, unifying their entire data architecture:

“Workspaces are very exciting for us… we can now simply add and scale workloads across the organization’s most important data!” – Mauricio Aristizabal, Data Architect, Impact.com

Impact found that the customer data being stored in SingleStoreDB was critical for reporting and other operational analytics within the company, but the process of moving this data out of SingleStoreDB and into pure analytics solutions like Cloudera or Snowflake was costly and time consuming. It also introduced latency, which meant that by the time analysts got to the data it was already stale. They wanted a way to run the workloads previously running on Hadoop directly on SingleStoreDB, which is where workspaces came in.

“…when Cloudera’s Impala and Kudu could not keep up with the speed of Impact’s business, SingleStoreDB delivered. SingleStoreDB checks all the boxes with sub-second reporting, low-latency analytics, high concurrency, separation of storage and compute with workspaces, and more — which is why SingleStoreDB is now Impact’s database for 100% of its data and reporting. In short: All Data. One Platform.™”

In addition to scaling out to multiple workloads, workspaces also provides a simple way to manage isolation between ingest and application workloads. Many customers are now using isolated workspaces for ingesting from streaming data sources like Confluent Kafka or Redpanda, and blog posts like Nabil Nawaz’s “loading 100 billion rows ultrafast” show the incredible power of parallel ingest into SingleStoreDB. Now with workspaces, users can isolate ingest workloads and dynamically scale them up or down based on scheduling and performance needs, all while maintaining the consistent query response and strict SLAs needed for customer-facing applications.
More companies than ever are using machine learning to customize content for their customers, or to provide real-time services such as fraud detection. A Tier 1 U.S. Bank is using SingleStoreDB to deliver “On The Swipe Fraud Detection” by operationalizing fraud detection models against customer transactional data. Now with workspaces, these workloads can be run directly on data generated by customer facing applications, without affecting the ability to deliver application uptime or impacting user generated workloads. Workspaces provides complete isolation and independent scalability without requiring complex integration of multiple data sources.

Manage application & operational ML workloads with Workspaces

Design & Architecture

SingleStoreDB’s workspaces are built using the native data replication engine built into our database. If you have read about features such as leaf fanout failover you may already be familiar with the storage engine — but if not, check out Adam Prout’s description of what makes the SingleStoreDB storage engine unique.

Workspaces further this design by creating isolated pools of compute resources, which are clustered on top of cloud hardware. These compute pools have dedicated memory and persistent disk cache to deliver immediate query responsiveness, while operating on top of bulk scale-out object storage.

When combined with SingleStoreDB’s query code generation and tiered Universal Storage architecture, this allows workspaces to deliver extremely low latency query response, highly concurrent access and fast parallel streaming ingest while automating the movement of data across workloads.

Try Workspaces

SingleStoreDB offers a completely free trial of the cloud database-as-a-service to get started. The trial allows you to load data, connect your application, and experience the performance and scalability of a real-time Distributed SQL database.
You can also take a look at Eric Hanson’s demonstration of SingleStoreDB “running a trillion rows per second,” and try it for yourself to see how SingleStoreDB delivers the fastest performance and lowest query latency, deployed on your public cloud of choice.

How to Load CSV/JSON Files Into SingleStoreDB With the Portal UI


Databases without data are pretty pointless. For SingleStoreDB, we’re trying to make the process of loading data into your databases as easy and seamless as possible.

To achieve that, we have been building a dedicated UI where users can load data into their database — from files stored in an S3 bucket — with a handful of clicks.

From our help menu, select “Load your data” > “Cloud Storage” > “AWS S3” and then the JSON or CSV option. That takes you to the Load Data page, on the first step of data ingestion (see image 1):

Image 1. First step: Choose which database and table to load data into

So, what do you need to ingest data into SingleStoreDB? Well, the first (and most obvious) is the location of your data. Your data must be in an AWS S3 bucket in the form of JSON or CSV files. In addition, you need to have a SingleStoreDB workspace up and running, and your AWS credentials in the form of an AWS access key/secret config (see image 2).

Your files can have the structure you want — CSV or TSV, with a custom data format, which you are able to define in the “Connect to Data” step (see image 3).

Image 2. Second step: Setup your AWS credentials

Image 3. Third step: Define the data you want to load, and its structure

After successfully loading the data file, your soon-to-be table will be displayed in the fourth step. On this page, you’re able to rename (or remove) as many columns as you want (see image 4).

Image 4. Fourth step: Example of random data with two different operations

While we plan to support more kinds of operations in the future, these are the two currently available. On the left-hand side of your screen, you can see the changes you’re planning to make to your data when loading it to your table. This section is updated as you add or remove new operations.

Image 5. Last step: SQL queries to ingest your data

When clicking “Next”, the last step is displayed (see image 5). Here, you can review the data information and take a sneak peek at the SQL queries that are going to be run against your database. If you want to, you can copy and use them as base queries for future changes.

After ingesting the data, two queries will be pasted in your SQL Editor. The first will select all the data from the new table, so you can ensure it was properly inserted. The second will retrieve the status of your data pipeline. Pipelines are a SingleStoreDB feature, used to load data from external sources to your database.
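
For reference, the SQL that a pipeline-based load boils down to looks roughly like this (the bucket, table name and credentials below are placeholders, not the exact statements the UI generates):

create pipeline orders_pipeline as
load data s3 'mybucket/exports/orders.csv'
config '{"region": "us-east-1"}'
credentials '{"aws_access_key_id": "<key>", "aws_secret_access_key": "<secret>"}'
into table orders
fields terminated by ',';

start pipeline orders_pipeline;

-- check the status of files processed by the pipeline
select * from information_schema.pipelines_files
where pipeline_name = 'orders_pipeline';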

This UI is still in preview. As mentioned before, we plan to support other kinds of operations, other than renaming and removal, and support other sources and file types too. Stay tuned for that!

Eager to test it for yourself? Try SingleStoreDB free.

MySQL Error: Out of Memory


Getting a MySQL error: out of memory? We break down what this error means, and steps you can take it resolve it.

Error: "101011 8:12:27 [ERROR] mysqld: Out of memory (Needed 219113937 bytes)"

The “out of memory” error is raised by the MySQL server when it encounters a memory shortage. In short, the MySQL server doesn’t have enough buffer and cache memory to perform SQL queries or hold the result sets returned by the SQL queries.

Whenever you set up a MySQL server, it allocates a considerable amount of memory from the host RAM to facilitate reading, joining and sorting buffers, temporary tables, and client connections. Hence, database administrators should ensure that these memory areas don’t exceed the available system memory; otherwise the MySQL service crashes with the “out of memory” error.

MySQL Error: Out of Memory Solutions

Configure the maximum MySQL server memory usage.

In general, the MySQL server starts on a virtual machine with 512MB of RAM. It utilizes that memory for its caches and buffers, such as the innodb_buffer_pool, key_buffer, query_cache, and the sort, read, join and binlog caches. Since every SQL connection keeps separate buffers for reading, joining and sorting operations, total per-connection cache memory is calculated by multiplying those buffer sizes by the number of permitted SQL connections. Hence, the total MySQL server memory consumption can be calculated as follows:

Total MySQL Memory Consumption =
    innodb_buffer_pool_size + innodb_additional_mem_pool_size
  + innodb_log_buffer_size + tmp_table_size
  + max_connections * (sort_buffer_size + read_buffer_size
                       + join_buffer_size + binlog_cache_size)

All of these parameters are defined in the MySQL configuration file (my.cnf or my.ini) under the [mysqld] section.
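
You can also approximate that calculation against a running server by summing the corresponding system variables (innodb_additional_mem_pool_size is left out here, since it was removed in MySQL 5.7; adjust the list to your version and configuration):

select
  ( @@innodb_buffer_pool_size
  + @@innodb_log_buffer_size
  + @@key_buffer_size
  + @@tmp_table_size
  + @@max_connections * ( @@sort_buffer_size
                        + @@read_buffer_size
                        + @@join_buffer_size
                        + @@binlog_cache_size )
  ) / 1024 / 1024 / 1024 as estimated_max_memory_gb;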

Increasing the innodb_buffer_pool_size

Usually, cached InnoDB data is stored in the InnoDB buffer pool memory area. It also assists in holding the many rows returned from high-volume read operations — so the size of the InnoDB buffer pool has a big impact on MySQL server performance. The ‘innodb_buffer_pool_size’ system variable is used to define the optimal InnoDB buffer size. It is recommended to use up to 75% of the system memory for the InnoDB buffer pool.

The innodb_buffer_pool_size variable can be set dynamically while the MySQL server is running. In addition, the InnoDB buffer size changes in chunks. It is defined by the ‘innodb_buffer_pool_chunk_size’ system variable. By default, that value is set to 128Mb.

So, innodb_buffer_pool_size has to be a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances. Whenever the defined InnoDB buffer size is not such a multiple, MySQL will automatically round it up to the nearest multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances.
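
For example, with the default 128MB chunk size and 8 buffer pool instances, the resize granularity is 1GB, so asking for slightly more than 3GB lands on 4GB (a sketch assuming those defaults; resizing online requires MySQL 5.7 or later):

-- request slightly more than 3GB
set global innodb_buffer_pool_size = 3221225473;

-- rounded up to the next multiple of chunk_size * instances (1GB here), i.e. 4096 MB
select @@innodb_buffer_pool_size / 1024 / 1024 as buffer_pool_mb;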

Let’s set the innodb_buffer_pool_size system variable to 4GB and the number of buffer pool instances to 8.

mysqld --innodb-buffer-pool-size=4G --innodb-buffer-pool-instances=8

We can inspect whether these values have been set properly with a select statement as shown in the following:

SELECT @@innodb_buffer_pool_size/1024/1024/1024 

Output:

+-------------------------------------------+
| @@innodb_buffer_pool_size/1024/1024/1024  |
+-------------------------------------------+
|                   4.000000000000  |
+-------------------------------------------+

This helps ensure that the InnoDB buffer pool does not exceed the available system memory and crash the server due to a memory shortage.

In the same way, most of the parameters mentioned above can be increased or decreased until overall MySQL memory consumption stays below roughly 60% of system RAM. This is an iterative task: set values for these system parameters, check what percentage of system RAM the MySQL service consumes, and repeat until memory consumption reaches an acceptable level.

Allocate fixed memory limits for third-party applications and processes.

In most cases, MySQL is not the only service that consumes system memory. The system might run other applications, including backup applications, third-party software, monitoring tools and Docker containers, which can eat up system memory excessively. Hence, it is important to constrain how much memory each process or application can consume. Every application should be assigned a fixed maximum amount of memory that it cannot exceed. This guarantees that such processes will not eat up the entire server memory and crash the MySQL service.
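How you enforce such a limit depends on how each process runs. For a containerized workload, for example, Docker's memory flags can cap a container's usage; the image name below is a placeholder:

# Cap this container at 512MB of RAM, with no additional swap
docker run --memory=512m --memory-swap=512m backup-tool:latest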

Optimize the database tables and queries

Databases are very dynamic: data is inserted, updated and deleted frequently. As the data inside MySQL databases grows, memory consumption increases. Furthermore, fragmentation can build up over time as rows are deleted. These fragments can occupy a considerable amount of effectively abandoned space, so tables should be monitored at regular intervals and optimized. MySQL memory leaks can also drive up memory consumption, leading you to a MySQL out of memory error.
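A simple way to act on this is to check which tables carry the most reclaimable space and rebuild them; the database and table names below are placeholders:

-- Approximate reclaimable (fragmented) space per table, in MB
SELECT table_name, ROUND(data_free / 1024 / 1024) AS free_mb
FROM information_schema.tables
WHERE table_schema = 'your_database'
ORDER BY free_mb DESC;

-- Rebuild a fragmented table to reclaim the space
OPTIMIZE TABLE your_database.your_table;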

These memory leaks might occur due to complex joins and sorts implemented in the application’s database logic, which can eat up all the available memory until MySQL eventually crashes. Therefore, database logic should be reviewed closely and optimized. In addition, whenever new plugins, libraries and third-party apps are introduced, developers should pay attention to the complexity and optimization of the MySQL queries they issue.

Setting user account limits

In some scenarios, clients that connect to the server might abuse its resources: starting unwanted applications or processes and leaving them running indefinitely, or keeping several client connections to the MySQL database open for long periods unnecessarily. These can waste a considerable amount of system memory and eventually cause the MySQL error: ‘out of memory’ due to memory shortage. Proper constraints should be implemented on user accounts to avoid this issue.

That might mean restricting access to certain applications, or granting users a minimum set of permissions. These measures keep your server’s memory consumption in the green zone, ensuring enough memory remains available for MySQL.
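MySQL can also enforce per-account resource limits directly. A minimal sketch, assuming an application account named 'app_user' already exists:

-- Cap simultaneous connections and hourly query volume for one account
ALTER USER 'app_user'@'%'
  WITH MAX_USER_CONNECTIONS 20
       MAX_QUERIES_PER_HOUR 100000;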

Expand the available memory (RAM)

After trying all the preceding solutions, you may still end up with the MySQL ‘out of memory’ error. That means your server doesn’t have enough RAM to feed all its processes and applications, even under normal traffic, so a RAM upgrade is necessary. All the above solutions are low cost, but an upgrade can be expensive.

SingleStoreDB

SingleStoreDB is a real-time, distributed SQL database that unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications.

Built for developers and architects, SingleStoreDB delivers 10-100 millisecond performance on complex queries — all while ensuring your business can effortlessly scale.

SingleStoreDB is MySQL wire compatible and offers the familiar syntax of SQL, but is based on modern underlying technology that allows infinitely higher speed and scale versus MySQL. This is one of the many reasons that SingleStore is the #1 top-rated relational database on TrustRadius.

For more information on how SingleStore relates to MySQL and can turbocharge your MySQL workloads, visit our MySQL page.

Resources: 

MariaDB vs. MySQL

Feed: SingleStore Blog.
Author: .

Learn more about these open-source databases including advantages, disadvantages and uses.

MariaDB is an open-source Relational Database Management System (RDBMS). MariaDB provides various additional functionalities compared to MySQL and is gaining popularity due to its flexibility and availability.

MySQL is the legacy RDBMS, currently developed by Oracle Corporation, and remains one of the most popular database systems to date.

MariaDB vs. MySQL: What Are Their Histories?

Released in 1995 by a Swedish company, MySQL was initially developed for internal usage. It grew out of mSQL, a low-level ISAM-based system considered inadequate for production use; a new interface with backward compatibility allowed developers to adapt to MySQL quickly. Oracle acquired MySQL in 2010 through its purchase of Sun Microsystems.

MariaDB started as a fork of MySQL. Michael Widenius, together with many of the original developers, dedicated themselves to the new open-source project. They did not agree with Oracle Corporation regarding the future of the original product, and instead wished for a model similar to Eclipse. Over the years MariaDB has made significant progress in the areas of scalability and reliability, and has emerged as a competitor to MySQL.

MariaDB vs. MySQL: What Are the Advantages and Disadvantages? 

 MySQL advantages:

  1. Data masking; Any field deemed sensitive can be masked efficiently, and appropriate encryption and decryption mechanisms exist, with excellent performance.

  2. Legacy system; As one of the first popular RDBMSs, MySQL supports old implementations to this day for smooth functionality.

  3. Replication support; This allows data to be available at each node in a distributed setup. Data is automatically copied to the destination node to enhance accessibility.

  4. Support for distributed databases; MySQL supports distributive implementation in a Master-Slave configuration. These configurations allow low access times for large datasets.

  5. Query caching; A feature that stores recently accessed data in a cache, so subsequent similar requests are processed with a low turnaround.

MySQL disadvantages:

  1. Lack of standard practices; Using an Oracle foundation instead of following the SQL standard introduced custom functionality, which can lead to complications in data migrations.

  2. Caches; MySQL caches are known to be troublesome, and don’t serve real-time, high-volatility situations where data is constantly updated.

  3. Limited support for conditional statements; Conditional statements within queries are very limited, often requiring more effort in structuring CRUD operations.

  4. Low throughput in nested queries; In MySQL, queries with three or more layers have low throughput. This is due to the management of cache and direct access of data inside the architecture.

MariaDB advantages

  1. Configuration ease; MariaDB is easy to set up and configure, as the syntax of its configuration files is simple and well structured.

  2. Support for distributed databases; MariaDB supports distributive configurations, which out-perform MySQL in throughput.

  3. High compatibility; MariaDB excels in backward compatibility, as well as seamless integration with third party systems to enhance functionality and capacity.

  4. Source code availability; MariaDB is open source, allowing developers to fork and customize as per requirement.

  5. Additional features; MariaDB extended existing features in MySQL to better simplify implementation and extensibility.

MariaDB disadvantages

  1. No data masking support; MariaDB does not support data masking. However, the feature can be included using plugins.

  2. Relatively weak encryption; Data security in MariaDB is weak, and requires developers to take their own precautions.

  3. Scalability limits; Within a table there are limits on the number of rows and data size, which makes MariaDB unsuitable for high-volume usage.

  4. Not suited for big data; Due to its lack of ability to handle larger datasets, MariaDB isn’t suited to support big data.

  5. Weak cache; MariaDB has a limited cache, which can result in slower response times with larger datasets.

MariaDB vs. MySQL: Final Word

MariaDB is suitable for small to medium-sized implementations, where datasets do not grow beyond one hundred thousand rows per table. For workloads with larger tables, additional configuration is required.

MySQL has good support for larger datasets; however, caching problems require custom configurations that can increase development cost.

SingleStoreDB

SingleStoreDB is a real-time, distributed SQL database that unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications. SingleStoreDB provides support for large scale databases with analytics and takes care of most configuration, and also supports various distributions for deployment.

SingleStore is MySQL wire compatible and offers the familiar syntax of SQL, but is based on modern underlying technology that allows infinitely higher speed and scale versus MySQL. This is one of the many reasons SingleStore is the #1, top-rated relational database on TrustRadius.

Resources

Row-Level Security for SingleStoreDB

Feed: SingleStore Blog.
Author: .

Having fine-grained access to data in the database is critical. As we think about data security in databases, SingleStoreDB supports fine-grained access control that helps customers set the right level of access to their data.

This is achieved through a fully functional Role-Based Access Control (RBAC) mechanism, which ensures that each user gets only the right level of privileges to access the right database objects. But how can users bring additional security to the data within a table itself, for even more granular control?

One key capability SingleStoreDB supports is Row-Level Security (RLS), a feature included since SingleStoreDB 7.3. It ensures that specific users and groups can see only specific segments of a table, based on the roles they assume. By default, any user or group with access privileges on a table essentially has access to all rows in it based on the SQL privilege system, and all rows are available for manipulation.

With Row-Level Security we can control exactly which rows users can manipulate. It is a very strong capability for implementing data security, especially for applications where data from multiple tenants is stored in a common table definition.

Use Cases

Here are some examples where Row-Level Security is very critical and can be used:

  • A sales representative should only see the rows which are related to customers that they are working with

  • Isolating data across different users’ segments in a multi-tenant application. This could be due to multiple customers’ data being stored in a single table, but isolated from each other from application perspective

Row-Level Security is a role-based access control mechanism for granular objects within a table. It is flexible, centralized and scalable. Security is based on access permissions defined for each role-row pair in the table, as set up by the administrator. Based on the role-row pairing criteria, the right level of access to the data is maintained.

Benefits

Row-Level Security (RLS) benefits an organization by balancing security and governance at scale using an RBAC model. The model is scalable in the sense that it can be changed dynamically at any time to meet the organization's corporate policy.

Additional benefits include:

  • Ease of use

  • Change management: we can easily change roles, and the users tied to those roles, without having to reapply changes to the roles accessing rows of the tables

Now that we know what Row-Level Security (RLS) is and the benefits of using it, let’s see the functionality in action. In this example we have two scenarios:

  1. A global user is the owner of and able to access all sales data, but we want to restrict regional sales people to only view sales data pertaining to their country

  2. We have two additional salespeople — one in Canada and one in the U.S. The Canadian user shouldn’t access U.S. sales data, and the U.S. user shouldn’t access Canada sales data.

Schema

We will first create a database and a table that has an access_roles column, and then insert the data:

DROP DATABASE if EXISTS row_level_security;
CREATE DATABASE row_level_security;
USE row_level_security;

DROP TABLE IF EXISTS sales_data;
CREATE TABLE sales_data (
tx_date DATETIME,
country VARCHAR(20),
amt REAL,
access_roles VARBINARY(50) DEFAULT "," NOT NULL
);
INSERT INTO sales_data VALUES (now(),'US', 100.11, ',us_sales_role,');
INSERT INTO sales_data VALUES (now(),'CAN',200.22,',canada_sales_role,');
SELECT * FROM sales_data;

See the screenshot here to see it in action:

Note:

  • The table needs to have the column Access_Roles with data type VARBINARY

  • During insert operation, the rolenames need to have trailing and leading commas

User Management

Users can access a database and execute their functions and responsibilities through the creation of users, roles and groups, and granting of correct permissions (privileges):
  • A role can have multiple privileges

  • A group can have multiple roles

  • A group can have multiple users

  • A user can have multiple roles

  • A user can be assigned to multiple groups

  • Users inherit the permissions, roles of the groups they are assigned to

Roles

We will create two roles, one for each country, updating access to be country specific for the appropriate role:

CREATE ROLE 'us_sales_role';
CREATE ROLE 'canada_sales_role';
SHOW ROLES;
UPDATE sales_data SET ACCESS_ROLES=CONCAT(ACCESS_ROLES, "us_sales_role,")
WHERE country = 'US';
UPDATE sales_data SET ACCESS_ROLES=CONCAT(ACCESS_ROLES, "can_sales_role,")
WHERE country = 'CAN';

View creation

We need to create a view to restrict access on the table that was created earlier.

CREATE VIEW sales_data_view AS SELECT tx_date, country, amt
FROM sales_data WHERE SECURITY_LISTS_INTERSECT(CURRENT_SECURITY_ROLES(),
ACCESS_ROLES);

Groups

We need to create two groups, one for each region.

CREATE GROUP 'us_sales_group';
CREATE GROUP 'canada_sales_group';
SHOW GROUPS;

Now, assign the role to the appropriate group and grant access.

GRANT ROLE 'us_sales_role' to 'us_sales_group';
GRANT ROLE 'canada_sales_role' to 'canada_sales_group';
GRANT SELECT ON row_level_security.sales_data_view to ROLE 
'canada_sales_role';
GRANT SELECT ON row_level_security.sales_data_view to ROLE 'us_sales_role';
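To test the two scenarios we also need regional users that belong to these groups. The sketch below is hypothetical: the user names and passwords are placeholders, and the exact statement for adding a user to a group may differ, so verify the syntax against the SingleStoreDB documentation.

CREATE USER 'us_sales_user' IDENTIFIED BY 'ChangeMe1!';
CREATE USER 'canada_sales_user' IDENTIFIED BY 'ChangeMe2!';

-- Assumed syntax for group membership; check the SingleStoreDB docs
GRANT GROUP 'us_sales_group' TO 'us_sales_user';
GRANT GROUP 'canada_sales_group' TO 'canada_sales_user';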

Connect to SingleStoreDB using CLI/Dbeaver/MySQLWorkbench and verify access.

SELECT * FROM sales_data_view ;

Now, our U.S.-based user can log in, and won’t see Canada-specific data.

Similarly, our Canada-based user will log in and not see U.S. data. 

Conclusion

Row-Level Security extends our RBAC model to give fine-grained access to users, and helps build solutions for real-world business applications. You can read more about the functionality in our documentation guide. If you are ready to hit the ‘easy’ button, get started with SingleStoreDB today and take advantage of the free product credits.
Keep up with the latest tech updates from SingleStoreDB. Follow us on Twitter @SingleStoreDevs.

Using Python Jupyter Notebook With SingleStoreDB

Feed: SingleStore Blog.
Author: .

I love notebooks, as I can easily iterate on my code, visualize its output and collaborate with my colleagues.

Today, I want to show you a quick Python Jupyter notebook tutorial on how to connect to SingleStoreDB from your local notebook. I also give you some code snippets on how to write into and read from SingleStoreDB. I will finish with a quick visualization code snippet using Plotly.

Intro

Here are a few things to know before starting:

  • I am using a csv file from the gapminder data that you can download here

  • I would recommend using Visual Studio Code or Anaconda.

  • I uploaded the notebook that can be downloaded here.

You should be familiar with working with a local notebook environment, installing Python libraries and, of course, using SingleStoreDB (try it for free).

Get the Right Libraries

The first cell in my notebook imports the libraries into your notebook environment. If you don’t have these libraries installed, you can go to pypi.org to install each one on your local machine (pymysql, pandas, plotly and SQLAlchemy).

import pymysql  # MySQL-wire-protocol driver used by SQLAlchemy under the hood
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine
import plotly.express as px

Connect to a SingleStoreDB Database

My second cell is to set the variables for my connection:

# I set the username, password and database name as variables
UserName='Your user name'
Password='Your Password'
DatabaseName='Your database'
URL='Your URL'

Where Can You Find the URL?

The URL you need to enter will look like the following:

svc-df0gt59a-af3a-4ty5-9ff2-13b8h81c155-dml.aws-oregon-2.svc.singlestore.com

If you have a workspace, you can access the connection string by clicking on your workspace, and then the Connect button:

The next step is to find the connection string URL:

The third cell in my notebook is to set the connection to the database:

# Creating the database connection
db_connection_str = "mysql+pymysql://"+UserName+ ":" +Password
+"@"+URL+"/"+ DatabaseName
db_connection = create_engine(db_connection_str)

Insert a Dataframe Into a Newly Created Table

If you want to follow the tutorial, you can download the csv file on your local machine in the same folder as your notebook.

First, you need to load the csv file into a dataframe (fourth cell):

df_data = pd.read_csv("gapminder_tidy.csv")
df_data

Second, you can load that dataframe into an existing or new table (it will create the table in SingleStoreDB):

# Insert whole DataFrame into MySQL
df_data.to_sql('gapmindertidy', con = db_connection, if_exists = 'append', 
index = False, chunksize = 1000, dtype ={
 'country': sqlalchemy.types.NVARCHAR(length=255),
 'continent': sqlalchemy.types.NVARCHAR(length=255),
 'year': sqlalchemy.types.INTEGER(),
 'metric': sqlalchemy.types.NVARCHAR(length=255),
 'value': sqlalchemy.types.Float(precision=3, asdecimal=True)})

If you switch to SingleStoreDB portal, you should see the following result for a “Select * from gapmindertidy” command:

Read a Table from SingleStoreDB

Now, you can read the table you just created with the following command:

DataFromDB = pd.read_sql('SELECT * FROM gapmindertidy', con=db_connection)
DataFromDB

 You should see the following result:

Visualize Data Using Plotly

Here is a code snippet to do some visualization on this table:

df_gdp_oceania = DataFromDB.query(
    "continent=='Oceania' & metric=='gdpPercap'"
).sort_values(['country', 'year'], ascending=[True, True])
fig = px.line(df_gdp_oceania, x = "year", y = "value", color = "country")
fig.show()

That code snippet should show the following graph once executed:

Voila, you are all set to do more serious things with SingleStoreDB using Python and notebooks!

Stay tuned on our developer experience announcements in the future by following us on Twitter @SingleStoreDevs.

MySQL Error: ‘the table is full’

Feed: SingleStore Blog.
Author: .

Getting a MySQL error: ‘the table is full’? We break down what this error means, and steps you can take to resolve it.

The MySQL error “the table is full” may occur when you try to create or add a new record to your database, or when you try to modify an existing entry in your table. This error usually means that your table has reached its maximum capacity. There are several reasons why this could be the case, and they all depend on what you’re doing with your MySQL database — and how many records you already have.

The “table is full” MySQL error is one of the most frustrating errors you can encounter. Still, by following our steps correctly, you can eliminate the chance of this error happening in your database.

Here are a couple of possible reasons for this error:

The disk reaches the maximum limit.

  1. Your disk may be reaching its maximum limit. This can happen if you’re constantly adding new data and not deleting any old data. You’ll need to either delete some of your data or increase your disk size to fix this.

  2. You may have too many columns in your table. Each column takes up a certain amount of space, so if you have too many columns, it can cause the table to fill up. Even small changes can add up quickly for an extensive database with many rows and make the database full.

The disk is full of large files.

The most common reason for the MySQL ‘table is full’ error is that your disk is full. This can happen if you’re running many queries that are writing data to disk or if you have a lot of large files that are taking up space. If you’re getting this error, it’s essential to check your disk usage and make sure you have enough space available.
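A quick way to see where the space is going, before resizing anything, is to list the largest tables from inside MySQL (and check the filesystem with a tool such as df at the OS level):

-- Ten largest tables by data + index size, in MB
SELECT table_schema, table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
ORDER BY size_mb DESC
LIMIT 10;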

MySQL Error: The Table is Full Solution

The solution to this error is fairly straightforward. You need to change the max value for the ‘innodb_data_file_path’ key to a larger value and then save the config file. After this, restart your MySQL database.

innodb_data_file_path=ibdata1:25M:autoextend:max:512M

You can also let the tablespace auto-extend without specifying a maximum size, which allows the InnoDB tablespace to grow until the disk is full. Put a line such as innodb_data_file_path=ibdata1:25M:autoextend in your config file, save the file, and then restart your database:

sudo service mysql stop
sudo service mysql start

After applying these methods, try to connect your data with the database table again; it should work this time.

If you are using the MyISAM engine for your table, MySQL allows each MyISAM table to grow to 256TB by default. You can increase the limit up to 65,536TB.
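For MyISAM, the per-table limit is tied to the table's internal pointer size, which you can raise by rebuilding the table with larger MAX_ROWS and AVG_ROW_LENGTH hints; the table name below is a placeholder:

ALTER TABLE my_big_table
  MAX_ROWS = 1000000000
  AVG_ROW_LENGTH = 100;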

Following these steps should help you resolve the MySQL error: ‘table is full.’ For more MySQL solutions, visit our documentation.

Happy coding!

SingleStoreDB

SingleStoreDB is a real-time, distributed SQL database that unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications. 

Built for developers and architects, SingleStoreDB delivers 10-100 millisecond performance on complex queries — all while ensuring your business can effortlessly scale.   

SingleStoreDB is MySQL wire compatible and offers the familiar syntax of SQL, but is based on modern underlying technology that allows infinitely higher speed and scale versus MySQL. This is one of the many reasons that SingleStore is the #1 top-rated relational database on TrustRadius.

For more information on how SingleStore relates to and can turbocharge your open-source databases, visit our pages on MySQL.

Additional resources:

MySQL Error 1064

Feed: SingleStore Blog.
Author: .

Getting a MySQL error 1064? Here’s what the error means, causes and solutions.

MySQL error 1064 refers to a general syntax error, which can range from mistakes in MySQL commands to using an unsupported format. The official documentation lists the error in the following manner:

Error: 1064 SQLSTATE: 42000 (ER_PARSE_ERROR)

Message: %s near ‘%s’ at line %d

Here, SQLSTATE 42000 refers to Class Code 42: Syntax Error or Access Rule Violation.

MySQL Error 1064 — What Are the Causes?

There are a number of cases which may lead up to this error. Here are the most common scenarios:

  1. Syntax errors. These are caused by simple mistakes, like missing quotes or brackets, misspelled column names and forgotten keywords.

  2. Version conflicts. A particular MySQL version may require following specific syntax structure. However, this is rare, and different configurations can also cause this problem.

  3. Encoding. If the database was created using a specific encoding scheme, any statement that doesn’t align with that encoding scheme’s constraints will be considered illegal, and could result in MySQL error 1064.

  4. Library issues. MySQL is generally masked by middleware interfaces such as PHP, Node or Django. These interfaces often utilize third-party libraries as a database interface. If any library is not following standard practices, MySQL might throw the 1064 error in certain cases.

  5. Reserved words. MySQL, like other programming languages, has reserved words, which are illegal to use unquoted inside a query and will cause error 1064 (see the example after this list).

  6. Escape characters. When passing data through queries (especially in batches), escape characters, commas, quotes, slashes and spaces are sometimes passed without being properly handled; MySQL will treat those as syntax violations.
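As an illustration of the reserved-word case, backtick-quoting the identifier avoids the parse error; the table definition below is purely hypothetical:

-- `order` is a reserved word; unquoted, this statement fails with error 1064
CREATE TABLE `order` (
  id INT PRIMARY KEY,
  total DECIMAL(10, 2)
);

SELECT id, total FROM `order` WHERE total > 100;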

MySQL Error 1064  — Solutions

Here are some possible solutions for MySQL error 1064:

  1. Prepared statements. Using prepared statements avoids many of the quoting and escaping mistakes that lead to error 1064, because data values are passed separately from the SQL text (see the sketch after this list).

  2. Code for exceptions. When laying out the foundation of your database interface, plan for all possible exceptions.

  3. Follow best practices. Follow the best practices listed in official documentation.
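For reference, here is a minimal sketch of a server-side prepared statement, using a hypothetical orders table; the value is bound separately from the SQL text, so quoting and escaping mistakes are far less likely:

PREPARE stmt FROM 'SELECT id, total FROM orders WHERE total > ?';
SET @min_total = 100;
EXECUTE stmt USING @min_total;
DEALLOCATE PREPARE stmt;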

MySQL Error 1064 — Final Word

Error 1064 can be serious, leading to lengthy down times for your web application. It’s best to be prepared with solutions, rather than reactively trying to fix the problem. By following best practices, you can work to avoid this error.

SingleStoreDB

SingleStoreDB is a real-time, distributed SQL database that unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications. SingleStoreDB provides support for large scale databases with analytics and takes care of most configuration, and also supports various distributions for deployment.

SingleStore is MySQL wire compatible and offers the familiar syntax of SQL, but is based on modern underlying technology that allows infinitely higher speed and scale versus MySQL. This is one of the many reasons SingleStore is the #1, top-rated relational database on TrustRadius.

Additional resources:

Using GitLab Code Review and CI with the Git Rebase Workflow

Feed: SingleStore Blog.
Author: .

The debate between the Git Rebase and Git Merge workflows is a long-lasting and heated one, and at SingleStore we use the rebase workflow (my favorite method!).

In this blog post, I want to share some of our experiences using GitLab Code Review and GitLab CI together to iterate on the SingleStoreDB Cloud platform with the rebase workflow. This repository has 10-40 commits per day, with engineers working across many different time zones on different components (frontend UIs, backend services, Kubernetes operators, scripts, infrastructure, etc.).

We use GitLab’s CI/CD system for linting, building, testing and automated deployments (and a few other odd things). Our main repository consists of 160 CI jobs, most of which are test jobs configured as matrix jobs that run in parallel. Pipeline execution time can range from 25 minutes to 60 minutes, depending on the code that was changed (and some other external factors).

One thing to note before we dive in is that we used Phabricator for several years before we switched to GitLab around 6 months ago.

Rebase Workflow and Code Review

GitLab can be configured to use the rebase workflow quite easily:

  1. Set the “Merge method” to “Fast-forward” merge in the Merge Requests configuration

  2. Set “Squash commits when merging” to “Require”

When you do this, GitLab will force you to have your side branch rebased against the main branch before allowing you to Merge. Merging a code change then becomes a two-step operation if your branch is not rebased against main. There’s actually a ticket open in GitLab’s backlog to make this operation one-click. (The rebase operation can also be performed with the git CLI).
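For reference, a typical rebase sequence with the git CLI looks something like this, assuming origin/main is the target branch:

```
git fetch origin
git rebase origin/main
# resolve any conflicts, then update the merge request branch
git push --force-with-lease
```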

Now, GitLab can be configured to block people from merging their changes to the main branch if the latest pipeline hasn’t passed by enabling “Pipelines must succeed” (“Merge requests can’t be merged if the latest pipeline did not succeed or is still running.”). However, this presents a challenge to teams that have pipelines that are even moderate in duration. This is because by the time the pipeline has completed, the branch most likely needs to be rebased again.

Because of this, we’ve had to disable the “Pipelines must succeed” requirement entirely and trust the engineers to only merge if they have a somewhat recently rebased and successful pipeline. This is not ideal, and we’d like to be able to configure a different behavior. Here are some ideas for GitLab, from simpler to more complex:

  • Having a check that forces the merge request to have had at least one successful pipeline.

  • Having a check that forces the last successful pipeline in the merge request to be X hours/days old at most.

  • Merge trains! (more below)

Merge Trains

GitLab has a feature called Merge Trains for flowing code changes to the main branch. Here’s a great description from another blog post about these:

With merge trains, each merge request joins as the last item in that train with each merge request being processed in order. However, instead of queuing and waiting, each item takes the completed state of the previous (pending) merge ref (the merge result of the merge), adds its own changes, and starts the pipeline immediately in parallel under the assumption that everything is going to pass.

If all pipelines in the merge train are completed successfully, then no pipeline time is wasted on queuing or retrying. Pipelines invalidated through failures are immediately canceled, the MR causing the failure is removed, and the rest of the MRs in the train are requeued without the need for manual intervention.

Unfortunately, merge trains do not currently work with the Fast-forward merge flow. If they did, we could push all code changes into the train instead of to the main branch. These commits and their associated pipelines would automatically end up in the main branch if successful, or the MRs would be reopened if unsuccessful. This would be perfect for our team’s way of working.

Rebase Workflow and Commit Messages

One of the main advantages of the rebase workflow is that the commit history in the main branch will be entirely linear. I find it important to make sure that the commit messages follow a certain template. We’ve been able to easily configure our MR summary template in the “Default description template for merge requests” configuration in GitLab. As the commits land in the main branch, the commit message is inherited from the merge request with a link to the merge request. This is perfect and makes studying the history of our repository really easy.

Stacked Merge Requests

Our team is using GitLab after several years using Phabricator (which has been EOL’d). One of the distinguishing features of Phabricator is how easy it is to manage stacked MRs (“diffs”). With GitLab, this is also possible, although it’s a little bit more complicated: we can specify in the merge request that we would like to merge into another branch. This is usually enough to scope the changes in the merge request as we’d like. However, an issue can arise if the author of the merge request and the underlying branch are not the same person.

If the author of the underlying branch prefers not to rebase against the `main` branch as they work, but the merge request author prefers to rebase, the changes in the merge request can become polluted. If the merge request is set to merge into the underlying branch, we will see all the changes from `main` that have yet to be added to the underlying branch. If the merge request is set to merge into `main`, we will see the underlying branch’s changes.

Phabricator’s stacked diffs allowed us to open a diff comparing changes between two local branches. As a result, even if the underlying branch author preferred not to rebase until the end, it would not affect the changes in the diff.

Triggering Pipelines

Because we have different CI pipeline layouts for changes to different parts of the codebase (frontend, backend, etc.), we leverage the [changes](https://docs.gitlab.com/ee/ci/yaml/#ruleschanges) keyword in the CI configuration a lot. This allows us to design a different pipeline altogether depending on what changed in the commit that the CI is running against. Unfortunately, this only works in Merge Request Pipelines and not in pipelines that come as a result of pushing to a side branch in origin. So, we’ve had to disable the latter type of pipelines and rely exclusively on Merge Request Pipelines.

It can, however, be cumbersome to have to create an MR just to run the CI for a side branch. So, we’ve come up with a clever hack that allows us to still run CI even if an MR wasn’t created. By adding custom rules based on how the pipeline is created in a few places in our CI config, we can ensure that pipelines that were manually started via the UI/GitLab API for a given side branch also work as expected:

```
.frontend-rules: &frontend-rules
  - if: '$CI_COMMIT_BRANCH == "main"'
  - if: '$CI_PIPELINE_SOURCE == "web"'
  - if: '$CI_PIPELINE_SOURCE == "api"'
  - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    changes:
    - 'frontend/**/*'
```

Of course, we need to have this for all the different pipeline shapes that we support.

Deployments

We use GitLab CI for automated deployments of our code (as well as manually rolling back changes if needed). This is done automatically for successful pipelines in the main branch, depending just slightly on the changes in the commit. The GitLab Environments feature is particularly useful to make it easy to check what commits are deployed to the various environments that we deploy to from CI. If there’s an incident, it’s critical that it’s as fast as possible to identify what commits are running in the various environments.

Slack Bot for Deployment Notifications

We’ve also implemented a Slack bot that notifies a release change-log channel whenever deployments go through to production. This is very convenient, and since the implementation of this bot warrants its own blog post, I won’t get into more details here. If people are interested in learning how we built this, please let me know!

Wrapping Up

A friend of mine who works at GitLab told me that migrating from Phabricator to GitLab could be painful for our team given the differences between the two platforms. After more than 6 months, we’re comfortable with our way of working with GitLab Code Review and GitLab CI, but there are definitely various things that we’d like to see improved particularly when it comes to how the Git Rebase Workflow is supported (Epic 4911, Issue 349733, Issue 895, Issue 118825).
If you have any questions, feel free to reach out to me on Twitter.

New REST API Enables Distributed SQL at Enterprise Scale

Feed: SingleStore Blog.
Author: .

SingleStoreDB is a distributed SQL database that scales out to power some of the largest enterprise data applications at companies like Akamai, Dell, Uber and IEX Cloud. Now with our Management API (REST) it is easier than ever to deploy SingleStoreDB for all your data applications.

While many users are aware that SingleStoreDB can scale compute horizontally to increase performance, it is less well known that many of these companies have now deployed SingleStoreDB across a wide range of applications and workloads.

SingleStoreDB excels at both transactional and analytics workloads, making it an ideal fit across various parts of the business — from customer-facing and SaaS applications to internal operations, telemetry, IoT, transportation and operational machine learning. But deploying data across some of the largest global enterprises in the world requires automation and integration with existing tools.

The new Management API for SingleStoreDB Cloud is a REST API that allows you to deploy, manage and operate SingleStoreDB at the largest scales. With Tier 1 customers in cybersecurity and financial services managing tens of thousands of active SingleStoreDB deployments, and SaaS applications driving on-demand creation and termination of databases, the SingleStoreDB Management API is a critical tool in meeting the needs of modern and real-time enterprise applications.

And as of today, it’s now available to all of our customers to enable the deployment and management of SingleStoreDB Cloud at the largest enterprise scale.

Why Did We Choose REST?

REpresentational State Transfer, or REST, is an architectural style for providing standards between computer systems on the web. These standards make it easier for systems to communicate with each other. REST-compliant systems, often called RESTful systems, are primarily characterized by being stateless and separating the concerns of client and server, although there are several other constraints.

We worked with many of our customers when designing the Management API, and chose REST for our API to ensure that it is easy to use and easy to integrate with existing enterprise infrastructure.

Stateless

RESTful applications must be stateless, meaning that there is no session information retained by the server. Any relevant session data is sent to the server by the client in such a way that every request can be understood in isolation, without having knowledge of previous messages.

This means management operations for SingleStoreDB are secure and reliable, which is critical when operating mission critical internal and customer facing applications.

Separation of Concerns

When designing RESTful applications, the implementation of the client and the implementation of the server can be done independently, without one knowing about the other. 

This means that the code on the client side can be changed at any time without affecting the server operations, and the code on the server side can be changed without affecting client operations.

In addition, different client languages can use the same REST endpoints to perform the same actions and receive the same responses.

Making REST Requests

A client makes a request to the server in order to retrieve or modify data on the server. For our purposes we will be using HTTP. An HTTP request generally consists of:

  • An HTTP verb, which defines what kind of operation to perform

  • A header, which allows the client to pass along information about the request

  • A path to a resource

  • An optional message body containing data

HTTP verbs:

An HTTP verb describes what type of action is being requested. Some of the more common verbs are:

GET — retrieve a specific resource or collection of resources

POST — create a new resource

PATCH — update a specific resource

DELETE — remove a specific resource

Management API Endpoints

In the SingleStoreDB Management API, you can create, retrieve, update, suspend, resume and delete workspaces. The following are the endpoints for working with workspaces (note that these URLs will not work from a browser due to missing authentication information):

GET https://api.singlestore.com/v1/workspaceGroups

Retrieves all of the workspace groups for the current user.

POST https://api.singlestore.com/v1/workspaceGroups

Creates a new workspace group.

GET https://api.singlestore.com/v1/workspaces?workspaceGroupID={workspa
cegroupID}

Retrieves workspaces of the workspace group.

POST https://api.singlestore.com/v1/workspaces

Creates a new workspace.

GET https://api.singlestore.com/v1/workspaceGroups/{workspaceGroupID}

Retrieves the workspace group with the given id.

GET https://api.singlestore.com/v1/workspaces/{workspaceID}

Retrieves the workspace with the given id.

DELETE https://api.singlestore.com/v1/workspaceGroups/{workspaceGroupID}

Terminates the workspace group with the given id.

DELETE https://api.singlestore.com/v1/workspaces/{workspaceID}

Terminates the workspace with the given id.

PATCH https://api.singlestore.com/v1/workspaceGroups/{workspaceGroupID}

Updates the workspace group with the given id. The body of the request contains a JSON document with the following fields:

{
  "name": "<name>",
  "adminPassword": "<adminPassword>",
  "firewallRanges": [
    "<ip address>",
    "<ip address>"
  ]
}

The official documentation for the Management API can be found here.

Authentication

The Management API authenticates requests using a unique API key. You can generate this key on the SingleStoreDB cloud portal — here’s how:
  • From the SingleStoreDB cloud portal, select the organization from the navigation pane.

  • Select API keys > Create API key.

  • Specify a name and expiration date for the API key.

  • Select Create.

NOTE: The API key is displayed only once! Be sure to copy and securely store the API key.

Once an API key has been obtained, use the Authorization header with each HTTP request:

    Authorization: Bearer <API key>

Calling the API With Python

As a simple example, the following Python code will make a request to the Management API to retrieve all of the workspace groups for the current user. Note that a valid API key must be provided.

 import requests
 
 url = "https://api.singlestore.com/v1/workspaceGroups"
 
 payload={}
 headers = {
   'Authorization': 'Bearer 454c736a37ad628ab370e9c6d5d1b8664042360c2025b551b999b7c82c8c205f'
 }
 
 response = requests.request("GET", url, headers=headers, data=payload)
 
 print(response.text)

All current active workspace groups will be displayed:

[
  {
    "name": "demo",
    "workspaceGroupID": "a25832c6-0776-4c2b-8cc4-68ddc6f648af",
    "createdAt": "2022-06-29T18:05:39.037486Z",
    "regionID": "99b1a977-cde0-496f-8c2e-0946b2f444db"
  },
  …
]

Calling the API With cURL

The same endpoint as above can be accessed via curl, a command line tool for transferring data via multiple network protocols, such as HTTP.

curl --request GET 'https://api.singlestore.com/v1/workspaceGroups' \
--header 'Authorization: Bearer <API key>'

Generating OpenAPI Clients

Included with the Management API is an OpenAPI specification definition. This definition can be used as input to a variety of tools (such as Swagger Codegen) to generate client SDKs in many languages.
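For example, assuming the specification has been saved locally as management-api.yaml (the file name and output directory here are placeholders), the open-source openapi-generator CLI can produce a Python client:

openapi-generator-cli generate \
  -i management-api.yaml \
  -g python \
  -o ./singlestore-management-client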

Getting Started

Using SingleStoreDB with the Management API is simple, and you can get started completely free with our cloud trial. For more details on how to develop using the Management API, you can also check out our API Reference documentation.
We hope that you will provide us feedback (you can do that on our Forum) on how you are using the Management API to scale SingleStoreDB across your unique applications and workloads.

SingleStore Half-Year Highlights + the Only Awards That Matter!

Feed: SingleStore Blog.
Author: .

It’s been an amazing half year (our fiscal year starts in March) so far for SingleStore. The many newsworthy highlights include our collaboration with tech giant Intel; funding led by our customer, investor and financial sector leader Goldman Sachs; and the expansion of our exciting real-time features and capabilities.
We outshined major database competitors in a GigaOm performance and total cost of ownership study. We also launched the data intensity assessment to help customers understand the impact of data on their infrastructure in a self-service way. 
Building on that momentum are our latest industry recognitions from Dresner Advisory Services and TrustRadius as well as a wealth of positive new customer feedback on Gartner Peer Insights. The only awards that matter are those that reflect the voice of the customer.

We are thrilled that both awards, Dresner and TrustRadius, are based on feedback from customers and users of the product. Rankings of suppliers that are heavily based on company size often get a lot of attention. But it’s really saying something when you get awards based on data from people who use and love your product.

Dresner Advisory Services 2022 Industry Excellence Awards recognized SingleStore as an overall Leader in Analytical Data Infrastructure (ADI). We earned spots in the top right in both the Customer Experience and Vendor Credibility models. The 2022 Dresner Industry Excellence Awards acknowledge companies that have achieved leadership positions in Dresner’s 2022 Wisdom of Crowds ADI study, which is based on data collected from end users and provides a broad assessment of each market including current usage, key drivers, technology priorities and future intentions. This Dresner award highlights our excellence in product/technology, sales and service, as well as value and confidence, which is also exemplified by our continuous growth. 

This is the second consecutive year that Dresner has recognized SingleStore. In 2021, Dresner spotlighted our unrivaled innovation when it listed SingleStore as one of the Leaders in ADI.

I’m happy to report that TrustRadius has also recognized SingleStore for two consecutive years.

The TrustRadius Best of Summer 2022 awards recognized us for value, feature set and relationship in relational databases. This follows the four 2022 Top Rated Awards – in the relational databases, database-as-a-service, in-memory databases and operational analytics categories – that TrustRadius presented to us in May. TrustRadius also recognized us with 2021 Top Rated Awards in two categories: database-as-a-service (DBaaS) and relational databases.

TrustRadius CEO Vinay Bhagat noted the weight of awards and reviews like these, commenting: “In the age of the self-serve buyer, software buyers rely heavily on third-party sources. That’s why third-party reviews and awards from a trusted source like TrustRadius are so important.”

I couldn’t agree more! But enough from me, let’s hear what SingleStore customers are saying.

Want to learn more about how SingleStore can solve your data challenges? There is an open invitation to try our product at no cost. Want to help us drive towards becoming the world’s greatest database company? We are growing fast and actively seeking new talent across all functions of our organization.
Check out our open positions. Want to read what our customers are saying? Visit TrustRadius and Gartner Peer Insights to hear directly from our customers.

What Is Streaming Analytics?

Feed: SingleStore Blog.
Author: .

Optimized architecture: Columnstore databases traditionally have been restricted to data warehouse uses where low latency queries are a secondary goal. Data ingestion is typically restricted to be offline, batched, append-only or some combination thereof.

To handle streaming analytics, a column store database implementation must treat low latency queries and ongoing writes as “first-class citizens,” with a focus on avoiding interference between read, ingest, update and storage optimization workloads. This broadens the range of viable column store workloads to include streaming analytics, and their stringent demands on query and data latency. These applications include operational systems that back adtech, financial services, fraud detection and other data streaming applications.

SingleStoreDB is a modern, unified, real-time distributed SQL database. It uses fragmented snapshot transactions and optimistic storage reordering to meet the extremely low latency requirements of streaming analytics applications. SingleStoreDB stores data in tables and supports standard SQL data types. Geospatial and JSON data types are also first-class citizens in SingleStoreDB, which can store and query structured, semi-structured and unstructured data with equal ease.

In SingleStoreDB, a table is either distributed or non-distributed (e.g., a reference table). There are two storage types for tables: in-memory rowstore and columnstore. All columnstore tables have an unexposed, in-memory rowstore table. SingleStoreDB automatically spills rows from the in-memory rowstore to columnstore. All data, including the hidden rowstore table, is queryable for the columnstore table.

SingleStoreDB On-Premises to SingleStoreDB Cloud Migration

Feed: SingleStore Blog.
Author: .

According to Gartner, more than half of enterprise IT spend will shift to cloud computing by 2025.

The reason is simple — outsourcing computing infrastructure allows organizations to focus their limited resources on differentiating their business within their given market segment, rather than maintaining computing infrastructure (i.e. digital plumbing).

This means more time can be spent improving products, acquiring/retaining customers and setting the strategic direction for the business. The following graph from Gartner compares the total revenue and revenue growth of cloud computing versus traditional/data center computing. Notice how in the coming years, cloud computing’s growth rate will continue rapid expansion, while traditional computing’s growth rate will slowly contract. 

SingleStoreDB

SingleStore was founded in 2011 as MemSQL — the world’s fastest in-memory rowstore database. Over the next decade, the product evolved past an in-memory rowstore, into a multi-model, multi-cloud, scalable, unified database for both transactions and real-time analytics. In 2020, MemSQL rebranded into SingleStore, as this name was more suitable for the evolved product and next chapter of our business. 

Since rebranding to SingleStore, we have been recognized in Gartner’s Magic Quadrant for Cloud Database Management Systems, doubled down on product development, raised three rounds of funding, have hired over 100 employees and continue to delight our customer base with our technology. Today, we have hundreds of customers across the globe, spanning all major and developing industries— and we are just getting started.
Customers can leverage our database technology through two different deployment methods: SingleStoreDB Cloud (managed service) and SingleStoreDB On-Premises (self-managed). When deploying SingleStoreDB Cloud, customers choose their favorite cloud provider (AWS, GCP, Azure), which region they want to be in and their environment size — getting their database environment simply provisioned in minutes.

With SingleStoreDB Cloud, all of the backend operations are autonomous; our cloud service is self healing, self scaling and completely automated without user intervention. In addition to all of the automation, SingleStoreDB Cloud is monitored 24/7 by our SRE team who eat, sleep and breathe SingleStoreDB every day. Customers using SingleStoreDB Cloud procure compute credits, which get burned over time depending on the size of the environment deployed (billing granularity is by the second). 

When deploying SingleStoreDB On-Premises, customers are responsible for provisioning and managing SingleStoreDB software on their own x86-based virtualized infrastructure. Customers using SingleStoreDB will procure license units, which are used to install our software on their own virtualized infrastructure (VMs or containers). The following chart contrasts the responsibilities of deploying SingleStoreDB Cloud (managed service) and SingleStoreDB On-Premises (self-managed).

Migrating From SingleStoreDB On-Premises to SingleStoreDB Cloud

Many SingleStoreDB customers initially adopted our database technology via the on-premises approach discussed in the previous section, making them responsible for all database infrastructure as well as application architecture and performance. Following the larger cloud service trend, many of these customers are reconsidering their SingleStoreDB deployment methodology.

There is an increased demand from our customers to migrate their self-managed SingleStoreDB On-Premises to our cloud service, SingleStoreDB Cloud. This shift allows them to focus on their business differentiation, while we handle day-to-day database infrastructure management. In this section, we’ll discuss how to easily migrate from SingleStoreDB On-Premises to SingleStoreDB Cloud.

Prerequisites:

  • SingleStoreDB On-Premises deployment, with access to sdb-admin

  • Access to a cloud object storage provider (S3, Blob or GCS — S3 in our case)

  • SingleStoreDB Cloud deployment

High-level steps:

  • Inventory audit of current SingleStoreDB On-Premises database environment

  • Gather SingleStoreDB On-Premises objects

  • Configure firewall settings to allow traffic between: SingleStoreDB On-Premises, cloud object store and SingleStoreDB Cloud

  • Backup all SingleStoreDB On-Premises databases to cloud object store

  • Restore all databases from cloud object store to SingleStoreDB Cloud

  • Create SingleStoreDB On-Premises objects in SingleStoreDB Cloud

  • Validate inventory and objects in SingleStoreDB Cloud

  • Perform any application level validation testing

Detailed steps:

  1. Inventory audit of current SingleStoreDB On-Premises database environment. Gather counts for all databases, tables, records, functions, stored procedures, views, aggregations, pipelines, etc., from the self-managed environment.

  2. Gather SingleStoreDB On-Premises objects. Use the dump functionality of sdb-admin to generate a sequence of SQL statements that can be executed to reproduce the objects defined in the cluster. Example: sdb-admin dump --output-path /path/dump_file_name.sql

  3. Configure firewall settings to allow traffic between SingleStoreDB On-Premises, the cloud object store and SingleStoreDB Cloud. Ensure all firewalls, NACLs, etc., are configured to allow data to flow from SingleStoreDB On-Premises to the cloud object store, and on to SingleStoreDB Cloud.

  4. Backup all SingleStoreDB On-Premises databases to cloud object store. Use SingleStoreDB’s native BACKUP DATABASE functionality to persist database backups within the cloud object store. Example:
    BACKUP DATABASE db_name TO S3 "backup_bucket/backups_path/backup_folder/"
    CONFIG '{"region":"your_cloud_region"}'
    CREDENTIALS '{"aws_access_key_id":"your_access_key_id","aws_secret_access_key":"your_secret_access_key","aws_session_token":"your_session_token"}';
    
  5. Restore all databases from cloud object store to SingleStoreDB Cloud. Use SingleStoreDB’s native RESTORE DATABASE functionality to restore database backups from cloud object store into SingleStoreDB Cloud. Example:
RESTORE DATABASE db_name_restored FROM S3 
"backup_bucket/backups_path/backup_folder/db_name.backup"
CONFIG '{"region":"your_cloud_region"}'
CREDENTIALS '{"aws_access_key_id":"your_access_key_id","aws_secret_access_key":"your_secret_access_key","aws_session_token":"your_session_token"}';

6. Create SingleStoreDB On-Premises objects in SingleStoreDB Cloud. The dump command we ran in step 2 created a file called dump_file_name.sql, which now lives on your self-managed server. This file contains a list of SQL statements to recreate all objects from SingleStoreDB On-Premises. Copy and paste the contents of this dump file into SingleStoreDB Cloud SQL Editor — run the SQL statements to recreate all SingleStoreDB On-Premises objects in SingleStoreDB Cloud.

7. Validate inventory and objects in SingleStoreDB Cloud. Double check to ensure that everything has been copied over from the SingleStoreDB On-Premises environment to the SingleStoreDB Cloud environment. Validations should include things like databases, tables, records, users, groups, aggregations, pipelines, stored procedures, etc. (a sketch of one such check appears after this list).

8. Perform any application level validation testing. Now that your SingleStoreDB Cloud environment is fully hydrated and functional, start to kick the tires. Connect your application, run some benchmark queries, test concurrency — the world is your oyster!
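As one example of the validation in step 7, you can compare simple object and row counts between the two environments; the database and table names below are placeholders:

-- Count tables in the migrated database
SELECT COUNT(*) AS table_count
FROM information_schema.tables
WHERE table_schema = 'db_name';

-- Spot-check row counts for an individual table
SELECT COUNT(*) AS row_count FROM db_name.your_table;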

If you’ve made it this far, you are equipped to migrate a workload from SingleStoreDB On-Premises to SingleStoreDB Cloud — well done! Please note there are some additional migration considerations, including partition counts, database sizes, BACKUP/RESTORE vs. mysqldump, objects included in the dump file, etc. This blog post is deliberately kept as simple as possible to maximize reach and usability.

As always, we do not recommend doing a database migration with a production environment the first time around. QA/staging migrations are recommended as validation steps in any production migration. Please reach out to us if you are looking to migrate any workload to SingleStoreDB Cloud — we’re here to help!


Comparing Real-Time Databases


Learn more about real-time databases and their use cases — and explore different types of real-time databases, and the features they provide.


While traditional databases contain permanent data that changes infrequently, real-time databases handle data workloads that are continually changing and time sensitive. A real-time database also uses real-time processing to manage its data, meaning that transactions — or units of work performed within the database — are processed quickly enough for organizations or other integrated systems to instantly utilize the data.

Real-time databases are vital tools for many sectors, including eCommerce, energy, fintech, healthcare, high tech, retail, utilities and transportation. For example, an air traffic control system constantly analyzes large numbers of aircraft, keeps track of current values, makes choices regarding incoming flight patterns and calculates the sequence in which aircraft should land based on variables like weather, fuel, altitude and speed. If any of this data takes longer than expected to process, the consequences can be severe.

In this article, you will learn about real-time databases and their use cases. You’ll also explore different types of real-time databases and the features they provide, including low-latency streaming data, flexible indexing and high availability — so you can see what advantages they offer for various businesses.

About Real-Time Databases

A real-time database is similar to a conventional database, but its data validity is strongly dependent on time; if the execution deadline is missed, data becomes obsolete. Due to its timeliness requirement, one or more atomicity, consistency, isolation and durability (ACID) properties are sometimes relaxed to process transactions promptly while providing better support for temporal consistency.

Real-time databases are the fastest option for real-time analytics, meaning they facilitate data-driven decisions. They allow software administrators to evaluate which data sets are relevant to a specific geographic region by receiving the appropriate response and updates in real time. They also enable seamless spotting of statistical anomalies caused by security breaches, network outages or machine failures. And businesses’ stakeholders can evaluate the data acquired from customer behavior, inventories, web visits and demographics in real time for insights to improve usability, audience targeting, pricing strategies and conversion rates.

As we’ll lay out, there are two types of real-time databases: hard and soft. These classifications are determined by the transaction deadline. Additionally, a research paper by Ben Kao and Hector Garcia-Molina states that a transaction that misses its deadline is called a “tardy transaction.”

Hard Real-Time Databases

A hard real-time database is a system with hard deadline transactions, meaning that timely and logically correct execution is critical. Missing hard deadlines could lead to serious consequences, including incorrect patient data or significant financial loss, depending on the use case. Hard deadlines are sometimes referred to as safety-critical deadlines.

Hard real-time databases

The preceding graph is a representation of hard deadline transactions that require a prompt response to be efficient. The data drops in value (actual effectiveness) immediately after a deadline is missed.

If a transaction misses its deadline in a hard real-time database system, the system neglects the tardy transaction (that is, it doesn’t attempt to execute that transaction again) so that other transactions may have a better chance of meeting their deadlines. This is why hard real-time databases are only used in safety-critical systems, where the system cannot afford to have a tardy transaction.

Some of these safety-critical systems include emergency alarm systems, antimissile systems, patient monitoring systems and air traffic control systems.

Soft Real-Time Databases

A soft real-time database has soft deadline transactions, and missing a deadline does not result in a failure or a breakdown of the system’s integrity. However, the quality of the results declines after the deadline, potentially affecting the database system’s quality of service.

Soft real-time databases

The preceding graph represents soft deadline transactions, in which the value (actual effectiveness) decreases over time.

When a tardy transaction occurs in a soft real-time database system, the system will try to execute it again, but it will lower the priority so that non-tardy transactions are executed first. However, the system will neglect the transaction if the actual effectiveness has decreased to a negative value.

Soft real-time databases are used in banking systems, reservation systems, digital library systems and stock market systems.

Real-Time Database Use Cases

Organizations in various sectors adopt real-time databases to respond to events quickly and analyze data as it is generated. The following are some example use cases in industries including energy and utilities, gaming and media, and retail and eCommerce.

Energy and Utilities

In the energy and utility industry, generation, transmission and distribution facilities are all sources of critical data needed to optimize technical and business operations. This critical data is dynamic due to various factors like energy consumption, severe weather conditions and gas availability. As a result, this dynamic data must be managed in real time to avoid energy and financial loss and facility hazards.

Businesses in this industry are encouraged to adopt real-time databases because they will be able to perform real-time analysis of the dynamic data collected from these sources and use that analysis to optimize plant operations promptly, regulate energy usage and detect maintenance issues and equipment damages at an early stage.

Here’s how a major oil and gas company utilized SingleStore to improve its financial operations.

Gaming and Media

Real-time databases allow gaming and media companies to analyze usage patterns and behavior from online games, streaming videos or social media applications. This allows companies to improve user experience by providing more personalized recommendations tailored to each user’s preferences. Software administrators can also seamlessly monitor service quality.

Real-time databases also facilitate collaboration in the gaming and media industries. In other words, different people can control the same entity in an online game or work on the same document (including on text, images or video) simultaneously.

Retail and eCommerce

Real-time databases enable eCommerce systems and apps to handle consumer requests within their deadlines — like when market status changes — by using updated data indicating the current market status.

Additionally, real-time databases enable retail and eCommerce platforms to analyze user patterns and behavior, using the insights to present personalized and recommended products and detect fraud in real time.

Characteristics of Real-Time Databases

Among the features that real-time databases offer are performance, queuing, flexible indexing, high availability in case of hardware failure, and historical records.

Performance

One of the most significant characteristics of a real-time database is the ability to process and complete transactions within deadlines. Regardless of whether there are hundreds or tens of thousands of events per minute, the database should be able to process everything without missing any deadlines — and respond to the system with minimal or no delay.

Queuing

The real-time database should be able to manage a massive number of events in the proper sequence — that is, its queuing model should keep the order of event inputs in the backlog and execute them in that order.
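To make the ordering requirement concrete, here is a minimal SQL sketch of a FIFO-style backlog (the table and column names are hypothetical, and this illustrates the concept rather than a built-in queuing feature):

-- Hypothetical event backlog: the auto-increment id captures arrival order.
CREATE TABLE event_backlog (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    payload JSON,
    processed BOOLEAN NOT NULL DEFAULT FALSE
);

-- A worker always takes the oldest unprocessed event first,
-- preserving the order in which events entered the backlog.
SELECT id, payload
FROM event_backlog
WHERE processed = FALSE
ORDER BY id
LIMIT 1;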

Flexible Indexing

Real-time databases, like SingleStoreDB, come with a number of indexing options to meet your development use case and enable low latency or minimal delay to data access in a variety of circumstances. Some of these indexes include the shard key, which is responsible for partition data distribution; the columnstore sort key, which stores data on disk in a columnstore format; and the skiplist index, which is a data structure suited for ordered data that allows queries to quickly seek data by binary searching.
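To make this concrete, here is a minimal sketch of a SingleStoreDB table definition that combines these options (the table and column names are illustrative, and exact syntax can vary by version; skiplist indexes are the default index type on rowstore tables):

CREATE TABLE sensor_readings (
    device_id BIGINT NOT NULL,
    reading_time DATETIME(6) NOT NULL,
    reading DOUBLE,
    SORT KEY (reading_time),        -- columnstore sort key: orders data on disk for fast range scans
    SHARD KEY (device_id),          -- shard key: controls how rows are distributed across partitions
    KEY (device_id) USING HASH      -- hash index: low-latency point lookups on device_id
);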

High Availability in Case of Hardware Failure

High availability is the ability to remain available during a hardware failure without any service disruption. Its goal is to provide long-term, uninterrupted access to an application, and it usually involves the use of at least two servers, each running on different hardware. It can use redundant hardware on each server if desired and operate servers inside virtual machines that automatically switch from failing to working hardware.

This feature requires techniques to identify hardware failure as well as processes to ensure that a failing server remains offline so that data is not processed incorrectly and inconsistencies are avoided. It also includes the ability to automatically switch databases and notify connected applications to reconnect to the running database server.

Historical Records

Keeping track of historical data allows you to spot developing trends as they emerge so that you can make the required adjustments to preserve your system’s performance, availability, and security. Real-time databases should be able to record and store data in a timely manner, allowing for easy future access. Keeping your log data for as long as possible not only aids in resolving issues (since you can gain insights from log files from several months or years ago) but also allows you to extract more present-day value from the data.

For example, say you are investigating the root cause of an air mishap that happened over five years ago. You will be able to analyze the data that was logged during that air mishap to improve the functionality of the air traffic control systems and other systems related to the incident.

Conclusion

Real-time databases enable organizations to quickly and efficiently analyze current and historical data so that they can respond quickly to trends or resolve problems. As you learned in this article, these databases can improve the usability, engineering, and market performance of applications in multiple sectors.

If you’re in search of a real-time database to manage your applications, consider SingleStoreDB. SingleStoreDB specializes in real-time operational analytics and supports cloud-native SaaS applications. It provides global organizations with a distributed SQL database that unifies transactions and analytics.

SingleStore is well suited for delivering analytics in real time and is utilized by some of the world’s largest and most demanding businesses, including Uber, Comcast and Hulu.

Webinar Recap: Getting Started in SingleStoreDB


From IoT to fraud analytics, and cybersecurity to retail, today’s modern, data-intensive applications need access to fast analytics in real time.

Yet legacy data architectures — and single-node, open-source databases — aren’t equipped to handle the fast-moving data streams necessary for real-time analytics.

Up-to-the-minute insights require a database that powers low-latency access to large datasets. With a unified data engine for transactional and analytical workloads, SingleStoreDB powers real-time analytics and applications. Say hello to real time, unified, distributed SQL.

Led by Senior Technical Evangelist Akmal Chaudhri, “Getting Started With SingleStoreDB” gives you an in-depth look at SingleStoreDB — including an introduction to real-time distributed SQL, the unique features and capabilities in SingleStoreDB, using connectors like Spark and Kafka, and how to get your OLAP & OLTP workloads up and running.

Here’s a look at the highlights: 

Real-Time Analytics for Data-Intensive Applications

It’s no secret data volume and complexity are rising, and businesses need insights to drive real-time actions.

“It is a much more competitive world,” says Chaudhri. “Business pressures, de-regulation in many industries, there’s a lot of competitive pressures among organizations to be able to be innovative.” Combine that with how analytics have evolved and the demand for data, and you have a recipe for data intensity. The problem, however, is that applications struggle to keep up. Sluggish event-to-insight response, increasing costs and complexity and growing demands for concurrency place a harsh spotlight on existing technology stacks that simply aren’t equipped to handle the five key requirements of data-intensive applications:

  1. Speed of Ingestion

  2. Latency Requirements

  3. Query Complexity

  4. Concurrency

What Makes SingleStoreDB Unique?

SingleStoreDB is the #1 database designed for data-intensive applications. As a real-time, distributed SQL database, one of the key features that makes SingleStoreDB unique is Universal Storage, a patented, single table type for transactions and analytics. By combining rowstore and columnstore capabilities, SingleStoreDB enables both OLAP and OLTP workloads in a unified data engine — a move other database technologies are aiming to replicate. 

Other unique features to be aware of as you get started in SingleStoreDB include:

Additionally, our Summer 2022 product release unveiled a whole suite of new features in SingleStoreDB to accelerate application development and power real-time capabilities.

Get Up and Running With SingleStoreDB

Today, SingleStoreDB powers more than 100 global SaaS applications across industries like fintech, retail, media, gaming, cybersecurity and more. Brands like Uber, Hulu and Comcast use SingleStoreDB to:
  • Build new, modern applications with operational analytics workloads

  • Modernize, consolidate and replace legacy datastores and specialty databases to further simplify data architecture.

Consolidate Complex Data Workflows Into Fast, Impactful Business Insights


Companies are struggling to integrate multiple data sources into a coherent set of analytics-ready datasets that can be consumed by advanced analytical and AI/ML algorithms.

Multiple ETL jobs must be run against multiple data sources to build out “wide” columnar data sets. Data is cleansed and curated to incorporate multiple levels of detail (temporal, spatial, dimensional). Data is subsequently formed into data marts focused on solving business challenges and delivering competitive insights.

What if you could consolidate a significant percentage of this workflow into a single, analytically powerful solution?

Meet SAS® Viya® with SingleStore, which is here to do just that.

SAS® Viya® is our cloud-enabled, in-memory analytics engine that provides quick, accurate and reliable analytical insights. Elastic, scalable and fault-tolerant processing addresses the complex analytical challenges of today, while effortlessly scaling for the future.

As an integrated part of the Analytics Platform, SAS Viya provides:

  • Faster processing for huge amounts of data and the most complex analytics, including machine learning, deep learning, and artificial intelligence.

  • A standardized code base that supports programming in SAS and other languages, like Python, R, Java, and Lua.

  • Support for cloud, on-site or hybrid environments. It deploys seamlessly to any infrastructure or application ecosystem.

SingleStoreDB is a real-time, distributed SQL database designed to power modern data-intensive applications. It delivers maximum performance for both transactional (OLTP) and analytical (OLAP) workloads in a single unified engine.

By embedding SingleStore into SAS Viya, we allow organizations to integrate transactional and analytical data into a coherent, analytics-ready format for downstream AI use cases. This unified data architecture enables organizations to look at their transactional data and drive analytical models in real time, providing tremendous benefits. Not only does it offer faster time to decisions, but this approach also delivers an open data framework, simplified data access, increased productivity and cost optimization of analytical systems.

The preceding image is a high-level picture of our SAS Viya with SingleStore architecture. This newly engineered approach adds modern multithreading and fast parallel data retrieval, allowing SAS Viya to directly access and work with data stored in relational tables built in SingleStore.  The SAS Embedded Process (represented by the green dots) resides within the leaves of the SingleStore database which also allows us to move a significant portion of compute to the data layer, providing for a series of improved performance characteristics.

We refer to the ability to work with and bring in data from multiple data sources as our Data Fabric.  Now, our CAS workers not only continue to work with SAS DAT files and integrate data from other databases, but also allow users to work directly back and forth with relational tables in SingleStore.  SingleStore’s best-in-market data ingest provides added benefits to real-time and upsert performance feeding downstream AI/ML.

SAS Viya with SingleStore drives the ability to integrate both the transactional and the analytical workloads into one platform where customers can easily develop new models and analytical insights quickly.

Visit sas.com/viya-singlestore to discover more about the integration and our partnership with SingleStore. Want to dive deeper? Join our webinar: Big Data, Small Footprint: How to Optimize Your Data and Analytics Infrastructure on November 8.

Stay tuned for additional blog posts for future updates on the types of capabilities that we are building into the embedded process and how these capabilities will help your business keep its competitive edge.

Running SingleStoreDB on Your Mac M1


We are pleased to announce that developers can run SingleStoreDB locally on their Mac ARM — M1!

I will introduce how to install SingleStoreDB on your local machine (including Mac M1) using Docker. But before jumping into the guide, let me highlight our free offering for our developer community.

When you sign up for a SingleStoreDB account, you get two huge goodies:

  1. $500 free credits to use for our cloud managed offering over a 60-day period

  2. The ability to run SingleStoreDB on 4 units (up to 32 vCPU) on a Mac (now supporting Mac M1 machine), PN, Kubernetes, Virtual Machine, Linux or Docker for as long as you want.

This is a unique bundle for anyone looking to use SingleStoreDB in both cloud and local environments (yes, hybrid is a reality here!).

Install SingleStoreDB with Docker

Prerequisites

Start by signing up for a free trial account.

Get your license key

In the Portal, click on Organizations and On-Prem Licenses. 

You should see a screen with the free license key.

Click on Install SingleStore:

Installation Guide

Select Quick Start

Select Docker as deployment

Click on Next

Enter a password that will be used to log in as the root user. (Here I entered ILOVESINGLESTOREDB)

Copy and paste the docker commands in your local CLI with the license key and password

You should see the following execution lines pulling the image from dockerhub and installing it on your local machine:

Run It From Docker

Your Docker desktop dashboard should look like the following. If SingleStoreDB is not running, just click on ▶.

Connect to SingleStoreDB with the Studio

Connect to http://localhost:8080/ on your local browser

Select Localhost

Enter root for username and the password you entered earlier.

Click Submit and you should see the following screen:
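From here, a quick smoke test in the SQL Editor confirms the local cluster accepts queries. A minimal sketch (the database and table names are arbitrary):

CREATE DATABASE hello;
USE hello;
CREATE TABLE t (a INT);
INSERT INTO t VALUES (1), (2), (3);
SELECT SUM(a) FROM t;  -- should return 6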

For more information about running SingleStoreDB locally, check our documentation page. And once you’re signed up, you can connect with us live, 24/7 with any questions or concerns.

Happy coding!

Implementing a Dark Theme for SingleStoreDB Cloud Portal UI


We at SingleStore want to provide a familiar and comfortable environment for our users, many of whom are accustomed to dark UI whilst working in their IDE. We recently finished a very successful hackathon, where I took on the project of implementing our long-awaited dark theme for the SingleStoreDB Cloud Portal UI. In this post, I will outline what it took to get this shipped to customers.

The Challenge

At SingleStore, we just recently finished our summer hackathon — which, if you don’t know, is where employees are free to work on whatever they want. We host two of these every year — a shorter one (about three days) at the start of the year, and a longer one (about five days) mid-year.

My goal for this year’s five-day hackathon was to ship a dark theme for our SingleStoreDB Cloud Portal React application. It’s a project our team has wanted and attempted multiple times — so Jennifer Watts, the visual designer I paired up with for this task, and I were lucky enough to inherit all of the previous design work. That meant I could start the development work immediately, whilst Jennifer revisited previous design decisions and fleshed out any recognized gaps or insufficiencies.

The required development work can be broken down into the following tasks:

  • Providing the logic and UI for getting and setting the applied theme in-app and in Storybook

  • Conditionally switching between light and dark graphics

  • Discovering edge-cases and collaborating with designers to decide on a solution

  • Testing and quality assurance

The Implementation

Each user can select a theme preference which will be stored in their browsers’ `localStorage` in one of three states: `“light” | “dark” | “system”`. By default, `“system”` will be applied, which will look to the operating system or user agent for a preference using the `prefers-color-scheme` media query.

Our designers provide a dark variant with the same name and relative contrast for each of the base colors in our design system’s existing color palette, e.g., `color-purple-800`, `color-neutral-900`, `color-red-200`, testing for accessible contrast between expected combinations. We also needed a reliable system for naming and classifying additional tokens.

To use these tokens in CSS, we declare and apply them as CSS Custom Properties (also known as CSS Variables), which are overridden whenever the class `dark-mode` is applied to the `<body>` element. We also generate TypeScript utilities for applying the tokens as either a reference to the associated CSS Custom Property, or the underlying primitive value (because some third-party libraries do not support CSS Custom Properties).

Then, we must touch and test every corner of our codebase to ensure that these tokens are applied in every relevant CSS declaration containing properties affecting color (`color`, `background-color`, `border-color`, `box-shadow`, `fill`, `stroke`).

To switch between our themed images, we create a React component that accepts an `imageKey`, rather than a `src`, pointing to a predefined dictionary that expects a `lightComponent` and `darkComponent` for every entry.

Providing an interface for working with our theme

First of all, we needed to provide the ability to get and set the current theme. We do this by wrapping our React app in a context provider that checks local storage for the user’s theme preference, makes the data available via a `useTheme()` hook and adds/removes the `dark-mode` class to the document’s body depending on that state.

import React from "react";
import { useLocalStorage } from "./hooks/use-local-storage";
import { useMediaQuery } from "./hooks/use-media-query";

const COLOR_SCHEME_QUERY = "(prefers-color-scheme: dark)";

export type ThemePreference = "system" | "light" | "dark";

export const ThemeContext = React.createContext<{
    theme: Omit<ThemePreference, "system">;
    themePreference: ThemePreference;
    setThemePreference: (theme: ThemePreference) => void;
}>({
    theme: "light",
    themePreference: "system",
    setThemePreference: () => {},
});

export function ThemeProvider({ children }: { children: React.ReactNode }) {
    const [themePreference, setThemePreference] =  useLocalStorage(
        "singlestore-ui-theme",
        "system"
    );
    const isOSDark = useMediaQuery(COLOR_SCHEME_QUERY);
    const systemTheme = isOSDark ? "dark" : "light";
    const theme = themePreference === "system" ? systemTheme : themePreference;
    
    React.useEffect(() => {
        if (theme === "dark") {
            document.body.classList.add("dark-mode");
        } else {
            document.body.classList.remove("dark-mode");
        }
    }, [theme]);
    
    return (
        <ThemeContext.Provider
            value={{ theme, themePreference, setThemePreference }}
        >
            {children}
        </ThemeContext.Provider>
    );
}

export function useTheme() {
    const context = React.useContext(ThemeContext);
    
    if (context === undefined) {
        throw new Error("useTheme must be used within a ThemeProvider");
    }
    
    return context;
}

Conditionally switching between light and dark-themed images

We have a centralized component for rendering our more complex, now-themed SVGs, so I simply had to export the dark image variant files to the codebase and add them to an object within the component.

import SVGSchemaLight from './svg/schema.svg';
import SVGSchemaDark from './svg/schema--dark.svg';
import SVGTableLight from './svg/table.svg';
import SVGTableDark from './svg/table--dark.svg';
...
export function ComplexSVG(props) {
    const { theme } = useTheme();
    
    const allComplexSVGs = {
        schema: {
            lightComponent: SvgSchemaLight,
            darkComponent: SvgSchemaDark
        },
        table: {
            lightComponent: SvgTableLight,
            darkComponent: SvgTableDark
        },
        ...
    };

    ...

    const { lightComponent, darkComponent } = allComplexSVGs[props.imageKey];

    return theme === 'dark' ? darkComponent : lightComponent;
}

Revising our design tokens and component library

We had recently started to revise how we classify, maintain and distribute our design tokens and their dependent assets. In fact, the impact of that work is what drove my initial desire to take on the task of implementing our dark theme in this hackathon. Particularly supportive changes were:

  • The declaration and increased adoption of CSS Custom Properties to replace our Sass variables

  • Adopting a multi-tiered system for classifying our design tokens

  • Using “variant props” to apply typed classes to reusable UI components

  • Using style-dictionary to centralize our tokens in JSON and generate multiple synchronized formats for accessing our design tokens

Let’s dig deeper into some of these changes.

Naming and classifying design tokens

Inspired by this great article, we classified our design tokens as having three tiers (which I’ll explain using color tokens, but can be relevant to all types of design tokens). This “tiered” system allows us to gracefully maintain multiple themes as well as any edge-cases, like when the background is dark in both light and dark mode).

Tier 1 

:root {
  --sui-color-base-neutral-200: #f3f3f5;
  --sui-color-base-red-900: #c41337;
}

.dark-mode {
  --sui-color-base-neutral-200: #221f26;
  --sui-color-base-red-900: #eb4258;
}

These are the lowest-level tokens that can be thought of as the primitive values that make up our entire set or palette. They never reference another token and were the most used tier of tokens prior to implementing dark mode. We chose to override each Tier 1 token’s CSS Custom Property with a dark value, which meant that a significant amount of our UI that already used these Tier 1 tokens would already be themed. 

Prior to this, our app only provided one theme, and we could get away with hard-coding primitive CSS values. As a result, the existing tokens were not used 100% of the time — and most that were used did so via Sass variables, which wouldn’t be reassigned when the theme changes. To add a second, dark theme, we needed 100% CSS Custom Property usage for at least our color-related declarations. However, we could not start converting the codebase over just yet, as Tier 1 tokens alone would not get us all the way.

What if a UI element has a dark background in both light and dark themes? Adding to that, these Tier 1 tokens’ names do not give much indication of their purpose. Can the color `“base-red-900”` be used for backgrounds, text, and borders? Well, we can start to simplify and standardize these decisions by defining additional layers of tokens with more prescriptive names that are aliases of our low-level tokens.

Tier 2

:root {
  --sui-color-text-neutral-3: var(--sui-color-base-neutral-900);
  --sui-color-background-neutral-inverse-1: var(--sui-color-base-neutral-0);
  --sui-color-border-purple-1: var(--sui-color-base-purple-600);
}

Tier 2 tokens exist to give creative power to contributors by further expanding our Tier 1 tokens into more meaningful categories. I searched through every declaration that sets any color-affecting CSS property, including:

  • `background-color`

  • `border` | `border-color` | `border-top-color`, etc.

  • `box-shadow`

This resulted in me defining Tier 2 tokens for background, text and border colors in order of contrast, with 1 always being the lowest contrast against the `background-color` of `<body>`. These tokens indicate where they should be used, providing guardrails and allowing contributors to make UI design decisions with less effort and more confidence.

When applying a `background-color`, we know to use `color-background-*` and that any `color-text-*` and `color-border-*` colors will pair with it in an accessible way. If we want the low-contrast gray text, we can start with `color-text-neutral-1` and work our way up as needed. Over time, we start to remember patterns, such as `color-text-neutral` having three steps (so `color-text-neutral-3` is the high-contrast text color), and `color-border-red` having only one step (so `color-border-red-1` is used whenever we require a red border).

But this still doesn’t help us manage edge cases where a specific UI element has a dark background in both light and dark themes. To do this, we must declare more specific Tier 3 tokens.

Tier 3

:root {
  --sui-component-help-menu-header-background-color: var(--sui-color-background-neutral-inverse-2); // dark background with high contrast against the HTML document's white <body> background
}

.dark-mode {
  --sui-component-help-menu-header-background-color: var(--sui-color-background-neutral-3); // dark background with slight contrast against the HTML document's dark <body> background
}

These tokens are so specific that they are often used in just a single declaration (although they can be reused). Themes can override these tokens to target specific properties of a specific component in a specific state. This is most useful for us when a different Tier 2 token is used in our light theme than our dark theme, or when using a unique primitive CSS value not found anywhere else in the codebase.

Discovering code derived from design tokens

Now we have our design tokens available as CSS Custom Properties that can change dynamically with whichever theme is applied, which is great! 

However, we still do not have everything we need. We do not yet have a method for accessing our tokens in TypeScript files in a type-safe way. And not only do we need the ability to access a token’s representative CSS Custom Property, but also to access its underlying, primitive CSS value to work with some of our third-party libraries that do not support CSS Custom Properties.

On top of that, we want to declare reusable CSS utility classes to apply common rules, some of which are derived from our set of design tokens. How do we type components that make use of these utility classes? And, better yet, how do we keep all of this synchronized over time? 

Before I get into how we keep our design-token-derived code synchronized, let’s look at the code and why it’s important.

Utility classes that apply design tokens

Utility classes provide a great developer experience, allowing us to move more quickly and reduce the number of micro-stylesheets we create. For this reason, we want the ability to apply our most common styles this way. We see value in having a utility class for every Tier 2 color token, but feel no need to do the same for Tier 1 and 3 color tokens — since we don’t want to encourage the use of Tier 1 tokens (which lack descriptive names), and Tier 3 tokens are usually too specific to warrant a reusable utility class.

.sui-u-color-neutral-1 {
    color: var(--sui-color-text-neutral-1) !important;
}

.sui-u-background-color-neutral-1 {
    background-color: var(--sui-color-background-neutral-1) !important;
}

.sui-u-border-1px-solid-neutral-1 {
    border: 1px solid var(--sui-color-border-neutral-1) !important;
}

.sui-u-border-top-1px-solid-neutral-1 {
    border-top: 1px solid var(--sui-color-border-neutral-1) !important;
}

.sui-u-border-right-1px-solid-neutral-1 {
    border-right: 1px solid var(--sui-color-border-neutral-1) !important;
}

.sui-u-border-bottom-1px-solid-neutral-1 {
    border-bottom: 1px solid var(--sui-color-border-neutral-1) !important;
}

.sui-u-border-left-1px-solid-neutral-1 {
    border-left: 1px solid var(--sui-color-border-neutral-1) !important;
}

This is great because it reduces our need to create new CSS rules/files to apply simple styles. For clarification, however, we aren’t taking a “utility-first” approach, as something like Tailwind does. I look at it like this — the 80/20 principle would suggest that ~20% of CSS declarations make up ~80% of the application styles. We’re just finding and providing that 20%.

A common criticism of utility classes, and one that I agree with, is that they pollute the source code with long strings of utility classes. Applying these strings directly to a component’s `className` prop is also untyped. So next, let’s look into how we apply these utility classes in a way that is both typed and easier to read.

Typed styling props using Class Variance Authority

We’ve had great results providing typed style variant props to our reusable UI components using class-variance-authority. But what are style variant props?

Simply put, style variant props (or just “variant props”) are component props that apply classes to and style DOM elements when certain conditions are true. A common example of this is a Button component:

<Button
  variant="primary" // applies ".sui-c-button--variant-primary"
  size={2} // applies ".sui-c-button--size-2"
  disableMotion // applies ".sui-c-button--disableMotion"
>
    I'm a button
</Button>

We also use this pattern to apply lower-level utility classes:

// applies ".sui-u-background-color-neutral-1", which applies "background-color: var(--sui-color-background-neutral-1)"
<Flex backgroundColor="neutral-1" />


// applies ".sui-u-color-neutral-1", which applies "color: var(--sui-color-text-neutral-1)"
<Paragraph color="neutral-1" /> 

The logic behind the `color` prop of the preceding component is another example of code that we need to keep in sync with the design tokens. Under the hood, using class-variance-authority, this looks like so:

const textVariants = {
    variant: {
        "body-1": "sui-c-text--variant-body-1",
        "heading-1": "sui-c-text--variant-heading-1",
        ...
    },
    color: {
        // We want to keep these in-sync automatically
        "neutral-1": "sui-u-color-neutral-1",
        "neutral-2": "sui-u-color-neutral-2",
        "neutral-3": "sui-u-color-neutral-3",
        "red-1": "sui-u-color-red-1",
        "green-1": "sui-u-color-green-1",
        …
    }
}

const textVariantsKeys = Object.keys(textVariants) as Array<keyof typeof textVariants>;

const text = cva('sui-c-text', {
    variants: textVariants
})

export function Paragraph(props) {
    const { className, ...rest } = props
    const [variantProps, elementProps] = split(rest, textVariantsKeys);
    
    return (
        <p
            className={text({ ...variantProps, class: className })}
            {...elementProps}
        />
    );
}

Accessing a token’s value in TypeScript files

In TypeScript, we need the ability to access our tokens in multiple ways:

  1. As a CSS Custom Property `var(--sui-color-background-purple-1)`

  2. As its underlying primitive value in light mode, `#f9edff`

  3. As its underlying primitive value in dark mode, `#22102b`

Which looks like so:

export const COLORS = {
    "base-neutral-0": "var(--sui-color-base-neutral-0)",
    "base-neutral-100": "var(--sui-color-base-neutral-100)",
    "base-neutral-200": "var(--sui-color-base-neutral-200)",
    ...
    "text-neutral-1": "var(--sui-text-neutral-1)",
    "background-neutral-1": "var(--sui-background-neutral-1)",
    "border-neutral-1": "var(--sui-border-neutral-1)",
    ...
};

export const LIGHT_HEX_COLORS = {
    "base-neutral-0": "#ffffff",
    "base-neutral-100": "#fafafa",
    "base-neutral-200": "#f3f3f5",
    ...
    "text-neutral-1": "#777582",
    "background-neutral-1": "#ffffff",
    "border-neutral-1": "#e6e5ea",
    ...
}

export const DARK_HEX_COLORS = {
    "base-neutral-0": "#151117",
    "base-neutral-100": "#1c181f",
    "base-neutral-200": "#221f26",
    ...
    "text-neutral-1": "#858191",
    "background-neutral-1": "#151117",
    "border-neutral-1": "#29262e",
    ...
}

Accessing a token value as a CSS Custom Property

In our app, we use a code syntax highlighting library that allows us to pass through a theme object to customize the UI it renders. We use the `COLORS` object here to access a token’s representative CSS Custom Property as a string.

import { Highlight } from 'third-party-library';
import { COLORS } from 'singlestore-ui/tokens';

const theme = {
    plain: {
        backgroundColor: COLORS['background-neutral-1'], // "var(--sui-color-background-neutral-1)"
        color: COLORS['text-neutral-3'] // "var(--sui-color-text-neutral-3)"
    }
};

...

export function CodeBlock(props) {
    ...
    return (
        <Highlight
            theme={theme}
            code={props.code}
            language={props.language}
            ...
        />
    )
}

Accessing a token value as its primitive value

The next example shows our usage of a third-party library that renders an `iframe`, where customization is applied by passing a CSS-in-JS object through props. Because this is an `iframe`, our CSS Custom Properties are not available in the embedded document — so we must pass our tokens as primitive CSS values.

import { PaymentMethodIframe } from "third-party-library-without-css-custom-properties-support";
import { DARK_HEX_COLORS, LIGHT_HEX_COLORS } from "singlestore-ui/tokens"

export function BillingForm() {
    const { theme } = useTheme();
    
    let themedInputStyles;
    
    if (theme === 'dark') {
        themedInputStyles = {
            backgroundColor: DARK_HEX_COLORS['background-neutral-3'],
            color: DARK_HEX_COLORS['text-neutral-3'],
        }
    } else {
        themedInputStyles = {
            backgroundColor: LIGHT_HEX_COLORS['background-neutral-3'],
            color: LIGHT_HEX_COLORS['text-neutral-3'],
        }
    }

    return (
        <PaymentMethodIframe
            inputStyles={{
                ...themedInputStyles
            }}
        />
    )
}

Generating code derived from design tokens using Style Dictionary

Now that we’ve seen all the code that references our color design tokens, let’s get into how we keep it all synchronized. If we were to modify or extend these design tokens, we may need to touch dozens, maybe even hundreds, of lines of code for even simple changes. This clearly does not scale, and leaves plenty of room for human error. So, we need a way to automatically generate this code for us.

And this is how we landed on Style Dictionary. Here’s an excerpt from their website that accurately describes its utility:

“Style Dictionary is a build system that allows you to define styles once, in a way for any platform or language to consume. A single place to create and edit your styles, and a single command exports these rules to all the places you need them.”

This allows us to define our tokens in a centralized JSON object and then define “formats”, “filters” and “transforms” (i.e., JavaScript functions) to generate code derived from it. We won’t dig deep into our implementation of this tool, but rather give an overview of what this process looks like. Here is what our token source files look like:

singlestore-ui/tokens
├── border
│   └── base.js
├── colors
│   ├── base.js
│   └── dark.js
├── font
│   └── base.js
├── sizes
│   └── base.js
├── space
│   └── base.js
└── tokens.utils.js

// singlestore-ui/tokens/color/base.js
module.exports = {
    color: {
        base: {
            neutral: {
                900: { value: "#1b1a21" },
                ...
            },
            purple: {
                900: { value: "#8800cc" },
                ...
            },
            ...
        },
        text: {
            "neutral-1": { value: "{color.base.neutral.700}" },
            ...
        },
        background: {
            "neutral-1": { value: "{color.base.neutral.0}" },
            ...
        },
        border: {
            "neutral-1": { value: "{color.base.neutral.300}" },
            ...
        },
    },
};

I then wrote the script that defines our different formats, which filter and iterate over tokens to output the files we need. Now, the rest of our codebase can import and use these assets — and whenever we add, remove or modify tokens, running this script, `pnpm run style-dictionary:build`, does much of the heavy lifting for us.

singlestore-ui/tokens/__generated__
├── background-color-utility-classes.css
├── background-color-utility-variants.js
├── border-color-utility-classes.css
├── border-color-utility-variants.js
├── border-radius-utility-classes.css
├── border-radius-utility-variants.js
├── border-variables.css
├── color-variable-map.js
├── color-variables.css
├── dark-hex-color-map.js
├── font-utility-classes.css
├── font-utility-variants.js
├── font-variables.css
├── light-hex-color-map.js
├── size-utility-classes.css
├── size-utility-variants.js
├── size-variables.css
├── space-utility-classes.css
├── space-utility-variants.js
├── space-variables.css
├── text-color-utility-classes.css
└── text-color-utility-variants.js

Conclusion

That was most of the development work that went into shipping a dark theme for the SingleStoreDB Cloud Portal UI. After extensive testing, I shipped it to customers and passed it over to the QA team for another round of testing. We’re all very pleased with the end result of this hackathon project, and we hope you are too! 

My final thoughts around this subject are of appreciation for how far our browsers have developed. CSS Custom Properties now being supported in all modern browsers actually made implementing the logic for dark mode fairly simple, and I’m pleased to have had an opportunity to use them and see the benefits they bring.


Cost Savings With SingleStoreDB On-Premises


Reducing costs for historical, unused or archived data is a challenging task for any organization managing TBs or PBs of data.

With SingleStoreDB On-Premises’ unlimited storage, it is very easy to reduce cost by ~85%. I’ll detail how in this article. 

How to reduce cost by ~85%

Expense on STANDARD storage (non-archived) per month

By default, STANDARD storage class is used when you create any unlimited storage database on AWS S3. The following highlights the storage cost that, as of today, you will pay monthly:

So, for 5 TB/month you will pay: $0.023/GB × 5,000 GB = $115/month

Expense on GLACIER storage class (archived data) per month

Now, if you have a good archiving policy (an example is in the “How does it work?” section below), you can reduce cost by ~85% by converting your storage to a low-cost Glacier storage class. Here is a link to the different Glacier storage class costs, which vary based on your requirements:

Example (S3 Glacier Flexible Retrieval): for 5 TB/month you will pay $0.0036/GB × 5,000 GB = $18/month

Cost reduced by %: percentage decrease = ((starting value − final value) / starting value) × 100 = ((115 − 18) / 115) × 100 = 84.34% ≈ 85%

How does it work?

This approach is best for historical, archived or otherwise unused datasets. Here are the high-level steps you can take to save money with SingleStoreDB On-Premises:

  1. Create bucket in AWS S3 storage

  2. Create unlimited storage database on S3

  3. Create a database archiving policy to move historical data from a local (or unlimited storage) database to the unlimited storage database.

  4. Create a lifecycle policy for the unlimited storage database’s S3 bucket based on your requirements (e.g., 3 months, 12 months, 10 years, etc.).

  5. After the data for the specified archive duration has been moved from the local database to the unlimited storage database, create a MILESTONE and DETACH the database.

  6. If you want to retrieve data from the Glacier bucket, you can restore it to STANDARD and ATTACH the database.

The following highlights the steps in more detail:

  1. Open your AWS account console. Search for S3 and create a bucket. Open this bucket and create a folder (Ex: dbdir) inside it.

2. Create an unlimited storage database on this bucket:

CREATE DATABASE testdb ON S3 "s2bottomless/dbdir" CONFIG '{"region":"us-east-1"}'
CREDENTIALS '{"aws_access_key_id":"your_access_key_id","aws_secret_access_key":"your_secret_access_key"}';

3. Create a database archiving policy to move data from your local database (or unlimited storage) to your unlimited storage database.

In this example, data older than 10 years is moved to an unlimited storage database using a scheduler (cron job). You can set your own policy based on your requirements:

Day 1: INSERT INTO testdb.t1 SELECT * FROM db1.t1 WHERE ts < NOW() - INTERVAL 10 YEAR;

Day 2: INSERT INTO testdb.t1 SELECT * FROM db1.t1 WHERE ts < NOW() - INTERVAL 10 YEAR;

… Run this for one year.

Note: testdb is an unlimited storage database, and db1 is a local database on the server.
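Before creating the lifecycle rule and detaching, it is worth confirming that the archive database received everything you expect. A minimal sketch, reusing the table and predicate from the example above:

-- Row count that should have been archived, measured on the source...
SELECT COUNT(*) FROM db1.t1 WHERE ts < NOW() - INTERVAL 10 YEAR;

-- ...and the row count actually present in the archive database.
SELECT COUNT(*) FROM testdb.t1;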

You can see the bucket size using the Metrics tab:

4. Create a lifecycle policy for this bucket and folder (s2bottomless/dbdir). You can create this policy on day one, or after the data has been moved from the main database to the archive database (testdb). My choice would be after all data is moved to the archive database (step 5), so that I can test the records before they transition to Glacier. Note that once your bottomless bucket has moved to Glacier, you can’t fetch records from it.

Select “dbdir” folder -> go to management tab -> click on “create lifecycle rule” -> Enter some lifecycle rule name -> Select “Apply to all objects in the bucket” -> Select “Lifecycle rule actions” as per your requirement -> Choose glacier type from “Transition current versions of objects between storage classes” (I have chosen Glacier Flexible Retrieval) -> Enter the number of days when your folder should go in glacier storage after object creation -> Check all mandatory boxes -> Click on “create rule”

5. Once the records have been moved from the main database (db1) to the archived/unlimited storage database (testdb):

Create milestone:

CREATE MILESTONE "one_year_data" FOR testdb;

Detach database:

DETACH DATABASE testdb;

Then, create the lifecycle policy as described in step 4.

Once your bucket has moved to Glacier, you can’t attach (fetch) the database. It will return the following error (the milestone name is different in the screenshot).

6. To fetch data from the Glacier database, you need to attach it. To attach it, we first need to restore the bucket to STANDARD storage.

Method to restore into STANDARD storage:

If you want to restore it for only a few days (say, 7 days), you can do so with the following command; afterward, the bucket will go back into Glacier:

aws s3api restore-object --restore-request Days=7,GlacierJobParameters={Tier=Standard} 
--bucket "s2bottomless" --key "dbdir/"
  • Lastly, attach the database
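Once the objects are back in STANDARD storage, the database can be re-attached. A minimal sketch, reusing the milestone created earlier (check the ATTACH DATABASE documentation for the exact options supported by your version):

ATTACH DATABASE testdb;

-- Or, to attach at the point in time marked earlier:
ATTACH DATABASE testdb AT MILESTONE "one_year_data";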

Additional steps to detach the database and delete the bucket:

singlestore> detach database testdb force;

Query OK, 1 row affected, 2 warnings (1.34 sec)

Delete Bucket

Pushing HTAP Databases Forward With SingleStoreDB


This year’s ACM SIGMOD₁ database conference continued the recent trend of renewed interest in HTAP (Hybrid Transactional/Analytical Processing) by the database community, researchers and practitioners alike.

The conference featured four HTAP papers and an HTAP workshop. This is a significant increase over the past 3-4 years, when the conference often had no HTAP papers at all.

One of these papers was our own SingleStoreDB architecture paper “Cloud-Native Transactions and Analytics in SingleStore”. This paper describes some key design decisions that enable SingleStoreDB to run a breadth of workloads — from more transactional (OLTP) to more analytical (OLAP) — with performance matching databases that specialize in one use case or the other.

Today, SingleStoreDB is one of the more widely deployed distributed HTAP SQL databases in the market. The remainder of this blog post is an overview of some of the important aspects of our paper, including a description of how SingleStoreDB performs on the mixed workload CH-benCHmark performance test.

HTAP databases like SingleStoreDB are starting to reverse a decades-long trend toward the development of specialized database systems designed to handle a single, narrow use case. With data lakes, in-memory caches, time-series databases, document databases, etc., the market is now saturated with specialized database engines. As of August 2022, DB-Engines ranks over 350 different databases. Amazon Web Services alone supports more than 15 different database products.

There is value in special-case systems, but when applications end up built as a complex web of different databases, a lot of that value is eroded. Developers are manually rebuilding the general-purpose databases of old via ETL, and data flows between specialized databases.

There are many benefits for users in having a single integrated, scalable database that can handle many application types — several of which result in reductions in:

  1. Training requirements for developers

  2. Data movement and data transformation

  3. The number of copies of data that must be stored — and the resulting reduction in storage costs

  4. Software license costs

  5. Hardware costs

Furthermore, SingleStoreDB enables modern workloads to provide interactive real-time insights and decision-making, by supporting both high-throughput low-latency writes and complex analytical queries over ever-changing data, with end-to-end latency of seconds to sub-seconds from new data arriving to analytical results. This outcome is difficult to achieve with multiple domain-specific databases, but is something SingleStoreDB excels at.

Moreover, adding incrementally more functionality to cover different use cases with a single distributed database leverages existing fundamental qualities that any distributed data management system needs to provide. This yields more functionality per unit of engineering effort on the part of the vendor, contributing to lower net costs for the customer. For example, specialized scale-out systems for full-text search may need cluster management, transaction management, high availability and disaster recovery — just like a scale-out relational system requires. Some specialized systems may forgo some of these capabilities for expediency, compromising reliability.

SingleStoreDB is designed to deliver on the promise of HTAP.  It excels at running complex, interactive queries over large datasets (up to 100s of terabytes) as well as running high-throughput, low-latency read and write queries with predictable response times (millions of rows written or updated per second).  The two key aspects of SingleStoreDB described in more detail in our paper are:

  1. Unified table storage. SingleStoreDB’s unified table storage is unique in its ability to support the very fast scan performance of a columnstore (billions to trillions of rows scanned a second), while also having point read and write performance approaching that of a rowstore (millions of point writes a second) over a single data layout — no extra copies of data with different data layouts needed. 

  2. Separation of storage and compute. SingleStoreDB’s separation of storage and compute design dictates how data is moved between memory, local disk and blob storage while maintaining high availability and durability of that data — and without impacting low latency write query throughput.

Unified Table Storage

In SingleStoreDB, both analytical (OLAP) and transactional (OLTP) workloads use a single unified table storage design. Data doesn’t need to be copied or replicated into different data layouts (as other HTAP databases often do). SingleStoreDB’s unified table storage internally makes use of both rowstore and columnstore formats, but end users aren’t made aware of this.

At a high level, the design is that of a columnstore with modifications to better support selective reads and writes in a manner that has very little impact on the columnstore’s compression and table scan performance.

The columnstore data is organized as a log-structured merge tree (LSM), with secondary hash indexes supported to speed up OLTP workloads. Unified tables support sort keys, secondary keys, shard keys, unique keys and row-level locking, which is an extensive and unique set of features for table storage in columnstore format. The paper describes in detail the LSM and secondary indexing layout as well as how it compares to other approaches.
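As a rough illustration of what this looks like at the SQL level, the following sketch (table, column and value names are hypothetical) defines a single unified table and then runs both a transactional point update and an analytical aggregation against it; no second copy of the data in a different layout is needed:

CREATE TABLE orders (
    order_id BIGINT NOT NULL,
    customer_id BIGINT NOT NULL,
    status VARCHAR(20),
    amount DECIMAL(12,2),
    created_at DATETIME(6),
    SORT KEY (created_at),             -- fast columnstore range scans
    SHARD KEY (order_id),
    UNIQUE KEY (order_id) USING HASH   -- uniqueness enforcement and fast point lookups
);

-- OLTP-style: seek a single row and update it.
UPDATE orders SET status = 'shipped' WHERE order_id = 1001;

-- OLAP-style: scan and aggregate the same table.
SELECT customer_id, SUM(amount) AS total_spend
FROM orders
WHERE created_at >= NOW() - INTERVAL 30 DAY
GROUP BY customer_id;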

Separation of Storage and Compute

SingleStoreDB is able to make efficient use of the cloud storage hierarchy (local memory, local disks and blob storage) based on how hot data is.  This is an obvious design, yet most cloud data warehouses that support using blob storage as a shared remote disk don’t do it for newly written data. They force new data for a write transaction to be written out to blob storage before that transaction can be considered committed or durable (both Redshift and Snowflake do this).

This in effect forces hot data to be written to the blobstore, harming write latency. SingleStoreDB can commit on local disk and push data asynchronously to blob storage. This gives SingleStoreDB all the advantages of separation of storage and compute — without the write latency penalty of a cloud data warehouse. These advantages include fast pause and resume of compute (scale to 0) as well as cheaply storing history in blob storage for point in time restore (PITR).

CH-BenChMark: Mixed Workload Benchmark Results

One of the proof points in the paper was to show our results on the CH-benCHmark.  This HTAP benchmark runs a mixed workload composed of parts from the famous TPC-C transactional benchmark running high throughput point read and write queries alongside a modified version of the TPC-H analytical benchmark running complex longer running queries over the same set of tables.  It’s designed to test how well a database can run both a transactional workload (TW) and an analytical workload (AW) at the same time. 

The following table shows our results. Test cases 1-3 were run with a single writable workspace with two leaves in it, each leaf having eight cores. Fifty concurrent TWs running parts of TPC-C resulted in the highest TpmC when they were run in isolation with no AWs (test case 1). Two AWs result in the highest queries per second (QPS) from TPC-H when run in isolation (test case 2).

When fifty TWs and two AWs are run together in the same workspace, each slows down by about 50% compared to when each is run in isolation (test case 3). This result demonstrates that TWs and AWs share resources almost equally when running together, without an outsized impact on each other (i.e., the write workload from TPC-C does not have an outsized impact on the TPC-H analytical read workload).

Test case 4 introduces a read-only workspace with two leaves in it that is used to run AWs. This new workspace replicates the workload from the primary writable workspace that runs TWs, effectively doubling the compute available to the cluster. This new configuration (case 4) doesn’t impact TWs throughput when compared to test case 1 without the read-only workspace. AWs throughput is dramatically improved versus test case 3, where it shared a single workspace with TWs.

This is not too surprising as the AWs have their own dedicated compute resources in test case 4. The AWs QPS was impacted by ~20% compared to running the AWs workload without any TWs at all (test case 2), as SingleStoreDB needed to do some extra work to replicate the live TWs transactions in this case which used up some CPU. Regarding the replication lag, the AWs workspace had on average less than 1 ms of lag, being only a handful of transactions behind the TWs workspace.  The paper has many more details on other benchmarking we did to show SingleStoreDB’s capability to run both transactional and analytical workloads with strong performance.

Summary of SingleStoreDB CH-benCHmark results (1000 warehouses, 20-minute test executions)

With such a large market opportunity at stake for a database capable of running a breadth of workloads at scale, we expect to see more use-case-specific databases being augmented with more general, HTAP-like capabilities.  This includes the announcement by Snowflake of Unistore for transactional workloads, and MongoDB’s new columnstore index for analytical workloads.

Unlike these systems, SingleStoreDB was designed from its very early days to be a general purpose, distributed SQL database.  This gives it an edge against some of the newer HTAP entrants that are bolting on new functionality to databases that were architected to target more specific workloads.

Even better, SingleStoreDB’s HTAP database is available for you to try today. Get started here.

₁SIGMOD stands for “Special Interest Group on Management of Data”.  It’s one of the premier database conferences for folks in both academia and industry to come together to share new ideas related to databases.


[r]evolution Summer 2022: Distributed SQL “Workspaces” Power Modern Applications


We designed SingleStoreDB to be the world’s fastest distributed SQL database, and currently it powers some of the largest mission-critical applications at enterprises such as Uber, Comcast, Disney and Sirius XM.
These organizations are leveraging ultra-fast ingest, low latency queries, and the ability to support extreme concurrency to drive data-intensive SaaS, telemetry and operational machine learning workloads.

Many of our customers start with SingleStoreDB powering a single workload, but as companies leverage data across the organization to inform decision making and introduce new products and services, the need for shared data grows. Traditionally, companies copy data across various storage solutions and applications, building a complex web of data and applications that results in convoluted data pipelines between disparate data silos. This introduces cost, complexity and latency, resulting in critical applications operating on stale data.

Solution: Workspaces

Workspaces is our newest feature, which enables customers to run multiple workloads on isolated compute deployments, while providing ultra low-latency access to shared data. This is possible because of the unique SingleStoreDB architecture, leveraging our native internal data replication engine to ensure applications are always operating on fresh data.

Users can create and terminate workspaces directly using the cloud portal, or through our scalable Management API. Databases are created and attached to one or more workspaces concurrently, allowing simultaneous operation of multiple workloads on shared data. Databases can be attached and detached from workspaces on-the-fly, allowing organizations to manage and meet rapidly changing needs.

Because workspaces are stateless, they can be created and terminated at will, making it easy to run reporting or custom telemetry applications on the fly. When a workspace is terminated it no longer incurs charges, offering simple cost optimization of any workload while ensuring data is retained as long as needed.

Unique Design

SingleStoreDB is the only real-time HTAP database designed on a modern distributed SQL architecture. This means that compute can be scaled out using the native clustered architecture, rather than simply scaling up using larger machines. Workspaces further enhance this distributed architecture by freeing databases from the confines of a single cluster, delivering the true value of separate compute and storage.

Some enterprise data warehouses offer a similar separation of compute and storage, but because they are only designed for analytic workloads, they sacrifice latency to enable this flexibility. This is because writes are forced to go to object storage, which introduces latency and causes queries to return stale data if changes haven’t been propagated completely across the storage stack.

SingleStoreDB is designed to power modern applications, where real-time access to data and low latency query responses are just as important as scalability and concurrency. To meet this need, SingleStoreDB workspaces are designed to provide low latency data access to databases across every workspace deployed within a group (a logical tool for organization of workspaces). Each separate application running on an independent workspace can be scaled up or down, while still ensuring fast access to fresh data.

Use Cases

Impact.com has been running SingleStoreDB to power their customer-facing applications for some time, but now with the introduction of Workspaces they have also moved their reporting and internal analytics workloads to SingleStoreDB, unifying their entire data architecture:

“Workspaces are very exciting for us… we can now simply add and scale workloads across the organization’s most important data!” – Mauricio Aristizabal, Data Architect, Impact.com

Impact found that the customer data being stored in SingleStoreDB was critical for reporting and other operational analytics within the company, but the process of moving this data out of SingleStoreDB and into pure analytics solutions like Cloudera or Snowflake was costly and time consuming. It also introduced latency, which meant that by the time analysts got to the data it was already stale. They wanted a way to run the workloads previously running on Hadoop directly on SingleStoreDB, which is where workspaces came in.

“..when Cloudera’s Impala and Kudu could not keep up with the speed of Impact’s business, SingleStoreDB delivered. SingleStoreDB checks all the boxes with sub-second reporting, low-latency analytics, high concurrency, separation of storage and compute with workspaces, and more — which is why SingleStoreDB is now Impact’s Database for 100% of its data and reporting. In short: All Data. One Platform.™”

In addition to scaling out to multiple workloads, workspaces also provides a simple way to manage isolation between ingest and application workloads. Many customers are now using isolated workspaces for ingesting from streaming data sources like Confluent Kafka or Redpanda, and blog posts like Nabil Nawaz’s “loading 100 billion rows ultrafast” show the incredible power of parallel ingest into SingleStoreDB. Now with workspaces, users can isolate ingest workloads and dynamically scale them up or down based on scheduling and performance needs, all while maintaining the consistent query response and strict SLAs needed for customer-facing applications.
More companies than ever are using machine learning to customize content for their customers, or to provide real-time services such as fraud detection. A Tier 1 U.S. Bank is using SingleStoreDB to deliver “On The Swipe Fraud Detection” by operationalizing fraud detection models against customer transactional data. Now with workspaces, these workloads can be run directly on data generated by customer facing applications, without affecting the ability to deliver application uptime or impacting user generated workloads. Workspaces provides complete isolation and independent scalability without requiring complex integration of multiple data sources.

Manage application & operational ML workloads with Workspaces

Design & Architecture

SingleStoreDB’s workspaces are built using the native data replication engine built into our database. If you have read about features such as leaf fanout failover you may already be familiar with the storage engine — but if not, check out Adam Prout’s description of what makes the SingleStoreDB storage engine unique.

Workspaces further this design by creating isolated pools of compute resources, which are clustered on top of cloud hardware. These compute pools have dedicated memory and persistent disk cache to deliver immediate query responsiveness, while operating on top of bulk scale-out object storage.

When combined with SingleStoreDB’s query code generation and tiered Universal Storage architecture, this allows workspaces to deliver extremely low latency query response, highly concurrent access and fast parallel streaming ingest while automating the movement of data across workloads.

Try Workspaces

SingleStoreDB offers a completely free trial of the cloud database-as-a-service to get started. The trial allows you to load data, connect your application, and experience the performance and scalability of a real-time Distributed SQL database.
You can also take a look at Eric Hanson’s demonstration of SingleStoreDB “running a trillion rows per second,” and try it for yourself to see how SingleStoreDB delivers the fastest performance and lowest query latency, deployed on your public cloud of choice.

Three Developer Mistakes: Schema Design


Modern tech stacks require developers to juggle several different parts of their application: frontend UI, orchestration, APIs and the database.

As application data intensity increases, each of these components grows in complexity — but developers often find that solving the database problem is the most helpful long-term solution. Today, we’ll discuss some mistakes that developers make in designing their SingleStore databases for scaling to higher ingestion rates, lower latency queries and more concurrency. 

Understanding Distributed SQL Databases

Before getting into some of the areas developers struggle with, it’s important to understand a few key concepts of distributed SQL databases and SingleStoreDB Universal Storage:
  • Shard Key: Partitioning data across n nodes is a concept you may be familiar with from other distributed SQL databases. The key used to partition data in SingleStoreDB is called a shard key.

  • Sort Key: Also known as the columnstore key, this index dictates how column segments are sorted in a Universal Storage table. This helps ensure segment elimination when accessing data.

Mistake #1: Choosing the Wrong Shard Key

Accidentally choosing the wrong shard key is the most common mistake that SingleStoreDB developers make. It quickly becomes apparent when you don’t get the blazing-fast query speeds you expected on your first try. If you’ve already used the query profiler in SingleStoreDB, you’ll be familiar with “Rebalance” and “Broadcast” operations, which can also indicate a suboptimal shard key.

Shard keys that lead to unbalanced partitions are the most detrimental to query performance. If one partition has more data than another, it will be asked to do more work during SELECTs, UPDATEs and DELETEs. The partitions holding the most data become the bottleneck, dragging down the overall execution of the query. For example, let’s take this table that we have partitioned by `first` (representing a user’s first name):

CREATE TABLE people (
    user VARCHAR(24),
    first VARCHAR(24), 
    last VARCHAR(24),
    SHARD KEY(first)
);

Once we load data into the table, the distribution ends up heavily skewed: because rows are sharded by first name, some partitions hold far more rows than others.

To fix this, we could try a few things:

  • Change the shard key to `user`, the highest cardinality column, so data is more evenly distributed across the partitions

  • Add an auto_increment column called `id` for even higher cardinality

  • Finally, we could also have a compound shard key of both options (`id`, `user`)

Keep in mind that once data is ingested into one table, you will have to run `INSERT … SELECT …` operations to copy that data into a table with a new shard key.
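
For instance, a minimal sketch of that copy, assuming you have already created a `people_v2` table (a hypothetical name) with the improved shard key:

-- Copy existing rows into the re-sharded table; if the new table has an
-- AUTO_INCREMENT id column, its values are generated on insert.
INSERT INTO people_v2 (user, first, last)
SELECT user, first, last
FROM people;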

It is also very important to consider your query workload when selecting a shard key. For example, say we changed the schema in the above example to the following:

CREATE TABLE people (
    id INT AUTO_INCREMENT,
    user VARCHAR(24),
    first VARCHAR(24), 
    last VARCHAR(24),
    SHARD KEY(id,user)
);

Consider that we would like to join to a separate `address` table. We want to ensure that table has a shard key matching the `people` table. This lets each partition join its own slice of the data locally, significantly reducing the repartition and broadcast operations — and the overall network traffic — required by the query.
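
A quick sketch of what that might look like (the `address` table and its columns are hypothetical):

CREATE TABLE address (
    id     INT,
    user   VARCHAR(24),
    street VARCHAR(64),
    city   VARCHAR(32),
    SHARD KEY (id, user)          -- matches the shard key of `people`
);

-- Joining on the full shard key lets each partition join locally,
-- avoiding repartition and broadcast operations.
SELECT p.first, p.last, a.city
FROM people p
JOIN address a ON a.id = p.id AND a.user = p.user;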

Mistake #2: Choosing the Wrong Sort Key

The concept of a sort key is more related to SingleStoreDB’s Universal Storage table type, rather than a distributed SQL concept like shard keys. For this reason, developers almost always miss this step! Missing a sort key can leave your Universal Storage tables completely unorganized and hard to scan when you query.

Universal Storage tables store columns in segments of up to 1 million rows apiece. Without a sort key, data is stored in segments based on the order it’s ingested. However, when a sort key is defined, segments are organized into ranges of values, which makes the relevant data easy to find. For example, consider this schema:

CREATE TABLE people (
    id INT AUTO_INCREMENT,               
    user VARCHAR(24),  
    first VARCHAR(24),  
    last VARCHAR(24),
    SORT KEY (user),   
    SHARD KEY (id)
);

Here is an example of how the sort key takes effect. Now, if we run a query like `SELECT * FROM people WHERE user LIKE 'e%'`, the query engine only scans the segments whose `user` range can contain matching values, rather than searching across all of the segments for the answer. This is called segment elimination.
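
To check that elimination is actually happening, you can inspect the query plan and the per-segment value ranges. The sketch below assumes the `information_schema.COLUMNAR_SEGMENTS` view; the view name and its columns may vary by SingleStoreDB version:

-- First look: the query plan for the filtered scan.
EXPLAIN SELECT * FROM people WHERE user LIKE 'e%';

-- Assumed view: per-segment min/max values for the sort key column.
-- Narrow, non-overlapping ranges are what make segment elimination possible.
SELECT segment_id, min_value, max_value
FROM information_schema.columnar_segments
WHERE table_name = 'people' AND column_name = 'user';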

Mistake #3: Mismatched Data Types Across Tables

Comparing mismatched data types can be a silent killer of query performance. When doing comparisons in queries, it is critical to make sure the matching happens across consistent data types. Not doing so can negatively impact your query results and performance.

Take for example a simple table with an `id` column:

CREATE TABLE t (id VARCHAR(50), PRIMARY KEY (id));
INSERT INTO t values ('123.0');
INSERT INTO t values ('123');
INSERT INTO t values ('0123');

This query:

SELECT * FROM t WHERE id = 123;

Returns all three rows — '123.0', '123' and '0123' — because each string value is implicitly cast to a number for the comparison. This may not have been the intended result of the query.

Prefixing the same query with `EXPLAIN` would yield these two warnings:

  1. WARNING: Comparisons between mismatched data types, which may involve unsafe data type conversions and/or degrade performance. Consider changing the data types, or adding explicit typecasts. See our mismatched data type documentation for more information.  
  2. WARNING: Comparison between mismatched data types: (`t`.`id` = 123). Types ‘varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL’ vs ‘bigint(20) NOT NULL’.

In this scenario, you could either quote the value in the query (so the comparison stays string-to-string) or change the column’s data type in the DDL. Of course, the user may be perfectly content with the query result as-is, but it’s important to note the potential risk.
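
Both options as a quick sketch (the `t_int` table name is illustrative):

-- Option 1: quote the literal so the comparison stays within the string type.
SELECT * FROM t WHERE id = '123';

-- Option 2: store the ids as integers in the first place.
CREATE TABLE t_int (id BIGINT, PRIMARY KEY (id));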

Summary

Developers love the various different knobs they can turn on SingleStoreDB Cloud, while also maintaining the simplicity of a managed cloud database. As you have seen, there are just a few things to look out for when trying out all of the cool features of a distributed SQL database with patented Universal Storage.

Fortunately, we have a team of SingleStoreDB engineers standing by to help! Whether you’re just getting started or you’re ready to go to production with your app on SingleStoreDB, they can assist you with shard keys, sort keys or any other technical questions you have!

Operational Analytics for Your Database


Learn more about operational analytics for your database including what it is, why it benefits your organization and use cases.

Table of Contents

Operational analytics, or operational analytical processing, is a form of data analytics that is focused on improving business operations. It is easily distinguishable from other forms of analytics, as it’s carried out on the fly. This means that data generated from different parts of a business or system is processed in real time and instantly fed back into the decision-making arm of the business for strategic planning. Operational analytics has also been described in some circles as continuous analytics, a name that emphasizes the continuous nature of the analytics loop.
This form of business analytics has gained prominence in recent years due to the increasing rate of adoption of digital technologies and digitalization across every industry. Whether you’re a software developer, a data engineer, or a decision maker in your organization, you have a need to leverage real-time information from your IT infrastructure to make decisions that will benefit your bottom line. If set up correctly, operational data analytics will assist your organization in speeding up this decision-making process, giving you a competitive advantage in the market.

In this article, you will learn about operational analytics and its benefits for your organization. You’ll also look at the typical requirements needed for any database to be considered an engine for operational analytics.

What Is Operational Analytics and Why Do You Need It?

An operational analytics system is one that allows you to make quick decisions from streams of real-time data. It lets you receive data from multiple sources and sync that data directly to the interactive user-facing business intelligence tools, such as Braze, Salesforce and Marketo, that your team relies on for insights and decision-making.
Operational analytics shifts your focus from conventional analytics, which involves using software systems to understand data, to actually turning insights from your data into action to improve your bottom line. Usually, operational analytics makes use of the combination of data mining, machine learning and AI to help your organization make better decisions.

Operational Analytics vs. Traditional Analytics

Traditionally, business analytics is focused on providing decision makers a high-level overview of organizational key performance indicators (KPIs) on everyday operations for strategic purposes. The general idea has always been to aggregate data from different sources and then visualize this data to paint a picture of the current status of the business.

Operational analytics does not deviate from these basic principles of business analytics. In fact, it was developed as an improvement upon traditional analytics as businesses grew to require faster decision-making. The big differentiator is that operational analytics ensures complete integration between all your systems.

This makes your warehouse data a single source of truth and accessible across all tools used by the business, on both the technical and non-technical sides. Operational analytics is effectively the analysis of your organization’s day-to-day operational data.

As an example, operational analytics is at play when you automatically load your company’s product usage data into a customer relationship platform to provide actionable and real-time insights to your marketing team. The system empowers you to quickly react to any anomaly in the day-to-day business operations, such as when there’s a sudden drop in user engagement, and allows you to immediately implement initiatives that address the anomaly.

To apply traditional analytics to the previous example, you would require historical data, which would need to be pulled, processed and visualized, after which you would need to conduct meetings with several stakeholders before any serious action can be taken. This is a time-consuming process. Operational analytics significantly reduces the time needed for data processing and deliberation by ensuring your team reacts to customer behavior as it changes in real time.

How Will Operational Analytics Benefit You?

According to a poll conducted by Capgemini Consulting, over eighty percent of participants agreed that operational analytics contributes to driving profits and creating a competitive advantage. By investing in operational analytics, your company can benefit in several target areas, including:
  • Near-real-time decision-making

  • Trustworthy data from one central hub

  • Improved customer loyalty by reaching every customer at the right time

  • Having a consistent picture of the business in every tool

  • Improved efficiency for data teams since they can spend less time doing integration and more time on models and analyses

Use Cases of Operational Analytics

Almost every industry across the globe has adopted operational analytics for one purpose or another. It’s impossible to capture every industry scenario where operational analytics has found a practical application, but some popular applications include:

  • In financial institutions: Operational analytics is used by financial institutions for fraud detection and liquidity risk analysis. It takes on the task of analyzing consumer spending patterns, categorizing customers based on their credit risk, analyzing product usage patterns and much more, and uses that data to segment customers in fraud and risk classifications.

  • In oil and gas industries: One of several ways operational analytics is used in the oil and gas industries is to facilitate the preventive maintenance strategy for mechanical assets. Since real-time operational data from these assets can be streamed to maintenance management systems, it’s easy for maintenance teams to detect potential mechanical faults and take preventive action before failures occur.

  • In medicine: Nowadays, hospitals and emergency services employ operational analytics to predict the number of patients to be received daily and even prepare beds and prescriptions before patients arrive.

What Are the Database Requirements for Operational Analytics?

A well-deployed operational analytics system will have a pipeline for gathering data into your database, transformation steps to make sense of the data and a final key-value pair storage for quick retrieval by your frontline applications.

A database for operational analytics will have the following qualities.

Support for Complex Queries

Data-driven businesses need the ability to perform complex queries to offer solutions to business problems they face every day. For instance, the operational analytics engine of an online payment provider must execute complex queries in real time to monitor its global transactions for fraud detection. A typical operational database allows your application code to express complex queries in a declarative manner.

This allows your team to focus on what data to retrieve for your application logic without needing to worry about how the query is executed. This means that when real-time analysis is needed in an operational analytics database, your developers do not have to embed complex data logic like join optimizations, aggregations, sorting or relevance in the original application code. The database should support these operations to ensure fast and efficient processing of information from multiple sources.

A SQL database is an example of a database that allows declarative queries for complex operations on data.
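
For illustration only (the `transactions` and `customers` tables here are hypothetical), such a declarative query might look like the following; the database, not the application, decides how to join, aggregate and sort:

-- Totals per region for the last hour, expressed declaratively.
SELECT c.region, COUNT(*) AS txn_count, SUM(t.amount) AS total_amount
FROM transactions t
JOIN customers c ON c.customer_id = t.customer_id
WHERE t.created_at >= NOW() - INTERVAL 1 HOUR
GROUP BY c.region
ORDER BY total_amount DESC;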

Low Data Latency

A low-latency database is a database management system (DBMS) designed specifically for high performance and near-zero lag time for end users. Latency itself measures the interval it takes for a database to receive and execute a query.

The databases that support operational analytics are designed to store streams of data that come at varying rates. They are optimized for high-throughput operations, and an update to any record in the database is usually visible within seconds. This ensures high database availability with no service interruptions.

High Query Volume

As stated earlier, operational analytics engines are built for a high-throughput operation. Depending on their use case, it’s common for some businesses relying on operational analytics to execute thousands of concurrent queries every second.

A financial institution, for example, needs to simultaneously process enormous numbers of transactions for multiple users in real time. This means that hundreds or thousands of database queries must be executed in parallel for your user-facing fraud detection application to flag fraudulent transactions.

For your database to be effective for operational analytics, it must be capable of processing a high number of queries simultaneously without compromising on performance.

Live Sync with Various External Sources

Your organization probably has different sources of data that need to talk to each other to maintain a single source of truth. An operational analytics database must have inherent mechanisms that will allow it to connect and continuously sync with these multiple data sources. Your team should be able to easily incorporate multiple applications and services from different arms of the business without losing the state of the database. This removes any data silos and helps your team attain data consistency across all systems.

Mixed Types

An operational analytics database must be capable of storing data of mixed types in the same database field. Given the low-latency requirements of operational databases, your operational analytics database should be able to store new data without having to transform it into a single data type at write time. In databases without this capability, the additional layer of data cleaning can slow down your data ingestion.
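
As a small sketch (the `events` table is hypothetical), a JSON column is one common way to let a single field hold values of different shapes per row without a write-time transformation:

CREATE TABLE events (
    event_id BIGINT,
    payload  JSON                -- numbers, strings, arrays or objects per row
);

INSERT INTO events VALUES
    (1, '{"temperature": 21.5}'),
    (2, '{"status": "ok", "codes": [7, 9]}');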

Conclusion

In this article, you learned that operational analytics is a form of business analytics that helps you draw actionable insights from your real-time operational data. You also saw how operational analytics is different from the conventional business analytics you might be used to, and looked at some of its use cases across several industries. Additionally, you learned what to look for in an operational analytics engine when your organization is investing in one.

SingleStore is a real-time, distributed SQL database that meets all the requirements for operational analytics mentioned in this article, and many more. It provides fast, scalable analytics across all of your operational data. We provide global corporations with intelligent databases that can simultaneously run transactions and analytics, allowing clients to focus on running their businesses. You can try out SingleStoreDB Cloud for free.

Webinar Recap: Introduction to SingleStoreDB


Get to know SingleStoreDB, the fastest-growing distributed SQL database to power real-time applications and analytics.

In today’s data-driven world, two things are certain: Customers want their data, and they want it fast.  With a unified data engine for transactional and analytical workloads, SingleStoreDB powers those fast, real-time analytics and applications customers expect. How? SingleStoreDB is built with the key features and capabilities necessary to truly deliver up-to-the-minute insights — no matter the query complexity, size or number of users. It’s where fast, unified and resilient converge to make real time a reality.
But let’s take a step back: What exactly is SingleStoreDB? What makes it unique, and why is it the right database for real-time analytics and applications? Our latest webinar, “Introduction to SingleStoreDB” takes a closer look at these things — and more. Here are the highlights.

The challenge

We see modern applications all around us today. “We all are now in the digital service economy,” says Domenic Ravita, VP of Product Marketing & Developer Relations at SingleStore. “We can have anything we want instantaneously…and that’s not just for consumer apps — this is how business is done.” 

These applications are prevalent in nearly every industry, from cybersecurity to IoT, fintech, eCommerce and more. To run efficiently, modern apps must also have the ability to:

  • Access real-time data

  • Deliver fast, interactive customer experiences

  • Scale effortlessly

  • Run anywhere, anytime

Yet modern, real-time applications also come with complexities and challenges — something organizations tend to solve by constantly adding (or ‘stitching’) various technologies and data stores together. This includes open-source databases like MySQL, PostgreSQL and MongoDB, as well as data warehouses like Snowflake. 

The problem? These individual data stores simply aren’t powerful enough to deliver the real-time experiences for your applications, APIs and dashboards. And, constantly adding new technologies to accommodate required functionalities ends up being extremely costly for businesses.

The solution

“What SingleStore provides is a solution and antidote to these modern application challenges,” says Ravita. “We’re a database built to address these kinds of performance, scale and concurrency challenges.” As a unified, distributed, relational multi-model cloud database designed for real-time applications and analytics, SingleStoreDB solves three core needs:

  1. Speed. SingleStoreDB delivers low-latency, single-digit millisecond responses on user queries — and has the power for parallel streaming ingestion of millions of events/second.

  2. Simplicity. SingleStoreDB is multi-model and built to handle various data types (including JSON, time-series, geospatial and full-text search), and runs in multi- or hybrid-cloud environments. And, SingleStoreDB features familiar SQL tooling and is MySQL wire-protocol compatible — simplifying your application architecture.

Live Demo: Real-Time Marketing Analytics With SingleStoreDB 

As mentioned earlier, SingleStoreDB’s functionalities serve a wide variety of industries, including adtech, martech, fintech and beyond. To demonstrate exactly how it works, SingleStore Director of Solutions Engineering Sarung Tripathi walks you through how to apply it to a digital marketing example. 

In his application, Tripathi demonstrates how you can use SingleStoreDB to serve ads to users based on their buying behaviors, using the following criteria:

  • Request history

You’ll get an in-depth look at how to connect to SingleStoreDB, set up the schema, ingest data, set up offers and more — and learn how you can get $500 in free SingleStoreDB credits. Want to run it yourself? You can also check out the code on Github.

MySQL Error 2002 (hy000): can’t connect to local mysql server through socket


Getting a MySQL error 2002? We break down what this error means, and steps you can take it resolve it.

MySQL error 2002 refers to connection problems that arise either when the connection is being established or while a query is being executed.  MySQL allows a dedicated local server connection through a socket file. The official error reference is:

MySQL Error 2002
(CR_CONNECTION_ERROR) Can't connect to local MySQL server through socket '%s' 
(%d)

MySQL Error 2002 (hy000) — What Are the Causes?

The issue can be thrown due to various reasons, including:

  1. MySQL server crash. If the MySQL server crashes for any reason, the socket connection established in the handler will be broken, resulting in  error 2002.

  2. Access issues. If authenticated user credentials are revoked post connection, the socket connection will be aborted.

  3. Version conflicts. Conflicts in configurations, versions and query formats could also lead to error 2002.

MySQL Error 2002 (hy000) — Solutions

The following are possible solutions for error 2002 (based on the situation):

  1. Validate the MySQL server status. Check if the MySQL server is functional — if not, be sure to start the server. On occasion, restarting the server also fixes the problem. If the issue persists, check the access and error log to uncover specific problems and take appropriate action.

  2. Fix any configuration conflicts. If the MySQL server is running properly, check the configuration for access control; MySQL may be blocking your database handler’s access. Once the configuration is fixed, remember to restart the MySQL server for the changes to take effect (see the sketch after this list).
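
As a small sketch, once you can reach the server some other way (for example over TCP with `-h 127.0.0.1`), two standard statements help confirm that it is up and which socket file it is actually using, so the client configuration can be matched to it:

SHOW GLOBAL STATUS LIKE 'Uptime';   -- confirms the server is running and for how long
SHOW VARIABLES LIKE 'socket';       -- the socket path the server actually listens on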

Connection errors like MySQL error 2002 can break the flow of the entire service. However, by adhering to best practices and stability precautions, you can work to avoid the error.

SingleStoreDB

SingleStoreDB is a real-time, distributed SQL database that unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications. SingleStoreDB provides support for large scale databases with analytics and takes care of most configuration, and also supports various distributions for deployment.

SingleStore is MySQL wire compatible and offers the familiar syntax of SQL, but is based on modern underlying technology that allows infinitely higher speed and scale versus MySQL. This is one of the many reasons SingleStore is the #1, top-rated relational database on TrustRadius.

Additional Resources

MySQL Injection Attack


Find out more about what happens during a MySQL injection attack, where your database might vulnerable and what you can do to prevent it.

An injection attack uses available paths to retrieve data from the database, and either hijack the data or attack its integrity. Injection attacks are also used to scrape privileged database information — like lists of users and their personal information. 

One of the most common ways for an injection attack to work is by exploiting flaws in the implementation and introducing a query inside the input. The injected code is then executed, and the attacker can retrieve the targeted data from the response. 

What Are The Ramifications of a MySQL Injection Attack?

The following highlight a few critical ramifications of a MySQL injection attack:

  1. Query parameter possibilities. Attackers can use trial-and-error tactics to determine which injections are possible — and whether they can fully attack the database.

  2. Access hijacking. Access hijacking is done for numerous reasons, like exposing site vulnerabilities to general users, data theft and server hijacking.

  3. Critical data theft. One of the most common reasons for injection attacks is to steal secure, critical data including user profile information and financial data.

  4. Denial of service. Denial of Service (DOS) is the most commonly known services hack. Service is blocked for regular or subscribed users — which for some organizations, can lead to serious financial losses.

  5. Traps. Once a pattern has been established, traps can be set for the system, allowing hackers to execute damaging queries at a later time.

How to Prevent MySQL Injection Attacks

You can take the following steps in MySQL to secure your system against injection attacks:

  1. Input validation. Define a set of possible inputs in the implementation, and validate all inputs before executing a query.

  2. Input checking functions. Define a set of characters that are not allowed as a parameter, and use prepared statements wherever possible (see the sketch after this list).

  3. Validate input sources. Only a set of pre-defined sources should be allowed to access the database — all others requests should be blocked.

  4. Access rights. A predefined access list should be maintained, and each access instance should be logged at the application layer.

  5. Security precautions. When setting up your database, be sure to configure it with the proper security controls for the production environment.
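
As a small sketch of the prepared-statement advice above (the `users` table is hypothetical), MySQL’s server-side prepared statements bind the parameter instead of concatenating it into the SQL text:

PREPARE stmt FROM 'SELECT id, email FROM users WHERE username = ?';
SET @uname = 'alice';          -- the value is bound, never interpolated into the query
EXECUTE stmt USING @uname;
DEALLOCATE PREPARE stmt;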

SingleStoreDB

SingleStoreDB is a real-time, distributed SQL database that unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications. SingleStoreDB provides support for large scale databases with analytics and takes care of most configuration, and also supports various distributions for deployment.

SingleStoreDB is MySQL wire compatible and offers the familiar syntax of SQL, but is based on modern underlying technology that allows infinitely higher speed and scale versus MySQL. This is one of the many reasons SingleStore is the #1, top-rated relational database on TrustRadius.

Data Security in SingleStoreDB

SingleStore takes an all-encompassing approach to security, reliability and trust. From industry-leading security certifications to full access controls, we protect the integrity of your — and your customers’ — data. 

Read more about our comprehensive data security

Additional Resources

Full House: Developers Share 3 Signs You’ve Outgrown Your Open Source Database


A developer’s life is anything but a sitcom. But if you’re building database functionality into an application or an operating environment, you eventually may run into a host of challenges with single-node open source databases that, if you weren’t tasked with fixing them, might seem comical.

In general, database performance problems relate to data ingestion, scaling, speed and not being able to easily store all the different kinds of data you want. In this blog you’ll get a personal view into what those problems look like, as developers share their experiences with the three signs of outgrowing popular open-source databases like MySQL, PostgreSQL and MariaDB. We’ve also included some tips on what to look for in a new database solution to ease the pain.

Sign 1: Application Performance Hits a Wall

Jack Ellis is co-founder of Fathom, Inc., a SaaS firm that believes website analytics should be simple, fast and privacy focused. Fathom delivers a simple, lightweight, privacy-first alternative to Google Analytics. Jack describes how his application’s performance suffered because he had maxed out MySQL:

“Despite keeping summary tables only (data rolled up by the hour), our [MySQL] database struggled to perform SUM and GROUP BY. And it was even worse with high cardinality data. One example was a customer who had 11,000,000 unique pages viewed on a single day. MySQL would take maybe 7 minutes to process a SUM/GROUP query for them, and our dashboard requests would just time-out. To work around this limitation, I had to build a dedicated cron job that pre-computed their dashboard data.”

Josh Blackburn is co-founder and head of technology at IEX Cloud, a data infrastructure and delivery platform for financial and alternative data sets that connects developers and financial data creators. Josh’s team builds high-performance APIs and real time streaming data services used by hundreds of thousands of applications and developers. He had hit a similar wall with MySQL running in Google Cloud:

“We average about 500,000 to 800,000 data ops per second, typically during market hours. These could be really tiny requests, but you can see our ingress and egress rates; we’re consuming a lot of data from multiple resources, but we’re also passing a lot of that out the door… In our case, we’ve got to keep up not just with the stock market, with real-time prices, but also with everyone coming in and needing all that data in real time.”

Josh summed up his data ingestion challenge, “We were in a tight spot to find something that would scale and had better performance, especially on the ETL side, because we’re loading hundreds of gigs of data every day.”

Sign 2: An Open-Source Database Doesn’t Support Your Business Needs

Gerry Morgan is lead developer at dailyVest, a fintech company using 401(k) participant data and analytics to improve the health and performance of retirement plans. Each month, over 7 million investors and plan participants can access digestible insights delivered via visual dashboards.

Data volumes are growing at 36% a year, fueled by billions of transactions, and Gerry found that dailyVest’s Azure SQL database couldn’t support business growth. He said:

“[We were] not just increasing resource requirements in our cloud environment, but also increasing costs [of Azure Cloud resources]… We were also seeing some performance degradation in Azure SQL. Not so much that our customers would have noticed, but we noticed there was some drop off in speed in our ingestion of data. We wanted to improve our ETL operation, but at the same time improve the customer experience — all customers will be happy if you make things faster, even if they haven’t noticed if things were particularly slow.”

Mohammed Radwan is head of engineering at Foodics, a restaurant management software company serving more than 22,000 establishments in 35 markets. The company processes more than 5 billion orders per year, offering dashboard analytics for business owners and managers. At first, Foodics used a combination of CitusDB and MySQL to power the business, later swapping out MySQL for a commercial version of PostgreSQL.

Foodics ran into reliability problems with CitusDB, experiencing outages that lasted three hours at a time up to four times per month. Only 200 users could concurrently use the existing system. Foodics had 5,000 customers, but downtime and a lack of fast data were accelerating churn. Although the company had just received $20 million in Series B funding in 2021, the unreliable system limited growth and put future funding at risk. Mohammed said:

“Like many tech companies, we started with MySQL. It was compatible with what we had and was easy to use. It fulfilled its purpose for a while, but when we needed to grow and expand, MySQL couldn’t enable that.”

When experiencing scaling issues with MySQL or other open source databases, developers often turn to sharding middleware or NoSQL. These approaches, however, can compromise the performance of ACID-compliant transactions and complex joins, particularly in high-volume production environments.

Sign 3: You’re Dealing With Database Sprawl

Here, the writing on the wall is clear: if you need to incorporate multiple data types into your application or environment – such as time series, JSON, documents and other specialty data types – you are going to need to spin up specialty databases to contain them. These separate databases will need to be connected, maintained and upgraded, creating database sprawl — and exponential complexity.

If your single-node database supports only standard SQL numeric data, you will likely experience significant growing pains if you try to augment it to support multiple data types.

What should you look for in a replacement database?

Most single-node open source growing pains can be solved by a database that offers:

  • Streaming data ingestion overcomes open source databases’ inability to ingest, process and analyze streaming data necessary to power modern interactive SaaS applications and production environments.

  • Low-latency query performance solves query performance problems as data or concurrency demands grow.

  • Limitless scalability addresses the struggle that single-node architectures face when attempting to scale as business or user volumes grow.

  • Robust analytical abilities to overcome open source databases’ basic to non-existent analytical capabilities — driving fast, interactive user experiences.

  • Hybrid workload handling to eliminate the need for separate OLTP and OLAP systems; instead, these hybrid workloads can be handled in a single, unified system.

Your list may be much more granular. Jack at Fathom had a lengthy list of non-negotiables for any database he might consider to replace MySQL:

  1. It must be ridiculously fast

  2. It must grow with us. We don’t want to be doing another migration any time soon

  3. It must be a managed service. We are a small team, and if we start managing our database software, we’ve failed our customers. We’re not database experts and would rather pay a premium price to have true professionals manage something as important as our customers’ analytics data

  4. It must be highly available. Multi-AZ would be ideal, but high availability within a single availability zone is acceptable too

  5. Cost of ownership should be under $5,000/month. We didn’t want to spend $5,000 off the mark, as this would be on top of our other AWS expenses, but we were prepared to pay for value

  6. The software must be mature

  7. Companies much larger than us must already be using it

  8. Support must be great

  9. Documentation must be well-written and easy to understand

For Mohammed, delivering 24/7 resource availability was paramount. He said:

“We can’t take time off or delay reports. The most important thing for us is concurrency. As we grow, we need to ensure that our customer base grows with us. We needed a database that allows for seamless reporting without worrying about how many customers are using it all at once.”

After all the challenges Foodics had weathered, Mohammed needed a database that would offer: 

  1. The ability to place all analytics-related data in a single unified data store

  2. A performant analytics engine with columnstore to democratize data access

  3. Real-time and near real-time analytics with very fast reads and quick ingestion

  4. A multi-tenant architecture to use a single database for all customers 

  5. Support for a large and growing customer base in the tens of thousands 

  6. 100 concurrent queries per second, or approximately 1% of Foodics’ customer base at the time, to support the large number of reports being generated 

  7. The capability to process billions of orders and 5 million transactions per month

  8. Scale up and out capabilities to support Foodics’ accelerated growth strategy 

  9. High availability with almost zero downtime 

Developers Choose MySQL Wire-Compatible SingleStoreDB

Fathom, IEX Cloud, dailyVest and Foodics all chose SingleStoreDB, a real-time, distributed SQL database, to replace open source database technology. Mohammed’s reasons why are a common theme:

“We are a small team, so we did not want to spend time tuning a database. We wanted something that just worked out of the box. For this reason, we went with SingleStoreDB Cloud running on AWS. With SingleStore, we can just plug and play and do everything we need to empower our customers. It allows us to focus on what we are really here to do: serve our customers.”

SingleStoreDB is MySQL wire-compatible, making it incredibly easy to migrate from any flavor of MySQL (including AWS RDS, Google Cloud SQL, Azure MySQL or others). It supports familiar SQL syntax, so developers don’t need to learn a completely new technology to get started.

Most developers can quickly complete their migration and get started with SingleStore in hours or a few days. To learn about migration, check out these resources:

After Migration, a Bigger, Better House

All of the developers experienced major improvements in speed, performance, scalability and flexibility after they migrated to SingleStoreDB. Here’s how Jack tells Fathom’s “after” story:

  1. We no longer need a dedicated data-export environment…We do our data exports by hitting SingleStore with a query that it will output to S3 for you typically within less than 30 seconds. It’s incredible. This means we can export gigantic files to S3 with zero concern about memory. We would regularly run into data export errors for our bigger customers in the past, and I’ve spent many hours doing manual data exports for them. I cannot believe that is behind me. I’m tearing up just thinking about it.

  2. Our queries are unbelievably fast. A day after migrating, two of my friends reached out telling me how insanely fast Fathom was now, and we’ve had so much good feedback.

  3. We can update and delete hundreds of millions of rows in a single query. Previously, when we needed to delete a significant amount of data, we had to chunk up deletes into DELETE with LIMIT. But SingleStoreDB doesn’t need a limit and handles it so nicely

  4. We used to have a backlog, as we used INSERT ON DUPLICATE KEY UPDATE for our summary tables… [W]e had to put sites into groups to run multiple cron jobs side by side, aggregating the data in isolated (by group) processes. But guess what? Cron jobs don’t scale, and we were starting to see bigger pageview backlogs each day. Well, now we’re in SingleStore, data is fully real time. So if you view a page on your website, it will appear in your Fathom dashboard with zero delays.

  5. Our new database is sharded and can filter across any field we desire. This will support our brand new, Version 3 interface, which allows filtering over EVERYTHING.

  6. We are working with a team that supports us. I often feel like I’m being cheeky with my questions, but they’re always so happy to help. We’re excited about this relationship.

  7. SingleStoreDB has plans up to $119,000/month, which is hilarious. That plan comes with 5TB of RAM and 640 vCPU. I don’t think we’ll get there any time soon, but it feels good to see they’re comfortable supporting that kind of scale. They’re an exciting company because they’re seemingly targeting smaller companies like us, but they’re ready to handle enterprise-scale too.

  8. And as for price, we’re spending under $2,000/month, and we’re over-provisioned, running at around 10% – 20% CPU most of the day.

Josh from IEX Cloud summed up, “SingleStore enables us to do monitoring and analysis in the same system that houses our historical data, and this creates enormous efficiencies for us. We’ve been able to consolidate multiple databases, run our platform faster, and speed the onboarding processes for new data sets.”

If you’ve outgrown your single-node open source database and are ready to move into a bigger, better house, try SingleStore for free today.

MySQL Data Sharding


Learn more about MySQL data sharding — including what it is, specific sharding techniques, pros and cons of using sharding, and more.

As data increases in MySQL, it’s not uncommon for schema performance to deteriorate. This deterioration is caused by:

  1. Increased Query Times. As data volume increases, the size of indices also grows — and at a certain point, even simple queries have unacceptably long return times.

  2. Data Redundancy. One traditional optimization technique is to repeat data instead of using foreign keys, reducing return time for queries. But, it’s a poor practice that diminishes the purpose of relational databases.

  3. Bottlenecks. As data volume increases, bottlenecks occur in batched tasks, backups or any other intensive task associated with the database. This can result in an overall downgrade in the performance and efficiency of the system.

  4. Storage Congestion. On a single server, the ever-growing database eventually congests local storage, which considerably impacts database performance.

Sharding can be used to overcome these challenges. Data sharding is a technique where data is split into mutually exclusive segments, which is achieved by splitting tables into horizontal chunks. In a distributed environment, these chunks can be placed on partitions — and then nodes — which would balance the throughput.

MySQL Data Sharding: What Are the Sharding Techniques?

Before we discuss practical steps for sharding, let’s briefly take a look at different types of sharding: 

  1. Hash Sharding. Hash functions are used for distribution of data partitions, and placement of data in those partitions.

  2. Range Sharding. In range sharding, a particular length is defined for a partition, which consists of a range of keys. Partitions do not need to be equal in length.

  3. Geo Sharding. Data is split based on a geography-related key (such as region or country), so each partition holds the data for a particular location.

In MySQL, sharding can be achieved with the following steps:

  1. Key Selection. This step can be deployed using either hash or range techniques (depending on the use case). Security-intensive applications often use hash functions.

  2. Schema Modifications. Depending on the key selection, the schema needs to be modified. This can be accomplished with ALTER commands (a minimal sketch follows this list).

  3. Distribution on Nodes. A scheme needs to be created at the application layer, which places data in the correct partition and retrieves it when required.
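
A minimal sketch of the schema-modification step for hash sharding (the `users` table, its `user_id` column and the choice of four shards are all assumptions); routing queries to the right node still has to be implemented in the application layer:

ALTER TABLE users ADD COLUMN shard_id TINYINT NOT NULL DEFAULT 0;

-- Assign each row to one of four shards by hashing its key.
UPDATE users SET shard_id = CRC32(user_id) % 4;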

MySQL Data Sharding: What Are the Pros & Cons?

The following are pros for MySQL data sharding:

  1. Sharding considerably improves throughput when applied properly.

  2. Sharding reduces the storage footprint on each individual node.

  3. Sharding allows node balancing — and if shards are optimally placed, users can have access to relevant data and the ability to handle complex queries.

There are some instances where MySQL data sharding is not the best approach, and only presents further challenges:

  1. Establishing an analytics interface over a sharded database is very difficult due to limitations on JOINs and aggregations across shards.

  2. Sharded databases in MySQL generally lose ACID (Atomicity, Consistency, Isolation and Durability) guarantees for transactions that span multiple shards.

  3. MySQL does not provide automated sharding — sharding is normally implemented at the application layer. That means development teams are responsible for implementing and maintaining sharding in its entirety. As such, MySQL is not well suited to situations where sharding is required.

SingleStoreDB

SingleStoreDB is a real-time, distributed SQL database that unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications. SingleStoreDB provides support for large scale databases with analytics and takes care of most configuration, and also supports various distributions for deployment.

SingleStore is MySQL wire compatible and offers the familiar syntax of SQL, but is based on modern underlying technology that allows infinitely higher speed and scale versus MySQL. This is one of the many reasons SingleStore is the #1, top-rated relational database on TrustRadius.

Additional Resources

Expand Your Expertise With the SingleStoreDB Certified Developer Exam


Looking to expand your developer portfolio? Add the world’s fastest distributed SQL database for real-time analytical applications to your skillset.

Built with a unique, three-tiered storage architecture and designed for millisecond response times,  SingleStoreDB eliminates performance bottlenecks and unnecessary data movement.

Brands like Hulu, Uber, Comcast and more choose SingleStoreDB to reduce operational and design burdens of their real-time analytical applications, supercharging limitless data experiences. Now, we’re happy to introduce the SingleStoreDB Certified Developer Exam — and flex your expertise developing applications on the #1 database for unified operational and analytical processing.

A SingleStoreDB certification gives you an advantage over other job seekers — as well as recognition and opportunity for advancement in a crowded database and developer market.

Who can get certified? 

The SingleStoreDB Certified Developer Exam is open to anyone interested in expanding their certification portfolio, and demonstrating an in-depth, well-rounded expertise of developing applications on the world’s fastest distributed SQL database for real-time analytical applications. 

What is covered in the exam?

The exam questions have been developed and vetted by our own team of engineers. Designed to help you gain a well-rounded, comprehensive knowledge of SingleStoreDB, you’ll be tested against four key domains:

  • Ingest. Demonstrate your knowledge of features including SingleStoreDB Pipelines, transforms, transactions and connectivity.
  • Develop. Demonstrate your knowledge of SingleStoreDB Procedural SQL, DML, DDL and various data types.

Want to dive deeper into SingleStoreDB ahead of the exam? Check out SingleStore docs.

What Else Can I Expect From the Exam?

Here are a few additional details to know ahead of registering for the exam — including testing time, price, study materials and more:

  • Length: The exam consists of 50 questions across the four previously described domains (architecture, design, ingest and develop).

  • Timing: You’ll have 80 minutes to complete the exam.

  • Format: The exam is proctored and completed online.

  • Price: The exam fee is $100.

  • Score to Pass: A passing score is 750 on a scale from 100-1000. In the event you don’t pass, there is a mandatory waiting period between attempts: 10 days after your first attempt, 30 days after your second attempt and 60 days after all subsequent attempts.

  • Study Guide & Materials: Take a look at our comprehensive exam guide here. 
  • Duration of Certification: Your SingleStoreDB Developer Exam Certification is valid for one year.

Ready to become an expert on the world’s fastest database? Register for the SingleStoreDB Certified Developer Exam today.

What the SingleStore Data Intensity Calculator Tells Us About Enterprise Application Infrastructure Requirements — Today and In the Future


Feed: SingleStore Blog.
Author: .

SingleStore is announcing the data from the first-of-its-kind Data Intensity Calculator — already used by 125 companies. Find out what their results were in this blog.

Only a few months have passed since we introduced the groundbreaking Data Intensity Calculator. Already, 125 companies have used this online tool to measure their data intensity.

Half of the applications that these organizations tested are considered highly data intensive. And nearly 65% of these companies — which span industries including finance, retail and tech — said they expect their data will grow between 10% and 100% within the next year.

But before we get into the nitty gritty, let’s revisit what data intensity means, how we measure it and why understanding and addressing data intensity are so critical.

Assessing Data Intensity Helps You Better Understand Your Infrastructure Requirements

Data intensity measures the data requirements of an application. It’s important to get a handle on the data intensity of your applications. Only then will you know what infrastructure you need to enable the right level of application end-user experiences right now and going forward. 

We make gauging the data intensity of your applications easy. The calculator we launched in May allows you to assess your applications’ data intensity for free — and in just three minutes.

The SingleStore Data Intensity Index measures data intensity based on five considerations:

  • Concurrency: the application’s requirement to support a large number of users or concurrent queries, without sacrificing the SLA on query latency

  • Data size: the volume of the data sets needed to feed the application

  • Query complexity: the extent to which the application must handle simple and complex queries

  • Data ingest speed: the application’s need to ingest high volumes of data at speed

  • Query latency: the amount of time it takes to execute a query and get results

Now that we’ve run through this data intensity refresher, let’s take a closer look at the findings.

Applications Are Showing Significant Complexity and Concurrency, Growing Data Requirements

The vast majority of applications assessed using the Data Intensity Calculator registered a high or medium level of complexity. Of the 124 applications tested, 106 fell into these categories. Thirty-four of those applications are considered highly complex, requiring six or more joins, and 72 required three to five joins. The remaining 18 applications needed just one or two joins.

Concurrency requirements of applications also came in on the high end. Just six of the 124 Data Intensity Calculator entries said the number of concurrent queries that their database typically handles was less than five queries. More than twice as many (14) said more than 1,000 queries. Additionally:

  • 19 of the total had five to 10 concurrent queries

  • 61 had 10 to 100 concurrent queries

  • 24 had 100 to 1,000 queries

Data size was another category in which the Data Intensity Calculator registered some of the highest application demands. Forty-one of the applications tested had a data size of 1 to 10 terabytes (TB). About ⅕  (23 out of 124 applications) employed 10-50 TB of data. Nine of the applications currently rely on 50 to 100 TB of data to get the job done. And, a dozen of the applications needed more than 100 TB of data to deliver on their intended experiences.

But that is just a fraction of the data that organizations expect their applications to need in the near future. Over the next 12 months:

  • 40 companies said their data will grow 10-30%

  • 30 organizations expect their data to increase 30-60%

  • 11 organizations anticipate a whopping growth of 60-100%

  • 10 companies say their data is poised to grow by more than 100%

Businesses Must Also Support Their Applications’ Data Ingest Speeds and Latency Control Needs

When it comes to data ingest:

  • 28 of the 124 applications required less than 1,000 rows per second

  • 40 needed ingest rates of 1,000 to 10,000 rows per second

  • 33 demanded 10,000 to 100,000 rows per second

  • 16 needed 100,000 to 1 million rows per second

  • Seven required more than 1 million rows per second

In terms of latency, the largest group of applications (44) needed to keep delays between 100 milliseconds and 1 second. Thirty-two applications had more stringent latency requirements, between 10 and 100 milliseconds. Eight applications needed latency of 10 milliseconds or less.

Controlling latency of data-intensive applications is key to delivering the real-time experiences that customers now expect. But, as IDC notes, legacy systems that rely on batch-based processing don’t sync in real time — these processes are more like sending a letter than a text.

Data delays put business at risk by adversely impacting customer experience, leading to customer annoyance, churn and lost revenue, the global market intelligence firm says. IDC explains that real time is imperative for enterprise intelligence and a better customer experience because business happens in real time; the best kind of enterprise intelligence allows organizations to make decisions based on the most current data; and streaming data use cases exist across all industries, from manufacturing to financial, and retail to healthcare.

Data-Intensive Applications Call for a New Kind of Data Infrastructure

The bottom line is that a fair share of today’s applications are already data intensive, and all signs suggest that the world will see a whole lot more of these applications in the future.  

IDC expects new data that is created, captured, replicated and consumed to more than double between now and 2026. McKinsey & Co. notes that “tech customers’ needs and expectations are rapidly evolving” in the wake of the pandemic-fueled digital transformation. And data-intensive applications have become the lifeblood of today’s most competitive businesses.

Growing data intensity and user expectations call for a new approach to data infrastructure. One that brings transactional and analytical capabilities into a single database that can handle rapidly growing volumes of data, supports concurrency, ingests data fast, is designed for both simple and complex queries, and can execute queries and provide results with minimal latency.

Legacy and specialty databases aren’t up to the task. SingleStoreDB is the one database that can do it all.

Want to learn how data intensive your own applications are? Try out our free Data Intensity Calculator.
Want to learn more about what infrastructure you need to deliver on the promise of better end-user experiences and enterprise intelligence? Schedule your time to chat with SingleStore engineers today. 

Locally Create Wasm UDFs in Rust and Use Them in SingleStoreDB


Feed: SingleStore Blog.
Author: .

SingleStoreDB’s new Code Engine — Powered by Wasm lets users run user-defined functions and table-valued functions written in C, C++ and Rust, with more supported languages on the way. This is a quick guide on how to get started using this feature on SingleStoreDB with a local Rust environment!

WebAssembly (or Wasm for short) is a binary instruction format designed as a portable compilation target for programming languages. With SingleStoreDB’s Code Engine — Powered by Wasm, users can reuse native code in a sandboxed environment while running functions inside a SingleStoreDB Workspace at blazing speeds.

I’ll demo how to get started creating Wasm files locally with Rust and VS Code, and then we will upload the files into an AWS S3 Bucket and load/call the Rust user-defined functions on a SingleStoreDB Cloud workspace.

Let’s start with our local Installs:

Install VS Code. The download link can be found here

Download the WASI SDK (in this case we used wasi-sdk-16.0-macos.tar.gz for Mac)

Extract the WASI SDK file from your Downloads folder

tar -xzvf wasi-sdk-16.0-macos.tar.gz

Move the WASI SDK file in the folder of your choice (in my case, I placed it in the opt folder)

Ensure that your $PATH variable is prefixed with this location when you are running the build commands suggested in this tutorial

export PATH=/opt/wasi-sdk-16.0/bin:$PATH

Download and install the Rust toolchain

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

At the prompt, select option 1

Configure current Rust shell

source $HOME/.cargo/env

Install the wit-bindgen program. This generates the Rust bindings that let us import WASI APIs at runtime. On its own, Wasm only supports 32- and 64-bit integers and floats; the generated bindings use the canonical ABI so that more complex data types (such as strings) can cross the boundary.

cargo install --git https://github.com/bytecodealliance/wit-bindgen wit-bindgen-cli

Install cargo-wasi. This is a subcommand for cargo that provides convenient defaults for running the Rust code on the wasm32-wasi target.

cargo install cargo-wasi

Great! Our local environment has everything it needs to start creating our Wasm files!

VS Code

Create a new empty folder (mine is called demo_wasm) and open VS Code

Open the folder by pressing the F1 key and selecting File: Open Folder…, then choose the folder you just created

Create a new file called power.wit and add the wit specification. In this case we are creating a “power-of” function that takes a base number and an exponent, and returns a signed 32-bit (s32) integer.

power-of: func(b: s32, exp: s32) -> s32

Then save the file with command + S

In the terminal window in VS Code, run the following command to initialize the rust source tree

cargo init --vcs none --lib 

This will create our src folder with the lib.rs file, as well as a Cargo.toml file

Edit the Cargo.toml file with the right function name and dependencies, and save

[package]
name = "power"
version = "0.1.0"
edition = "2018"
[dependencies]
wit-bindgen-rust = { git = "https://github.com/bytecodealliance/wit-bindgen.git", rev = "60e3c5b41e616fee239304d92128e117dd9be0a7" }
[lib]
crate-type = ["cdylib"]

Open the lib.rs file, replace the default add function with the power function logic, then save

// Generate Rust bindings from the WIT interface and export this implementation
wit_bindgen_rust::export!("power.wit");

struct Power;

impl power::Power for Power {
    // Compute base^exp by repeated multiplication
    fn power_of(base: i32, exp: i32) -> i32 {
        let mut res = 1;
        for _i in 0..exp {
            res *= base;
        }
        res
    }
}

Build the .wasm file

cargo wasi build --lib

The new .wasm file can be found in the target/wasm32-wasi/debug folder

cd target/wasm32-wasi/debug/
ls

We now have our power.wit and power.wasm files! We can move these to object storage (I’ll be using an AWS S3 bucket).

AWS S3

Open AWS console

Navigate to S3 Service

Create or navigate to your S3 bucket

Upload your .wasm and .wit file for your function by clicking the Orange “Upload” button to enter the upload page

Add the .wasm and .wit files (from the target/wasm32-wasi/debug folder) by dragging and dropping, or upload them using the “Add Files” button. Then hit “Upload”

The upload should be successful

Setup the SingleStoreDB Cloud Workspace

Login to your account in the portal, and create a new workspace group (Wasm only works for SingleStoreDB v7.9+).

Give the workspace group a name, select your cloud provider and region of choice, set a password and then confirm

Enter the workspace name and desired cluster size. Then click “Next” and create the workspace in the next screen

When the Workspace is finished deploying, select the “Connect” dropdown, and click on the SQL Editor

Setup is complete

Load and run the Wasm function in Workspaces on SingleStoreDB Cloud

Create and use the database

create database wasm_demo;
use wasm_demo;

Create the Wasm function

-- Wasm udf power of
create function `power_of`
as wasm
from S3 's3://dlees2bucket/power.wasm' -- your S3 bucket location goes here
CONFIG '{"region": "us-east-1"}' -- specify your aws region
CREDENTIALS '{"aws_access_key_id": "your_aws_access_key_id_goes_here",
"aws_secret_access_key": "your_aws_secret_access_key_goes_here",
"aws_session_token": "your_aws_session_token_goes_here_if_applicable"}'
with wit from S3 's3://dlees2bucket/power.wit' -- your S3 bucket location goes here
CONFIG '{"region": "us-east-1"}' -- specify your aws region
CREDENTIALS '{"aws_access_key_id": "your_aws_access_key_id_goes_here",
"aws_secret_access_key": "your_aws_secret_access_key_goes_here",
"aws_session_token": "your_aws_session_token_goes_here_if_applicable"}';

Use the power_of Wasm user-defined function

SELECT `power_of`(4, 3);
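
If everything is wired up correctly, this call computes 4³ and returns 64.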

Congratulations! You have just created your first WASM UDF in SingleStoreDB!

Bonus Wasm Table-Valued Function

I’ve also uploaded Wasm table-valued function files to my AWS S3 bucket; this function splits a string based on a character.

For TVFs, all we need to add is the RETURNS TABLE clause in the Wasm function create statement. Here is a function we are calling to split a string based on a character and return each piece along with its index.

-- Wasm TVF split string
CREATE FUNCTION `split_str` RETURNS TABLE -- Add RETURNS TABLE for table-valued functions
AS wasm
from S3 's3://dlees2bucket/split.wasm' -- your S3 bucket location goes here
CONFIG '{"region": "us-east-1"}' -- specify your aws region
CREDENTIALS '{"aws_access_key_id": "your_aws_access_key_id_goes_here",
"aws_secret_access_key": "your_aws_secret_access_key_goes_here",
"aws_session_token": "your_aws_session_token_goes_here_if_applicable"}'
with wit from S3 's3://dlees2bucket/split.wit' -- your S3 bucket location goes here
CONFIG '{"region": "us-east-1"}' -- specify your aws region
CREDENTIALS '{"aws_access_key_id": "your_aws_access_key_id_goes_here",
"aws_secret_access_key": "your_aws_secret_access_key_goes_here",
"aws_session_token": "your_aws_session_token_goes_here_if_applicable"}':

Use the split Wasm TVF
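
The exact call depends on the signature declared in split.wit (not shown here), but assuming the TVF takes the input string followed by the delimiter character, a query against it would look roughly like this sketch:

SELECT * FROM `split_str`('mango-orange-banana', '-');

Each row of the result would then hold one piece of the split string along with its index.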

In Summary:

  • With your local machine set up, you can easily create powerful Wasm user-defined functions in Rust

  • We shared a step-by-step guide for creating the Wasm specification .wit file and the Rust binary .wasm file

  • These files were uploaded to object storage (AWS S3) and were used to create our Wasm functions inside a newly created Workspace

Wasm is an exciting new technology that we’ve added to the SingleStoreDB ecosystem. It empowers developers to execute functions directly on SingleStore’s distributed system at runtime, with near-native performance in a secure environment.

Users can efficiently leverage existing code compiled to Wasm, in a secure sandbox, right in SingleStoreDB. This eliminates the need to rewrite the same complex logic in SQL, saving time while keeping near-native performance.

Here is a repo with the latest programming languages that support Wasm.

7 Reasons Why 2022 Has Already Been a Big Year for SingleStore


Feed: SingleStore Blog.
Author: .

SingleStore has had a blockbuster first half of 2022, despite uncertain times for business and the economy as a whole.

1. We closed a funding round of $146 million

Our Series F-2 round, led by Goldman Sachs and joined by new investor Prosperity7, is the latest sign of our success. Closing the round provided $146 million, ensuring a solid financial foundation as we move forward with product development and innovation, accelerated sales, international expansion and other strategic initiatives.

2. We increased value and capabilities for our customers

Above all else, we are passionate about adding value for SingleStore customers by delivering new capabilities. At our 2022 summer product launch event, we unleashed expanded capabilities to address real-time applications and workloads. These capabilities now empower SingleStore users with a whole new level of ease of use, efficiency, optimization, performance, resilience, scalability, security and speed.

3. We expanded our customer base

Our growing base of customers attests to the fact that smart enterprises clearly understand the value that SingleStore delivers. In August of this year, we announced Captain Metrics, DataDock Solutions, tech giant Dell Technologies, Digital Asset Research, Foodics, impact.com and Thentia as SingleStore customers. These forward-thinking, industry-leading companies are leveraging SingleStore’s superior database performance and real-time insights to enable a range of advertising, customer service, food services, financial services and technology use cases.

4. We have proof: SingleStore delivers more for less

Third-party research published this year revealed that SingleStore provided greater value at a far lower cost than the competition. A GigaOm benchmark study demonstrated that SingleStoreDB — which uniquely combines transactional and analytical workloads in a single unified engine — delivers a 50% lower total cost of ownership (TCO) compared to the combination of MySQL and Snowflake and a 60% lower TCO than the combination of PostgreSQL and Redshift.

5. Our customers love us and we have awards to prove it

Additionally, we are honored to have earned a variety of awards this year. The Dresner Advisory Services 2022 Industry Excellence Awards categorized SingleStore as an overall Leader in Analytical Data Infrastructure (ADI). And the TrustRadius Best of Summer 2022 awards recognized SingleStore for its value, feature set and relationship in relational databases. 

I’ve said it before, and I’ll say it again: The only awards that truly matter are those that reflect the voice of the customer. That’s why these two awards are particularly significant for us.

6. The SAS Viya integration with SingleStore launched

The SAS and SingleStore engineering teams have been working to pioneer a first-of-its-kind integration that addresses key aspects of the data movement challenge. The SAS Viya with SingleStore integration uses a streaming protocol to reduce data movement and replication. This ultimately lowers cloud computing costs. SAS has integrated their analytic embedded process engine (EP) into SingleStore’s distributed database for best in class analytics and real-time performance. These are deep technical innovations that enable customers to have an excellent experience. Curious? Try it for free.

7. Our leadership and diversity is growing

With the addition of Sue Bostrom to our board, I’m excited to keep adding amazing talent to our leadership. Diversity has been top of mind for us, and we value people from all walks of life and experiences. I commend the progress we have made, but we have a lot more work to do in our diversity, equity and inclusion efforts. We are committed to being seen as an example in the industry.

Bonus: The year isn’t over yet!

Our opportunities to serve customers and win new business only continue to grow.

Real-time analytics will be a trend throughout 2022, into 2023 and beyond. In today’s digital services economy, the world is full of businesses that are now service providers. Those businesses need a distributed, relational, cloud-native, multi-cloud, multi-model database to address the growing volume, velocity and variety of data and today’s data-intensive applications.

Businesses will use data and analytics to help alleviate persistent supply chain challenges. And more organizations will use SingleStore to manage point of sale, product inventory and logistics data in a single, scalable database to get real-time visibility and do optimization and forecasting.

E-commerce and retail organizations, among others, will also increasingly employ modern data management to optimize interactions with customers. With SingleStore, companies can provide customers with personalized experiences using real-time segmentation, attribution and smart recommendations. Businesses can also drive sales with customer overlap analysis using converged real-time data and historical analysis in our single, SQL-accessible database.

Going forward, machine learning and artificial intelligence will become even more essential. Growing adoption of these technologies will also equate to a growing need for SingleStore because legacy and stitched-together databases can’t meet growing real-time demands.

We are excited to work with our great customers and investors, who are helping us to spread the word about our singular, innovative approach to data-intensive applications. And we look forward to heading into the fourth quarter — and soon, a new year — even stronger than before.

MySQL Server — A Good Choice for DBMS


Feed: SingleStore Blog.
Author: .

With more than 100 million downloads, MySQL is the open-source relational database management system with the fastest growth.

Looking for a better experience with your database management system? An efficient and easy-to-use database management system can save you a lot of time and money. A database management system allows you to manage and administer databases and thus, have a significant effect on daily business operations. A poor system can cause severe issues like activity lag and bad user experience, and you would surely want to avoid that. Here is why a MySQL server might be worth considering.

MySQL Server

With more than 100 million downloads, MySQL is the fastest-growing open-source relational database management system. The “My” in MySQL comes from the name of co-founder Michael Widenius’s daughter, and SQL stands for Structured Query Language. Many major websites, including Facebook, Wikipedia, Twitter, YouTube and Flickr, currently use it as their database of choice for web applications, and SQL is the standard language for accessing MySQL databases.

The MySQL server offers a database management system with solid querying, connectivity and data-structure capabilities, plus the ability to integrate with several platforms. In extremely demanding production applications, it can reliably and swiftly handle massive datasets. The MySQL server also offers a variety of useful features, including connectivity, speed and security, which make it excellent for database access.

Top Reasons to Choose MySQL Server

There are various reasons you might want to consider a MySQL server. However, the most critical ones are:

Open Source

This means that anybody can install and use the MySQL server’s basic version, and that outside parties can alter and customize the source code. Advanced versions are offered under tiered pricing and add capacity, tools and services.

Availability

You can rely on MySQL to ensure continuous uptime because of its steadfast dependability and unwavering availability. High-speed master/slave replication setups and specialized Cluster servers that allow fast failover are just a few of MySQL’s high-availability choices.

Compatibility

One of the core benefits of using a MySQL server is that it is highly compatible with diverse systems, languages and data models that include other DBMS alternatives, SQL and NoSQL databases, and cloud databases. MySQL also includes a wide range of database architecture and data modeling features (e.g., conceptual or logical data models). As a result, it becomes a straightforward and useful alternative for many enterprises, all while alleviating concerns about becoming “locked in” to the system.

Management Ease

Another key feature of the MySQL server is its quick start: the average time from software download to installation completion is less than 15 minutes, no matter the operating system (Microsoft Windows, Linux or any other). Once deployed, self-management features, including automated space expansion and dynamic configuration changes, can significantly ease your workload.

As a DBA, you can manage, debug, and oversee the functioning of several MySQL servers from a single workstation, thanks to the comprehensive array of graphical administration and migration tools that MySQL offers.

Final Words

The MySQL server is a good choice if you want a speedy DBMS that brings real value to your data and business operations. Its open-source availability, high compatibility and ease of management boost its appeal considerably, and together these features reduce the hassle of database management and daily data flow. You can begin working with the MySQL server by simply downloading the latest version, then building and loading the server.

One issue with MySQL, PostgreSQL and other legacy incumbent databases is that they often bottleneck on streaming ingest and have problems scaling. That creates a price-for-performance problem for high-growth, data-intensive applications, one that free open-source database options can’t adequately solve. Enter: SingleStoreDB.

SingleStoreDB

SingleStoreDB is a real-time, distributed SQL database that unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications. SingleStoreDB supports large-scale databases with built-in analytics, handles most configuration automatically and can be deployed across a variety of environments.

SingleStore is MySQL wire compatible and offers the familiar syntax of SQL, but is based on modern underlying technology that allows far higher speed and scale than MySQL. This is one of the many reasons SingleStore is the #1, top-rated relational database on TrustRadius.
