Feed: SingleStore Blog.
You database application developers are makers and fixers too. You’re grown up. But what grown-up builder doesn’t like their toys? We’re proud to announce the arrival of the SingleStore tool truck for Spring 2022!

Speed, scale and elasticity
Flexible parallelism
With flexible parallelism (FP), a single parallel query can use every core, even when there are more cores than partitions. On a machine with four times as many cores as partitions, your CPU meter will show all cores busy instead of three-quarters of them sitting idle. Your shiny new CPU cores will not be idling in the toybox, so they'll get your queries done a lot quicker, potentially many times faster!
Spilling
In this release, spilling for hash group-by, select-distinct and count-distinct operations is on by default. If one of these operations would use more memory than is available, part of its in-memory hash table is written (spilled) to disk. This allows queries that hit the memory limit to run to completion (albeit more slowly) rather than fail. Here are the variables that control spilling:
singlestore> show variables like '%spill%';
+------------------------------------------+-----------+
| Variable_name                            | Value     |
+------------------------------------------+-----------+
| enable_spilling                          | ON        |
| spilling_minimal_disk_space              | 500       |
| spilling_node_memory_threshold_ratio     | 0.750000  |
| spilling_query_operator_memory_threshold | 104857600 |
+------------------------------------------+-----------+
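If the defaults don't suit your workload, these variables can be adjusted. A sketch, assuming the usual SET GLOBAL syntax applies to these variables (the values below are illustrative, not recommendations):

-- Spill sooner by lowering the per-operator memory threshold (in bytes)
set global spilling_query_operator_memory_threshold = 52428800;
-- Or disable spilling entirely, so queries fail fast instead of slowing down
set global enable_spilling = OFF;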
LLVM upgrade
We upgraded our LLVM code-generation framework from version 3.8 to version 10. Most queries perform at most marginally faster, but a few, like a simple DELETE on a 1,000-column table, are much improved: they now run where they previously caused an error, or they compile much faster, in some cases up to 100x faster.
Many-column DELETE statements now compile up to 100x faster
Materialized CTEs
Common Table Expressions (CTEs) can be expensive to compute, but return a small result set (say, if they include a group-by/aggregate on a lot of data). The same CTE can sometimes be referenced more than once in the same query. TPC-DS Q4 is an example:
WITH year_total AS (
  … huge aggregate query …
)
SELECT
    t_s_secyear.customer_id
  , t_s_secyear.customer_first_name
  , t_s_secyear.customer_last_name
  , t_s_secyear.customer_email_address
FROM year_total t_s_firstyear
JOIN year_total t_s_secyear
  ON t_s_secyear.customer_id = t_s_firstyear.customer_id
JOIN year_total t_c_firstyear
  ON t_s_firstyear.customer_id = t_c_firstyear.customer_id
JOIN year_total t_c_secyear
  ON t_s_firstyear.customer_id = t_c_secyear.customer_id
JOIN year_total t_w_firstyear
  ON t_s_firstyear.customer_id = t_w_firstyear.customer_id
JOIN year_total t_w_secyear
  ON t_s_firstyear.customer_id = t_w_secyear.customer_id
WHERE t_s_firstyear.sale_type = 's'
  AND … lots more filters …;
It references the CTE year_total six times!
SingleStoreDB now automatically recognizes this, computes the CTE once, saves the result internally, and reuses that result each time it appears in the query, rather than recomputing it for each reference. This is all within the scope of one query; the saved result is discarded when the query completes. If a CTE is referenced N times, this can speed up the query by up to a factor of N.
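Here's a minimal sketch of the pattern (the orders table and its columns are hypothetical): totals is referenced twice, but is now computed only once.

WITH totals AS (
  SELECT customer_id, SUM(amount) AS total
  FROM orders
  GROUP BY customer_id
)
SELECT a.customer_id
FROM totals a
JOIN totals b ON a.total > 2 * b.total;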
Row-level decoding for strings
String values in a column segment can now be decoded individually. To fetch a single value, you only need to read a few bytes from the column segment, not decode the whole thing, which speeds up highly selective queries on string columns.
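For example, a point lookup like this one (the messages table and its columns are hypothetical) now reads only a handful of bytes per string column it returns:

-- hypothetical columnstore table with long string columns
select subject, body
from messages
where id = 42;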
Faster upgrades via reduced effort for codegen
SingleStoreDB compiles queries to machine code, one of the sources of its speed. Compilation takes time, so we’ve worked to make compilation faster and ensure it happens less often.
Features you can code with
That covers speed and scale. What can a builder really put their hands on in this release?
SET statement for user-defined session variables
Session variables can be used for all kinds of things. A common use is to break work down into multiple steps to make queries easier to write and read. Here's a script that finds every employee in an organization with the maximum salary:
create table emp(id int, name varchar(30), salary float);
insert into emp values (1,"Bob",10000),(2,"Sue",12000);
set @maxsal = (select max(salary) from emp);
select * from emp where salary = @maxsal;
+------+------+--------+
| id   | name | salary |
+------+------+--------+
|    2 | Sue  |  12000 |
+------+------+--------+
Session variables are typed, and the type is inferred from context. If you want a specific type, use a cast, like so:
set @d = 0 :> double;
Multi-assignment is supported:
set @x = 1, @y = 2;
The user-defined variables for your session can be seen in information_schema.user_variables. Here they are for the current session:
select * from information_schema.user_variables;
+---------------+-------------------------+---------------------+
| VARIABLE_NAME | VARIABLE_VALUE          | VARIABLE_TYPE       |
+---------------+-------------------------+---------------------+
| y             | 2                       | bigint(20) NOT NULL |
| x             | 1                       | bigint(20) NOT NULL |
| d             | 0.00000000000000000e+00 | double NULL         |
| maxsal        | 1.20000000000000000e+04 | float NULL          |
+---------------+-------------------------+---------------------+
Matching expressions to computed columns
Matching expressions to computed columns lets you speed up query processing while writing your queries in a natural way: some queries get faster without you having to change their text.
Here’s an example:
CREATE TABLE assets (
  tag_id BIGINT PRIMARY KEY,
  name TEXT NOT NULL,
  description TEXT,
  properties JSON NOT NULL,
  weight AS properties::%weight PERSISTED DOUBLE,
  license_plate AS properties::$license_plate PERSISTED LONGTEXT,
  KEY(license_plate), KEY(weight));
explain select * from assets where properties::$license_plate = "XYZ123";
+------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------------------------+
| Gather partitions:all alias:remote_0 parallelism_level:segment |
| Project [assets.tag_id, assets.name, assets.description, assets.properties, assets.weight, assets.license_plate] |
| ColumnStoreFilter [assets.license_plate = 'XYZ123' index] |
| ColumnStoreScan db1.assets, KEY __UNORDERED () USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore |
+------------------------------------------------------------------------------------------------------------------+
Notice the ColumnStoreFilter uses a computed column license_plate, and the index on it, to solve the query even though the query references properties::$license_plate. This is a field that would normally have to be computed every time by reaching into the JSON field properties.
It's important to use a large data type, like longtext or double, for computed columns; otherwise, overflow issues can prevent a match. Warnings are available to help you identify the cause of match failures. For example, if you change the CREATE TABLE statement for assets to make license_plate of type text instead of longtext, and do:
compile select * from assets where properties::$license_plate = "XYZ123";
show warnings;
You will see:
Warning 2626, Prospect computed column: assets.license_plate of type text
CHARACTER SET utf8 COLLATE utf8_general_ci NULL cannot suit expression of
type longtext CHARACTER SET utf8 COLLATE utf8_general_ci NULL.
You can add computed columns and index them using ALTER TABLE and CREATE INDEX, e.g.:
create table t(id int, key(id));
alter table t add id2 as id*2 persisted bigint;
create index idx on t(id2);
Persisted computed column matching can help you change the database to speed up queries without touching your application.
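Continuing that example, a query written against the original expression can now use the index on the computed column; the comment describes the expected plan, which may vary:

explain select * from t where id*2 = 10;
-- The filter on id*2 should match the persisted column id2
-- and use the index idx, without any change to the query text.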
VECTOR_SORT and other new vector functions
We’ve filled out our vector toy collection in this release with these new functions:
VECTOR_KTH_ELEMENT    find the k'th element of a vector, 0-based
VECTOR_SUBVECTOR      take a slice of a vector
VECTOR_ELEMENTS_SUM   total the elements of a vector
VECTOR_NUM_ELEMENTS   get the length of the vector, in elements
Versions of these functions are available for all common integer and floating point sizes.
Here’s an example using the sort function:
select json_array_unpack(vector_sort(json_array_pack('[300,100,200]')));
+------------------------------------------------------------------+
| json_array_unpack(vector_sort(json_array_pack('[300,100,200]'))) |
+------------------------------------------------------------------+
| [100,200,300]                                                    |
+------------------------------------------------------------------+
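Sketches of the other functions, assuming each takes the packed vector as its first argument, with 0-based positions where relevant:

select vector_kth_element(json_array_pack('[300,100,200]'), 1);                     -- element at index 1
select json_array_unpack(vector_subvector(json_array_pack('[300,100,200]'), 0, 2)); -- slice of two elements
select vector_elements_sum(json_array_pack('[300,100,200]'));                       -- sum of the elements
select vector_num_elements(json_array_pack('[300,100,200]'));                       -- vector length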
ISNUMERIC
Need to know if a string contains a number, but also be able to handle commas, currency signs and the like? ISNUMERIC is popular in other DBMSs for doing this, and is now available in SingleStoreDB:
select "$5,000.00" as n, isnumeric(n);
+-----------+--------------+
| n         | isnumeric(n) |
+-----------+--------------+
| $5,000.00 |            1 |
+-----------+--------------+
This’ll make it easier to port applications from SQL Server and Azure SQL DB, for example.
SELECT FOR UPDATE – now for Universal Storage
SELECT … FOR UPDATE, which locks the selected rows until the transaction commits or rolls back, now works on Universal Storage tables, not just rowstore tables.
More granular GRANT option
Some of our large enterprise customers have strict security requirements, and enforce policies whereby users are granted only the minimum privileges necessary, including application owners and administrators who may not need all the permissions of the root user. To help ensure that only the minimum privileges are needed to get the job done, we've added a new, more granular mode for the GRANT option.
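As a sketch of the least-privilege idea using standard GRANT syntax (user and database names are hypothetical; see the documentation for the new granular mode itself):

grant select on analytics.* to 'app_reader'@'%';
grant select, insert, update on analytics.events to 'app_writer'@'%';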
SECRET()
Want to keep a secret? The new SECRET function takes a literal and makes sure it doesn't show up in trace logs, query plans or profiles. You can use it to hide passwords, certificates and the like. For example, if you run:
profile select * from t where s = SECRET('super-secret-password');
then run
show profile json;
you’ll see
"query_text":"profile select * from t where s = SECRET('<password>')",
whereas if you didn’t use SECRET, then ‘super-secret-password’ would appear in the profile output.
We've also added a couple of new storage features with directly visible commands you can use.
Backup with split partitions for unlimited storage DBs
You can run
BACKUP DATABASE db1 WITH SPLIT PARTITIONS
  TO S3 "backup_bucket/backups/6_1_2022"
  CONFIG '{"region":"us-east-1"}'
  CREDENTIALS '{"aws_access_key_id":"your_access_key_id",
                "aws_secret_access_key":"your_secret_access_key"}';
Then restore the backup to a new database name. Once the restore succeeds, detach the old database, detach the new database, and re-attach the new database under the old database name; your applications can then proceed.
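A sketch of that swap, with illustrative names; check the documentation for the exact RESTORE and ATTACH syntax for unlimited storage databases:

RESTORE DATABASE db1 AS db1_new FROM S3 "backup_bucket/backups/6_1_2022"
  CONFIG '{"region":"us-east-1"}'
  CREDENTIALS '{"aws_access_key_id":"...","aws_secret_access_key":"..."}';
DETACH DATABASE db1;
DETACH DATABASE db1_new;
ATTACH DATABASE db1_new AS db1;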
Drop milestone
Of course, when a milestone isn’t needed any more, you may want to remove it. You can now do that with:
DROP MILESTONE milestone_name [FOR database_name]
Milestones are automatically removed once they are no longer within the PITR retention period.
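For example, assuming a milestone named before_schema_change was created earlier with CREATE MILESTONE:

DROP MILESTONE "before_schema_change" FOR db1;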
We hope you like the SingleStoreDB toys delivered in this release! Whether you like speed, scale, power, control or simplicity, there’s something in it for you.