A Look into the Implementation of SingleStore’s Workload Monitoring UI

Feed: SingleStore Blog.

The Workload Monitoring UI allows SingleStore users to analyze their clusters’ workloads. They can do this by viewing all of the activities that ran during a specific period of time, along with each query’s resource usage (CPU, network, disk I/O, etc.) and other properties. In this blog post, we explain how we implemented the graphical interface for this feature.

What is Workload Monitoring?

Workload Monitoring is a UI currently implemented in SingleStore Studio on top of the Workload Profiling feature of the SingleStore database. Its goal is to allow the user to analyze a specific cluster workload. Each query comes with information such as the time spent on CPU usage, memory access and disk I/O, as well as other metrics. The following screenshot shows the Activities page, where the user can see the recorded queries and choose which properties to filter by.

[Screenshot: Workload Monitoring page]

How is Workload Monitoring implemented?

There are two ways to retrieve the data: one can record the cluster usage in real time in the UI, or rely on a set of data that is continuously being collected (usually referred to as “historical monitoring”).

The first approach requires the user to start a recording manually on the Workload Monitoring page in Studio. The user can choose between recording for a fixed time interval, or manually starting and stopping the recording in the UI. Both processes follow similar approaches: they sample data from the cluster and show what activities were running during the time interval.

When you select a fixed interval, the frontend runs the following statement:

SET SESSION activities_delta_sleep_s = <interval>

This sets a session variable that is used by the query that retrieves the activities:

SELECT * FROM INFORMATION_SCHEMA.MV_ACTIVITIES_EXTENDED

This query takes activities_delta_sleep_s seconds to respond and then returns a list of the activities that ran during that period, along with all the resource statistics for these activities.
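As a rough sketch of the fixed-interval flow above, the two statements could be built like this. Only the SQL text comes from this post; the helper name `fixedIntervalStatements` and its validation are hypothetical and are not Studio's actual code.

```typescript
// Hypothetical helper: returns the statements the frontend would send, in order,
// for a fixed-interval recording. The function name and the input validation
// are illustrative assumptions.
function fixedIntervalStatements(intervalSeconds: number): string[] {
  // Only accept a positive whole number of seconds, so the value can be
  // interpolated into the SET statement safely.
  if (
    !isFinite(intervalSeconds) ||
    intervalSeconds <= 0 ||
    Math.floor(intervalSeconds) !== intervalSeconds
  ) {
    throw new Error("intervalSeconds must be a positive integer");
  }
  return [
    `SET SESSION activities_delta_sleep_s = ${intervalSeconds}`,
    // Responds after activities_delta_sleep_s seconds with the activities
    // that ran during that window.
    "SELECT * FROM INFORMATION_SCHEMA.MV_ACTIVITIES_EXTENDED",
  ];
}
```

Because the variable is set with SET SESSION, both statements have to be sent over the same database connection for the interval to take effect.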

(Queries and other “tasks” that run within SingleStore are referred to as “activities”. So, a SQL query run by a user will be made up of more than one activity, but other jobs such as backups will also generate activities. The resource usage of these can sometimes be relevant too, but it’s usually queries that matter the most.)

If the user chooses to start and stop the recording manually, the outcome is the same but it’s executed a bit differently. When the recording starts, the activities are retrieved from the mv_activities_extended_cumulative table and saved in memory. This table returns all activities that have ever run in the cluster, keyed by their activity name. For each activity, the table returns its various metrics as cumulative, always-increasing values. Here’s a sample row from this table for an INSERT SQL query:

*************************** 894. row ***************************
                    NODE_ID: 3
              ACTIVITY_TYPE: Query
              ACTIVITY_NAME: Insert_trade_ab59956168a51a79
   AGGREGATOR_ACTIVITY_NAME: insert_trade_25c621542ce291e1
              DATABASE_NAME: trades
               PARTITION_ID: 1
                CPU_TIME_MS: 2498
           CPU_WAIT_TIME_MS: NULL
            ELAPSED_TIME_MS: 19746
           LOCK_ROW_TIME_MS: 0
               LOCK_TIME_MS: 0
               DISK_TIME_MS: NULL
            NETWORK_TIME_MS: 0
         LOG_BUFFER_TIME_MS: 0
          LOG_FLUSH_TIME_MS: 16996
LOG_BUFFER_LARGE_TX_TIME_MS: 0
     NETWORK_LOGICAL_RECV_B: 0
     NETWORK_LOGICAL_SEND_B: 1171283
         LOG_BUFFER_WRITE_B: 2231793
        DISK_LOGICAL_READ_B: 31016
       DISK_LOGICAL_WRITE_B: 110
       DISK_PHYSICAL_READ_B: NULL
      DISK_PHYSICAL_WRITE_B: NULL
                  MEMORY_BS: 525
        MEMORY_MAJOR_FAULTS: NULL
 PIPELINE_EXTRACTOR_WAIT_MS: 0
 PIPELINE_TRANSFORM_WAIT_MS: 0
    LAST_FINISHED_TIMESTAMP: 2021-09-15 07:15:09
                  RUN_COUNT: 0
              SUCCESS_COUNT: 31882
              FAILURE_COUNT: 0

When the recording stops, activities are retrieved again from the same cumulative table. The frontend compares the two activity groups and keeps only the activities that ran during the recording. It does this by selecting the activities that have a higher run count in the second group than in the first, as well as the activities that show up in the second group but not in the first. This ensures that only the activities that have changed during the recording are shown.
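The comparison of the two snapshots can be sketched as follows. This is a simplified illustration: the `ActivitySnapshotRow` shape and the function names are hypothetical, and the sketch assumes one row per activity name, whereas the real table also has per-node and per-partition rows.

```typescript
// Minimal row shape for this sketch; the real table has many more columns.
interface ActivitySnapshotRow {
  activityName: string; // ACTIVITY_NAME
  runCount: number; // RUN_COUNT
}

// Index a snapshot by activity name (assumes names are unique in a snapshot).
function indexByName(
  rows: ActivitySnapshotRow[]
): Record<string, ActivitySnapshotRow> {
  const index: Record<string, ActivitySnapshotRow> = Object.create(null);
  for (const row of rows) {
    index[row.activityName] = row;
  }
  return index;
}

// Keep the activities that are new in `after`, or whose run count increased
// between the two snapshots.
function activitiesThatRan(
  before: ActivitySnapshotRow[],
  after: ActivitySnapshotRow[]
): ActivitySnapshotRow[] {
  const previous = indexByName(before);
  return after.filter((row) => {
    const old = previous[row.activityName];
    return old === undefined || row.runCount > old.runCount;
  });
}
```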

For activities that are present in both groups, we have to compute the delta of every metric’s cumulative value. As an example, if a query had spent 4000 milliseconds of CPU time in the first snapshot and 4300 milliseconds in the second, then this activity spent 300 milliseconds of CPU time during the interval being analyzed. Because we’re dealing with always-increasing cumulative values, we need to be careful with overflows. This can be tricky because SingleStore returns these counters as 64-bit integers, but JavaScript numbers can only represent integers exactly up to 53 bits. So, the frontend has to parse the numbers as strings and use bignumber.js to perform all computations.
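The precision problem and the delta computation can be illustrated with a short sketch. Note the substitution: it uses the built-in `BigInt` type (which needs an ES2020 runtime) as a dependency-free stand-in for bignumber.js, and the function name `counterDelta` is hypothetical.

```typescript
// Cumulative counters are passed around as decimal strings so that no
// precision is lost. BigInt stands in for bignumber.js in this sketch.
function counterDelta(before: string, after: string): string {
  return (BigInt(after) - BigInt(before)).toString();
}

// The CPU time example from the text: 4300 ms - 4000 ms = 300 ms.
const cpuDeltaMs = counterDelta("4000", "4300");

// Above 2^53, a JavaScript number can no longer represent every integer,
// so parsing a large counter as a number silently changes its value:
// 2^53 + 1 is stored as 2^53.
const lossy = Number("9007199254740993");

// The same two values subtracted as big integers still differ by exactly 1.
const exact = counterDelta("9007199254740992", "9007199254740993");
```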

Note that when the recording is in progress, the user is free to navigate between pages, run queries and interact with the cluster. The results are displayed when the interval ends or when the user chooses to stop the profiling.

[Screenshot: Workload Monitoring real-time recording options]

The second approach, the historical monitoring, requires some configuration at the cluster level. You can see in our documentation how to configure this. The process involves designating a database to monitor the cluster, allowing it to ingest data, and configuring the Studio state file to have access to this database (the monitoring database can live in an external cluster as well).

The SingleStore pipeline functionality allows for the data to be easily ingested into the monitoring database. From the user’s point of view, when this kind of monitoring is enabled, the Workload Monitoring page doesn’t show any “recording” feature but rather just displays all the activities in the last hour by default. The user can then choose to select any past time interval they’d like. Of course, the monitoring data will only be available from the moment one sets up their monitoring database.

With historical monitoring, the user is free to choose any time interval and analyze the workload that was running then. Here, the frontend performs exactly the same calculations as when a user records usage manually, but it does not run the same queries as before. Instead, the frontend queries the historical database (which could involve connecting to another cluster, if the monitoring database is not stored locally). This database contains a table very similar to mv_activities_extended_cumulative, but with an extra column called timestamp. The frontend takes the activity groups for two timestamp values and performs the same computations described earlier.
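One way to pick the activity group for each end of the chosen interval is sketched below. The post does not say how the two timestamp values are chosen, so the rule used here (the latest snapshot at or before a given time) is an assumption, as are the `HistoricalRow` shape and the function name.

```typescript
// Hypothetical shape of a row in the historical table.
interface HistoricalRow {
  // Formatted like "2021-09-15 07:15:09", so plain string comparison
  // orders the values chronologically.
  timestamp: string;
  activityName: string;
  cpuTimeMs: number;
}

// Return the rows of the latest snapshot taken at or before `time`,
// or an empty array if no snapshot is that old.
function snapshotAtOrBefore(
  rows: HistoricalRow[],
  time: string
): HistoricalRow[] {
  let chosen: string | null = null;
  for (const row of rows) {
    if (row.timestamp <= time && (chosen === null || row.timestamp > chosen)) {
      chosen = row.timestamp;
    }
  }
  return rows.filter((row) => row.timestamp === chosen);
}
```

Calling this once with the start of the interval and once with the end yields the two groups that feed the same delta computation used for manual recordings.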

[Screenshot: Workload Monitoring page for a cluster with historical monitoring enabled]

Besides all this, a lot of other computations take place in order to provide a good user experience. These mainly include unit conversions, but the frontend also has to group activities and sub-activities correctly. Since SingleStore is a distributed database, all queries in SingleStore are composed of sub-activities running on its various nodes. The frontend should therefore show the node breakdown when the activity is expanded.
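The grouping of sub-activities can be sketched as below. It assumes that the AGGREGATOR_ACTIVITY_NAME column (visible in the sample row earlier) is what links a sub-activity to its parent; the `SubActivity` shape and the function name are hypothetical.

```typescript
// Hypothetical subset of the columns needed for grouping.
interface SubActivity {
  nodeId: number; // NODE_ID
  activityName: string; // ACTIVITY_NAME
  aggregatorActivityName: string; // AGGREGATOR_ACTIVITY_NAME
  elapsedTimeMs: number; // ELAPSED_TIME_MS
}

// Group sub-activities under their aggregator activity, so that expanding
// an activity in the UI can list one entry per node it ran on.
function groupByAggregatorActivity(
  rows: SubActivity[]
): Record<string, SubActivity[]> {
  const groups: Record<string, SubActivity[]> = Object.create(null);
  for (const row of rows) {
    let group = groups[row.aggregatorActivityName];
    if (group === undefined) {
      group = [];
      groups[row.aggregatorActivityName] = group;
    }
    group.push(row);
  }
  return groups;
}
```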

Moreover, the frontend presents the raw information graphically: it takes the total time a query spent executing and renders colored bars that break down where that time was spent. This is very helpful for diagnosing various types of problems. For example, if all queries are spending an inordinate amount of time waiting for disk, there may be a problem with the disk performance of the machines where SingleStore nodes are running.
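A minimal sketch of how such a breakdown could be derived from the time columns follows. The set of categories, their labels, and the choice to treat NULL as zero are all assumptions made for illustration; the post does not describe the exact rules the UI applies.

```typescript
// NULL values from the table are modeled as null and treated as zero here
// (an assumption of this sketch).
type TimeMs = number | null;

interface ActivityTimes {
  elapsedTimeMs: number; // ELAPSED_TIME_MS
  cpuTimeMs: TimeMs; // CPU_TIME_MS
  diskTimeMs: TimeMs; // DISK_TIME_MS
  networkTimeMs: TimeMs; // NETWORK_TIME_MS
  lockTimeMs: TimeMs; // LOCK_TIME_MS
  logFlushTimeMs: TimeMs; // LOG_FLUSH_TIME_MS
}

interface BarSegment {
  label: string;
  percent: number; // share of the elapsed time, from 0 to 100
}

// Turn the per-category times into bar segments, skipping empty categories.
function timeBreakdown(t: ActivityTimes): BarSegment[] {
  const parts: Array<[string, TimeMs]> = [
    ["CPU", t.cpuTimeMs],
    ["Disk", t.diskTimeMs],
    ["Network", t.networkTimeMs],
    ["Lock", t.lockTimeMs],
    ["Log flush", t.logFlushTimeMs],
  ];
  const segments: BarSegment[] = [];
  for (const [label, ms] of parts) {
    const value = ms === null ? 0 : ms;
    if (t.elapsedTimeMs > 0 && value > 0) {
      segments.push({ label, percent: (100 * value) / t.elapsedTimeMs });
    }
  }
  return segments;
}
```

Applied to the sample INSERT row shown earlier (2498 ms of CPU time and 16996 ms of log flush time out of 19746 ms elapsed), this yields two segments, with log flush taking up most of the bar.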

[Screenshot: The nodes page of the Workload Monitoring page]

Finally, the UI also supports browsing resource usage by node. This is done by deriving the necessary information from the raw activities data and then grouping it by node.
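Grouping by node amounts to summing the already-computed deltas per NODE_ID, roughly as follows. The two metrics chosen, the row shape, and the function name are hypothetical; plain numbers are used for brevity, on the assumption that the per-interval deltas are small enough to fit.

```typescript
// Hypothetical per-activity deltas for one recording interval.
interface NodeActivityRow {
  nodeId: number; // NODE_ID
  cpuTimeMs: number; // delta of CPU_TIME_MS
  diskLogicalReadB: number; // delta of DISK_LOGICAL_READ_B
}

interface NodeUsage {
  cpuTimeMs: number;
  diskLogicalReadB: number;
}

// Sum the metrics of every activity that ran on each node.
function usageByNode(rows: NodeActivityRow[]): Record<number, NodeUsage> {
  const totals: Record<number, NodeUsage> = Object.create(null);
  for (const row of rows) {
    let usage = totals[row.nodeId];
    if (usage === undefined) {
      usage = { cpuTimeMs: 0, diskLogicalReadB: 0 };
      totals[row.nodeId] = usage;
    }
    usage.cpuTimeMs += row.cpuTimeMs;
    usage.diskLogicalReadB += row.diskLogicalReadB;
  }
  return totals;
}
```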

Future Plans

This is a feature that is currently part of our Studio product for self-managed customers. However, we have been working on extracting it from Studio and implementing it in our Managed Service UI as well. Our plan is to have it available to both sets of users soon.
