There is no silver bullet when it comes to optimizing queries. However, with the knowledge gained from monitoring tools, it is possible for DBAs to understand and fix slow-running queries.
During normal PostgreSQL operations, resource-consuming, unoptimized queries inevitably occur, causing slow response times that can lead to user frustration and customer attrition. In financial services, for example, slow response times can result in failed transactions. In e-commerce, they can cause customers to abandon a shopping cart. Whether the delay caused by unoptimized queries stretches from milliseconds into minutes, or from minutes into hours or days, it is vital for every database engineer, DBA, and software developer to be able to detect and fix problem queries as soon as possible. In this article, we’ll review the key tools and extensions that enable those working with PostgreSQL to optimize their queries.
Choose a Workload Monitoring Tool
Workload monitoring tools help administrators see what a workload consists of: the types of queries sent to the database and which ones consume resources and take too long to execute. The Postgres-specific monitoring tools described here work with Zabbix Server, a free, open-source monitoring solution from Zabbix Inc. that can be downloaded from the official Zabbix website. There are two Postgres-specific monitoring tools to choose from. Both work well, so you may want to try both and see which one you prefer.
Mamonsu is an open-source active Zabbix agent that can collect RDBMS metrics, connect to a Zabbix Server, and send the metrics to it. It works with PostgreSQL 9.5 and higher on various operating systems. Mamonsu provides a wide range of metrics related to PostgreSQL activity, including connection and lock statistics, autovacuum workers, the oldest transaction identifier, and many others. You can download Mamonsu for free from the PostgresPro repository on GitHub.
Zabbix Agent 2, another tool for collecting various metrics, is available for PostgreSQL version 10+ and Zabbix Server version 4.4+. Its features include the ability to collect more than 95 metrics from multiple PostgreSQL instances using a single agent, support for custom plugins written in Golang, and options for monitoring and checking metrics in real time via the command line. Zabbix Agent 2 can be downloaded from the Zabbix repository.
It is important to note that these tools will not by themselves identify the problem queries. Instead, they provide information that can help you make the identification and figure out how to fix the problems. Armed with this information, you become a detective, putting together clues to solve a mystery. The more experience you have, the easier this detection becomes.
Below are some of the types of information you can collect using monitoring tools.
Connection metrics – Connection metrics report the number of client connections to the database and the total number of possible connections, which is capped by the max_connections setting and, in practice, by the amount of memory on the primary node. Once the limit is reached, new client connections are blocked until existing ones are closed. A sudden increase in the number of connections can, for example, indicate that slow queries are tying up resources: slow queries hold locks longer, sessions blocked behind those locks stay busy, and the application opens additional connections. One common reason for a high lock count is that some transactions haven’t committed their changes and, therefore, haven’t released the locks they acquired.
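As a quick illustration (a minimal sketch using standard catalog views; a monitoring tool will normally chart these values for you), the current connection count can be compared with the configured limit directly in SQL:

```sql
-- Compare current client connections with the configured limit
SELECT count(*)                           AS current_connections,
       current_setting('max_connections') AS max_connections
FROM pg_stat_activity;
```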
Locks sampling – Lock sampling shows the types of locks taken within a defined timeframe. A high share of locks taken by write queries, for example, could indicate that a query performance issue is being caused by write activity.
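For a one-off look at the current lock landscape, the standard pg_locks view can be grouped by lock mode (a rough sketch; a lock-sampling tool collects this repeatedly over a timeframe rather than at a single instant):

```sql
-- Currently held and awaited locks, grouped by lock mode
SELECT mode,
       granted,
       count(*) AS lock_count
FROM pg_locks
GROUP BY mode, granted
ORDER BY lock_count DESC;
```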
Query Plan – PostgreSQL has a built-in complex planner that automatically devises a query plan for each query it receives. The planner seeks to choose the optimal approach for the query structure and the properties of the data to ensure the best performance. The EXPLAIN command shows the query plan for each query, which provides insight into what aspects of query execution may be affecting performance. A DBA can then try to tune the query plan manually to improve performance by gathering planner statistics and using PostgreSQL configuration settings and modules like pg_hint_plan.
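For example, prefixing a statement with EXPLAIN prints the chosen plan, and EXPLAIN ANALYZE actually executes the statement and reports real row counts and timings (the orders table and its columns below are placeholders):

```sql
-- Show the chosen plan, actual row counts, and buffer usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total
FROM orders o
WHERE o.customer_id = 42
ORDER BY o.created_at DESC
LIMIT 10;
```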
Extensions for tracking resource-intensive queries
While workload monitoring tools can help detect the time intervals when a database is performing poorly, they can’t show the text of the resource-consuming queries themselves. The following PostgreSQL extensions can be used for that purpose.
pg_stat_statements shows which queries have the longest execution time. It is included in the PostgreSQL standard distribution. For more information, visit: https://www.postgresql.org/docs/13/pgstatstatements.html
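A typical way to use it (after adding pg_stat_statements to shared_preload_libraries and restarting the server) is to list the statements with the highest cumulative execution time; the column names below are those of PostgreSQL 13, where older releases use total_time instead of total_exec_time:

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by cumulative execution time
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```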
pg_stat_kcache is used for detecting queries that consume the most CPU system and user time. This extension is not part of the PostgreSQL distribution, so it should be downloaded separately. For more information, visit: https://github.com/powa-team/pg_stat_kcache
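A sketch of how it might be queried; the pg_stat_kcache_detail view and the exec_user_time/exec_system_time columns are from recent pg_stat_kcache releases, and older releases expose user_time and system_time instead, so check the version you install:

```sql
-- Top statements by CPU time consumed
SELECT query,
       exec_user_time   AS cpu_user_time,
       exec_system_time AS cpu_system_time
FROM pg_stat_kcache_detail
ORDER BY exec_user_time + exec_system_time DESC
LIMIT 10;
```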
auto_explain is used for tracking query plans and parameters. It is part of the PostgreSQL distribution. For more information, visit: https://www.postgresql.org/docs/13/auto-explain.html
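For example, the module can be loaded into a single session and told to log the plans of statements slower than a chosen threshold (the 250 ms value is illustrative; in production it is usually preloaded via shared_preload_libraries instead):

```sql
-- Load auto_explain for the current session and log plans of slow statements
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';
SET auto_explain.log_analyze = on;
```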
pg_store_plans is used to gather information about execution plan statistics of all SQL statements executed by a server. For more information, visit: http://ossc-db.github.io/pg_store_plans/
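A minimal sketch of querying it, assuming the default pg_store_plans view with plan, calls, and total_time columns (exact column names may differ between releases, so consult the documentation for the version you install):

```sql
-- Most time-consuming execution plans recorded by pg_store_plans
SELECT queryid,
       plan,
       calls,
       total_time
FROM pg_store_plans
ORDER BY total_time DESC
LIMIT 5;
```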
pg_wait_sampling is used to collect information about the wait events of a particular process. Two kinds of statistics are collected: the wait event history shows individual samples of wait events for each process, while the waits profile shows a count of samples per process and per wait event. This extension is not part of the PostgreSQL distribution, so it should be downloaded separately. For more information, visit: https://github.com/postgrespro/pg_wait_sampling
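Once the extension is installed, the aggregated profile can be read from its pg_wait_sampling_profile view, for example:

```sql
-- Which wait events have been sampled most often across all processes
SELECT event_type,
       event,
       sum(count) AS samples
FROM pg_wait_sampling_profile
GROUP BY event_type, event
ORDER BY samples DESC
LIMIT 10;
```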
plprofiler is used to help create performance profiles of PL/pgSQL functions and stored procedures in the form of a FlameGraph, which helps identify the longest procedure or function. This extension is not part of the PostgreSQL distribution, so it should be downloaded separately. For more information, visit: https://github.com/bigsql/plprofiler
pgpro_stats is a combination of pg_stat_statements, pg_stat_kcache and pg_wait_sampling modules. It is included in the Postgres Professional distribution and can show query execution plans with waits profiles related to them. For more information, visit https://postgrespro.ru/docs/enterprise/12/pgpro-stats?lang=en
Using pg_profile to detect resource-consuming queries
pg_profile, which can be downloaded from its GitHub repository, is a particularly useful extension for creating a historical workload repository containing various metrics. These metrics include:
- SQL query statistics
- DML statistics
- Metrics related to index usage
- Top growing tables
- Top tables by Delete/Update operations
- Metrics related to vacuum and autovacuum processes
- User function statistics
pg_profile regularly collects data from the PostgreSQL statistics collector and from extensions, making it possible to detect the most resource-intensive activities for defined time periods in the past. pg_profile is easy to install and can be used in restricted environments without access to the server file system. It enables DBAs to set a specific time interval, for example, two hours, and create a report showing many performance statistics for that interval. A pg_profile differential report compares performance statistics for two time periods side by side. Since one or two samples per hour is usually sufficient to detect the most resource-consuming activities, pg_profile doesn’t create much overhead.
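A sketch of the basic workflow, following the pg_profile documentation (the sample IDs are placeholders, and the functions may need to be schema-qualified, e.g. profile.take_sample(), depending on how the extension was installed):

```sql
-- Take a sample now; in practice this is scheduled, e.g. once or twice per hour
SELECT take_sample();

-- List collected samples, then build an HTML report between two sample IDs
SELECT * FROM show_samples();
SELECT get_report(480, 482);
```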
pgpro_pwr is an enhanced version of pg_profile that can interact with pgpro_stats, collect its metrics and save them to a separate database for further processing. It is included in the Postgres Professional distribution and shows wait statistics and query execution plans in separate sections of a pgpro_pwr report. The more detailed statistics in pgpro_pwr enable easier problem detection.
Additional query optimization tips
With the information gained from Postgres monitoring, DBAs can identify and fix issues that cause slow-running queries. Additional optimizing strategies include:
Index Only Scan – Sometimes, it makes sense to rewrite a query and create additional indexes. PostgreSQL can then use the Index Only Scan access method to retrieve data from the index rather than from the table, which reduces the number of random reads and improves performance.
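As a hypothetical example, a covering index (PostgreSQL 11+ supports the INCLUDE clause) can let the planner answer a query from the index alone; the table and column names are illustrative, and EXPLAIN should be used to confirm the plan actually switches:

```sql
-- A covering index so the query can be answered from the index alone
CREATE INDEX CONCURRENTLY idx_orders_customer
    ON orders (customer_id) INCLUDE (total);

EXPLAIN
SELECT customer_id, total
FROM orders
WHERE customer_id = 42;   -- the plan should now show an Index Only Scan
```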
Reviewing execution plans and using the LIMIT clause instead of the DISTINCT clause where only a single matching row is needed, which can eliminate unnecessary sorting and de-duplication and significantly improve performance.
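A hypothetical before/after illustrating the rewrite (table and column names are placeholders, and the two forms are only equivalent when the application needs just one matching row):

```sql
-- Before: scans and de-duplicates every matching row just to find one example
SELECT DISTINCT customer_id
FROM orders
WHERE status = 'failed';

-- After: stops as soon as one matching row is found
SELECT customer_id
FROM orders
WHERE status = 'failed'
LIMIT 1;
```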
It is also possible to create extended statistics on multiple columns of a table to reduce the difference between actual and estimated row counts. Since PostgreSQL 12, these statistics also work for range conditions, such as “between,” “less than or equal to,” and “greater than or equal to.”
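A minimal sketch, using hypothetical names for two correlated columns (the mcv statistics kind requires PostgreSQL 12 or later):

```sql
-- Extended statistics on a pair of correlated columns, then refresh planner stats
CREATE STATISTICS orders_state_city_stats (dependencies, ndistinct, mcv)
    ON state, city FROM orders;
ANALYZE orders;
```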
Conclusion
Postgres has multiple mechanisms for achieving the same result, and choosing the right one is key. DBAs should always try to find a better or faster way to execute a query, so it is important that someone highly experienced with Postgres lead and teach the DBA team. Alternatively, companies can hire an expert consultant to help with typical workloads that are being processed too slowly.
There is no silver bullet when it comes to optimizing queries. However, with the knowledge gained from monitoring tools – along with patience and an eye for detail – it is possible for DBAs to understand and fix slow-running queries. Remember, all the tools mentioned above are available for download, along with more resources and explanations about how to use them. More tips are also available in the PostgreSQL documentation and this blog post on PostgreSQL query optimizations.