Configuring PostgreSQL for Pretty Good Performance
PostgreSQL is a highly customizable relational database management system (RDBMS) with a dazzling array of configuration options. Fortunately, especially in recent versions of PostgreSQL, most of these settings do not need to be changed. The trick is to figure out which options do need to be changed and what values are most appropriate.
A few minutes spent adjusting your postgresql.conf file can result in major performance improvements. Further improvements may be possible with sufficient time and thought, but in this article I'll focus on how to quickly get your settings into the right ballpark.
One of the most important methods of improving PostgreSQL performance, especially on large data sets, is to increase the parameter
shared_buffers,  which controls the amount of memory PostgreSQL uses for its private  buffer cache. Because PostgreSQL also makes use of the system buffer  cache, and for other reasons, experience has shown that it's usually  best to set this parameter to about 25% of system memory on UNIX and  Linux systems, but not more than about 8GB. (Sometimes 8GB is already  too much.)On Windows, even smaller values are appropriate, between 256MB and 512MB. Raising this parameter may cause PostgreSQL not to start, because most operating system distributions have very conservative limits on the amount of System V shared memory that can be allocated by default. Fortunately, the PostgreSQL documentation gives detailed instructions on how to increase these limits, and the process is typically not difficult.
Aside from shared_buffers, the most important memory-allocation parameter is
work_mem.  The default value of 1MB allows any sort, hash join, or materialize  operation to use up to 1MB of physical memory. Larger operations will  use a less efficient algorithm that allows data to spill to disk.  Raising this value can dramatically improve the performance of certain  queries, but it's important not to overdo it. In the worst case, every  connection could be performing several sorts simultaneously, so if you  have 100 simultaneous connections to the database and work_mem is set to  10MB, you might use several gigabytes of memory — if you don't have  enough, your system will begin swapping, and performance will drop off  dramatically. Reasonable values are typically between 4MB and 64MB,  depending on the size of your machine, how many concurrent connections  you expect, and the complexity of your queries.While you're setting work_mem, it's also a good idea to set the related parameter
maintenance_work_mem,  which controls the amount of physical memory PostgreSQL will attempt to  use for maintenance operations, such as routine vacuuming and index  creation. As with work_mem, it's important not to set this parameter so  high that your system begins swapping, but because it's rare to have  more than 3 or 4 maintenance operations in progress simultaneously, it  isn't necessary to be quite so careful about the value. A good rule of  thumb is to set this to about 5% of system memory, but not more than  about 512MB. Larger values won't necessarily improve performance.Finally, you should increase the default value of
wal_buffers,  which defaults to 64kB. Although even this very small setting does not  always cause a problem, there are situations where it can result in  extra fsync calls, and degrade overall system throughput. Increasing  this value to 1MB or so can alleviate this problem. On very busy  systems, an even higher value may be needed, up to a maximum of about  16MB. Like shared_buffers, this parameter increases PostgreSQL's initial  shared memory allocation, so if bumping it up causes the system not to  start, you'll need to increase the operating system limit.Tuning Checkpoint Parameters
Once you've got the memory-related parameters set correctly, the next thing you should think about doing is tuning your checkpoint parameters. Increasing thecheckpoint_segments parameter, which  defaults to 3, can dramatically improve performance during bulk data  loads. A reasonable starting value is 30. Once you've increased this  parameter, it also makes sense to increase checkpoint_completion_target,  which defaults to 0.5, to 0.9; this will decrease the performance  impact of checkpointing on a busy system (but is ineffective for small  values of checkpoint_segments, which is why the default is 0.5).Finally, increasing
checkpoint_timeout from 5 minutes to a  larger value, such as 15 minutes, can reduce the I/O load on your  system, especially when using large values for shared_buffers. The  downside of making these adjustments is that your system will use a  modest amount of additional disk space, and will take longer to recover  in the event of a crash. However, for most users, this is a small price  to pay for a significant performance improvement.With memory allocation and checkpoint parameters out of the way, it's time to adjust your planner parameters. These settings are generally not quite so important as the memory allocation and checkpoint parameters, but they can definitely make a difference, especially on more complex queries. The parameters
random_page_cost and seq_page_cost,  control the planner's estimate of how expensive it will be to obtain  each database page. The default values assume very little caching, so  it's frequently a good idea to reduce them. Even if your database is  significantly larger than physical memory, you might want to try setting  these parameters to 2 and 1 (rather than the default values of 4 and 1)  to see whether you get better query plans that way. If your database  fits entirely within memory, you can lower these values much more,  perhaps to 0.1 and 0.1. Never set random_page_cost less than  seq_page_cost, but consider setting them equal (or very close to equal)  if your database fits mostly or entirely within memory.You should also configure the parameter
effective_cache_size.  Despite being measured in megabytes, this parameter does not allocate  any memory. Instead, it is used by the query planner to estimate certain  caching effects. When this parameter is set too low, the planner may  decide not to use an index even when it would be beneficial to do so. An  appropriate value is approximately 75% of physical memory.Finally, for best performance, it's a good idea to consider setting the
synchronous_commit  parameter to off. When this parameter is turned off, an unexpected  crash or power failure could result in the loss of a transaction that  was reported to the client as committed. For financial or other  mission-critical applications, this is unacceptable, and the default  value of on should be retained. However, many web applications can  tolerate the loss of a few seconds with of updates in the event of a  crash, and the performance gain from changing this setting can be  massive.Going Farther
If you're building a database server from scratch, there are many things that you can do to improve performance and reliability above and beyond adjusting parameters in postgresql.conf. Greg Smith's book PostgreSQL 9.0 High Performance is the definitive guide to building a high-performance PostgreSQL system, and the chapter on database hardware (PDF) is freely available as a sample.It covers not only how to maximize performance, but also steps you should take to maximize the reliability of your database on either new or existing hardware. If you're serious about maximizing PostgreSQL performance, both that chapter and the rest of the book are highly recommended reading. If, however, you're a casual PostgreSQL user or just trying the product out, the suggestions above will enable you get you much of the performance benefit in a fraction of the time.
by robert hass












0 comments:
Posting Komentar