PGSQL PERFORMANCE 29 RE SEQSCAN RATHER THAN INDEX
From: time@no-spam (David Brown)
Subject: Re: Seqscan rather than Index
Date: Fri, 17 Dec 2004 11:18:36 +0000


> You might want to reduce random_page_cost a little.

> Keep in mind that your test case is small enough to fit in RAM and is > probably not reflective of what will happen with larger tables.

I am also running 8.0 rc1 for Windows. Despite many hours spent tweaking various planner cost constants, I found little effect on cost estimates. Even reducing random_page_cost from 4.0 to 0.1 had negligible impact and failed to significantly influence the planner.


Increasing the statistics target for the last_name column to 250 or so *may* help, at least if you're only selecting one name at a time. That's the standard advice around here and the only thing I've found useful. Half the threads in this forum are about under-utilized indexes. It would be great if someone could admit the planner is broken and talk about actually fixing it!


I'm unconvinced that the planner only favours sequential scans as table size decreases. In my experience so far, larger tables have the same problem only it's more noticeable.


The issue hits PostgreSQL harder than others because of its awful sequential scan speed, which is two to five times slower than other DBMS. The archives show there has been talk for years about this, but it seems, no solution. The obvious thing to consider is the block size, but people have tried increasing this in the past with only marginal success.


Regards
David
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html



From: dev@no-spam (Richard Huxton)
Subject: Re: Seqscan rather than Index
Date: Fri, 17 Dec 2004 13:03:50 +0000

David Brown wrote:
>> You might want to reduce random_page_cost a little.
> > >> Keep in mind that your test case is small enough to fit in RAM and >> is probably not reflective of what will happen with larger tables.
> > > I am also running 8.0 rc1 for Windows. Despite many hours spent > tweaking various planner cost constants, I found little effect on > cost estimates. Even reducing random_page_cost from 4.0 to 0.1 had > negligible impact and failed to significantly influence the planner.

I'm not sure setting random_page_cost below 1.0 makes much sense.

> Increasing the statistics target for the last_name column to 250 or > so *may* help, at least if you're only selecting one name at a time.

Not going to do anything in this case. The planner is roughly right about how many rows will be returned, it's just not expecting everything to be in RAM.

> That's the standard advice around here and the only thing I've found > useful. Half the threads in this forum are about under-utilized > indexes. It would be great if someone could admit the planner is > broken and talk about actually fixing it!

Not sure I agree here - when the stats are accurate, you can get the planner to make near-optimal choices most of the time. Is there any particular pattern you've seen?

> I'm unconvinced that the planner only favours sequential scans as > table size decreases. In my experience so far, larger tables have the > same problem only it's more noticeable.

Hmm - assuming your statistics are good, this would suggest the other cost settings just aren't right for your hardware.

> The issue hits PostgreSQL harder than others because of its awful > sequential scan speed, which is two to five times slower than other > DBMS. The archives show there has been talk for years about this, but > it seems, no solution. The obvious thing to consider is the block > size, but people have tried increasing this in the past with only > marginal success.

Must admit this puzzles me. Are you saying you can't saturate your disk I/O? Or are you saying other DBMS store records in 0.5 to 0.2 times less space than PG?

--
Richard Huxton Archonet Ltd
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

From: gsstark@no-spam (Greg Stark)
Subject: Re: Seqscan rather than Index
Date: 17 Dec 2004 10:47:57 -0500

Richard Huxton <dev@no-spam> writes:

> Not going to do anything in this case. The planner is roughly right about how > many rows will be returned, it's just not expecting everything to be in RAM.

That doesn't make sense or else it would switch to the index at random_page_cost = 1.0. If it was still using a sequential scan at random_page_cost < 1 then perhaps he had some problem with his query like mismatched data types that forced it to use a full scan.

> > That's the standard advice around here and the only thing I've found > > useful. Half the threads in this forum are about under-utilized > > indexes. It would be great if someone could admit the planner is > > broken and talk about actually fixing it!
> > Not sure I agree here - when the stats are accurate, you can get the planner to
> make near-optimal choices most of the time. Is there any particular pattern > you've seen?

The most common cause I've seen here is that Postgres makes very pessimistic assumptions about selectivity when it doesn't know better. Every other database I've tested assumes 'col > ?' is about 5% selectivity . Postgres assumes 33%.

Postgres is also more pessimistic about the efficiency of index scans. It's willing to use a sequential scan down to well below 5% selectivity when other databases use the more traditional rule of thumb of 10%.

In combination these effects do seem to cause an _awful_ lot of complaints.

> > The issue hits PostgreSQL harder than others because of its awful > > sequential scan speed, which is two to five times slower than other > > DBMS. The archives show there has been talk for years about this, but > > it seems, no solution. The obvious thing to consider is the block > > size, but people have tried increasing this in the past with only > > marginal success.
> > Must admit this puzzles me. Are you saying you can't saturate your disk I/O? Or
> are you saying other DBMS store records in 0.5 to 0.2 times less space than PG?


I don't know what he's talking about either. Perhaps he's thinking of people who haven't been running vacuum enough?

-- greg
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@no-spam so that your message can get through to the mailing list cleanly

From: tgl@no-spam (Tom Lane)
Subject: Re: Seqscan rather than Index
Date: Fri, 17 Dec 2004 12:44:41 -0500

Greg Stark <gsstark@no-spam> writes:
> Postgres is also more pessimistic about the efficiency of index scans. It's > willing to use a sequential scan down to well below 5% selectivity when other > databases use the more traditional rule of thumb of 10%.

However, other databases are probably basing their analysis on a different execution model. Since we have to visit both heap and index in all cases, we do indeed have a larger penalty for index use.

I've looked pretty closely at the cost model for index access, believe me.
It's not pessimistic; if anything it is undercharging for index access.
(For one thing it treats the index's internal fetches as sequential access, when in reality they are probably random.)

I think the one effect that's not being modeled is amortization of index fetches across successive queries. The cost model is pretty much based on the assumption that each query starts from ground zero, whereas in reality a heavily used index will certainly have all its upper levels in RAM, and if it's not too large the leaf pages might all be cached too.
I wouldn't want to switch the planner over to making that assumption exclusively, but we could talk about having a cost parameter that dials the assumption up or down.

Awhile back I tried rewriting btcostestimate to charge zero for accessing the metapage and the upper index levels, but charge random_page_cost for fetching leaf pages. For small indexes this came out with worse (larger) numbers than we have now, which is not the direction we want to go in :-(. So I think that we have to somehow honestly model caching of index pages across queries.

Of course, to be completely fair such a modification should account for caching of heap pages as well, so it would also bring down the estimates for seqscans. But I'd be willing to accept a model that considers only caching of index pages as a zero-order approximation.

regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match

From: gsstark@no-spam (Greg Stark)
Subject: Re: Seqscan rather than Index
Date: 17 Dec 2004 13:24:33 -0500

Tom Lane <tgl@no-spam> writes:

> Greg Stark <gsstark@no-spam> writes:
> > Postgres is also more pessimistic about the efficiency of index scans. It's > > willing to use a sequential scan down to well below 5% selectivity when other
> > databases use the more traditional rule of thumb of 10%.
> > However, other databases are probably basing their analysis on a > different execution model. Since we have to visit both heap and index > in all cases, we do indeed have a larger penalty for index use.

It's only in special cases that other databases do not have to look at the heap. For simple queries like "select * from x where foo > ?" they still have to look at the heap. I never looked into how much of a bonus Oracle gives for the index-only case, I'm not sure it even takes it into account.

> I've looked pretty closely at the cost model for index access, believe me.
> It's not pessimistic; if anything it is undercharging for index access.

I think there's another effect here beyond the physical arithmetic. There's a kind of teleological reasoning that goes something like "If the user created the index chances are it's because he wanted it to be used".

I guess that argues more for more aggressive selectivity estimates than for biased index costing though. If I'm doing "where foo > ?" then if there's an index on foo I probably put it there for a reason and want it to be used even if postgres doesn't really have a clue how selective the query will be.

> I think the one effect that's not being modeled is amortization of index > fetches across successive queries.
And across multiple fetches in a single query, such as with a nested loop.

It seems like the effective_cache_size parameter should be having some influence here.

-- greg
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

From: tgl@no-spam (Tom Lane)
Subject: Re: Seqscan rather than Index
Date: Fri, 17 Dec 2004 13:36:52 -0500

Greg Stark <gsstark@no-spam> writes:
> Tom Lane <tgl@no-spam> writes:
>> I think the one effect that's not being modeled is amortization of index >> fetches across successive queries.
> And across multiple fetches in a single query, such as with a nested loop.

Right, that's effectively the same problem. You could imagine making a special-purpose solution for nestloop queries but I think the issue is more general than that.

> It seems like the effective_cache_size parameter should be having some > influence here.

But it doesn't :-(. e_c_s is currently only used to estimate amortization of repeated heap-page fetches within a single indexscan.

regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match

From: sgunderson@no-spam ("Steinar H. Gunderson")
Subject: Re: Seqscan rather than Index
Date: Fri, 17 Dec 2004 22:56:27 +0100

On Fri, Dec 17, 2004 at 10:47:57AM -0500, Greg Stark wrote:
>> Must admit this puzzles me. Are you saying you can't saturate your disk I/O? Or
>> are you saying other DBMS store records in 0.5 to 0.2 times less space than PG?

> I don't know what he's talking about either. Perhaps he's thinking of people > who haven't been running vacuum enough?

I'm a bit unsure -- should counting ~3 million rows (no OIDs, PG 7.4,
everything in cache, 32-byte rows) take ~3500ms on an Athlon 64 2800+?

/* Steinar */
-- Homepage: http://www.sesse.net/

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

From: sgunderson@no-spam ("Steinar H. Gunderson")
Subject: Re: Seqscan rather than Index
Date: Fri, 17 Dec 2004 23:09:07 +0100

On Fri, Dec 17, 2004 at 10:56:27PM +0100, Steinar H. Gunderson wrote:
> I'm a bit unsure -- should counting ~3 million rows (no OIDs, PG 7.4,
> everything in cache, 32-byte rows) take ~3500ms on an Athlon 64 2800+?

(I realize I was a bit unclear here. This is a completely separate case, not related to the original poster -- I was just wondering if what I'm seeing is normal or not.)

/* Steinar */
-- Homepage: http://www.sesse.net/

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@no-spam

From: sgunderson@no-spam ("Steinar H. Gunderson")
Subject: Re: Seqscan rather than Index
Date: Sat, 18 Dec 2004 00:55:48 +0100

On Fri, Dec 17, 2004 at 05:02:29PM -0600, Frank Wiles wrote:
> It depends more on your disk IO than the processor. Counting isn't > processor intensive, but reading through the entire table on disk > is. I've also seen a huge difference between select count(*) and > select count(1) in older versions, haven't tried it on a recent > version however.
Like I said, all in cache, so no disk IO. count(*) and count(1) give me identical results. (BTW, I don't think this is a count problem, it's a "sequential scan" problem -- I'm just trying to find out if this is natural or not, ie. if this is just something I have to expect in a relational database, even with no I/O.)

/* Steinar */
-- Homepage: http://www.sesse.net/

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

From: tgl@no-spam (Tom Lane)
Subject: Re: Seqscan rather than Index
Date: Fri, 17 Dec 2004 23:37:37 -0500

Frank Wiles <frank@no-spam> writes:
> I've also seen a huge difference between select count(*) and > select count(1) in older versions,

That must have been before my time, ie, pre-6.4 or so. There is certainly zero difference now.

regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

From: bruno@no-spam (Bruno Wolff III)
Subject: Re: Seqscan rather than Index
Date: Fri, 17 Dec 2004 22:39:18 -0600

On Fri, Dec 17, 2004 at 22:56:27 +0100,
"Steinar H. Gunderson" <sgunderson@no-spam> wrote:
> > I'm a bit unsure -- should counting ~3 million rows (no OIDs, PG 7.4,
> everything in cache, 32-byte rows) take ~3500ms on an Athlon 64 2800+?

It doesn't seem totally out of wack. You will be limited by the memory bandwidth and it looks like you get something on the order of a few hundred references to memory per row. That may be a little high, but it doesn't seem ridiculously high.

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@no-spam

From: sgunderson@no-spam ("Steinar H. Gunderson")
Subject: Re: Seqscan rather than Index
Date: Sat, 18 Dec 2004 14:45:40 +0100

On Fri, Dec 17, 2004 at 10:39:18PM -0600, Bruno Wolff III wrote:
> It doesn't seem totally out of wack. You will be limited by the memory > bandwidth and it looks like you get something on the order of a few > hundred references to memory per row. That may be a little high, but > it doesn't seem ridiculously high.

I just tested 8.0.0rc1 -- I got a _50%_ speedup on this operation...

/* Steinar */
-- Homepage: http://www.sesse.net/

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org