Current File : //usr/share/doc/postgresql-9.2.24/html/gin-tips.html |
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML
><HEAD
><TITLE
>GIN Tips and Tricks</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK
REV="MADE"
HREF="mailto:pgsql-docs@postgresql.org"><LINK
REL="HOME"
TITLE="PostgreSQL 9.2.24 Documentation"
HREF="index.html"><LINK
REL="UP"
TITLE="GIN Indexes"
HREF="gin.html"><LINK
REL="PREVIOUS"
TITLE="Implementation"
HREF="gin-implementation.html"><LINK
REL="NEXT"
TITLE="Limitations"
HREF="gin-limit.html"><LINK
REL="STYLESHEET"
TYPE="text/css"
HREF="stylesheet.css"><META
HTTP-EQUIV="Content-Type"
CONTENT="text/html; charset=ISO-8859-1"><META
NAME="creation"
CONTENT="2017-11-06T22:43:11"></HEAD
><BODY
CLASS="SECT1"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="5"
ALIGN="center"
VALIGN="bottom"
><A
HREF="index.html"
>PostgreSQL 9.2.24 Documentation</A
></TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="top"
><A
TITLE="Implementation"
HREF="gin-implementation.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="top"
><A
HREF="gin.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="60%"
ALIGN="center"
VALIGN="bottom"
>Chapter 55. GIN Indexes</TD
><TD
WIDTH="20%"
ALIGN="right"
VALIGN="top"
><A
TITLE="Limitations"
HREF="gin-limit.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="GIN-TIPS"
>55.4. GIN Tips and Tricks</A
></H1
><P
></P
><DIV
CLASS="VARIABLELIST"
><DL
><DT
>Create vs. insert</DT
><DD
><P
> Insertion into a <ACRONYM
CLASS="ACRONYM"
>GIN</ACRONYM
> index can be slow
due to the likelihood of many keys being inserted for each item.
So, for bulk insertions into a table it is advisable to drop the GIN
index and recreate it after finishing bulk insertion.
</P
><P
> As of <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> 8.4, this advice is less
necessary since delayed indexing is used (see <A
HREF="gin-implementation.html#GIN-FAST-UPDATE"
>Section 55.3.1</A
> for details). But for very large updates
it may still be best to drop and recreate the index.
</P
></DD
><DT
><A
HREF="runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM"
>maintenance_work_mem</A
></DT
><DD
><P
> Build time for a <ACRONYM
CLASS="ACRONYM"
>GIN</ACRONYM
> index is very sensitive to
the <TT
CLASS="VARNAME"
>maintenance_work_mem</TT
> setting; it doesn't pay to
skimp on work memory during index creation.
</P
></DD
><DT
><A
HREF="runtime-config-resource.html#GUC-WORK-MEM"
>work_mem</A
></DT
><DD
><P
> During a series of insertions into an existing <ACRONYM
CLASS="ACRONYM"
>GIN</ACRONYM
>
index that has <TT
CLASS="LITERAL"
>FASTUPDATE</TT
> enabled, the system will clean up
the pending-entry list whenever the list grows larger than
<TT
CLASS="VARNAME"
>work_mem</TT
>. To avoid fluctuations in observed response time,
it's desirable to have pending-list cleanup occur in the background
(i.e., via autovacuum). Foreground cleanup operations can be avoided by
increasing <TT
CLASS="VARNAME"
>work_mem</TT
> or making autovacuum more aggressive.
However, enlarging <TT
CLASS="VARNAME"
>work_mem</TT
> means that if a foreground
cleanup does occur, it will take even longer.
</P
></DD
><DT
><A
HREF="runtime-config-client.html#GUC-GIN-FUZZY-SEARCH-LIMIT"
>gin_fuzzy_search_limit</A
></DT
><DD
><P
> The primary goal of developing <ACRONYM
CLASS="ACRONYM"
>GIN</ACRONYM
> indexes was
to create support for highly scalable full-text search in
<SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
>, and there are often situations when
a full-text search returns a very large set of results. Moreover, this
often happens when the query contains very frequent words, so that the
large result set is not even useful. Since reading many
tuples from the disk and sorting them could take a lot of time, this is
unacceptable for production. (Note that the index search itself is very
fast.)
</P
><P
> To facilitate controlled execution of such queries,
<ACRONYM
CLASS="ACRONYM"
>GIN</ACRONYM
> has a configurable soft upper limit on the
number of rows returned: the
<TT
CLASS="VARNAME"
>gin_fuzzy_search_limit</TT
> configuration parameter.
It is set to 0 (meaning no limit) by default.
If a non-zero limit is set, then the returned set is a subset of
the whole result set, chosen at random.
</P
><P
> <SPAN
CLASS="QUOTE"
>"Soft"</SPAN
> means that the actual number of returned results
could differ somewhat from the specified limit, depending on the query
and the quality of the system's random number generator.
</P
><P
> From experience, values in the thousands (e.g., 5000 — 20000)
work well.
</P
></DD
></DL
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="gin-implementation.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="gin-limit.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Implementation</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="gin.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Limitations</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>