PostgreSQL Indexing

Before starting the concept behind the index, firstly, we will address it in the real world, how it is helpful!!

Since childhood books are part of our day to day life, once I thought of reading a book and picked one from a library nearby. While I was searching the topics of my interest, I flipped a few pages but did not find the same. During that time, I thought there is any mechanism that can help me in quickly browsing the topic of my interest.

There I realized the use and power of an index in a book - this helped me in quickly browsing the contents and pages of my interest.
The same principle applies to database indexes - this helps you in searching the data of your interest in a faster and efficient manner.

Now let's take a deep dive into database indexes:  The index is a mechanism to improve the database performance, and it is used to arrange the data a better way, which makes it easy to fetch the data by achieving direct access to data pages. Regarding performance, an Index is a vital tool for a DBA. 

PostgreSQL is one of those databases known for its great extensibility and vibrant collection of core and 3D party additions, and Indexing is one of them. PostgreSQL databases cover a wide variety of cases, whether it is simple B-Tree for all types or specific geospatial GiST indexes for the non-scalar data types (e.g., geometrical shapes, fts, arrays).

Table of Contents

·         Basic Index Terminology
·         Index Types
·         B-Tree Index
·         Hash Index
·         GiST Index
·         SP-GiST Index
·         GIN Index
·         BRIN Index

PostgreSQL Basic Index Terminology:

Before understanding the different Index type and their uses, we must take a look at terminology because a DBA will come on this one sooner or later.
Index Access Method: The index type (B-tree, GiST, GIN, etc)
Type: Indexed the column’s data type.
Operator: A function between two data types
Operator Family: An operator family defines a collection of related operator classes, and perhaps some additional operators and support functions that are compatible with these operator classes but not essential for the functioning of any individual index
Operator Class (also stated as index strategy): It defines the operator that the index uses for the column.

In PostgreSQL’s system catalog, access methods are stored in pg_am, operator classes in pg_opclass, operator families in pg_opfamily. The dependencies of the above are shown in the diagram below:

Types of Indexes in PostgreSQL

PostgreSQL provides the following Index types:

B-tree is default index, it can operate against all data types and also can use to retrieve the NULL value. 
Hash index handles equality only.
GiST index is suitable for non-scalar data types (e.g. geometrical shapes, fts, arrays)
SP-GiST index is space partitioned GIST, an evolution of GiST for handling non-balanced structures (quadtrees, k-d trees, radix trees)
GIN index is suitable for complex types (e.g. jsonb, fts, arrays)
BRIN index is a relatively new type of index which supports data that can be sorted by storing min/max values in each block
Let’s try to understand the index uses with some real world examples.

B-Tree Index

B-Tree is the default index and it can apply on all data types, it is most of the commonly used index in PostgreSQL. B-Tree index entries are sorted by default in ascending order. It supports in many cases to supply sort order for an index. Suppose we you are working on pagination and you want to recent data entries first so you can use the index sorting as desc, it may be possible indexed column may contain the null value also, B-Tree index provides the solution for this also you can consider/ sort the null value’s rows at the last.

Create a B-Tree Index using the following query

CREATE INDEX mytable_mycolumn_index ON mytable (mycolumn DESC NULLS LAST);

Hash Index

Pre-Postgres 10 is only useful for comparisons of equality, but you almost never want to use them because they are not secure for transactions, need to be manually rebuilt after crashes and not repeated to followers, so the benefit over using a B-Tree is very less. There are efforts in PgSQL 11 to make hash indexes a first-class index method along with its bigger brothers (B-tree, GiST, GIN).

Create a Hash Index using the following query

CREATE INDEX mytable_mycolumn_idx ON mytable USING hash(mycolumn);

GiST Index

PostgreSQL provides the Generalized Search Tree (GiST) index that allows you to build general balanced tree structures, and can be used for operations beyond equality and range comparisons. GiST Index support to geometric data types, tsquery and tsvector (as well as full-text search). It can be used in exclusion constraints to create GiST index.

Create a GiST Index using the following query

CREATE INDEX mytable_mycolumn_idx ON mytable USING hash(mycolumn);

Note: If you are getting the following error that means some features are not on in PgSQL.

ERROR:  data type character varying has no default operator class for access method "gist"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.”
To fix the above issue run the following query.

CREATE extension btree_gist ;

SP-GiST Index

Space-partitioned - Generalized Search Tree (SP-GiST) Index is same like GiST enables the development of many different strategies in the domain of non-balanced disk-based data structures. The default PgSQL distribution offers support for two-dimensional points, (any type) ranges, text and inet types. It can be used in exclusion constraints the same as GiST.  

Create an SP-GiST Index using the following query

CREATE INDEX mytable_mycolumn_idx ON mytable USING hash(mycolumn);

GIN Index

Generalized Inverted Index provides many indexing strategies. GIN is compatible when we want to index composite types of columns. As we know, the default PostgreSQL distribution provides support for any array type, jsonb, and full-text search (tsvector). Jsonb type is the growing feature of PostgreSQL that depends on the GIN for index support. GIN also used commonly to the index for full-text search.

Create a GIN Index using the following query

CREATE INDEX mytable_mycolumn_idx ON mytable USING hash(mycolumn);

BRIN Index

Block Range Index (BRIN Index) was introduced in PostgreSQL 9.5. BRIN works on very large tables by storing summary information for a set of pages called “Block Range”.  If you are using BRIN Index then be ready for extra logic for PostgreSQL query statements and also need for extra maintenance.

Create a BRIN Index using the following query

CREATE INDEX part_machine_id_idx_brin ON part USING BRIN(machine_id);

No comments:

Post a Comment

Please do not enter any spam link in the comment box.

NUnit : Introduction

There are several unit test tools in the market, but one of them is NUnit, the most popular tool for doing unit testing. So in this tutorial...