Database Indexing Basics.

Database Indexing Basics.

Introduction

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional space and write time to the database. Database indexing is one of the most important concepts in database management. Indexing is used to improve the performance of queries by allowing the database to find data faster. In this article, we will discuss the basics of database indexing, including how indexing works, the types of indexes, and the best practices for index creation and maintenance.

How indexing works

Indexing works by creating a separate data structure that stores a copy of the data in the indexed column(s), along with a reference to the actual data in the table. When a query is executed, the database engine first looks for the index data structure to find the data it needs. If the index is available, the engine retrieves the data from the index structure instead of scanning the entire table. This process can greatly speed up the execution time of queries.

Types of indexes

There are several types of indexes, but the most common ones are:

  1. B-tree index: A B-tree index is a data structure that organizes data in a tree-like structure. It is commonly used for equality and range searches.

  2. Bitmap index: A bitmap index is a data structure that uses bitmaps to represent the occurrence of values in a column. It is commonly used for low cardinality columns.

  3. Hash index: A hash index is a data structure that uses a hash function to compute the address of a value. It is commonly used for equality searches.

Best practices for index creation

Here are some best practices for index creation:

  1. Choose the right columns: Index only the columns that are frequently used in queries. Indexing too many columns can slow down write performance.

  2. Choose the right index type: Choose the index type based on the type of queries that will be executed.

  3. Index cardinality: Index columns with high cardinality, which means columns with a large number of unique values.

  4. Index fragmentation: Monitor the index fragmentation and reorganize or rebuild the index when necessary.

  5. Index maintenance: Regularly monitor and maintain the indexes to ensure they are being used efficiently.

Code implementation

First, create a folder and install the mongodb express and pg module using npm

mkdir dbIndex && cd dbIndex && npm init -y && npm i express mongodb pg
  1. Indexing in MongoDB

In MongoDB, indexes are created using the createIndex method. Here is an example of how to create an index in MongoDB using the Node.js driver:

Then, you can create a MongoDB database connection in your Express.js application like this:

const MongoClient = require('mongodb').MongoClient;
const uri = 'mongodb://localhost:27017/dbindex';
const client = new MongoClient(uri, { useNewUrlParser: true });
client.connect(err => {
  const collection = client.db("dbindex").collection("users");
  collection.createIndex({ username: 1 }, function(err, result) {
    console.log("Index created successfully");
    client.close();
  });
});

The MongoDB Node.js driver is imported and creates a constant MongoClient variable that will be used to connect to a MongoDB instance, then a connection URI, which specifies the address of the MongoDB instance that we want to connect to is define.

After establishing a connection to the server, the code then defines a collection object called users within the dbindex database. This is done using the client.db method, which takes the name of the database as an argument and returns a reference to the specified database.

The createIndex method on the collection object is called, which creates an index on the username field of the users collection. The { username: 1 } parameter specifies that the index should be created in ascending order based on the username field.

The createIndex method is a callback function that will be called when the index creation is complete. This callback function simply logs a message to the console indicating that the index was created successfully, and then closes the MongoDB client connection using the client.close() method.

To use the index, we simply need to include the indexed field in our queries:

const collection = client.db("dbindex").collection("users");
collection.find({ username: "Eddy" }).toArray(function(err, data) {
  console.log(data);
  client.close();
});

This code retrieves all documents from the "users" collection in the "dbindex" database where the "username" field is equal to "Eddy".

The .find() method is called on a MongoDB collection object (collection) with the query { username: "Eddy" } as its parameter. The toArray() method is called on the returned cursor to convert the query results to an array of documents.

The callback function specified in the toArray() method is executed with two parameters: err and data. If there is an error during the execution of the query, the error message will be contained in the err parameter. Otherwise, the query results will be contained in the data parameter, which will be logged to the console using console.log(data).

Finally, the client.close() method is called to close the connection to the MongoDB client.

  1. Indexing in PostgreSQL

In PostgreSQL, indexes are created using the CREATE INDEX statement.

const { Pool } = require('pg');
const pool = new Pool({
user: 'your_user',
host: 'localhost',
database: 'your_database',
password: 'your_password',
port: 5432,
});

pool.connect((err, client, release) => {
if (err) {
return console.error('Error acquiring client', err.stack);
}
const query = 'CREATE INDEX username_index ON users (username)';
client.query(query, (err, result) => {
if (err) {
console.error('Error creating index', err.stack);
} else {
console.log('Index created successfully');
}
release();
});
});

The Pool class is a container for a set of database connections, which can be reused across multiple requests. In this example, the new Pool instance is configured with connection information for a PostgreSQL database, including the user, host, database name, password, and port number.

Next, the pool.connect method is called to acquire a client connection from the pool. This method takes a callback function with three arguments: an error object, a client object, and a release function. If an error occurs while acquiring the client, the error object is logged to the console and the function returns early.

Assuming no error occurred, the callback function continues by defining a SQL query to create an index on a table named users. The index will be created on the username column of the users table.

The query is executed using the client.query method, which takes the SQL query string and a callback function as arguments. If an error occurs while executing the query, the error object is logged to the console. Otherwise, a message is logged to the console indicating that the index was created successfully. Finally, the release function is called to release the client connection back to the pool.

const query = 'SELECT * FROM users WHERE username = $1';
const values = ['Eddy'];
pool.query(query, values, (err, res) => {
console.log(res.rows);
pool.end();
});

A constant variable query which is assigned the SQL query string to select all columns from the users table where the username column is equal to the first parameterized value $1. The use of parameterized values helps prevent SQL injection attacks.

Also a constant variable values which is assigned an array containing the string 'Eddy'. This value will be used to replace the first parameterized value $1 in the query string.

The query uses the query() method of a pool object. The pool object represents a pool of client connections to the PostgreSQL database. The query() method takes three arguments: the query string, the parameterized values, and a callback function that will be called when the query completes.

The res object contains the rows returned by the query, and res.rows is an array containing the result set. In this case, it will be an array of rows where the username column is equal to 'Eddy' and this is logged to the console. Finally, the connection pool to the PostgreSQL database is closed.

Conclusion

Database indexing is a powerful tool that can greatly improve query performance. By understanding the different types of indexes, best practices for index creation and maintenance, and how to implement indexing in Express with MongoDB and PostgreSQL, we can optimize your database queries and improve our application's performance.

Did you find this article valuable?

Support Eddy's Space by becoming a sponsor. Any amount is appreciated!