pgvector

A fundamental element of many AI and ML systems is handling and manipulating vectors and embeddings, which represent complex, high-dimensional data in a format that machines can understand and process efficiently.

What are embeddings?

An embedding is a representation of data or complex objects, such as text, images, or audio, as a list of numbers (a vector) in a high-dimensional space.

[Figure: Embedding. Source: OpenAI]

This technique is used in many machine learning (ML) and deep learning (DL) models. It lets a model capture the meaning and context of data (semantic relationships) as well as the structural relationships and patterns within the data (syntactic relationships).
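To make the idea concrete, here is a minimal Python sketch that compares a few toy embeddings with cosine similarity. The three-dimensional vectors are invented by hand for illustration and do not come from a real embedding model, which would typically produce hundreds or thousands of dimensions.

```python
import math

# Hand-written 3-dimensional "embeddings". The numbers are made up for
# illustration only; a real model would generate them from the input data.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "car": [0.1, 0.2, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat_vs_kitten = cosine_similarity(toy_embeddings["cat"], toy_embeddings["kitten"])
cat_vs_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])

print(f"cat vs kitten: {cat_vs_kitten:.3f}")
print(f"cat vs car:    {cat_vs_car:.3f}")
```

With these toy values, "cat" scores much closer to "kitten" than to "car", which is the kind of semantic relationship an embedding is meant to encode.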

 

[Figure: Vector embedding]

What is pgvector?

Pgvector is an open-source vector similarity search extension for PostgreSQL. PostgreSQL is a well-known, mature database that supports the features we expect from an SQL database, such as joins, subqueries, window functions, stored procedures, and triggers. Natively or through its extension ecosystem, it also offers ACID transactions, role-based and row-level security, backups, partitioning, sharding, auditing, and much more.

Because pgvector runs inside PostgreSQL, it inherits all the features an enterprise requires, which makes it enterprise-ready from the start. We can keep our existing solutions and integrations and simply extend them with pgvector to store embeddings.

Additionally, we don’t need to migrate any data between the SQL database and the vector database. They can all live together in the same space, which makes the integration much easier and more efficient.
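As a rough illustration of embeddings living next to relational data, the Python sketch below builds the SQL for a table that mixes ordinary columns with a pgvector column, and a query that filters on a relational column while ordering by vector distance. The `documents` table, its columns, and the helper functions are hypothetical names chosen for this example. The `%s` placeholders assume a driver with psycopg-style parameter binding, and nothing here connects to a database.

```python
# Enable the extension once per database (it is registered under the name "vector").
CREATE_EXTENSION_SQL = "CREATE EXTENSION IF NOT EXISTS vector;"

# Hypothetical table: regular relational columns plus a 3-dimensional vector column.
CREATE_TABLE_SQL = """
CREATE TABLE documents (
    id bigserial PRIMARY KEY,
    tenant_id integer NOT NULL,
    title text NOT NULL,
    embedding vector(3)
);
"""

# Filter by a relational column, then rank by L2 distance (the <-> operator).
SEARCH_SQL = """
SELECT id, title
FROM documents
WHERE tenant_id = %s
ORDER BY embedding <-> %s::vector
LIMIT %s;
"""


def to_vector_literal(values):
    """Format a sequence of numbers as pgvector's text form, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def build_search(tenant_id, query_embedding, limit=5):
    """Return the SQL text and the parameter tuple to pass to a database driver."""
    return SEARCH_SQL, (tenant_id, to_vector_literal(query_embedding), limit)


sql, params = build_search(42, [0.1, 0.2, 0.3])
print(sql.strip())
print(params)
```

Because the filter and the similarity ranking run in one query against one database, there is no separate vector store to keep in sync with the relational data.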

Pgvector adds a new column type and new operators to the PostgreSQL engine. The vector column type can store vectors with up to 16,000 dimensions, although indexes on such columns are limited to 2,000 dimensions. The new operators perform element-wise addition, subtraction, and multiplication on vectors. We can also run exact and approximate nearest neighbor searches using L2 distance, inner product, or cosine distance. Almost all important vector operations can be done with pgvector.
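The following plain-Python sketch mirrors what these operations compute. It is not pgvector's implementation, and the function names are made up for this example. The comments name the pgvector operator each function corresponds to. Note that pgvector's `<#>` operator returns the negative inner product, so that smaller values mean "closer" for every distance operator. Only exact (brute-force) search is shown; approximate search relies on indexes and is not sketched here.

```python
import math


def add(a, b):
    """Element-wise addition (pgvector: a + b)."""
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    """Element-wise subtraction (pgvector: a - b)."""
    return [x - y for x, y in zip(a, b)]


def multiply(a, b):
    """Element-wise multiplication (pgvector: a * b)."""
    return [x * y for x, y in zip(a, b)]


def l2_distance(a, b):
    """Euclidean distance (pgvector: a <-> b)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """Negative dot product (pgvector: a <#> b)."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity (pgvector: a <=> b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(query, rows, k, distance=l2_distance):
    """Exact search: score every row and return the ids of the k closest."""
    ranked = sorted(rows, key=lambda row_id: distance(rows[row_id], query))
    return ranked[:k]


# Hypothetical stored rows: id -> vector.
rows = {"a": [1.0, 1.0], "b": [5.0, 5.0], "c": [2.0, 2.0]}
print(nearest_neighbors([0.0, 0.0], rows, k=2))
```

Exact search scans every row, so its cost grows linearly with the table size. Approximate search trades some recall for speed on large tables.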

With all these features, pgvector is increasingly used in generative AI solutions.