Sparse Vector: svector
since v0.3.0
Unlike dense vectors, sparse vectors are very high-dimensional but contain few non-zero values.
Typically, sparse vectors can be created from:
- Word co-occurrence matrices
- Term frequency-inverse document frequency (TF-IDF) vectors
- User-item interaction matrices
- Network adjacency matrices
Sparse vectors in pgvecto.rs are called `svector`.
Here's an example of creating a table with a svector column and inserting values:
```sql
CREATE TABLE items (
  id bigserial PRIMARY KEY,
  embedding svector(10) NOT NULL
);

INSERT INTO items (embedding)
VALUES ('[0.1,0,0,0,0,0,0,0,0,0]'), ('[0,0,0,0,0,0,0,0,0,0.5]');
```
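Writing out every zero by hand gets tedious as the dimension grows. The following is a minimal client-side sketch, assuming the dense bracketed text format shown in the INSERT above is what you want to produce; `svector_literal` is a hypothetical helper name, not part of pgvecto.rs.

```python
def svector_literal(dim, entries):
    """Render {position: value} as a dense text literal such as '[0.1,0,0]'.

    Hypothetical helper: positions are 0-based, unset positions become 0.
    """
    for pos in entries:
        if not 0 <= pos < dim:
            raise ValueError(f"position {pos} is outside 0..{dim - 1}")
    parts = []
    for i in range(dim):
        v = entries.get(i, 0)
        # Keep zeros short so the literal matches the style used in the docs.
        parts.append("0" if v == 0 else repr(float(v)))
    return "[" + ",".join(parts) + "]"


first = svector_literal(10, {0: 0.1})
second = svector_literal(10, {9: 0.5})
print(first)
print(second)
```

The resulting strings can be passed as query parameters to the INSERT statement instead of being typed out manually.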
An index can also be created on an `svector` column.
```sql
CREATE INDEX your_index_name ON items USING vectors (embedding svector_l2_ops);

SELECT * FROM items ORDER BY embedding <-> '[0.3,0,0,0,0,0,0,0,0,0]' LIMIT 1;
```
We support three operators to calculate the distance between two `svector` values.
- `<->` (`svector_l2_ops`): squared Euclidean distance, defined as $\sum_i (x_i - y_i)^2$.
- `<#>` (`svector_dot_ops`): negative dot product, defined as $-\sum_i x_i y_i$.
- `<=>` (`svector_cos_ops`): cosine distance, defined as $1 - \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \sum_i y_i^2}}$.
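To sanity-check query results by hand, the three distances can be computed on sparse `{position: value}` maps. This is a reference sketch using the standard textbook definitions of squared Euclidean distance, negative dot product, and cosine distance. It is not the extension's implementation, the function names are made up for illustration, and raising an error for zero vectors is an assumption of this sketch rather than documented pgvecto.rs behavior.

```python
import math


def squared_l2(x, y):
    """Squared Euclidean distance over the union of non-zero positions."""
    keys = set(x) | set(y)
    return sum((x.get(k, 0.0) - y.get(k, 0.0)) ** 2 for k in keys)


def negative_dot(x, y):
    """Negative dot product; only positions set in both vectors contribute."""
    small, large = (x, y) if len(x) <= len(y) else (y, x)
    return -sum(v * large.get(k, 0.0) for k, v in small.items())


def cosine_distance(x, y):
    """One minus cosine similarity; undefined for all-zero vectors."""
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0 or norm_y == 0:
        # Assumption of this sketch: refuse zero vectors instead of guessing.
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = -negative_dot(x, y)
    return 1.0 - dot / (norm_x * norm_y)


# The two rows inserted earlier and the query vector, as sparse maps.
a = {0: 0.1}
b = {9: 0.5}
q = {0: 0.3}

print(squared_l2(a, q), squared_l2(b, q))
print(negative_dot(a, q), negative_dot(b, q))
print(cosine_distance(a, q), cosine_distance(b, q))
```

Under squared Euclidean distance the first row is closer to the query than the second, so it is the row an `ORDER BY embedding <-> ... LIMIT 1` query over these two rows should return.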
There is also a function `to_svector` to create an `svector`. It sets the given values at the specified positions (0-based, as the example below shows) and leaves every other dimension at zero.
```sql
-- to_svector(dim: INTEGER, position: ARRAY, value: ARRAY) -> svector
SELECT to_svector(5, '{0, 4}', '{0.3, 0.5}');
-- [0.3, 0, 0, 0, 0.5]
```
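When the data starts out as a dense list in application code, the three `to_svector` arguments can be derived mechanically. The sketch below assumes the argument order and 0-based positions shown in the example above; `to_svector_args` and `to_svector_sql` are hypothetical helper names, and the SQL string is built only for illustration.

```python
def to_svector_args(dense):
    """Split a dense list into (dim, positions, values) for to_svector."""
    positions = [i for i, v in enumerate(dense) if v != 0]
    values = [dense[i] for i in positions]
    return len(dense), positions, values


def to_svector_sql(dense):
    """Render the call as SQL text, using PostgreSQL array literals."""
    dim, positions, values = to_svector_args(dense)
    pos_text = ", ".join(str(p) for p in positions)
    val_text = ", ".join(repr(float(v)) for v in values)
    # Doubled braces in the f-string produce the literal { and } characters.
    return f"SELECT to_svector({dim}, '{{{pos_text}}}', '{{{val_text}}}');"


args = to_svector_args([0.3, 0, 0, 0, 0.5])
sql = to_svector_sql([0.3, 0, 0, 0, 0.5])
print(args)
print(sql)
```

In real application code, binding the dimension and the two arrays as query parameters is preferable to interpolating them into the statement text.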
Sparse vectors are