In this tutorial, we use Matryoshka Representation Learning (MRL) to fine-tune a Sentence-Transformers embedding model so that the leading dimensions of each vector carry the most useful semantic signal. We validate the core promise of MRL by training with MatryoshkaLoss on triplet data and benchmarking search quality after truncating the embeddings to 64, 128, and 256 dimensions. Finally, we show how to save the fine-tuned model and load it with a small truncate_dim setting for fast, memory-efficient vector search. Please check the full code here.
!pip -q install -U sentence-transformers datasets accelerate
import math
import random
import numpy as np
import torch
from datasets import load_dataset
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample
from sentence_transformers import losses
from sentence_transformers.util import cos_sim

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(42)
Install the required libraries and import all modules needed for training and evaluation. Because we set a deterministic seed, sampling and training behavior stay consistent across runs. We also make sure the PyTorch and CUDA RNGs are seeded when GPUs are available.
@torch.no_grad()
def retrieval_metrics_mrr_recall_at_k(
    model,
    queries,
    corpus,
    qrels,
    dims_list=(64, 128, 256, None),
    k=10,
    batch_size=64,
):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    qids = list(queries.keys())
    docids = list(corpus.keys())
    q_texts = [queries[qid] for qid in qids]
    d_texts = [corpus[did] for did in docids]
    q_emb = model.encode(q_texts, batch_size=batch_size, convert_to_tensor=True, normalize_embeddings=True)
    d_emb = model.encode(d_texts, batch_size=batch_size, convert_to_tensor=True, normalize_embeddings=True)
    results = {}
    for dim in dims_list:
        if dim is None:
            qe = q_emb
            de = d_emb
            dim_name = "full"
        else:
            qe = q_emb[:, :dim]
            de = d_emb[:, :dim]
            dim_name = str(dim)
        # Renormalize after truncation so prefixes stay comparable in cosine space.
        qe = torch.nn.functional.normalize(qe, p=2, dim=1)
        de = torch.nn.functional.normalize(de, p=2, dim=1)
        sims = cos_sim(qe, de)
        mrr_total = 0.0
        recall_total = 0.0
        for i, qid in enumerate(qids):
            rel = qrels.get(qid, set())
            if not rel:
                continue
            topk = torch.topk(sims[i], k=min(k, sims.shape[1]), largest=True).indices.tolist()
            topk_docids = [docids[j] for j in topk]
            recall_total += 1.0 if any(d in rel for d in topk_docids) else 0.0
            rr = 0.0
            for rank, d in enumerate(topk_docids, start=1):
                if d in rel:
                    rr = 1.0 / rank
                    break
            mrr_total += rr
        denom = max(1, len(qids))
        results[dim_name] = {f"MRR@{k}": mrr_total / denom, f"Recall@{k}": recall_total / denom}
    return results
def pretty_print(results, title):
    print("\n" + "=" * 80)
    print(title)
    print("=" * 80)
    for dim, metrics in results.items():
        print(f"dim={dim:>4} | " + " | ".join([f"{k}={v:.4f}" for k, v in metrics.items()]))
Implement a lightweight search evaluator that encodes queries and documents, computes cosine similarity, and reports MRR@10 and Recall@10. By renormalizing the embeddings after truncation, smaller prefixes remain comparable in cosine space. We also add a compact printer to make before-and-after comparisons easier to read.
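As a sanity check of the metric logic, independent of any model, the same MRR/Recall computation can be run on a hand-built similarity matrix. This toy sketch is illustrative only (the matrix and qrels are made up):

```python
import numpy as np

# Toy setup: 2 queries x 3 docs similarity matrix.
# q0's relevant doc is d1, q1's relevant doc is d2.
sims = np.array([
    [0.2, 0.9, 0.1],  # q0: d1 ranked first  -> reciprocal rank 1.0
    [0.8, 0.1, 0.5],  # q1: d2 ranked second -> reciprocal rank 0.5
])
qrels = {0: {1}, 1: {2}}

k = 2
mrr = recall = 0.0
for i in range(sims.shape[0]):
    topk = np.argsort(-sims[i])[:k].tolist()
    hits = [rank for rank, d in enumerate(topk, start=1) if d in qrels[i]]
    recall += 1.0 if hits else 0.0
    mrr += 1.0 / hits[0] if hits else 0.0
print(mrr / 2, recall / 2)  # 0.75 1.0
```

MRR rewards putting the relevant document near the top, while Recall@k only asks whether it appears anywhere in the top k, which is why the two numbers differ here.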
DATASET_ID = "sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1"
SUBSET = "triplet-hard"
SPLIT = "train"
TRAIN_SAMPLES = 4000
EVAL_QUERIES = 300

stream = load_dataset(DATASET_ID, SUBSET, split=SPLIT, streaming=True)

train_examples = []
eval_queries = {}
eval_corpus = {}
eval_qrels = {}
doc_id_counter = 0
qid_counter = 0

for row in stream:
    q = (row.get("query") or "").strip()
    pos = (row.get("positive") or "").strip()
    neg = (row.get("negative") or "").strip()
    if not q or not pos or not neg:
        continue
    train_examples.append(InputExample(texts=[q, pos, neg]))
    if len(eval_queries) < EVAL_QUERIES:
        qid = f"q{qid_counter}"
        qid_counter += 1
        pos_id = f"d{doc_id_counter}"; doc_id_counter += 1
        neg_id = f"d{doc_id_counter}"; doc_id_counter += 1
        eval_queries[qid] = q
        eval_corpus[pos_id] = pos
        eval_corpus[neg_id] = neg
        eval_qrels[qid] = {pos_id}
    if len(train_examples) >= TRAIN_SAMPLES and len(eval_queries) >= EVAL_QUERIES:
        break

print(len(train_examples), len(eval_queries), len(eval_corpus))
We stream the mined MS MARCO triplet dataset and build both a training set of (query, positive, negative) triplets and a small IR benchmark. Each query is mapped to its relevant positive document, and the negatives are added to the corpus to make the search non-trivial. We stop early to keep the run Colab-friendly while still being large enough to show truncation effects.
MODEL_ID = "BAAI/bge-base-en-v1.5"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer(MODEL_ID, device=device)
full_dim = model.get_sentence_embedding_dimension()

baseline = retrieval_metrics_mrr_recall_at_k(
    model,
    queries=eval_queries,
    corpus=eval_corpus,
    qrels=eval_qrels,
    dims_list=(64, 128, 256, None),
    k=10,
)
pretty_print(baseline, "BEFORE")
Load a strong base embedding model and record its full embedding dimension. Run a baseline evaluation at 64, 128, and 256 dimensions as well as the full dimension to see how truncation behaves before training. Print the results so you can later check whether MRL improves the quality of the leading dimensions.
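The truncate-then-renormalize step that the evaluator relies on can be illustrated in isolation. This is a toy sketch with a random vector (the 768 and 64 sizes mirror the tutorial but are otherwise arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(768)
v = v / np.linalg.norm(v)  # a unit-length "full" embedding

# Truncation keeps only the leading 64 dimensions...
prefix = v[:64]
# ...but the prefix is no longer unit-length, so renormalize it
# before using cosine similarity (dot products of unit vectors).
prefix = prefix / np.linalg.norm(prefix)

print(round(float(np.linalg.norm(prefix)), 4))  # 1.0
```

Without this renormalization, dot-product scores on truncated vectors would be scaled down by the (varying) norm of each prefix, making them incomparable across documents.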
batch_size = 16
epochs = 1
warmup_steps = 100

train_loader = DataLoader(train_examples, batch_size=batch_size, shuffle=True, drop_last=True)

base_loss = losses.MultipleNegativesRankingLoss(model=model)
mrl_dims = [full_dim, 512, 256, 128, 64] if full_dim >= 768 else [full_dim, 256, 128, 64]
mrl_loss = losses.MatryoshkaLoss(
    model=model,
    loss=base_loss,
    matryoshka_dims=mrl_dims,
)

model.fit(
    train_objectives=[(train_loader, mrl_loss)],
    epochs=epochs,
    warmup_steps=warmup_steps,
    show_progress_bar=True,
)
after = retrieval_metrics_mrr_recall_at_k(
    model,
    queries=eval_queries,
    corpus=eval_corpus,
    qrels=eval_qrels,
    dims_list=(64, 128, 256, None),
    k=10,
)
pretty_print(after, "AFTER")

out_dir = "mrl-msmarco-demo"
model.save(out_dir)

m64 = SentenceTransformer(out_dir, truncate_dim=64)
emb = m64.encode(
    ["what is the liberal arts?", "liberal arts covers humanities and sciences"],
    normalize_embeddings=True,
)
print(emb.shape)
Create a MultipleNegativesRankingLoss and wrap it in a MatryoshkaLoss with a descending list of target prefix dimensions. Fine-tune the model on the triplets and rerun the same truncation benchmark to measure how much quality the small prefixes now retain. Finally, save the model and reload it with truncate_dim=64 to see compact search in action.
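Conceptually, MatryoshkaLoss applies the wrapped loss to each leading prefix of the embedding and combines the terms, so every prefix length is trained to be useful on its own. A simplified sketch of that idea (not the library's actual implementation, which also supports per-dimension weights; `toy_base_loss` is a made-up stand-in for a real ranking loss):

```python
import numpy as np

def toy_base_loss(emb):
    # Stand-in for a real ranking loss: mean squared norm of the batch.
    return float((emb ** 2).sum(axis=1).mean())

def toy_matryoshka_loss(emb, dims=(768, 256, 64)):
    # Apply the base loss to each leading prefix and sum the terms,
    # so gradients push useful signal into the early dimensions.
    return sum(toy_base_loss(emb[:, :d]) for d in dims)

emb = np.ones((4, 768))  # dummy batch of 4 "embeddings"
print(toy_matryoshka_loss(emb))  # 768 + 256 + 64 = 1088.0
```

Because the 64-dim prefix contributes its own loss term, the model cannot hide all the semantic signal in the later dimensions, which is exactly what makes truncation safe afterwards.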
In conclusion, we trained a Matryoshka-optimized embedding model that maintains strong search performance even when vectors are truncated to small prefixes such as 64 dimensions. We verified this by comparing baseline and post-training search metrics across several truncation sizes as well as the full embedding. The saved model together with the truncate_dim loading pattern gives a clean workflow for building smaller, faster vector indices while keeping the option to rerank with full-dimensional embeddings.
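One way to use that reranking option is a two-stage search: retrieve candidates cheaply with the 64-dim prefix, then rescore only those candidates with the full vectors. A minimal sketch with random data (the corpus size and candidate count are illustrative assumptions, not part of the tutorial code):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

docs = normalize(rng.standard_normal((1000, 768)))  # full-dim corpus
query = normalize(rng.standard_normal(768))

# Stage 1: coarse search on the renormalized 64-dim prefix.
coarse_scores = normalize(docs[:, :64]) @ normalize(query[:64])
candidates = np.argsort(-coarse_scores)[:50]

# Stage 2: rerank only the 50 candidates with full-dimensional cosine.
fine_scores = docs[candidates] @ query
best = candidates[np.argmax(fine_scores)]
print(int(best), round(float(fine_scores.max()), 4))
```

With an MRL-trained model, stage 1 keeps most of the relevant documents in the candidate set, so the expensive full-dimensional scoring runs on 50 vectors instead of 1000.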