Now, let’s get to the main topic of this article: analyzing the (Q, K, V, O) matrices of the Llama-3-8B-Instruct model through their singular values.
First, let’s import all of the packages we need for this analysis.
import transformers
import torch
import numpy as np
from transformers import AutoConfig, LlamaModel
from safetensors import safe_open
import os
import matplotlib.pyplot as plt
Next, download the model and save it locally under /tmp.
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
!huggingface-cli download {MODEL_ID} --quiet --local-dir /tmp/{MODEL_ID}
If you have GPUs in abundance, the following code may not be relevant to you. However, if you are GPU poor like me, the following code is extremely useful for loading only specific layers of the Llama-3-8B model.
def load_specific_layers_safetensors(model, model_name, layer_to_load):
    state_dict = {}
    files = [f for f in os.listdir(model_name) if f.endswith('.safetensors')]
    for file in files:
        filepath = os.path.join(model_name, file)
        with safe_open(filepath, framework="pt") as f:
            for key in f.keys():
                if f"layers.{layer_to_load}." in key:
                    # Remap the requested layer's weights onto layer 0 of our 1-layer model
                    new_key = key.replace(f"model.layers.{layer_to_load}.", 'layers.0.')
                    state_dict[new_key] = f.get_tensor(key)

    missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False)
    if missing_keys:
        print(f"Missing keys: {missing_keys}")
    if unexpected_keys:
        print(f"Unexpected keys: {unexpected_keys}")
The reason we do this is that the free tier of Google Colab GPU is not enough to load Llama-3-8B, even at fp16 precision. In addition, this analysis requires working in fp32 precision, since that is what np.linalg.svd expects. Next, we define the main function that gets the singular values for a given matrix_type, layer_number, and head_number.
def get_singular_values(model_path, matrix_type, layer_number, head_number):
    """
    Computes the singular values of the specified matrix in the Llama-3 model.

    Parameters:
        model_path (str): Path to the model
        matrix_type (str): Type of matrix ('q', 'k', 'v', 'o')
        layer_number (int): Layer number (0 to 31)
        head_number (int): Head number (0 to 31)
    Returns:
        np.array: Array of singular values
    """
    assert matrix_type in ['q', 'k', 'v', 'o'], "Invalid matrix type"
    assert 0 <= layer_number < 32, "Invalid layer number"
    assert 0 <= head_number < 32, "Invalid head number"

    # Load the model only for that specific layer since we have limited RAM even after using fp16
    config = AutoConfig.from_pretrained(model_path)
    config.num_hidden_layers = 1
    model = LlamaModel(config)
    load_specific_layers_safetensors(model, model_path, layer_number)

    # Access the specified layer
    # Always index 0 since we have only loaded that specific layer
    layer = model.layers[0]

    # Determine the size of each head
    num_heads = layer.self_attn.num_heads
    head_dim = layer.self_attn.head_dim

    # Access the specified matrix
    weight_matrix = getattr(layer.self_attn, f"{matrix_type}_proj").weight.detach().numpy()

    if matrix_type in ['q', 'o']:
        start = head_number * head_dim
        end = (head_number + 1) * head_dim
    else:  # 'k', 'v' matrices
        # Adjust the head_number based on num_key_value_heads
        # This is done since Llama-3-8B uses Grouped Query Attention
        num_key_value_groups = num_heads // config.num_key_value_heads
        head_number_kv = head_number // num_key_value_groups
        start = head_number_kv * head_dim
        end = (head_number_kv + 1) * head_dim

    # Extract the weights for the specified head
    if matrix_type in ['q', 'k', 'v']:
        weight_matrix = weight_matrix[start:end, :]
    else:  # 'o' matrix
        weight_matrix = weight_matrix[:, start:end]

    # Compute singular values
    singular_values = np.linalg.svd(weight_matrix, compute_uv=False)

    del model, config
    return list(singular_values)
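As a quick sanity check on the SVD step, here is a minimal, self-contained sketch (using a random matrix standing in for one attention head, not real Llama-3 weights) of what np.linalg.svd(..., compute_uv=False) returns: a 1-D array of length min(rows, cols), sorted in descending order. It also shows why precision matters here: NumPy's linear-algebra routines do not accept float16 input, so fp16 weights must be cast up before the decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a (head_dim, hidden_size) slice of an attention matrix
W = rng.standard_normal((128, 4096)).astype(np.float16)

# np.linalg.svd rejects float16, so cast to float32 first
s = np.linalg.svd(W.astype(np.float32), compute_uv=False)

print(s.shape)                         # (128,) == min(W.shape)
print(bool(np.all(s[:-1] >= s[1:])))   # True: sorted largest to smallest
```

Plotting these values per head and per layer (with matplotlib, imported earlier) is what the rest of the analysis builds on.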
It’s worth noting that we can extract the weights for a given head of the Q, K, and V matrices by performing row-wise slicing, due to the way these weights are stored by Hugging Face.
[Figure: the projection weights are stored with shape (d_out, d_in). Source: Image by author.]

As for the O matrix, thanks to linear algebra we can slice it column-wise to extract the weights of a given head, as you can see in more detail in the following diagram:
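To make the slicing concrete, here is a small sketch using zero arrays with Llama-3-8B's shapes (num_heads = 32, num_key_value_heads = 8, head_dim = 128, hidden_size = 4096); the head index 10 is an arbitrary choice for illustration. Q is sliced row-wise per query head, K/V are first mapped to their shared KV head (Grouped Query Attention), and O is sliced column-wise:

```python
import numpy as np

num_heads, num_kv_heads, head_dim, hidden = 32, 8, 128, 4096
q_proj = np.zeros((num_heads * head_dim, hidden))     # (4096, 4096), rows grouped by head
k_proj = np.zeros((num_kv_heads * head_dim, hidden))  # (1024, 4096), rows grouped by KV head
o_proj = np.zeros((hidden, num_heads * head_dim))     # (4096, 4096), columns grouped by head

head = 10  # an arbitrary query head for illustration

# Q: row-wise slice for the head
q_head = q_proj[head * head_dim:(head + 1) * head_dim, :]

# K/V: map the query head to its shared KV head first (GQA)
group = num_heads // num_kv_heads   # 4 query heads share each KV head
kv_head = head // group             # head 10 -> KV head 2
k_head = k_proj[kv_head * head_dim:(kv_head + 1) * head_dim, :]

# O: column-wise slice for the head
o_head = o_proj[:, head * head_dim:(head + 1) * head_dim]

print(q_head.shape, k_head.shape, o_head.shape)  # (128, 4096) (128, 4096) (4096, 128)
```

Note that with GQA, query heads 8 through 11 all map to the same K/V slice, which is exactly why get_singular_values divides head_number by num_key_value_groups for the 'k' and 'v' cases.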

