Cheatsheets

uv: a next-generation dependency manager for Python
Difference between torch.tensor and torch.Tensor
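
A small sketch of that difference, assuming PyTorch is installed and the default dtype has not been changed. torch.tensor is a factory function that infers the dtype from the data, while torch.Tensor is the tensor class itself, whose constructor produces the default float type.

```python
import torch

# torch.tensor: factory function, dtype is inferred from the data
a = torch.tensor([1, 2, 3])
print(a.dtype)  # torch.int64

# torch.Tensor: the class; its constructor uses the default float dtype
b = torch.Tensor([1, 2, 3])
print(b.dtype)  # torch.float32

# A bare integer argument means different things to the two
c = torch.tensor(3)  # 0-dim tensor holding the value 3
d = torch.Tensor(3)  # 1-dim tensor with 3 uninitialized elements
print(c.shape, d.shape)  # torch.Size([]) torch.Size([3])
```

In everyday code torch.tensor is usually the safer choice, because what it does with its argument is unambiguous.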
obj.__getitem__(index): Python feature, invoked by obj[index]
def __call__(self, x): Python feature, makes an instance callable as obj(x)
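
A minimal sketch of both special methods. The Scaler class and its fields are hypothetical names made up for this illustration.

```python
class Scaler:
    """Hypothetical class: holds numbers and a scale factor."""

    def __init__(self, values, factor):
        self.values = list(values)
        self.factor = factor

    def __getitem__(self, index):
        # Invoked by s[index]
        return self.values[index] * self.factor

    def __call__(self, x):
        # Invoked by s(x)
        return x * self.factor


s = Scaler([1, 2, 3], factor=10)
print(s[1])  # 20
print(s(5))  # 50
```

This is the same mechanism that lets a dataset object be indexed and a model object be called like a function.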
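Mini-batch stochastic gradient descent / why train in batches: 1. memory limits 2. full-batch training is slow 3. gradient noise
- Training set, validation set (for tuning hyperparameters), test set: 8-1-1 split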
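
A rough sketch of an 8-1-1 split followed by mini-batching, using only the standard library. The helper names split_indices and minibatches, the seed, and the batch size of 32 are assumptions for illustration.

```python
import random


def split_indices(n, seed=0):
    """Shuffle range(n) and split it 8-1-1 into train/val/test indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * 8 // 10
    n_val = n // 10
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def minibatches(indices, batch_size):
    """Yield consecutive chunks of at most batch_size indices."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 80 10 10

batches = list(minibatches(train_idx, batch_size=32))
print([len(b) for b in batches])  # [32, 32, 16]
```

Each training step then only needs one batch in memory, and the gradient computed on it is a noisy estimate of the full-dataset gradient.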
When working through backpropagation, keep two ideas in mind: the local derivative and the chain rule

def __add__(self, other):
    other = other if isinstance(other, Value) else Value(other)
    out = Value(self.data + other.data, (self, other), '+')

    def _backward():
        self.grad += 1 * out.grad
        other.grad += 1 * out.grad
    out._backward = _backward

    return out

def __mul__(self, other):
    other = other if isinstance(other, Value) else Value(other)
    out = Value(self.data * other.data, (self, other), '*')

    def _backward():
        self.grad += other.data * out.grad
        other.grad += self.data * out.grad
    out._backward = _backward

    return out
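
The two methods above belong to a Value class. Here is a self-contained sketch in the style of micrograd so they can be run end to end. The constructor fields and the backward() method are assumptions added for illustration. Each _backward multiplies a local derivative by out.grad, which is the chain rule applied at a single node.

```python
class Value:
    """Minimal scalar autograd node (sketch; constructor and backward() are assumed)."""

    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None  # leaf nodes have nothing to propagate
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')

        def _backward():
            # local derivative of a + b with respect to a and to b is 1
            self.grad += 1 * out.grad
            other.grad += 1 * out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), '*')

        def _backward():
            # local derivative of a * b with respect to a is b, and vice versa
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Visit nodes in reverse topological order so that every node's
        # grad is complete before it is pushed to its children.
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)

        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()


a, b, c = Value(2.0), Value(-3.0), Value(10.0)
L = a * b + c
L.backward()
print(L.data)                  # 4.0
print(a.grad, b.grad, c.grad)  # -3.0 2.0 1.0
```

The += in each _backward matters: a node used more than once accumulates gradient from every path that reaches it.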

Σ (Sigma) is the summation symbol: it adds up a set of elements one by one over an index
- $y = \sum_i x_i w_i + b$ (weighted sum)
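
The weighted sum written out in plain Python. The input, weight, and bias values are made up for illustration.

```python
x = [1.0, 2.0, 3.0]   # inputs x_i (illustrative values)
w = [0.5, -1.0, 2.0]  # weights w_i (illustrative values)
b = 0.25              # bias

# y = sum over i of x_i * w_i, plus b
y = sum(xi * wi for xi, wi in zip(x, w)) + b
print(y)  # 4.75
```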
nn.Flatten()(x) and x.view(-1, 12) give the same result for the tensor below (shape [1, 3, 2, 2])

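import torch
from torch import nn

x = torch.tensor([[
    [[1,2],[3,4]],
    [[5,6],[7,8]],
    [[9,10],[11,12]]
]])

fx = nn.Flatten()(x)  # flattens every dimension after the batch dimension
fx.shape # torch.Size([1, 12])
vx = x.view(-1, 12)   # -1 lets PyTorch infer the batch dimension
vx.shape # torch.Size([1, 12])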

makemore dataset: !wget https://raw.githubusercontent.com/karpathy/makemore/refs/heads/master/names.txt
torch.mean computes the mean
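
A short sketch of torch.mean with and without the dim argument, assuming PyTorch is installed. The input must be a floating-point tensor, so integer tensors need a conversion such as .float() first.

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

print(torch.mean(x))         # tensor(2.5000), mean over all elements
print(torch.mean(x, dim=0))  # tensor([2., 3.]), mean of each column
print(torch.mean(x, dim=1))  # tensor([1.5000, 3.5000]), mean of each row
```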
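About model training
- The gradient (grad) tells you in which direction changing a parameter increases the loss, so updates step in the opposite direction
- The goal of learning/training is to reduce the loss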
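
A toy sketch of those two points: gradient descent on a one-parameter loss. The loss function, learning rate, and step count are arbitrary choices for illustration.

```python
def loss(w):
    # Toy loss with its minimum at w = 3
    return (w - 3.0) ** 2


def grad(w):
    # Derivative of the loss with respect to w
    return 2.0 * (w - 3.0)


w = 0.0   # initial parameter
lr = 0.1  # learning rate (assumed value)
history = [loss(w)]

for _ in range(50):
    w -= lr * grad(w)  # step against the gradient to reduce the loss
    history.append(loss(w))

print(history[0])                # 9.0
print(history[-1] < history[0])  # True
```

Changing the update to w += lr * grad(w) moves along the gradient instead, and the loss grows at every step.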
Difference between supervised learning (has features and labels) and unsupervised learning
Fitting, underfitting, overfitting

Language feature comparison

_posts

CloudNative

sakila