- uv, the new-generation dependency manager for Python
- The difference between torch.tensor and torch.Tensor
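A quick sketch of the difference (standard PyTorch behavior: torch.tensor is a factory function, torch.Tensor is the tensor class itself):

```py
import torch

a = torch.tensor([1, 2, 3])    # factory: copies data, infers dtype -> int64
b = torch.tensor([1.0, 2.0])   # -> float32, inferred from the values
c = torch.Tensor([1, 2, 3])    # class constructor: always float32
d = torch.Tensor(2, 3)         # size args -> uninitialized 2x3 float tensor
```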
- obj.__getitem__(index) — a Python feature
- def __call__(self, x): — a Python feature (see the sketch below)
- Mini-batch stochastic gradient descent / why train in batches: 1. memory limits 2. full-batch training is slow 3. gradient noise
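A minimal illustration of the two dunder hooks (SquareDataset and Doubler are made-up names for this sketch; nn.Module relies on the same __call__ mechanism, which is why you can write model(x)):

```py
class SquareDataset:
    def __getitem__(self, index):
        # ds[i] is routed here by Python
        return index ** 2

class Doubler:
    def __call__(self, x):
        # f(x) is routed here by Python
        return 2 * x

ds = SquareDataset()
f = Doubler()
print(ds[3])  # 9  -> ds.__getitem__(3)
print(f(5))   # 10 -> f.__call__(5)
```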
- Training set / validation set (for tuning hyperparameters) / test set, split e.g. 8:1:1
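One way to get such a split (a sketch using torch.utils.data.random_split on made-up data and sizes):

```py
import torch
from torch.utils.data import TensorDataset, random_split

data = TensorDataset(torch.randn(1000, 4), torch.randn(1000, 1))
train, val, test = random_split(data, [800, 100, 100])  # 8:1:1
len(train), len(val), len(test)  # (800, 100, 100)
```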
- When working through backpropagation, keep two things clear in your head: what the local derivative is, and what the chain rule does (both visible in the Value methods below)
```py
class Value:
    # minimal micrograd-style scaffolding so the two methods below are runnable
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')

        def _backward():
            # local derivative of x + y w.r.t. either input is 1
            self.grad += 1 * out.grad
            other.grad += 1 * out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), '*')

        def _backward():
            # local derivative of x * y w.r.t. x is y (and vice versa),
            # multiplied by out.grad per the chain rule
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out
```
- Σ (Sigma): the summation symbol, meaning "accumulate a set of elements over an index", e.g. the weighted sum $y = \sum_i x_i w_i + b$
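A quick sanity check of the two `_backward` closures above, calling them by hand in reverse order (micrograd's real `backward()` does a topological sort instead):

```py
a = Value(2.0)
b = Value(3.0)
d = a * b         # the '*' node, d.data == 6.0
c = d + a         # the '+' node, c.data == 8.0

c.grad = 1.0      # seed dc/dc = 1
c._backward()     # '+' node: d.grad = 1.0, a.grad = 1.0
d._backward()     # '*' node: a.grad += b.data, b.grad += a.data

print(a.grad, b.grad)  # 4.0 2.0, i.e. dc/da = b + 1, dc/db = a
```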
- nn.Flatten()(x) has the same effect as x.view(-1, 12) here (Flatten keeps dim 0, the batch dimension, and flattens the rest):
```py
import torch
import torch.nn as nn

# shape (1, 3, 2, 2): a batch of one 3-channel 2x2 "image"
x = torch.tensor([[
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]])

fx = nn.Flatten()(x)   # flattens dims 1..-1, keeps the batch dim
fx.shape               # torch.Size([1, 12])

vx = x.view(-1, 12)    # same result, but you spell out the target size
vx.shape               # torch.Size([1, 12])
```
- makemore dataset:
```py
!wget https://raw.githubusercontent.com/karpathy/makemore/refs/heads/master/names.txt
```
- torch.mean: takes the mean of a tensor's elements
- About model training (a small training-loop sketch follows this list):
  - grad (gradient): tells you in which direction a parameter change makes the loss grow (so gradient descent steps the opposite way)
  - The goal of learning/training is to reduce the loss
  - The difference between supervised learning (data has both features and labels) and unsupervised learning
  - Fitting, underfitting, and overfitting
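Putting these bullets together, a minimal mini-batch training loop on synthetic data (every name here is made up for the sketch; torch.mean appears as the MSE loss):

```py
import torch

# synthetic supervised data: features X, labels y = 3x + 2 + noise
X = torch.randn(1000, 1)
y = 3 * X + 2 + 0.1 * torch.randn(1000, 1)

w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr, batch_size = 0.1, 32

for step in range(200):
    idx = torch.randint(0, len(X), (batch_size,))  # random mini-batch
    pred = X[idx] * w + b                          # weighted sum plus bias
    loss = torch.mean((pred - y[idx]) ** 2)        # MSE via torch.mean
    loss.backward()        # grads point where the loss grows...
    with torch.no_grad():
        w -= lr * w.grad   # ...so step the opposite way
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(w.item(), b.item())  # should land near 3.0 and 2.0
```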