Some notes on attention

import torch

# a simulates batch_size=2, sequence_length=3, feature=4
a = torch.arange(24).reshape(2, 3, 4)
# b gives every token a 4 x 4 (feature x feature) matrix
b = a.unsqueeze(2).expand(-1, -1, 4, -1) # shape: torch.Size([2, 3, 4, 4])

output:


a[0]
>>> tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])


b[0]

>>>tensor([[[ 0,  1,  2,  3],
         [ 0,  1,  2,  3],
         [ 0,  1,  2,  3],
         [ 0,  1,  2,  3]],
        [[ 4,  5,  6,  7],
         [ 4,  5,  6,  7],
         [ 4,  5,  6,  7],
         [ 4,  5,  6,  7]],
        [[ 8,  9, 10, 11],
         [ 8,  9, 10, 11],
         [ 8,  9, 10, 11],
         [ 8,  9, 10, 11]]])

Put another way:

a: [B, L, F]
b: [B, L, F, F]

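Each token gets an [F, F] matrix.
Entry (i, j) of this matrix is read as the score for having tag i at the previous time step and tag j at the current one, as in first-order Markov models. Note that expand alone only repeats each token's F scores in every row; the rows differ only once a transition term is added.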
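As a minimal sketch of how such an [F, F] score table is usually filled in, assuming a CRF-style model with a learned transition matrix: the name `transitions` and its random values are illustrative and not part of the original notes.

```python
import torch

B, L, F = 2, 3, 4
# per-token tag scores (emissions): the same toy tensor as above, as floats
a = torch.arange(B * L * F, dtype=torch.float).reshape(B, L, F)

# hypothetical transition scores: transitions[i, j] = score of tag i -> tag j
torch.manual_seed(0)
transitions = torch.randn(F, F)

# expand repeats a[b, l, :] along the new "previous tag" axis: [B, L, F, F]
b = a.unsqueeze(2).expand(-1, -1, F, -1)

# broadcasting adds the same [F, F] transition table at every position
scores = b + transitions

i, j = 1, 3
# scores[b, l, i, j] = emission of tag j at position l + transition i -> j
assert torch.isclose(scores[0, 2, i, j], a[0, 2, j] + transitions[i, j])
print(scores.shape)  # torch.Size([2, 3, 4, 4])
```

Because expand returns a view, no memory is copied until the addition materializes the result.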

Case 2 (I have not actually tested this one, so treat it with caution)

# b represents a 3 x 3 (sequence_length x sequence_length) grid per sentence
b = a.unsqueeze(1).expand(-1, 3, -1, -1) # shape: torch.Size([2, 3, 3, 4])


b[0]

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],
        [[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],
        [[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]]])

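In other words, the length-3 sentence forms a start-by-end (head/tail) matrix whose last dimension carries the associated scores.

For example, in a nested NER task on the sentence 我爱北京 ("I love Beijing"), reading rows as the span start and columns as the span end:

        我      爱      北      京
我
爱
北      0.03    0.02    0.05    0.9
京

Here the cell (北, 京) = 0.9 says the span 北京 ("Beijing") is very likely an entity.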
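One way such a start-by-end table could be produced, offered as an untested sketch in the same spirit as the caution above: pair every start token with every end token by expanding along two different axes, then score each pair. The layer `scorer` is a hypothetical stand-in; real nested-NER models use their own span classifiers.

```python
import torch
from torch import nn

B, L, F = 2, 3, 4
a = torch.arange(B * L * F, dtype=torch.float).reshape(B, L, F)

# start[b, i, j] = a[b, i]: token i repeated along the "end" axis
start = a.unsqueeze(2).expand(-1, -1, L, -1)   # [B, L, L, F]
# end[b, i, j] = a[b, j]: token j repeated along the "start" axis
end = a.unsqueeze(1).expand(-1, L, -1, -1)     # [B, L, L, F]

# concatenate the two views: one feature vector per (start, end) pair
pairs = torch.cat([start, end], dim=-1)        # [B, L, L, 2F]

# hypothetical scorer: one logit per span, squashed into (0, 1)
scorer = nn.Linear(2 * F, 1)
span_probs = torch.sigmoid(scorer(pairs)).squeeze(-1)  # [B, L, L]

# only spans with start <= end are meaningful, so keep the upper triangle
valid = torch.triu(torch.ones(L, L, dtype=torch.bool))
span_probs = span_probs.masked_fill(~valid, 0.0)
```

The untrained scorer outputs arbitrary values; the point is only the shape bookkeeping from [B, L, F] to [B, L, L].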


My beloved wife

2021-07-23

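Around 10 p.m. last night I suddenly came down with acute enteritis and lay in bed unable to sleep. The symptoms had actually started the day before, but they were mild and I paid no attention.

Seeing how miserable I was, my wife picked up her phone and started searching Baidu for what to take for diarrhea: saline solution, antibiotics, and so on.
I said the pharmacy outside our compound was probably still open, so she got out of bed and ran out to buy medicine for me. A little later I got up to use the bathroom and messaged her that I would order it online and that she should hurry back. A bit after 11 I took some Changyanning and went to sleep.

A little after 1 a.m. my stomach started rumbling again. I went to the bathroom, drank my homemade salt-and-sugar water, and threw up after two sips. At that moment I knew I had to see a doctor.

My wife came out of the bedroom half asleep and said, "Diarrhea again?" I grunted yes. Rubbing her eyes, she said, "I'll go to the hospital with you." She grabbed a jacket and looked up the nearest emergency room, a little over 1 km from home. We walked while trying to hail a cab, but not a single taxi stopped; my wife guessed the drivers were changing shifts.

At the hospital: scan the health code, register the symptoms, have blood drawn and wait 40 minutes, check the blood report and pay on the 1st floor, pick up the medicine on the 2nd floor, get the IV on the 4th floor.

At last I was sitting down with the IV drip, after all that waiting.

Meanwhile a drunk guy and his girlfriend were flirting loudly, half quarreling.

My wife told them off directly: "Could you keep it down? People are trying to sleep."

Hahaha, she doesn't back down anymore.

Then she picked up her phone and went back to reading the Kris Wu (吴亦凡) gossip.

I felt this moment deserved to be recorded, so I took a few photos. I won't post the ones of my wife.

When I looked back through those photos, it suddenly struck me that meeting my wife had been fated all along. "At the end of science lies theology": this only strengthened that belief.

Meeting my wife is probably the greatest blessing of my life.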


Adjusting multi-GPU fan speed on Ubuntu 16.04

2021-07-22

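Two 1080 Ti GPUs recently became free, which got me pretty excited even though they are a bit old. The catch is that the fans would not spin up during training, so this post fixes that.

Method 1 (I only got this working on a single GPU)

1. Generate xorg.conf

If you see:

cannot stat /etc/X11/xorg.conf no such file or directory

# generate the file

$ sudo nvidia-xconfig --enable-all-gpus --cool-bits=4

2. vim /etc/X11/xorg.conf

The path is case-sensitive: X11, not x11. Add Option "Coolbits" "4" inside the nvidia Device section. If there are several Device sections, add it to each one.

3. reboot

4. nvidia-settings

Open nvidia-settings and adjust the fan speed there.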

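Method 2

Use coolgpus

1. Stop the graphical interface (the display manager on Ubuntu 16.04 is lightdm)

sudo systemctl stop lightdm && sudo systemctl disable lightdm

2. Run coolgpus as a systemd service

Save the following unit file, for example as /etc/systemd/system/coolgpus.service, replacing the ExecStart path with wherever coolgpus is installed on your machine: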

[Unit]
Description=Headless GPU Fan Control
After=syslog.target

[Service]
ExecStart=/home/ajones/conda/bin/coolgpus --kill 
Restart=on-failure
RestartSec=5s
ExecStop=/bin/kill -2 $MAINPID
KillMode=none 

[Install]
WantedBy=multi-user.target

sudo systemctl enable coolgpus
sudo systemctl start coolgpus

Personal advice

Before adding coolgpus to systemctl, first make sure it works when run by hand:

sudo systemctl stop lightdm
sudo $(which coolgpus) --speed 99 99

If that works, carry on with the service setup above.
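To confirm that the fans actually respond, it can help to watch the reported fan speed while training. A minimal sketch, assuming nvidia-smi is on PATH; the helper names are made up for illustration, and the sample text is hand-written rather than real output.

```python
import subprocess

# nvidia-smi query for GPU index, fan speed (%) and temperature (C)
QUERY = ["nvidia-smi", "--query-gpu=index,fan.speed,temperature.gpu",
         "--format=csv,noheader,nounits"]


def parse_fan_report(text):
    """Parse 'index, fan %, temp C' CSV lines into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        index, fan, temp = [field.strip() for field in line.split(",")]
        rows.append({"gpu": int(index), "fan_percent": int(fan), "temp_c": int(temp)})
    return rows


def read_fan_report():
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_fan_report(out)


# hand-written sample in the same CSV layout, so the parser can be tried offline
sample = "0, 99, 61\n1, 99, 58\n"
print(parse_fan_report(sample))
```

Cards that do not report a fan speed print a placeholder instead of a number, which this sketch does not handle.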


LSTM usage examples

2021-07-20

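Note: the code in this post comes from plm-nlp-code.

Any model is easier to learn with a simple working example, so I use the plm-nlp-code code to walk through LSTM on two tasks: sequence labeling and binary sentence-polarity classification.

Sequence labeling

See lstm_postag.py.

1. Load the data

# load the data
train_data, test_data, vocab, pos_vocab = load_treebank()

The code of load_treebank: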

def load_treebank():
    # the download may need a proxy; set your own address here
    nltk.set_proxy('http://192.168.0.28:1080')
    # downloads the corpus only if it is not already present
    nltk.download('treebank')
    from nltk.corpus import treebank

    sents, postags = zip(*(zip(*sent) for sent in treebank.tagged_sents()))

    vocab = Vocab.build(sents, reserved_tokens=["<pad>"])

    tag_vocab = Vocab.build(postags)

    train_data = [(vocab.convert_tokens_to_ids(sentence), tag_vocab.convert_tokens_to_ids(tags)) for sentence, tags in zip(sents[:3000], postags[:3000])]
    test_data = [(vocab.convert_tokens_to_ids(sentence), tag_vocab.convert_tokens_to_ids(tags)) for sentence, tags in zip(sents[3000:], postags[3000:])]

    return train_data, test_data, vocab, tag_vocab

After loading, train_data and test_data are both lists; each sample is a tuple of (input, target). For example:

train_data[0]
Out[1]:
([2, 3, 4, 5, 6, 7, 4, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
 [1, 1, 2, 3, 4, 5, 2, 6, 7, 8, 9, 10, 8, 5, 9, 1, 3, 11])

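2. Data processing


# Pad every sequence in the batch to the same length with <pad>. The exact padding value
# (0, 1, or anything else) does not matter, because the mask excludes those positions.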
def collate_fn(examples):
    lengths = torch.tensor([len(ex[0]) for ex in examples])
    inputs = [torch.tensor(ex[0]) for ex in examples]
    targets = [torch.tensor(ex[1]) for ex in examples]
    inputs = pad_sequence(inputs, batch_first=True, padding_value=vocab["<pad>"])
    targets = pad_sequence(targets, batch_first=True, padding_value=vocab["<pad>"])
    return inputs, lengths, targets, inputs != vocab["<pad>"]

3. The model

class LSTM(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_class):
        super(LSTM, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, num_class)
        init_weights(self)

    def forward(self, inputs, lengths):
        embeddings = self.embeddings(inputs)
        x_pack = pack_padded_sequence(embeddings, lengths, batch_first=True, enforce_sorted=False)
        hidden, (hn, cn) = self.lstm(x_pack)
        hidden, _ = pad_packed_sequence(hidden, batch_first=True)
        outputs = self.output(hidden)
        log_probs = F.log_softmax(outputs, dim=-1)
        return log_probs

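A few points are worth noting:

pack_padded_sequence and pad_packed_sequence
An LSTM is an RNN, and the samples in a batch are not necessarily the same length. torch provides these two functions to handle that uniformly: lengths tells the LSTM where each sample really ends, so the padded positions after that point are not computed.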

>>> from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
>>> seq = torch.tensor([[1,2,0], [3,0,0], [4,5,6]])
>>> lens = [2, 1, 3]
>>> packed = pack_padded_sequence(seq, lens, batch_first=True, enforce_sorted=False)
>>> packed
PackedSequence(data=tensor([4, 1, 3, 5, 2, 6]), batch_sizes=tensor([3, 2, 1]),
                sorted_indices=tensor([2, 0, 1]), unsorted_indices=tensor([1, 2, 0]))
>>> seq_unpacked, lens_unpacked = pad_packed_sequence(packed, batch_first=True)
>>> seq_unpacked
tensor([[1, 2, 0],
        [3, 0, 0],
        [4, 5, 6]])
>>> lens_unpacked
tensor([2, 1, 3])

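LSTM outputs

In hidden, (hn, cn), hidden holds the top layer's output at every timestep, hn holds each layer's hidden state at the final timestep, and cn holds each layer's cell state c at the final timestep.

This is why sequence labeling uses the per-timestep output: each token is represented by its own timestep.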
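The relationship between the per-timestep output and hn can be checked on a toy input. This is a small sketch with made-up sizes, using a single-layer, unidirectional LSTM and no padding, which is the simplest case.

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 5, 8)          # [batch, seq_len, features], all full length

hidden, (hn, cn) = lstm(x)
print(hidden.shape)  # torch.Size([4, 5, 16]): one vector per timestep
print(hn.shape)      # torch.Size([1, 4, 16]): one vector per layer
print(cn.shape)      # torch.Size([1, 4, 16])

# for a single-layer, unidirectional LSTM without padding,
# the last timestep of hidden is exactly hn
assert torch.allclose(hidden[:, -1], hn[0])
```

With padded batches the last column of hidden is padding for the shorter samples, which is one reason to read the sentence summary from hn instead.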

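F.log_softmax and the loss computation

If you read a lot of source code, you will notice that log_softmax (or softmax) keeps showing up next to CrossEntropyLoss. The reason is simple: CrossEntropyLoss is the composition of log_softmax and NLLLoss. softmax normalizes the scores into probabilities (each exponentiated score divided by the sum over all classes), and log_softmax is its logarithm, computed in a numerically stable way. NLLLoss then picks the log-probability at the target index, negates it, and averages over the samples; minimizing it maximizes the probability of the correct class. A model that already returns log_softmax output, as this one does, should therefore be paired with NLLLoss rather than CrossEntropyLoss.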
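The identity cross entropy = NLL applied to log-softmax can be verified by hand for a single sample. This is a plain-Python sketch of the formulas, not the PyTorch implementation; the function names are chosen here for illustration.

```python
import math


def log_softmax(logits):
    """Stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class for one sample."""
    return -log_probs[target]


def cross_entropy(logits, target):
    """Cross entropy written directly from its definition."""
    return -math.log(math.exp(logits[target]) / sum(math.exp(z) for z in logits))


logits, target = [2.0, 0.5, -1.0], 0
composed = nll_loss(log_softmax(logits), target)
direct = cross_entropy(logits, target)
assert math.isclose(composed, direct)
```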

4. Training

# training loop
nll_loss = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001) # use the Adam optimizer

model.train()
for epoch in range(num_epoch):
    total_loss = 0
    for batch in tqdm(train_data_loader, desc=f"Training Epoch {epoch}"):
        inputs, lengths, targets, mask = [x.to(device) for x in batch]
        lengths = lengths.cpu()
        log_probs = model(inputs, lengths)
        loss = nll_loss(log_probs[mask], targets[mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Loss: {total_loss:.2f}")

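Not much to add here. log_probs is a 3-D tensor, for example torch.Size([32, 58, 47]): batch_size=32, seq_length=58, and 47 tags in total. Indexing with the boolean mask keeps only the unpadded positions, so log_probs[mask] has shape [number of real tokens, 47].

Inference just takes the argmax to get the highest-scoring tag index: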

acc = 0
total = 0
for batch in tqdm(test_data_loader, desc=f"Testing"):
    inputs, lengths, targets, mask = [x.to(device) for x in batch]
    with torch.no_grad():
        output = model(inputs, lengths)
        acc += (output.argmax(dim=-1) == targets)[mask].sum().item()
        total += mask.sum().item()

Binary sentence-polarity classification

See lstm_sent_polarity.py.

I made this name up; the task is simply binary classification of the input sentence.

1. Load the data

def load_sentence_polarity():
    nltk.set_proxy('http://192.168.0.28:1080')
    nltk.download('sentence_polarity')
    from nltk.corpus import sentence_polarity

    vocab = Vocab.build(sentence_polarity.sents())

    train_data = [(vocab.convert_tokens_to_ids(sentence), 0)
                  for sentence in sentence_polarity.sents(categories='pos')[:4000]] \
        + [(vocab.convert_tokens_to_ids(sentence), 1)
            for sentence in sentence_polarity.sents(categories='neg')[:4000]]

    test_data = [(vocab.convert_tokens_to_ids(sentence), 0)
                 for sentence in sentence_polarity.sents(categories='pos')[4000:]] \
        + [(vocab.convert_tokens_to_ids(sentence), 1)
            for sentence in sentence_polarity.sents(categories='neg')[4000:]]

    return train_data, test_data, vocab

The data format:

train_data[321]
Out[2]: ([6, 6, 6, 4489, 1337, 15065, 3252, 6], 0)
# the first element holds the token ids of the sentence, the second is the label

There are two labels, 0 and 1, hence binary classification.

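2. Interesting points

The training procedure is much the same as in the previous example, but a few points are worth highlighting.

CrossEntropyLoss or BCELoss for binary classification?

The two are essentially the same: BCELoss is the two-class special case of CrossEntropyLoss, since a softmax over two logits z0, z1 gives the same probability as a sigmoid of z1 - z0. See loss.py (torch/nn/modules/loss.py).

The difference between BCEWithLogitsLoss and BCELoss is where the sigmoid goes: BCELoss expects probabilities, so you apply sigmoid yourself, while BCEWithLogitsLoss takes raw logits and applies the sigmoid internally.

As an exercise, try changing the code to use sigmoid and BCELoss instead of the author's log_softmax and NLLLoss.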
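The equivalence can be checked numerically for one sample. The sketch below uses plain Python rather than the PyTorch losses, with helper names invented for illustration: a two-class softmax and a sigmoid of the logit difference give the same probability, and therefore the same loss.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax2(z0, z1):
    """Softmax over two logits, returned as (P(class 0), P(class 1))."""
    m = max(z0, z1)
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)


def bce(p, y):
    """Binary cross entropy for one sample; p = P(label == 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def ce(probs, y):
    """Cross entropy for one sample given class probabilities."""
    return -math.log(probs[y])


z0, z1 = 0.3, 1.7
p1_softmax = softmax2(z0, z1)[1]
p1_sigmoid = sigmoid(z1 - z0)
assert math.isclose(p1_softmax, p1_sigmoid)
for y in (0, 1):
    assert math.isclose(bce(p1_sigmoid, y), ce(softmax2(z0, z1), y))
```

In practice this means a binary classifier can use either two output units with cross entropy or one output unit with a sigmoid and BCE.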

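The hn output of the LSTM

Since hn is the hidden state at the final timestep, it is reasonable to treat that last-step feature as a representation of the whole sentence.

So we need to look at what hn actually looks like.

With the original code, for example:

hn.shape
Out[2]: torch.Size([1, 32, 256])

The leading 1 is there because num_layers is 1 and the LSTM is not bidirectional.

With a bidirectional LSTM, bidirectional=True:

hn.shape
Out[2]: torch.Size([2, 32, 256])

And if num_layers is also set to 3:

hn.shape
Out[2]: torch.Size([6, 32, 256])

At this point we need to understand what this output means.

It holds 6 final hidden states: 3 layers, each bidirectional. A bidirectional layer runs once over the sequence in forward order and once in reverse order (as in [::-1]), so the first dimension is num_layers * 2 = 6.

The whole model changes as follows: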

class LSTM(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_class):
        super(LSTM, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, bidirectional=True, num_layers=3)
        self.output = nn.Linear(hidden_dim * 6, num_class)

    def forward(self, inputs, lengths):
        embeddings = self.embeddings(inputs)
        x_pack = pack_padded_sequence(embeddings, lengths, batch_first=True, enforce_sorted=False)
        hidden, (hn, cn) = self.lstm(x_pack)
        # [6, batch, hidden] -> [batch, 6 * hidden], without hard-coding the hidden size
        outputs = self.output(hn.permute(1, 0, 2).reshape(hn.size(1), -1))
        log_probs = F.log_softmax(outputs, dim=-1)
        return log_probs
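A common alternative to feeding all 6 states to the classifier is to keep only the top layer's two directions. A minimal sketch with toy sizes; it relies on hn being ordered layer by layer, with the forward and backward state of each layer adjacent, and the variable names are illustrative.

```python
import torch
from torch import nn

torch.manual_seed(0)
num_layers, hidden_dim, batch = 3, 16, 4
lstm = nn.LSTM(8, hidden_dim, batch_first=True,
               bidirectional=True, num_layers=num_layers)
x = torch.randn(batch, 5, 8)

hidden, (hn, cn) = lstm(x)
print(hn.shape)  # torch.Size([6, 4, 16]) = [num_layers * 2, batch, hidden_dim]

# split the first axis into (layer, direction)
hn_by_layer = hn.view(num_layers, 2, batch, hidden_dim)
last_fwd, last_bwd = hn_by_layer[-1, 0], hn_by_layer[-1, 1]

# the forward state of the top layer matches the last timestep of hidden,
# the backward state matches the first timestep (it reads right to left)
assert torch.allclose(last_fwd, hidden[:, -1, :hidden_dim])
assert torch.allclose(last_bwd, hidden[:, 0, hidden_dim:])

sentence_vec = torch.cat([last_fwd, last_bwd], dim=-1)  # [batch, 2 * hidden_dim]
```

With this variant the output layer would be nn.Linear(hidden_dim * 2, num_class) instead of hidden_dim * 6.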


python bdist_wheel

2021-07-14

Compile a package into a binary library with Cython.

1. setup.py

# coding: utf-8
import os
import sysconfig


from setuptools import setup, find_packages

# https://stackoverflow.com/questions/21594925/error-each-element-of-ext-modules-option-must-be-an-extension-instance-or-2-t
# Note: the import order matters; Cython must be imported after setuptools.
from Cython.Build import cythonize

from setuptools.command.build_py import build_py as _build_py


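def read():
    # strip newlines and skip blank lines so install_requires gets clean specifiers
    with open("./requirements.txt", "r") as f:
        return [line.strip() for line in f if line.strip()]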


# Notice1: your modules.
packages = set(find_packages()) | {'your modules.'}

# Notice2: exclude files.
EXCLUDE_FILES = [
    # 'your package/ignore python files.'
    'package/file1.py'

]


def get_ext_paths(root_dir, exclude_files):
    """get filepaths for compilation"""
    paths = []

    for root, dirs, files in os.walk(root_dir):
        for filename in files:
            if os.path.splitext(filename)[1] != '.py':
                continue

            file_path = os.path.join(root, filename)
            if file_path in exclude_files:
                continue

            paths.append(file_path)
    return paths


# noinspection PyPep8Naming
class build_py(_build_py):

    def find_package_modules(self, package, package_dir):
        ext_suffix = sysconfig.get_config_var('EXT_SUFFIX')
        modules = super().find_package_modules(package, package_dir)
        filtered_modules = []
        for (pkg, mod, filepath) in modules:
            if os.path.exists(filepath.replace('.py', ext_suffix)):
                continue
            filtered_modules.append((pkg, mod, filepath,))
        return filtered_modules


setup(
    name='package name', # Notice3
    version='0.3.0',

    ext_modules=cythonize(
        get_ext_paths('project root path', EXCLUDE_FILES), # Notice4
        compiler_directives={'language_level': 3}
    ),
    cmdclass={
        'build_py': build_py
    },
    packages=packages,
    include_package_data=True,
    install_requires=read(),
)

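2. Build

python setup.py build_ext --inplace
python setup.py bdist_wheel

3. Conclusion

A library compiled this way ships binary extension modules instead of .py files, so under normal circumstances the source code cannot be read.
The downside is that IDEs cannot auto-import the compiled modules. So during development, package the project normally so that teammates can debug and use it; for production, build the wheel as in step 2, and developers only need to swap in the path of the compiled package.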


Adjusting the learning rate in PyTorch

2021-07-02

keras

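In Keras, the learning rate can be adjusted dynamically like this:


import tensorflow as tf


def step_decay(epoch):
    if epoch < 3:
        lr = 1e-5
    else:
        lr = 1e-6
    return lr

# pass this callback to model.fit(..., callbacks=[...])
tf.keras.callbacks.LearningRateScheduler(step_decay, verbose=2)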

lr_scheduler

PyTorch provides torch.optim.lr_scheduler.

1. StepLR

# -*- coding: utf8 -*-
#

import torch
from pyecharts import options
from pyecharts.charts import Line
from torch import optim
from torch.nn import Linear
from torch.optim import lr_scheduler


class TestModel(torch.nn.Module):
    def __init__(self):
        super(TestModel, self).__init__()
        self.linear = Linear(100, 2)

    def forward(self, x):
        return self.linear(x)


def line_graph(xs, ys):
    line = Line()
    line.add_xaxis(xs)
    line.add_yaxis(series_name='learning rate', y_axis=ys, is_smooth=True)
    line.set_global_opts(
        title_opts=options.TitleOpts(title='Learning rate schedule'),
        toolbox_opts=options.ToolboxOpts()
    )
    line.set_series_opts(
        label_opts=options.LabelOpts(is_show=False),
        # markline_opts=options.MarkLineOpts(
        #     # mark line for the average
        #     data=[options.MarkLineItem(name='average', type_='average')],
        #     # mark line for the maximum
        #     # data = [options.MarkLineItem(name='max', type_='max')]
        # )
    )

    line.render('line_chart.html')


model = TestModel()

optimizer = optim.Adam(params=model.parameters(), lr=0.05)

# Assuming optimizer uses lr = 0.05 for all groups
# lr = 0.05     if epoch < 30
# lr = 0.005    if 30 <= epoch < 60
# lr = 0.0005   if 60 <= epoch < 90

scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

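x = list(range(100))
y = []
for epoch in range(100):
    # record the learning rate used in this epoch; get_last_lr() is the
    # supported way to read it outside of scheduler.step()
    y.append(scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()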

line_graph(x, y)

2. MultiStepLR


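scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)

This takes a list of milestones: the learning rate is multiplied by gamma at epochs 30 and 80, so it stays constant within [0, 30), within [30, 80), and from 80 onward.

3. ExponentialLR

scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

Exponential decay: the learning rate is multiplied by gamma after every epoch.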
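The three schedulers above follow simple closed-form rules, which can be written out in plain Python to see what value to expect at a given epoch. This is a sketch of the formulas only, not the PyTorch implementation; the function names are chosen here for illustration.

```python
from bisect import bisect_right


def step_lr(base_lr, epoch, step_size, gamma):
    """StepLR: multiply by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def multistep_lr(base_lr, epoch, milestones, gamma):
    """MultiStepLR: multiply by gamma for each milestone already reached."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


def exponential_lr(base_lr, epoch, gamma):
    """ExponentialLR: multiply by gamma after every epoch."""
    return base_lr * gamma ** epoch


for epoch in (0, 29, 30, 79, 80):
    print(epoch,
          step_lr(0.05, epoch, 30, 0.1),
          multistep_lr(0.05, epoch, [30, 80], 0.1),
          exponential_lr(0.05, epoch, 0.9))
```

At epoch 60, StepLR with step_size=30 has already decayed twice, while MultiStepLR with milestones [30, 80] has decayed only once.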

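The transformers library

The transformers library also provides some schedulers, for example:

1. get_linear_schedule_with_warmup

Learning-rate warmup

from transformers import get_linear_schedule_with_warmup

# a typical choice is to warm up over the first 5% of all training steps:
# num_warmup_steps = int(0.05 * len(train_dataloader) * epochs)

optimizer = optim.Adam(params=model.parameters(), lr=1e-3)

scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=10000,
    num_training_steps=100000)

The learning rate first rises steadily, then falls steadily.

During warmup, the learning rate increases linearly from 0 to the initial lr set in the optimizer. After that it decreases linearly back to 0.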
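The warmup-then-decay shape can be written as a multiplier on the initial lr. This is a plain-Python sketch of the rule described above, not the library's code; `linear_warmup_decay` is a name invented for this example.

```python
def linear_warmup_decay(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the optimizer's initial lr at a given step."""
    if step < num_warmup_steps:
        # warmup: rise linearly from 0 to 1
        return step / max(1, num_warmup_steps)
    # decay: fall linearly from 1 to 0 over the remaining steps
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr = 1e-3
for step in (0, 5000, 10000, 55000, 100000):
    print(step, base_lr * linear_warmup_decay(step, 10000, 100000))
```

Because the schedule is defined per optimization step rather than per epoch, the scheduler is normally stepped once after every batch.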

Theory


Batching data

2021-06-28

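Data pipelines commonly use Dataset and DataLoader; I will not cover their usage here. Batches are usually drawn with shuffle=True, which randomizes the order of the samples.
But when samples have variable lengths, or the lengths are unevenly distributed, they have to be padded to a common length within each batch.
What if the lengths differ widely?

For example:

example1: 我有一个小摩托，我从来也不急，骑着我的小摩托，从此把它骑。 (a long sentence)
example2: 我爱南京。 ("I love Nanjing.")

Padding example2 to the length of example1 adds far too much padding. (You can bring in a mask, but then the mask has to be applied in every layer of the model; masking is out of scope here.)

How can the samples inside one batch be made more similar in length?

K-means clustering of the samples by length.

Clustering algorithms such as k-means can help, using each sample's length as the feature.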



import torch


def kmeans(x, k, max_it=32):
    r"""
    KMeans algorithm for clustering the sentences by length.

    Args:
        x (list[int]):
            The list of sentence lengths.
        k (int):
            The number of clusters.
            This is an approximate value. The final number of clusters can be less or equal to `k`.
        max_it (int):
            Maximum number of iterations.
            If centroids does not converge after several iterations, the algorithm will be early stopped.

    Returns:
        list[float], list[list[int]]:
            The first list contains average lengths of sentences in each cluster.
            The second is the list of clusters holding indices of data points.

    Examples:
        >>> x = torch.randint(10,20,(10,)).tolist()
        >>> x
        [15, 10, 17, 11, 18, 13, 17, 19, 18, 14]
        >>> centroids, clusters = kmeans(x, 3)
        >>> centroids
        [10.5, 14.0, 17.799999237060547]
        >>> clusters
        [[1, 3], [0, 5, 9], [2, 4, 6, 7, 8]]
    """

    # the number of clusters must not be greater than the number of datapoints
    x, k = torch.tensor(x, dtype=torch.float), min(len(x), k)
    # collect unique datapoints
    d = x.unique()
    # initialize k centroids randomly
    c = d[torch.randperm(len(d))[:k]]
    # assign each datapoint to the cluster with the closest centroid
    dists, y = torch.abs_(x.unsqueeze(-1) - c).min(-1)

    for _ in range(max_it):
        # if an empty cluster is encountered,
        # choose the farthest datapoint from the biggest cluster and move that the empty one
        mask = torch.arange(k).unsqueeze(-1).eq(y)
        none = torch.where(~mask.any(-1))[0].tolist()
        while len(none) > 0:
            for i in none:
                # the biggest cluster
                b = torch.where(mask[mask.sum(-1).argmax()])[0]
                # the datapoint farthest from the centroid of cluster b
                f = dists[b].argmax()
                # update the assigned cluster of f
                y[b[f]] = i
                # re-calculate the mask
                mask = torch.arange(k).unsqueeze(-1).eq(y)
            none = torch.where(~mask.any(-1))[0].tolist()
        # update the centroids
        c, old = (x * mask).sum(-1) / mask.sum(-1), c
        # re-assign all datapoints to clusters
        dists, y = torch.abs_(x.unsqueeze(-1) - c).min(-1)
        # stop iteration early if the centroids converge
        if c.equal(old):
            break
    # assign all datapoints to the new-generated clusters
    # the empty ones are discarded
    assigned = y.unique().tolist()
    # get the centroids of the assigned clusters
    centroids = c[assigned].tolist()
    # map all values of datapoints to buckets
    clusters = [torch.where(y.eq(i))[0].tolist() for i in assigned]

    return centroids, clusters

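Used this way, batch_size no longer means much: a batch may contain a single sample, or it may hit the configured upper limit.

On second thought, it does not have to be this complicated: simply sorting by length works too. I will skip the details.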
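The sort-by-length idea can be sketched in a few lines. This is one possible reading of it, not code from the original post: sort the indices by length, cut them into fixed-size batches, then shuffle whole batches so training order stays random. The function name and parameters are illustrative.

```python
import random


def length_sorted_batches(lengths, batch_size, shuffle=True, seed=0):
    """Group sample indices into batches of similar length.

    lengths: list of sample lengths; returns a list of index lists.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    if shuffle:
        # shuffle whole batches so the training order is still randomized
        random.Random(seed).shuffle(batches)
    return batches


lengths = [15, 10, 17, 11, 18, 13, 17, 19, 18, 14]
for batch in length_sorted_batches(lengths, batch_size=4, shuffle=False):
    print(batch, [lengths[i] for i in batch])
```

Unlike the k-means version, every batch except possibly the last has exactly batch_size samples.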


Data augmentation

2021-06-28

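Common image augmentations include flipping, rotation, scaling, and other transforms.
For text, you can inject noise, or use a sliding window (the focus here, hehe).

A simple example of a sliding window: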

a = [1,2,3,4,5]

[a[i: i + 1] for i in range(0, len(a), 1)]
Out[36]: [[1], [2], [3], [4], [5]]

[a[i: i + 2] for i in range(0, len(a), 2)]
Out[37]: [[1, 2], [3, 4], [5]]

[a[i: i + 3] for i in range(0, len(a), 3)]
Out[38]: [[1, 2, 3], [4, 5]]

[a[i: i + 4] for i in range(0, len(a), 4)]
Out[39]: [[1, 2, 3, 4], [5]]

[a[i: i + 5] for i in range(0, len(a), 5)]
Out[40]: [[1, 2, 3, 4, 5]]

In some of my training runs it did not seem to help much, but it is worth a try.
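The examples above move the window by its own size, so the pieces never overlap. A variant with a separate stride produces overlapping windows, which yields more training samples from the same text. A minimal sketch; `sliding_windows` and its parameters are names chosen for this example.

```python
def sliding_windows(tokens, window, stride):
    """Split tokens into windows of size `window`, moving `stride` at a time."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    last_start = max(len(tokens) - window, 0)
    return [tokens[i:i + window] for i in range(0, last_start + 1, stride)]


a = [1, 2, 3, 4, 5]
print(sliding_windows(a, window=3, stride=1))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(sliding_windows(a, window=3, stride=2))  # [[1, 2, 3], [3, 4, 5]]
```

Trailing tokens that do not fill a complete window are dropped in this sketch; the chunking version above keeps them as a shorter final piece.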
