ChineseBERT
A different kind of embedding: pinyin and glyph embeddings. Extending the idea: emoji embeddings and homophone embeddings.
github:github.com/ShannonAI/C…
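Especially well suited to recognizing black-market (spam and fraud) text, such as these groups of variants that sound the same and mean the same thing: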
Aa01A01= qq 扣扣 QQ 企鹅 抠抠 QQ 🐧🐧 叩叩 扣抠 q,q 企鹅号 球球
Aa01A02= 微信 ⅴ信 weixin 违心 wx 威信 微杏 魏❤ 薇❤ 威星 魏信 ⅴ╳ 卫⭐ 微号 微信号 w信 威信号 违心 微x 薇心 薇辛
Aa01A03= 进群 +群 加群 加峮 +裙 加裙 来群 +裙 进峮 家裙 +扣扣裙 ➕抠裙 +扣裙 加企鹅群 抠郡
Aa01A04= +q +Q 加q 加扣 加抠 加🐧 +🐧 加我q ➕q 加企鹅号 ➕🐧 ➕q,q
Aa01A05= mai号 卖号 卖hao 麦号 maihao mài号 在线出售号 荬hao 荬号 脉号 mai账号
Aa01A06= 租号 zu号 zuhao 租账号 zu账号 出租此号 租账号
Aa01A07= 佳v 佳个v +v
0 零 〇 〇 ⁰ ₀ 0
1 一 壹 ① ¹ ₁ 1
2 二 贰 ② ² ₂ 2
3 三 叁 ③ ³ ₃ 3
4 四 肆 ④ ⁴ ₄ 4
5 五 伍 ⑤ ⁵ ₅ 5
6 六 陆 ⑥ ⁶ ₆ 6
7 七 柒 ⑦ ⁷ ₇ 7
8 八 捌 ⑧ ⁸ ₈ 8
9 九 玖 ⑨ ⁹ ₉ 9
加 + 佳 + 嘉 家 伽 迦 珈 茄 颊 痂 茄 ➕ jia
群 裙 峮 qun
卖 麦 荬 脉 迈
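The digit table above can be turned into a simple normalization step that runs before any model sees the text. The following is a minimal sketch, not the author's pipeline: the names `DIGIT_VARIANTS` and `normalize_digits` are made up for illustration, and mapping characters such as 一 or 二 to digits is lossy for ordinary text, so in practice it would probably be applied only near contact-related keywords.

```python
# Look-alike characters for each digit, copied from the table above.
DIGIT_VARIANTS = {
    "0": "零〇⁰₀",
    "1": "一壹①¹₁",
    "2": "二贰②²₂",
    "3": "三叁③³₃",
    "4": "四肆④⁴₄",
    "5": "五伍⑤⁵₅",
    "6": "六陆⑥⁶₆",
    "7": "七柒⑦⁷₇",
    "8": "八捌⑧⁸₈",
    "9": "九玖⑨⁹₉",
}

# One translation table: every variant character maps to its plain ASCII digit.
_DIGIT_TABLE = str.maketrans(
    {variant: digit for digit, variants in DIGIT_VARIANTS.items() for variant in variants}
)


def normalize_digits(text):
    """Replace every digit variant in text with the plain ASCII digit."""
    return text.translate(_DIGIT_TABLE)


if __name__ == "__main__":
    print(normalize_digits("+微信①⑨⑨②③⑥①⑤③"))
```

The same table-driven approach could be extended to the keyword variants (加, 群, 卖), with the caveat that characters like 家 or 麦 are common in legitimate text.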
# Desensitized, obfuscated sample data
+个卫⭐151916216
+个吗:扣抠7613136
+他v一3125₇377⁸
+企鹅号群八⑦零八五八
+佳伽1347三⁶伍12
+妹妹抠抠,陆₆98₁753
+微信①⑨⑨②③⑥①⑤③
+我:3²⑨12壹6
+我QQ6453985
+我q26⁰6₃₇0₉
+我q3436⁰065〇
+我q贰³9八887
+我q,q18叁3685
+我q,q6⁹15361给你玩
+我q,q吧,4〇⁹62柒4
+我q,q老婆贰2306陆⑤7²
+我v13〇¹730171
+我zzzz5881
+我一8771壹427
Thoughts on ResNet
Would skip connections that span multiple layers work even better, by fusing shallow and deep features?
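To make the question concrete, here is a toy NumPy sketch of a stack of residual layers with one extra skip connection that carries the shallow features past all the later layers. It is only an illustration under assumptions: the function names, the same-width layers and the choice of adding (rather than concatenating) the shallow features are all mine. DenseNet-style networks are a known design that connects each layer to all earlier ones. Whether a long skip actually helps is an empirical question this sketch does not answer.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def forward(x, weights, long_skip=True):
    """Run same-width residual layers; optionally fuse shallow features at the end."""
    h = x
    shallow = None
    for i, w in enumerate(weights):
        h = h + relu(h @ w)  # ordinary residual connection: skips one layer
        if i == 0:
            shallow = h      # remember the shallow features
    if long_skip and len(weights) > 1:
        h = h + shallow      # extra skip spanning every layer after the first
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ws = [rng.normal(size=(4, 4)) for _ in range(3)]
    x = rng.normal(size=(2, 4))
    print(forward(x, ws).shape)
```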
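BFG: a tool for slimming down git repositories
Scenario:
There are always some interns who are not familiar with git and commit large models or datasets to the repository. The repository then easily grows to several GB or even tens of GB, clone becomes very slow, and productivity suffers.
Options:
Option 1: Many online tutorials use commands to find the large files and then delete them, for example this one: blog.csdn.net/luchengtao1… This is very slow, and in unusual cases it may not solve the problem even after a lot of effort. Not recommended.
Option 2: BFG. Use it without hesitation; it runs silky smooth.
If I have to name a drawback, there is one: it needs a Java environment, which is a little extra setup, but sharpening the axe does not delay the woodcutting.
Things to watch out for when using BFG:
Note 1: BFG does not clean the latest contents of the latest branch (it protects the current commit by default); see the official site for the reason.
Note 2: After cleaning, the git push may fail with the following error: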
remote: GitLab: You are not allowed to force push code to a protected branch on this project.
Removing the protection from the branch first fixes this.
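The notes above do not list the actual commands, so here is a small Python sketch that assembles the usual BFG sequence (strip large blobs, expire reflogs, garbage-collect, push) for a repository cloned with `git clone --mirror`. The helper names, the `bfg.jar` path and the `100M` threshold are assumptions for illustration. The script only prints the commands unless `dry_run=False` is passed, because the final push rewrites history on the remote.

```python
import subprocess


def bfg_cleanup_commands(mirror_dir, max_blob_size="100M", bfg_jar="bfg.jar"):
    """Return (argv, cwd) pairs for a BFG cleanup of a mirror clone."""
    return [
        # Rewrite history, dropping every blob larger than max_blob_size.
        (["java", "-jar", bfg_jar, "--strip-blobs-bigger-than", max_blob_size, mirror_dir], None),
        # Expire reflogs and garbage-collect so the dropped blobs are really deleted.
        (["git", "reflog", "expire", "--expire=now", "--all"], mirror_dir),
        (["git", "gc", "--prune=now", "--aggressive"], mirror_dir),
        # Push the rewritten history; protected branches must be unprotected first.
        (["git", "push"], mirror_dir),
    ]


def run_cleanup(mirror_dir, dry_run=True, **kwargs):
    """Print each command and, unless dry_run is set, execute it."""
    for argv, cwd in bfg_cleanup_commands(mirror_dir, **kwargs):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_cleanup("some-big-repo.git")  # dry run: prints the commands only
```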
What is one-hot
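The original note leaves this heading without a body. As a minimal illustration of the usual meaning: one-hot encoding represents each category as a vector that is all zeros except for a single 1 at the position assigned to that category. The function name and the example labels below are made up for the sketch.

```python
def one_hot(labels):
    """Return (vocabulary, vectors): one 0/1 vector per label, with a single 1."""
    vocab = sorted(set(labels))
    index = {label: i for i, label in enumerate(vocab)}
    vectors = []
    for label in labels:
        vec = [0] * len(vocab)
        vec[index[label]] = 1  # the only non-zero position marks the category
        vectors.append(vec)
    return vocab, vectors


if __name__ == "__main__":
    print(one_hot(["qq", "wx", "qq"]))
```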
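Removing runs of consecutive emoji
# Runs of meaningless repeated symbols.
# Note: emoji.get_emoji_regexp() only exists in older releases of the emoji
# package; it is no longer available in emoji 2.0 and later.
# item, item_str and emoji_count are assumed to come from the surrounding loop,
# which is not shown here.
import re

import emoji

flags = re.search(emoji.get_emoji_regexp().pattern + '{2,}', item_str)
if flags:
    run = flags.group()  # the first run of two or more consecutive emoji
    # re_str = '(?P<value>' + emoji.get_emoji_regexp().pattern.replace('(', '') + '{2,}'
    # Keep only one copy of each repeated emoji:
    # example: ✊✊✊✊✋✋✋✋ ---> ✊✋
    new_emoji = ''.join(sorted(set(run), key=run.index))
    emoji_count += 1
    new_str = item_str.replace(run, new_emoji)
    # print(new_str, '\n')
    str_all_remove = remove_all_special_symbols(new_str)
    if len(str_all_remove) == 0 and not any(
            emoji_item in new_str for emoji_item in ['❌', '⭕', '🖕🏼', '🐔']):
        # Drop items that consist only of emoji or special symbols
        print(item_str)
        print('all negative:', new_str)
        print('remove:', item, '\n')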
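The snippet above depends on `emoji.get_emoji_regexp()`, which is not available in recent releases of the emoji package. As a rough stand-in that needs only the standard library, the sketch below collapses adjacent repeats of any character in Unicode category `So` (Symbol, other), which covers most pictographic emoji. This is an approximation and not the author's method. It collapses only adjacent duplicates, whereas the original de-duplicates across the whole run, and multi-code-point emoji such as skin-tone variants are left untouched. The function names are hypothetical.

```python
import itertools
import unicodedata


def is_symbol(ch):
    """Rough emoji test: Unicode category 'So' (Symbol, other)."""
    return unicodedata.category(ch) == "So"


def collapse_repeated_symbols(text):
    """Collapse each run of the same symbol character into a single copy."""
    out = []
    for ch, group in itertools.groupby(text):
        count = len(list(group))
        # Symbols are kept once; every other character keeps its full run.
        out.append(ch if is_symbol(ch) else ch * count)
    return "".join(out)


if __name__ == "__main__":
    print(collapse_repeated_symbols("✊✊✊✊✋✋✋✋"))
```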
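Removing all special symbols
def remove_all_special_symbols(content):
    """Keep only CJK ideographs, digits 0-9 and lowercase a-z; drop everything else."""
    result = ''
    for char in content:
        i2 = ord(char)
        if (0x4e00 <= i2 <= 0x9fa5) or \
                (0x3400 <= i2 <= 0x4db5) or \
                (0x0030 <= i2 <= 0x0039) or \
                (0x0061 <= i2 <= 0x007a):
            # CJK Unified Ideographs, CJK Extension A, digits 0-9, lowercase a-z
            # (uppercase A-Z is not in the kept ranges)
            result += char
    return result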
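Setting up Go + TensorFlow + GPU
I spent a whole day on this setup, which was frustrating...
The biggest pitfall: following the official docs simply does not work...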
package main

import (
	"fmt"

	tg "github.com/galeone/tfgo"
	tf "github.com/galeone/tensorflow/tensorflow/go"
)

func main() {
	root := tg.NewRoot()
	A := tg.NewTensor(root, tg.Const(root, [2][2]int32{{1, 2}, {-1, -2}}))
	x := tg.NewTensor(root, tg.Const(root, [2][1]int64{{10}, {100}}))
	b := tg.NewTensor(root, tg.Const(root, [2][1]int32{{-10}, {10}}))
	Y := A.MatMul(x.Output).Add(b.Output)
	// Please note that Y is just a pointer to A!
	// If we want to create a different node in the graph, we have to clone Y
	// or equivalently A
	Z := A.Clone()
	results := tg.Exec(root, []tf.Output{Y.Output, Z.Output}, nil, &tf.SessionOptions{})
	fmt.Println("Y: ", results[0].Value(), "Z: ", results[1].Value())
	fmt.Println("Y == A", Y == A) // ==> true
	fmt.Println("Z == A", Z == A) // ==> false
}
root@98d4e01d6441:~/test# go run main.go
2021-06-02 03:31:17.394266: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-02 03:31:17.425760: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-02 03:31:17.425997: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-02 03:31:17.426492: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-02 03:31:18.147130: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-02 03:31:18.147571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.74GHz coreCount: 68 deviceMemorySize: 9.78GiB deviceMemoryBandwidth: 707.88GiB/s
2021-06-02 03:31:18.147607: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-02 03:31:18.147897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties:
pciBusID: 0000:02:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.74GHz coreCount: 68 deviceMemorySize: 9.78GiB deviceMemoryBandwidth: 707.88GiB/s
2021-06-02 03:31:18.147926: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-02 03:31:18.148214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 2 with properties:
pciBusID: 0000:05:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.74GHz coreCount: 68 deviceMemorySize: 9.78GiB deviceMemoryBandwidth: 707.88GiB/s
2021-06-02 03:31:18.148243: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-02 03:31:18.148527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 3 with properties:
pciBusID: 0000:06:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.74GHz coreCount: 68 deviceMemorySize: 9.78GiB deviceMemoryBandwidth: 707.88GiB/s
2021-06-02 03:31:18.148539: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-02 03:31:18.150121: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-02 03:31:18.150139: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-06-02 03:31:18.150620: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-02 03:31:18.150743: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-02 03:31:18.150868: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-06-02 03:31:18.151220: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-06-02 03:31:18.151305: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-02 03:31:18.151313: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-06-02 03:31:18.151334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-02 03:31:18.151340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0 1 2 3
2021-06-02 03:31:18.151346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N N N N
2021-06-02 03:31:18.151349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 1: N N N N
2021-06-02 03:31:18.151353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 2: N N N N
2021-06-02 03:31:18.151357: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 3: N N N N
2021-06-02 03:31:18.151983: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3000000000 Hz
Y: [[200] [-200]] Z: [[200] [-200]]
Y == A true
Z == A false
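One detail in the log above is easy to miss: `libcusolver.so.10` fails to load, and TensorFlow then prints "Skipping registering GPU devices...". The program still produces the right result, but this particular run appears to have fallen back to the CPU. With output this noisy, a small script can pull out the lines that matter. The sketch below is only an illustration; the function names are made up, and the two message strings it searches for are copied from the log.

```python
import re

# Matches the warning TensorFlow prints when a CUDA library cannot be dlopen'ed.
_MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the names of the dynamic libraries that failed to load."""
    return _MISSING_LIB.findall(log_text)


def gpu_was_skipped(log_text):
    """True if TensorFlow reported that it did not register the GPU devices."""
    return "Skipping registering GPU devices" in log_text


if __name__ == "__main__":
    sample = (
        "W tensorflow/stream_executor/platform/default/dso_loader.cc:60] "
        "Could not load dynamic library 'libcusolver.so.10'; dlerror: ...\n"
        "Skipping registering GPU devices...\n"
    )
    print(missing_libraries(sample), gpu_was_skipped(sample))
```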
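A first look at NudeNet, a nudity detection model
Two main features:
- Image classification: distinguishes pornographic images from non-pornographic ones
- Object detection:
  - whether sensitive body parts are exposed
  - can detect buttocks, belly, breasts, genitals and armpits
  - can detect whether underwear is worn
  - can distinguish male from female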
github:
Test code:
import os
import random

import matplotlib.pyplot as plt
import numpy as np
from nudenet import NudeDetector
from PIL import Image, ImageDraw, ImageFont  # needed for Image, ImageDraw and ImageFont below

# In a Jupyter notebook, enable inline plots with:
# %matplotlib inline

# initialize detector (downloads the checkpoint file automatically the first time)
detector = NudeDetector()  # detector = NudeDetector('base') for the "base" version of detector.
data_dir = './test_images/small_data/train_data/sexy/'
image_list = os.listdir(data_dir)
font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSerif-Bold.ttf", 50)

# Detect single image /home/xin/.nudenet
random.shuffle(image_list)
for image_name in image_list[:10]:
    image_path = os.path.join(data_dir, image_name)
    result = detector.detect(image_path)
    im = Image.open(image_path)
    draw = ImageDraw.Draw(im)
    plt.figure(figsize=(8, 6), dpi=80)
    print('--------------------------')
    for item in result:
        # Each detection carries a bounding box, a confidence score and a label
        box = item['box']
        score = item['score']
        label = item['label'].lower()
        draw.rectangle(box, width=5)
        # draw.text(box[:2], 'score:\n' + str(score) + '\nlabel:\n' + label, font=font, fill=(255, 0, 0, 255))
        draw.text(box[:2], label, font=font, fill=(255, 0, 0, 255))
        print(label)
    imgplot = plt.imshow(np.asarray(im))
    plt.show()
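The loop above draws every detection regardless of its score. For a first evaluation it may be more useful to keep only confident detections and report the best score per label. The sketch below assumes the result format used in the test code (a list of dicts with 'box', 'score' and 'label'); the function name, the 0.5 threshold and the sample data are hypothetical, and the labels in the sample are placeholders and not real NudeNet class names.

```python
def summarize_detections(result, min_score=0.5):
    """Return {label: best score} for detections scoring at least min_score."""
    best = {}
    for item in result:
        label = item['label'].lower()
        score = item['score']
        if score < min_score:
            continue  # ignore low-confidence detections
        if label not in best or score > best[label]:
            best[label] = score
    return best


if __name__ == "__main__":
    # Made-up detections in the same shape as detector.detect() output above.
    fake_result = [
        {'box': [0, 0, 10, 10], 'score': 0.90, 'label': 'LABEL_A'},
        {'box': [5, 5, 20, 20], 'score': 0.30, 'label': 'LABEL_B'},
        {'box': [1, 1, 12, 12], 'score': 0.95, 'label': 'label_a'},
    ]
    print(summarize_detections(fake_result))
```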