LogicOverlord

AI infrastructure development engineer


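How operator registration works in PyTorch

Note: this is a beginner's article, and corrections are welcome! The content below is based on PyTorch 2.0.0 and the official PyTorch tutorial https://pytorch.org/tutorials/advanced/extend_dispatcher.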

2 years ago
1.2k

Mixed compilation with nvcc and gcc

Example 1; Example 2; Example 3; References: https://stackoverflow.com/questions/9421108/how-can-i-compile-cuda-code-then-li

2 years ago
78

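Using cudaMallocPitch and cudaMemcpy2D in CUDA programming

When allocating device memory in CUDA programming, the most commonly used functions are cudaMalloc(), cudaMemcpy(), and cudaFree(). This article explains how to use the two functions named in the title, cudaMallocPitch and cudaMemcpy2D.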

2 years ago
827

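Choosing block size and grid size in CUDA programming, and calculating occupancy

How to choose block size and grid size in CUDA programming, and how to compute theoretical CUDA occupancy: consider three factors, then apply the bucket effect (the most constrained factor sets the limit) to reach the result.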

2 years ago
1.9k
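
The teaser above names three factors without listing them. As a rough sketch, assuming they are thread/block slots, registers, and shared memory per SM, the bucket-effect calculation could look like the following. The `SMLimits` defaults and the function name `theoretical_occupancy` are hypothetical, chosen only for illustration; real limits depend on the GPU architecture, and hardware allocation granularity is ignored here.

```python
from dataclasses import dataclass

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


@dataclass(frozen=True)
class SMLimits:
    # Illustrative per-SM limits (assumed values, not taken from the article).
    max_warps: int = 64
    max_blocks: int = 32
    registers: int = 65536
    shared_mem_bytes: int = 49152


def theoretical_occupancy(block_size, regs_per_thread, smem_per_block,
                          sm=SMLimits()):
    """Return resident warps / max warps, limited by the scarcest resource."""
    warps_per_block = -(-block_size // WARP_SIZE)  # ceiling division

    # Factor 1 (assumed): warp slots and block slots on the SM.
    by_threads = min(sm.max_warps // warps_per_block, sm.max_blocks)

    # Factor 2 (assumed): register file capacity.
    regs_per_block = regs_per_thread * WARP_SIZE * warps_per_block
    by_regs = sm.registers // regs_per_block if regs_per_block else sm.max_blocks

    # Factor 3 (assumed): shared memory capacity.
    by_smem = (sm.shared_mem_bytes // smem_per_block
               if smem_per_block else sm.max_blocks)

    # Bucket effect: the shortest stave decides how many blocks fit.
    resident_blocks = min(by_threads, by_regs, by_smem)
    return resident_blocks * warps_per_block / sm.max_warps


if __name__ == "__main__":
    print(theoretical_occupancy(256, 32, 0))
    print(theoretical_occupancy(256, 64, 0))
    print(theoretical_occupancy(128, 32, 16384))
```

With these assumed limits, doubling register use from 32 to 64 per thread halves the number of resident blocks, which is the kind of single-resource bottleneck the bucket effect describes.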

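(Repost) GPU resource occupancy and utilization

Source: https://zhuanlan.zhihu.com/p/353410111 Memory resources on an SM are limited; if each thread uses too much of them, fewer threads can run concurrently on that SM. Likewise, if each thread block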

2 years ago
391

(Repost) How to set grid_size and block_size in a CUDA kernel?

Source: https://my.oschina.net/oneflow/blog/5348639 Written by | 柳俊

2 years ago
184

Using cmd_class in setup.py and PyTorch's build_ext

How setup.py is used in Python, how to build a cmdclass, and PyTorch's extension mechanism

2 years ago
502

Personal achievements

Article likes: 5

Article views: 17,098

Joined

2023-07-06