
Tag: CUDA


Related blog posts

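450,000-yuan bounty for CUDA experts: knowledge is money: outcrop 2018-11-21 21:53; Bounty rewards: 20,000 CHF for a multi-GPU miner and a guide on how to set up and use it; 20,000 CHF for the Stratum miner and a guide on how to use it; 25,000 CHF for improving the efficiency of the currently existing AE mining software. CHF is the Swiss franc; the three challenges total 65,000 CHF, roughly 450,000 RMB. Anyone fluent in CUDA programming should take a shot; sadly, I don't know enough. Bounty details (hosted outside the Great Firewall; anyone able to take up the challenge will have no trouble getting past it): https://blog.aeternity.com/bounties-multi-gpu-miner-guide-stratum-miner-and-mining-optimization-51029d468e79; Category: Computer Application Technology | 2827 views | 0 comments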

[Repost] GPU and CUDA: gll89 2017-8-29 05:16; The GPU (graphics processing unit), a specialized computer processor, addresses the compute-intensive demands of real-time, high-resolution 3D graphics. GPUs have evolved into highly parallel multi-core systems that allow very efficient manipulation of large blocks of data. This design is more effective than general-purpose CPUs (central processing units) for algorithms that process large blocks of data in parallel. CUDA is a parallel computing platform and application programming interface (API) model created by Nvidia. It allows software developers and engineers to use a CUDA-enabled GPU for general-purpose processing. The CUDA platform is designed to work with programming languages such as Python, C, C++ and Fortran. This accessibility makes it easier for specialists in parallel programming to use GPU resources.; Category: DeepLearning | 1337 views | 0 comments

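[Repost] CUDA training (a collection of links to CUDA training from universities, research institutions and individual experts in China and abroad): depengchen 2015-11-24 10:07; CUDA Training. This page lists online courses that help developers start programming in, or teaching, CUDA, together with links to university CUDA courses. The page is divided into three sections: introductory CUDA technical training courses, CUDA university courses, and CUDA seminars and tutorials.

Introductory CUDA technical training courses. Volume 1: Introduction to CUDA Programming: exercises (for Linux and Mac); Visual Studio exercises (for Windows); instructions for the exercises. Volume 2: CUDA Case Studies. CUDAcasts, downloadable CUDA training podcasts: Introduction to GPU Computing; CUDA Programming Model Overview; CUDA Programming Basics, Part 1; CUDA Programming Basics, Part 2. Follow this link to watch more GPU computing webinars.

CUDA university courses.

University of Illinois: ECE 498AL, taught by Professor Wen-mei W. Hwu and NVIDIA CUDA scientist David Kirk. Introduction to GPU Computing (60.2 MB); CUDA Programming Model (75.3 MB); CUDA API (32.4 MB); Simple Matrix Multiplication in CUDA (46.0 MB); CUDA Memory Model (109 MB); Shared Memory Matrix Multiplication (81.4 MB); Additional CUDA API Features (22.4 MB); Useful Information on CUDA Tools (15.7 MB); Threading Hardware (140 MB); Memory Hardware (85.8 MB); Memory Bank Conflicts (115 MB); Parallel Thread Execution (32.6 MB); Control Flow (96.6 MB); Precision (137 MB). These lectures are downloadable "CUDAcasts" whose video has been pre-sized to be compatible with several major players. All PowerPoint lecture slides can be found in the course syllabus: ECE 498AL.

Stanford University: an NVIDIA short course taught by the NVIDIA developer technology team: High Performance Computing with CUDA. (The links below go to Flash videos; for Silverlight, visit the course home page.) Introduction to High Performance Computing with CUDA; Introduction to High Performance Computing with CUDA (CUDA C Basics 2); Basic Optimization 1: Global Memory; Basic Optimization 2: Shared Memory; Finite Difference Stencils on Regular Grids; Determining the Limiting Factors of Kernel Performance.

Stanford University: CS193G, taught by Jared Hoberock and David Tarjan. Introduction to Massively Parallel Computing; GPU History and CUDA Programming Basics; CUDA Threads and Atomics; CUDA Memory; Performance Considerations; Parallel Patterns I; Parallel Patterns II; Introduction to Thrust; Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors; PDE Solvers; The Fermi Architecture; Ray Tracing Case Study; Future of Throughput; Path Planning Case Study; Optimizing GPU Performance; final lecture to be determined. PowerPoint versions of these presentations can be found at this link. CS193G homework; CS193G tutorials.

University of California, Davis: EE171, Parallel Computer Architecture, taught by Associate Professor John Owens. Course materials.

University of Wisconsin-Madison: ME964, High Performance Computing for Engineering Applications, taught by Assistant Professor Dan Negrut. Course materials.

University of North Carolina at Charlotte (UNCC): SIGCSE 2011 workshop: General-Purpose Computing Using GPUs: Developing a Hands-on Undergraduate Course on CUDA Programming, given by Barry Wilkinson and Yaohang Li. Workshop materials. ITCS 6010/8010 Topics in Computer Science: GPU Programming for High Performance Computing (CUDA programming), taught by Barry Wilkinson. Course materials.

Universities that teach CUDA: a list of the many courses you can apply for.

CUDA seminars and tutorials. GPU Technology Conference: search the recordings; Supercomputing 2010: NVIDIA GPU Computing Theater; Supercomputing 2009: NVIDIA GPU Computing Theater; Supercomputing 2009 tutorial: High Performance Computing with CUDA; Supercomputing 2008 tutorial: High Performance Computing with CUDA; Supercomputing 2007 tutorial: High Performance Computing with CUDA; NVISION 2008 tutorials: Introduction to CUDA (covers the CUDA programming model, CUDA programming basics, and the BLAS and FFT libraries) and Advanced CUDA Training (covers the 10-series architecture and optimization techniques, using particle simulation and finite difference case studies); all presentations from NVISION 2008; ISC 2008 case study: Computational Fluid Dynamics (CFD).

CUDA consulting and training services. Acceleware Professional Services; Stone Ridge Technology; Wipro Global Consulting Services.

Dr. Dobb's article series "CUDA, Supercomputing for the Masses". Part 1: CUDA lets you work with familiar programming concepts; Part 2: A first kernel; Part 3: Error handling and global memory performance limitations; Part 4: Understanding and using shared memory (1); Part 5: Understanding and using shared memory (2); Part 6: Global memory and the CUDA profiler; Part 7: Double the fun with next-generation CUDA hardware; Part 8: Using libraries with CUDA; Part 9: Extending high-level languages with CUDA; Part 10: CUDPP, a powerful data-parallel CUDA library; Part 11: Revisiting CUDA memory spaces; Part 12: CUDA 2.2 changes the data movement paradigm; Part 13: Using texture memory in CUDA; Part 14: Debugging CUDA and using CUDA-GDB; Part 15: Using pixel buffer objects with CUDA and OpenGL; Part 16: CUDA 3.0 provides expanded capabilities; Part 17: CUDA 3.0 provides expanded capabilities and makes development easier; Part 18: Using vertex buffer objects with CUDA and OpenGL; Part 19: Parallel Nsight Part 1: configuring and debugging applications; Part 20: Parallel Nsight Part 2: using the Parallel Nsight analysis capabilities; Part 21: The Fermi architecture and CUDA; Category: Heterogeneous Parallel Computing | 2100 views | 0 comments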

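Parallel acceleration of MrBayes on GPU stream processors: hypermarket 2015-11-20 10:01; As the volume of data used for molecular phylogenetic inference grows, and especially with the arrival of the big-data era of genomes and transcriptomes, more and more researchers face ever longer computation times, so several acceleration schemes based on parallel computing have been proposed. The most typical are multi-CPU solutions, which appeared earliest, are the most mature, and are supported by the mainstream phylogenetic reconstruction software. Their limitations are also fairly clear, for example in flexibility of use, energy consumption, and how well they accelerate posterior-probability algorithms. With this in mind, we asked: could MrBayes be accelerated on a single machine, using the GPU of an Nvidia graphics card under the CUDA computing environment? Thanks to the efforts of the research group of Professors Liu Xiaoguang and Wang Gang at the College of Computer and Control Engineering, Nankai University, this idea has been realized fairly well. The 2011 implementation accelerated nucleotide data by roughly 15-20x; the larger the matrix, the higher the speedup, reaching more than 40x at best. This means that a computation that used to take 2-3 weeks can be shortened to within a day. In 2013 the speedup for nucleotide data was raised to 63-170x, which means that a computation that used to take 9 weeks can be shortened to within a day. In 2015, acceleration for protein data was implemented. So far, no attempt has been made to accelerate other data types in MrBayes, such as morphological and secondary-structure data; because the volume of such data has little potential to grow, adapting the acceleration to them would bring little benefit. In addition, acceleration for mixed nucleotide-protein data has not yet been achieved and needs further work.

References:
Zhou J-F, *Liu X-G, Stones DS, Xie Q, Wang G. 2011. MrBayes on a Graphics Processing Unit. Bioinformatics 27: 1255-1261. http://bioinformatics.oxfordjournals.org/content/27/9/1255.full.pdf+html
Bao J, Xia H, Zhou J-F, *Liu X, *Wang G. 2013. Efficient Implementation of MrBayes on Multi-GPU. Mol. Biol. Evol. 30: 1471-1479. http://mbe.oxfordjournals.org/content/30/6/1471.full.pdf+html
Pang S, Stones RJ, *Ren MM, Liu XG, Wang G, Xia HJ, Wu HY, Liu Y, Xie Q. 2015. GPU MrBayes V3.1: MrBayes on Graphics Processing Units for Protein Sequence Data. Mol. Biol. Evol. doi: 10.1093/molbev/msv129, http://mbe.oxfordjournals.org/content/early/2015/05/25/molbev.msv129.abstract?papetoc; 4782 views | 0 comments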
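The 2013 figures in the MrBayes entry above can be checked with one line of arithmetic. This is only a rough sketch: it assumes the quoted speedup S is the plain ratio of CPU wall time to GPU wall time, which the post does not state explicitly.

```latex
T_{\mathrm{GPU}} = \frac{T_{\mathrm{CPU}}}{S}, \qquad
T_{\mathrm{CPU}} = 9\ \text{weeks} = 63\ \text{days}
\;\Rightarrow\;
\frac{63\ \text{days}}{63} = 1\ \text{day}, \qquad
\frac{63\ \text{days}}{170} \approx 0.37\ \text{days} \approx 8.9\ \text{hours}.
```

So the lower end of the 63-170x range is what brings a 9-week run down to a single day; the upper end would finish in well under half a day.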

CudaPre3D: An Alternative Preprocessing Algorithm: meigang 2015-6-19 23:06; CudaPre3D: An Alternative Preprocessing Algorithm for Accelerating 3D Convex Hull Computation on the GPU. Abstract: When computing convex hulls of point sets, a preprocessing step that filters the input by discarding non-extreme points is commonly used to improve computational efficiency. We previously proposed a quite straightforward preprocessing approach for accelerating 2D convex hull computation on the GPU. In this paper, we extend that algorithm to 3D. The basic ideas behind the two preprocessing algorithms are similar: first, several groups of extreme points are found from the original input set and from several rotated versions of it; then, a convex polyhedron / hull is created from the extreme points found; and finally, the interior points that lie inside this convex polyhedron / hull are discarded. Experimental results show that the proposed preprocessing algorithm achieves speedups of about 4x on average and 5x ~ 6x in the best cases over runs that do not use it. In addition, more than 95% of the input points can be discarded in most experimental tests. https://www.researchgate.net/publication/275887547_CudaPre3D_An_Alternative_Preprocessing_Algorithm_for_Accelerating_3D_Convex_Hull_Computation_on_the_GPU; 2517 views | 0 comments
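The "find extreme points, build a convex shape, discard what lies inside" idea from the abstract above can be sketched on the CPU in 2D. This is a deliberately simplified illustration, not the authors' CudaPre3D code: it uses only the four axis-aligned extreme points (no rotated copies of the input), runs serially in plain C++, and the names `Point`, `cross` and `prefilter` are made up for this sketch.

```cpp
#include <cassert>
#include <vector>

struct Point { double x, y; };

// Cross product of (b - a) and (c - a); > 0 means c lies to the left of a->b.
static double cross(const Point& a, const Point& b, const Point& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Keep only the points that are NOT strictly inside the quadrilateral spanned
// by the min-x, min-y, max-x and max-y points. Points strictly inside it can
// never be convex hull vertices, so they are safe to discard.
std::vector<Point> prefilter(const std::vector<Point>& pts) {
    assert(!pts.empty());
    Point left = pts[0], bottom = pts[0], right = pts[0], top = pts[0];
    for (const Point& p : pts) {
        if (p.x < left.x)   left = p;
        if (p.y < bottom.y) bottom = p;
        if (p.x > right.x)  right = p;
        if (p.y > top.y)    top = p;
    }
    // left -> bottom -> right -> top is counter-clockwise order.
    const Point quad[4] = {left, bottom, right, top};
    std::vector<Point> kept;
    for (const Point& p : pts) {
        bool inside = true;
        for (int i = 0; i < 4; ++i) {
            // Strictly left of every edge <=> strictly inside the quad.
            // Degenerate (zero-length) edges give 0, so the point is kept.
            if (cross(quad[i], quad[(i + 1) % 4], p) <= 0.0) {
                inside = false;
                break;
            }
        }
        if (!inside) kept.push_back(p);
    }
    return kept;
}
```

Calling `prefilter` on a point cloud returns the reduced set that a regular convex hull routine would then process. The test is conservative: points on the boundary of the quadrilateral are always kept.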

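[Repost] [GPU development notes] A first look at CUDA: querying devices: Popularity 1 lemoncyb 2015-1-20 09:24; CUDA programming is mostly about dealing with the GPU, and before talking to such an unfamiliar character we first need to get to know it. Before digging into how to write device code, we need some mechanism to determine which devices are present in the computer and which features each device supports. Fortunately, a very simple interface provides this information. First, we want to know how many devices in the system support the CUDA architecture and can run kernels written in CUDA C. To get the number of CUDA devices, call cudaGetDeviceCount(); what the function does is clear from its name. After calling cudaGetDeviceCount(), we can iterate over the devices and query each one. The CUDA runtime returns a structure of type cudaDeviceProp that contains the device's properties. Which properties can we get? As of CUDA 3.0, the cudaDeviceProp structure contains the following:

struct cudaDeviceProp {
    char name[256];               // ASCII string identifying the device
    size_t totalGlobalMem;        // total global memory on the device, in bytes
    size_t sharedMemPerBlock;     // shared memory available to one thread block, in bytes
    int regsPerBlock;             // number of 32-bit registers available per thread block
    int warpSize;                 // number of threads in a warp
    size_t memPitch;              // maximum pitch allowed for memory copies, in bytes
    int maxThreadsPerBlock;       // maximum number of threads in a block
    int maxThreadsDim[3];         // maximum number of threads along each dimension of a block
    int maxGridSize[3];           // maximum number of blocks along each dimension of a grid
    size_t totalConstMem;         // total constant memory
    int major;                    // major revision of the device's compute capability
    int minor;                    // minor revision of the device's compute capability
    int clockRate;                // clock frequency, in kilohertz
    size_t textureAlignment;      // the device's texture alignment requirement
    int deviceOverlap;            // boolean: can the device run a cudaMemcpy() and a kernel at the same time
    int multiProcessorCount;      // number of multiprocessors on the device
    int kernelExecTimeoutEnabled; // boolean: is there a run-time limit for kernels on this device
    int integrated;               // boolean: is the device an integrated GPU
    int canMapHostMemory;         // boolean: can the device map host memory into the CUDA device address space
    int computeMode;              // the device's compute mode: default, exclusive or prohibited
    int maxTexture1D;             // maximum size of 1D textures
    int maxTexture2D[2];          // maximum dimensions of 2D textures
    int maxTexture3D[3];          // maximum dimensions of 3D textures
    int maxTexture2DArray[3];     // maximum dimensions of 2D texture arrays
    int concurrentKernels;        // boolean: can the device run several kernels concurrently in the same context
};

Using the device properties. The structure above gives a rough picture of a device's properties, and we can now query them through it. Some may ask what these properties are needed for; no rush, the benefit becomes clear later when writing performance-tuning code. For now we only need to know the method. Two functions are involved: first cudaGetDeviceCount(), as above, to find out how many devices there are, then cudaGetDeviceProperties() in a loop to fetch each device's properties, after which the values can be read from the structure variable.

#include <cuda_runtime.h>
#include <iostream>
using namespace std;

int main()
{
    cudaDeviceProp prop;
    int count;
    cudaGetDeviceCount(&count);            // number of CUDA-capable devices
    for (int i = 0; i < count; i++)
    {
        cudaGetDeviceProperties(&prop, i); // fill prop for device i
        cout << "the information for the device : " << i << endl;
        cout << "name: " << prop.name << endl;
        cout << "the memory information for the device : " << i << endl;
        cout << "total global memory: " << prop.totalGlobalMem << endl;
        cout << "total constant memory: " << prop.totalConstMem << endl;
        cout << "threads in warps: " << prop.warpSize << endl;
        cout << "max threads per block: " << prop.maxThreadsPerBlock << endl;
        cout << "max threads dims: " << prop.maxThreadsDim[0] << " " << prop.maxThreadsDim[1] << " " << prop.maxThreadsDim[2] << endl;
        cout << "max grid dims: " << prop.maxGridSize[0] << " " << prop.maxGridSize[1] << " " << prop.maxGridSize[2] << endl;
    }
    return 0;
}

I only fetch some of the properties here, by way of introduction; any other property can be read in the same way. Original article: http://blog.csdn.net/timmawang/article/details/10362701; Category: cuda | 3539 views | 2 comments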
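One practical use of the queried properties is checking a launch configuration before launching a kernel. The sketch below is a host-only C++ illustration under stated assumptions: `DeviceLimits` and `Dim3` are hypothetical stand-ins that mirror three `cudaDeviceProp` fields and CUDA's `dim3`, so the code compiles without the CUDA toolkit, and `launch_fits` is an invented helper, not a CUDA API call.

```cpp
#include <cassert>

// Hypothetical mirror of the few cudaDeviceProp fields used below, so the
// sketch compiles without the CUDA toolkit.
struct DeviceLimits {
    int maxThreadsPerBlock;
    int maxThreadsDim[3];
    int maxGridSize[3];
};

// Stand-in for CUDA's dim3.
struct Dim3 { int x, y, z; };

// Returns true if a launch of `grid` blocks, each of `block` threads, stays
// within the device limits.
bool launch_fits(const DeviceLimits& lim, const Dim3& grid, const Dim3& block) {
    assert(grid.x > 0 && grid.y > 0 && grid.z > 0);
    assert(block.x > 0 && block.y > 0 && block.z > 0);
    const int b[3] = {block.x, block.y, block.z};
    const int g[3] = {grid.x, grid.y, grid.z};
    for (int d = 0; d < 3; ++d) {
        if (b[d] > lim.maxThreadsDim[d]) return false;  // block too large in dimension d
        if (g[d] > lim.maxGridSize[d]) return false;    // grid too large in dimension d
    }
    // Per-dimension limits are not sufficient: the total thread count of a
    // block is capped separately by maxThreadsPerBlock.
    const long long threads = 1LL * block.x * block.y * block.z;
    return threads <= lim.maxThreadsPerBlock;
}
```

In a real program the three fields would be copied from the `prop` structure filled by `cudaGetDeviceProperties()`. With illustrative limits of 1024 threads per block and block dimensions of at most 1024 x 1024 x 64, a 16 x 16 block passes, while a 32 x 32 x 2 block fails on the total thread count even though each dimension is within range.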

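Installing cuda-toolkit on RHEL 6.5 without starting the graphical interface: lemoncyb 2014-12-31 09:42; When installing cuda-toolkit on RHEL 6.5, the installer complains that the graphical server (X Server) must not be running. The fix is to edit /etc/inittab and change the last line from "id:5:initdefault" to "id:3:initdefault"; after a reboot the system comes up in text mode only, and cuda-toolkit can be installed directly. Once the installation is finished, change the setting in /etc/inittab back to 5, and the next boot starts the graphical interface again.

PS: /etc/inittab is read by the init process after the kernel has booted, and controls the system runlevel:

# The runlevels used are:
# 0 - halt (Do NOT set initdefault to this)
# 1 - Single user mode
# 2 - Multiuser, without NFS (The same as 3, if you do not have networking)
# 3 - Full multiuser mode
# 4 - unused
# 5 - X11
# 6 - reboot (Do NOT set initdefault to this)

0: shut down. 1: single-user mode; in this mode login needs no password, the network driver is not loaded by default, and some services are unavailable. 2: multi-user mode without the NFS service. 3: command-line mode. 4: reserved, unused. 5: graphical mode. 6: reboot.; Category: linux | 4157 views | 0 comments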

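Ubuntu + CUDA 6.5 + Caffe installation and configuration summary: Popularity 3 zhuwei3014 2014-11-7 17:23; Thanks to Ou Xinyu for sharing; most of this configuration guide follows his blog: http://ouxinyu.github.io/Blogs/20140723001.html

This post had a rough ride. Deep learning really is a deep sea: getting Caffe configured just to play with it took me more than half a month, and just when I was about to give up, things turned around and it worked. Only I know how bitter that was. At first, installing Ubuntu cost me about three days because of bootloader problems, after which I could install and boot Ubuntu in all sorts of ways within half an hour. Then I started installing CUDA on a laptop. Apart from assorted other problems, being unable to enter the GUI cost me nearly a week, during which I reinstalled the system no fewer than 15 times, before I found that the cause seemed to be the Optimus dual-GPU setup. So be it; I uninstalled everything rather than keep fighting the laptop. Next I installed CUDA on a single-GPU desktop, which went very smoothly, and then started installing the dependencies and configuring the environment, googling furiously at every problem (Baidu really doesn't cut it). That went on for a week, until I hit a problem that even Google could not find, and after thinking it over for a while I considered giving up on something this fancy. Unwilling to delete the painstakingly installed system, I backed it up with Clonezilla and decided to try Ubuntu 12.04. I ran into plenty of problems there too, but all were solved in roundabout ways, and it was finally done in a single day. Blood, sweat and tears! I suppose I was one of the unlucky ones: I followed other people's tutorials step by step, yet every step went wrong, and as a Linux novice all I could do was google, so I wasted far too much time. Looking back, nothing was really that hard; it should have taken a day, with a few version conflicts at most, and the time feels poorly spent. Anyway, it worked.

A brief introduction: Caffe is a Convolutional Neural Network toolkit, similar in function to Alex's cuda-convnet, though each has its own characteristics. Both do their low-level work in C++ and CUDA, with Python on top. The original was deployed on Ubuntu 12. A Windows port has also been released, but there is little related material for it, so it does not suit beginners; the Ubuntu version is the better fit for them. I am a Linux novice, and installing Ubuntu and CUDA took me more than a week. At first the Ubuntu installation broke the bootloader, so I had to boot into the system manually every time; then, after the CUDA installation failed, I made a bootable USB stick with UltraISO and reinstalled, which restored things to normal. For the installation process see: http://blog.sciencenet.cn/home.php?mod=space&uid=1583812&do=blog&id=839793

I. Installing and verifying the CUDA Toolkit

You can follow the CUDA installation manual provided by nVidia (English only); I used it for the configuration and verification steps below. https://developer.nvidia.com/rdp/cuda-65-rc-toolkit-download#linux. You will usually have to enter your user name and password, that is, the account used to download 6.5.

1. Verify You Have a CUDA-Capable GPU
Run the command below to check that the hardware supports CUDA; as long as the model is listed at https://developer.nvidia.com/cuda-gpus, you are fine.
$ lspci | grep -i nvidia

2. Verify You Have a Supported Version of Linux
$ uname -m && cat /etc/*release
The important item is "x86_64", which confirms an x86 architecture and a 64-bit system.

3. Verify the System Has gcc Installed
$ gcc --version

4. Download the NVIDIA CUDA Toolkit
Download: https://developer.nvidia.com/cuda-toolkit
Create a cuda_install folder in the root directory and put the run file in it:
$ mkdir cuda_install
Checksums: https://developer.nvidia.com/rdp/cuda-rc-checksums
$ md5sum filename
For example, run md5sum cuda_6.5.14_linux_64.run and compare the result with the official site.

5. Install the required libraries and header files
$ sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
If there are dependency conflicts, install the packages separately.

6. Handle Conflicting Installation Methods
According to the official site, any previously installed version may conflict, so a previously installed Toolkit and Drivers have to be uninstalled, blacklisted and so on. (Mine was a fresh system with no nvidia driver installed, so I could skip this step.)
$ sudo apt-get --purge remove nvidia*

7. Graphical Interface Shutdown
Leave the GUI, that is, the X Window session: press CTRL+ALT+F1 (or F2-F6) to switch to one of the TTY1-6 consoles. Then stop the desktop service:
$ sudo stop lightdm

8. Interaction with Nouveau
This is the problem that held me up for nearly a week. I was originally using a dual-GPU laptop and installed no fewer than twenty times; whatever method I followed, after CUDA was installed the graphical interface showed only the wallpaper, only the mouse moved, and I could neither reach the desktop nor open a terminal. In the end I switched to a desktop machine and everything was done in half a day. I first thought nouveau was simply too stubborn to remove, then it dawned on me that the problem might be the Optimus graphics: by default 3D rendering is done by the discrete nvidia GPU and 2D rendering by the integrated intel GPU, but my machine is an ASUS whose BIOS cannot disable the integrated GPU (apparently a ThinkPad can), so I did not try further; the desktop environment was working anyway. If you hit this problem, see: http://wenku.baidu.com/link?url=hjEIoYx-spMxyrU-zy057bOBb4dtYUc7s6bj8CM-TTJ4-QPQTmc9KX3DQ0ZZCfhJpkar0To8y54Cc2gR8LwTOLRCQ8TS4iUUPXavaw7o2Eu Installing the bumblebee GPU switching program may solve it; see also this post: http://www.cnblogs.com/bsker/archive/2011/10/03/2198423.html There is also a post I did not find earlier, which cost me all that time, that solves the problem with prime: http://www.cnblogs.com/zhcncn/p/3989572.html Baidu Jingyan has one too: http://jingyan.baidu.com/article/046a7b3efe8c58f9c27fa98b.html

Nouveau is an open-source graphics driver that Ubuntu 14.04 installs by default, but it conflicts with the nvidia driver during installation, so it has to be disabled. What follows is Ou Xinyu's procedure. It did not work for me, but you can try it: in step (3) my boot folder had no initramfs, only initrd, and regenerating initrd did not seem to help. Such is the lot of a Linux novice who has no idea why things go wrong. If an expert can offer a pointer, I would be very grateful!

(1) Blacklist nouveau to keep it from loading
$ cd /etc/modprobe.d
$ sudo vi nvidia-graphics-drivers.conf
Write: blacklist nouveau
Save and quit: :wq!
Check:
$ cat nvidia-graphics-drivers.conf

(2) In /etc/default/grub, append to the end
$ sudo vi /etc/default/grub
Write at the end: rdblacklist=nouveau nouveau.modeset=0
Save and quit: :wq!
Check:
$ cat /etc/default/grub

(3) The procedure from the official site (this small step can probably be skipped; running it gives an error anyway)
$ sudo mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
Then regenerate the initrd file:
$ sudo dracut /boot/initramfs-$(uname -r).img $(uname -r)
$ sudo update-initramfs -u
The dracut line is the command given by nVidia. I don't know why, but on my system it reports that dracut does not exist, perhaps a version issue or a missing package. It doesn't matter, though: the second command also does the job and should be equivalent.

When I installed under Ubuntu 12.04, editing /etc/modprobe.d/blacklist.conf was enough to solve the problem, but in Ubuntu 14.04 this file is read-only, so I added write permission and forced the change:
$ sudo chmod +w /etc/modprobe.d/blacklist.conf
$ sudo vi /etc/modprobe.d/blacklist.conf
Add the following:
blacklist nouveau
options nouveau modeset=0
Because I tried so many methods, I honestly no longer remember which one finally disabled nouveau; feel free to compare notes. Checking whether nouveau has been disabled is simple: (1) after a reboot the display quality is visibly worse; (2) if lsmod | grep nouveau prints nothing, nouveau has been disabled.

9. Installation
By default you can skip installing the graphics driver separately and install CUDA directly, because it bundles three parts: Drivers, Toolkit and Samples. If problems occur, try installing CUDA a second time or use the official graphics driver. The GTX drivers can be downloaded here (for Tesla drivers, please go to the nVidia site yourself): http://www.geforce.cn/drivers
$ sudo sh ./NVIDIA-Linux-x86_64-340.24.run (optional)
Change to the directory that contains cuda_6.5.14_linux_64.run and run the installer:
$ cd cuda_install
$ sudo sh cuda_6.5.11_rc_linux_64.run
A reminder: always run md5sum before installing. If the checksum differs, the installation Summary will show Driver as successful but Toolkit and Samples as failed, and the run file has to be downloaded again. The installer asks a series of questions; the answers are basically Accept, yes, Enter, yes, Enter, yes, Enter: accept the license and confirm the default installation locations.

10. The driver is installed; return to the GUI
$ sudo start lightdm
(I hit another problem here: after rebooting I could not enter the GUI, probably because of the driver version. This machine has an Nvidia Quadro K600 card, so I downloaded the dedicated driver from the official site, reinstalled following the steps above, and answered no when the first step of the CUDA installer asked whether to install the graphics driver.) To check whether the graphics driver was installed successfully:
$ sudo apt-get install mesa-utils
$ glxinfo | grep -i nvidia

11. POST-INSTALLATION ACTIONS
This step verifies that the installation is correct by compiling and running the programs that ship with CUDA. It is worth doing.

(1) Environment Setup
$ export PATH=/usr/local/cuda-6.5/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH
With the environment variables set, use nvcc -V to check whether CUDA is installed correctly. I started out as a normal user and kept being told that the cuda toolkit was not installed; after switching to root it reported success.

(2) (Optional) Install Writable Samples
$ cuda-install-samples-6.5.sh
Install into Home. Afterwards you can tidy up in the GUI; what matters is that a samples folder, NVIDIA_CUDA-6.5_Samples, sits at the top level of Home, which makes the later compiling and testing more convenient. If the CUDA driver and Toolkit installed normally, this step can basically be skipped because the folder should have been created automatically, but it does no harm to check.

(3) Verify the Installation
a. Check the driver version; the point is mainly to make sure the driver is installed properly
$ cat /proc/driver/nvidia/version
b. Compiling the Examples
$ nvcc -V
You will most likely be told that nvcc is not installed, which just means that the compiler from nvidia-cuda-toolkit is not fully installed; follow the prompt:
$ sudo apt-get install nvidia-cuda-toolkit
After that you can compile. Change to ~/NVIDIA_CUDA-6.5_Samples:
$ cd /home/username/NVIDIA_CUDA-6.5_Samples
$ make
c. Running the Binaries
Run the compiled binaries, for example to see basic device information and bandwidth information:
$ cd bin/x86_64/linux/release
$ ./deviceQuery
$ ./bandwidthTest
PS: if the test reports that the runtime version and the driver version do not match, the cause may be that the nvidia-cuda-toolkit package installed later updated the configuration, so that it no longer matches the original CUDA Samples configuration or the driver, and the check fails. Consider the following fix:
(1) Uninstall the current driver
$ sudo nvidia-installer --uninstall
(2) Download and install a suitable driver version from http://www.geforce.cn/drivers
$ sudo sh ./NVIDIA-Linux-x86_64-340.24.run
(3) Reinstall the CUDA Toolkit
$ sudo sh cuda_6.5.11_rc_linux_64.run
This completes the Nvidia CUDA installation.

II. Installing and testing Caffe

For Caffe, follow the official requirements strictly: http://caffe.berkeleyvision.org/installation.html

1. Install BLAS
You can choose ATLAS, MKL or OpenBLAS; I use MKL here. First download and install the Intel® Math Kernel Library (MKL) for Linux* from https://software.intel.com/en-us/intel-education-offerings. The Student edition is available: apply first, and you will immediately receive an e-mail containing the installation serial number; open it and follow the download instructions. After downloading, unpack the file into the home folder or onto another ext4 file system. Then set the permissions and install:
$ tar zxvf parallel_studio_xe_2015.tgz (if you copied the archive over directly)
$ chmod -R a+x /home/username/parallel_studio_xe_2015
$ cd parallel_studio_xe_2015
$ sudo ./install_GUI.sh
This starts a graphical installer much like the ones on Windows; the serial number is the one from the e-mail. The installation here is done with root privileges.
$ sudo passwd root

2. Environment settings for MKL and CUDA
Change to /etc/ld.so.conf.d and do the following.
(1) Create and edit intel_mkl.conf:
$ cd /etc/ld.so.conf.d
$ sudo vi intel_mkl.conf
Add:
/opt/intel/lib/intel64
/opt/intel/mkl/lib/intel64
(2) Create and edit cuda.conf:
$ sudo vi cuda.conf
Add:
/usr/local/cuda/lib64
/lib
(3) Update the library links:
$ sudo ldconfig -v
(With this approach my final build failed because cblas could not be found. The MKL installation was probably at fault, but I could not fix it, so in the end I installed ATLAS as the official site suggests: sudo apt-get install libatlas-base-dev. One line and it's done; performance may not match MKL, but it is good enough.)

3. Install OpenCV
(1) His method gave me errors, so I installed the dependencies in the normal way:
$ sudo apt-get install build-essential libgtk2.0-dev libavcodec-dev libavformat-dev libjpeg62-dev libtiff4-dev cmake libswscale-dev libjasper-dev
libtiff4-dev raised a dependency error here; installing the packages separately solved it.
(2) The official site says python is needed as well, so:
$ sudo apt-get install python-pip
$ sudo apt-get install python-dev
$ sudo apt-get install python-numpy
(3) Download the opencv archive from the official site (I downloaded opencv-3.0.0-alpha.zip), move it to the home directory and unpack it:
$ unzip opencv-3.0.0-alpha
Then run:
$ cd opencv-3.0.0-alpha
$ mkdir release
$ cd release
$ cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local ..
(You may get an error that CMakeLists.txt cannot be found; replace ".." with the directory that contains it, opencv-3.0.0-alpha.)
$ make
$ sudo make install
This takes quite a while; be patient. Next configure the library path: open /etc/ld.so.conf.d/opencv.conf and add /usr/local/lib:
$ sudo su
$ vi /etc/ld.so.conf.d/opencv.conf
$ ldconfig -v
Then change the environment variable:
$ sudo gedit /etc/bash.bashrc
Add:
PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig
export PKG_CONFIG_PATH
This completes the opencv installation and configuration. Finally write any hello.cpp containing #include <opencv2/core/core.hpp> as a test, and on the command line enter:
$ g++ hello.cpp -o hello `pkg-config --cflags --libs opencv`
If it compiles without errors, the configuration is correct. One thing that took me a long time: the quote marks in this command are not ordinary single quotes but backticks, the key above Tab on the keyboard.
(Installing opencv under 14.04 went fairly smoothly, but when I later moved to 12.04, installing opencv-3.0.0 produced a pile of errors. After a long struggle I switched to opencv-2.4.9, which quickly resolved things. Along the way you may see this error:
opencv-2.4.9/modules/gpu/src/nvidia/core/NCVPixelOperations.hpp(51): error: a storage class is not allowed in an explicit specialization
See http://code.opencv.org/issues/3814 and replace the NCVPixelOperations.hpp in opencv 2.4.9 with a freshly downloaded copy. If you see the following errors, see http://www.foreverlee.net/
/usr/bin/ld: cannot find -lcufft
/usr/bin/ld: cannot find -lnpps
/usr/bin/ld: cannot find -lnppi
/usr/bin/ld: cannot find -lnppc
/usr/bin/ld: cannot find -lcudart
and change the compile command to:
$ g++ -L /usr/local/cuda/lib64/ hello.cpp -o hello `pkg-config --cflags --libs opencv`
)

4. Install the other dependencies
(1) Google Logging Library (glog). Download from https://code.google.com/p/google-glog/, then unpack and install:
$ tar zxvf glog-0.3.3.tar.gz
$ cd glog-0.3.3
$ ./configure
$ make
$ sudo make install
If you lack permissions, run chmod -R a+x glog-0.3.3, or simply chmod -R 777 glog-0.3.3. Once installed, the folder can be deleted.
(After much hardship, an unidentified error on Ubuntu 14.04 proved impossible to solve, and I fell into the arms of Ubuntu 12.04. Two more dependencies are needed there, gflags and lmdb; without them the later build fails. See http://www.shwley.com/index.php/archives/52/
# glog
wget https://google-glog.googlecode.com/files/glog-0.3.3.tar.gz
tar zxvf glog-0.3.3.tar.gz
cd glog-0.3.3
./configure
make
make install
# gflags
wget https://github.com/schuhschuh/gflags/archive/master.zip
unzip master.zip
cd gflags-master
mkdir build
cd build
export CXXFLAGS="-fPIC"
cmake ..
make VERBOSE=1
make
sudo make install
# lmdb
git clone git://gitorious.org/mdb/mdb.git
cd mdb/libraries/liblmdb
make
make install
)
(2) The other dependencies; make sure they all install successfully:
$ sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev
If the installation fails with "E: Sub-process /usr/bin/dpkg returned an error code (1)", something unexpected may have happened during sudo apt-get install. Don't worry; try this workaround (I did not hit this problem myself):
$ cd /var/lib/dpkg
$ sudo mv info info.bak
$ sudo mkdir info
$ sudo apt-get --reinstall install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev
If you are using a Caffe version from after September 2014 on Ubuntu 14.04, the following dependencies are also needed:
$ sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler

5. Install Caffe and test it
Change to the folder where Caffe was downloaded and run:
$ cp Makefile.config.example Makefile.config
Edit the newly created Makefile.config and set "BLAS := mkl" (I installed ATLAS, so I left the default unchanged). If you want to use cuDNN, developed by nVidia, to speed up Caffe models, make sure after installing cuDNN that USE_CUDNN := 1 is enabled in Makefile.config. Fortunately, recent versions of Caffe already integrate the cuDNN library by default and need no extra setup. cuDNN is installed as follows (cuDNN Introduction and Download). After downloading cuDNN, unpack it and enter the unpacked folder:
$ sudo cp cudnn.h /usr/local/include
$ sudo cp libcudnn.so /usr/local/lib
$ sudo cp libcudnn.so.6.5 /usr/local/lib
$ sudo cp libcudnn.so.6.5.18 /usr/local/lib
Link the cuDNN library files:
$ sudo ln -sf /usr/local/lib/libcudnn.so.6.5.18 /usr/local/lib/libcudnn.so.6.5
Without the link you may see this error: "./build/tools/caffe: error while loading shared libraries: libcudnn.so.6.5: cannot open shared object file: No such file or directory". That happens because cuDNN was not linked successfully, so the symbolic link has to be created by hand. Now caffe-master can be built!
$ make all
$ make test
$ make runtest
At this point I got "libcudnn.so.6.5: cannot open shared object file". LD_LIBRARY_PATH turned out to be fine, and after a long struggle I found that the cuda configuration file had not been added: I had forgotten to write the cuda.conf from the MKL step above.

Errors fixed:
1. If you see "make: protoc: command not found", protoc is not installed; install it:
$ sudo apt-get install protobuf-c-compiler protobuf-compiler
2. (This problem has been fixed by the author in Caffe versions after September.) The message is "src/caffe/util/math_functions.cu(140): error: calling a host function(std::signbit ) from a globalfunction(caffe::sgnbit_kernel ) is not allowed". Fix: edit ./include/caffe/util/math_functions.hpp and, at line 224, delete (comment out) using std::signbit; then change DEFINE_CAFFE_CPU_UNARY_FUNC(sgnbit, y = signbit(x)); to DEFINE_CAFFE_CPU_UNARY_FUNC(sgnbit, y = std::signbit(x)); I received a reply from the author, Yangqing Jia, and his fix is exactly the same.

III. Testing with the MNIST dataset

By default Caffe lives in $CAFFE_ROOT, the directory it was unpacked into, for example /home/username/caffe-master; the steps below assume you have changed into that directory. They only check that Caffe works and do not amount to a detailed evaluation. For the settings see the official site: http://caffe.berkeleyvision.org/gathered/examples/mnist.html
1. Data preparation. You can use an already downloaded dataset or download it again; my connection is fast, so I took the lazy route and downloaded it directly:
$ cd data/mnist
$ sudo sh ./get_mnist.sh
2. Rebuild the LDB files, that is, convert the binary dataset into one that Caffe recognizes. From now on all data, including jpeg files, must be converted into this format.
$ cd examples/mnist
$ sudo sh ./create_mnist.sh
This creates the mnist-train-leveldb/ and mnist-test-leveldb/ folders, which contain the dataset in LDB format.
PS: you may see this error:
Creating lmdb...
./create_mnist.sh: 16: ./create_mnist.sh: build/examples/mnist/convert_mnist_data.bin: not found
The fix is to run the script from the Caffe-master root directory; with recent versions of Caffe basically everything has to be run from the root directory.
~/caffe-master$ sudo sh examples/mnist/create_mnist.sh
3. Train mnist
$ sudo sh examples/mnist/train_lenet.sh

That completes all the steps of the Caffe installation. Below is a simple comparison based on the MNIST dataset, mainly to look at CPU and GPU performance on different systems. The difference is obvious; MNIST is a very simple dataset, and I expect the gap to be larger on more complex ones. Ubuntu + GPU is the only choice.
Test platform: i7-4770K/16G/GTX 770/CUDA 6.5
MNIST Windows8.1 on CPU: 620s
MNIST Windows8.1 on GPU: 190s
MNIST Ubuntu 14.04 on CPU: 270s
MNIST Ubuntu 14.04 on GPU: 160s
MNIST Ubuntu 14.04 on GPU with cuDNN: 35s
Cifar10_full on GPU without cuDNN: 73m45s = 4425s (Iteration 70000)
Cifar10_full on GPU with cuDNN: 20m7s = 1207s (Iteration 70000); Category: Caffe | 39721 views | 4 comments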

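Setting up a CUDA 5.5 + VS2010 working environment on WIN7 X64: Hooope 2014-1-28 21:17; My first post, and my start on ScienceNet. Setting up a working environment is always a hassle; after consulting many posts online I finally got it working, and this blog post is a summary. It also includes some common CUDA and VS problems and their solutions.

1. Download and install the CUDA 5.5 installer
The older version I used in the lab required downloading and installing several packages separately. NVIDIA now bundles the installers together; download link: https://developer.nvidia.com/cuda-downloads Install VS first and then CUDA. During installation be sure to choose the custom installation, because the express installation does not install the SDK, which causes a fair amount of trouble later. Remember the installation location, as it will be needed later; installing to the default path is recommended. After installation CUDA automatically adds the following paths:
CUDA_PATH_V5_5 = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5
CUDA_PATH = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5
For convenience, the following paths can also be added:

2. Create your first CUDA program!
Start VS2010 and create an empty WIN32 console application. Then, in Solution Explorer, right-click Source Files and add a new CUDA source file, as shown in the figure. Choose the compiler: right-click the new source file, choose Properties, and set Item Type to CUDA C/C++. In Solution Explorer right-click the project, go to Properties - Linker - General - Additional Library Directories and add a new path: $(CUDA_PATH_V5_5)\lib\$(Platform) In Solution Explorer right-click the project, go to Properties - Linker - Input and add cudart.lib to Additional Dependencies on the right. Under Tools - Options - Text Editor - File Extension, choose the VC++ editor in the drop-down list on the right and add the extensions cu and cuh. At this point VS should be able to compile CUDA code, which can be verified with a simple vector-sum program. Paste the following code into the new cu file:

Syntax highlighting. With the first two steps done, CUDA code should compile. However, VS and VAssistX do not yet recognize CUDA syntax: the CUDA code in the editor is solid black with plenty of red squiggles, so further setup is needed. For the details see: http://blog.csdn.net/augusdi/article/details/12205435

3. Common problems
Port 8000, which nsight uses, is often already in use; this can be solved by changing nsight's communication port. Open the nsight monitor, find its icon at the right end of the taskbar, right-click it and choose Option - general - connection to change the port. The monitor must be restarted for the change to take effect. The matching change is also needed in VS: go to Nsight - options - general - default connection port and make the same change. If cuda_runtime.h or other headers cannot be opened: search the computer for cuda_runtime.h to find where it is located, then add that path to the include directories under VC++ Directories.

4. VS usage notes
Some frequently used and handy shortcuts:
Ctrl-K Ctrl-F: auto-format the selected code
F7: build the solution
F9: set / clear a breakpoint
Ctrl-Shift-F9: clear all breakpoints
F5: debug / run to the next breakpoint
Ctrl-F5: run the program
Shift-F5: stop debugging
F10: step
F11: step in
Shift-F11: step out
Ctrl-K + Ctrl-C: comment out the selected code
Ctrl-K + Ctrl-U: uncomment the selected code
By default VS does not show line numbers in the editor window. To enable them, go to Tools - Options - Text Editor - C/C++ - Display and tick Line numbers.; Category: CUDA | 8774 views | 0 comments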

CUDA notes: meigang 2013-9-16 04:21; Dynamic parallelism is a new feature of CUDA 5.0 for GPUs with compute capability 3.5 that allows kernels to be launched directly from other kernels. It promises to speed up applications further by handling computing workloads at runtime directly on the GPU; this avoids CPU/GPU interactions and benefits mechanisms such as recursion. To use dynamic parallelism in Visual Studio 2010, do the following:
1) View - Property Pages
2) Configuration Properties - CUDA C/C++ - Common - Generate Relocatable Device Code - Yes (-rdc=true)
3) Configuration Properties - CUDA C/C++ - Device - Code Generation - compute_35, sm_35
4) Configuration Properties - Linker - Input - Additional Dependencies - cudadevrt.lib

Thrust: https://github.com/thrust/thrust/wiki/Quick-Start-Guide http://thrust.github.io/ http://thrust.github.io/doc/modules.html
CUDPP: http://cudpp.github.io/

Source: http://download.csdn.net/download/anson2004110/5912747 VS2010 CUDA 5.5 Win7 64-bit configuration: "VS2010+CUDA+5.5+Win7+64位配置以及项目创建配置.docx" (VS2010 + CUDA 5.5 + Win7 64-bit configuration and project creation settings)

Error: thrust::system::system_error in transform_reduce. I'm using VS2010, and when it breaks at the error it points to the following in the dbgheap.c file:
__finally { /* unlock the heap */ _munlock(_HEAP_LOCK); }
I forgot to adjust the project properties to the compute capability of my CUDA card: Configuration Properties - CUDA C/C++ - Device - Code Generation; change compute_10,sm_10 to your GPU's compute capability. For an Nvidia card with compute capability 2.1 it will be compute_20,sm_21.

CUDA Thread Indexing. Sample code simpleIndexing.cu

1D grid of 1D blocks
__device__ int getGlobalIdx_1D_1D() {
    return blockIdx.x * blockDim.x + threadIdx.x;
}

1D grid of 2D blocks
__device__ int getGlobalIdx_1D_2D() {
    return blockIdx.x * blockDim.x * blockDim.y + threadIdx.y * blockDim.x + threadIdx.x;
}

1D grid of 3D blocks
__device__ int getGlobalIdx_1D_3D() {
    return blockIdx.x * blockDim.x * blockDim.y * blockDim.z + threadIdx.z * blockDim.y * blockDim.x + threadIdx.y * blockDim.x + threadIdx.x;
}

2D grid of 1D blocks
__device__ int getGlobalIdx_2D_1D() {
    int blockId = blockIdx.y * gridDim.x + blockIdx.x;
    int threadId = blockId * blockDim.x + threadIdx.x;
    return threadId;
}

2D grid of 2D blocks
__device__ int getGlobalIdx_2D_2D() {
    int blockId = blockIdx.x + blockIdx.y * gridDim.x;
    int threadId = blockId * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;
    return threadId;
}

2D grid of 3D blocks
__device__ int getGlobalIdx_2D_3D() {
    int blockId = blockIdx.x + blockIdx.y * gridDim.x;
    int threadId = blockId * (blockDim.x * blockDim.y * blockDim.z) + (threadIdx.z * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x) + threadIdx.x;
    return threadId;
}

3D grid of 1D blocks
__device__ int getGlobalIdx_3D_1D() {
    int blockId = blockIdx.x + blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
    int threadId = blockId * blockDim.x + threadIdx.x;
    return threadId;
}

3D grid of 2D blocks
__device__ int getGlobalIdx_3D_2D() {
    int blockId = blockIdx.x + blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
    int threadId = blockId * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;
    return threadId;
}

3D grid of 3D blocks
__device__ int getGlobalIdx_3D_3D() {
    int blockId = blockIdx.x + blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
    int threadId = blockId * (blockDim.x * blockDim.y * blockDim.z) + (threadIdx.z * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x) + threadIdx.x;
    return threadId;
}; 5350 views | 0 comments
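A quick way to convince yourself that the indexing formulas above give each thread a distinct id is to replay the arithmetic on the host. The sketch below is plain C++, not device code: `Dim3` is a stand-in for CUDA's `dim3`, the built-in variables are passed in as ordinary parameters, and `global_idx_3d_3d` and `ids_are_unique` are names invented for this illustration.

```cpp
#include <cassert>
#include <vector>

// Stand-in for CUDA's dim3 / uint3 built-ins.
struct Dim3 { int x, y, z; };

// Same arithmetic as getGlobalIdx_3D_3D, with the CUDA built-ins passed in.
int global_idx_3d_3d(Dim3 blockIdx, Dim3 threadIdx, Dim3 gridDim, Dim3 blockDim) {
    int blockId = blockIdx.x + blockIdx.y * gridDim.x
                + gridDim.x * gridDim.y * blockIdx.z;
    return blockId * (blockDim.x * blockDim.y * blockDim.z)
         + threadIdx.z * (blockDim.x * blockDim.y)
         + threadIdx.y * blockDim.x + threadIdx.x;
}

// Visit every (block, thread) pair once and check that each id in
// [0, total) is produced exactly once.
bool ids_are_unique(Dim3 gridDim, Dim3 blockDim) {
    const int total = gridDim.x * gridDim.y * gridDim.z
                    * blockDim.x * blockDim.y * blockDim.z;
    assert(total > 0);
    std::vector<int> hits(total, 0);
    for (int bz = 0; bz < gridDim.z; ++bz)
    for (int by = 0; by < gridDim.y; ++by)
    for (int bx = 0; bx < gridDim.x; ++bx)
    for (int tz = 0; tz < blockDim.z; ++tz)
    for (int ty = 0; ty < blockDim.y; ++ty)
    for (int tx = 0; tx < blockDim.x; ++tx) {
        int id = global_idx_3d_3d(Dim3{bx, by, bz}, Dim3{tx, ty, tz},
                                  gridDim, blockDim);
        if (id < 0 || id >= total) return false;
        ++hits[id];
    }
    for (int h : hits) {
        if (h != 1) return false;
    }
    return true;
}
```

The lower-dimensional variants are special cases of this formula: setting the unused grid or block dimensions to 1 (and the matching indices to 0) reduces it to the 1D and 2D versions listed above.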

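"How to choose the block size in CUDA", a technical document I wrote: sume 2013-6-1 10:59; Explains how to set the block size in CUDA programming, with worked calculation examples. Download: http://ishare.iask.sina.com.cn/f/36767710.html http://wenku.baidu.com/view/611aa63883c4bb4cf7ecd1c5.html; 2730 views | 0 comments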
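The linked document is not reproduced here, so the following is not its method. It is a minimal sketch of two common rules of thumb for block sizing: make the block size a multiple of the warp size, and use ceiling division to get enough blocks to cover all elements. The function names and the example values (warp size 32, at most 1024 threads per block) are assumptions for illustration; real values should come from `cudaDeviceProp`.

```cpp
#include <cassert>

// Round a requested block size up to a multiple of the warp size, without
// exceeding the largest warp multiple that fits in maxThreadsPerBlock.
int round_block_size(int requested, int warpSize, int maxThreadsPerBlock) {
    assert(requested > 0 && warpSize > 0 && maxThreadsPerBlock >= warpSize);
    const int cap = (maxThreadsPerBlock / warpSize) * warpSize;
    const int rounded = ((requested + warpSize - 1) / warpSize) * warpSize;
    return rounded > cap ? cap : rounded;
}

// Number of blocks needed so that blocks * blockSize >= n (ceiling division).
int blocks_for(int n, int blockSize) {
    assert(n > 0 && blockSize > 0);
    return (n + blockSize - 1) / blockSize;
}
```

For example, a request for 100 threads with a warp size of 32 becomes 128, and covering 1000 elements with 128-thread blocks takes 8 blocks, of which the last is only partly used. That partly used block is why kernels launched this way need a bounds check on the global index.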

[Repost] Installing CUDA 5.0 on Ubuntu 12.04: xuyichao 2013-5-24 18:36; I recently needed a GPU for computation and mostly relied on English material, then found this Chinese blog post. It covers a different machine, but it is still a very useful reference. Original article: http://blog.csdn.net/lucktroy/article/details/8445854

Laptop model: Lenovo Y570. Graphics card: NVIDIA GeForce GT 555M. System: Ubuntu 12.04

Step 1. The Y570 laptop has a bug that must be fixed first:
sudo apt-get install git
git clone git://github.com/Bumblebee-Project/bbswitch.git -b hack-lenovo
cd bbswitch
sudo mkdir /usr/src/acpi-handle-hack-0.0.1
sudo cp Makefile acpi-handle-hack.c /usr/src/acpi-handle-hack-0.0.1
sudo cp dkms/acpi-handle-hack.conf /usr/src/acpi-handle-hack-0.0.1/dkms.conf
sudo dkms add acpi-handle-hack/0.0.1
sudo dkms build acpi-handle-hack/0.0.1
sudo dkms install acpi-handle-hack/0.0.1
echo acpi-handle-hack | sudo tee -a /etc/modules
sudo update-initramfs -u

Step 2. Install bumblebee 3.0:
sudo add-apt-repository ppa:bumblebee/stable
# If you are on Ubuntu 11.04 or older and want newer drivers (recommended) than the ones available in the official repos, run:
sudo add-apt-repository ppa:ubuntu-x-swat/x-updates
sudo apt-get update
# To install Bumblebee using the proprietary nvidia driver:
sudo apt-get install bumblebee bumblebee-nvidia

Step 3. Set the resolution with xrandr:
xrandr
xrandr --output LVDS1 --mode 1360x768

Step 4. Test:
optirun glxspheres

Step 5. Install CUDA 5.0
5.1 Install the required tools
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
5.2 Blacklist the required modules
sudo vi /etc/modprobe.d/blacklist.conf
# append at the end:
blacklist amd76x_edac
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
5.3 Remove any existing nvidia packages
sudo apt-get remove --purge nvidia*
5.4 Switch to console mode: Ctrl+Alt+F1
(1) Stop lightdm
sudo /etc/init.d/lightdm stop
(2) Run the downloaded NVIDIA*.run
sudo sh NVIDIA*.run
5.5 Set the environment variables
# 32-bit systems
export PATH=$PATH:/usr/local/cuda-5.0/bin
export LD_LIBRARY_PATH=/usr/local/cuda-5.0/lib
# 64-bit systems
export PATH=$PATH:/usr/local/cuda-5.0/bin
export LD_LIBRARY_PATH=/usr/local/cuda-5.0/lib64:/lib
5.6 Installing the CUDA Samples may fail. If it does, make the change below and then repeat step 5.4.
sudo find /usr -name libglut\*
Output:
/usr/lib/x86_64-linux-gnu/libglut.so.3
/usr/lib/x86_64-linux-gnu/libglut.so.3.9.0
/usr/lib/x86_64-linux-gnu/libglut.a
/usr/lib/x86_64-linux-gnu/libglut.so
Fix:
sudo ln -s /usr/lib/x86_64-linux-gnu/libglut.so.3 /usr/lib/libglut.so

References:
Lenovo patch installation http://ubuntuforums.org/showthread.php?t=2036010&page=2
Bumblebee 3.0 http://bumblebee-project.org/install.html
Using xrandr http://www.2cto.com/os/201202/121012.html
Installing CUDA 5.0 http://sn0v.wordpress.com/2012/12/07/installing-cuda-5-on-ubuntu-12-04/; 3313 views | 0 comments

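[Repost] How to use a graphics card for scientific computing: trying out CUDA: CTB11 2013-5-10 16:31; Original link: http://blog.sina.com.cn/s/blog_c1fccc770101do3q.html You are welcome to visit the blog at http://blog.sina.com.cn/u/3254570103

CUDA is a parallel computing architecture introduced by NVIDIA that can be applied across many areas of scientific computing. My computer happens to have a CUDA-capable graphics card, so I gave it a try.

1. Downloading and configuring CUDA
To find out which GPU models support CUDA, it is best to check NVIDIA's English website; my GeForce GT 530 (OEM), for example, does not appear in the list of CUDA-capable models on the Chinese site. My environment is 64-bit Windows 7 with 64-bit VS2008. The download is available at https://developer.nvidia.com/cuda-downloads . The 5.0 production release is fairly complete and includes what used to be the separate SDK. A normal installation is all that is needed, preferably on top of a clean install of the latest NVIDIA graphics driver. Afterwards a CUDA project type appears in the VS New Project dialog. The CUDA installer ships many examples, which can be found in C:\ProgramData\NVIDIA Corporation\CUDA Samples. For syntax highlighting, copy the usertype.dat file from C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\doc\syntax_highlighting\visual_studio_7 into the \Microsoft Visual Studio 9.0\Common7\IDE directory. Also remember to open Tools-Options-Text Editor-File Extension and add .cu under "Extension:".

2. Basic CUDA concepts
Plenty of material is available at http://cudazone.nvidia.cn/cuda-education-training/ . I recommend Stanford's CS193G course; the slides are linked from that page. Here I introduce only the most basic concepts. The basic unit of GPU computation is the thread. Threads are grouped into a computing array called a block, and blocks in turn make up a grid.

Function definitions:
    __global__ void my_kernel() { }
Functions of this type are called from the CPU (host).
    __device__ float my_device_func() { }
Functions of this type are called from the GPU (device).

Variable definition:
    __shared__ float my_shared_array[...];
Data of this type is shared by all threads of one block and is very fast to access, comparable to the L1 cache of a CPU. One difficulty of GPU computing is dividing the threads into suitable blocks and assigning the memory that belongs to each block, which amounts to managing the cache by hand.

Defining the block size and launching the program:
    dim3 grid_dim;
    grid_dim.x = 64; grid_dim.y = 64;   // each grid consists of 64*64 blocks
    dim3 block_dim;                     // the block size is defined in the same way
    my_kernel<<<grid_dim, block_dim>>>(...input...);   // kernel launch

Before computing, we first allocate space in device memory for the data to be processed:
    cudaMalloc((void**)&d_A, N * sizeof(float));
The following call copies data from host memory into device memory:
    cudaMemcpy(d_A, h_A, N * sizeof(float), cudaMemcpyHostToDevice);
When the computation has finished, the data can be copied back to host memory in the same way.

Notice that the kernel launch is a single statement. So how do different threads do different work? The following function, which adds two arrays, is an example:
    __global__ void vector_add(float* A, float* B, float* C)
    {
        int i = threadIdx.x + blockDim.x * blockIdx.x;
        C[i] = A[i] + B[i];
    }
Here threadIdx.x(y) and blockIdx.x(y,z) uniquely identify each thread.

Based on these simple concepts I wrote a small program that solves a PDE with CUDA. In practice it does not run very efficiently, and I could not be bothered to tune the block size and similar parameters. The GeForce GT 530 is rather underpowered; for serious computing you are probably better off buying a discrete graphics card priced above 400.; 5856 reads | 0 comments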
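The index arithmetic in the vector_add kernel of the post above can be checked on the CPU without a GPU. The following is only a sketch under stated assumptions: it is plain C++, not CUDA; the names LaunchIndex, global_index, blocks_for and emulate_vector_add are hypothetical helpers introduced here for illustration; and the bounds check i < n is an addition that the original kernel does not have (it matters when the array length is not a multiple of the block size).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's built-in 1D index variables
// (threadIdx.x, blockIdx.x, blockDim.x). Not part of any CUDA API.
struct LaunchIndex {
    unsigned threadIdx_x;
    unsigned blockIdx_x;
    unsigned blockDim_x;
};

// Same arithmetic as in the kernel: i = threadIdx.x + blockDim.x * blockIdx.x.
inline std::size_t global_index(const LaunchIndex& t) {
    return static_cast<std::size_t>(t.threadIdx_x) +
           static_cast<std::size_t>(t.blockDim_x) * t.blockIdx_x;
}

// Number of blocks needed to cover n elements (ceiling division).
inline unsigned blocks_for(std::size_t n, unsigned block_dim) {
    assert(block_dim > 0);
    return static_cast<unsigned>((n + block_dim - 1) / block_dim);
}

// Serial emulation of a 1D launch of vector_add: every (block, thread) pair
// performs the work of one GPU thread, one after another.
inline std::vector<float> emulate_vector_add(const std::vector<float>& a,
                                             const std::vector<float>& b,
                                             unsigned block_dim) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<float> c(n, 0.0f);
    const unsigned grid_dim = blocks_for(n, block_dim);
    for (unsigned block = 0; block < grid_dim; ++block) {
        for (unsigned thread = 0; thread < block_dim; ++thread) {
            const std::size_t i = global_index(LaunchIndex{thread, block, block_dim});
            // Added guard (not in the original kernel): surplus threads in the
            // last block must not touch memory past the end of the arrays.
            if (i < n) {
                c[i] = a[i] + b[i];
            }
        }
    }
    return c;
}
```

Calling emulate_vector_add(a, b, 2) on two five-element vectors uses three emulated blocks of two threads each; the sixth thread falls outside the arrays and is skipped by the guard. On a real GPU the same guard is usually written inside the kernel, with the element count passed as an extra argument.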

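[Repost] How to call CUDA from MATLAB for acceleration: yngcan 2013-4-29 20:28; As everyone knows, the computing power of NVIDIA graphics cards keeps growing, and it would be a pity not to make use of it, especially now that Fermi has greatly improved double-precision performance. An NVIDIA card plus MATLAB has become a powerful tool for solving mathematical and physical problems (computation can be sped up by tens of times). But how do you actually use it?

MATLAB version: 2010a (note: the MATLAB release must be newer than the VS release)
Build environment: Microsoft Visual Studio 2008
Hardware required: one GPU graphics card

First download the NVMEX source (cudaWhitePaper.zip) from http://developer.nvidia.com/object/matlab_cuda.html and unpack it. Open nvmex.m and find:
    CUDA_LIB_Location = 'C:\CUDA\lib';
    Host_Compiler_Location = '-ccbin C:\Program Files\Microsoft Visual Studio 8\VC\bin';
Change both values to the actual paths on your machine. Then copy nvmex.m into the directory of the file you want to compile, for example the directory containing addMatrix.cu, make that directory the MATLAB working directory, and enter in the command window:
    nvmex('addMatrix.cu');
substituting your own file name. Compilation reports one error, which points to the line
    [...] = fileparts(cuFileName);
and complains about mismatched symbols. Comment out or delete that line and replace it with
    [...] = fileparts(cuFileName)
after which compilation completes. Once it has compiled, the function (addMatrix) can be used in MATLAB like any ordinary function.

Note: .cu files are CUDA source files, and .o files are generated by VS during compilation. Following the example above, you can call CUDA functions freely from MATLAB to accelerate your computations. In practice the results are very good.; Category: Network Science | 6272 reads | 0 comments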

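[Repost] CUDA from beginner to advanced: timbre 2012-10-14 08:46; (Reposted from http://hpcbbs.it168.com/forum.php?mod=viewthreadtid=2702extra=page%3D1%26filter%3Ddigest%26digest%3D1%26digest%3D1) http://hpcbbs.it168.com/forum.php?mod=viewthreadtid=4065extra=page%3D1%26filter%3Ddigest%26digest%3D1%26digest%3D1

CUDA from beginner to advanced

Getting started with CUDA
    CUDA configuration http://cudabbs.it168.com/thread-636-1-1.html
    CUDA3.2+VS2010 http://cudabbs.it168.com/thread-2678-1-1.html
    CUDA +VS2008 http://cudabbs.it168.com/thread-2519-1-1.html
    CUDA_C_Getting_Started_Mac http://cudabbs.it168.com/thread-2701-1-1.html
    CUDA_C_Getting_Started_Linux http://cudabbs.it168.com/thread-2699-1-1.html
    Feng Chen's CUDA beginner tutorial http://cudabbs.it168.com/thread-2251-1-1.html
    CUDA explained in simple terms http://cudabbs.it168.com/thread-339-1-4.html
    CUDA Programming Guide 3.1, Chinese edition http://cudabbs.it168.com/thread-2226-1-6.html
    Classic introduction to CUDA http://cudabbs.it168.com/thread-97-1-6.html
    GPU high-performance computing with CUDA http://cudabbs.it168.com/thread-2494-1-1.html
    OpenGL programming with CUDA http://cudabbs.it168.com/thread-2516-1-3.html
    CUDA high-performance parallel programming http://cudabbs.it168.com/thread-2272-1-1.html
    Collection of Chinese CUDA tutorials and lectures (thanks to 淡陌依人 for compiling it) http://cudabbs.it168.com/thread-333-1-5.html
    Key points of CUDA programming (DOC) http://cudabbs.it168.com/thread-2298-1-1.html
    CUDA reference manual, Chinese edition http://cudabbs.it168.com/thread-2737-1-2.html

GPU and CUDA research and applications
    Optimizing GPU-based sparse matrix-vector multiplication http://cudabbs.it168.com/thread-2644-1-1.html
    On the use of GPUs in remote-sensing image fusion http://cudabbs.it168.com/thread-2430-1-1.html
    Real-time fireworks simulation with a GPU particle system http://cudabbs.it168.com/thread-2429-1-1.html
    GPU-based point-model rendering http://cudabbs.it168.com/thread-2403-1-1.html
    A fast k-nearest-neighbor search algorithm on the GPU http://cudabbs.it168.com/thread-2643-1-1.html
    CUDA-based matrix multiplication and FFT performance tests http://cudabbs.it168.com/thread-2533-1-1.html
    CUDA: massively parallel programming (Kai Yong) http://cudabbs.it168.com/thread-2512-1-1.html
    Extracting thermodynamic quantities from GPU-accelerated molecular dynamics simulations http://cudabbs.it168.com/thread-2446-1-1.html
    High-speed GPU-based image fusion http://cudabbs.it168.com/thread-2529-1-1.html
    Design and implementation of a parallel computing platform based on the MPI standard http://cudabbs.it168.com/thread-2537-1-2.html
    Parallel ant colony algorithms and their applications http://cudabbs.it168.com/thread-2584-1-2.html
    General-purpose numerical computation on GPUs http://cudabbs.it168.com/thread-2582-1-2.html
    GPU implementation of iterative solvers for systems of equations http://cudabbs.it168.com/thread-2561-1-2.html
    Design and implementation of CUDA-based H_264_AVC video encoding http://cudabbs.it168.com/thread-2314-1-2.html
    Fast CUDA-based 3D medical image segmentation http://cudabbs.it168.com/thread-2329-1-3.html
    A real-time GPU-based distortion correction algorithm for wide field-of-view scenes http://cudabbs.it168.com/thread-2530-1-3.html
    GPU-accelerated fractal terrain generation http://cudabbs.it168.com/thread-2445-1-3.html
    GPU implementation of molecular dynamics simulation of complex multiphase flows http://cudabbs.it168.com/forum-40-3.html
    Video codecs on general-purpose programmable GPUs: architecture, algorithms and implementation http://cudabbs.it168.com/thread-2484-1-3.html
    Applications of the CUDA programming model for graphics processors http://cudabbs.it168.com/thread-2332-1-4.html
    CUDA parallel-computing acceleration schemes http://cudabbs.it168.com/thread-2318-1-4.html
    GPU\CPU cooperative parallel computing (CPC) in seismic data processing for oil exploration http://cudabbs.it168.com/thread-2381-1-4.html
    High-performance CUDA-based SAR imaging simulation http://cudabbs.it168.com/thread-2297-1-5.html
    CUDA programming and optimization on the Fermi architecture http://cudabbs.it168.com/thread-888-1-6.html
    Deng Yangdong: GPU-based high-performance embedded computing http://cudabbs.it168.com/thread-1704-1-6.html
    Emulating vertex and geometry shaders with CUDA: prototype demo http://cudabbs.it168.com/thread-176-1-7.html
    A GPU-based skin deformation algorithm http://cudabbs.it168.com/thread-2794-1-1.html
    GPU_CPU cooperative computing models based on general-purpose computing http://cudabbs.it168.com/thread-2785-1-1.html
    A CUDA-based image edge detection method http://cudabbs.it168.com/thread-2773-1-1.html
    GPU-based string matching algorithms http://cudabbs.it168.com/thread-2397-1-1.html
    Analysis of the CUDA thread execution model http://cudabbs.it168.com/thread-2296-1-1.html
    GPU (graphics card) computing http://cudabbs.it168.com/thread-2400-1-1.html

Background reading on GPUs and CUDA
    GPU hardware basics http://cudabbs.it168.com/thread-2372-1-5.html
    The development of GPU graphics and games as seen through tessellation http://cudabbs.it168.com/thread-2370-1-5.html
    The development of general-purpose GPU computing as seen through the Folding@home project (PDF) http://cudabbs.it168.com/thread-2645-1-2.html
    How_does_CUDA_work http://cudabbs.it168.com/thread-2557-1-2.html
    Overview of general-purpose GPU computing http://cudabbs.it168.com/thread-2365-1-5.html
    CUDA: understanding and using shared memory http://cudabbs.it168.com/thread-109-1-5.html
    Introduction to the CUDA software and hardware environment http://cudabbs.it168.com/thread-2311-1-5.html
    Introduction to CUDA concepts http://cudabbs.it168.com/thread-94-1-5.html
    NVIDIA chief scientist Bill Dally's 10.29 talk at the Chinese Academy of Sciences http://cudabbs.it168.com/thread-277-1-5.html
    CUDA: leading the GPU computing revolution http://cudabbs.it168.com/thread-2302-1-5.html
    Fermi white paper http://cudabbs.it168.com/thread-323-1-5.html
    Genome-wide gene interaction computation for complex diseases on the CUDA platform http://cudabbs.it168.com/thread-175-1-5.html
    A new generation of high-performance computing technology: an introduction to CUDA http://cudabbs.it168.com/thread-2257-1-5.html
    CUDA.NET user manual in Chinese http://cudabbs.it168.com/thread-2244-1-6.html
    Feng Chen: optimizing matrix-vector multiplication http://cudabbs.it168.com/thread-1621-1-7.html
    What CPUs and GPUs are good and bad at http://cudabbs.it168.com/thread-2447-1-1.html
    Comparative analysis of GPUs and CPUs http://cudabbs.it168.com/thread-2351-1-1.html
    Analysis of new CUDA issues (PPT) http://cudabbs.it168.com/thread-2323-1-1.html
    How a GPU works http://cudabbs.it168.com/thread-2743-1-2.html

Official material on the forum (thanks to xuxw001 for compiling it)
    [Official document] CUDA_VideoEncoder_Library-PDF
    [Official document] CUDA_VideoDecoder_Library-pdf
    [Official document] CUDA_Toolkit_Reference_Manual-CHM
    [Official document] CUDA_4.0_Readiness_Tech_Brief-PDF
    [Official document] CUDA_Developer_Guide_for_Optimus_Platforms
    [Official document] CUDA_C_Best_Practices_Guide-PDF
    [Official document] CUBLAS_Library-PDF
    [Official document] Compute_Visual_Profiler_User_Guide-PDF
    [Official material] CUDA C Getting Started Guide (Windows)

Related tools
    OpenCL performance testing tool http://cudabbs.it168.com/thread-2508-1-2.html
; Category: Tools | 0 comments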

[Repost] Install NVIDIA CUDA in Ubuntu: lanlin 2011-9-29 18:39; This is my personal experience of installing CUDA 4.0 on Ubuntu 10.04 and 10.10, following a tutorial. The setup has been tested on both 32-bit and 64-bit systems.

1. Download the NVIDIA developer driver and CUDA from the NVIDIA homepage: http://developer.nvidia.com/cuda-toolkit-archive#Linux. Choose the version that suits your machine and system. For example, I downloaded the following files:
    Developer Drivers for Linux (270.41.19)
    CUDA Toolkit for Ubuntu Linux 10.10
    GPU Computing SDK - complete package including all code samples (optional but strongly suggested)

2. Blacklist the modules that might interfere with the NVIDIA developer driver. Edit /etc/modprobe.d/blacklist.conf and add the following lines:
    blacklist vga16fb
    blacklist nouveau
    blacklist rivafb
    blacklist nvidiafb
    blacklist rivatv
Then save and quit.

3. Remove the existing graphics driver:
    sudo apt-get --purge remove nvidia-*
Then reboot your computer.

4. Switch to a virtual terminal and stop the X server. Press Ctrl+Alt+F5 to leave the X session and log in on the virtual terminal. Stop the X server with
    sudo service gdm stop
or
    sudo /etc/init.d/gdm stop

5. Go to the directory where you put the downloaded NVIDIA files and run
    sudo sh devdriver_4.0_linux_64_270.41.19.run
(the file name may be different for you). Accept the license agreement. "Install NVIDIA's 32-bit compatibility OpenGL libraries?" Answer 'Yes'; we don't know whether this is actually necessary, but it does not seem to hurt. "Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X?" Answer 'Yes'.

6. Install CUDA:
    sudo sh cudatoolkit_4.0.17_linux_64_ubuntu10.10.run
Choose the default install path.

7. Set the environment variables:
    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/lib64:$LD_LIBRARY_PATH
You can add these two lines to your bash profile.

8. Install the GPU Computing SDK:
    sh gpucomputingsdk_4.0.17_linux.run
(do not use sudo).

9. Fix up some libraries. Repair the broken link to libGL.so (use the libGL.so version that your driver actually installed):
    sudo rm /usr/lib/libGL.so
    sudo ln -s /usr/lib/libGL.so.260.24 /usr/lib/libGL.so
Create a link to libXmu.so:
    sudo ln -s /usr/lib/libXmu.so.6 /usr/lib/libXmu.so
Install the other libraries:
    sudo apt-get install freeglut3-dev libxi-dev

10. Congratulations! You can start GPU computing right away. Go to the SDK directory, build all the examples, and run some of them.; 5267 reads | 0 comments

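"GPU computing" and "Tianhe-1": Popularity 1 seawan 2011-3-24 19:51; I really am out of touch. Today an old classmate told me that China's Tianhe-1 is now number one among the world's supercomputers. I looked it up online and, sure enough, it is: more than 2,700 trillion operations per second, first place. "GPU computing" is the technology that powers such supercomputers. Tianhe-1 is reportedly built this way, as a parallel CPU+GPU cluster made up of countless NVIDIA graphics cards. In other words, is today's supercomputer no longer a synonym for a super CPU but for a "cluster of graphics cards"? Also, the pictures on NVIDIA's website show support for Java/Python, yet the software download area offers only C/C++. Why? After a long search I finally found the following links: Java wrapper; jCUDA: Java for CUDA; Bindings for CUDA BLAS and FFT; JaCUDA; .NET integration for CUDA; Thrust: C++ template Library for CUDA; CuPP: C++ framework for CUDA; Libra: C/C++ abstraction layer for CUDA; F# for CUDA; Category: Parallel Computing | 3820 reads | 1 comment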

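How to bind a CUDA 3D array to texture memory: Popularity 3 cobra22 2010-7-8 01:51; As is well known, global memory has no cache and is slow to access, while shared memory is fast but very small. For larger arrays, binding them to texture memory is often a good choice: texture memory is cached and has a large capacity. In the current CUDA version, 3D linear memory cannot be bound directly to texture memory (1D linear memory can). The data therefore has to be placed in a 3D CUDA array first, and the 3D CUDA array is then bound to texture memory. Array elements are read with the texture fetch function tex3D(tex,x,y,z), which returns the element at coordinates (x,y,z).

1. Create the CUDA 3D array
In earlier CUDA versions, extent.width, unlike height and depth, was counted in bytes, so older versions required array_width*sizeof(float). The latest release, 3.1, quietly changed this. The CUDA documentation, however, has been wrong all along: it states that width, height and depth are all in bytes, whereas in practice you simply assign the number of elements. Instead of assigning the fields directly, you can also call make_cudaExtent(width,height,depth), which works the same way.
    cudaArray *d_u;
    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
    cudaExtent extent;
    extent.width=array_width;
    extent.height=array_height;
    extent.depth=array_depth;
    cudaMalloc3DArray(&d_u,&channelDesc,extent);

2. Copy the data into the 3D array
First, a word on how a pitched pointer works. To reach array element u[i][j][k], the access through the pitched pointer is u_p[i + j*pitch + k*pitch*height]. Clearly pitch=width here, so when creating the pitched pointer we pass width and height as arguments to make_cudaPitchedPtr(). Note in particular that the array a pitched pointer refers to is stored differently from a conventional C array: C reaches element u[i][j][k] through u[i*height*depth + j*depth + k]. To read the intended elements correctly, I therefore suggest building the pitched pointer in reverse order:
    copyParams.srcPtr = make_cudaPitchedPtr((void*)u, array_depth*sizeof(float), array_depth, array_height);
This amounts to transposing the array: element u[i][j][k] corresponds to element u[k][j][i] in the CUDA 3D array. Neither the CUDA documentation nor the programming guide mentions this difference. It puzzled me for a long time and took a lot of effort to work out, and I hope future SDK samples will cover it. The complete copy, written here with the width-first pointer, is:
    cudaMemcpy3DParms copyParams = {0};
    copyParams.srcPtr = make_cudaPitchedPtr((void*)u, array_width*sizeof(float), array_width, array_height);
    copyParams.dstArray = d_u;
    copyParams.extent = extent;
    copyParams.kind = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&copyParams);

3. Bind the 3D array to texture memory
normalized sets whether the texture coordinates are normalized. If normalized is nonzero, the texture is addressed with coordinates normalized to [0, 1) instead of [0, width), [0, height), [0, depth). For example, a texture of size 64×32 can be addressed with coordinates in the range [0, 63] in the x dimension and [0, 31] in the y dimension. If the same 64×32 texture is addressed in normalized mode, the coordinates in both the x and y dimensions lie in [0.0, 1.0).
    [...] Could not bind texture u\n"); return; }
; Category: Research Notes | 10652 reads | 1 comment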
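The layout argument in the texture-memory post above can be sanity-checked with a few lines of host-side arithmetic. This is a sketch under stated assumptions, not the author's code: it is plain C++ with no CUDA calls; Dims3, TexCoord, c_offset, pitched_offset and texel_for are hypothetical names introduced here; the host array is assumed to be declared as float u[nx][ny][nz] and stored contiguously; and the extent of the CUDA array is assumed to be reversed along with the pitched pointer (width = nz, height = ny, depth = nx), which the post does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>

// Dimensions of a host array declared as float u[nx][ny][nz] (hypothetical names).
struct Dims3 {
    std::size_t nx, ny, nz;
};

// Row-major C layout: the LAST index (k) varies fastest in memory.
inline std::size_t c_offset(const Dims3& d, std::size_t i, std::size_t j, std::size_t k) {
    assert(i < d.nx && j < d.ny && k < d.nz);
    return (i * d.ny + j) * d.nz + k;
}

// Layout seen through a pitched pointer: x varies fastest, then y, then z.
// pitch_elems is the row pitch in elements (pitch in bytes / sizeof(float));
// ysize is the number of rows per slice.
inline std::size_t pitched_offset(std::size_t pitch_elems, std::size_t ysize,
                                  std::size_t x, std::size_t y, std::size_t z) {
    return (z * ysize + y) * pitch_elems + x;
}

// Texture coordinates of one element (integer texel position, unnormalized).
struct TexCoord {
    std::size_t x, y, z;
};

// With the reversed pitched pointer (xsize = nz, ysize = ny), element
// u[i][j][k] of the C array sits at texel (x = k, y = j, z = i).
inline TexCoord texel_for(std::size_t i, std::size_t j, std::size_t k) {
    return TexCoord{k, j, i};
}
```

For every valid (i, j, k), c_offset(d, i, j, k) equals pitched_offset(d.nz, d.ny, t.x, t.y, t.z) with t = texel_for(i, j, k). Under the assumptions above, this is the sense in which the array appears transposed: a kernel that wants u[i][j][k] would fetch tex3D(tex, k, j, i).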
