Bounty Rewards: 20,000 CHF for a multi-GPU miner and a guide on how to set it up and use it; 20,000 CHF for a Stratum miner and a guide on how to use it; 25,000 CHF for improving the efficiency of the existing AE mining software. CHF is the Swiss franc; the three tiers of challenges add up to 65,000 CHF, roughly 450,000 RMB. Anyone proficient in CUDA programming should go for it! Unfortunately I don't know it well enough myself. Bounty details (hosted outside the Great Firewall, but anyone capable of taking this on can surely get past that): https://blog.aeternity.com/bounties-multi-gpu-miner-guide-stratum-miner-and-mining-optimization-51029d468e79
The GPU (graphics processing unit) is a specialized processor that addresses the demands of real-time, high-resolution 3D graphics and other compute-intensive tasks. GPUs have evolved into highly parallel multi-core systems that allow very efficient manipulation of large blocks of data. This design is more effective than a general-purpose CPU (central processing unit) for algorithms that process large blocks of data in parallel. CUDA is a parallel computing platform and application programming interface (API) model created by Nvidia. It allows software developers and engineers to use a CUDA-enabled GPU for general-purpose processing. The CUDA platform is designed to work with programming languages such as Python, C, C++, and Fortran. This accessibility makes it easier for specialists in parallel programming to use GPU resources.
As the amount of data used for molecular phylogenetic inference grows, especially with the arrival of the big-data era of genome- and transcriptome-based analyses, more and more researchers face ever-longer computation times, and several parallel-computing solutions have been proposed to speed things up. The most typical is the multi-CPU approach, which appeared earliest, is the most mature, and is supported by all the mainstream phylogenetic reconstruction software. Its limitations are also fairly obvious, however, for example in flexibility of use, energy efficiency, and how well it accelerates posterior-probability algorithms. With this in mind, we wondered whether MrBayes could be accelerated on a single machine using Nvidia GPU hardware under the CUDA computing environment. Thanks to the efforts of the research group of Professors Liu Xiaoguang and Wang Gang at the College of Computer and Control Engineering, Nankai University, this idea has been realized quite well. The acceleration achieved in 2011 for nucleotide data was roughly 15-20x; the larger the matrix, the higher the speedup, reaching over 40x at best. This means a computation that used to take 2-3 weeks can be finished within a day. In 2013, the speedup for nucleotide data was raised to 63-170x, meaning a computation that used to take 9 weeks can be finished within a day. In 2015, acceleration for protein data was implemented. So far, no attempt has been made to accelerate the other data types in MrBayes, such as morphological and secondary-structure data; since the volume of such data has little potential to grow, accelerating them would be of limited value. In addition, acceleration for mixed nucleotide-protein data has not yet been implemented and will require further effort.
References
Zhou J-F, *Liu X-G, Stones DS, Xie Q, Wang G. 2011. MrBayes on a Graphics Processing Unit. Bioinformatics 27: 1255-1261. http://bioinformatics.oxfordjournals.org/content/27/9/1255.full.pdf+html
Bao J, Xia H, Zhou J-F, *Liu X, *Wang G. 2013. Efficient Implementation of MrBayes on Multi-GPU. Mol. Biol. Evol. 30: 1471-1479. http://mbe.oxfordjournals.org/content/30/6/1471.full.pdf+html
Pang S, Stones RJ, *Ren MM, Liu XG, Wang G, Xia HJ, Wu HY, Liu Y, Xie Q. 2015. GPU MrBayes V3.1: MrBayes on Graphics Processing Units for Protein Sequence Data. Mol. Biol. Evol. doi: 10.1093/molbev/msv129. http://mbe.oxfordjournals.org/content/early/2015/05/25/molbev.msv129.abstract?papetoc
CudaPre3D: An Alternative Preprocessing Algorithm for Accelerating 3D Convex Hull Computation on the GPU
Abstract: When computing the convex hull of a point set, a preprocessing step that filters the input points by discarding non-extreme points is commonly used to improve computational efficiency. We previously proposed a quite straightforward preprocessing approach for accelerating 2D convex hull computation on the GPU. In this paper, we extend that algorithm to 3D. The basic ideas behind the two preprocessing algorithms are similar: first, several groups of extreme points are found from the original input set and several rotated versions of it; then, a convex polyhedron / hull is created from the found extreme points; and finally, the interior points lying inside that convex polyhedron / hull are discarded. Experimental results show that the proposed preprocessing algorithm achieves speedups of about 4x on average, and 5x-6x in the best cases, over running without it. In addition, more than 95% of the input points can be discarded in most experimental tests. https://www.researchgate.net/publication/275887547_CudaPre3D_An_Alternative_Preprocessing_Algorithm_for_Accelerating_3D_Convex_Hull_Computation_on_the_GPU
CUDA programming is mostly about talking to the GPU, and before conversing with this unfamiliar beast we first need to get to know it. Before diving into how to write device code, we need some mechanism for determining which devices are present in the machine and what capabilities each one supports. Fortunately, there is a very simple interface for obtaining this information. First, we want to know how many devices in the system support the CUDA architecture and can run kernels written in CUDA C. To get the number of CUDA devices, call cudaGetDeviceCount(); the function does exactly what its name suggests. After calling cudaGetDeviceCount(), we can iterate over the devices and query information about each one. The CUDA runtime fills in a structure of type cudaDeviceProp containing the device's properties. Which properties can we obtain? As of CUDA 3.0, the cudaDeviceProp structure contains the following:

struct cudaDeviceProp {
    char name[256];               // ASCII string identifying the device
    size_t totalGlobalMem;        // total global memory on the device, in bytes
    size_t sharedMemPerBlock;     // shared memory available per thread block, in bytes
    int regsPerBlock;             // number of 32-bit registers available per block
    int warpSize;                 // number of threads in a warp
    size_t memPitch;              // maximum pitch allowed for memory copies, in bytes
    int maxThreadsPerBlock;       // maximum number of threads per block
    int maxThreadsDim[3];         // maximum number of threads along each dimension of a block
    int maxGridSize[3];           // maximum number of blocks along each dimension of a grid
    size_t totalConstMem;         // total constant memory, in bytes
    int major;                    // major compute capability version
    int minor;                    // minor compute capability version
    int clockRate;                // clock frequency, in kilohertz
    size_t textureAlignment;      // alignment requirement for textures
    int deviceOverlap;            // boolean: can the device concurrently perform a cudaMemcpy() and run a kernel?
    int multiProcessorCount;      // number of multiprocessors on the device
    int kernelExecTimeoutEnabled; // boolean: is there a runtime limit on kernels executed on this device?
    int integrated;               // boolean: is the device an integrated GPU?
    int canMapHostMemory;         // boolean: can the device map host memory into the CUDA device address space?
    int computeMode;              // compute mode: default, exclusive, or prohibited
    int maxTexture1D;             // maximum size of 1D textures
    int maxTexture2D[2];          // maximum dimensions of 2D textures
    int maxTexture3D[3];          // maximum dimensions of 3D textures
    int maxTexture2DArray[3];     // maximum dimensions of 2D texture arrays
    int concurrentKernels;        // boolean: can the device execute multiple kernels in the same context concurrently?
};

Using the device properties: the structure above gives us a rough idea of which device properties exist, and we can now query them through it. You may ask what we actually need these properties for. Don't worry: once we start writing performance-tuned code, the benefit of knowing them will become clear. For now we only need to know the method. First call cudaGetDeviceCount() to find the devices, then loop over them with cudaGetDeviceProperties() to fill in the structure, from which the property values can be read:

#include <cuda_runtime.h>
#include <iostream>
using namespace std;

int main() {
    cudaDeviceProp prop;
    int count;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; i++) {
        cudaGetDeviceProperties(&prop, i);
        cout << "--- Information for device " << i << " ---" << endl;
        cout << "name: " << prop.name << endl;
        cout << "--- Memory information for device " << i << " ---" << endl;
        cout << "total global memory: " << prop.totalGlobalMem << endl;
        cout << "total constant memory: " << prop.totalConstMem << endl;
        cout << "threads in warp: " << prop.warpSize << endl;
        cout << "max threads per block: " << prop.maxThreadsPerBlock << endl;
        cout << "max thread dims: " << prop.maxThreadsDim[0] << " "
             << prop.maxThreadsDim[1] << " " << prop.maxThreadsDim[2] << endl;
        cout << "max grid dims: " << prop.maxGridSize[0] << " "
             << prop.maxGridSize[1] << " " << prop.maxGridSize[2] << endl;
    }
    return 0;
}

I am only reading out a subset of the property values here, as an introduction; the remaining properties can be retrieved the same way. Original post: http://blog.csdn.net/timmawang/article/details/10362701
When installing cuda-toolkit on RHEL 6.5, the installer complains that the graphical server (X Server) must not be running. The workaround is to edit the /etc/inittab file and change its last line from "id:5:initdefault:" to "id:3:initdefault:", then reboot; the system comes up in text mode only, and cuda-toolkit can be installed directly. Once the installation is finished, change /etc/inittab back to runlevel 5, and the next boot will bring the graphical interface back up.
PS: /etc/inittab is read by the init process after the kernel boots and controls the system runlevel:
The runlevels used are:
# 0 - halt (Do NOT set initdefault to this)
# 1 - Single user mode
# 2 - Multiuser, without NFS (The same as 3, if you do not have networking)
# 3 - Full multiuser mode
# 4 - unused
# 5 - X11
# 6 - reboot (Do NOT set initdefault to this)
0: halt
1: single-user mode; logging in requires no password, network drivers are not loaded by default, and some services are unavailable
2: multi-user mode, without the NFS service
3: full multi-user, command-line mode
4: reserved, unused
5: graphical (X11) mode
6: reboot
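The round trip described above can also be scripted; a minimal sketch with sed, assuming the stock RHEL 6 inittab where the initdefault line is the only one of its kind:

# Switch the default runlevel to 3 (text mode) before installing the toolkit
sudo sed -i 's/^id:5:initdefault:/id:3:initdefault:/' /etc/inittab
sudo reboot
# ... install cuda-toolkit from the text console, then restore runlevel 5:
sudo sed -i 's/^id:3:initdefault:/id:5:initdefault:/' /etc/inittab
sudo reboot

This only applies to SysV-init systems like RHEL 6; on systemd-based distributions the runlevel is controlled differently.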
This is my personal experience installing CUDA 4.0 on Ubuntu 10.04 and 10.10, following a tutorial. The setup has been tested on both 32-bit and 64-bit systems.
1. Download the NVIDIA developer driver and CUDA from the NVIDIA homepage: http://developer.nvidia.com/cuda-toolkit-archive#Linux. Choose a suitable version for your machine and system. For example, I downloaded the following files:
Developer Drivers for Linux (270.41.19)
CUDA Toolkit for Ubuntu Linux 10.10
GPU Computing SDK - complete package including all code samples (optional, but strongly suggested)
2. Blacklist some modules that might interfere with the NVIDIA developer driver. Edit the /etc/modprobe.d/blacklist.conf file and add the following lines:
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
Then save and quit.
3. Delete any existing graphics driver:
sudo apt-get --purge remove nvidia-*
Then reboot your computer.
4. Enter a virtual terminal and stop the X server. Press CTRL+ALT+F5 to leave the X session and log in on a virtual terminal. Stop the X server with:
sudo service gdm stop
or
sudo /etc/init.d/gdm stop
5. Go to the directory containing the downloaded NVIDIA files and run
sudo sh devdriver_4.0_linux_64_270.41.19.run
(the file name may differ for you). Accept the license agreement. "Install NVIDIA's 32-bit compatibility OpenGL libraries?" Answer 'Yes' - we don't know if this is actually necessary, but it does not seem to hurt. "Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X?" Answer 'Yes'.
6. Install CUDA:
sudo sh cudatoolkit_4.0.17_linux_64_ubuntu10.10.run
Choose the default install path.
7. Set environment variables (note that the variable expanded on the right-hand side of the second line must be LD_LIBRARY_PATH, not LD_PATH_LIBRARY):
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/lib64:$LD_LIBRARY_PATH
You can add these two lines to your bash profile.
8.
Install the GPU Computing SDK:
sh gpucomputingsdk_4.0.17_linux.run
(do not use sudo)
9. Some libraries. Repair the broken link to libGL.so:
sudo rm /usr/lib/libGL.so
sudo ln -s /usr/lib/libGL.so.260.24 /usr/lib/libGL.so
Create a link to libXmu.so:
sudo ln -s /usr/lib/libXmu.so.6 /usr/lib/libXmu.so
Install other libraries:
sudo apt-get install freeglut3-dev libxi-dev
10. Congratulations! You can start doing GPU computing right now. Go to the SDK directory, build all the examples, and run some of them.
I am really out of the loop. Today an old classmate told me that China's Tianhe-1 is now ranked number one among the world's supercomputers. I looked it up online, and sure enough: more than 2,700 trillion operations per second, first place. And "GPU computing" is exactly the technology powering such supercomputers. Reportedly that is how Tianhe-1 is built: a parallel CPU+GPU cluster assembled from a vast number of Nvidia graphics cards. In other words, today's supercomputer is no longer a synonym for a super CPU, but for a "GPU farm"? Also, the pictures on Nvidia's website show support for Java/Python, yet the software download area only offers C/C++. Why? After a long search, I found the links below:
Java wrapper
jCUDA: Java for CUDA
Bindings for CUDA BLAS and FFT
JaCUDA
.NET integration for CUDA
Thrust: C++ template library for CUDA
CuPP: C++ framework for CUDA
Libra: C/C++ abstraction layer for CUDA
F# for CUDA