
Opportunities and Challenges for Future Memory Systems

Posted 2016-8-22 18:53 | Category: High-Performance Computing | Section: Research Notes | Keywords: high-performance computing, computer architecture, memory and storage

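Over the next few weeks we will look at where computer architecture is heading. The previous installment covered processors. This one looks at trends in memory and storage.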

The main reference is:

Onur Mutlu and Lavanya Subramanian,

"Research Problems and Opportunities in Memory Systems"

Invited Article in Supercomputing Frontiers and Innovations (SUPERFRI), 2015.

https://users.ece.cmu.edu/~omutlu/pub/memory-systems-research_superfri14.pdf

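The paper's first author is the famous Onur Mutlu. Most people working on in-memory computing will have heard of him, and he recently moved from CMU to ETHZ. Let's walk through the paper together. It covers a great many technical solutions, but here we mainly want to let the data speak and see where memory systems are heading.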

Trends in memory architecture:

Onur discusses future trends along three fronts: the systems/architecture front, the applications front, and the technology front.

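- Architecture: the emphasis here is on the QoS demands that heterogeneous processing cores place on the memory system.

- Application: as more applications keep their data resident in memory, they place different demands on the memory system. For example, graph applications care more about latency, while OLAP databases care more about bandwidth.

- Technology: NVRAM is the star, along with the long-awaited Intel 3D XPoint.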
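To see why a graph workload and an OLAP scan stress the memory system differently, here is a rough back-of-the-envelope model. It is only a sketch: the latency and bandwidth constants and the function names are my own hypothetical placeholders, not measurements from the paper.

```python
# Back-of-the-envelope model; every constant below is a hypothetical placeholder.
DRAM_LATENCY_NS = 80.0      # assumed latency of one dependent DRAM access
DRAM_BANDWIDTH_GBS = 20.0   # assumed sustained bandwidth in GB/s
CACHE_LINE_BYTES = 64


def pointer_chase_time_ms(num_hops: int, latency_ns: float = DRAM_LATENCY_NS) -> float:
    """Dependent accesses (e.g. graph traversal): each hop waits for the previous one."""
    return num_hops * latency_ns / 1e6


def scan_time_ms(num_bytes: int, bandwidth_gbs: float = DRAM_BANDWIDTH_GBS) -> float:
    """Independent sequential accesses (e.g. an OLAP column scan): bandwidth bound."""
    return num_bytes / (bandwidth_gbs * 1e9) * 1e3


if __name__ == "__main__":
    hops = 1_000_000
    touched_bytes = hops * CACHE_LINE_BYTES  # same number of cache lines in both cases
    print(f"pointer chase: {pointer_chase_time_ms(hops):.1f} ms")
    print(f"scan         : {scan_time_ms(touched_bytes):.1f} ms")
```

Under these assumed numbers, touching the same number of cache lines is far slower when every access depends on the previous one, which is why lowering latency helps the traversal while adding bandwidth helps the scan.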

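In Onur's view, the technical challenges below are the main ones. Personally, I think looking at memory in isolation is already rather limiting, and it increasingly has to be considered together with systems and applications. Many of the solutions to the three challenges Onur raises are indeed tied to the upper layers. If you are interested, read the paper itself, all 30-plus pages of it.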

1) Overcome scaling challenges with DRAM,

2) Enable the use of emerging memory technologies,

3) Design memory systems that provide predictable performance and quality of service to applications and users

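Since time is limited, I will use the paper, together with my own understanding, to explain why each of these three challenges is a problem and what its impact is. For the concrete solutions, see the paper.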

Overcome scaling challenges with DRAM

They have identified three major challenges as impediments to effective scaling of DRAM to smaller technology nodes: 1) the growing cost of refreshes, 2) increase in write latency, and 3) variation in the retention time of a cell over time.

Figure: DRAM Capacity & Latency Over Time

For the first two points, the figure below tells the story.
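A quick way to get a feel for the growing cost of refreshes is to estimate what fraction of time a DRAM rank is busy refreshing, roughly tRFC / tREFI. The sketch below assumes a 64 ms retention window covered by 8192 refresh commands. The per-density tRFC values are illustrative assumptions in the spirit of DDR3-style parts, not numbers taken from the paper.

```python
RETENTION_WINDOW_MS = 64.0          # assumed window in which every row must be refreshed
REFRESH_COMMANDS_PER_WINDOW = 8192  # assumed number of refresh commands per window

# Assumed refresh-cycle times (tRFC) per chip density, for illustration only.
ASSUMED_TRFC_NS = {"1Gb": 110.0, "2Gb": 160.0, "4Gb": 260.0, "8Gb": 350.0}


def refresh_interval_ns() -> float:
    """Average time between two refresh commands (tREFI)."""
    return RETENTION_WINDOW_MS * 1e6 / REFRESH_COMMANDS_PER_WINDOW


def refresh_busy_fraction(trfc_ns: float) -> float:
    """Fraction of time the rank cannot serve requests because it is refreshing."""
    return trfc_ns / refresh_interval_ns()


if __name__ == "__main__":
    for density, trfc in ASSUMED_TRFC_NS.items():
        print(f"{density}: busy refreshing {refresh_busy_fraction(trfc):.2%} of the time")
```

The refresh interval stays fixed while tRFC grows with density, so the busy fraction (and the energy spent on refresh) keeps climbing as chips get denser.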

Another problem that scaling brings is reliability. The paper "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors" (Kim et al., ISCA 2014) has a lot to say about memory reliability, and is worth a look if you are interested.

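The last factor is power. Some studies [Lefurgy, IEEE Computer 2003] report that memory can account for 40-50% of a whole machine's power. The ratio of course differs from machine to machine, but if future machines carry very large memories (single machines with terabytes of memory already exist), memory power cannot be taken lightly. The other piece of bad news is that DRAM consumes power even when not used (periodic refresh), which is what the earlier figure about refresh showed.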

Emerging memory technologies

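The main question here is how new memory hardware technologies (e.g., NVRAM) will change the whole DRAM ecosystem. People used to hope that NVRAM would replace DRAM, but lately a DRAM+NVRAM hybrid seems to be the more accepted view. How exactly to combine the two is still unsettled, possibly because most NVRAM technologies are still in the lab. The only one that has been announced is Intel 3D XPoint, and everyone is still waiting and waiting for it. Either way, this is an area everyone is watching, not only in computer architecture but also across applications and systems, including databases and operating systems. Onur writes: "We believe emerging technologies enable at least three major system-level opportunities that can improve overall system efficiency: 1) hybrid main memory systems, 2) non-volatile main memory, 3) merging of memory and storage."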

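The emphasis here is really on system-level. The two figures below show 1) a side-by-side (flat) architecture, and 2) NVM as an external storage component. Either one is a big challenge for the applications and systems above it. On one hand, we have a huge legacy code base, and the question is how to let that legacy code benefit from the new hardware features. The good news is that the Linux kernel community has noticed the problem and plans to support the new NVM devices. In addition, several research teams have proposed NVM-aware file systems such as BPFS, PMFS, and SCMFS, which make NVM transparent to applications at the kernel level, but at the cost of extra data copies between user space and kernel space. On the other hand, how new applications can push NVRAM to its limits, in terms of both performance and energy, still leaves room for exploration. We need to balance all of this at the system level.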
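To make the hybrid main memory idea a bit more concrete, here is a minimal sketch of one possible organization: a small DRAM tier managed as an LRU cache in front of a large NVM tier. This is my own toy model, not the design from the paper or the figures. The class name `HybridMemory` and both latency constants are hypothetical.

```python
from collections import OrderedDict

# Hypothetical access latencies, for illustration only.
DRAM_LATENCY_NS = 50
NVM_LATENCY_NS = 300


class HybridMemory:
    """Toy model: DRAM acts as an LRU-managed cache of pages whose home is NVM."""

    def __init__(self, dram_pages: int) -> None:
        if dram_pages <= 0:
            raise ValueError("dram_pages must be positive")
        self.dram_pages = dram_pages
        self.dram = OrderedDict()  # page number -> None, ordered from LRU to MRU
        self.total_latency_ns = 0
        self.accesses = 0

    def access(self, page: int) -> int:
        """Access one page and return the latency charged for it."""
        if page in self.dram:
            self.dram.move_to_end(page)        # hit: mark as most recently used
            latency = DRAM_LATENCY_NS
        else:
            latency = NVM_LATENCY_NS           # miss: served from NVM
            self.dram[page] = None             # then promoted into DRAM
            if len(self.dram) > self.dram_pages:
                self.dram.popitem(last=False)  # evict the least recently used page
        self.total_latency_ns += latency
        self.accesses += 1
        return latency

    def average_latency_ns(self) -> float:
        return self.total_latency_ns / self.accesses if self.accesses else 0.0


if __name__ == "__main__":
    mem = HybridMemory(dram_pages=2)
    for page in [1, 2, 1, 3, 2, 3]:
        mem.access(page)
    print(f"average latency: {mem.average_latency_ns():.1f} ns")
```

A real system would also have to account for migration cost, NVM write endurance, and who makes the placement decision (hardware, OS, or application), which is exactly why the system-level view matters.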

Predictable memory performance

In-memory computing is already widely used in many areas. As applications multiply and multi-core becomes ubiquitous, how the memory system can deliver predictable performance has become a big problem. In their experiments they observed slowdowns of up to 5x (figure below). Accordingly, Onur writes: "Towards this end, previous works have explored two different solution directions: 1) to mitigate interference, thereby reducing application slowdowns and improving overall system performance, 2) to precisely quantify and control the impact of interference on application slowdowns, thereby providing performance guarantees to applications that need such guarantees."
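The slowdown numbers above are usually computed as the ratio between an application's runtime when it shares the memory system and its runtime when it runs alone. Here is a minimal sketch of that bookkeeping. The runtimes are made-up values and the helper names are my own, so treat it as an illustration rather than the paper's methodology.

```python
from typing import Dict, Tuple


def slowdown(time_shared: float, time_alone: float) -> float:
    """Slowdown of one application: shared-run time over alone-run time."""
    if time_alone <= 0:
        raise ValueError("time_alone must be positive")
    return time_shared / time_alone


def unfairness(slowdowns: Dict[str, float]) -> float:
    """Ratio of the largest to the smallest slowdown (1.0 means perfectly fair)."""
    return max(slowdowns.values()) / min(slowdowns.values())


if __name__ == "__main__":
    # Hypothetical (alone, shared) runtimes in seconds for two co-running applications.
    runtimes: Dict[str, Tuple[float, float]] = {
        "app_a": (10.0, 50.0),
        "app_b": (20.0, 24.0),
    }
    slowdowns = {name: slowdown(shared, alone) for name, (alone, shared) in runtimes.items()}
    for name, value in slowdowns.items():
        print(f"{name}: {value:.2f}x slowdown")
    print(f"unfairness: {unfairness(slowdowns):.2f}")
```

The first direction in the quote tries to shrink these ratios overall. The second tries to estimate and bound them per application, which is harder because the alone-run time is not directly observable while applications are sharing the machine.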

Other related work:

There is a great deal of related work. Here I will just mention a few pieces I have read recently and found interesting:

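Yungang Bao at the Institute of Computing Technology, Chinese Academy of Sciences, proposed PARD [1], The Computer as a Network (CaaN). This project very interestingly borrows from software-defined networking to provide QoS, and can be seen as addressing the third challenge Onur raised. Reportedly, their combined hardware and software solution has already been used in a real deployment at a company.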

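Xuanhua Shi's group at Huazhong University of Science and Technology recently built a lifetime-based memory manager for in-memory computing [2]. Although this work is not about memory architecture, it makes a point: software-level management is very much needed, and many recent Java-based in-memory computing systems (e.g., Spark) are working to take control of memory themselves. Of course, as memory system architectures evolve, memory management becomes more complex and more challenging.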

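3D stacking and High Bandwidth Memory (HBM) should also be a highlight. AMD and NVIDIA GPUs will both support it in their next-generation products, and Intel KNL also ships with on-package high-bandwidth memory. How to use this memory, which is high in bandwidth and not small in capacity either (>=16GB), is a happy problem to have, haha.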

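Recently at ISCA 2016, Fred Chong, Yuan Xie and their collaborators proposed "Mellow Writes: Extending Lifetime in Resistive Memories through Selective Slow Write Backs" to extend the lifetime of resistive memory while keeping the performance loss very small [3]. The idea rests on the following two observations. First, "For typical Resistive Memory technologies, slower writes are predicted to have a quadratic endurance advantage!" Second, "Memory banks are idle for most of the time." See the figure below. (I think the experiments here could be done better by considering multi-core, multi-application scenarios.)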

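With these two observations in hand, the next step of the solution is not hard to guess: find the writes that can be done slowly without hurting performance much. See their paper for the details. Our own group, together with Hong Kong Baptist University, did a piece of work on how to use NVRAM for "Real-Time In-Memory Checkpointing for Future Hybrid Memory Systems" [4]. It has something in common with the ISCA paper above: make full use of idle periods to do useful work, while affecting application performance as little as possible.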
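To make the Mellow Writes intuition concrete, here is a heavily simplified sketch. It assumes that a write slowed down by a factor N causes 1/N^2 of the wear of a normal write (the quadratic endurance observation), and it issues a slow write only when the target bank has nothing else pending. The slowdown factor, the policy, and all names are my own assumptions, and the paper's actual mechanisms are more sophisticated.

```python
from typing import Sequence

SLOWDOWN_FACTOR = 3.0  # hypothetical: slow writes take 3x as long as normal writes


def wear_per_write(slow_factor: float) -> float:
    """Relative wear of one write, assuming endurance grows quadratically with slowdown."""
    return 1.0 / (slow_factor ** 2)


def choose_write_speed(pending_requests_on_bank: int) -> float:
    """Write slowly only when the bank is otherwise idle, so nobody waits behind the write."""
    return SLOWDOWN_FACTOR if pending_requests_on_bank == 0 else 1.0


def relative_lifetime(pending_per_write: Sequence[int]) -> float:
    """Lifetime relative to always writing at normal speed (1.0 means no gain)."""
    if not pending_per_write:
        raise ValueError("need at least one write")
    total_wear = sum(wear_per_write(choose_write_speed(p)) for p in pending_per_write)
    return len(pending_per_write) / total_wear


if __name__ == "__main__":
    # Hypothetical trace: number of other requests pending on the bank at each write.
    trace = [0, 0, 0, 1]
    print(f"relative lifetime: {relative_lifetime(trace):.2f}x")
```

The more often banks are idle when a write arrives, the more writes can be slowed down and the closer the lifetime gain gets to the square of the slowdown factor.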

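Finally, our group has also done a fair amount of work in this direction. Starting with flash memory, we were the first team to build a high-performance B+tree index (FB-Tree) [5]. After that we did a series of buffer management works (FD-buffer [6]), and collaborated with Jianliang Xu at HKBU on how to improve transaction processing performance on NVRAM [7]. Most recently, we proposed NV-tree [8] to address consistency on NVRAM.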

And last of all: the one we have all been waiting for, when are you finally going to come out?

[1] Jiuyue Ma, Xiufeng Sui, Ninghui Sun, Yupeng Li, Zhihao Yu, Bowen Huang, Tiani Xu, Zhicheng Yao, Yun Chen, Haibin Wang, Lixing Zhang, Yungang Bao. Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD). In the 20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2015.

[2] Lu Lu, Xuanhua Shi, Yongluan Zhou, Xiong Zhang, Hai Jin, Cheng Pei, Ligang He, Yuanzhen Geng. "Lifetime-Based Memory Management for Distributed Data Processing Systems". Proceedings of the VLDB Endowment (PVLDB), New Delhi, India, Sept. 5-9, 2016.

[3] Lunkai Zhang, Brian Neely, Diana Franklin, Dmitri Strukov, Yuan Xie, Frederic T. Chong. Mellow Writes: Extending Lifetime in Resistive Memories through Selective Slow Write Backs. In the proceedings of the 43rd Annual Intl. Symposium on Computer Architecture (ISCA), June 2016, Seoul, Korea.

[4] Shen Gao*, Bingsheng He, Jianliang Xu. Real-Time In-Memory Checkpointing for Future Hybrid Memory Systems. ACM ICS 2015: 2015 International Conference on Supercomputing. http://www.comp.nus.edu.sg/~hebs/pub/DRAMCheckpoint-ICS15.pdf

[5] Yinan Li, Bingsheng He, Robin Jun Yang, Qiong Luo and Ke Yi. Tree Indexing on Solid State Drives. Proceedings of the VLDB Endowment, Volume 3, Issue 1-2, September 2010, pp. 1195--1206.

[6] Sai Tung On*, Shen Gao, Bingsheng He, Ming Wu, Qiong Luo, Jianliang Xu. FD-Buffer: A Cost-Based Adaptive Buffer Replacement Algorithm for Flash Memory Devices. IEEE TC 2014: IEEE Transactions on Computers, vol. 63, no. 9, pp. 2288--2301, Sept. 2014.

[7] Sai Tung On, Jianliang Xu, Byron Choi, Haibo Hu, Bingsheng He. Flag Commit: Supporting Efficient Transaction Recovery on Flash-based DBMSs. TKDE 2012: IEEE Transactions on Knowledge and Data Engineering, Volume: 24, Issue: 9, Page(s): 1624-1639.

[8] Jun Yang, Qingsong Wei, Cheng Chen, Chundong Wang, Khai Leong Yong, and Bingsheng He. NV-Tree: Reducing Consistency Cost for NVM-based Single Level Systems. FAST'15: 13th USENIX Conference on File and Storage Technologies.

Authored by Bingsheng He (commented by Haikun Liu)

*This article is for academic exchange only. Copyright in the images and text belongs to their respective owners.
