Resources and tools for bioinformatics study.

Please contribute to this document if you have new recommendations or comments on the current collections.
Some small tools are also provided under ./small_tools/, such as file merge, gene list merge, gene list comparison, etc.

Created, Oct 6, 2011
Latest update, Oct 28, 2013

***************************************************************************
---------------------------------------------------------------------------
***************************************************************************

1. Data resources

1.1 Protein-protein interaction
HPRD (recommended for initial study; download the binary interaction file if quality is not a concern; pay attention to the evidence codes if you want to focus on PPIs detected by low-throughput experiments)
BioGRID
MINT
IntAct
MIPS
BIND (Note: some protein-DNA interactions are also included)
STRING (Note: many entries are functional associations rather than physical interactions)

1.2 Pathways
KEGG (You can download raw data files from its FTP site. Please look at map_title.tab and hsa_gene_map.tab for GeneID mapping)
BioCarta (No source file... Only includes the core genes related to the signaling pathway)
Reactome (Many more species available)
NetPath
NCBI PID

1.3 Gene regulations
TargetScan (predicted miRNA targets; a tool is also available)
RNA22 (miRNA target prediction tool)
miRBase (microRNA database)
TarBase (experimentally verified microRNA target database)
miR2Disease (a manually curated database that aims to provide a comprehensive resource of miRNA deregulation in various human diseases)
TRED (transcription regulation database; curated from literature)
TRANSFAC (TFBS PWMs; much useful information; please pay attention to the quality of the PWMs; a license is required)
JASPAR (Open-access PWMs)
ENCODE (a huge amount of data....)

1.4 Gene function
Gene Ontology (A review paper is recommended: Rhee et al. Use and misuse of the gene ontology annotations.
Nat Rev Genet 2008, 9:509-515.)
NCBI Gene Database

1.5 Gene expression
NCI-60 project (gene and miRNA expression from 60 cancer cell lines)
Connectivity Map (gene expression from many cell lines treated with different drugs at different dosages)
NCBI GEO (you can download .CEL raw data for further processing)
EBI ArrayExpress (I do not like the file format of ArrayExpress....)
ENCODE (many resources including RNA-seq data)
TCGA (http://cancergenome.nih.gov/ The Cancer Genome Atlas)
CCLE (http://www.broadinstitute.org/software/cprg/?q=node/11 Cancer Cell Line Encyclopedia)

1.6 Drug related
DrugBank
PubChem
ATC code
SIDER

1.7 Disease related
HPO
SIDER
OMIM
miR2Disease (a manually curated database that aims to provide a comprehensive resource of miRNA deregulation in various human diseases)
TCGA (http://cancergenome.nih.gov/ The Cancer Genome Atlas)
CCLE (http://www.broadinstitute.org/software/cprg/?q=node/11 Cancer Cell Line Encyclopedia)

1.8 Standard vocabulary
HGNC (maps many IDs/names to standard IDs; EntrezGeneID is recommended; http://www.genenames.org; sample code is given in ./id_mapper/)
UMLS (Unified Medical Language System)
MeSH (Medical Subject Headings)

***************************************************************************
---------------------------------------------------------------------------
***************************************************************************

2 Tools

2.1 Integrated portals and platforms
UCSC Genome Browser (please learn how to use the Table Browser and how to add a Custom Track)
IPA (read the documents at http://www.ingenuity.com/; a commercial integrated functional annotation system)
Expander
Cytoscape (network visualization and small-scale network analysis)
BioConductor (an R platform, including many packages for bioinformatics analysis; please read its documents)

2.2 Sequence analysis
BLAST
BLAT (compare similar and long sequences)
Bowtie (recommended for deep sequencing analysis)
ClustalX (local multiple alignment)
Sim4

2.3 Literature mining
Literature mining scripts written by Jun Yuan (please refer to the dir ./literature_mining/)

2.4 Statistical packages
fdrtool (calculates q-values from p-values, z-scores, t-scores and correlations)

2.5 Functional annotation or gene set analysis
GSEA (gene set enrichment analysis package)
DAVID web tools
Ontologizer (gene set analysis for GO terms with hierarchical information and visualization)

2.6 Gene regulation
TargetScan (miRNA target prediction)
RNA22 (miRNA target prediction; easy to use)
DME/STORM (motif analysis package first written by Andrew Smith, recommended; many other tools in the same package)
MEME (for small-scale motif analysis, very slow)
MINDy (modulator inference by network dynamics)
miRHiC (regulatory inference from hierarchical gene co-expressed signatures)
ARACNE (gene regulatory inference from large-scale gene expression data)

2.7 Microarray processing
dChip (easy to use; please refer to the documents and some scripts under ./microarray_dchip/; ComBat for adjusting batch effects)
RMA (similar usage to dChip)
SAM (Significance Analysis of Microarrays, to detect differentially expressed genes; Excel plugin/R scripts; I recommend writing your own code (t-test + FDR adjustment) to identify differentially expressed genes...)
EDGE (identifies differentially expressed genes in time-course datasets; in my experience the sample size should be more than 10)
STEM (identifies gene expression patterns from time-course datasets with a limited number of time points; easy to use, Java platform)
FastDMA (analyzer for the Illumina HumanMethylation450 BeadChip)

***************************************************************************
---------------------------------------------------------------------------
***************************************************************************

3 Conferences and Journals

3.1.1 Bioinformatics
3.1.2 PLoS Computational Biology
3.1.3 BMC Bioinformatics/Genomics/Systems Biology
3.1.4 PLoS ONE
3.1.5 Nucleic Acids Research (Computational Biology/Webserver Issue/Database Issue)
3.1.6 Nature Biotechnology/Methods (Computational Biology)
3.1.7 Quantitative Biology

3.2.1 ISMB/ECCB (Intelligent Systems for Molecular Biology) ***
3.2.2 RECOMB (Research in Computational Molecular Biology) ***
3.2.3 APBC (Asia Pacific Bioinformatics Conference) **
3.2.4 InCOB (International Conference on Bioinformatics) **
3.2.5 BIBM (IEEE International Conference on Bioinformatics and Biomedicine)
3.2.6 PSB (Pacific Symposium on Biocomputing)
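
Section 2.7 suggests writing your own "t-test + FDR adjustment" code. The FDR half can be sketched as below, as a minimal example and not the method used by fdrtool or SAM: it applies the Benjamini-Hochberg step-up adjustment to a list of p-values that are assumed to have been computed elsewhere (for example by a per-gene t-test). The function name `benjamini_hochberg` and the sample p-values are illustrative only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-gene p-values, e.g. from a t-test run elsewhere.
    pvals = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(pvals, benjamini_hochberg(pvals)):
        print(f"p={p:.3f}  adjusted={q:.3f}")
```

Genes whose adjusted value falls below a chosen threshold (0.05 is a common choice) would then be reported as differentially expressed.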
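Xiong Rongchuan
Bioinformatics Laboratory, Liupanshui Normal University
xiongrongchuan@126.com
http://blog.sciencenet.cn/u/Bearjazz

Bioinformatics can predict protein properties from existing data. Here is a fast and fairly authoritative online tool, ProtParam.
URL: http://web.expasy.org/protparam/

The site describes its function as follows:
ProtParam is a tool which allows the computation of various physical and chemical parameters for a given protein stored in Swiss-Prot or TrEMBL or for a user entered sequence. The computed parameters include the molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY).

Just paste your amino acid sequence into the input box and submit it to get predictions for the protein, such as the isoelectric point, molecular weight, molecular formula, and instability index (a value below 40 predicts a stable protein; a value above 40 predicts an unstable one).

An online tool for predicting functional sites in proteins:
http://myhits.isb-sib.ch/cgi-bin/motif_scan
Enter an amino acid sequence in the expected format in the input box (look at the example provided on the page first), tick the reference databases below the box, then search. The results show information such as protein kinase phosphorylation sites.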
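
Two of the parameters ProtParam reports, the amino acid composition and GRAVY, are simple enough to sketch locally. The example below is an illustration only and is not ProtParam's own code: it assumes a plain one-letter sequence of the 20 standard residues, uses the Kyte-Doolittle hydropathy scale, and the function names and test sequence are made up for the example.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Fraction of each residue in the sequence, keyed by one-letter code."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    # A non-standard residue code raises KeyError here.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # hypothetical sequence, for illustration only
    print(composition(example))
    print(round(gravy(example), 3))
```

A positive GRAVY value indicates a more hydrophobic protein overall, a negative value a more hydrophilic one.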
DIGITAL HUMANITIES TOOLS

DELICIOUS
Delicious is a Social Bookmarking service, whereby one may save bookmarks online, share them with others, and see what other people are bookmarking.

DIIGO
Diigo is two services in one: a research/collaborative research tool, and a knowledge-sharing community/social content site.

TWITTER
Twitter is a real-time short messaging service that works over multiple networks and devices. In countries all around the world, people follow the sources most relevant to them and access information via Twitter as it happens, from breaking world news to updates from friends.

WORDLE
Wordle is a toy for generating word clouds from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text. You can tweak your clouds with different fonts, layouts, and color schemes. The images you create with Wordle are yours to use however you like. You can print them out, or save them to the Wordle gallery to share with your friends.

WORDPRESS
WordPress is a state-of-the-art publishing platform with a focus on aesthetics, web standards, and usability. WordPress is both free and priceless at the same time. More simply, WordPress is what you use when you want to work with your blogging software, not fight it.

OPEN SOURCE SOFTWARE

FEDORA
Fedora is a Linux-based operating system that showcases the latest in free and open source software. Fedora is always free for anyone to use, modify, and distribute. It is built by people across the globe who work together as a community: the Fedora Project. The Fedora Project is open and anyone is welcome to join.

KUALI
Kuali is a growing community of universities, colleges, businesses, and other organizations that have partnered to build and sustain open-source administrative software for higher education, by higher education. Kuali software is released under the Educational Community License.

MIT OPEN COURSE WARE
MIT OpenCourseWare (OCW) is a web-based publication of virtually all MIT course content. OCW is open and available to the world and is a permanent MIT activity.

PHILOMINE
PhiloMine is a drop-in extension to current releases of PhiloLogic, to support a variety of machine learning, text mining, and document clustering tasks. It is designed to work with databases currently loaded under PhiloLogic without further modification. Like PhiloLogic, PhiloMine is a Free Software implementation designed to support research and development activities at the ARTFL Project and the Digital Library Development Center at the University of Chicago.

SILVERLIGHT
Microsoft Silverlight helps you create rich web applications that run on Mac OS, Windows, and Linux, providing a new level of engaging, rich, safe, secure, and scalable cross-platform experience.

SOFTWARE ENVIRONMENT FOR THE ADVANCEMENT OF SCHOLARLY RESEARCH
The Software Environment for the Advancement of Scholarly Research (SEASR), funded by the Andrew W. Mellon Foundation, provides a research and development environment capable of powering leading-edge digital humanities initiatives.

SOPHIE 2.0
Sophie 2.0 is open source software for writing, reading and visualizing rich media documents in an interactive, networked environment. The program emerged from the desire to create an easy-to-use application that would allow authors to combine text, images, video, and sound quickly and simply, but with precision and sophistication. Sophie's users are interested in creating robust, elegant, networked texts and multimedia works without having programming knowledge or training in the use of more complex and costly tools such as Flash. Sophie 2.0 was initially designed and developed by the Institute for the Future of the Book. In 2008, the University of Southern California's School of Cinematic Arts assumed sponsorship of Sophie 2.0 and, with a generous grant from the Andrew W.
Mellon Foundation, is significantly revising and improving a new 2.0 version to be released in the Fall of 2009. The Sophie 2.0 Project is being developed by Astea Solutions AD and additional contributors using a Java code base contributed to the project by Astea Solutions.

TEXT ENCODING INITIATIVE
The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form. Its chief deliverable is a set of Guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics. Since 1994, the TEI Guidelines have been widely used by libraries, museums, publishers, and individual scholars to present texts for online research, teaching, and preservation. In addition to the Guidelines themselves, the Consortium provides a variety of supporting resources, including resources for learning TEI, information on projects using the TEI, TEI-related publications, and software developed for or adapted to the TEI.

THOUGHTARK
An open source, free web application and collaborative space that utilizes the search behaviors of the users to determine the value of various bibliographic resources.

RESEARCH

ENVIRONMENTAL SYSTEMS RESEARCH
Founded as the Environmental Systems Research Institute, ESRI is built on the philosophy that a geographic approach to problem solving ensures better communication and collaboration. Geographic information system (GIS) technology leverages this geographic insight to address social, economic, business, and environmental concerns at local, regional, national, and global scales.

HISTORY ENGINE
The History Engine is an educational tool that gives students the opportunity to learn history by doing the work of a historian: researching, writing, and publishing.
The result is an ever-growing collection of historical articles or episodes that paints a wide-ranging portrait of life in the United States throughout its history and that is available to scholars, teachers, and the general public in our online database.

OPEN JOURNAL SYSTEMS
Open Journal Systems (OJS) is a journal management and publishing system that has been developed by the Public Knowledge Project through its federally funded efforts to expand and improve access to research. OJS assists with every stage of the refereed publishing process, from submissions through to online publication and indexing. Through its management systems, its finely grained indexing of research, and the context it provides for research, OJS seeks to improve both the scholarly and public quality of refereed research.

PHILOLOGIC
PhiloLogic™ is the primary full-text search, retrieval and analysis tool developed by the ARTFL Project and the Digital Library Development Center (DLDC) at the University of Chicago. This is a Free Software implementation of PhiloLogic for large TEI-Lite document collections. The wide array of XML data specifications and the recent deployment of basic XML processing tools provide an important opportunity for the collaborative development of higher-level, interoperable tools for Humanities Computing applications. The sophistication and power of the TEI-XML encoding specification supports the development of extremely rich textual data representations.

WORLDCAT
WorldCat connects you to the collections and services of more than 10,000 libraries worldwide.

CITATION MANAGEMENT

CONNOTEA
Connotea: Free online reference management for clinicians and scientists.

ZOTERO
Zotero is a free, easy-to-use Firefox extension to help you collect, manage, and cite your research sources. It lives right where you do your work: in the web browser itself.

ANALYTICAL RESEARCH

METADATA OFFER NEW KNOWLEDGE (MONK)
MONK is a digital environment designed to help humanities scholars discover and analyze patterns in the texts they study. The MONK project has been generously supported by the Andrew W. Mellon Foundation, from 2007-2009, and InCommon integration has been supported in 2009 by the CIC Library Directors. All code produced by the project is open source. MONK has a publicly available instance with texts contributed by Indiana University, the University of North Carolina at Chapel Hill, the University of Virginia, and Martin Mueller at Northwestern University.

NEOFORMIX
Discovering and Illustrating Patterns in Data: Blog editor who enjoys discovering the patterns in the apparent chaos of real life data and exploring new techniques for communicating in a visually compelling manner. Includes analytical project results, reviews of tools or techniques, and links to related resources.

NVIVO
Unlike statistical or quantitative software, which analyzes data using numbers, QSR software helps you to access, manage, shape and analyze detailed textual, audio and visual information. The NVivo 8 software product allows you to import, sort and analyze audio files, videos, digital photos, Word, PDF, rich text and plain text documents.

WOLFRAM|ALPHA
Wolfram|Alpha's long-term goal is to make all systematic knowledge immediately computable and accessible to everyone. We aim to collect and curate all objective data; implement every known model, method, and algorithm; and make it possible to compute whatever can be computed about anything. Today's Wolfram|Alpha is the first step in an ambitious, long-term project. Enter your question or calculation and Wolfram|Alpha uses its built-in algorithms and a growing collection of data to compute the answer.

COURSE SUPPORT DEVELOPMENT

CENTER FOR DIGITAL STORYTELLING
An international not-for-profit community arts organization rooted in the craft of personal storytelling.
We assist youth and adults around the world in using media tools to share, record, and value stories from their lives, in ways that promote artistic expression, health and well being, and justice.

MOODLE
Moodle is a Course Management System (CMS), also known as a Learning Management System (LMS) or a Virtual Learning Environment (VLE). It is a Free web application that educators can use to create effective online learning sites. Moodle.org is our community site where Moodle is made and discussed. Please explore and join in!

PACHYDERM 2.0
Multimedia authoring for peanuts. Pachyderm is an easy-to-use multimedia authoring tool. Designed for people with little multimedia experience, Pachyderm is accessed through a web browser and is as easy to use as filling out a web form. Authors upload their own media (images, audio clips, and short video segments) and place them into pre-designed templates, which can play video and audio, link to other templates, zoom in on images, and more. Once the templates have been completed and linked together, the presentation is published and can then be downloaded and placed on the author's website or on a CD or DVD ROM. Authors may also leave their presentations on the Pachyderm server and link directly to them there. The result is an attractive, interactive Flash-based multimedia presentation.

PREZI
Prezi is a living presentation tool... visualization and storytelling without slides.

SAKAI
The Sakai Collaboration and Learning Environment is developed by a community that strives to enable exceptional teaching, learning and research. Sakai collaborators, ranging from educators to engineers, share in their successes and challenges, honing the community's collective expertise to drive rapid development of this enterprise-ready platform.
While Sakai is typically used for teaching and learning (similar to products like Blackboard and Moodle) we call it a Collaboration and Learning Environment (CLE) because it embraces uses beyond the classroom. Sakai is distributed as free and open source software under the Educational Community License.

VISUAL UNDERSTANDING ENVIRONMENT
The Visual Understanding Environment (VUE) is an Open Source project based at Tufts University. The VUE project is focused on creating flexible tools for managing and integrating digital resources in support of teaching, learning and research. VUE provides a flexible visual environment for structuring, presenting, and sharing digital information.

WILLAMETTE INSTRUCTIONAL SUPPORT ENVIRONMENT
WISE, the Willamette Instructional Support Environment, is a learning and collaboration system that provides course sites for official university courses and project sites for committee work, student organizations, collaborative research projects and other university-related activities.

MULTIMEDIA

THEORA
Theora is a free and open video compression format from the Xiph.org Foundation. Like all our multimedia technology it can be used to distribute film and video online and on disc without the licensing and royalty fees or vendor lock-in associated with other formats. Theora scales from postage stamp to HD resolution, and is considered particularly competitive at low bitrates. It is in the same class as MPEG-4/DivX, and like the Vorbis audio codec it has lots of room for improvement as encoder technology develops. Theora is in full public release as of November 3, 2008.

UNESCO
UNESCO: For Young Creators. A selection of free editing software for use in creative projects. Tools for editing audio, images and web pages.
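
At the Computerworld salon on April 20, 2011, 22 free tools for data analysis were introduced. They cover data cleaning, data visualization, and data analysis. Some come from Internet companies such as IBM, Google, and Yahoo, others from universities such as MIT and Stanford; some run online and some offline. If you are anxious about producing geographic analysis graphics, or frustrated that you cannot draw a good-looking social network graph, the tools below may help.

Data cleaning tools
DataWrangler
Google Refine

Statistical analysis tools
The R Project for Statistical Computing
TimeFlow

Data visualization tools
Google Fusion Tables
Impure
Tableau Public
Many Eyes
VIDI
Zoho Reports

Code helper tools
Choosel
Exhibit

Map-related data display tools
Quantum GIS (QGIS)
OpenHeatMap
OpenLayers

Text processing tools
IBM Word-Cloud Generator

Social network tools
Gephi
NodeXL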