Recently, to save storage space, NCBI has converted its data to the *.sra format and no longer provides *.fastq.gz files. A dedicated tool is therefore needed to convert downloaded *.sra files.

Download: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software
The page provides binaries for the different operating systems as well as the source code.

Conversion command:
$ fastq-dump -A SRR_accession -D Path_to_SRR_Directory -O Output_Path

Basic command options:

‘-A’ or ‘--accession’: Changes the name used for the output fastq files. For example:
    fastq-dump -A foo SRR000001
will produce files named ‘foo.fastq’, ‘foo_1.fastq’, and ‘foo_2.fastq’.

‘-D’ or ‘--table-path’: Specifies the archive path explicitly. This prevents ambiguity when more than one option is given. These two commands produce the same files:
    fastq-dump ~/SRR000001
    fastq-dump -D ~/SRR000001
However, the first command below fails while the second succeeds. This is presumably because ‘-C’ (explained further below) takes an optional ‘cskey’ argument that would otherwise swallow the path:
    fastq-dump -C ~/SRR000001
    fastq-dump -C -D ~/SRR000001

‘-N’ or ‘--minSpotId’: Minimum spot number at which to start the dump.

‘-X’ or ‘--maxSpotId’: Maximum spot number at which to stop the dump. For example:
    fastq-dump -N 5 -X 10 SRR000001
dumps six spots, starting with spot ‘SRR000001.5’ and ending with spot ‘SRR000001.10’. Because filtered spots are not written, fewer than (maxSpotId - minSpotId + 1) spots may be output.

‘-G’ or ‘--spot-group’: Boolean option that splits the fastq output into files by spot group, as defined in the Experiment (or eventually Run) XML.
This command:
    fastq-dump -G SRR051894
produces these five fragment files:
    SRR051894.fastq
    SRR051894_GDSX2KN04_PSORIASISMDA-POOL-738_CB028-01WG.fastq
    SRR051894_GDSX2KN04_PSORIASISMDA-POOL-738_CB036-01WG.fastq
    SRR051894_GDSX2KN04_PSORIASISMDA-POOL-738_CD021-01WG.fastq
    SRR051894_GDSX2KN04_PSORIASISMDA-POOL-738_CD036-01WG.fastq

‘-T’ or ‘--group-in-dirs’: Boolean option that tells the utility to write the fastq files into sub-directories instead of a single directory.

‘-O’ or ‘--outdir’: The directory where the fastq output is placed. For example:
    fastq-dump -O /tmp -T SRR000001
creates a directory SRR000001 in /tmp with this tree structure:
    $ tree /tmp/SRR000001
    /tmp/SRR000001
    |-- 1
    |   `-- fastq
    |-- 2
    |   `-- fastq
    `-- fastq

‘-K’ or ‘--keep-empty-files’: Has no effect. At one time this option output all three possible files even if one or two of them were empty.

‘-M’ or ‘--minReadLen’: Sets the minimum read length to output (default: 25). The command ‘fastq-dump -M 0 SRR000001’ disables filtering by read length.

‘-W’ or ‘--noclip’: Prevents clipping of a spot sequence based on the right clip information. You can see the effect of this option by toggling ‘show-clipped’ in the ‘customize’ area for reads in the SRA Run Browser (e.g. see SRR000001).

‘-F’ or ‘--origfmt’: Writes only the original identifier on the defline. No length or SRR identifier is included.

‘-C’ or ‘--dumpcs’: Dumps the color space sequence instead of base space. If the optional ‘cskey’ is provided (A, C, G, or T), all fastq files produced will use that key at the start of each color space sequence.

‘-B’ or ‘--dumpbase’: Dumps the base space sequence instead of color space.

‘-Q’ or ‘--offset’: Sets the offset used to encode quality values as characters in the fastq output. For example, with an offset of 64 a quality of 0 is written as ‘@’ (ASCII 64).
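Put concretely, each Phred quality score is added to the offset and the sum is written as an ASCII character. The two helpers below are a small sketch of that encoding. `qual_char` and `qual_scores` are hypothetical names and not part of the SRA Toolkit. The sketch assumes the usual offsets of 33 (the default) or 64.

```shell
# qual_char Q [OFFSET]: print the character that encodes Phred quality Q
# (hypothetical helper; OFFSET defaults to 33, use 64 for the '@'-based encoding)
qual_char() {
  awk -v code="$(( $1 + ${2:-33} ))" 'BEGIN { printf "%c", code + 0 }'
}

# qual_scores STRING [OFFSET]: decode a quality string into space-separated scores
qual_scores() {
  printf '%s\n' "$1" | awk -v offset="${2:-33}" '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }   # char -> ASCII code
    {
      out = ""
      for (j = 1; j <= length($0); j++)
        out = out (j > 1 ? " " : "") (ord[substr($0, j, 1)] - offset)
      print out
    }'
}
```

For example, `qual_char 0 64` prints `@`, and `qual_scores 'II+'` prints `40 40 10`. With the default offset of 33, the runs of ‘I’ in the fastq examples in this post therefore decode to quality 40.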
‘-I’ or ‘--readids’: Appends a read index to the run identifier, starting with ‘1’ as the first index. This differs from the spot descriptor in the Experiment XML, where read indices start at ‘0’. For SRR000001, the first spot in each file would have the identifiers ‘SRR000001.5.4’, ‘SRR000001.1.2’, and ‘SRR000001.1.4’. The first spot sequence in SRR000001.fastq, the fragment file, comes from the second biological/application read, which has an index of ‘4’.

‘-E’ or ‘--no_qual_filter’: Turns off quality filtering based on leading/trailing low-quality values. As reads have become longer, this option has become a more viable alternative.

‘-SF’ or ‘--complete’: Writes the separated reads into a single file. For example, the command:
    fastq-dump -SF SRR029338
produces a file SRR029338.fastq whose first eight lines are:
    @SRR029338.1 080115_EAS112_0034:8:1:615:780 length=36
    GGTTGAGTAAAGTGTCTAAAGGCA TAGCCTGATTAT
    +SRR029338.1 080115_EAS112_0034:8:1:615:780 length=36
    IIIIIIIIIIIIIIIIIIIAIIAI8I+7I9+II2I
    @SRR029338.1 080115_EAS112_0034:8:1:615:780 length=36
    AAAGTCAAATTTGAATTGTTGTCA GCTTGTCAAAAT
    +SRR029338.1 080115_EAS112_0034:8:1:615:780 length=36
    IIIIIIIIDIIIIIIIIIIIII.1F2II=8*2+//I
For 454 paired submissions, the second technical read (i.e. the linker) is also included in this single output file.

‘-DB’ or ‘--defline-seq’: Specifies the sequence defline format. For example:
    -DB @$ac.$si $sn length=$rl
produces the same output as the default. See Appendix D of the SRA Toolkit documentation for a more in-depth explanation. When a ‘fastq-dump’ command is submitted to a compute farm (e.g. Sun Grid Engine), several characters may need to be preceded by backslashes when using this option. The example above might then have to be written as:
    -DB @\\\$ac.\\\$si \\\$sn length=\\\$rl

‘-DQ’ or ‘--defline-qual’: Specifies the quality defline format.
For example:
    -DQ +$ac.$si $sn length=$rl

‘-alt’: Selects an alternative output format without having to specify the individual options. Alternate ‘1’, the only choice, produces this format for SRR029338_1.fastq:
    @SRR029338.1 080115_EAS112_0034:8:1:615:780/1
    GGTTGAGTAAAGTGTCTAAAGGCA TAGCCTGATTAT
    +
    IIIIIIIIIIIIIIIIIIIAIIAI8I+7I9+II2I
and this format for SRR029338_2.fastq:
    @SRR029338.1 080115_EAS112_0034:8:1:615:780/2
    AAAGTCAAATTTGAATTGTTGTCA GCTTGTCAAAAT
    +
    IIIIIIIIDIIIIIIIIIIIII.1F2II=8*2+//I

Converting *.sra files to SFF format
$ sff-dump -A SRR_accession -D Path_to_SRR_Directory -O Output_Path

Options:
-O                   Output directory. If not given, output goes to the current directory.
-N                   Minimum spot ID to output. The first spot in the output will be the number given.
-X                   Maximum spot ID to output. The last spot in the output will be the number given. -N and -X can be combined to output a subsection of an SRR.
-G spotgroup-file    Split into files by SPOT_GROUP.
-T spotgroup-dir     Split into subdirectories (of -O) by SPOT_GROUP.
-L                   Log level: 0-13 or fatal|sys|int|err|warn|info|debug (default: info). Set to ‘4’ to mimic the Unix convention of printing no messages on success.
-H                   Prints this help message and version information.

Converting *.sra files to Illumina native format
$ illumina-dump -path directory_containing_the_accession acces

Options:
-D, --table-path         Path to accession data.
-O, --outdir             Output directory. Default: '.'
-N, --minSpotId          Minimum spot ID to output.
-X, --maxSpotId          Maximum spot ID to output.
-G, --spot-group         Split into files by SPOT_GROUP (member).
-T, --group-in-dirs      Split into subdirectories instead of files.
-K, --keep-empty-files   Do not delete empty files.
-L, --log-level          Logging level: 0-13 or fatal|sys|int|err|warn|info|debug. Default: info
-H, --help               Prints this message.

Format options:
-r, --read               Output READ: seq. Default: on
-q, --qual1              Output QUALITY into a single (1) or multiple (2) files: qcal. Default: 1
-p, --qual4              Output full QUALITY: prb. Default: off
-i, --intensity          Output INTENSITY, if present: int. Default: off
-n, --noise              Output NOISE, if present: nse. Default: off
-s, --signal             Output SIGNAL, if present: sig2. Default: off
-qseq                    Output QSEQ format: qseq. Default: off
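To convert several runs in one go, the fastq-dump options above can be wrapped in a loop. This is only a sketch. `dump_runs` and the `FASTQ_DUMP` override are hypothetical names, not part of the toolkit. The function passes only the ‘-N’, ‘-X’ and ‘-O’ options documented above. Pointing `FASTQ_DUMP` at another command gives a dry run.

```shell
# dump_runs OUTDIR MIN MAX RUN...   (hypothetical helper)
# Runs "fastq-dump -N MIN -X MAX -O OUTDIR RUN" for each RUN in turn and
# stops at the first failure. FASTQ_DUMP overrides the binary (e.g. for a dry run).
dump_runs() {
  local outdir=$1 min=$2 max=$3
  shift 3
  local dump=${FASTQ_DUMP:-fastq-dump}
  local run
  mkdir -p "$outdir" || return 1
  for run in "$@"; do
    "$dump" -N "$min" -X "$max" -O "$outdir" "$run" || return 1
  done
}
```

For example, `dump_runs ./fastq 5 10 SRR000001 SRR029338` would request spots 5 to 10 of each run. Each run then yields at most 10 - 5 + 1 = 6 spots, and fewer if some spots are filtered.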
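Background: I showed earlier that hydrogen-bond information can be collected with PyMOL. However, PyMOL's support for the PDBQT format is poor, and the ADT script pdbqt_to_pdb.py only converts a single PDBQT structure into a single PDB file. So I wrote a shell script that converts a multi-structure PDBQT file (such as Vina docking output) into a PDB file. Some environment variables must be set up before using the script: vina_split and pdbqt_to_pdb.py must be reachable through PATH.

======================================pdbqts_to_pdbs.sh
#!/bin/bash
# Function : convert a multi-structure ligand PDBQT file into one PDB file
#            with a MODEL/ENDMDL block per structure
# author   : Chen Zhaoqiang
# email    : 744891290@qq.com
# date     : 2013.10.09
# required : vina_split (AutoDock Vina) and pdbqt_to_pdb.py (ADT) on the PATH
# usage    : pdbqts_to_pdbs.sh --input xxx.pdbqt

if [ "$1" != "--input" ] || [ ! -f "$2" ]; then
    echo "usage: pdbqts_to_pdbs.sh --input xxx.pdbqt"
    exit 1
fi
command -v vina_split >/dev/null || { echo "vina_split not found in PATH"; exit 1; }
input=$(basename "$2")

# work in a scratch directory, emptying it if a previous run left it behind
mkdir -p temppp
rm -f ./temppp/*
cp "$2" ./temppp/
cd ./temppp || exit 1

# split the input into one PDBQT file per structure, then drop the input copy
vina_split --input "$input"
rm -f "./$input"

# convert every split file to PDB and remove the PDBQT
for file in *; do
    echo "$file"
    pdbqt_to_pdb.py -f "$file" -o "$file.pdb"
    rm -f "$file"
done

# shorten the names, e.g. xxx.pdbqt.out_ligand_1.pdbqt.pdb -> xxx_1.pdb
for file in *; do
    short=${file/.pdbqt.out_ligand/}
    short=${short/.pdbqt/}
    [ "$file" != "$short" ] && mv "$file" "$short"
done

# merge the structures into one file, each wrapped in MODEL/ENDMDL
# (PDB format: model serial number in columns 11-14)
count=0
for file in *; do
    count=$((count + 1))
    printf 'MODEL     %4d\n' "$count" >> "$input"
    cat "$file" >> "$input"
    echo "ENDMDL" >> "$input"
    rm -f "$file"
done
echo "$count"

# name the result after the number of models, e.g. xxx_out_10.pdb
out=${input%.pdbqt}
out=${out%.pdbqt.out}
mv "$input" "${out}_out_$count.pdb"
cp ./* ../
cd .. && rm -rf temppp
===========================================================

Put this script on your PATH so that it can be called just like pdbqt_to_pdb.py.

#### A PDB ligand library must contain MODEL/ENDMDL records. Chimera does not recognize ‘*’ in PDB files, and ‘*’ cannot stand in for ordinary letters.

If the script does not work, first check that the vina_split command itself runs.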
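Because of the note above about MODEL/ENDMDL records and Chimera's rejection of ‘*’, a quick check of the merged file before loading it can save a failed import. This is a minimal sketch. `check_pdb_models` is a hypothetical helper, and it assumes the problematic asterisks appear in ATOM/HETATM records.

```shell
# check_pdb_models FILE   (hypothetical helper)
# Prints the MODEL/ENDMDL counts and the number of ATOM/HETATM lines containing '*'.
# Returns 0 only if there is at least one MODEL, every MODEL has an ENDMDL,
# and no atom line contains '*'.
check_pdb_models() {
  local file=$1 models endmdls stars
  models=$(grep -c '^MODEL' "$file")
  endmdls=$(grep -c '^ENDMDL' "$file")
  stars=$(grep -E -c '^(ATOM|HETATM).*\*' "$file")
  echo "MODEL=$models ENDMDL=$endmdls lines_with_star=$stars"
  [ "$models" -gt 0 ] && [ "$models" -eq "$endmdls" ] && [ "$stars" -eq 0 ]
}
```

For example, `check_pdb_models xxx_out_10.pdb || echo "fix the file before opening it in Chimera"`.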
http://www.bugaco.com/converter/biology/sequences/index.php
Sequence conversion, provided by Bioinf @ Bugaco

Conversion map (each input format and the output formats it can be converted to):
ace to clustal, fasta, fastq, fastq-solexa, fastq-illumina, genbank, nexus, phylip, stockholm, tab, qual
clustal to fasta, fastq, fastq-solexa, fastq-illumina, genbank, nexus, phylip, stockholm, tab, qual
embl to clustal, fasta, fastq, fastq-solexa, fastq-illumina, genbank, nexus, phylip, stockholm, tab, qual
fasta to clustal, fastq, fastq-solexa, fastq-illumina, genbank, nexus, phylip, stockholm, tab, qual
fastq to clustal, fasta, fastq-solexa, fastq-illumina, genbank, nexus, phylip, stockholm, tab, qual
fastq-solexa to clustal, fasta, fastq, fastq-illumina, genbank, nexus, phylip, stockholm, tab, qual
fastq-illumina to clustal, fasta, fastq, fastq-solexa, genbank, nexus, phylip, stockholm, tab, qual
genbank to clustal, fasta, fastq, fastq-solexa, fastq-illumina, nexus, phylip, stockholm, tab, qual
ig to clustal, fasta, fastq, fastq-solexa, fastq-illumina, genbank, nexus, phylip, stockholm, tab, qual
nexus to clustal, fasta, fastq, fastq-solexa, fastq-illumina, genbank, phylip, stockholm, tab, qual
phd to clustal, fasta, fastq, fastq-solexa, fastq-illumina, genbank, nexus, phylip, stockholm, tab, qual
phylip to clustal, fasta, fastq, fastq-solexa, fastq-illumina, genbank, nexus, stockholm, tab, qual
pir to clustal, fasta, fastq, fastq-solexa, fastq-illumina, genbank, nexus, phylip, stockholm, tab, qual
stockholm to clustal, fasta, fastq, fastq-solexa, fastq-illumina, genbank, nexus, phylip, tab, qual
swiss to clustal, fasta, fastq, fastq-solexa, fastq-illumina, genbank, nexus, phylip, stockholm, tab, qual
tab to clustal, fasta, fastq, fastq-solexa, fastq-illumina, genbank, nexus, phylip, stockholm, qual
qual to clustal, fasta, fastq, fastq-solexa, fastq-illumina, genbank, nexus, phylip, stockholm, tab
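One entry in the map, fastq-illumina to fastq, is simple enough to do locally. The sketch below assumes fastq-illumina means Phred+64 qualities and fastq means Sanger Phred+33. It also assumes strict four-line records with no wrapped sequence or quality lines. Under those assumptions the conversion subtracts 31 from every quality character. `illumina_to_sanger` is a hypothetical name. The sketch does not handle fastq-solexa, whose Solexa-scaled scores differ from Phred at low values.

```shell
# illumina_to_sanger: read fastq-illumina (Phred+64) on stdin and write
# Sanger fastq (Phred+33) on stdout. Assumes strict 4-line records.
illumina_to_sanger() {
  awk '
    BEGIN {
      # map "@".."~" (ASCII 64..126) to "!".."_" (ASCII 33..95)
      for (i = 64; i < 127; i++) map[sprintf("%c", i)] = sprintf("%c", i - 31)
    }
    NR % 4 == 0 {
      # every 4th line is a quality line: shift each character down by 31
      s = ""
      for (j = 1; j <= length($0); j++) {
        c = substr($0, j, 1)
        s = s ((c in map) ? map[c] : c)
      }
      $0 = s
    }
    { print }'
}
```

For example, `illumina_to_sanger < reads_illumina.fastq > reads_sanger.fastq`. Header, sequence and ‘+’ lines pass through unchanged. Only every fourth line is rewritten.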