I have thousands of DNA sequences ranging from 100 to 5000 bp, and I need to align specified pairs and calculate their identity scores.

Biopython's pairwise2 does a nice job, but only for short sequences. When the sequence size gets bigger than 2 kb it shows a severe memory leak that leads to 'MemoryError', even when the 'score_only' and 'one_alignment_only' options are used!

    whole_coding_scores = {}
    for genes in whole_coding:  # whole_coding is a <25 MB dict providing DNA sequences
        alignment = ...(whole_coding[...], whole_coding[...], score_only=True, one_alignment_only=True)
        whole_coding_scores[genes] = alignment / min(len(whole_coding[...]), len(whole_coding[...]))

Result returned from supercomputer:

    Max vmem = 256.114G  # Memory usage of the script
    Job 4945543.1 died through signal XCPU (24)

I know there are other tools for alignment, but they mostly just write the score to an output file, which then needs to be read and parsed again to retrieve and use the alignment scores.

Is there any tool that can align the sequences and return the alignment score inside the Python environment, as pairwise2 does, but without the memory leak?

First, I used BioPython's needle for that. A nice howto (ignore the legacy design :-) ) can be found here.

Second: maybe you can avoid getting the whole set into memory by using a generator? I do not know where your 'whole_coding' object is coming from. But if it is a file, make sure you do not read the whole file first and then iterate over the in-memory object. For example:

    whole_coding = open('big_file', 'rt').readlines()  # Will consume memory

but

    for gene in open('big_file', 'rt'):  # will not read the whole thing into memory first
        ...

If your data needs processing, you could write a generator function:

    def gene_yielder(filename):
        for line in open(filename, 'rt'):
            line = line.strip()  # Here you preprocess your data
            yield line
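To make the generator suggestion concrete, here is a minimal sketch that streams sequence pairs from a file and scores each pair while keeping only two rows of the dynamic-programming table in memory. It is a pure-Python stand-in, not Biopython's implementation. The tab-separated file layout (gene id, first sequence, second sequence) and the names `global_score` and `identity_scores` are assumptions made for illustration. The scoring scheme (match = 1, mismatch = 0, no gap cost) is meant to mirror what pairwise2's `globalxx` is documented to use.

```python
import os
import tempfile


def gene_yielder(filename):
    """Yield (gene_id, seq_a, seq_b) one record at a time.

    Assumes a hypothetical tab-separated layout:
    gene id, first sequence, second sequence, one record per line.
    """
    with open(filename, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            gene_id, seq_a, seq_b = line.split("\t")
            yield gene_id, seq_a, seq_b


def global_score(seq_a, seq_b):
    """Best global alignment score with match=1, mismatch=0, gap=0.

    Only two rows of the table are kept, so memory grows with the
    shorter sequence rather than with len(seq_a) * len(seq_b).
    """
    if len(seq_b) > len(seq_a):
        seq_a, seq_b = seq_b, seq_a  # make seq_b the shorter one
    previous = [0] * (len(seq_b) + 1)
    for char_a in seq_a:
        current = [0]
        for j, char_b in enumerate(seq_b, start=1):
            if char_a == char_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def identity_scores(filename):
    """Score every pair in the file; sequences are assumed non-empty."""
    scores = {}
    for gene_id, seq_a, seq_b in gene_yielder(filename):
        score = global_score(seq_a, seq_b)
        scores[gene_id] = score / min(len(seq_a), len(seq_b))
    return scores


# Small demonstration with a throwaway file.
with tempfile.NamedTemporaryFile("wt", suffix=".tsv", delete=False) as tmp:
    tmp.write("gene1\tACGTACGT\tACGTTCGT\n")
    tmp.write("gene2\tAAAA\tTTTT\n")
    demo_path = tmp.name

print(identity_scores(demo_path))
os.remove(demo_path)
```

Only the per-gene scores are kept in the result dict, so memory use stays small no matter how many pairs the file holds. The trade-off is speed: a pure-Python double loop over two 5000 bp sequences touches about 25 million cells, so for thousands of long pairs a compiled aligner is likely still the better choice.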