Weeks_12and13

(source: http://learnyouahaskell.com,
cc Miran Lipovača)

#12th Week

The last week before pencils down! As a small milestone, I would like to check the code again to see whether there are still ways to reduce the space consumption.

##Work To Do

Review the code and try to make further improvements with small changes.

##Work Done

###New cache structure

The program reads its input from a previously created cache. Since the input is relatively large, reorganizing the cache's file structure could help improve the running time of the program. We reorganized the cache structure and tested its functionality. The cache has to be created once so that its data can serve as a reference when looking for homologous sequences. Creating a new cache takes a while because of the large dataset, which is why we were not able to test the program's running time yet.

###Last changes to improve running time and memory consumption

This week's work was similar to that of the first week: I did many profiling runs to see what could still be improved and which data structures consume the most memory. Small changes that could improve the profiling results were included in the code, e.g.

- performGC, which forces the runtime system to perform a garbage collection. This reduces the memory consumption but costs some additional time
- rdeepseq, which forces full evaluation so that the program is not too lazy
- some additional changes in the Judy library
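How the first two changes fit together can be sketched roughly as follows. This is only a minimal illustration, not the project's actual code: `processChunk` and `processStrict` are hypothetical names, and the sketch assumes the `parallel` and `deepseq` packages are available.

```haskell
import Control.Exception (evaluate)
import Control.Parallel.Strategies (rdeepseq, using)
import System.Mem (performGC)

-- Hypothetical stand-in for one expensive processing step.
processChunk :: [Int] -> [Int]
processChunk = map (* 2)

-- Fully evaluate the result of one chunk with rdeepseq, then request a
-- major garbage collection so that garbage left over from this chunk can
-- be reclaimed before the next chunk is processed.
processStrict :: [Int] -> IO [Int]
processStrict chunk = do
  result <- evaluate (processChunk chunk `using` rdeepseq)
  performGC
  return result
```

Forcing the result before calling `performGC` matters here: an unevaluated thunk would keep its inputs alive, so the collection could not free them.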

The plot shows time and space profiling values of the different profiling runs. In test 4, performGC was included, which decreases the memory consumption a lot but increases the running time a bit. Test 5 includes parBuffer, as described in week 11. This increases the space needed but decreases the running time. I got the best combination of values in test 6, where parMap was used instead of parBuffer.
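The difference between the two variants compared in tests 5 and 6 can be sketched as below. The function `score` and the buffer size of 100 are assumptions made for illustration only; the real program's workload and parameters are not shown on this page.

```haskell
import Control.Parallel.Strategies (parBuffer, parMap, rdeepseq, using)

-- Hypothetical stand-in for the work done per input element.
score :: Int -> Int
score n = sum [1 .. n]

-- parMap: creates one spark per list element, each fully evaluated.
scoresParMap :: [Int] -> [Int]
scoresParMap = parMap rdeepseq score

-- parBuffer: keeps a rolling window of sparks (here 100 elements ahead
-- of the consumer), so the list is evaluated in parallel as it is consumed.
scoresParBuffer :: [Int] -> [Int]
scoresParBuffer xs = map score xs `using` parBuffer 100 rdeepseq
```

Both functions return the same values as `map score`; they differ only in when sparks are created, which is what affects time and space behaviour.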

The following profiling statistics show the running time and space consumption of test 6.

1,119,256,941,184 bytes allocated in the heap  
415,229,011,624 bytes copied during GC  
9,744,952,272 bytes maximum residency (356 sample(s))  
1,449,974,344 bytes maximum slop  
22476 MB total memory in use (0 MB lost due to fragmentation)  

                                Tot time (elapsed)    Avg pause    Max pause  
Gen  0     249748 colls,     0 par   83.21s   90.63s     0.0004s    0.1907s  
Gen  1       356 colls,     0 par   144.68s   188.41s     0.5292s    8.6086s  

TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)  

SPARKS: 327 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 327 fizzled)  

INIT    time    0.00s  (  0.00s elapsed)  
MUT     time  451.24s  (470.36s elapsed)  
GC      time  227.89s  (279.03s elapsed)  
RP      time    0.00s  (  0.00s elapsed)  
PROF    time    0.00s  (  0.00s elapsed)  
EXIT    time    0.01s  (  0.01s elapsed)  
Total   time  679.14s  (749.41s elapsed)  

Alloc rate    2,480,424,747 bytes per MUT second  

Productivity  66.4% of total user, 60.2% of total elapsed

The following space profiling plot is based on the program used in test 4.

#13th Week

This is the last week of GSoC 2014, and since the pencils down date is on Monday this week, the last blog entry will be a short summary of what I did over the last three months.
The plot shows the development of running time and space consumption over the weeks. The left y-axis shows the time in seconds, the right one the memory consumption in GB.
As a rough comparison between the results of the first and the last weeks, the running time decreased from around 31 minutes to 11 minutes and the memory consumption was lowered from around 53 GB to 22 GB. Regarding the trade-off between running time and space consumption, either parameter can be decreased further, at the cost of increasing the other.
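Expressed as relative reductions, these rounded figures correspond to roughly 65% less running time and roughly 58% less memory. The small helper below (a hypothetical name, not part of the project) shows the arithmetic:

```haskell
-- Relative reduction in percent when going from `before` to `after`.
reduction :: Double -> Double -> Double
reduction before after = (before - after) / before * 100

-- Running time in minutes: 31 before, 11 after.
timeReduction :: Double
timeReduction = reduction 31 11

-- Memory in GB: 53 before, 22 after.
memoryReduction :: Double
memoryReduction = reduction 53 22
```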

The final version of the code can be found here.
The profiling runs were done using the parameters `+RTS -s -H1G -A4M -N`.

#References

Back to main page.
Previous weeks.
