FRDCSA | git codebases | bashreduce


Project Description

We have a new bottleneck: we're limited by how quickly we can partition and pump our dataset out to the nodes. awk and sort begin to show their limitations: our clever awk script is a bit CPU-bound, and sort -m can only merge so many files at once. So we use two little helper programs written in C (yes, I know! It's cheating! If you can think of a better partition/merge using core Unix tools, contact me) to partition the data and merge it back.
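
To make the partition/merge idea concrete, here is a minimal Python sketch. It is not the C helpers themselves, whose source is not shown here. It assumes each record is a line whose first whitespace-delimited field is the key, and it picks a bucket with a CRC32 hash. The function names partition and merge_sorted are hypothetical, chosen for illustration only.

```python
import heapq
import zlib


def partition(lines, num_parts):
    """Split lines into num_parts buckets by hashing the first field (assumed key)."""
    buckets = [[] for _ in range(num_parts)]
    for line in lines:
        fields = line.split(None, 1)
        key = fields[0] if fields else ""
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        index = zlib.crc32(key.encode("utf-8")) % num_parts
        buckets[index].append(line)
    return buckets


def merge_sorted(streams):
    """Lazily merge any number of already-sorted streams into one sorted stream."""
    return heapq.merge(*streams)


if __name__ == "__main__":
    data = ["pear 1", "apple 2", "fig 3", "apple 4", "kiwi 5"]
    parts = partition(data, 3)
    # Each "node" sorts its own bucket; the results are then merged back.
    sorted_parts = [sorted(p) for p in parts]
    for line in merge_sorted(sorted_parts):
        print(line)
```

Because every line with the same key lands in the same bucket, each node can work on its share independently, and the heap-based merge combines all the sorted outputs in a single pass instead of merging a few files at a time.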

This page is part of the FWeb package.
Last updated Sat Oct 26 16:59:50 EDT 2019 .