Histogramming-based large-scale parallel sorting algorithm

We design a new large-scale parallel sorting algorithm that combines sampling and histogramming. We prove guarantees for the algorithm and show that it is essentially optimal in a certain sense. We implement it in a highly parallel astronomical application and demonstrate runtime benefits from our algorithm.
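
To make the idea of combining sampling with histogramming concrete, the following is a minimal single-process sketch, not the algorithm analyzed in this work. It assumes each simulated processor holds a locally sorted list, draws a few random keys per processor as splitter candidates (sampling), computes each candidate's global rank by summing local ranks (histogramming), and keeps the candidates closest to the ideal ranks. The names `choose_splitters`, `parallel_sort_sim`, and `sample_per_part` are hypothetical, and a real implementation would exchange the samples and histograms between processors and may refine the splitters over several rounds.

```python
import bisect
import random


def choose_splitters(parts, sample_per_part, rng):
    """Pick p-1 splitters for p simulated processors (one round only).

    parts: list of locally sorted lists, one per simulated processor.
    """
    p = len(parts)
    n = sum(len(part) for part in parts)

    # Sampling: every processor contributes a few random keys as candidates.
    candidates = sorted({
        key
        for part in parts
        for key in rng.sample(part, min(sample_per_part, len(part)))
    })
    if not candidates:
        return []

    # Histogramming: the global rank of a candidate is the sum of its local
    # ranks (number of keys <= candidate on each processor).
    ranks = [
        sum(bisect.bisect_right(part, c) for part in parts)
        for c in candidates
    ]

    # For each ideal rank i*n/p, keep the candidate whose rank is closest.
    splitters = []
    for i in range(1, p):
        target = i * n // p
        best = min(range(len(candidates)), key=lambda j: abs(ranks[j] - target))
        splitters.append(candidates[best])
    return splitters


def parallel_sort_sim(parts, sample_per_part=32, seed=0):
    """Simulate the sort: local sort, splitter selection, bucket exchange."""
    rng = random.Random(seed)
    parts = [sorted(part) for part in parts]
    splitters = choose_splitters(parts, sample_per_part, rng)

    # Bucket i receives every key k with splitters[i-1] < k <= splitters[i].
    buckets = [[] for _ in parts]
    for part in parts:
        for key in part:
            buckets[bisect.bisect_left(splitters, key)].append(key)
    return [sorted(bucket) for bucket in buckets]
```

Concatenating the returned buckets in order yields the globally sorted sequence. How evenly the buckets are filled depends on the sample size, which is the kind of trade-off that guarantees for such an algorithm would quantify.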