Efficiently Summing the Highest Values in an Array With JS

Earlier today I ran into a problem posed as an interview question. At a glance it seems simple, but it's an open ended question. It leaves room for programmers to express different solutions and degrees of aptitude.

The problem went a bit like this:

Write a function that efficiently takes an array of integers and returns the sum of the two largest integers in the array. The array can be made up of any valid integers, and the length is unknown (but could be very large).

The problem could be solved with either PHP or JavaScript, so I chose JavaScript.

My first thought was that this could be done two ways (that I know of) which I could implement properly in the test scenario. One is to sort the array in descending numeric order and pluck the top 2 elements. The other was to use a for loop to run through the array and continuously pull larger values until I hit the end and had the top 2 remaining. What's better, and why? What's the context of the usage cases? Does it need to scale?

Well, the problem states the arrays 'could be very large'. Since I had a (self imposed) time limit I decided to implement something:

  • Concise
  • Reasonably fast
  • Easy to understand

This would be trivial using Array.prototype.sort().

The Initial Solution

/**
 * Sums the two largest numbers in an array of numbers.
 *
 * @param  {Array} numbers An Array of Numbers
 *
 * @return {Number}        The sum of the two largest Numbers
 */
var sumTwoLargestNumbers = function (numbers) {  
    // Ensure the array is sorted large -> small
    numbers = numbers.sort(function (a, b) {
        return b - a;
    });

    return numbers[0] + numbers[1];
};

sumTwoLargestNumbers([20, 5, 13, 7, 200]);  
// > 220

It's consice and clean, and it appears to work, so what's wrong with it? Well, I missed something pretty important:

return numbers[0] + numbers[1];

I'm adding numbers[0] and numbers[1] without ensuring those indexes will be there. That's an easy fix by adding a condition like so:

/**
 * Sums the two largest numbers in an array of numbers.
 *
 * @param  {Array}  numbers An Array of Numbers
 *
 * @return {Number}        The sum of the two largest Numbers
 */
var sumTwoLargestNumbers = function (numbers) {  
    // Is the array long enough to sum?
    if (numbers.length < 2) {
      throw new Error('Expected an array with at least 2 elements.');
    }

    // Sort the array large -> small
    numbers = numbers.sort(function (a, b) {
        return b - a;
    });


    return numbers[0] + numbers[1];
};

sumTwoLargestNumbers([20, 5, 13, 7, 15]);  
// > 35 (20 + 15)

sumTwoLargestNumbers([5]);  
// > Error: Expected an array with at least 2 elements.

That seems better, but it has a big problem still.

Inefficiency With Array.prototype.sort()

This appears to be a nice solution because it is, for the developer, highly practical to write, maintain, and use. If you'd only ever use this function on small arrays I think you could stick with it indefinitely.

The thing is, it being concise and easy to understand is misleading. I haven't reduced the amount of work done by writing less code. Instead I've increased it by outsourcing the workload to a pretty heavy algorithm. It's doing work the function doesn't imply it should do nor even needs to do; sorting the array is pointless outside of convenience. This method only simplifies my job by abstracting labour to a worker which isn't doing the right work for our task. That seems like bad programming, and it is.

The Second Solution

Later in the day, the person who posed the initial test question asked why I chose sorting over using a for loop. I didn't have a great answer apart from what is overtly beneficial about the sort solution. It's easy, it's clean, and it works.

Needless to say it became pretty clear that I should reconsider my approach. I wasn't necessarily wrong to choose sort, but I also wasn't right either. I needed a better answer and a better understanding of the problem. I sat down and wrote out the following for loop solution in around 20 minutes. My sorting function took a total of about 15, for comparison.

/**
 * Sums the two largest numbers in an array of numbers.
 *
 * @param  {Array}  numbers An Array of Numbers
 *
 * @return {Number}        The sum of the two largest Numbers
 */
var sumTwoLargestNumbers = function (numbers) {  
    var largest,
        secondLargest,
        length = numbers.length,
        // Start i at 2 so we skip the first indexes we've already used for
        // largest and secondLargest
        i = 2;

    // Is the array long enough to sum?
    if (length < 2) {
        throw new Error('Expected an array with at least 2 elements.');
    }

    // Assign our starting values
    largest = numbers[0];
    secondlargest = numbers[1];

    // Ensure we're starting with largest and secondLargest properly arranged
    if (largest < secondLargest) {
        largest = numbers[1];
        secondLargest = numbers[0];
    }

    // Loop through the numbers
    for (; i < length; i++) {
        // If the new number is greater than largest, assign it to largest and
        // pass largest's value to secondLargest.
        if (numbers[i] > largest) {
            secondLargest = largest;
            largest = numbers[i];
        }
        // If the new number isn't greater than largest, check if it's still
        // greater than secondLargest.
        else if (numbers[i] > secondLargest) {
            secondLargest = numbers[i];
        }
    }

    return largest + secondLargest;
};

sumTwoLargestNumbers([20, 5, 13, 7, 15]);  
// > 35 (20 + 15)

sumTwoLargestNumbers([5]);  
// > Error: Expected an array with at least 2 elements.

It turns out the code isn't that much larger or gross to look at. My concerns about writing an excessively verbose test were unnecessary.

The logic from the top down is straight forward and we get the same behaviours as we did with sort. Now we also have the benefit of only looking at each number two times at most here if (numbers[i] > largest) then else if (numbers[i] > secondLargest). Our worst case scenario for performance is drastically better than the sort solution.

O(n) vs. O(n log n)

When we iterate through our array with a for loop, we have the benefit of knowing that if summing an array of 100 elements takes 5ms, then summing 500 elements should take something like 25ms. This is because the complexity of the task doesn't increase at all. Our work load has increased proportionally so if we have five times the work to do, we can expect our execution time to increase by around five times. This is O(n) and it's nice to have when possible.

When we use Array.prototype.sort(), we're passing the workload to the browser's implemented sorting algorithm. As I understand, most browsers use the merge sort algorithm which has a complexity of O(n log n). It's guaranteed to be significantly slower than using a for loop in my function.

What Are the Implications?

Unless this function is a tiny one-off to do a very simple summing of values, you just shouldn't use the sort method.

That's not to say native sorting is inherently slower than using for. In my case, it's only faster to use for because we're avoiding a lot of unnecessary work by not sorting the rest of the array. All we do is continuously flip through our elements and remember the biggest ones we saw. That's a simple task.

I'm sure the sort algorithm is implemented extremely well. It's just the wrong tool for the job.

Profiling Our Solutions

I decided to take my experiment to jsperf and see what the difference is. It's clearly not a well formed test as it could benefited by far more samples and test variations... But its initial results are incredibly telling. The sort method is ~97% slower!

I can say that with a smaller array, perhaps with 5 to 10 elements, we'd probably see results with it being more like 65% slower. This is on account of the sort algorithm's workload not increasing proportionally as the for loop's workload does.

That's probably acceptable for small tasks in the client. But the test above uses only 816 elements - What if you needed to sum 10,000 elements this way? It's clear that the for loop would be essential for getting great performance from this type of function.

Lesson Learned

Don't be afraid to get your hands dirty, even if the problem seems too simple to warrant a more complex solution. It's always worth considering or even pursuing other options.

Steve Adams

I'm a web developer and designer at work, but I try to spend a lot of my time with my kids and girlfriend. I love weight lifting, running, cycling, climbing, electronics, and woodworking.

Victoria, BC