The ultimate cop-out.

The ultimate cop-out.

Computers are useless. They can only give you answers. 
—Pablo Picasso

If you understand why this makes sense, then you can skip this post. Otherwise, read on.

A few years ago my boss asked a bunch of us to come up with an optimized schedule planner for the customer support team. You've probably run into this type of problem before: There are 30 employees, and you need to schedule them to work 40 hours a week, in shifts of no more than 8 hours long, with at least 12 hours in between shifts (because they have lives to lead), and a 30 minute break per shift, such that at all times at least 5 employees are working every hour of the week. Also no back to back night shifts. Also no back to back weekend shifts. Also etc...

As I was thinking about this, something about it felt like coming up with the optimal answer was going to be near impossible. Maybe I could come up with a useful approximation. And maybe it'll change over time, but a general solution somehow felt out of my reach. 

It turns out that this type of problem is part of a class of problems which are very well known and well studied in computer science, called NP-complete problems. What I wish I'd known at the time, was that NP-complete problems are as yet unsolvable in an efficient way. And its unclear if an efficient solution is even possible (I'll talk about what efficient means in a minute). I could have just told that to my boss and started working on something else. Instead, we toiled.

But let's take a quick side step and unpack what I mean.

Setting the scene: We care only about Decision problems. That is, problems in which the output is either YES or NO.

Basically every type of problem can be reframed as a decision problem. For example:

Traveling Salesman: What's the fastest route?

Traveling Salesman Problem (TSP): Given a collection of cities connected by highways, what is the shortest route that visits every city and returns to the starting place? We can ask this like Is there a route that visits every city and returns to the starting place in under X hours?

Decision problems are easier to think about so we'll stick to those.

The Class P (polynomial time)

We say that the class of problems that can be solved efficiently are in the class P. By efficiently, we mean in polynomial time, which is to say if you started running the algorithm, your computer will produce a result which is correct in a matter of seconds, days, or maybe months, depending on the complexity of the problem. 

That's right! I made this in Clojure!

Examples of problems in P are:

  • Multiplying two numbers together
  • Finding the largest number in a list
  • Searching for a name in a list of names

We like problems in P.




The Class NP (non-deterministic polynomial time)

This is a class of problems for which we can't solve efficiently but whose solution, if correct, we can verify efficiently. For example, no one has yet to prove that the traveling salesman problem (TSP) is solvable efficiently (note that no one has proved that it can't be solved efficiently either). Therefore we say that TSP is not in the class P, because it's harder. However, if I gave you a route for TSP that I claim is a solution, then you can check whether or not I'm lying very fast. Just plug it in and see for yourself. So we can't solve NP problems efficiently but we can check them.

As I said above, TSP has not been shown to be solvable efficiently and its not been shown that it can't be solved efficiently either. We still don't know, though people are highly incentivized to figure it out. This uncertainty is called the P vs. NP problem. Basically, if one problem in NP can be solved efficiently, then we say P = NP, and ALL problems in NP can be solved efficiently. This is a big deal. Imagine if there was no problem that computers couldn't solve in a matter of minutes (or days even).

  • Public-key cryptography is made obsolete (bye bye to your security and passwords)
  • Transportation routing becomes optimal (a dream for Fedex and the Post Office)
  • Every industry that has resource allocation issues becomes optimized (basically you'll rarely have to wait for anything. Ever)
  • Weather and earthquake predictions become very accurate
  • Vision recognition and language comprehension tasks become trivial (welcome to a robot world)

And of course, schedule planning becomes easy. Basically every hard problem that Google, Facebook, Amazon, the NSA, DoD, CERN, etc... are trying to solve becomes solvable.

However, before you get too excited, note that the consensus among experts is that P probably doesn't equal NP.

The Class NP-hard

These are the class of problems which are as hard or harder than the hardest problems in NP. I won't say more about NP-hard problems than that.

The Class NP-complete

This is the intersection of NP and NP-hard problems. That is, if I can efficiently verify a solution to a problem but that problem also shown to be NP-hard, then we call it NP-complete.

Here's the rub:

To show that your problem is NP-hard involves reducing it to an NP-hard problem. Reduction is a well studied procedure in computational complexity theory, however its details are pretty hairy for a blog post. In effect you're showing that both your problem and some well known NP-hard problem are essentially the same thing.

Can you see where I'm going with this?

If you're given a problem that you can reduce to an NP-hard problem (or look online somewhere for someone else that's done it), then you've shown that no one in the field of computer science has been able to prove that its efficiently solvable.

So what chance do you have? Just explain this post to your boss and let her know that you'll be taking a lunch break instead!

Happy eating!

The cost of being Greedy

The cost of being Greedy

Learnings from teaching data science

Learnings from teaching data science