Lessons on Deep Learning

Originally published on the Clarify.io blog.

I just got back from what is now, without a doubt, my favorite conference: Re-Work’s Deep Learning Summit. This one was held in Boston — so we had a lot of MIT types in attendance — which made it a lot of fun.

You’re probably thinking: “A deep learning conference. Fun? Really?”

Absolutely.

Here’s the thing: Re-Work conferences aren’t academic or research conferences, and they aren’t commercial conferences. They’re both, at the same time.

Blending Academic & Commercial Conferences

[Photo: Tara Sainath of Google]

Academic and commercial conferences are the yin and yang of idea exchange. Re-Work has figured out how to bring the two together to create something truly interesting.

Before I go on, I have a confession to make: When I went to my first Re-Work conference, I resisted it. I resisted the duality. I didn’t understand why they’d have speakers that talked right past half the audience. I wanted a less interesting conference like the ones I was used to. I complained it was neither fish nor fowl.

And then I understood what should have been obvious from the start: they’re ahead of the curve and I was being a conservative stick-in-the-mud. Once the genius of Re-Work hit me, I realized that I was going to have to up my game — just like everyone else — in order to truly participate.

So what’s so great about putting researchers and vendors in the same room?

Tension!

The researchers call bullshit when vendors make ridiculous marketing claims, and vendors call bullshit when academics present results that could only have been achieved under perfect lab conditions with pristine data. A breath of fresh air!

Beyond that, researchers have to work hard to communicate with vendors while presenting enough substance to satisfy their peers. This is difficult, because the people who represent these vendors at conferences rarely understand the low-level technology in their products.

And those same people can’t get up and make ridiculous product claims. They need to talk about how they leverage technology, and they have to reveal some of their future plans. This gives the researchers some idea of the targets they should be trying to hit. For obvious competitive reasons, revealing those plans is difficult.

But overcoming these difficulties is absolutely necessary, and Re-Work is right to bring these communities together. The daylight between academic research and commercialization is disappearing. It used to take years for ideas in research papers to make it into products. Now it takes months. At Clarify, we read every academic paper in our field, and we’re capable of folding the most promising ideas into our platform within weeks.

We can do that because we don’t sell shrink-wrapped software or a piece of hardware.

We make our software available via an API. Behind our API we run a complex system that can easily accommodate processing solely for the sake of research.

For example, if we read about a new summarization algorithm, we can implement it and deploy it in parallel with our existing algorithm. Our customers will continue to get the same output while we examine the difference between the two. If the new algorithm is better, we can flip a switch and replace the original. Without any change to their software, our customers immediately benefit.
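To make the idea concrete, here is a minimal Python sketch of that kind of parallel deployment. It is not Clarify's actual implementation. The class name ShadowedEndpoint, the use_candidate switch, and the two toy summarizers are all hypothetical, made up purely for illustration. The existing algorithm keeps serving customers, the new one runs alongside it, and both outputs are recorded for later comparison.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Summarizer = Callable[[str], str]


def first_sentence(text: str) -> str:
    """Toy stand-in for the existing production algorithm (hypothetical)."""
    return text.split(". ")[0].strip().rstrip(".") + "."


def shortest_sentence(text: str) -> str:
    """Toy stand-in for a candidate algorithm from a new paper (hypothetical)."""
    sentences = [s.strip().rstrip(".") for s in text.split(". ") if s.strip()]
    return min(sentences, key=len) + "."


@dataclass
class ShadowedEndpoint:
    """Serves one algorithm while quietly running another next to it."""
    primary: Summarizer
    candidate: Summarizer
    use_candidate: bool = False  # the "switch" that promotes the candidate
    comparisons: List[Tuple[str, str]] = field(default_factory=list)

    def summarize(self, text: str) -> str:
        if self.use_candidate:
            # Switch flipped: customers now get the new algorithm's output.
            return self.candidate(text)
        served = self.primary(text)
        try:
            shadow = self.candidate(text)
            # Keep both outputs so they can be compared offline.
            self.comparisons.append((served, shadow))
        except Exception:
            # A failing candidate must never affect what the customer sees.
            pass
        return served
```

A caller would build the endpoint once, for example ShadowedEndpoint(first_sentence, shortest_sentence), and keep calling summarize() exactly as before. Setting use_candidate to True is the whole cutover, which is why nothing changes on the customer's side.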

This is powerful and it very clearly underlines why we – a commercial entity – want close relations with the academic community. The opposite is also true.

The idea that ivory tower academics don’t care about commercialization of their work is complete nonsense. Everyone wants to see their work have an impact on the world. Beyond the satisfaction of seeing their work used, academics care about two very concrete things that the commercial world can give them: money and data.

The need for money is obvious, but the need for data is critical.

Deep learning algorithms learn from data. Without it, the best algorithms are worthless. For some problems, we have training corpora, but for others, we don’t. That data has to come from the people who collect it as a side-effect of their commercial activity. And even when training corpora exist, they only get you so far. They tend to be clean and well-organized. Unfortunately, real-world data isn’t like that. It’s messy, miscategorized, and incomplete.

So while the latest computer vision algorithm may report shockingly good results when run against standard data sets, it might not do so well against arbitrary images from Flickr. The people who write those algorithms want to test them on real-world data, and the people who sell products built on those algorithms need them to be effective on any data they process. Close collaboration between researchers and vendors is vital.

Why Re-Work is Important

If the research and commercial communities must exist in ever closer symbiosis, conferences would be foolish not to work the same way. Too often, they don’t, but Re-Work is showing how it can be done. They organize conferences around high-level subject matter – like deep learning and IoT – and then apply it to real applications like connected cities, virtual personal assistants, and robotics.

These conferences might make us uncomfortable today, but we’d better get used to them fast. They are the future.

Re-Work: Deep Learning Summit, Boston