AI has seriously improved its ability to generate the written word


Over the past year, AI has seriously improved its ability to generate the written word. By scanning huge datasets of text, machine learning software can produce convincing samples of everything from short stories to song lyrics. Now, those same techniques are being applied to the world of coding with a new program called Deep TabNine.

Deep TabNine is what’s known as a coding autocompleter. Programmers can install it as an add-on in their editor of choice, and when they start writing, it’ll suggest how to continue each line, offering small chunks at a time. Think of it as Gmail’s Smart Compose feature but for code.

Jacob Jackson, the computer science undergrad at the University of Waterloo who created Deep TabNine, says this sort of software isn’t new, but machine learning has hugely improved what it can offer. “It’s solved a problem for me,” he tells The Verge.

Jackson started work on the original version of the software, TabNine, in February last year before launching it that November. But earlier this month, he released an updated version that uses a deep learning text-generation algorithm called GPT-2, which was designed by the research lab OpenAI, to improve its abilities. The update has seriously impressed coders, who have called it “amazing,” “insane,” and “absolutely mind-blowing” on Twitter.

One user, Franck Nijhof, an IT manager who works on open-source home automation software in his spare time, says he wasn’t just surprised by Deep TabNine — he was scared, in the best possible way. “The first hour I used Deep TabNine was not helpful [because] I was continuously stopped by amazement trying to wrap my head around it,” Nijhof told The Verge over email. He kept asking himself, “How does it know that? But how?”

Autocompletion tools like this aren’t new, but Nijhof says Deep TabNine’s suggestions are just much more accurate. “I’ve tried some smart ‘universal’ ones in the past, but they were annoying and not helpful,” he says. “TabNine is undoubtedly a game-changer.”

The software offers better suggestions because it works on a predictive basis, says Jackson. Most autocompleters have to parse what the user has already written to make suggestions, working through their code like you would work through the steps in a mathematical formula. Deep TabNine, by comparison, relies on the ability of machine learning to find statistical patterns in data to make its predictions.

In the same way that text generation algorithms are trained on huge datasets of books, articles, and movie scripts, Deep TabNine is trained on 2 million files from coding repository GitHub. It finds patterns in this data and uses them to suggest what’s likely to appear next in any given line of code, whether that’s a variable name or a function.
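To make the idea of learning statistical patterns from code concrete, here is a toy sketch of next-token prediction trained on a handful of code-like lines. This is purely illustrative: Deep TabNine actually uses the far more powerful GPT-2 model, not the simple bigram counts shown here, and the training lines are made up.

```python
from collections import Counter, defaultdict

def train(corpus_lines):
    """Build bigram counts: for each token, count which tokens follow it.

    A toy stand-in for the statistical patterns a real system learns
    from millions of GitHub files.
    """
    model = defaultdict(Counter)
    for line in corpus_lines:
        tokens = line.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev][nxt] += 1
    return model

def suggest(model, prev_token, k=3):
    """Return up to k of the most likely continuations of prev_token."""
    return [tok for tok, _ in model[prev_token].most_common(k)]

# A tiny, hypothetical "training set" of tokenized code lines.
corpus = [
    "for i in range ( n ) :",
    "for item in items :",
    "for i in range ( len ( items ) ) :",
]

model = train(corpus)
print(suggest(model, "for"))  # → ['i', 'item']
```

After seeing “for” followed by “i” twice and “item” once, the model ranks “i” as the most likely continuation — the same prediction-by-frequency intuition, scaled up enormously, is what lets Deep TabNine guess a plausible variable name or function call.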

Using deep learning to create autocompletion software offers several advantages, says Jackson. It makes it easy to add support for new languages, for a start. You only need to drop more training data into Deep TabNine’s hopper, and it’ll dig out patterns, he says. This means that Deep TabNine supports some 22 different coding languages while most alternatives just work with one.

(The full list of languages Deep TabNine supports is as follows: Python, JavaScript, Java, C++, C, PHP, Go, C#, Ruby, Objective-C, Rust, Swift, TypeScript, Haskell, OCaml, Scala, Kotlin, Perl, SQL, HTML, CSS, and Bash.)

Most importantly, thanks to the analytical abilities of deep learning, the suggestions Deep TabNine makes are of a high overall quality. And because the software doesn’t look at users’ own code to make suggestions, it can start helping with projects right from the word go, rather than waiting to get some cues from the code the user writes.