Recently returned back from the 2nd Conversational Intelligence Summer School organized by the Text Machine Lab of UMass Lowell and iPavlov lab from MIPT, Moscow. The School took place in Lowell, MA, USA, one of the first US industrial cities with remaining spirit of the industrial revolution.
Content-wise the school was pretty packed with fundamentals of NLP as well as bleeding edge of the progress in dialogue systems. We had 5.5 days of lectures, hands-on tutorials, invited talks, different cuisines, and, of course, team projects!
The first day was dedicated to getting aspiring folks on one track. Folks could refresh their knowledge of basic neural nets and words representations. There were two tutorial tracks similar with the content but different in implementation libraries - one for PyTorch and another one for DeepPavlov. I chose the PyTorch track, and D1 tutorials were all about introduction and basic FFNN architectures.
The first keynote was delivered by Sasha Rush from Harvard where he discussed the details of conditional text generation for a manifold of dialogue scenarios. I got especially impressed by the advancements in the question answering over tabular data. Namely, when you have tables in PDF or other formats and want to perform QA over those sources without setting up a database. That’s something companies are dreaming to have!
A brilliant idea of the organizers is to have a one-hour informal networking session where you could switch from a listener-mode to a more relaxing-mode (I wish I could say ‘no ties’ but it was super-hot all week and nobody ever wore a tie) and approach the speaker personally with your questions.
The second day started with a refreshment on convolutional and basic recurrent nets followed by the tutorial on sentence classification with word embeddings. You see, already on the second day you could build a decent sentence classification (definitely in top-5 most sought after text analysis tasks in industry)
The keynote by João Sedoc from University of Pennsylvania was dedicated to evaluation of dialogue agents. Right now in the literature there clearly exists an abundance of various metrics of how researchers evaluate quality of generative dialogue agents, i.e., when answer is not taken from some template but rather sampled from a latent distribution. Lots of authors invent their own metrics and report state-of-the-art on those in order to be accepted at top conferences. The question, however, is not just about metrics but about the methodology and evaluators. You’d probably agree that an agent considered polite in the US might seem as a totally arrogant piece of machinery in other cultures. So that João’s team released ChatEval, a tool that compares in a blindfold manner your agent against another agent and human responses. I’m convinced the the tool would allow for fighting bias in the data, promoting diversity in generated responses, and tweaking certain agent’s traits depending on a user.
D3 aimed at presenting bread & butter of current NLP research and practitioners: Transformers, Transformer-based architectures, and contextualized embeddings. Yes, it was all about ELMo, BERT, GPT and GPT-2. In the tutorial session we had some hands-on experience building seq2seq dialogue agents with attention.
The third keynote was delivered by great Kate Saenko from Boston University. She covered topics on the edge between computer vision and NLP, e.g., visual QA, enhanced label generation (for instance, “a quick brown fox jumps over a lazy dog” instead of “a fox and a dog”). It’s worth noting that Kate’s team proposed an early-fusion model to join visual and text representations, and the results are pretty compelling. What I particularly took home after the talk - CV folks already have a set of pre-trained models of varying complexity that you could plug into your task according to your requirements - still a long way for the NLP community. Additionally, joining multi-modal representations might lead to some really good results - I’d like to test that hypothesis with graph embeddings.
The fourth day started advanced topics (well, if monstrous deep multi-head attention transformers were not that advanced for you), namely memory networks, or how to keep and utilize context knowledge and/or knowledge obtained from a user. In the second part of the day we delved into Deep QA topics The tutorial went into details on fine-tuning BERT for different tasks - from question answering till natural language inference.
Event of the day was, however, a tour around Lowell! A boat trip, trolley rides, and a visit to an old mill (still manufactures by the way). Kudos to the scouts - I like listening to stories and anecdotes about places we travel to.
On the fifth day of Christmas I got an impression, the fifth day was anticipated most. We started with a discussion how to build and evaluate multi-skill conversational agents. That is, they can integrate both goal-oriented skills (a-la “hey Google, start my robot vacuum in the living room”) as well as chit-chat that has to be entertaining enough and not boring. The topic is closely related to natural language generation and dialogue state tracking. Secondly, we delved into hierarchical models that are actually able to grasp the context of a dialogue and memorize on a higher level peculiarities of a conversation. And finally, we studied approaches for improving dialogue diversity, e.g., with a latent variable from variational autoencoders (VAE) or Z-forcing. Those methods are pretty much state of the art, so attendants have no excuse now not to read the most recent papers built on top of those
The last keynote by Jason Weston from Facebook was about putting together the threads of conversational AI. Jason recognizes four main dimensions of improving interaction between humans and conversational agents:
- Engaginess - for agents to have personality, curiosity and diversity, ideally without tons of hard-coded rules
- Expertness - for agents to be knowledgeable and correct, especially relevant for goal-oriented domains
- Multimodality - for agents to be able to interact not only via text but understand other media types, e.g., images and sounds
- Continuous learning - for agents to grow and learn from experience, it is also about enabling agents to ask more questions, not just asking/reacing on user input
Well, it is admittedly still a way to go for conversational AI to accomplish those tasks. Let’s build more datasets! Actually, a new resource Beat The Bot from ParlAI has recently been released as a game for humans to play a character with certain traits and at the same time as a training dataset for conversational agents.
Half of Saturday was dedicated to wrapping things up and presenting the results of team projects. Folks managed to create spectacular and entertaining agents:
First place: Visual Dialog Game by Thomas Depierre, Sashank Santhanam, Mark Seliaev, Jon Ander Campos. The bot gives you four similar pictures (e.g., of trains) and memorizes one which user has to guess by asking questions against a picture, for instance, “is it a green train?”. R-CNN with attention allows to actually visualize a relevant part of a picture from a user question (e.g., highlight a green train). Sometimes a mischievous bot answers not what you’d expect :)
Second place: The Oracle bot by Beatriz Albiero, Estevão Uyrá, Manuel Ciosici. Pretty simple but very cool idea - given a user name, OpenAI GPT-2 model generates a life story, and then this story is fed as a context into a SQuAD-pretrained QA-system which can answer questions about your “future”. Works pretty well actually, and generates lots of puns - some girls realized they are going to be fathers by the end of the year and some
The project of our team aimed at improving question answering over knowledge graphs, namely, Wikidata in our case. Among some novelties we employed a zero-shot learning approach for predicting unseen relations (having trained on 120 relations we’d like to scale it up to 5000) and Wikidata graph embeddings published recently by PyTorch-BigGraph. I am really grateful to my team members for hard-working days and nights after the lecturing program
All the keynotes, slides and code are available in the github repo. Be sure to check them out! Videos are coming soon on YouTube for those who couldn’t attend personally. Many thanks to the organizing team for a great week full of super-interesting talks, delicious food and broad networking opportunities! See you next time