Notes on the Data2Vis - Deep Learning Paper

Written by Jaganadh Gopinadhan

Data visualization plays an essential role in Machine Learning and Data Science. Visualization is needed at every stage, from Exploratory Data Analysis (EDA) to the presentation of model metrics and production results. Have we ever thought about using Machine Learning to generate visualizations from data? Maybe sometimes! If you have ever used Microsoft Excel 2016, Microsoft Power BI, or Tableau, you might have sensed some sort of reactive intelligence in the way they recommend or enable a visualization type. However, the paper “Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks” by Victor Dibia and Çağatay Demiralp from IBM Research tells a better story [1]. The paper reports results from a Deep Learning experiment that tries to generate a Vega-Lite visualization recommendation from a given data set.

The paper reports early results from a Deep Learning experiment. The objective is to generate a Vega-Lite-like visualization specification with the help of Machine Learning. The researchers used 4300 automatically generated Vega-Lite examples, drawn from 11 distinct data sets, as training data. The training data covered bar, area, circle, line, point, and tick charts. The data transformations covered in the samples are aggregate, bin, and time unit. The models were tested on data sets from the popular Rdatasets collection, and neither these data sets nor their visualizations were part of the training data.
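To make the target format concrete, here is a minimal sketch in Python of what specifications of this kind might look like. The mark types and transformation names follow the list above; the helper `make_spec` and the field names (`horsepower`, `date`, `price`) are hypothetical and do not come from the paper.

```python
import json

# Mark types that the training examples are said to cover.
MARKS = ("bar", "area", "circle", "line", "point", "tick")


def make_spec(mark, x, y):
    """Return a Vega-Lite-style specification as a plain dict.

    `x` and `y` are encoding channel definitions (dicts).
    This helper is hypothetical and only checks the mark type.
    """
    if mark not in MARKS:
        raise ValueError(f"unsupported mark type: {mark!r}")
    return {"mark": mark, "encoding": {"x": x, "y": y}}


# Histogram: bin a quantitative field and count the rows in each bin.
histogram = make_spec(
    "bar",
    x={"field": "horsepower", "type": "quantitative", "bin": True},
    y={"aggregate": "count", "type": "quantitative"},
)

# Time series: group a date field by year and average a measure.
trend = make_spec(
    "line",
    x={"field": "date", "type": "temporal", "timeUnit": "year"},
    y={"field": "price", "type": "quantitative", "aggregate": "mean"},
)

print(json.dumps(histogram, indent=2))
```

The two examples touch all three transformations mentioned above: `bin` and `aggregate` in the histogram, `timeUnit` and `aggregate` in the time series.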

The team used a sequence-to-sequence recurrent neural network (seq2seq RNN) [2] to build the model. Seq2seq is a very popular Deep Learning model for Machine Translation and allied Natural Language Processing tasks. The model is a bidirectional RNN-based encoder-decoder with two layers, each consisting of 256 Long Short-Term Memory (LSTM) units. The team reports that they tried both LSTM and GRU units and that the LSTMs gave better results than the GRUs. It is the first research paper to report success in using Deep Learning techniques to generate a visualization specification/grammar from raw data. Even though the authors report limitations of the current system, such as the lack of complex transformations and multivariable visualizations, the methodology is very promising.
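The translation framing can be illustrated with a small sketch: a data record plays the role of the source sentence and the visualization specification plays the role of the target sentence. The character-level tokenization, the `<s>` and `</s>` boundary markers, and the sample record are assumptions made for illustration; this post does not describe the paper's exact preprocessing.

```python
import json

START, END = "<s>", "</s>"  # assumed sequence boundary markers


def to_tokens(obj):
    """Serialize a JSON-compatible object and split it into character tokens."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return list(text)


def make_pair(record, spec):
    """Build one (source, target) training pair for an encoder-decoder model."""
    source = to_tokens(record)
    target = [START] + to_tokens(spec) + [END]
    return source, target


def build_vocab(sequences):
    """Map every distinct token to an integer id, in sorted order."""
    tokens = sorted({tok for seq in sequences for tok in seq})
    return {tok: idx for idx, tok in enumerate(tokens)}


record = {"origin": "Japan", "horsepower": 88}  # hypothetical data row
spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "origin", "type": "nominal"},
        "y": {"field": "horsepower", "type": "quantitative", "aggregate": "mean"},
    },
}

source, target = make_pair(record, spec)
vocab = build_vocab([source, target])

# Integer ids are what an encoder and a decoder would actually consume.
source_ids = [vocab[tok] for tok in source]
target_ids = [vocab[tok] for tok in target]

print("".join(source))
print(len(source_ids), len(target_ids))
```

An encoder would read `source_ids` and a decoder would be trained to emit `target_ids` one token at a time, which is the same setup used for sentence pairs in machine translation.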

With so many public data sets available, groundbreaking innovations in this area look very promising. AI-enabled systems will transform the world of visualization authoring and make self-service analytics more “Self Sufficient” in the future.

Special Note: The team provides a web application that gives a first-hand feel for the current results. You can test the system with your own data too. Check it out at http://hci.stanford.edu/~cagatay/data2vis/

[1] Victor Dibia and Çağatay Demiralp. 2018. Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks. https://arxiv.org/abs/1804.03126. Accessed on May 20, 2018.

[2] Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. 2014. On using very large target vocabulary for neural machine translation. arXiv preprint arXiv:1412.2007 (2014).

Written on May 22, 2018

The Opinions Expressed In This Post Are My Own And Not Necessarily Those Of My Employer.

[ Deep Learning Data Visualization Machine Learning Artificial Intelligence ]