For the past few years I have acted as a “data host” for Benjamin Bach & Dave Murray-Rust’s annual Data Fair, which they run for students on the MSc in Design Informatics here at Edinburgh. This involves bringing a dataset (or two or three) for a small group of students to work with, and producing a ‘data brief’ which explains the form and extent of the dataset, why it is interesting, particular challenges they may encounter while working with it, and so forth. The students start by analysing the dataset and then must find ways to visualise it and tell the ‘data stories’ within it, in consultation with the data host, who as the domain expert can help to orient them towards the most interesting findings.

This year I was working with four students, Esteban Serrano, Oliver Ford and Xinyu Du, on a tranche of international trade law documents collected by the ToTA: Text of Trade Agreements project. They used word embeddings, topic modelling and network analysis to do some exploratory work investigating the similarities between these long and complicated documents. The work was accepted to the DH2019 conference as a poster: the abstract is available on the conference website here, and the poster itself can be downloaded by clicking the image to the right. There are some additional visualisations on the students’ website.

Students I have worked with over the years have come up with some beautiful and compelling ways to present the data visually, and if you are Edinburgh-based and have buckets of data lying around I would recommend signing up as a data host.

The material on this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please also drop me a line at the address above to let me know what you’ve done with it. Thanks!