A summer of open data

This is a guest post by Henna Silvennoinen, Mira Tengvall and Edith Villegas Garcia, who spent their summer at CERN, working on CMS Open Data.

The signs are in the air. And they couldn’t be more obvious. People wandering in tunnels, long queues in the canteen after the morning lectures, the local supermarket’s parking lot crowded with white bikes that have blue stickers on them…

Summer has arrived at CERN – and so have hundreds of students like us!

Mira, Edith and Henna with a CMS umbrella, in front of an LHC magnet at CERN

Spending a couple of months at CERN would be an inspirational experience for any student. In our case it’s been even more than that. We have been working on a project that not only allows us to learn a lot and use our knowledge in creating something new but also gives us the chance to access all of it after we are back in our own countries – CMS Open Data, available on the CERN Open Data Portal. We were not very familiar with CMS Open Data when we first arrived here. After we were shown where to look for data and given some tools to handle it, the rest was pretty easy to figure out by ourselves. Having someone to guide us in the right direction was very helpful, so we thought you might appreciate it as well. Let us share something we’ve been working on!

Just before we go on, stop and think about data for a moment. We are surrounded by it, aren’t we? Not just the kind used for entertainment but also statistics, news, emails… You name it. But do students know how to handle all that data? Do you? Data is being produced all the time: for example, the annual data output of the Large Hadron Collider (LHC) is about the same size as all the videos uploaded to YouTube per year. You can imagine the effort that would be required to analyse it manually. Therefore, we thought it would be important to create some material that would help students learn basic data analysis skills and get familiar with scientific data too.

Since we have been studying physics, education and mathematics, we were eager to find ways to combine all three subjects with CMS Open Data. One easy way to do this was using Jupyter Notebooks. Jupyter is an application that lets you create documents combining regular text, images, mathematical equations and programming code. With Jupyter you can easily import and explore data downloaded from the CERN Open Data Portal, and learn some Python along the way.
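To give a flavour of what a notebook cell might look like, here is a minimal sketch using only Python's standard library. It reads a small table of two-particle events and computes the invariant mass of each pair. The column names (E1, px1, ...), the numbers and the GeV units are made up for illustration and are not real CMS data; a file downloaded from the portal may use different column names.

```python
import csv
import io
import math

# Made-up stand-in for a CSV file downloaded from the portal.
# Column names and values are illustrative assumptions, not real CMS data.
SAMPLE_CSV = """E1,px1,py1,pz1,E2,px2,py2,pz2
50.0,30.0,0.0,40.0,50.0,-30.0,0.0,-40.0
13.0,5.0,0.0,12.0,5.0,3.0,4.0,0.0
"""


def load_events(text):
    """Parse CSV text into a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]


def invariant_mass(event):
    """Invariant mass of a two-particle system: M^2 = E^2 - |p|^2 (natural units)."""
    energy = event["E1"] + event["E2"]
    px = event["px1"] + event["px2"]
    py = event["py1"] + event["py2"]
    pz = event["pz1"] + event["pz2"]
    m_squared = energy**2 - (px**2 + py**2 + pz**2)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(m_squared, 0.0))


events = load_events(SAMPLE_CSV)
masses = [invariant_mass(event) for event in events]

for mass in masses:
    print(f"Invariant mass: {mass:.1f} GeV")
```

With a real file you would replace the sample text with the contents of the downloaded CSV, for instance by opening it with `open()` and passing the file object to `csv.DictReader`. From there, a histogram of the masses is usually the next step.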

During the summer we had the privilege of working with international teachers and students, people who know all about schools and teaching. They helped us to become acquainted with the high-school world and we showed them ways to use open data. To be honest, they did seem a bit terrified when we first introduced them to Jupyter but soon it all turned into enthusiasm. They were able to start creating their own notebooks and "Aha!" moments followed.

Inspired by our own experiences and the feedback received from the teachers and students, we decided to concentrate on developing materials using the Event Display and Jupyter Notebook. We created GitHub repositories to make the files available to everyone. You can go and explore our material in several languages at https://github.com/cms-opendata-education. The best part is that anyone interested in these topics can contribute and broaden the possibilities of using open data in the classroom.

All in all, we have had quite a melange of experiences during this summer and this project has given us even more than we had expected. We hope to have produced material that will help you to discover the wondrous world of open data. We will definitely continue to explore it and hope that you will join us!


The views expressed in CMS blogs are personal views of the author and do not necessarily represent official views of the CMS collaboration.