Juggling with Terabytes: From Higgs Physics to the Last Known Digit of Pi

Guest post by Peter Trüb, former CMS Collaborator, member of DECTRIS, a spin-off of the Paul Scherrer Institute


Today is 14th March. This is not only Albert Einstein's birthday but, thanks to the unintuitive English date notation 3/14, also the official Pi Day. As we learned at school, pi is an irrational number. This has the nice consequence that there is a never-ending quest to compute pi to more and more digits. The list of records starts with the single digit 3, as used by ancient peoples several thousand years ago. Only one and a half years ago, I had the opportunity to contribute the latest entry to the list by computing the first π^e trillion (roughly 22.4 trillion) digits of pi. If you would like to learn how my experience in doing Higgs analysis with the CMS experiment helped me achieve this record, and why Switzerland is a natural place for such an undertaking, you are kindly invited to read through the rest of this post.

Pi day is often celebrated with prettily decorated pies

My interest in π dates back to my high school days at the Alte Kantonsschule in Aarau, the school where Albert Einstein also obtained his university-entrance diploma. In our classroom, my maths teacher had pinned several printouts full of digits of π to the wall just behind my seat. He also told us about a bizarre club of friends of pi, which you can only join if you are able to recite the first 100 digits of pi by heart. So my friend and I started to memorise the digits of π, but somehow I never managed to get beyond the first 20 digits.

After qualifying for university, I followed in the steps of Albert Einstein to the Eidgenössische Technische Hochschule in Zurich to learn more about physics. After my diploma, I joined the CMS pixel group at the Paul Scherrer Institute, not far from Zurich. There I was responsible for qualifying the silicon modules for the barrel pixel detector, which were being constructed by other members of the group at the time. Besides this hardware-related work, I investigated how the Higgs particle could be found through its decay to tau leptons. For this study I had to generate and analyse several terabytes of Monte Carlo events on the LHC grid. This was my first experience in handling very large data sets, at a time when a 1 TB hard drive still cost around 300 dollars.

The CMS barrel pixel module

A few years earlier, it had been realised that the hybrid pixel technology could also be used for X-ray detection, and a team at the Paul Scherrer Institute developed this technology for use at the Swiss Light Source. Due to the high demand for such detectors, the spin-off company DECTRIS was founded to commercialise the technology. After my PhD I joined DECTRIS and started to work on hybrid photon counting detectors. With the introduction of the EIGER detector series, our high-end products could operate at a frame rate of up to 3 kHz and produced a data stream of 5 GB/s. To help our customers with this flood of information, we had to build up the expertise to analyse and store this data efficiently.

Random walk in three dimensions based on the first 10^5 digits of pi

It was during that time that I noticed that these are exactly the skills needed for a world record pi computation. Interestingly, the limiting factor for computing pi to trillions of digits is not CPU power but storage bandwidth. Today, the fastest way to compute π is to use the Chudnovsky formula, a rapidly converging series which yields about 14 additional digits with every term. To get the desired precision, each term of the series had to be computed to 22.4 trillion digits. Unfortunately, such numbers do not fit into the memory of today’s computers, unless you have a fantastically high budget available. To limit the costs, intermediate results have to be stored on disk, which makes storage bandwidth the bottleneck of the computation.
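To give a feeling for the Chudnovsky series mentioned above, here is a minimal Python sketch that sums it term by term using plain integer fixed-point arithmetic. It is only an illustration and not how y-cruncher works internally: record-scale programs rely on far more sophisticated techniques such as binary splitting and disk-backed multiplication. The function names `chudnovsky_pi` and `digits_per_term`, as well as the choice of 10 guard digits, are assumptions made for this example.

```python
from math import isqrt, log10


def digits_per_term() -> float:
    """Asymptotic number of decimal digits gained per Chudnovsky term."""
    # Successive terms shrink by a factor approaching 640320^3 / 1728.
    return log10(640320 ** 3 / 1728)


def chudnovsky_pi(n_digits: int, guard: int = 10) -> str:
    """Return pi truncated to n_digits decimal places, e.g. '3.1415'.

    Uses pi = 426880 * sqrt(10005) / S with
    S = sum_k (-1)^k (6k)! (13591409 + 545140134 k) / ((3k)! (k!)^3 640320^(3k)).
    All quantities are integers scaled by 10^(n_digits + guard).
    """
    scale = 10 ** (n_digits + guard)
    c3 = 640320 ** 3

    term = scale                  # |(6k)! / ((3k)! (k!)^3 640320^(3k))| for k = 0
    total = 13591409 * scale      # k = 0 contribution to S
    k = 0
    while term:
        # Ratio of consecutive magnitudes: 8(6k+1)(6k+3)(6k+5) / ((k+1)^3 640320^3)
        term = term * 8 * (6 * k + 1) * (6 * k + 3) * (6 * k + 5) // ((k + 1) ** 3 * c3)
        k += 1
        contribution = term * (13591409 + 545140134 * k)
        total += -contribution if k % 2 else contribution  # alternating sign

    sqrt_10005 = isqrt(10005 * scale * scale)   # sqrt(10005) in fixed point
    pi_fixed = 426880 * sqrt_10005 * scale // total
    digits = str(pi_fixed // 10 ** guard)       # drop the guard digits
    return digits[0] + "." + digits[1:]


if __name__ == "__main__":
    print(chudnovsky_pi(50))
    print(f"{digits_per_term():.2f} digits per term")
```

At a little over 14 digits per term, 22.4 trillion digits correspond to roughly 1.6 trillion terms of the series, each involving numbers that are themselves trillions of digits long. This is why a naive loop like the one above is hopeless at record scale.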

Since we already had fast multi-core servers at DECTRIS, our only task was to set up fast storage with 120 TB of disk space. It consisted of 20 disks of 6 TB each, which we could read and write in parallel with a total bandwidth of 4 GB/s. After downloading and starting y-cruncher (the program used for all recent records), we only had to back up intermediate checkpoints of the computation at regular intervals. After three and a half months, y-cruncher had read and written more than 8 petabytes of data without a single disk failure. When I saw the last digits, I was surprised. They read 237, my date of birth in the more intuitive German convention. Thanks to hybrid pixel detectors, DECTRIS, PSI, and CERN, the record has come to Switzerland. The local club of the friends of pi rejoiced and kindly granted me a membership, even though I am still unable to recite the first 100 digits of pi by heart.
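The numbers quoted above can be put into perspective with a little back-of-the-envelope arithmetic. The following Python sketch is only a rough estimate: it assumes decimal units (1 PB = 10^6 GB), takes three and a half months to be about 105 days, and treats "more than 8 petabytes" as exactly 8 PB, so the results are lower bounds. The function name `storage_estimate` is made up for this example.

```python
def storage_estimate(
    n_disks: int = 20,
    disk_tb: float = 6.0,
    bandwidth_gb_s: float = 4.0,
    data_pb: float = 8.0,
    run_days: float = 105.0,   # assumption: 3.5 months of roughly 30 days
) -> dict:
    """Rough figures for the disk array described in the text."""
    data_gb = data_pb * 1e6                 # assumption: 1 PB = 10^6 GB
    seconds_per_day = 86400
    return {
        # Total capacity of the array in TB
        "capacity_tb": n_disks * disk_tb,
        # Days needed just to move the data at the full quoted bandwidth
        "min_transfer_days": data_gb / bandwidth_gb_s / seconds_per_day,
        # Average throughput over the whole run in GB/s
        "avg_throughput_gb_s": data_gb / (run_days * seconds_per_day),
    }


if __name__ == "__main__":
    for name, value in storage_estimate().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, simply moving 8 PB at the full 4 GB/s would already take about 23 days, and the average throughput over the whole run comes out at a little under 0.9 GB/s.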

Summary of the computation showing the last known digits of pi

The views expressed in CMS blogs are personal views of the author and do not necessarily represent official views of the CMS collaboration.