International Competition
The 4th APAC HPC-AI Competition challenged student teams from across the APAC region to perform various computing tasks and achieve the highest benchmark scores. The competition was divided into two parts, GROMACS and DLRM, both of which required extensive computing and cluster tuning skills.
DLRM
DLRM (Deep Learning Recommendation Model) is a deep learning model created by Facebook for personalization and recommendation systems. It takes both dense (numerical) and sparse (categorical) features as input. The sparse features are mapped to dense vectors through embedding tables, while the dense features pass through a bottom MLP. The resulting vectors are combined in a feature-interaction step and fed to a top MLP, which outputs the probability of a click. In this competition, all participants were required to implement the MLPerf benchmark.
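To make the data flow concrete, here is a minimal pure-Python sketch of a DLRM-style forward pass. It is an illustration only, not the MLPerf reference implementation: the layer sizes, table sizes, random weights, and helper names (`make_mlp`, `run_mlp`, `dlrm_forward`) are all assumptions chosen to keep the example small.

```python
import math
import random

def make_mlp(sizes, rng):
    """Build a list of (weights, biases) layers with small random weights."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def run_mlp(layers, x, final_relu=True):
    """Apply each layer; ReLU follows every layer, optionally skipping the last."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        is_last = i == len(layers) - 1
        if final_relu or not is_last:
            x = [max(0.0, v) for v in x]
    return x

def dlrm_forward(dense, sparse_ids, tables, bottom_mlp, top_mlp):
    # Dense features go through the bottom MLP, giving a vector of embedding size.
    dense_vec = run_mlp(bottom_mlp, dense)
    # Each sparse (categorical) feature is an index into its own embedding table.
    vectors = [dense_vec] + [table[idx] for table, idx in zip(tables, sparse_ids)]
    # Feature interaction: dot product of every distinct pair of vectors.
    interactions = [
        sum(a * b for a, b in zip(vectors[i], vectors[j]))
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    ]
    # The top MLP maps [dense vector + interactions] to a single logit.
    logit = run_mlp(top_mlp, dense_vec + interactions, final_relu=False)[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid gives a click probability

rng = random.Random(0)
EMB_DIM = 4              # hypothetical embedding size
NUM_DENSE = 3            # hypothetical number of dense features
TABLE_SIZES = [10, 20]   # hypothetical category counts per sparse feature

tables = [[[rng.uniform(-0.1, 0.1) for _ in range(EMB_DIM)] for _ in range(n)]
          for n in TABLE_SIZES]
bottom_mlp = make_mlp([NUM_DENSE, 8, EMB_DIM], rng)
num_vectors = 1 + len(TABLE_SIZES)
num_pairs = num_vectors * (num_vectors - 1) // 2
top_mlp = make_mlp([EMB_DIM + num_pairs, 8, 1], rng)

prob = dlrm_forward([0.5, 1.0, -0.3], [3, 17], tables, bottom_mlp, top_mlp)
print(prob)
```

In a real model the embedding tables hold most of the parameters, which is one reason distributing DLRM across a cluster is harder than distributing a plain MLP.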
In this competition, my teammate and I were responsible for the DLRM task.
First, we spent months reading papers, analyzing features, and attempting to reproduce the model. Reproduction was the most challenging aspect of this process. Initially, we ran it on a single node, encountering several issues with package configurations and existing code and datasets. Thankfully, we passed this initial hurdle.
Second, to achieve a better benchmark score, we started tuning the cluster. With the assistance of NCHC, we obtained permission to use supercomputer clusters. The system used Singularity as its container platform, so we had to familiarize ourselves with it. With support from other groups, we successfully set up the cluster environment.
Lastly, we needed to fine-tune the cluster and establish communication between clusters, a critical step for achieving a higher benchmark score. Several questions arose: How should we divide the dataset? How could we optimize communication for faster performance? Were we fully utilizing all available resources?
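A common starting point for the dataset question is a strided split, in which each worker takes every N-th sample. The sketch below is a generic illustration, not a record of the competition setup; the function name `shard_indices` and the sizes are hypothetical.

```python
def shard_indices(num_samples, world_size, rank):
    """Return the sample indices assigned to one worker (strided split)."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Worker `rank` takes samples rank, rank + world_size, rank + 2 * world_size, ...
    return list(range(rank, num_samples, world_size))

# Example: 10 samples spread over 4 workers.
shards = [shard_indices(10, 4, rank) for rank in range(4)]
for rank, shard in enumerate(shards):
    print(rank, shard)
```

Every sample lands on exactly one worker and shard sizes differ by at most one, which keeps the per-worker load roughly even.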
Participating in this competition gave me a valuable opportunity to delve into cluster systems and the intricacies of large-scale training. Throughout this journey, we encountered both minor and major challenges, and research, inquiry, and critical thinking were our most powerful tools. In the end, we were proud to receive the second-place award. This experience has contributed to my growth on the path to becoming a software developer.