Parallelizing Transposed Convolution

1 minute read

I attended “Scalable High Performance Computing” in 2021 Fall. The class had gone through interesting topics about recent CPUs and GPUs, and the term project was exciting: parallelizing the transposed convolution in a generator of DCGAN architecture.

I have implemented a few techniques to parallelize transposed convolution. First, I have reduced unnecessary divergence(if-branches) to make the code friendly with the lock-step architecture of GPUs; since all streaming cores in a warp fetch instructions both from true and false blocks, including many if-branches, become overhead to the program.

Second, I tried to express transposed convolution as (massive) matrix multiplication. The motivation for this was that matrix multiplication is highly parallelizable. There are extremely fast matrix multiplication algorithms; thus, I would benefit from those if I regard transposed convolution as matrix multiplication. Designing and implementing this notion was very painful: building and reshaping the matrix in C was confusing, and debugging CUDA codes without using printf (which I thought I could not use in GPUs) was vastly time-consuming. Mainly, the arcane behaviors of C language macro fooled me; it made me waste a day debugging my code.

Going through these efforts, I could largely shorten the process of transposed convolution. The initial sequential CPU implementation of the method was excruciatingly long: it took longer than 1 minute to generate a single artificial face portrait. So, in theory, it would have taken longer than 16 hours to generate 1000 faces. However, implementing the generator in CUDA GPUs essentially shortened the process: I ended up with the project generating 1000 faces in under 0.2 seconds. With 5000 times boost in the process, I was pleased about how far computing technology would enhance human lives. It will take humankind to the level we could never reach without computers!

You can check out my term project report: Term project report(in Korean), Term project report(in English).

Updated: