This is part of my long-term Adaptive Multilevel Fast-Solver Project. An efficient implementation of a multigrid code, written without any hardware-specific techniques such as cache-aware optimization, outperforms the FFTW code, which is highly optimized for the details of the underlying hardware. Some preliminary numerical results and a more detailed discussion can be found in: Fast Poisson Solver.