AMATH 483 / 583 (roche) - HW6
Due Friday May 31, 11:59pm PT
May 24, 2024
Homework 6 (80 points, 0 EC points)
- (+20) Complex double linear system solver. Plot both the log of the residual and the log of the normalized error, $\frac{\|b - Az\|_2}{\|A\|_\infty \, \|z\|_2 \, \epsilon_{\text{machine}}}$, versus the square matrix dimensions 16, 32, 64, ..., 8192 for the following LAPACK routine. It is supported in the OpenBLAS build on Hyak. Submit your plot, and label it accordingly.

    lapack_int LAPACKE_zgesv(int matrix_order, lapack_int n, lapack_int nrhs,
                             lapack_complex_double* a, lapack_int lda,
                             lapack_int* ipiv, lapack_complex_double* b,
                             lapack_int ldb);

  Use the following code snippet to initialize your matrices and rhs vectors, and note the headers I use:

    #include <complex>   // std::complex<double>
    #include <cstdlib>   // malloc, srand, rand, RAND_MAX
    ...
    #include <cblas.h>
    #include <lapacke.h>
    ...
    int main() {
      ...
      a = (std::complex<double>*) malloc(sizeof(std::complex<double>) * ma * na);
      b = (std::complex<double>*) malloc(sizeof(std::complex<double>) * ma);
      z = (std::complex<double>*) malloc(sizeof(std::complex<double>) * na);
      ...
      srand(0);
      int k = 0;
      // Fill A column by column with random entries in [-0.5, 0.5] + i[-0.5, 0.5].
      for (int j = 0; j < na; j++) {
        for (int i = 0; i < ma; i++) {
          a[k] = 0.5 - (double)rand() / (double)RAND_MAX
               + std::complex<double>(0, 1) * (0.5 - (double)rand() / (double)RAND_MAX);
          // Scale the diagonal by the matrix dimension.
          if (i == j) a[k] *= static_cast<double>(ma);
          k++;
        }
      }
      srand(1);
      for (int i = 0; i < ma; i++) {
        b[i] = 0.5 - (double)rand() / (double)RAND_MAX
             + std::complex<double>(0, 1) * (0.5 - (double)rand() / (double)RAND_MAX);
      }
      ...
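The normalized error in this problem can be computed without any library support once a solution z is available. The sketch below is one possible way to do it, not the course's reference code. It assumes column-major storage (matching the fill order of the initialization snippet) and takes the matrix norm to be the infinity norm (maximum absolute row sum). The names `cplx`, `norm2`, `norm_inf`, `residual`, and `normalized_error` are hypothetical helpers introduced here. Note that `LAPACKE_zgesv` overwrites `a` with its LU factors and `b` with the solution, so copies of the original A and b are needed to form the residual.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

using cplx = std::complex<double>;  // hypothetical alias, not from the assignment

// Euclidean (2-) norm of a complex vector.
double norm2(const std::vector<cplx>& v) {
    double s = 0.0;
    for (const cplx& x : v) s += std::norm(x);  // std::norm returns |x|^2
    return std::sqrt(s);
}

// Infinity norm (maximum absolute row sum) of an n x n column-major matrix.
double norm_inf(const std::vector<cplx>& a, std::size_t n) {
    double best = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double row = 0.0;
        for (std::size_t j = 0; j < n; ++j) row += std::abs(a[i + j * n]);
        best = std::max(best, row);
    }
    return best;
}

// Residual r = b - A z, with A stored column-major.
std::vector<cplx> residual(const std::vector<cplx>& a,
                           const std::vector<cplx>& z,
                           const std::vector<cplx>& b) {
    const std::size_t n = b.size();
    assert(a.size() == n * n && z.size() == n);
    std::vector<cplx> r(b);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            r[i] -= a[i + j * n] * z[j];
    return r;
}

// ||b - A z||_2 / (||A||_inf * ||z||_2 * eps_machine), assuming z is nonzero.
double normalized_error(const std::vector<cplx>& a,
                        const std::vector<cplx>& z,
                        const std::vector<cplx>& b) {
    const double eps = std::numeric_limits<double>::epsilon();
    return norm2(residual(a, z, b)) / (norm_inf(a, b.size()) * norm2(z) * eps);
}
```

For the plot, `std::log10` of both `norm2(residual(a, z, b))` and `normalized_error(a, z, b)` would be recorded for each matrix dimension.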
- (+20) CPU-GPU data copy speed on HYAK. Write a C++ code to measure the data copy performance between the host CPU and GPU (host to device), and between the GPU and the host CPU (device to host). Copy 8 bytes to 256MB increasing in multiples of 2. Plot the bandwidth for both directions: (bytes per second) on the y-axis and the buffer size in bytes on the x-axis. Submit your plot and test code.
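The size sweep and the bandwidth arithmetic for this problem can be sketched in plain C++. This is a host-only illustration under stated assumptions, not a GPU benchmark: `std::memcpy` stands in for the copy call, where a real test would call `cudaMemcpy` with `cudaMemcpyHostToDevice` or `cudaMemcpyDeviceToHost` and synchronize before stopping the timer. It also assumes 256MB means 2^28 bytes, so that doubling from 8 bytes lands on it exactly. The names `buffer_sizes`, `bandwidth`, and `time_copies` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Buffer sizes from 8 bytes up to 256 MB (taken as 2^28 bytes), doubling each step.
std::vector<std::size_t> buffer_sizes() {
    std::vector<std::size_t> sizes;
    const std::size_t max_bytes = static_cast<std::size_t>(256) << 20;
    for (std::size_t bytes = 8; bytes <= max_bytes; bytes *= 2)
        sizes.push_back(bytes);
    return sizes;
}

// Bandwidth in bytes per second.
double bandwidth(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / seconds;
}

// Wall-clock seconds for `repeats` copies of `bytes` bytes.
// HOST-ONLY STAND-IN: std::memcpy replaces the GPU copy call here.
double time_copies(std::size_t bytes, int repeats) {
    assert(bytes > 0 && repeats > 0);
    std::vector<char> src(bytes, 1), dst(bytes, 0);
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        std::memcpy(dst.data(), src.data(), bytes);
    const auto t1 = std::chrono::steady_clock::now();
    volatile char sink = dst[bytes - 1];  // keep the copy from being optimized away
    (void)sink;
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Averaging over several repeats matters most for the small buffers, where a single copy can be shorter than the timer resolution; the per-size bandwidth is then `bandwidth(bytes * repeats, seconds)` whenever the measured time is nonzero.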
- (+20) Compare FFTW to CUFFT on HYAK. Measure and plot the performance of calculating the gradient of a 3D double complex plane wave defined on cubic lattices of dimension $n^3$ from $16^3$ to $256^3$, stride n *= 2, for both the FFTW and CUDA FFT (CUFFT) implementations on HYAK. Let each n be measured ntrial times and plot the average performance for each case versus n, ntrial ≥ 3. Submit your performance plot, which should have 'FLOPs' on the y-axis (or some appropriate unit of FLOPs) and the dimension of the cubic lattices (n) on the x-axis. You will need to estimate the operation count of computing the derivative using FFT on a lattice.
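One way to estimate the operation count is sketched below; it is an assumption-laden model, not a prescribed answer. It uses the conventional 5 N log2 N flop estimate for a complex FFT of N points (the figure FFTW's own benchmarks report against), and it assumes the gradient is formed with one forward 3D transform, a multiplication by i k along each of the three directions (counted as 2 real multiplications per point), and three inverse 3D transforms. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Estimated flops for one complex 3D FFT on an n^3 lattice: 5 N log2 N, N = n^3.
double fft_flops(std::size_t n) {
    const double N = static_cast<double>(n) * n * n;
    return 5.0 * N * std::log2(N);
}

// Assumed gradient cost: 1 forward FFT + 3 inverse FFTs,
// plus 3 spectral multiplications by i*k at 2 real multiplies per point.
double gradient_flops(std::size_t n) {
    const double N = static_cast<double>(n) * n * n;
    return 4.0 * fft_flops(n) + 6.0 * N;
}

// Performance in flops per second from the average time over the trials.
double flops_per_second(std::size_t n, double avg_seconds) {
    assert(avg_seconds > 0.0);
    return gradient_flops(n) / avg_seconds;
}
```

Under this model a 16^3 lattice costs about 1.0e6 flops per gradient evaluation; a different count for the spectral multiplication or for normalization changes only the lower-order term.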
- (+20) Fourier transforms. Evaluate the Fourier transform of the following functions by hand. Use the definitions I provided (includes $\frac{1}{\sqrt{2\pi}}$, this is common in physics but also now the default used in WolframAlpha - a powerful math AI tool) as well as the definition for Dirac delta I used in lecture if needed.
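As a hedged illustration of how the $\frac{1}{\sqrt{2\pi}}$ prefactor carries through a hand calculation, assume the forward transform is defined with a negative exponent. The sign convention from lecture is not restated in this handout, so this is an assumption; with the opposite sign the result below is complex-conjugated. For a Dirac delta shifted to a real point $a$, the sifting property gives:

```latex
\hat{f}(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx,
\qquad
f(x) = \delta(x - a)
\;\Longrightarrow\;
\hat{f}(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \delta(x - a)\, e^{-ikx}\, dx
           = \frac{e^{-ika}}{\sqrt{2\pi}} .
```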