Multicore Computing

2024.1 Multicore Computing, Project #3
(Due: 11:59pm, May 26)
Submission Rule

  1. Create a directory {studentID#}_proj3 (example: 20203601_proj3). In the directory, create
     subdirectories 'prob1' and 'prob2'.
  2.a For problem 1, write (i) a 'C with OpenMP' source file prob1.c and (ii) a document that reports
     the parallel performance of your code. Insert files (i) and (ii) into the subdirectory 'prob1'.
  2.b For problem 2, write (i) a 'C with OpenMP' source file prob2.c and (ii) a document that reports
     the parallel performance of your code. Insert (i) and (ii) into the subdirectory 'prob2'.
  2.c For problem 3, insert the demo video file (.mp4) into the directory {studentID#}_proj3.
  3. Zip the directory {studentID#}_proj3 and submit the zip file to the eClass homework board.
  ※ If possible, use a quad-core/hexa-core/octa-core CPU (or a CPU with more cores) rather than a
  dual-core CPU for your experiments; more cores will better show the performance gains from parallelism.
[Problem 1] In project 1, we looked at a Java program that computes the number of prime numbers between 1 and 200000. A parallel implementation based on a static, poorly balanced work decomposition (i.e., simply dividing the entire range of numbers into k consecutive sub-ranges, where k is the number of threads) may not give satisfactory performance because (i) higher ranges contain fewer primes and (ii) larger numbers take longer to test for primality. As a result, thread workloads become uneven and hard to predict. For better performance, project 1 used a dynamic load-balancing approach in which each thread takes numbers one at a time and tests whether each is prime.
(i) Write 'C with OpenMP' code that computes the number of prime numbers between 1 and 200000. Your program should take two command line arguments: a scheduling type number (1 = "static with default chunk size", 2 = "dynamic with default chunk size", 3 = "static with chunk size 10", 4 = "dynamic with chunk size 10") and the number of threads (1, 2, 4, 6, 8, 10, 12, 14, 16). Use schedule(static), schedule(dynamic), schedule(static, 10), and schedule(dynamic, 10). Your code should print the execution time as well as the number of prime numbers between 1 and 200000.
command line execution: > a.out scheduling_type# #_of_thread
execution example> a.out 1 8 <---- this means the program uses "schedule(static)" with 8 threads.
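A minimal sketch of how prob1.c might be structured is shown below (not the required solution; it assumes the schedule is selected at run time via omp_set_schedule() together with schedule(runtime), and that a non-positive chunk size requests the implementation's default chunk size):

    /* prob1.c -- hypothetical sketch: count primes in [1, 200000] with a
       command-line-selected OpenMP schedule. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    static int is_prime(int n) {
        if (n < 2) return 0;
        for (int d = 2; d * d <= n; d++)
            if (n % d == 0) return 0;
        return 1;
    }

    int main(int argc, char *argv[]) {
        if (argc != 3) {
            fprintf(stderr, "usage: %s scheduling_type# #_of_thread\n", argv[0]);
            return 1;
        }
        int sched_type  = atoi(argv[1]);
        int num_threads = atoi(argv[2]);
        omp_set_num_threads(num_threads);

        /* Map the first argument onto a runtime schedule. A chunk size <= 0
           asks the OpenMP runtime for its default chunk size. */
        switch (sched_type) {
            case 1: omp_set_schedule(omp_sched_static,  0);  break;
            case 2: omp_set_schedule(omp_sched_dynamic, 0);  break;
            case 3: omp_set_schedule(omp_sched_static,  10); break;
            case 4: omp_set_schedule(omp_sched_dynamic, 10); break;
            default: fprintf(stderr, "scheduling_type# must be 1-4\n"); return 1;
        }

        int count = 0;
        double start = omp_get_wtime();
        /* schedule(runtime) picks up the schedule installed above; the
           reduction avoids a race on count. */
        #pragma omp parallel for schedule(runtime) reduction(+:count)
        for (int i = 1; i <= 200000; i++)
            count += is_prime(i);
        double elapsed_ms = (omp_get_wtime() - start) * 1000.0;

        printf("number of primes: %d\n", count);
        printf("execution time: %.3f ms\n", elapsed_ms);
        return 0;
    }

An equally valid approach is to write four separate parallel loops, one per schedule clause, and branch on the scheduling type argument.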
(ii) Write a document (in PDF format) that reports the parallel performance of your code. Include graphs that show the execution time when using 1, 2, 4, 6, 8, 10, 12, 14, 16 threads; there should be at least four graphs covering the static and dynamic scheduling policies. The document should contain (a) the environment in which the experiments were performed (e.g., CPU type, memory size, OS type, ...), (b) tables and graphs that show the execution time (unit: millisecond) for thread counts {1, 2, 4, 6, 8, 10, 12, 14, 16}, and (c) an explanation of the results and why such results were obtained.

exec time (unit: ms)                       number of threads
schedule   chunk size      1    2    4    6    8   10   12   14   16
static     default
dynamic    default
static     10
dynamic    10

performance (1/exec time)                  number of threads
schedule   chunk size      1    2    4    6    8   10   12   14   16
static     default
dynamic    default
static     10
dynamic    10

[Problem 2] Parallelize prob2.c (see our class webpage project 3 announcement to access prob2.c) using
OpenMP. Your program should take three command line arguments: a scheduling type number (1 = static, 2 = dynamic, 3 = guided), the chunk size, and the number of threads. Your code should print the execution time and the result of the PI calculation. Assume the number of steps num_steps = 10000000.
command line execution: > a.out scheduling_type# chunk_size #_of_thread
execution example> a.out 2 4 8 <---- this means dynamic scheduling (chunk size = 4) using 8 threads.
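A hypothetical sketch of a parallelized prob2.c is shown below. It assumes the provided serial prob2.c is the usual numerical-integration PI program (summing 4/(1+x^2) over num_steps rectangles); the schedule is again installed at run time with omp_set_schedule() and schedule(runtime):

    /* prob2.c (parallelized) -- hypothetical sketch, assuming the serial
       prob2.c computes PI by numerical integration of 4/(1+x^2). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define NUM_STEPS 10000000

    int main(int argc, char *argv[]) {
        if (argc != 4) {
            fprintf(stderr, "usage: %s scheduling_type# chunk_size #_of_thread\n", argv[0]);
            return 1;
        }
        int sched_type  = atoi(argv[1]);
        int chunk_size  = atoi(argv[2]);
        int num_threads = atoi(argv[3]);
        omp_set_num_threads(num_threads);

        /* Install the requested schedule; schedule(runtime) below uses it. */
        switch (sched_type) {
            case 1: omp_set_schedule(omp_sched_static,  chunk_size); break;
            case 2: omp_set_schedule(omp_sched_dynamic, chunk_size); break;
            case 3: omp_set_schedule(omp_sched_guided,  chunk_size); break;
            default: fprintf(stderr, "scheduling_type# must be 1-3\n"); return 1;
        }

        double step = 1.0 / (double)NUM_STEPS;
        double sum = 0.0;
        double start = omp_get_wtime();

        /* Each iteration evaluates the midpoint of one rectangle under
           4/(1+x^2); the reduction avoids a race on sum. */
        #pragma omp parallel for schedule(runtime) reduction(+:sum)
        for (int i = 0; i < NUM_STEPS; i++) {
            double x = (i + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }
        double pi = step * sum;
        double elapsed_ms = (omp_get_wtime() - start) * 1000.0;

        printf("pi = %.10f\n", pi);
        printf("execution time: %.3f ms\n", elapsed_ms);
        return 0;
    }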

(i) Submit the OpenMP source code prob2.c.
(ii) Write a document (in PDF format) that reports the parallel performance of your code. Your report should contain (a) the following tables, together with graphs that show the information in the tables, and (b) a brief explanation and interpretation of the results (including why such results were obtained).
execution time (unit: ms)                  number of threads
schedule   chunk size      1    2    4    6    8   10   12   14   16
static     1
dynamic    1
guided     1
static     5
dynamic    5
guided     5
static     10
dynamic    10
guided     10
static     100
dynamic    100
guided     100

performance (1/exec time)                  number of threads
schedule   chunk size      1    2    4    6    8   10   12   14   16
static     1
dynamic    1
guided     1
static     5
dynamic    5
guided     5
static     10
dynamic    10
guided     10
static     100
dynamic    100
guided     100
[Problem 3] Create a demo video file (.mp4 format) that shows compilation and execution of your source files (prob1.c, prob2.c). The size of the demo video file should be less than 50 MB.
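For reference, a typical compile-and-run sequence (assuming GCC with OpenMP support; the exact compiler and flags on your machine may differ) looks like:

> gcc -O2 -fopenmp prob1.c -o prob1
> ./prob1 1 8
> gcc -O2 -fopenmp prob2.c -o prob2
> ./prob2 2 4 8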