Friday, February 6, 2015

How to configure and run CUDA C with Fortran in Linux



CUDA can be programmed in both C and Fortran. At the time of writing, CUDA C is available free of charge, whereas CUDA Fortran has to be paid for, so CUDA C is the cheaper choice. If the host code is written in C, using CUDA C is straightforward. If the host code is written in Fortran, there are two options: use CUDA Fortran, or call CUDA C from Fortran. Since CUDA C is free, this post takes the second route.

First, install the NVIDIA driver and the CUDA C toolkit. Their installation is described in an earlier post.

The next step is to call a C function from Fortran through a C wrapper, which in turn launches the CUDA kernel:

1) Create a Fortran file named fortest.f95:
PROGRAM fortest

! simple program which creates 2 vectors and adds them in a CUDA function

IMPLICIT NONE

integer*4 :: i
integer*4, parameter :: N=8
real*4, dimension(N) :: a, b

DO i=1,N
   a(i)=i*1.0
   b(i)=2.0
END DO

print *, 'a = ', (a(i), i=1,N)

! call the C wrapper; gfortran resolves this to the C symbol kernel_wrapper_
CALL kernel_wrapper(a, b, N)

print *, 'a + 2 = ', (a(i), i=1,N)

END PROGRAM fortest
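Before moving on to the CUDA side, it may help to see what the Fortran CALL expects on the C side. With its default settings, gfortran turns CALL kernel_wrapper(a, b, N) into a call to the lowercase symbol kernel_wrapper_ (note the trailing underscore) and passes every argument by reference, so even the scalar N arrives as a pointer. The following is a minimal C sketch, not part of the original post: a hypothetical CPU-only stand-in with the same symbol and signature, which lets you check the Fortran-to-C linkage on a machine without a GPU. The file name cpu_wrapper.c used below is also an assumption.

```c
/* cpu_wrapper.c (hypothetical): CPU-only stand-in for the CUDA wrapper.
 * The name kernel_wrapper_ matches what gfortran generates for
 * CALL kernel_wrapper(...) with default underscoring. */
void kernel_wrapper_(float *a, float *b, int *Np)
{
    int n = *Np;                 /* Fortran passed the address of N */
    for (int i = 0; i < n; ++i)
        a[i] = a[i] + b[i];      /* same element-wise add as the GPU kernel */
}
```

Assuming the sketch is saved as cpu_wrapper.c, compile it with gcc -c cpu_wrapper.c and link it in place of cudatest.o with gfortran -o cpu_test fortest.o cpu_wrapper.o; the program should then print the same numbers as the GPU version. The real wrapper in cudatest.cu needs extern "C" because nvcc compiles .cu files as C++, which would otherwise mangle the symbol name.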
2) Create a CUDA file named cudatest.cu:



#include <stdio.h>
#include <cuda_runtime.h>

// simple kernel that adds vector b to vector a element by element
__global__ void vect_add(float *a, float *b, int N)
{
    int idx = threadIdx.x;          // one thread per element (single block)
    if (idx < N) a[idx] = a[idx] + b[idx];
}

// function called from the main Fortran program.
// extern "C" stops nvcc (which compiles .cu files as C++) from mangling the
// name, and the trailing underscore matches the symbol gfortran generates.
// Fortran passes arguments by reference, so N arrives as a pointer.
extern "C" void kernel_wrapper_(float *a, float *b, int *Np)
{
    float *a_d, *b_d;               // GPU copies of the vectors
    int blocks = 1;                 // launch 1 block ...
    int N = *Np;                    // ... of N threads (at most 1024 per block)

    // allocate memory on the GPU
    cudaMalloc((void **)&a_d, sizeof(float) * N);
    cudaMalloc((void **)&b_d, sizeof(float) * N);

    // copy vectors from CPU to GPU
    cudaMemcpy(a_d, a, sizeof(float) * N, cudaMemcpyHostToDevice);
    cudaMemcpy(b_d, b, sizeof(float) * N, cudaMemcpyHostToDevice);

    // call the kernel on the GPU
    vect_add<<< blocks, N >>>(a_d, b_d, N);

    // report a failed kernel launch, if any
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "vect_add launch failed: %s\n", cudaGetErrorString(err));

    // copy the result back from GPU to CPU (only a is modified)
    cudaMemcpy(a, a_d, sizeof(float) * N, cudaMemcpyDeviceToHost);

    // free GPU memory
    cudaFree(a_d);
    cudaFree(b_d);
}
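The wrapper launches a single block of N threads and vect_add indexes with threadIdx.x alone, so it only works while N fits in one block (at most 1024 threads on current NVIDIA GPUs). For longer vectors the usual pattern is to launch several blocks, for example vect_add<<<(N + 255) / 256, 256>>>(a_d, b_d, N), and compute idx = blockIdx.x * blockDim.x + threadIdx.x inside the kernel; the block size of 256 is an arbitrary choice here. The C sketch below is a CPU emulation with hypothetical function names, meant only to check that indexing arithmetic: every element is updated exactly once, and the idx < N guard leaves the surplus threads of the last block idle.

```c
/* Hypothetical helper: ceiling division, i.e. enough blocks so that
 * blocks * threads_per_block >= n. */
int blocks_needed(int n, int threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

/* Hypothetical CPU emulation of a multi-block vect_add launch.
 * threads_per_block plays the role of blockDim.x; the loops visit every
 * (blockIdx.x, threadIdx.x) pair the GPU would run. */
void emulate_vect_add(float *a, const float *b, int n, int threads_per_block)
{
    int blocks = blocks_needed(n, threads_per_block);
    for (int block = 0; block < blocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            /* same formula as blockIdx.x * blockDim.x + threadIdx.x */
            int idx = block * threads_per_block + thread;
            if (idx < n)            /* last block may be only partly used */
                a[idx] = a[idx] + b[idx];
        }
    }
}
```

With n = 1000 and 256 threads per block, four blocks (1024 threads) are launched and the last 24 threads do nothing.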





3) Compile the Fortran file with the gfortran compiler:
    $ gfortran -c fortest.f95

4) This creates fortest.o in the current working directory.

5) Compile the CUDA file with the nvcc compiler:
    $ nvcc -c cudatest.cu

6) This creates cudatest.o in the current working directory.

7) Link the two object files:

    $ gfortran -o <your executable file> fortest.o cudatest.o -L<your CUDA library path> -lcudart -lstdc++

          Example (executable name final_file, CUDA library path /usr/local/cuda-6.5/lib64):

      $ gfortran -o final_file fortest.o cudatest.o -L/usr/local/cuda-6.5/lib64 -lcudart -lstdc++

8) This creates the executable <your executable file>

    in the current working directory.

     Example:

     In my case, final_file is created.


9) Finally, run the newly created executable:

    $ ./<your executable file>

  Example:

  $ ./final_file


OUTPUT

The program prints a = 1.0 ... 8.0 before the GPU call and a + 2 = 3.0 ... 10.0 after it.
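To check the result automatically rather than by eye, one option is to compare the GPU result against the same sum computed on the CPU. The helper below is a hypothetical sketch, not part of the original code: it returns the index of the first element that differs from a + b by more than a tolerance, or -1 if all elements match. It needs a copy of a taken before kernel_wrapper_ overwrites it.

```c
/* Hypothetical checker: compares result[i] with a_orig[i] + b[i] computed
 * on the CPU. Returns the first mismatching index, or -1 if all match. */
int first_mismatch(const float *a_orig, const float *b, const float *result,
                   int n, float tol)
{
    for (int i = 0; i < n; ++i) {
        float diff = result[i] - (a_orig[i] + b[i]);
        if (diff < 0.0f)
            diff = -diff;           /* absolute difference */
        if (diff > tol)
            return i;
    }
    return -1;
}
```

For the exact small integers used in this example the sums are exact in single precision, so any tiny tolerance works; for general data a relative tolerance is usually the safer choice.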

