Sorting local variables, or using Thrust inside kernels, in CUDA

Thrust is a library for CUDA that lets us perform many common operations easily: we just call the function without worrying much about how it is implemented. Sorting is one of the operations Thrust provides.

Generally, Thrust code is host code. We call the Thrust function from the host, and if we try to use it inside a kernel the compiler complains that host code cannot be called from a device function. However, we may come across situations where we need to sort numbers within a kernel. For example, we may have a local array within the kernel that needs to be sorted before we operate on it.
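For comparison, the usual host-side way of calling Thrust can be sketched roughly as follows. The helper name sort_from_host is hypothetical and only for illustration; thrust::device_vector, thrust::sort and thrust::copy are the standard Thrust calls.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <vector>

// Hypothetical helper: sorts a host vector by copying it to the GPU,
// calling thrust::sort from host code, and copying the result back.
std::vector<int> sort_from_host(const std::vector<int> &values)
{
    // Constructing a device_vector from host iterators copies the data to the device.
    thrust::device_vector<int> d_values(values.begin(), values.end());

    // Called from the host; Thrust launches the kernels it needs internally.
    thrust::sort(d_values.begin(), d_values.end());

    // Copy the sorted data back to host memory.
    std::vector<int> result(values.size());
    thrust::copy(d_values.begin(), d_values.end(), result.begin());
    return result;
}
```

Note that the call to thrust::sort sits in ordinary host code here; nothing in this sketch runs Thrust from inside a kernel.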

If we need such an operation, it is now possible in CUDA with Thrust, but we need an appropriate Thrust version. Thrust v1.7 does not support this feature; we need Thrust v1.8. Further, if we are programming on Windows with Visual Studio, beware that VS 2005 does not support it. Visual Studio 2010 works well in my case. I have not tested newer versions.
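One way to catch an unsuitable Thrust version early is a compile-time check against the macros in thrust/version.h. This is a minimal sketch: THRUST_VERSION encodes the version as major * 100000 + minor * 100 + subminor, so v1.8.0 is 100800. The helper name thrust_supports_device_side_calls is hypothetical.

```cuda
#include <thrust/version.h>

// Stop the build if the installed Thrust is older than v1.8.0.
#if THRUST_VERSION < 100800
#error "Thrust v1.8 or newer is needed to call Thrust algorithms inside a kernel"
#endif

// Hypothetical helper: the same check as a runtime query.
bool thrust_supports_device_side_calls()
{
    return THRUST_VERSION >= 100800;
}
```

THRUST_MAJOR_VERSION and THRUST_MINOR_VERSION are also defined in the same header if we want to print the version instead.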

thrust::sort can be combined with the thrust::seq execution policy to sort numbers sequentially within a single CUDA thread (or sequentially within a single CPU thread). For this,

#include <thrust/execution_policy.h> needs to be added to the includes.
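Because thrust::seq keeps the sort inside the calling thread, the same idea extends to many threads that each sort their own small local array. The following is only a sketch under assumptions: the names CHUNK, sort_chunks and sort_chunks_on_device are hypothetical, and each thread is assumed to own exactly CHUNK consecutive elements.

```cuda
#include <thrust/sort.h>
#include <thrust/execution_policy.h>
#include <cuda_runtime.h>
#include <vector>

const int CHUNK = 4; // hypothetical number of elements owned by each thread

// Each thread copies its chunk into a local array, sorts it sequentially, and writes it back.
__global__ void sort_chunks(int *data, int num_chunks)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= num_chunks) return; // threads past the last chunk do nothing

    int local[CHUNK];
    for (int i = 0; i < CHUNK; i++) local[i] = data[idx * CHUNK + i];

    // thrust::seq: no extra threads are launched, the sort runs in this thread only.
    thrust::sort(thrust::seq, local, local + CHUNK);

    for (int i = 0; i < CHUNK; i++) data[idx * CHUNK + i] = local[i];
}

// Hypothetical host wrapper: sorts every full chunk of 'values' on the device.
// Elements beyond the last full chunk are returned unchanged.
std::vector<int> sort_chunks_on_device(std::vector<int> values)
{
    int num_chunks = static_cast<int>(values.size()) / CHUNK;
    if (num_chunks == 0) return values;

    size_t bytes = static_cast<size_t>(num_chunks) * CHUNK * sizeof(int);
    int *d_data = 0;
    cudaMalloc((void **)&d_data, bytes);
    cudaMemcpy(d_data, values.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (num_chunks + threads - 1) / threads;
    sort_chunks<<<blocks, threads>>>(d_data, num_chunks);

    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(values.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return values;
}
```

Each chunk ends up sorted on its own; the array as a whole is not sorted unless the chunks happen to be in order already.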

Below is the complete code to sort a local array, using CUDA 5.0 with Thrust v1.8:

#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <cuda.h>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <thrust/sort.h>
#include <thrust/device_ptr.h>
#include <thrust/execution_policy.h>

// Kernel: copy the input into a thread-local array, sort it, and copy it back.
__global__ void sort_array(int *a, int N)
{
    int td[10]; // local array; N must not exceed 10

    for (int i = 0; i < N; i++)
    {
        td[i] = a[i];
    }

    // thrust::seq runs the sort sequentially inside the calling thread,
    // so plain pointers to the local array can be passed directly.
    thrust::sort(thrust::seq, td, td + N);

    for (int i = 0; i < N; i++)
    {
        a[i] = td[i];
    }
}

// main routine that executes on the host
int main(void)
{
    int *a_h, *a_d; // Pointer to host & device arrays
    const int N = 10; // Number of elements in arrays
    size_t size = N * sizeof(int);

    a_h = (int *)malloc(size); // Allocate array on host
    cudaMalloc((void **) &a_d, size); // Allocate array on device

    std::cout << "enter the 10 numbers";

    // Initialize host array and copy it to CUDA device
    for (int i = 0; i < N; i++)
    {
        std::cin >> a_h[i];
    }

    for (int i = 0; i < N; i++) printf("%d %d\n", i, a_h[i]);

    cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);

    // Do calculation on device: a single thread sorts the whole array
    sort_array <<< 1,1 >>> (a_d, N);

    /* Host-side alternative, without the kernel:
    thrust::device_ptr<int> t_a(a_d);
    thrust::sort(t_a, t_a + N); */

    // Copy the sorted array back to the host
    cudaMemcpy(a_h, a_d, sizeof(int)*N, cudaMemcpyDeviceToHost);

    // Print results
    printf("sorted value\n");
    for (int i = 0; i < N; i++) printf("%d %d\n", i, a_h[i]);

    // Cleanup
    free(a_h); cudaFree(a_d);

    return 0;
}

CODING EVERYTHING

Thursday, September 4, 2014
