Multi-Objective Evolutionary Algorithms (MOEAs) are gaining popularity and seeing increased use across many fields of engineering. For real-world, large-scale optimization problems with a large variable/search space, it is common to use a population whose size is proportional to the size of the search space, and such problems are widely solved with current state-of-the-art algorithms such as NSGA-II. The strength of NSGA-II lies in its non-dominance selection procedure and its non-dominance-based sorting of a population of individuals. Although non-dominated sorting is computationally efficient for a small population (10^2 to 10^3 solutions), it becomes computationally expensive and slow for a large population (10^4 to 10^5 solutions). In addition, various archive-based algorithms have been proposed in the past that use a large population in addition to the principal population. There is therefore a clear need for a scalable, parallel implementation of NSGA-II. With the advent of consumer-level graphics processing units (GPUs) and advances in the CUDA framework, we attempt to fill this research gap using the GPGPU architecture. In this paper, we propose a parallel GPU-based implementation of NSGA-II with a major focus on non-dominated sorting. The proposed approach can be easily coupled with the original form of NSGA-II to solve real-world problems using large populations.