Spark: How to Generate Nested JSON Using Dataset

I have come across requirements where I need to generate output in nested JSON format. This post walks through sample code that does this. The input is a CSV file with 3 columns:

  1. company name
  2. department
  3. employee name


We need to generate JSON that has the company name at the top level and, under it, the list of employees for each department.
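To make the target shape concrete, here is a minimal sketch in plain Python (no Spark involved) that nests a few rows the same way. The rows and the field names `departments` and `employees` are hypothetical, chosen only for illustration; they are not the article's sample data.

```python
import json
from collections import defaultdict

# Hypothetical (company, department, employee) rows, for illustration only.
rows = [
    ("Acme", "Engineering", "Alice"),
    ("Acme", "Engineering", "Bob"),
    ("Acme", "Sales", "Carol"),
    ("Globex", "Sales", "Dave"),
]


def nest(rows):
    """Group employees by department within each company."""
    grouped = defaultdict(lambda: defaultdict(list))
    for company, department, employee in rows:
        grouped[company][department].append(employee)
    # One record per company, each holding a list of department records.
    return [
        {
            "company": company,
            "departments": [
                {"department": dept, "employees": names}
                for dept, names in departments.items()
            ],
        }
        for company, departments in grouped.items()
    ]


for record in nest(rows):
    print(json.dumps(record))
```

Each printed line is one company, with its departments nested inside it and the employee names nested inside each department.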

The code reads the CSV file into a Dataset and creates a temp view “employees” over it. First, we group the rows by company and department and collect a “list” of employees for each company-department pair. Next, we build a struct of department and employee list and group those structs by company. Finally, we convert the result to JSON and display it.
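The steps above could be sketched in PySpark roughly as follows. This is not the original code: PySpark exposes a DataFrame rather than a typed Dataset, and the column names, the intermediate view name `department_employees`, the aliases `employee_list` and `departments`, and the assumption that the CSV has no header row are all mine.

```python
# Assumed column names; the article only describes what the three columns hold.
COLUMNS = ("company", "department", "employee")

# Step 1: one employee list per (company, department) pair.
PER_DEPARTMENT_SQL = """
    SELECT company, department, collect_list(employee) AS employee_list
    FROM employees
    GROUP BY company, department
"""

# Step 2: wrap department and list in a struct, then collect structs per company.
PER_COMPANY_SQL = """
    SELECT company,
           collect_list(struct(department, employee_list)) AS departments
    FROM department_employees
    GROUP BY company
"""


def nested_json_lines(spark, csv_path):
    """Return one JSON string per company, given an active SparkSession."""
    # Assumes a header-less CSV, so the columns are named explicitly.
    df = spark.read.csv(csv_path).toDF(*COLUMNS)
    df.createOrReplaceTempView("employees")

    # Hypothetical intermediate view holding the per-department lists.
    spark.sql(PER_DEPARTMENT_SQL).createOrReplaceTempView("department_employees")

    # In PySpark, toJSON() yields an RDD of JSON strings, one per row.
    return spark.sql(PER_COMPANY_SQL).toJSON().collect()
```

Given a session from `SparkSession.builder.getOrCreate()`, looping over `nested_json_lines(spark, "employees.csv")` and printing each element displays one JSON document per company; `employees.csv` is a placeholder path. Splitting the work into two queries mirrors the two grouping steps described above, although the first query could equally be inlined as a subquery of the second.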

Sample Input:


Formatted Output:
