Category Archives: SPARK

SPARK: How to generate nested JSON using a Dataset

I have come across requirements where I am supposed to generate the output in nested JSON format. Below is sample code that does this. The input to this code is a CSV file containing 3 columns: company name, department and employee name. Example:
google,jessica,sales
google,sita,technology
We…
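
The full post builds the nested output with a Spark Dataset, and that code is not reproduced here. As a minimal plain-Java sketch (no Spark involved), the following shows the grouping that nested output implies: rows are grouped by company, then by department, and each company becomes one JSON object. It assumes each line is ordered company,employee,department, as in the sample rows above, and the JSON field names (company, departments, department, employees) and the class name NestedJsonSketch are hypothetical. In Spark itself a comparable shape is commonly produced with groupBy plus functions.collect_list and written out with write().json(...), though the post may take a different route.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NestedJsonSketch {

    // Groups "company,employee,department" lines (column order assumed from the sample rows)
    // into company -> department -> employees and renders one JSON object per company.
    // Values are not JSON-escaped; this is only meant to show the nesting.
    static List<String> toNestedJson(List<String> csvLines) {
        // TreeMap keeps companies and departments sorted so the output is deterministic.
        Map<String, Map<String, List<String>>> grouped = new TreeMap<>();
        for (String line : csvLines) {
            String[] parts = line.split(",", -1);
            if (parts.length != 3) {
                continue; // skip malformed rows in this sketch
            }
            String company = parts[0].trim();
            String employee = parts[1].trim();
            String department = parts[2].trim();
            grouped.computeIfAbsent(company, c -> new TreeMap<>())
                   .computeIfAbsent(department, d -> new ArrayList<>())
                   .add(employee);
        }

        List<String> json = new ArrayList<>();
        for (Map.Entry<String, Map<String, List<String>>> company : grouped.entrySet()) {
            StringBuilder sb = new StringBuilder();
            sb.append("{\"company\":\"").append(company.getKey()).append("\",\"departments\":[");
            boolean firstDept = true;
            for (Map.Entry<String, List<String>> dept : company.getValue().entrySet()) {
                if (!firstDept) {
                    sb.append(',');
                }
                firstDept = false;
                sb.append("{\"department\":\"").append(dept.getKey()).append("\",\"employees\":[");
                List<String> employees = dept.getValue();
                for (int i = 0; i < employees.size(); i++) {
                    if (i > 0) {
                        sb.append(',');
                    }
                    sb.append('"').append(employees.get(i)).append('"');
                }
                sb.append("]}");
            }
            sb.append("]}");
            json.add(sb.toString());
        }
        return json;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("google,jessica,sales", "google,sita,technology");
        toNestedJson(input).forEach(System.out::println);
    }
}
```

For the two sample rows, main prints a single object: {"company":"google","departments":[{"department":"sales","employees":["jessica"]},{"department":"technology","employees":["sita"]}]}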

SPARK: Java code to read files with a custom record delimiter

By default, Spark reads text files using the newline character ('\n') as the record delimiter. But there are cases where the record delimiter is some other character, for example CTRL+A ('\001') or a pipe ('|'). So how can we read such files? We can set the textinputformat.record.delimiter parameter in the Configuration object…
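
To make the effect of textinputformat.record.delimiter concrete, here is a plain-Java sketch (it does not use Hadoop or Spark) that splits raw text into records on an arbitrary delimiter string, so a newline inside a record stays part of that record. The class and method names are hypothetical, and this is not Hadoop's actual record reader. In a real Spark job the delimiter would instead be set on a Hadoop Configuration, and that Configuration passed to something like JavaSparkContext.newAPIHadoopFile together with TextInputFormat; the exact code in the post may differ.

```java
import java.util.ArrayList;
import java.util.List;

class DelimitedRecords {

    // Splits raw text into records on a custom delimiter instead of '\n'.
    // Illustrates the idea behind textinputformat.record.delimiter only.
    static List<String> split(String data, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = data.indexOf(delimiter, start)) != -1) {
            records.add(data.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < data.length()) {
            records.add(data.substring(start)); // last record has no trailing delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        // CTRL+A separates records; the embedded newline stays inside the second record.
        String data = "rec1\u0001rec2 line1\nrec2 line2\u0001rec3";
        for (String record : split(data, "\u0001")) {
            System.out.println("[" + record + "]");
        }
    }
}
```

The same method handles the pipe case by passing "|" as the delimiter.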

SPARK: Add a new column to a DataFrame using a UDF and withColumn()

In this post I describe, with example code, how to add a new column to an existing DataFrame using the withColumn() function of DataFrame. There are 2 scenarios: the content of the new column is derived from the values of an existing column, or the new…
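
For the first scenario, the idea is that a function of one existing column is evaluated once per row and its result is stored in a new column. The plain-Java sketch below imitates that outside Spark: rows are modelled as maps from column name to value, and withColumn here is a hypothetical helper, not Spark's DataFrame method. In Spark, the function would typically be registered as a UDF (for example through spark.udf().register(...)) and then applied with withColumn().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class WithColumnSketch {

    // Hypothetical helper: returns new rows that carry an extra column computed from
    // an existing column, one function call per row (the role a UDF plays in Spark).
    // Input rows are copied rather than modified, much as withColumn() returns a new DataFrame.
    // A missing source value is passed to fn as null, so fn must be able to handle it.
    static List<Map<String, String>> withColumn(List<Map<String, String>> rows,
                                                String newColumn,
                                                String sourceColumn,
                                                Function<String, String> fn) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : rows) {
            Map<String, String> copy = new LinkedHashMap<>(row);
            copy.put(newColumn, fn.apply(row.get(sourceColumn)));
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("employee", "jessica");
        row.put("department", "sales");
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row);
        // Derive "employee_upper" from the existing "employee" column.
        System.out.println(withColumn(rows, "employee_upper", "employee", String::toUpperCase));
    }
}
```

Copying each row keeps the original data untouched, which mirrors the immutability of DataFrames.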