Author Archives: baahu

VTD Xml Example

baahu   January 23, 2017   2 Comments on VTD Xml Example

VTD-XML is a good alternative to Simple API for XML (SAX) and Document Object Model (DOM), as it does not force you to trade processing performance for usability. The Java-based, non-validating VTD-XML parser is faster than DOM and better than SAX. Unlike other XML processing technologies, VTD-XML is designed… Read more »

SPARK: Java code to Read files with Custom Record Delimiter

By default, SPARK reads text files with the newline ('\n') character as the record delimiter. But there could be instances where the record delimiter is some other character, for example CTRL+A ('\001') or a pipe ("|") character. So how can we read such files? We can set the textinputformat.record.delimiter parameter in the Configuration object… Read more »
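To make the idea of a record delimiter concrete, here is a minimal plain-Java sketch of what splitting on CTRL+A or a pipe instead of a newline means. It is not the Spark code from the post: it uses only java.util.Scanner, and the class name RecordDelimiterSketch and the helper splitRecords are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class RecordDelimiterSketch {

    // Hypothetical helper (not a Spark API): splits raw text into records
    // on a literal delimiter instead of the default newline.
    static List<String> splitRecords(String content, String delimiter) {
        List<String> records = new ArrayList<>();
        // Pattern.quote makes the delimiter literal, so "|" is not
        // interpreted as regex alternation.
        try (Scanner scanner = new Scanner(content).useDelimiter(Pattern.quote(delimiter))) {
            while (scanner.hasNext()) {
                records.add(scanner.next());
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Records separated by CTRL+A ('\001').
        String ctrlA = "alice,30\u0001bob,25\u0001carol,41";
        System.out.println(splitRecords(ctrlA, "\u0001"));

        // Records separated by a pipe; a newline inside a record is kept
        // as data because it is no longer the delimiter.
        String piped = "line one\nline two|second record";
        System.out.println(splitRecords(piped, "|"));
    }
}
```

In Spark itself the splitting is done by the input format once textinputformat.record.delimiter is set, so a helper like this would not be needed there; the sketch only shows the record boundaries that setting is meant to produce.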

SPARK: Add a new column to a DataFrame using UDF and withColumn()

In this post I describe, with example code, how to add a new column to an existing DataFrame using the withColumn() function of DataFrame. There are 2 scenarios: (1) The content of the new column is derived from the values of the existing column. (2) The new… Read more »

Hive Error: Failed to recognize predicate Failed rule: ‘identifier’ column specification

Earlier I was using Hive 0.13, and a query that ran fine on that version started failing once I upgraded to Hive 1.2. My query looked something like this: SELECT count(*) as rows,userId FROM USERTABLE GROUP BY userId; Below is a snippet of the error: FailedPredicateException(identifier,{useSQL11ReservedKeywordsForIdentifier()}?)… Read more »