VTD-XML is a good alternative to the Simple API for XML (SAX) and the Document Object Model (DOM), as it does not force you to trade processing performance for usability. The Java-based, non-validating VTD-XML parser is faster than DOM and better than SAX. Unlike other XML processing technologies, VTD-XML is designed… Read more »
There are cases where we do not have direct access to a database server running in an external network, i.e. we cannot open a connection to the database directly from our local machine. Usually, to access these databases we have to log in to another server (a bastion host) through PuTTY to… Read more »
By default, Spark reads text files with the newline ('\n') character as the record delimiter. But there can be cases where the record delimiter is some other character, e.g. CTRL+A ('\001') or a pipe ('|') character. So how can we read such files? We can set the textinputformat.record.delimiter parameter in the Configuration object… Read more »
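A minimal PySpark sketch of the idea, not necessarily the approach the full post takes. It assumes an existing SparkContext is passed in as sc; the function name read_records and the default delimiter are hypothetical choices for illustration.

```python
def read_records(sc, path, delimiter="\x01"):
    """Read a text file whose records are separated by `delimiter`.

    `sc` is assumed to be an existing pyspark SparkContext.
    """
    # Hadoop's TextInputFormat honours this key instead of the default newline.
    conf = {"textinputformat.record.delimiter": delimiter}
    rdd = sc.newAPIHadoopFile(
        path,
        "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
        conf=conf,
    )
    # Each element is a (byte offset, record text) pair; keep only the text.
    return rdd.map(lambda pair: pair[1])
```

For a pipe-delimited file, something like read_records(sc, "/path/to/file", "|") would return an RDD with one element per record.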
Given a square grid in which each cell holds a lowercase letter, denote the character in the i-th row and the j-th column as G[i][j]. You can perform one operation as many times as you like: swap two column-adjacent characters in the same row, G[i][j]… Read more »
In this post I describe, with example code, how to add a new column to an existing DataFrame using the withColumn() function of DataFrame. There are two scenarios: (1) the content of the new column is derived from the values of an existing column; (2) the new… Read more »
Jack and Daniel are friends. They want to encrypt their conversation so that they can save themselves from interception by a detective agency. So they invent a new cipher. Every message is encoded to its binary representation B of length N. Then it is written down K times, shifted by… Read more »
You will be given a list of 32-bit unsigned integers. You are required to output the list of unsigned integers you get by flipping the bits in their binary representations (i.e. unset bits must be set, and set bits must be unset). Take 1 for example, as unsigned 32-bits… Read more »
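One way to sketch the flip in Python is to XOR each value with a mask of 32 set bits. The names MASK_32 and flip_bits are made up for this illustration and are not taken from the problem statement.

```python
MASK_32 = 0xFFFFFFFF  # 32 one-bits


def flip_bits(n):
    """Return n with every bit of its 32-bit unsigned representation inverted."""
    if not 0 <= n <= MASK_32:
        raise ValueError("expected an unsigned 32-bit integer")
    # XOR with all ones turns each 0 into 1 and each 1 into 0.
    return n ^ MASK_32


print([flip_bits(n) for n in [1, 0]])  # prints [4294967294, 4294967295]
```

XOR is used rather than Python's ~ operator because ~ on a Python int yields a negative number instead of a 32-bit unsigned result.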
Earlier I was using Hive 0.13, and a query that ran fine in that version started misbehaving once I upgraded to Hive 1.2. My query looked something like this: SELECT count(*) as rows,userId FROM USERTABLE GROUP BY userId; Below is a snippet of the error. FailedPredicateException(identifier,{useSQL11ReservedKeywordsForIdentifier()}?)… Read more »
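The predicate name in the error points at reserved keywords, and rows is on Hive's reserved list from 1.2 onward. The full post is truncated here, so the following is only a hedged sketch of one common workaround: quoting the alias with backticks when building the query string. The helper names quote_identifier and build_query are hypothetical.

```python
def quote_identifier(name):
    """Wrap an identifier in backticks so Hive does not parse it as a keyword."""
    if "`" in name:
        # Keep the sketch simple: refuse names that would need escaping.
        raise ValueError("identifier must not contain a backtick")
    return "`" + name + "`"


def build_query(alias):
    """Build the count-per-user query with a safely quoted alias."""
    return (
        "SELECT count(*) AS " + quote_identifier(alias)
        + ", userId FROM USERTABLE GROUP BY userId"
    )


print(build_query("rows"))
```

Renaming the alias to a non-reserved word (for example row_count) would avoid the quoting altogether.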
The city of Gridland is represented as a matrix with numbered rows and columns. Gridland has a network of train tracks that always run in straight horizontal lines along a row. In other words, the start and end points of… Read more »
A DataFrame is a collection of data organized into named columns. DataFrames are similar to tables in a traditional database. A DataFrame can be constructed from sources such as Hive tables, structured data files, external databases, or existing RDDs. Under the hood, a DataFrame contains an RDD composed of Row objects with… Read more »