VTD-XML is a good alternative to the Simple API for XML (SAX) and the Document Object Model (DOM), as it does not force you to trade processing performance for usability. The Java-based, non-validating VTD-XML parser is faster than DOM and more convenient to use than SAX. Unlike other XML processing technologies, VTD-XML is designed to support random access without incurring excessive resource overhead.
An important optimization feature of VTD-XML is non-extractive tokenization. Internally, VTD-XML keeps the XML message in memory intact and undecoded, and represents each token solely by its starting offset and length.
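To make the idea of non-extractive tokenization concrete, here is a minimal sketch in plain Java. It is not VTD-XML's internal record format; the `OffsetTokens` class, its `Token` type, and the simplistic "text between `>` and `<`" rule are all assumptions made for illustration. The point it shows is that tokenizing only records positions in the original byte array, and a string is decoded only when someone asks for it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of VTD-XML.
class OffsetTokens {
    /** A token is just a view onto the source bytes: start position and length. */
    static final class Token {
        final int offset;
        final int length;
        Token(int offset, int length) { this.offset = offset; this.length = length; }
    }

    /** Records one token per non-empty run of bytes between '>' and '<', copying nothing. */
    static List<Token> tokenize(byte[] doc) {
        List<Token> tokens = new ArrayList<>();
        int start = -1;                       // -1 means "not inside a text run"
        for (int i = 0; i < doc.length; i++) {
            if (doc[i] == '>') {
                start = i + 1;                // text may begin right after a tag
            } else if (doc[i] == '<') {
                if (start >= 0 && i > start) {
                    tokens.add(new Token(start, i - start));
                }
                start = -1;
            }
        }
        return tokens;
    }

    /** Decodes a token into a String only on demand. */
    static String text(byte[] doc, Token t) {
        return new String(doc, t.offset, t.length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] doc = "<a>Dan</a><b>Brown</b>".getBytes(StandardCharsets.UTF_8);
        for (Token t : tokenize(doc)) {
            System.out.println(t.offset + "+" + t.length + " -> " + text(doc, t));
        }
    }
}
```

Because the source bytes are never split into separate string objects, the cost of a token in this sketch is two integers, regardless of how long the text it points to is.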
The Java API of VTD-XML consists of three essential components:
- VTDGen (VTD generator), which encapsulates the parsing routine that produces the internal parsed representation of the XML
- VTDNav (VTD navigator), a cursor-based API that allows DOM-like random access to the hierarchical structure of the XML
- AutoPilot, the class that allows document-order element traversal.
The following steps need to be performed to use VTD-XML for processing an XML document either from disk or via HTTP.
- The first step is to find out the length of the XML document, allocate a byte array large enough to hold it, and read the entire document into memory.
- The next step is to create an instance of VTDGen and assign the byte array to it using setDoc().
- The final step is to call parse(boolean ns) to generate the parsed XML representation. When ns is set to true, subsequent document navigation is namespace-aware. If parsing succeeds, you can retrieve an instance of VTDNav by calling getNav().
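The first of these steps can be sketched with the JDK alone. The `XmlLoader` class and its method names below are hypothetical helpers, not part of VTD-XML. They are worth having because a single `InputStream.read(byte[])` call is allowed to return fewer bytes than requested, so it is safer to use calls that are specified to read everything. `InputStream.readAllBytes()` assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class for illustration only.
class XmlLoader {
    /** Reads a document from disk fully into one byte array. */
    static byte[] fromDisk(Path path) {
        try {
            return Files.readAllBytes(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Drains any stream, for example an HTTP response body, and closes it. */
    static byte[] fromStream(InputStream in) {
        try (InputStream body = in) {
            return body.readAllBytes();      // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A ByteArrayInputStream stands in for a file or network stream here.
        byte[] xml = "<library/>".getBytes(StandardCharsets.UTF_8);
        byte[] ba = fromStream(new ByteArrayInputStream(xml));
        System.out.println(ba.length + " bytes in memory");
    }
}
```

The resulting array is what would then be handed to setDoc() before calling parse().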
At the start of navigation, the cursor of the VTDNav instance points at the root element of the XML document. You can use one of the overloaded versions of toElement() to move the cursor manually to a different position in the hierarchy. The single-argument form, toElement(int direction), takes an integer that indicates the direction in which the cursor should move. The six possible values, defined as class variables of VTDNav, are ROOT, PARENT, FIRST_CHILD, LAST_CHILD, NEXT_SIBLING, and PREV_SIBLING.
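As a mental model of cursor-based navigation, here is a toy cursor over a small in-memory tree. It is a conceptual sketch only: `CursorSketch`, `Node`, and `Cursor` are hypothetical classes, the directions are modelled as an enum rather than VTDNav's integer constants, and nothing here reflects how VTDNav is implemented. In this toy, a move that cannot be made returns false and leaves the cursor where it was.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration; not VTD-XML code.
class CursorSketch {
    enum Direction { ROOT, PARENT, FIRST_CHILD, LAST_CHILD, NEXT_SIBLING, PREV_SIBLING }

    static final class Node {
        final String name;
        Node parent;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node add(Node child) { child.parent = this; children.add(child); return this; }
    }

    static final class Cursor {
        private final Node root;
        private Node current;
        Cursor(Node root) { this.root = root; this.current = root; }

        String name() { return current.name; }

        /** Moves one step in the given direction; returns false without moving if impossible. */
        boolean toElement(Direction d) {
            Node target;
            List<Node> kids = current.children;
            switch (d) {
                case ROOT:         target = root; break;
                case PARENT:       target = current.parent; break;
                case FIRST_CHILD:  target = kids.isEmpty() ? null : kids.get(0); break;
                case LAST_CHILD:   target = kids.isEmpty() ? null : kids.get(kids.size() - 1); break;
                case NEXT_SIBLING: target = sibling(+1); break;
                default:           target = sibling(-1); break;   // PREV_SIBLING
            }
            if (target == null) return false;
            current = target;
            return true;
        }

        private Node sibling(int step) {
            if (current.parent == null) return null;
            List<Node> siblings = current.parent.children;
            int i = siblings.indexOf(current) + step;
            return (i >= 0 && i < siblings.size()) ? siblings.get(i) : null;
        }
    }

    public static void main(String[] args) {
        Node book = new Node("book").add(new Node("title")).add(new Node("author"));
        Cursor c = new Cursor(new Node("library").add(book));
        c.toElement(Direction.FIRST_CHILD);   // library -> book
        c.toElement(Direction.FIRST_CHILD);   // book -> title
        do {
            System.out.println(c.name());
        } while (c.toElement(Direction.NEXT_SIBLING));
        c.toElement(Direction.PARENT);        // back to book
    }
}
```

Returning a boolean from each move is what makes the `if (...) { do { ... } while (...); }` idiom possible when walking a run of sibling elements.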
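Input XML (the markup was lost in this copy of the post; the names library, book, title, author, and id are taken from the code below, while genre and price are assumed element names):

    <library>
        <book id="1">
            <title>Googled By God</title>
            <genre>Thriller</genre>
            <author>Pulkit Ahuja</author>
            <price>200</price>
        </book>
        <book id="2">
            <author>Dan Brown</author>
            <title>The Da Vinco Code</title>
            <genre>Mystry</genre>
            <price>300</price>
        </book>
        <book id="3">
            <author>Murthy</author>
            <title>The Magic of Lost Temple</title>
            <genre>Fantasy</genre>
            <price>150</price>
            <author>Sudha</author>
        </book>
    </library>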
Code:
    import java.io.File;
    import java.nio.file.Files;

    import com.ximpleware.AutoPilot;
    import com.ximpleware.VTDGen;
    import com.ximpleware.VTDNav;

    public class vtdbook {
        public static void main(String[] args) {
            try {
                // Read the entire document into memory
                File f = new File("/home/mydir/Downloads/book.xml");
                byte[] ba = Files.readAllBytes(f.toPath());

                // Parse without namespace awareness and get a navigator
                VTDGen vg = new VTDGen();
                vg.setDoc(ba);
                vg.parse(false);
                VTDNav vn = vg.getNav();
                AutoPilot ap = new AutoPilot(vn);

                // Jump to the section that we want to process
                ap.selectXPath("/library/book/title");
                // Get all the titles and print each of them
                while (ap.evalXPath() != -1) {
                    // getText() returns the index of the text token, or -1 if there is none
                    int titleIndex = vn.getText();
                    if (titleIndex != -1) {
                        String title = vn.toNormalizedString(titleIndex);
                        System.out.println("Title is " + title);
                    }
                }

                // Now print the id and the author(s) of every book
                ap.selectXPath("/library/book");
                while (ap.evalXPath() != -1) {
                    // Get the id of the book
                    int idIndex = vn.getAttrVal("id");
                    if (idIndex != -1) {
                        // parseInt() is safe here because we know id is an integer
                        int id = vn.parseInt(idIndex);
                        System.out.println("Book id: " + id);
                    }
                    // Move to the first author child, then loop over any further authors
                    if (vn.toElement(VTDNav.FIRST_CHILD, "author")) {
                        do {
                            int authorIndex = vn.getText();
                            if (authorIndex != -1) {
                                String author = vn.toNormalizedString(authorIndex);
                                System.out.println("\tAuthor:" + author);
                            }
                        } while (vn.toElement(VTDNav.NEXT_SIBLING, "author"));
                        // Go back to the book element, but only if the cursor actually left it
                        vn.toElement(VTDNav.PARENT);
                    }
                }
            } catch (Exception e) {
                System.out.println("Exception is:" + e.getMessage());
            }
        }
    }
Output:
    Title is Googled By God
    Title is The Da Vinco Code
    Title is The Magic of Lost Temple
    Book id: 1
            Author:Pulkit Ahuja
    Book id: 2
            Author:Dan Brown
    Book id: 3
            Author:Murthy
            Author:Sudha