Tuesday, September 17, 2013

Build a tree from a xml file

Give a xml file like this,
<a>
   <b>Text<\b>
   <c>
      <d>Computer Science<\d>
      <d>Linear Algebra<\d>
   <\c>
   <e> Hello world <\e>
<\a>
Convert the file into a tree
                     <a>
                    /     \       \
                 <b>  <c>   <e>
                          /    \
                       <d>  <d>
Note the result don't have to be a binary tree. To functions are given.
1. getToke(): It returns two values Type and Text. Type includes BEGIN, TEXT, END. Text is the actual content of the token.
2. hasNext(): It will return true until the scan hit the end of the file.

Solution:
You have to first decide how to store the tree structure. Then another problem is how to differentiate between parent and children. We use a stack to store the current parent.

One thing to be careful about is that the xml file may be not in the correct format, i.e. <a>text<\c>. Return NULL in this case.

class Node {
public:
    string name;
    string content;
    vector<Node *> children;
    Node (string n, string c = "");
};

struct Token {
    Type type;
    String content;
};

Node * buildTree (){
    Token t = getToken();
    Node * root = new Node (root.content);
    stack<Node *> s;
    s.push (root);
    
    while (hasNext()) {
        t = getToken();
        if (t.type == BEGIN) {
            Node * tmp = new Node (t.content);
            s.top->children.push_back (tmp);
            s.push (tmp);
            continue;
        }
        if (t.type == TEXT) {
            tmp->content += t.content;
            continue;
        }
        if (t.type == END) {
            //Here you need to check whether the format of xml file is correct
            //If the type of token doesn't match, return NULL
            if (t.content == s.top())
                s.pop();
            else
                return NULL;
        }
    }
    //if the stack is not empty at the end, xml is not correct formatted, return NULL
    if (!s.empty())
        return NULL;
    else
        return root;
}

No comments:

Post a Comment