Tuesday, August 21, 2012

Converting XML to JSON with a few nice touches

During my recent outings in heavyweight programming, one of the things we needed to do was convert a large XML structure from the server into a JSON object in the browser, to make manipulation and inspection easier.

Also, the XML from the server was not the nice kind - what I mean is that the tag names were consistent, but the content was wildly inconsistent. For example, all of the following were received:


<!-- different variations of a particular tag -->
<BgSize>100,23</BgSize>
<BgSize>0,0</BgSize>
<BgSize>,</BgSize>

Ideally, in this case, we wanted to parse and validate the node (and all its different variations) and convert it to an X,Y pair only if it contained valid data. Also, a lot of these were common tags that, as you might expect, showed up in various different entities in the XML, so we wanted all these rules applied early and centrally rather than having to deal with them at disparate places further downstream.

The other reason was that a lot of the nodes had structured data crammed into a single tag, which we ideally wanted parsed as a JavaScript object so that we could manipulate it easily:


<!-- xml data with structured content -->
<!-- font, size, color, bold, italic-->
<Font>Arial;Lucida,14,0x0044,True,False</Font>
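
To make the target concrete, this is roughly the object we wanted that Font tag to become - the shape here is purely illustrative (the actual parsing code appears later in the post):

# illustrative target shape for the Font tag above
font =
    faces:  ["Arial", "Lucida"]
    size:   14
    color:  "0x0044"
    bold:   true
    italic: false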

So that brought up a search for the best way to convert XML to JSON - and of course Stack Overflow had a question. The article in the answer makes for very interesting reading into all the different conditions that have to be handled. The associated script at http://goessner.net/download/prj/jsonxml/ is the solution I picked. There's really not much going on below other than using the xml2json function to convert the XML to a raw JSON object.


@parseXML2Json: (xmlstr) ->
    log xmlstr
    # parse the xml string into a DOM, convert that to a JSON string with
    # xml2json, then parse the JSON string into a raw object
    json = $.parseJSON(xml2json($.parseXML(xmlstr)))
    destObj = Utils.__parseTypesInJson(json)
    log "raw and parsed objects", json, destObj
    return destObj
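
For illustration, here's a hypothetical call and roughly what it yields, assuming the output shape of Goessner's xml2json and the parsing rules described below:

# hypothetical usage of the method above
xml = "<Widget><BgSize>100,23</BgSize><Visible>True</Visible></Widget>"
obj = Utils.parseXML2Json xml
# obj.Widget.BgSize  -> { x: 100, y: 23 }
# obj.Widget.Visible -> true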

But now to the more interesting part - once the XML is converted to JSON, we need to do our magic on top of it: applying validations and conversions. This is where the Utils.__parseTypesInJson method comes in.

What we're doing here is walking through the JSON object recursively. At each step, we keep track of the XML path we have descended into so that we can check it and, based on the path, apply validations or conversions. At each step, we also need to check the type of JSON value we're dealing with: undefined, null, string, array or object.

If it's a string, we further delegate to a __parseString function to convert the string to an object if needed.


@__parseTypesInJson: (obj, path = "") ->
    if typeof obj is "undefined"
        return undefined
    else if obj is null
        return null
    else if typeof obj is "string"
        # leaf node: parse the string, then run any validator whose regex matches the path
        newObj = Utils.__parseString(obj, path)
        validator = _.find Utils.CUSTOM_VALIDATORS, (v) -> v.regex.test path
        return validator.fn(newObj) if validator?
        return newObj
    else if Object.prototype.toString.call(obj) is '[object Array]'
        # recurse into each element (same path), dropping any a validator rejected
        destObj = (Utils.__parseTypesInJson(o, path) for o in obj)
        destObj = _.reject destObj, (o) -> o is null
        return destObj
    else if typeof obj is "object"
        # recurse into each key, extending the path as we descend
        destObj = {}
        destObj[k] = Utils.__parseTypesInJson(obj[k], "#{path}.#{k}") for k of obj
        validator = _.find Utils.CUSTOM_VALIDATORS, (v) -> v.regex.test path
        return validator.fn(destObj) if validator?
        return destObj
    else
        return obj


At each step, once the object is formed, we check whether a custom validator is defined in the array of custom validators. Each validator is a regex and a callback function: if the regex matches the path, the callback is passed the JSON object, which it may manipulate before returning.


@CUSTOM_VALIDATORS = [
    # drop <choice> nodes that carry no text content
    choice =
        regex: /choice$/
        fn: (obj) ->
            if obj["#text"]?
                return obj
            else
                log "returning null"
                return null
]
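
As an example of extending this, a hypothetical validator for the BgSize tags from earlier might look like the sketch below - the path regex and the "reject sizes that didn't parse" rule are purely illustrative:

# hypothetical - drops BgSize values that didn't parse to a usable size
bgSize =
    regex: /BgSize$/
    fn: (obj) ->
        return null unless obj?.x and obj?.y
        return obj

Utils.CUSTOM_VALIDATORS.push bgSize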

The __parseString method, for completeness - you can tweak this to your
taste, and there's nothing complicated going on in it.


@__parseString: (str, path) ->
    if not str?
        return str
    # some paths are explicitly excluded from string parsing
    if _.any(Utils.SKIP_STRING_PARSING_REGEXES, (r) -> r.test path)
        log "Skipping string parsing for:", path, str
        return str
    if /^\d+$/.test str
        return parseInt(str, 10)
    else if /^\d+,\d+$/.test str
        # "100,23" style pairs become {x, y} points
        [first, second] = str.split(",")
        return {"x": parseInt(first, 10), "y": parseInt(second, 10)}
    else if str is ','
        # an empty pair like <BgSize>,</BgSize> carries no data
        return null
    else if /^true$/i.test str
        return true
    else if /^false$/i.test str
        return false
    else if /^[^,]+,\d+,(0x[0-9a-f]{0,6})?,((True|False),(True|False))?$/i.test str
        log "Matched font: ", str
        return Utils.parseFontSpec(str)
    else
        return str
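
For completeness, here's a minimal sketch of what parseFontSpec could look like - treat it as illustrative, assuming the field order from the Font example earlier (faces, size, color, bold, italic):

# minimal sketch, assuming "faces,size,color,bold,italic" as in
# "Arial;Lucida,14,0x0044,True,False" - missing flags default to false
@parseFontSpec: (str) ->
    [faces, size, color, bold, italic] = str.split(",")
    faces:  faces.split(";")
    size:   parseInt(size, 10)
    color:  color or null
    bold:   /^true$/i.test bold
    italic: /^true$/i.test italic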

Microsoft releases Git-TFS integration tool

Microsoft released a cross-platform Git-TFS integration tool, Git-TF! It's definitely a good step, and an acknowledgement of the mindshare that Git has.
I took it for a spin - the integration is supposed to be cross-platform, so it should work on Cygwin too. However, it didn't work the first time I tried, and I had to tweak the script a little.

In the script <install folder>/git-tf:

# On cygwin and mingw32, simply run the cmd script, otherwise we'd have to
# figure out how to mangle the paths appropriately for each platform
if [ "$PLATFORM" = "cygwin" -o "$PLATFORM" = "mingw32" ]; then
    #exec cmd //C "$0.cmd" "$@"                 # Original
    exec cmd /C "$(cygpath -aw "$0.cmd")" "$@"  # Changed
fi

Anyway, after that, things did seem to work - the only issue is that your Windows domain password is echoed on the Cygwin console :(... Other than that minor irritant, I was able to clone the project and work on it using the Git integration. I'm going to try it out some more over the next few days and will post if I find anything more. This is definitely a great step from MS - and if it works properly, it will make working with TFS source control much more bearable :D

Friday, August 10, 2012

CoffeeScript rocks!

I've been absent from the blog for a few weeks. Life got taken over by work - I've been deep in the JavaScript jungles, and CoffeeScript has been a lifesaver.
Based on my earlier peek at CoffeeScript, we went ahead full on with it, and I have to say it has been a pleasant ride for the team. With over 4.7 KLoC of generated JavaScript (the CoffeeScript source weighing in around 3.7 KLoC, including comments etc.), I can now confidently recommend it for any sort of JavaScript-heavy development.
I'm going to list the benefits we saw with CoffeeScript, and hopefully someone else trying to evaluate it will find this useful:
  1. Developers who haven't dived deep into JavaScript's prototype-based model find it easier to get up to speed sooner. Yes, once in a while they do get tripped up and then have to look again at what's going on under the covers - but this is normal. The key point is that it's much more productive and enjoyable to use CoffeeScript.
  2. The conciseness of CoffeeScript definitely goes a long way in improving readability. One of the algorithms we implemented applied a bunch of time-overlap rules. We also used Underscore.js - and between CoffeeScript and Underscore.js, the whole routine came in under 20 lines, mostly bug free and very easy for new folks to pick up and maintain over time. Correspondingly, the generated JS was much more complicated (though Underscore helped hide some of the loop-iteration noise) - and it wouldn't have been much different had we written the JS directly.
  3. Integrating with external frameworks - jQuery, jQuery UI etc. - was again painless and simple.
  4. Another benefit was that the easy class-structure syntactic sugar helped us quickly prototype new ideas and then refine them to production quality. With developers who're still shaky on JS, I doubt the same approach would have worked, since they'd have spent cycles trying to get their heads wrapped around JS's prototype-based model.
  5. CoffeeScript also allows you to split the code into multiple source files and merge all of them before compiling to JS - this allowed us to keep each source file separate and reduce the merges required during commits.
  6. Finally, performance is a non-issue - you do have to be a little careful, otherwise you might find yourself allocating function objects and returning them when you don't mean to, but this is easily caught in reviews (see the sketch after this list).
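To make that last point concrete: CoffeeScript implicitly returns a function's last expression, and a trailing loop becomes a comprehension that collects its body's values. So a hypothetical setup routine like the one below quietly allocates an array holding every handler function and returns it; a bare return at the end discards the comprehension result:

# the trailing loop is a comprehension, so this collects each handler
# function into an array and returns it - an allocation nobody asked for
attachHandlers = (elements) ->
    for el in elements
        el.onclick = -> handleClick el   # handleClick is hypothetical

# ending with a bare return avoids the wasted array
attachHandlers = (elements) ->
    for el in elements
        el.onclick = -> handleClick el
    return
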
One latent doubt I had going in was the number of times we'd have to drop down to the JS level to debug issues. With a larger CoffeeScript codebase spread across multiple files, this is a real concern, since the error line numbers don't match the source and we might have had to jump through hoops to fix issues. Luckily, this wasn't a problem at all - over time, for either an error in the JS or just inspecting code in the browser, it's easy to map back to the CoffeeScript class/function - so you just fix it there and regenerate the JS. Secondly, the generated JS is quite readable - so even when investigating issues, it's quite trivial to drop breakpoints in Chrome and know what's going on.
The one minor irritation was that if there was a CoffeeScript compile issue, then, with the files joined, the line-number reporting fails and you have to compile each file independently to figure out the error. That's easily automated with a script - so I'm just being nitpicky.
Anyway, if you got here looking for advice on using CoffeeScript, then you've reached the right place - and maybe this post has helped you make up your mind!