Friday, May 28, 2010

Fun with python’s decorators

Was in need of a utility function that can retry an arbitrary function a few times before giving up - essentially something like Gmail's or Google Reader's behavior when there's no network connection.

Thought it would be a few minutes' job to cook up a decorator utility in Python. Boy, was I wrong! I mean, the basic use case is trivially easy with Python - however, once you want something more useful than that, something you'd actually run in production, the complexity goes over the top!

Anyway, I'm figuring out all sorts of fun things about decorators - and all of it the hard way! OTOH, it's a lot of fun to write small test code to test & validate assumptions!

Make no mistake - I'm still a Python fanboy :) - just going through some pains with decorators right now. Will follow this up with a longer, detailed post with the insights I've gained. Thanks for stopping by!
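For reference, the "trivially easy" basic case looks something like this - a minimal sketch where the names, defaults and structure are mine, not from any library:

```python
import functools
import time

def retry(times=3, delay=1.0, exceptions=(Exception,)):
    """Retry the wrapped callable up to `times` times before giving up."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise           # out of attempts - give up
                    time.sleep(delay)   # back off before the next attempt
        return wrapper
    return decorator
```

The production-grade version is where it gets hairy: exponential backoff, picking which exceptions to swallow, preserving signatures for introspection, and making the decorator usable both with and without arguments.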

Saturday, March 06, 2010

Back after a long time…

Obviously, I'm not writing enough out here... part of the reason being that even though WordPress's web editor is great, I really like not having to type gobs of text into a textarea.

So eventually, I looked around and found Windows Live Writer. It's now going out on its customary spin :).

So what’s been cooking? Actually a bunch of things over the last several months:

Stuff – on which I mean to put up individual posts

  1. Had a fun exercise benchmarking lighttpd with python wsgi
  2. Been doing some stuff on mysql cluster – mostly around seeing how it compares with the mysql master-master replication setup I had in place.
  3. Dipping my toes into Amazon EC2 finally - though Linode or Rackspace is way easier if you want to just spin up a VM. Amazon's EC2 does have some interesting stuff (reliability of backups, CDN etc.). However, it comes at the cost of a model that is initially hard to understand.
  4. Resin server - heard good things about it, had to see if it would fit some stuff at work. Disappointed that the free version is really hamstrung.
  5. Apache Wicket: I've always hated web UI, and somehow the action-oriented frameworks (Struts and their ilk) never appealed from a coupling/cohesion standpoint. In that respect, it seemed like ASP.net got a lot of things right going the component-oriented way. However, it seems fatally flawed with stuff like viewstate, postback and so on. On the Java end, I tried Tapestry out, but it comes with too much baggage for my taste. Had been reading about Wicket for some time now, decided to take the plunge, and was pleasantly surprised doing my contrived example:
    1. Took much less time to get off the ground compared to Tapestry
    2. Mentally, a lot easier to understand
    3. Managed to realize my goal of exploiting OO techniques to DWIM - even on a simple contrived example.

Books:

  1. Steve Souders' excellent "High Performance Websites": if you're doing anything near a high-performance website, then grab this book today!
  2. Wicket In Action
  3. Agile Principles, Patterns and Practices by Robert C Martin: read about the SOLID principles first and then buy this. This is a book to own if you aspire to become a good Agile/OO practitioner. Don’t worry about the C# in the title – it applies universally.

Saturday, January 30, 2010

A new tool for the toolbox!

Firstly - my VM setup:

I'm running VirtualBox with Xubuntu 9.10 on a Win7 host - and it's pretty. It's on an office standard-issue Dell D531 - meaning an AMD Turion X2 TL-60 and 2GB of RAM.

Now, the Turion is supposed to have hardware virtualization (AMD-V). However, the moment hardware virtualization was enabled in VirtualBox and I tried starting the VM, the machine would hard reboot!

After searching high and low, it turns out it's an issue with Dell BIOSes, and they don't have any updates. Here's a page that tracks the issue. Imagine my happiness when, a couple of days ago, I found that Dell had released an unofficial BIOS update (T12). Well, it's gone in, and things are running swimmingly well - my VM now has 2 procs, is stable, and I hardly feel I'm in a VM :). In fact, this post is coming from the VM - Firefox with 12 tabs, a few terminals and emacs running in 600 MB of RAM.

Now, let me come to the new tool I was talking about.

I like to run the VM full screen - it feels best that way. After trying more than enough virtual desktop software, I have finally settled on VirtuaWin - it beats the crap out of the other tools: the systray integration is great, it has window rules, and so on. Over the past couple of weeks it's come close to being the ideal tool - does the job well and you hardly know it's there :-)

Friday, August 28, 2009

Hudson for CI - Tips, Tricks and insights

Just started using Hudson recently and I'm wowed! It's head and shoulders above CruiseControl, and the things I like a lot are:

  1. Snappy web based config - felt great that I could set up a CI build with essentially the repo path alone

  2. Plugin system!

  3. Deep Maven 2 integration (though read on below - this isn't always what works)

  4. Trending data OOB - essentially giving you nice charts about how your build is doing over time


Now that I've said all the very nice things about it, here are a few things that were hard to figure out or weren't immediately apparent. If your Maven build aggregates modules, you'll find the experience a bit challenging:

  1. The generated site doesn't work: the link points to one of the modules' sites instead of the parent project. This is apparently a known issue, and the solution on the Hudson users list is to run the site:deploy goal and put a link to that URL in the project description.

  2. Code coverage: none of the coverage tools (EMMA, Clover etc.) support code coverage over a multi-module build. Since coverage is very important to me, I eventually resorted to separate build jobs instead of using the default multi-module support. Here's how my svn structure looks:
    [sourcecode language="sh"]
    /trunk/basebuild #contains the parent pom
    /trunk/project1 # pom refers to ../basebuild/pom.xml
    /trunk/project2 # ditto here
    [/sourcecode]

    With the directory structure above, there are build jobs for project1 and project2. Each build job checks out both the project folder (/trunk/project1) and the basebuild folder so that the POM references work.
    One undesirable effect of this setup is that if project2 depends on project1, then the project1 build has to install its artifact to the local repo for the project2 build to work.

  3. Findbugs plugin - running Maven builds with Findbugs configured threw an Out of Memory (OOM) error and failed the build. I tried setting MAVEN_OPTS to -Xmx512M in a bunch of places and nothing worked. Eventually, it turned out that the right place to specify it is the build section of the Hudson job's Configure page!

  4. Violations plugin - this is a great little Hudson plugin. However, I couldn't get it to work with the inherited-POM setup above. Eventually resorted to using the Findbugs and PMD Hudson plugins individually.


I should mention that I'm running Hudson 1.321 with the latest plugins. If you have any tips to share on running Hudson, please do drop a link in the comments. Overall, a great big 'thank you' to the Hudson folks!

Wednesday, August 26, 2009

Recipe: Unit testing Apache CXF RESTful services

Recently, I decided to use Apache CXF to expose a service with a RESTful API. Part of the reason for choosing REST was that the client is going to be a mobile client. These days mobile device stacks have come a long way and provide SOAP clients, but it still seems prudent not to depend on a whole slew of technologies where plain 'ole HTTP and JSON will do the trick.
As I started exploring CXF, I liked the JAX-RS implementation and decided to go ahead with it - however, I hit a snag almost immediately when I went to write test cases. Apache CXF documentation is not quite there, and things do require some investigation - at least initially, until you get the hang of the framework. As it took time to figure out the solution, it makes sense to share it on the blogosphere. Here's how to go about writing unit tests.

Firstly, the service and the service implementation:

[sourcecode language="java"]
package com.aditi.blackberry.web;

import javax.ws.rs.FormParam;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Response;

@Path("/chat")
@Produces("application/json")
public interface ChatWebService {

    @POST
    @Path("connect")
    public Response connect(@FormParam("user") String username,
                            @FormParam("pass") String password);
}
[/sourcecode]

The service implementation:

[sourcecode language="java"]
package com.aditi.blackberry.web;

import javax.ws.rs.Produces;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.Response.Status;

@Produces("application/json")
public class ChatWebServiceImpl implements ChatWebService {

    public Response connect(String username, String password) {
        // reject missing credentials
        if (username == null || "".equals(username)
                || password == null || "".equals(password)) {
            return Response.status(Status.BAD_REQUEST).build();
        }
        String[] response = {username, password};
        return Response.ok(response).build();
    }
}
[/sourcecode]

The corresponding spring context xml (applicationContext.xml) is:

[sourcecode language="xml"]
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:p="http://www.springframework.org/schema/p"
       xmlns:aop="http://www.springframework.org/schema/aop" xmlns:tx="http://www.springframework.org/schema/tx"
       xmlns:jaxrs="http://cxf.apache.org/jaxrs" xmlns:cxf="http://cxf.apache.org/core"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
       http://www.springframework.org/schema/aop
       http://www.springframework.org/schema/aop/spring-aop-2.5.xsd
       http://www.springframework.org/schema/tx
       http://www.springframework.org/schema/tx/spring-tx-2.5.xsd
       http://cxf.apache.org/core http://cxf.apache.org/schemas/core.xsd
       http://cxf.apache.org/jaxrs http://cxf.apache.org/schemas/jaxrs.xsd">

    <!-- The element bodies were stripped when this post was archived; what
         follows reconstructs the setup described below. Bean ids and the
         flexjson provider class name are illustrative. -->

    <import resource="classpath:META-INF/cxf/cxf.xml"/>

    <!-- logging turned on via interceptors -->
    <bean id="logIn" class="org.apache.cxf.interceptor.LoggingInInterceptor"/>
    <bean id="logOut" class="org.apache.cxf.interceptor.LoggingOutInterceptor"/>

    <!-- flexjson-based MessageBodyWriter: arbitrary objects to JSON -->
    <bean id="jsonWriter" class="com.aditi.blackberry.web.FlexJsonProvider"/>

    <jaxrs:server id="chatService" address="${service.address}">
        <jaxrs:serviceBeans>
            <bean class="com.aditi.blackberry.web.ChatWebServiceImpl"/>
        </jaxrs:serviceBeans>
        <jaxrs:providers>
            <ref bean="jsonWriter"/>
        </jaxrs:providers>
        <jaxrs:inInterceptors>
            <ref bean="logIn"/>
        </jaxrs:inInterceptors>
        <jaxrs:outInterceptors>
            <ref bean="logOut"/>
        </jaxrs:outInterceptors>
    </jaxrs:server>
</beans>
[/sourcecode]

A few things to note here: logging is turned on using interceptors, and the jaxrs server is defined. I'm also using flexjson to convert arbitrary objects to JSON, so a MessageBodyWriter bean is injected into the jaxrs server node. Most importantly, we haven't included either the cxf-servlet.xml config or cxf-extension-http-jetty.xml. Essentially, for the actual build we want to include cxf-servlet.xml, and for test runs we want to run the service on the bundled Jetty server.

So, go ahead and define an applicationContext-web.xml:

[sourcecode language="xml"]
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">

    <!-- Reconstructed: only the configurer's class attribute survived
         archiving. The property file name is illustrative. -->
    <bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
        <property name="location" value="classpath:web.properties"/>
    </bean>

    <import resource="classpath:META-INF/cxf/cxf-servlet.xml"/>
    <import resource="classpath:applicationContext.xml"/>
</beans>
[/sourcecode]

This is the context xml that we'll provide to the ContextLoaderListener in our web.xml.

For the test cases, define applicationContext-test.xml - this is the context xml which we'll load from the test cases.

[sourcecode language="xml"]
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:jaxrs="http://cxf.apache.org/jaxrs"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
       http://cxf.apache.org/jaxrs http://cxf.apache.org/schemas/jaxrs.xsd">

    <!-- Reconstructed: only the configurer's class attribute survived
         archiving. The property file name is illustrative. -->
    <bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
        <property name="location" value="classpath:test.properties"/>
    </bean>

    <import resource="classpath:META-INF/cxf/cxf.xml"/>
    <import resource="classpath:META-INF/cxf/cxf-extension-http-jetty.xml"/>
    <import resource="classpath:applicationContext.xml"/>

    <jaxrs:client id="chatclient" address="${service.address}"
                  serviceClass="com.aditi.blackberry.web.ChatWebService"/>
</beans>
[/sourcecode]

As you can see, we also define a jaxrs:client in the test context xml.

There's one final issue to address: ideally, the URLs we use to access the service should be the same in both setups. The spring jaxrs:server binding takes an address attribute which defines the URL the service is hosted on. For deployment to an external container, this takes the form "/myservice" - a path element relative to the context location. For the internal Jetty-hosted service, it takes the full HTTP path (http://localhost:port/my/path/to/service). The easiest way is to set this using a property reference in Spring and have applicationContext-web.xml and applicationContext-test.xml load different property files, as shown above.

For completeness, here's the web.xml:

[sourcecode language="xml"]
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
         version="2.5">

    <display-name>CXF REST Example</display-name>
    <description>CXF REST Example</description>

    <context-param>
        <param-name>contextConfigLocation</param-name>
        <param-value>classpath:/applicationContext-web.xml</param-value>
    </context-param>

    <listener>
        <listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
    </listener>

    <servlet>
        <servlet-name>CXFServlet</servlet-name>
        <servlet-class>org.apache.cxf.transport.servlet.CXFServlet</servlet-class>
        <load-on-startup>1</load-on-startup>
    </servlet>

    <servlet-mapping>
        <servlet-name>CXFServlet</servlet-name>
        <url-pattern>/*</url-pattern>
    </servlet-mapping>
</web-app>
[/sourcecode]

And finally, here are the JUnit test cases.

The base class:

[sourcecode language="java"]
package com.aditi.blackberry.web;

import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = { "classpath:/applicationContext-test.xml" })
public abstract class AbstractApiTest {

    @Autowired
    @Qualifier("chatclient")
    protected ChatWebService proxy;
}
[/sourcecode]

A test case for the connect API:

[sourcecode language="java"]
package com.aditi.blackberry.web;

import javax.ws.rs.core.Response;

import org.junit.Assert;
import org.junit.Test;

public class ConnectApiTest extends AbstractApiTest {

    @Test
    public void testConnect() {
        Response resp = proxy.connect("raghu", "password");
        Assert.assertEquals(200, resp.getStatus());
        System.out.println(resp.getEntity().toString());
    }
}
[/sourcecode]

Thursday, January 01, 2009

PIL vs Imagemagick

Decided that I want to timestamp my photo collection with the date from the EXIF data. Many digicams have an option to do this - unfortunately, my Panasonic DMC-LZ8 doesn't seem to. I knew ImageMagick would do the trick, but thought it would be a good time to play around with PIL and Python.

Here's my PIL effort - functional, but one that came with quite some googling and trying to make sense of the PIL documentation, which is inadequate at best.

[sourcecode language="python"]

from PIL import Image
from PIL import ImageFont, ImageDraw
from PIL.ExifTags import TAGS
from os.path import basename, dirname,join
import logging
import sys
import datetime
import time

# Important: I set out to write the image annotation in PIL - there's one
# serious drawback though: when saving the image, the EXIF data isn't preserved.

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(levelname)s %(message)s')
logger = logging.getLogger()

def readExif(image):
    info = image._getexif()
    ret = {}
    for tag, value in info.items():
        ret[TAGS.get(tag, tag)] = value
    dt = datetime.datetime(*time.strptime(ret['DateTime'], "%Y:%m:%d %H:%M:%S")[0:6])
    ret['DateTime'] = dt
    return ret

def annotateImage (file):
    i = Image.open(file)
    font = ImageFont.truetype("/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans-Bold.ttf", 36)
    exif = readExif(i)
    draw = ImageDraw.Draw(i)
    width, height = i.size
    draw.text((width * 0.7, height - 100),exif['DateTime'].strftime("%a %d-%b-%Y  %l:%M %p"), font=font, fill='orange')
    outfile = join(dirname(file), "Ann_" + basename(file))
    i.save (outfile, quality=98)
    logger.debug (outfile + " saved")

if __name__== "__main__":
    logger.debug ("getting exif for " + sys.argv[1])
    for file in sys.argv[1:]:
        logger.debug ("Annotating " + file)
        annotateImage(file)

[/sourcecode]

Unfortunately, PIL has a fatal flaw - you can annotate the image and save it, but the saved image doesn't retain the original image's EXIF metadata. I also tried the exiv2 library, but couldn't figure out a way to load the image, annotate it, and then copy over the metadata. Googling around didn't turn up any interesting solutions - so if any of you have ideas, please share.
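For what it's worth, the modern Pillow fork of PIL (which appeared long after this post) can round-trip the EXIF block by handing it back to save(). A sketch, with the drawing step elided - the function name is mine:

```python
from PIL import Image

def annotate_preserving_exif(src, dst):
    # Open the source, grab its EXIF block (Pillow 6.0+ API),
    # draw the timestamp here as in the script above, then pass
    # the EXIF back to save() so the metadata survives.
    img = Image.open(src)
    exif = img.getexif()
    img.save(dst, quality=98, exif=exif)
```

With 2009-era PIL, though, an external tool (exiftool, or ImageMagick below) was the practical route.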

Meanwhile, as I was getting tired of coaxing PIL to do what I want, I just wrote a little bash script to do the same with ImageMagick. It's as painless as can be: excellent documentation, hardly any gotchas, a world of options in case you feel creative, and the job gets done in 10 minutes. Here's the bash script:

[sourcecode language="sh"]
#! /bin/bash
# Adds a black 18px bottom border to the pic with the Exif DateTime tag.
# No safety checks :). Original pics are left untouched.
while [ $# -gt 0 ]
do
    file=$1
    shift
    outfile="$(dirname "$file")/Ann_$(basename "$file")"
    echo "$outfile"
    echo "$file"
    date=$(identify -verbose "$file" | grep 'DateTime:' | sed 's/ Exif:DateTime: //;s/:/-/;s/:/-/')
    date="$(date -d "$date" +"%a %d-%b-%Y %l:%M %p")"
    convert "$file" -size 1x18 xc:Black -fill White -background Black -append -gravity Southeast -draw "text 0,0 '$date'" "$outfile"
done
[/sourcecode]

Overall, the experience left me disappointed and dissatisfied with PIL.

Tuesday, October 07, 2008

andLinux with Hardy Heron

andLinux is built on top of coLinux (Cooperative Linux) and basically runs side by side with Windows. andLinux packages the whole thing nicely: coLinux bundled with Xming and a nice systray app that lets you launch Linux apps right in Windows.

Here are the details on getting off the ground. The reason for this post is that though andLinux comes with an installer application, it still needs some fiddling under the hood to make it work - and I want to be able to go through the process again when the time comes.

  1. When installing andLinux, choose the COFS option to make your hard drive visible in Linux.

  2. Install with the command line option to launch andLinux (do not install it as a service just yet)

  3. Post-installation, tweak andLinux's network setup by creating a couple of virtual TAP adapters. You will have to tweak things both on the Linux side and on the Windows side. Basically, you create two TAP adapters - one is a loopback, the other is for sharing your LAN connection. Your wireless network is shared via Slirp (which doesn't need a TAP adapter).

  4. Keep in mind a gotcha - Slirp won't allow you to ping. So if you only have Slirp working, try a wget www.google.com to check if you have network connectivity.

  5. Start the andLinux server (or, if it's already running, make sure your C drive is shared) - at the bash prompt you should be able to do ls /mnt/windows.

  6. Do an apt-get update to refresh your package list, then run an upgrade. As of this writing, the only prebuilt image on andlinux.org is Gutsy.

  7. Do an apt-get install update-manager-core.

  8. Run do-release-upgrade - you should see apt running and upgrading your system to Hardy.

Monday, June 23, 2008

Compact Ubuntu

I've always hated the fact that on Ubuntu with the default themes there's far too much space wasted. The buttons are too tall and the treeview wastes too much space, so that if you're in Eclipse or some other IDE, you see precious few items on the screen.

I've been trying to tweak it to no end - even looking for ~/.gtkrc-2.0 tweaks. Found a few links, such as Making Eclipse look good on Linux on Max's blog - but nothing that really satisfied.

And so it stayed until today when I came across Clearlooks Compact Gnome Theme.

I love it - one more for my list of must-haves!

Wednesday, June 18, 2008

Enjoy symlinks and hardlinks on NTFS

Can't believe I didn't come across this before - if you've gotten used to taming your HDD by creating links to folders and have been annoyed by the lack of symlinks and hardlinks on NTFS, then despair no more. I've been using Mark Russinovich's (of Sysinternals fame) junction.exe all this while, and though it works great, I've always wanted something that integrates with Explorer too. For an in-depth discussion, read http://shell-shocked.org/article.php?id=284. Anyway, I'm extremely happy with NTFS Link - this will surely go into my list of "must-have tools - install immediately on a new machine" :-)

Upgrade blues - upgrading to Firefox 3 final from Firefox RC 3

As is evident from other posts here, I've been keenly waiting for the FF3 final. Imagine my surprise when "Check for updates" didn't find an upgrade! (I'm on FF3 RC3.)

Anyway, so off I went to Mozilla.org and downloaded a copy of the final - doing my bit for FF Download Day. Happily installed it - all defaults as usual. The installer told me it was installing into the same location as my current installation (c:\program files\mozilla firefox 3 beta 1 - that's where my FF3 installs have been going, all the way from b1 to b5 and then RC1 to RC3 - so no surprise).

Well, installation completed successfully, and I started FF 3 - but my title bar still says Build 2008052906 - even the file version has the same build ID.

Something's up - don't know what yet - but has anyone else had a similar experience?

Monday, June 16, 2008

Desultory Monday...

This entry was posted using It's All Text on Firefox 3.0 RC2 on Ubuntu Hardy Heron, with emacs 23 snapshot as the editor. I love it :-)

Well, It's All Text is great if you hate typing into web forms with textboxes that make editing such a big pain in the butt.

It's great to see that It's All Text has been updated to work with FF 3.0 now. The fun will be seeing whether this works on Windows with Cygwin emacs as the editor. Had problems the last time I tried that - but that's been some time ago now.

Today's been a desultory Monday. Spent some time getting the emacs snapshot with pretty fonts onto my Hardy. It's beautiful.

The next thing has been mostly scratching my head over Hadoop. What I'd like to do is parse an access log and generate multiple outputs - i.e. a single input of gobs of web access logs, and multiple outputs: say requests by country, popular pages, client browser share and so on.

  1. parse the web log

  2. pull out remote IPs and use GeoIP data to find the originating country

  3. pull out the user agent field and figure out the browser distribution

  4. filter the requested resources down to just pages - find pages by popularity
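Ignoring Hadoop for a moment, the per-record parsing is straightforward. Here's a plain-Python sketch of steps 1, 3 and 4 (the regex assumes the Apache combined log format, and the names are mine; the country lookup would additionally need a GeoIP database):

```python
import re
from collections import Counter

# Matches the Apache combined log format; adjust if your logs differ.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d+) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def aggregate(lines):
    """Single pass over the log, fanning out into several counters."""
    by_page, by_agent = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue                      # skip malformed records
        by_page[m.group("path")] += 1     # pages by popularity
        by_agent[m.group("agent")] += 1   # browser distribution
    return by_page, by_agent
```

The interesting part, and the reason for Hadoop, is doing exactly this over gobs of logs in parallel and writing each counter to its own output.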


Now there seem to be quite a number of ways of doing this -

  • Code the whole thing in Java - and this is where I'm getting into analysis paralysis.
    Look at ways to generate multiple outputs from MapRed, then use Job and JobControl to set up the pipeline.

  • Use Pig - Pig examples on the Pig overview page seem to suggest that this should be trivial with Pig.

  • Use Cascading - seems to do the same thing - will need to do this in JRuby or Groovy though.


Will post an update once I get through the Java route.

Thursday, June 12, 2008

VPN into Windows VPN Server from Ubuntu *Hardy* Intrepid

** Update 2008/11/17 **: NetworkManager is broken in Intrepid. To get it working, I had to install NetworkManager from the PPA as given here - http://www.ubuntu-forums.com/showpost.php?s=e0d93c09b8c340976477456593ac4cf7&p=6094870&postcount=5

OK - this was easy - and while there are some resources on Google, I had to figure out a few itty bitty things for my work VPN setup.

Install:

  • network-manager-pptp

  • pptp-linux


Restart network manager with

killall nm-applet
sudo /etc/init.d/dbus restart
nm-applet --sm-disable &


Configure VPN settings

Click on the network manager applet and click on VPN connections

  1. Create a new VPN connection

  2. Ensure that you select Refuse CHAP in the authentication tab.

  3. In the routing tab, you can give netmasks that need to go through VPN - for my work network, I have: 10.10.5.0/24 172.16.106.0/24


That's it. Now click on the Network applet, and connect to your VPN. In the authentication dialog, use <domain>\username and your windows domain password.

Thursday, May 01, 2008

Drip...

Drip..


IMG_3370_crop, originally uploaded by Raghu Rajagopalan.

Drip....


IMG_3369_crop


Water droplets - dipping a toe in macro photography

he he - so I have a Canon S3 IS - got it last year since it allows enough manual control while also having family-friendly thingies like video :). Also, with the CHDK hack, the S3 IS is good enough for me to experiment.

So, one of those long-time itches has been to capture a water droplet splash - you know, the immensely close-up snaps where you see a single drop splashing...

Here are the snaps, after two evenings of trial and error (mostly errors though) - feeling quite smug with myself :)

Wednesday, April 02, 2008

Free subversion hosting - What's the best?

All - I've just signed up for an Assembla account. These folks provide free Subversion hosting with 500 MB of space and unlimited spaces.

Will see how it goes.

Firefox 3 beta 5 released. Yahoo Mail is still broken.

Firefox 3 Beta 5 was released today. Release notes and downloads here.

Installed it as soon as I got to know this morning, and the first thing to check was whether Yahoo Mail still crashed. Initially, Yahoo Mail seemed to work alright - for all of 50 seconds. Quickly moving over items in the inbox caused Firefox to crash :-(

Guess I'll wait some more. I'm sure there's a bug report somewhere on this - Yahoo Mail was broken on Beta 2, got fixed in Beta 3, then was broken in Beta 4 and is still broken on Beta 5.

Will wait for it to be fixed. Any idea if this is a Firefox issue or a Yahoo! issue? Seems odd that script can crash the browser so badly.

Monday, March 31, 2008

Hardy heron - first impressions

He he :-) - finally got the Ubuntu Hardy Heron beta on my home and work laptops. First impressions below:

1. The Wubi install from within Windows is easy and works great. After setting up so many boxes, I could go on and on about it - I'm sure it's a great help for anyone on Windoze. The barrier to entry has never been so low.

2. I guess once you've installed via Wubi and configured your system to your liking, you can uninstall and take an image that you finally install to a dedicated partition - isn't that just awesome?

3. Comes with Firefox 3b4 installed - which is awesome. Given that FF crashes badly on Yahoo, this might be a bummer for many people. There should probably be some first-time customization that lets you install Opera.

4. Installation is super fast - took about 10 minutes for Wubi to install, reboot once, finish installation and reboot again. Grub defaulting to the last selected entry would probably be a better idea.

The not so good

1. Wifi doesn't work out of the box - it didn't on my Dell Inspiron 1501 or the Dell Latitude D620. It's the ye olde Broadcom problem, and this is really the BIGGEST turn-off. Hope it gets fixed by the time the final release is out. Meanwhile, I had to jump through hoops getting ndiswrapper in. I didn't go the Broadcom fwcutter way since, from what I read, that only allows an 802.11b connection. I'm still not sure what fixed the issue - irrespective, I had to update the system, and then things started working like a charm.

2. Compiz configuration isn't installed by default. If this is your first time on Ubuntu and you've come this way to see the awesome 3D desktop, then this is a bummer - and finding out what you need to do is a pain too.

I think that's all there is to it. It's great once wifi starts working normally.

Friday, March 28, 2008

Gnuplot, dstat - easy graphing on Linux

Recently, I started fiddling with how to monitor and graph performance data on Linux boxes. The usual tools like top and vmstat are either interactive (top) or too textual to do much with.

First off, vmstat doesn't lend itself well to graphing without additional scripts to lay out the data for tools like gnuplot. Secondly, and more seriously, it doesn't include a timestamp in the output.

Looking around a bit, I found that dstat is a good replacement for vmstat (and iostat) - and the generated data is consumable by gnuplot.

Here's a quick example of generating graphs for CPU user, system and idle times
dstat -tc 5 500 > dstat.raw

now fire up gnuplot and go ahead and plot it
gnuplot> set xdata time
gnuplot> set timefmt "%s"
gnuplot> set format x "%M:%S"
gnuplot> plot "dstat.raw" using 1:2 title "User" with lines, "dstat.raw" using 1:3 title "Sys" with lines, "dstat.raw" using 1:4 title "Idle" with lines

To make gnuplot generate an output file, you need:

gnuplot> set term png

gnuplot> set output "dstat.png"

gnuplot> replot

dstat png - User, system and Idle times

And you're done - here's the graph generated on my machine. There's loads more you can do - and admittedly, you could do all of this by dumping the file into Excel. However, that doesn't lend itself to a completely automated process. When you're doing performance testing and the like, you'll likely repeat this enough times that not having to do it manually helps big time!
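To make the automation point concrete, here's a sketch (mine, not from the original workflow) that generates the same gnuplot script and pipes it to gnuplot, so repeat runs are hands-off:

```python
import subprocess

def gnuplot_script(datafile="dstat.raw", outfile="dstat.png"):
    # Mirrors the interactive session above: time-formatted x axis,
    # PNG output, one line per CPU column.
    plots = ", ".join(
        f'"{datafile}" using 1:{col} title "{name}" with lines'
        for col, name in [(2, "User"), (3, "Sys"), (4, "Idle")]
    )
    return "\n".join([
        "set xdata time",
        'set timefmt "%s"',
        'set format x "%M:%S"',
        "set term png",
        f'set output "{outfile}"',
        f"plot {plots}",
    ])

def render(datafile="dstat.raw", outfile="dstat.png"):
    # Feed the generated script to gnuplot on stdin.
    subprocess.run(["gnuplot"], input=gnuplot_script(datafile, outfile),
                   text=True, check=True)
```

Drop render() into a cron job or test harness and every run produces a fresh chart without touching the interactive prompt.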

Thursday, March 27, 2008

Working with huge XML files - tools of the trade.

XMLStarlet is great for slicing and dicing huge XML files. Had a run-in recently - an 80 MB XML file on a single line :D. Guess what - most editors I tried balked and fell over. This was on a 2 GB Core 2 Duo machine.

XMLSpy, vi, emacs and Notepad++ all died - and trying to do something with an 80 MB XML file where all 80 MB is on a single line isn't much fun. So the first order of business was to pretty-print the XML. XMLStarlet worked great:
xmlstarlet fo file.xml > output.xml

and you're done.

The next order of business was validating the XML document against a schema. Our first attempt was with Sun's Multi Schema Validator (MSV). MSV does not validate the whole document - it stops after a certain number of failures. So: MSV out, XMLStarlet in. XMLStarlet can validate documents against a W3C schema, a DTD or a RELAX NG schema.
xmlstarlet val --err --xsd schema.xsd input.xml >  errors.txt

And presto! - you get an error report that you can slice and dice with sed/awk or anything else at all.
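As an aside (not something from the original workflow): if you'd rather script the validation, Python's lxml produces a similar full-document error log. A sketch, assuming lxml is installed:

```python
from lxml import etree

def schema_errors(xml_path, xsd_path):
    """Validate and return every violation as 'line: message' strings."""
    schema = etree.XMLSchema(etree.parse(xsd_path))
    doc = etree.parse(xml_path)
    schema.validate(doc)  # unlike MSV, this records all failures
    return [f"{err.line}: {err.message}" for err in schema.error_log]
```

The returned list is then trivial to filter or count from Python instead of round-tripping through sed/awk.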

XMLStarlet also lets you write XPaths to query the XML - however, I found the syntax too weird and roundabout. A better alternative is a Perl-based solution - XSH2, a command line XML editing shell. You can install it under Cygwin, and it supports basic command pipelining and redirection.

So go ahead and launch XSH. At your cygwin prompt
[~]xsh
---------------------------------------
 xsh - XML Editing Shell version 2.1.1
---------------------------------------

Copyright (c) 2002 Petr Pajas.
This is free software, you may use it and distribute it under
either the GNU GPL Version 2, or under the Perl Artistic License.
Using terminal type: Term::ReadLine::Gnu
Hint: Type `help' or `help | less' to get more help.
$scratch/>

Now, let's load up our document. Type:
$scratch/>$x:=open formatted.xml

Your prompt changes to
$x/>

So go ahead and try a few xpaths
$x/> ls /path/to/node

and XSH prints out the matching nodes. Now, what if you need to create a document fragment of nodes matching a certain XPath? Piece of cake - go ahead:
$x/> ls /path/to/node | tee fragment.xml

XSH2 has many, many more features - but this should be good enough to get you off the ground.