<rdf:RDF
    xmlns:s='http://snipsnap.org/rdf/snip-schema#'
    xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xml:base='http://community.moertel.com/ss/rdf'>
    <s:Snip rdf:about='http://community.moertel.com/ss/rdf#start/2005-08-10/1'
         s:name='start/2005-08-10/1'
         s:cUser='tmoertel'
         s:oUser='tmoertel'
         s:mUser='tmoertel'>
        <s:content>1 Simple data formats are not going away {anchor:Simple data formats are not going away}&#xA;I saw that {link:ziggy wrote a small program to compute summary statistics|http://use.perl.org/~ziggy/journal/26221}&#xA;and emit the output in ~~name~~=~~value~~ format.&#xA;This is the same output format that my&#xA;{link:stats|http://community.moertel.com/ss/space/Tom%27s+Perl+code/stats|img=none}&#xA;program uses, and it&apos;s great.  First, it&apos;s easy to eyeball:&#xA;&#xA;{code:none}&#xA;$ __stats rnorm100.dat__&#xA;count   = 100&#xA;min     = -2.567090756&#xA;10% cut = -0.9992493139&#xA;25% cut = -0.5645971535&#xA;median  = 0.0074455585&#xA;mean    = 0.0347402067700001&#xA;75% cut = 0.542629871&#xA;90% cut = 1.3079738435&#xA;max     = 2.051265585&#xA;var     = 0.788500663119147&#xA;stdev   = 0.88797559826785&#xA;popvar  = 0.780615656487955&#xA;popstdv = 0.883524564733746&#xA;{code}&#xA;&#xA;Second, it&apos;s easy to manipulate.  If I just want the median and mean,&#xA;for example, I grep for them:&#xA;&#xA;{code:none}&#xA;$ __stats rnorm100.dat | grep me__&#xA;median  = 0.0074455585&#xA;mean    = 0.0347402067700001&#xA;{code}&#xA;&#xA;For mass analysis, however, this format is too verbose: I ~~do not~~&#xA;want to look at one hundred of these summaries to try to figure out&#xA;the big picture.  What I want is to see ~~all~~ of the stats at&#xA;once.  I want a summary table: each data set in its&#xA;own row and each summary statistic in its own column.&#xA;&#xA;While it is easy to write an ad-hoc program to compile the individual&#xA;summaries into a mass summary, I wrote a small program {link:tabulate|http://community.moertel.com/ss/space/Tom%27s+Perl+code/tabulate|img=none} that&#xA;is more flexible and reusable.  It reads a stream of&#xA;~~name~~=~~value~~ pairs, deduces the record structure of the stream,&#xA;and emits a corresponding summary table.  
I can concatenate a bunch&#xA;of summaries, and ~~tabulate~~ will figure out how to split&#xA;them up.&#xA;&#xA;For example, if I give ~~tabulate~~ a single summary, it gives&#xA;me back a single-row table:&#xA;&#xA;{code:none}&#xA;$ __stats rnorm100.dat | grep me | tabulate__&#xA;median        mean&#xA;0.0074455585  0.03474020677&#xA;{code}&#xA;&#xA;If I give it two summaries (there are two data sets in my working directory), it gives me back two rows:&#xA;&#xA;{code:none}&#xA;$ __for set in *.dat; do stats $set | grep me; done | tabulate__&#xA;median        mean&#xA;0.5122150670  0.88521159344&#xA;0.0074455585  0.03474020677&#xA;{code}&#xA;&#xA;Now, however, I can&apos;t tell which set of statistics is which.  But&#xA;this problem is easy to solve: I just prepend each data set&apos;s name to&#xA;its stream of summary statistics.  The simple data format&#xA;makes this easy, and I can even use ~~echo~~ to do the job&#xA;in my shell&apos;s ~~for~~ loop:&#xA;&#xA;{code:none}&#xA;for dataset in *.dat; do&#xA;    echo dataset = $dataset  # insert name into stream&#xA;    stats $dataset | grep me&#xA;done |&#xA;tabulate&#xA;{code}&#xA;&#xA;Now I get the results I need:&#xA;&#xA;{code:none}&#xA;dataset       median        mean&#xA;exp100.dat    0.5122150670  0.88521159344&#xA;rnorm100.dat  0.0074455585  0.03474020677&#xA;{code}&#xA;&#xA;I would not want to try that with XML, which is why I think that simple&#xA;ASCII formats will be with us forever.&#xA; </s:content>
        <s:mTime>2005-08-10 15:39:31.805</s:mTime>
        <s:cTime>2005-08-10 15:03:56.694</s:cTime>
        <s:comments
             rdf:type='http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag'/>
        <s:snipLinks>
            <rdf:Bag>
                <rdf:li rdf:resource='#tmoertel'/>
                <rdf:li rdf:resource='#snipsnap-index'/>
                <rdf:li rdf:resource='http://community.moertel.com/ss/rdf#'/>
                <rdf:li rdf:resource='#pxsl'/>
                <rdf:li rdf:resource='http://community.moertel.com/ss/rdf#Talk - Fun with Asterisk and Perl'/>
                <rdf:li rdf:resource='http://community.moertel.com/ss/rdf#A Coder&apos;s Guide To Coffee'/>
                <rdf:li rdf:resource='http://community.moertel.com/ss/rdf#space/start/2005-08-10/1'/>
                <rdf:li rdf:resource='#LectroTest'/>
                <rdf:li rdf:resource='http://community.moertel.com/ss/rdf#RPMs/'/>
                <rdf:li rdf:resource='#PXSL'/>
                <rdf:li rdf:resource='http://community.moertel.com/ss/rdf#Alsa - No sound'/>
            </rdf:Bag>
        </s:snipLinks>
        <s:attachments
             rdf:type='http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag'/>
    </s:Snip>
</rdf:RDF>
