grafter.tabular

Functions for processing tabular data.

_

An alias for the identity function, used for providing positional arguments to mapc.

add-column

(add-column dataset new-column value)
Add a new column to a dataset with the supplied value lazily copied
into every row within it.
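The lazy copy can be sketched in plain Clojure, without grafter or incanter. This sketch assumes a hypothetical dataset model: a map with :column-names and :rows, where each row is a map from column name to cell value. add-column-sketch is an illustrative name and is not part of the library.

```clojure
;; Hypothetical model: {:column-names [...] :rows [row-map ...]}.
(defn add-column-sketch
  "Appends new-column to the header and assocs value onto every row.
  Using `map` keeps the per-row copy lazy."
  [dataset new-column value]
  (-> dataset
      (update :column-names conj new-column)
      (update :rows (fn [rows] (map #(assoc % new-column value) rows)))))

(def ds {:column-names ["a"] :rows [{"a" 0} {"a" 1}]})

(add-column-sketch ds "b" 42)
;; => {:column-names ["a" "b"], :rows ({"a" 0, "b" 42} {"a" 1, "b" 42})}
```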

add-columns

(add-columns dataset hash)
(add-columns dataset source-cols f)
(add-columns dataset new-col-ids source-cols f)
Add several new columns to a dataset at once.  There are a number of different parameterisations:

(add-columns ds {:foo 10 :bar 20})

Calling with two arguments where the second argument is a hash map
creates a new column in the dataset for each of the hash map's keys and
lazily copies the corresponding values down all the rows.  This
parameterisation is designed to work well with build-lookup-table.

When given either a single column id or several, along with a function
that returns a hash map, add-columns passes the cells from the
specified columns into the given function, and then associates the
returned map back into each row of the dataset.  e.g.

(add-columns ds "a" (fn [a] {:b (inc a) :c (inc a)}))

; =>

| a | :b | :c |
|---+----+----|
| 0 |  1 |  1 |
| 1 |  2 |  2 |

Because a dataset needs to know its columns, in this case add-columns
infers them from the map returned for the first row.  If you don't want
them inferred from the first row, you can also supply them like so:

(add-columns ds [:b :c] "a" (fn [a] {:b (inc a) :c (inc a)}))

; =>

| a | :b | :c |
|---+----+----|
| 0 |  1 |  1 |
| 1 |  2 |  2 |
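The function-based form can be sketched in the same way. This is a hypothetical illustration that uses a plain-map dataset model (a map with :column-names and :rows whose rows are maps keyed by column name), not incanter's Dataset. add-columns-sketch is an invented name. The sketch shows the new column ids being inferred from the map returned for the first row.

```clojure
;; Hypothetical model: {:column-names [...] :rows [row-map ...]}.
(defn add-columns-sketch
  "Sketch of the (add-columns ds source-cols f) form.  source-cols may be a
  single column id or a vector of ids; f receives their cells as arguments
  and returns a map that is merged into the row."
  [dataset source-cols f]
  (let [source-cols (if (sequential? source-cols) source-cols [source-cols])
        rows (:rows dataset)
        ;; infer the new column ids from the map f returns for the first row
        new-cols (when-let [first-row (first rows)]
                   (keys (apply f (map first-row source-cols))))]
    {:column-names (into (vec (:column-names dataset)) new-cols)
     :rows (map (fn [row] (merge row (apply f (map row source-cols)))) rows)}))

(def ds {:column-names ["a"] :rows [{"a" 0} {"a" 1}]})

(add-columns-sketch ds "a" (fn [a] {:b (inc a) :c (inc a)}))
;; => {:column-names ["a" :b :c],
;;     :rows ({"a" 0, :b 1, :c 1} {"a" 1, :b 2, :c 2})}
```

This sketch calls f one extra time on the first row to discover the columns. Supplying the column ids explicitly, as in the four-argument form, removes the need for that call.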

all-columns

(all-columns dataset cols)
Takes a dataset and a sequence of any number of integers corresponding
to column numbers, and returns a dataset containing only those columns.

If you want to use infinite sequences of columns, or allow the
specification of more cols than are in the data without error, you
should use columns instead.  Using an infinite sequence with this
function will result in non-termination.

One advantage of this over columns is that you can list the same
column more than once, duplicating as many columns as you like.
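The following hypothetical sketch makes the selection behaviour concrete. For illustration it models rows as vectors and cols as integer column positions, which is not how grafter works internally. all-columns-sketch is an invented name.

```clojure
;; Hypothetical model for this sketch: rows are vectors of cells and
;; cols are integer column positions.
(defn all-columns-sketch
  [{:keys [column-names rows]} cols]
  ;; vec realises cols eagerly, so an infinite cols never returns
  (let [cols (vec cols)]
    {:column-names (mapv column-names cols)   ; repeated positions duplicate columns
     :rows (map (fn [row] (mapv row cols)) rows)}))

(def ds {:column-names ["a" "b" "c"]
         :rows [[1 2 3] [4 5 6]]})

(all-columns-sketch ds [0 0 2])
;; => {:column-names ["a" "a" "c"], :rows ([1 1 3] [4 4 6])}
```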

apply-columns

(apply-columns dataset fs)
Like mapc in that you associate functions with particular columns,
though it differs in that the functions given to mapc should receive
and return values for individual cells.

With apply-columns, the function receives a collection of cell
values from the column and should return a collection of values for
the column.
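Here is a minimal sketch of the column-at-a-time contract. It assumes a hypothetical plain-map dataset model and an fs given as a hash map from column name to function. apply-columns-sketch is illustrative only. The example uses a running total because that calculation needs the whole column, which a per-cell mapc function cannot see.

```clojure
;; Hypothetical model: {:column-names [...] :rows [row-map ...]}.
(defn apply-columns-sketch
  "fs maps a column name to a function from that column's seq of cells
  to a new seq of cells."
  [dataset fs]
  (reduce-kv
   (fn [ds col f]
     (let [rows (:rows ds)
           new-cells (f (map #(get % col) rows))]
       ;; write the returned cells back row by row; map stops at the shorter
       ;; seq, so f should return one value per row
       (assoc ds :rows (map #(assoc %1 col %2) rows new-cells))))
   dataset
   fs))

(def ds {:column-names ["a"] :rows [{"a" 1} {"a" 2} {"a" 3}]})

(apply-columns-sketch ds {"a" #(reductions + %)})
;; rows => ({"a" 1} {"a" 3} {"a" 6})
```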

build-lookup-table

(build-lookup-table dataset key-cols)
(build-lookup-table dataset key-cols return-keys)
Takes a dataset, a vector of any number of column names corresponding
to key columns, and optionally (as return-keys) a column name
corresponding to the value column.

Returns a function that takes a vector of keys as its argument and
returns the corresponding value.
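One way to picture the returned function is as a closure over an index of the rows. The sketch below assumes a hypothetical plain-map dataset model and a single value column (value-col). build-lookup-table-sketch is an invented name, and the sketch makes no claim about how the real return-keys argument is handled.

```clojure
;; Hypothetical model: {:column-names [...] :rows [row-map ...]}.
(defn build-lookup-table-sketch
  [{:keys [rows]} key-cols value-col]
  ;; index each row by the vector of its key-column cells;
  ;; if two rows share a key, the later row wins
  (let [table (into {}
                    (map (fn [row] [(mapv row key-cols) (get row value-col)]))
                    rows)]
    ;; the returned function takes a vector of keys, e.g. ["FR"]
    (fn [ks] (get table ks))))

(def ds {:column-names ["code" "name"]
         :rows [{"code" "GB" "name" "United Kingdom"}
                {"code" "FR" "name" "France"}]})

(def lookup (build-lookup-table-sketch ds ["code"] "name"))

(lookup ["FR"]) ;; => "France"
```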

column-names

If given a dataset, it returns its column names. If given a dataset and a sequence
of column names, it returns a dataset with the given column names.

columns

(columns dataset cols)
Given a dataset and some columns, narrow the dataset to just the
supplied columns.

cols are paired off with columns in the data and then a selection is
done.  Any cols left over after the pairing are discarded, but if a
selected col is not actually in the data an IndexOutOfBoundsException will
be thrown.

This function can safely be used with infinite sequences.
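The following hypothetical sketch shows the pairing behaviour. It models rows as vectors and cols as integer column positions, an assumption made for illustration. columns-sketch is an invented name. Only as many cols are taken as there are columns, so an infinite sequence such as (range) is safe, but a position that is not in the data throws.

```clojure
;; Hypothetical model for this sketch: rows are vectors of cells and
;; cols are integer column positions.
(defn columns-sketch
  [{:keys [column-names rows]} cols]
  ;; pair cols off with the existing columns: anything beyond the number of
  ;; columns is discarded, so cols may be infinite
  (let [cols (vec (take (count column-names) cols))]
    ;; invoking a vector with a missing position throws IndexOutOfBoundsException
    {:column-names (mapv column-names cols)
     :rows (map (fn [row] (mapv row cols)) rows)}))

(def ds {:column-names ["a" "b" "c"]
         :rows [[1 2 3] [4 5 6]]})

(columns-sketch ds (range))   ; terminates: takes positions 0, 1 and 2
(columns-sketch ds [2 0])     ; => {:column-names ["c" "a"], :rows ([3 1] [6 4])}
```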

derive-column

(derive-column dataset new-column-name from-cols)
(derive-column dataset new-column-name from-cols f)
Adds a new column, new-column-name, to the end of each row, derived
from the column(s) named in from-cols.  f is called with the values of
those cells and should just return the new cell's value.

If no f is supplied the identity function is used, which results in
the specified column being cloned.
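Here is a minimal sketch. It assumes a hypothetical plain-map dataset model and that f is applied to the from-cols cells as separate arguments. derive-column-sketch is a hypothetical name, not the library's implementation.

```clojure
;; Hypothetical model: {:column-names [...] :rows [row-map ...]}.
(defn derive-column-sketch
  ([dataset new-col from-cols]
   ;; with no f, identity clones a single source column
   (derive-column-sketch dataset new-col from-cols identity))
  ([dataset new-col from-cols f]
   (let [from-cols (if (sequential? from-cols) from-cols [from-cols])]
     (-> dataset
         (update :column-names conj new-col)   ; new column goes last
         (update :rows
                 (fn [rows]
                   ;; f gets one argument per source cell and returns one cell
                   (map #(assoc % new-col (apply f (map % from-cols))) rows)))))))

(def ds {:column-names ["a" "b"] :rows [{"a" 1 "b" 2} {"a" 3 "b" 4}]})

(derive-column-sketch ds "total" ["a" "b"] +)
;; rows => ({"a" 1, "b" 2, "total" 3} {"a" 3, "b" 4, "total" 7})
(derive-column-sketch ds "a-copy" "a")
;; rows => ({"a" 1, "b" 2, "a-copy" 1} {"a" 3, "b" 4, "a-copy" 3})
```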

drop-rows

(drop-rows dataset n)
Drops the first n rows from the dataset.

grep

multimethod

Filters rows in the table for matches.  This is a multimethod that
dispatches on the type of its second argument.  It also takes any
number of column numbers as the final set of arguments.  These
narrow the scope of the grep to only those columns.  If no columns
are specified then grep operates on all columns.
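The sketch below shows what dispatching on the type of the second argument can look like with defmulti. The supported types (regex, string and predicate function), the plain-map dataset model and every -sketch name are assumptions made for illustration. grafter's own set of grep methods may differ.

```clojure
(require '[clojure.string :as str])

;; Hypothetical model: {:column-names [...] :rows [row-map ...]}.
;; The three dispatch types below are assumptions chosen for illustration.
(defn grep-dispatch [_dataset pattern & _cols]
  (cond (instance? java.util.regex.Pattern pattern) :regex
        (string? pattern) :string
        (fn? pattern) :predicate))

(defmulti grep-sketch grep-dispatch)

(defn- keep-matching-rows
  [{:keys [column-names rows] :as dataset} match? col-numbers]
  (let [names (vec column-names)
        ;; column numbers narrow the search; none means every column
        cols (if (seq col-numbers) (map names col-numbers) names)]
    (assoc dataset :rows
           (filter (fn [row] (some #(match? (get row %)) cols)) rows))))

(defmethod grep-sketch :predicate [dataset f & cols]
  (keep-matching-rows dataset f cols))

(defmethod grep-sketch :string [dataset s & cols]
  (keep-matching-rows dataset #(str/includes? (str %) s) cols))

(defmethod grep-sketch :regex [dataset re & cols]
  (keep-matching-rows dataset #(re-find re (str %)) cols))

(def ds {:column-names ["name" "city"]
         :rows [{"name" "Ann" "city" "Leeds"} {"name" "Bob" "city" "York"}]})

(grep-sketch ds "ee")     ; keeps Ann's row (matches "Leeds")
(grep-sketch ds #"^B")    ; keeps Bob's row
(grep-sketch ds "ee" 0)   ; searches only column 0, so keeps no rows
```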

lift->vector

(lift->vector x)
FIXME: write docs

make-dataset

(make-dataset)
(make-dataset data)
(make-dataset data columns-or-f)
Like incanter's dataset function, except it can take a lazy sequence
of column names which will get mapped to the source data.

Works by inspecting the number of columns in the first row and
taking that many column names from the sequence.

When no column names are supplied, it inspects the first row of data to
determine the number of columns, and creates an incanter dataset with
columns named alphabetically, as generated by
grafter.sequences/column-names-seq.
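The following hypothetical sketch illustrates the column-naming behaviour. The plain-map dataset model is an assumption. alphabetical-names is an invented stand-in for grafter.sequences/column-names-seq and only covers A to Z. The sketch does not model the function form of columns-or-f.

```clojure
;; Hypothetical model: {:column-names [...] :rows [row-map ...]}.
;; alphabetical-names is an illustrative stand-in for
;; grafter.sequences/column-names-seq, not its actual definition.
(def alphabetical-names
  (map (comp str char) (range (int \A) (inc (int \Z)))))

(defn make-dataset-sketch
  ([data] (make-dataset-sketch data alphabetical-names))
  ([data column-names]
   ;; take only as many names as the first row has cells, so column-names
   ;; may be a lazy (even infinite) sequence
   (let [header (vec (take (count (first data)) column-names))]
     {:column-names header
      :rows (map #(zipmap header %) data)})))

(make-dataset-sketch [[1 2] [3 4]])
;; => {:column-names ["A" "B"], :rows ({"A" 1, "B" 2} {"A" 3, "B" 4})}

(make-dataset-sketch [[1 2]] (map #(keyword (str "col" %)) (range)))
;; => {:column-names [:col0 :col1], :rows ({:col0 1, :col1 2})}
```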

mapc

(mapc dataset fs)
Takes a vector or a hash map of functions and applies each one to the
cell in its corresponding column (matched by position for a vector, or
by key for a hash map) in every row.  Each function should map a cell
to a cell, whereas with apply-columns each function maps a column to a
column, i.e. it is a function from a collection of cells to a
collection of cells.
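Here is a minimal sketch of the vector and hash-map forms, assuming a hypothetical plain-map dataset model. The `_` defined here is a stand-in that mirrors the documented alias for identity. mapc-sketch is a hypothetical name.

```clojure
;; Hypothetical model: {:column-names [...] :rows [row-map ...]}.
(def _ identity)   ; stand-in for grafter's `_`: leaves a column unchanged

(defn mapc-sketch
  [{:keys [column-names] :as dataset} fs]
  ;; a vector of functions is paired with the columns by position;
  ;; a hash map already names the column each function applies to
  (let [fs (if (map? fs) fs (zipmap column-names fs))]
    (update dataset :rows
            (fn [rows]
              ;; each function takes one cell and returns one cell
              (map (fn [row] (reduce-kv (fn [r col f] (update r col f)) row fs))
                   rows)))))

(def ds {:column-names ["a" "b"] :rows [{"a" 1 "b" 10} {"a" 2 "b" 20}]})

(mapc-sketch ds [_ inc])     ; rows => ({"a" 1, "b" 11} {"a" 2, "b" 21})
(mapc-sketch ds {"a" str})   ; rows => ({"a" "1", "b" 10} {"a" "2", "b" 20})
```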

melt

(melt dataset & pivot-keys)
Melt an object into a form suitable for easy casting, like a melt function in R.
It accepts multiple pivot keys (identifier variables that are reproduced for each
row in the output).  e.g.

(use '(incanter core charts datasets))
(view (with-data (melt (get-dataset :flow-meter) :Subject)
        (line-chart :Subject :value :group-by :variable :legend true)))

See http://www.statmethods.net/management/reshape.html for more examples.
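The reshaping step can be sketched without incanter. This hypothetical melt-sketch assumes a plain-map dataset model. Following the R convention that the line-chart call relies on, it names the output columns :variable and :value.

```clojure
;; Hypothetical model: {:column-names [...] :rows [row-map ...]}.
(defn melt-sketch
  [{:keys [column-names rows]} & pivot-keys]
  (let [pivot? (set pivot-keys)
        measure-cols (remove pivot? column-names)]
    {:column-names (into (vec pivot-keys) [:variable :value])
     ;; one output row per (input row, non-pivot column) pair; the pivot
     ;; cells are repeated on each of them
     :rows (for [row rows
                 col measure-cols]
             (merge (select-keys row pivot-keys)
                    {:variable col :value (get row col)}))}))

(def ds {:column-names [:Subject :A :B]
         :rows [{:Subject 1 :A 10 :B 20}
                {:Subject 2 :A 11 :B 21}]})

(melt-sketch ds :Subject)
;; rows => ({:Subject 1, :variable :A, :value 10} {:Subject 1, :variable :B, :value 20}
;;          {:Subject 2, :variable :A, :value 11} {:Subject 2, :variable :B, :value 21})
```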

move-first-row-to-header

(move-first-row-to-header [first-row & other-rows])
For use with make-dataset.  Moves the first row of data into the
header, removing it from the source data.

open-all-datasets

(open-all-datasets dir & {:keys [metadata-fn make-dataset-fn], :or {metadata-fn without-metadata-columns, make-dataset-fn make-dataset}})
Returns a sequence of incanter.core.Dataset objects, recursively found
beneath a given directory.

Files may contain one or more datasets.

By default it returns the sheets unaltered, by using
without-metadata-columns as its metadata function.

You can provide it with other metadata functions, which will splice
the context into the sheet as new columns.

open-tabular-file

multimethod

Takes a File or String as an argument and coerces it, based upon its
file extension, into its concrete low-level representation, provided by
the file adapter.

This is intended to be used by adapter developers, and developers who
need access to low-level details.  The representation returned is
specific to the adapter.  Normal users should prefer open-all-datasets.

Currently supported files are CSV and Excel's xls and xlsx formats.

Additionally open-tabular-file takes an optional set of key/value
parameters which will be passed to the concrete function opening the
file.

Supported options are currently:

:ext - An overriding file extension (as a keyword) to force a particular
       file type to be opened instead of looking at the file's extension.

rename-columns

(rename-columns dataset col-map-or-fn)
Renames the columns in the dataset.  Takes either a map or a
function.  If a map is passed it will rename the specified keys to
the corresponding values.

If a function is supplied it will apply the function to all of the
column-names in the supplied dataset.  The return values of this
function will then become the new column names in the dataset
returned by rename-columns.
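Here is a minimal sketch of the map form and the function form, assuming a hypothetical plain-map dataset model. rename-columns-sketch is a hypothetical name. This sketch chooses to keep any names that are missing from the map.

```clojure
(require '[clojure.set :as set])

;; Hypothetical model: {:column-names [...] :rows [row-map ...]}.
(defn rename-columns-sketch
  [dataset col-map-or-fn]
  (let [f (if (map? col-map-or-fn)
            #(get col-map-or-fn % %)   ; names missing from the map are kept
            col-map-or-fn)
        old-names (:column-names dataset)
        renames (zipmap old-names (map f old-names))]
    {:column-names (mapv renames old-names)
     :rows (map #(set/rename-keys % renames) (:rows dataset))}))

(def ds {:column-names ["a" "b"] :rows [{"a" 1 "b" 2}]})

(rename-columns-sketch ds {"a" "alpha"})   ; column-names => ["alpha" "b"]
(rename-columns-sketch ds keyword)         ; column-names => [:a :b]
```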

resolve-column-id

(resolve-column-id dataset column-key)
(resolve-column-id dataset column-key not-found)
Finds and resolves the column id by converting between symbols and
strings.  If column-key is not found in the dataset's headers then
not-found is returned.
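The following hypothetical sketch shows the lookup. It tries the key as given, then its string form, then its keyword form. The plain-map dataset model, the order of these attempts and the keyword handling are all assumptions made for illustration.

```clojure
;; Hypothetical model: {:column-names [...] :rows [...]}.
(defn resolve-column-id-sketch
  ([dataset column-key] (resolve-column-id-sketch dataset column-key nil))
  ([dataset column-key not-found]
   (let [headers (set (:column-names dataset))
         ;; try the key as given, then as a string, then as a keyword;
         ;; assumes column-key is a string, keyword or symbol
         candidates [column-key (name column-key) (keyword column-key)]]
     (or (some headers candidates) not-found))))

(def ds {:column-names ["a" :b] :rows []})

(resolve-column-id-sketch ds :a)            ;; => "a"
(resolve-column-id-sketch ds "b")           ;; => :b
(resolve-column-id-sketch ds 'a)            ;; => "a"
(resolve-column-id-sketch ds "z" :missing)  ;; => :missing
```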

resolve-key-cols

(resolve-key-cols dataset key-cols)
FIXME: write docs

rows

(rows dataset row-numbers & {:as opts})
Takes a dataset and a seq of row-numbers and returns a dataset
consisting of just the supplied rows.  If a row number is not found
the function will assume it has consumed all the rows and return
normally.

swap

(swap dataset first-col second-col)
(swap dataset first-col second-col & more)
Takes an even number of column names and swaps the columns in each pair.
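Here is a minimal sketch that assumes a hypothetical plain-map dataset model in which rows are maps keyed by column name, so a swap only needs to reorder the header. swap-sketch is an invented name, and swapping the pairs one at a time from left to right is also an assumption.

```clojure
;; Hypothetical model: {:column-names [...] :rows [row-map ...]}.
(defn swap-sketch
  [dataset first-col second-col & more]
  {:pre [(even? (count more))]}   ; column names must come in pairs
  (let [pairs (partition 2 (list* first-col second-col more))]
    (update dataset :column-names
            (fn [names]
              ;; swap each pair in turn, left to right
              (reduce (fn [v [a b]]
                        (assoc v (.indexOf v a) b (.indexOf v b) a))
                      (vec names)
                      pairs)))))

(def ds {:column-names ["a" "b" "c"] :rows [{"a" 1 "b" 2 "c" 3}]})

(swap-sketch ds "a" "c")           ; column-names => ["c" "b" "a"]
(swap-sketch ds "a" "b" "b" "c")   ; column-names => ["c" "a" "b"]
```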

take-rows

(take-rows dataset n)
Takes the first n rows from the dataset, discarding the rest.

test-dataset

(test-dataset r c)
Constructs a test dataset of r rows by c cols e.g.

(test-dataset 2 2) ;; =>

| A | B |
|---+---|
| 0 | 0 |
| 1 | 1 |

with-metadata-columns

(with-metadata-columns [context data])
Takes a pair of [context data] and returns a dataset in which the
metadata context is merged into the dataset itself.

without-metadata-columns

(without-metadata-columns [context data])
Ignores any possible metadata and leaves the dataset as is.