grafter.tabular

Functions for processing tabular data.

_

An alias for the identity function, used for providing positional arguments to mapc.

add-column

(add-column dataset new-column value)

Add a new column to a dataset with the supplied value lazily copied into every row within it.

add-columns

(add-columns dataset hash)
(add-columns dataset source-cols f)
(add-columns dataset new-col-ids source-cols f)

Add several new columns to a dataset at once. There are a number of different parameterisations:

(add-columns ds {:foo 10 :bar 20})

Calling with two arguments, where the second argument is a hash map, creates a new column in the dataset for each of the hash map's keys and copies the hash map's values lazily down all the rows. This parameterisation is designed to work well with build-lookup-table.

When given either a single column id or many along with a function which returns a hashmap, add-columns will pass each cell from the specified columns into the given function, and then associate its returned map back into the dataset. e.g.

(add-columns ds "a" (fn [a] {:b (inc a) :c (inc a)} )) ; =>

a :b :c
0 1 1
1 2 2

As a dataset needs to know its columns in this case it will infer them from the return value of the first row. If you don’t want to infer them from the first row then you can also supply them like so:

(add-columns ds [:b :c] "a" (fn [a] {:b (inc a) :c (inc a)} )) ; =>

a :b :c
0 1 1
1 2 2
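The per-row behaviour described above can be pictured with plain Clojure data. This is a minimal sketch, not Grafter's implementation: the dataset is modelled as a vector of row maps, and add-columns-sketch and example-rows are hypothetical names introduced for illustration.

```clojure
;; Illustrative model only: a "dataset" here is a plain vector of row maps,
;; not a real Grafter/Incanter dataset.
(defn add-columns-sketch
  "Passes the cell found under source-col to f and merges the map f returns
  back into the row."
  [rows source-col f]
  (map (fn [row] (merge row (f (get row source-col)))) rows))

(def example-rows [{"a" 0} {"a" 1}])

(add-columns-sketch example-rows "a" (fn [a] {:b (inc a) :c (inc a)}))
;; => ({"a" 0, :b 1, :c 1} {"a" 1, :b 2, :c 2})
```

Because a plain map carries no separate list of column names, this model has no equivalent of the new-col-ids argument.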

apply-columns

(apply-columns dataset fs)

Like mapc in that you associate functions with particular columns, though it differs in that the functions given to mapc should receive and return values for individual cells.

With apply-columns, the function receives a collection of cell values from the column and should return a collection of values for the column.

It is also possible to create new columns with apply-columns. For example, to assign row ids you can do:

(apply-columns ds {:row-id (fn [_] (grafter.sequences/integers-from 0))})
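To make the column-at-a-time contract concrete, here is a rough model in plain Clojure. It assumes a dataset is a vector of row maps and uses (range) in place of grafter.sequences/integers-from; apply-columns-sketch is a hypothetical name, not part of Grafter.

```clojure
;; Illustrative model only: each function receives the whole column as a
;; sequence of cells and returns a sequence of replacement cells.
(defn apply-columns-sketch
  [rows fs]
  (reduce (fn [rs [col f]]
            (let [new-vals (f (map #(get % col) rs))]
              ;; mapv stops at the shorter collection, so an infinite
              ;; sequence of new values is safe here.
              (mapv #(assoc %1 col %2) rs new-vals)))
          (vec rows)
          fs))

(apply-columns-sketch [{:x "a"} {:x "b"}]
                      {:row-id (fn [_] (range))})
;; => [{:x "a", :row-id 0} {:x "b", :row-id 1}]
```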

build-lookup-table

(build-lookup-table dataset key-cols)
(build-lookup-table dataset key-cols return-keys)

Takes a dataset, a vector of any number of column names corresponding to the key columns, and a column name corresponding to the value column. Returns a function that takes a vector of keys as its argument and returns the wanted value.
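The idea of turning key columns into a lookup function can be sketched in plain Clojure. This is only a model under stated assumptions: rows are plain maps, exactly one value column is given, and the exact shape of the value the real function returns is not confirmed by the text above. build-lookup-table-sketch and the sample data are hypothetical.

```clojure
;; Illustrative model only, not Grafter's implementation.
(defn build-lookup-table-sketch
  [rows key-cols value-col]
  (let [table (into {}
                    (map (fn [row]
                           [(mapv #(get row %) key-cols) (get row value-col)]))
                    rows)]
    ;; The returned function takes a vector of keys, one per key column.
    (fn [ks] (get table (vec ks)))))

(def lookup
  (build-lookup-table-sketch
    [{:code "a" :lang "en" :label "Alpha"}
     {:code "b" :lang "en" :label "Beta"}]
    [:code :lang]
    :label))

(lookup ["a" "en"]) ;; => "Alpha"
(lookup ["z" "en"]) ;; => nil
```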

column-names

If given a dataset, it returns its column names. If given a dataset and a sequence of column names, it returns a dataset with the given column names.

columns

(columns dataset cols)

Given a dataset and a sequence of column identifiers, columns narrows the dataset to just the supplied columns.

Columns specified in the selection that are not present in the dataset will be silently ignored.

The order of the columns in the returned dataset will be determined by the order of matched columns in the selection.

The supplied sequence of columns is first cropped to the number of columns in the dataset before being selected. This means that infinite sequences can safely be supplied to this function.
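The cropping and ignoring rules above can be sketched with plain Clojure collections. This is a model under assumptions, not the library code: the dataset is passed as a vector of column names plus a vector of row maps, and columns-sketch is a hypothetical name.

```clojure
;; Illustrative model only.
(defn columns-sketch
  [column-names rows cols]
  (let [wanted  (take (count column-names) cols)      ; crop first, so infinite seqs are safe
        present (filter (set column-names) wanted)]   ; silently drop unknown columns
    {:column-names (vec present)
     :rows         (mapv #(select-keys % present) rows)}))

;; An infinite selection: "c", "z", "a", then "q" forever.
(columns-sketch ["a" "b" "c"]
                [{"a" 1 "b" 2 "c" 3}]
                (concat ["c" "z" "a"] (repeat "q")))
;; => {:column-names ["c" "a"], :rows [{"c" 3, "a" 1}]}
```

The result follows the order of the matched columns in the selection ("c" before "a"), and "z" is dropped without error.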

dataset?

(dataset? ds)

Predicate function to test whether the supplied argument is a dataset or not.

defgraft

macro

(defgraft name docstring? tabular->graph-fn)
(defgraft name docstring? pipeline template quad-fn*)

Declares an entry point to a graph-generating pipeline allowing it to be exposed to the Grafter import service and executed via the leiningen plugin.

It is effectively equivalent to the following call with additional metadata benefits:

(def my-graft (comp make-graph my-pipeline))

It is used with defpipe to indicate that a transformation also supports conversion into graph data.

It takes an optional docstring. If no docstring is specified, a default docstring will be generated.

defpipe

macro

(defpipe & args)

Declares an entry point to a grafter pipeline, allowing it to be exposed to the Grafter import service and executed via the leiningen plugin.

It has the same form as “defn” but adds metadata to the defined var that lets pipelines be discovered at runtime through both syntactic and metadata means.

derive-column

(derive-column dataset new-column-name from-cols)
(derive-column dataset new-column-name from-cols f)

Adds a new column to the end of each row, derived from the source column(s) given in from-cols. f should just return the cell's value.

If no f is supplied the identity function is used, which results in the specified column being cloned.
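A small plain-Clojure model may help show both arities. It assumes rows are maps and that from-cols is always a vector of column ids, which is an assumption rather than something the text above states; derive-column-sketch is a hypothetical name.

```clojure
;; Illustrative model only, not Grafter's implementation.
(defn derive-column-sketch
  ([rows new-col from-cols]
   ;; No f supplied: identity clones the single source column.
   (derive-column-sketch rows new-col from-cols identity))
  ([rows new-col from-cols f]
   (mapv (fn [row]
           (assoc row new-col (apply f (map #(get row %) from-cols))))
         rows)))

(derive-column-sketch [{:first "Ada" :last "Lovelace"}]
                      :full [:first :last]
                      (fn [a b] (str a " " b)))
;; => [{:first "Ada", :last "Lovelace", :full "Ada Lovelace"}]

(derive-column-sketch [{:a 1}] :a-copy [:a])
;; => [{:a 1, :a-copy 1}]
```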

drop-rows

(drop-rows dataset n)

Drops the first n rows from the dataset, retaining the rest.

graph-fn

macro

(graph-fn [row-bindings] & forms)

A macro that defines an anonymous function to convert a tabular dataset into a graph of RDF quads. Ultimately it converts a lazy-seq of rows inside a dataset, into a lazy-seq of RDF Statements.

The function body should be composed of any number of forms, each of which should return a sequence of RDF quads. These will then be concatenated together into a flattened lazy-seq of RDF statements.

Rows are passed to the function one at a time as hash-maps, which can be destructured via Clojure’s standard destructuring syntax.

Additionally, destructuring can be done on row indices (when a vector form is supplied) or column names (when a hash-map form is supplied).
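The shape of the resulting function (rows in, one flat sequence of statements out) can be modelled without any RDF library. In this sketch quads are stand-in vectors of [subject predicate object graph], and graph-fn-sketch is a hypothetical higher-order function, not the macro itself.

```clojure
;; Illustrative model only: each "form" is a function from a row map to a
;; sequence of quads; the results are concatenated lazily across all rows.
(defn graph-fn-sketch
  [& row-fns]
  (fn [rows]
    (mapcat (fn [row] (mapcat #(% row) row-fns)) rows)))

(def person-quads
  (graph-fn-sketch
    ;; Rows are destructured by column name with ordinary map destructuring.
    (fn [{:keys [id name]}] [[id :name name :g]])
    (fn [{:keys [id age]}]  [[id :age age :g]])))

(person-quads [{:id :p1 :name "Ann" :age 41}])
;; => ([:p1 :name "Ann" :g] [:p1 :age 41 :g])
```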

grep

multimethod

Filters rows in the table for matches. This multimethod dispatches on the type of its second argument. It also takes any number of column numbers as its final arguments, which narrow the scope of the grep to only those columns. If no columns are specified, grep operates on all columns.

make-dataset

(make-dataset)
(make-dataset data)
(make-dataset data columns-or-f)

Like incanter’s dataset function except it can take a lazy-sequence of column names which will get mapped to the source data.

Works by inspecting the number of columns in the first row, and taking that many column names from the sequence.

Inspects the first row of data to determine the number of columns, and creates an incanter dataset with columns named alphabetically as by grafter.sequences/column-names-seq.
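How a lazy sequence of names is matched to the width of the first row can be sketched as follows. This is a plain-Clojure model, not the Incanter-backed implementation: make-dataset-sketch is a hypothetical name, and the default names "a" to "z" are an assumed simplification of grafter.sequences/column-names-seq.

```clojure
;; Illustrative model only.
(defn make-dataset-sketch
  ([data]
   ;; Assumed default: single lowercase letters, enough for 26 columns.
   (make-dataset-sketch data (map (comp str char) (range (int \a) (inc (int \z))))))
  ([data column-names]
   (let [n    (count (first data))          ; width of the first row
         cols (vec (take n column-names))]  ; take only that many names
     {:column-names cols
      :rows         (mapv #(zipmap cols %) data)})))

(make-dataset-sketch [[1 2] [3 4]])
;; => {:column-names ["a" "b"], :rows [{"a" 1, "b" 2} {"a" 3, "b" 4}]}

;; An infinite sequence of names is fine because only two are taken.
(:column-names
  (make-dataset-sketch [[1 2]] (map #(keyword (str "col" %)) (range))))
;; => [:col0 :col1]
```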

mapc

(mapc dataset fs)

Takes a vector or a hash map of functions and maps each to its key column for every row. Each function should be from a cell to a cell, whereas with apply-columns it should be from a column to a column, i.e. a function from a collection of cells to a collection of cells.

If the specified column does not exist in the source data a new column will be created, though the supplied function will need to either ignore its argument or handle a nil argument.
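The cell-at-a-time contract, including the creation of a missing column, can be modelled in plain Clojure. The sketch assumes rows are maps and covers only the hash-map form of fs; mapc-sketch is a hypothetical name.

```clojure
(require '[clojure.string :as str])

;; Illustrative model only: every function receives one cell and returns one cell.
(defn mapc-sketch
  [rows fs]
  (mapv (fn [row]
          (reduce-kv (fn [r col f] (assoc r col (f (get r col))))
                     row
                     fs))
        rows))

(mapc-sketch [{:name "ann" :age "41"}]
             {:name   str/upper-case
              :age    #(Long/parseLong %)
              ;; :source does not exist yet, so this function sees nil
              ;; and simply ignores its argument.
              :source (fn [_] "import")})
;; => [{:name "ANN", :age 41, :source "import"}]
```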

melt

(melt dataset pivot-keys)

Melt an object into a form suitable for easy casting, like a melt function in R. It accepts multiple pivot keys (identifier variables that are reproduced for each row in the output).

(use '(incanter core charts datasets))

(view (with-data (melt (get-dataset :flow-meter) :Subject)
        (line-chart :Subject :value :group-by :variable :legend true)))

See http://www.statmethods.net/management/reshape.html for more examples.
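For readers unfamiliar with melting, the reshaping itself can be sketched on plain row maps. This is a minimal model, assuming the output columns are called :variable and :value as in the chart example above; melt-sketch is a hypothetical name, and the real function may order the output rows differently.

```clojure
;; Illustrative model only: every non-pivot cell becomes its own output row,
;; with the pivot columns repeated alongside it.
(defn melt-sketch
  [rows pivot-keys]
  (for [row   rows
        [k v] (apply dissoc row pivot-keys)]
    (assoc (select-keys row pivot-keys) :variable k :value v)))

(melt-sketch [{:Subject 1 :t1 10 :t2 20}] [:Subject])
;; => ({:Subject 1, :variable :t1, :value 10}
;;     {:Subject 1, :variable :t2, :value 20})
```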

move-first-row-to-header

(move-first-row-to-header [first-row & other-rows])

For use with make-dataset. Moves the first row of data into the header, removing it from the source data.

read-dataset

(read-dataset source & {:as opts})

FIXME: write docs

read-datasets

(read-datasets dataset & {:keys [format], :as opts})

Opens a lazy sequence of datasets from something that yields multiple datasets, e.g. all the worksheets in an Excel workbook.

rename-columns

(rename-columns dataset col-map-or-fn)

Renames the columns in the dataset. Takes either a map or a function. If a map is passed it will rename the specified keys to the corresponding values.

If a function is supplied it will apply the function to all of the column-names in the supplied dataset. The return values of this function will then become the new column names in the dataset returned by rename-columns.
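The two calling styles can be sketched on a bare vector of column names. This is a model under assumptions: rename-columns-sketch is a hypothetical name, and leaving unmentioned columns untouched in the map case is an assumption.

```clojure
;; Illustrative model only: operates on the header, not on a real dataset.
(defn rename-columns-sketch
  [column-names col-map-or-fn]
  (if (map? col-map-or-fn)
    ;; Map form: rename listed keys, keep everything else as it is.
    (mapv #(get col-map-or-fn % %) column-names)
    ;; Function form: apply it to every column name.
    (mapv col-map-or-fn column-names)))

(rename-columns-sketch ["a" "b"] {"a" :id}) ;; => [:id "b"]
(rename-columns-sketch ["a" "b"] keyword)   ;; => [:a :b]
```

The map? check comes first because a Clojure map is itself callable as a function.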

resolve-column-id

(resolve-column-id dataset column-key)
(resolve-column-id dataset column-key not-found)

Finds and resolves the column id by converting between symbols and strings. If column-key is not found in the dataset's headers then not-found is returned.

resolve-key-cols

(resolve-key-cols dataset key-cols)

FIXME: write docs

rows

(rows dataset row-numbers)

Takes a dataset and a seq of row-numbers and returns a dataset consisting of just the supplied rows. If a row number is not found the function will assume it has consumed all the rows and return normally.
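The stop-at-first-missing-row rule is what makes an open-ended sequence of row numbers usable. A rough plain-Clojure model, assuming zero-based row numbers and a dataset held as a vector of rows (rows-sketch is a hypothetical name):

```clojure
;; Illustrative model only, not Grafter's implementation.
(defn rows-sketch
  [rows row-numbers]
  (let [rows (vec rows)]
    (->> row-numbers
         (take-while #(contains? rows %)) ; stop at the first missing index
         (mapv rows))))

;; Every second row, taken from an infinite sequence of indices.
(rows-sketch [:r0 :r1 :r2] (iterate #(+ 2 %) 0))
;; => [:r0 :r2]
```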

swap

(swap dataset first-col second-col)
(swap dataset first-col second-col & more)

Takes an even number of column names and swaps each pair of columns.
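Pairwise swapping can be sketched on a bare vector of column names. This is a model only: swap-sketch is a hypothetical name, and the assumption is that the arguments are consumed two at a time, in order.

```clojure
;; Illustrative model only: reorders a header vector, not a real dataset.
(defn swap-sketch
  [column-names & cols]
  (assert (even? (count cols)) "swap needs an even number of column names")
  (reduce (fn [names [a b]]
            (mapv #(cond (= % a) b
                         (= % b) a
                         :else   %)
                  names))
          (vec column-names)
          (partition 2 cols)))

(swap-sketch ["a" "b" "c" "d"] "a" "c")         ;; => ["c" "b" "a" "d"]
(swap-sketch ["a" "b" "c" "d"] "a" "c" "b" "d") ;; => ["c" "d" "a" "b"]
```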

take-rows

(take-rows dataset n)

Takes only the first n rows from the dataset, discarding the rest.

test-dataset

(test-dataset r c)

Constructs a test dataset of r rows by c cols e.g.

(test-dataset 2 2) ;; =>

A B
0 0
1 1

with-metadata-columns

(with-metadata-columns [context data])

Takes a pair of [context, data] and returns a dataset in which the metadata context is merged into the dataset itself.
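One way to picture the merge is that each context entry becomes a column repeated on every row. The sketch below assumes exactly that, with the context as a plain map and the data as a vector of row maps; with-metadata-columns-sketch is a hypothetical name and the precedence on key clashes is an assumption.

```clojure
;; Illustrative model only, not Grafter's implementation.
(defn with-metadata-columns-sketch
  [[context rows]]
  ;; Assumption: context values win if a row already has the same key.
  (mapv #(merge % context) rows))

(with-metadata-columns-sketch
  [{:sheet "2014"} [{:a 1} {:a 2}]])
;; => [{:a 1, :sheet "2014"} {:a 2, :sheet "2014"}]
```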

without-metadata-columns

(without-metadata-columns [context data])

Ignores any possible metadata and leaves the dataset as is.

write-dataset

(write-dataset destination dataset & {:keys [format], :as opts})

FIXME: write docs