In a previous post, I showed how we could use Clojure, and specifically Incanter, to process access logs and graph hits on our site. Now we’re going to adapt that solution to show the number of unique users over time.
First, we change the previous solution to pull out a core dataset representing the raw data we’re interested in from the access log (records-from-access-log remains unchanged from before):
(defn access-log-to-dataset [filename]
  (col-names (to-dataset (records-from-access-log filename)) ["Date" "User"]))
The raw dataset retrieved from this call looks like this:
Now we need to work out the number of unique users in a given time period. Like before, we’re going to use $rollup to group multiple records by minute, but we need to work out how to summarise the user column. To do this, we create a custom summarise function which calculates the number of unique users:
(defn num-unique-items [seq] (count (set seq)))
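As a quick sanity check at the REPL, the function can be tried on a small hand-made sequence. The user names below are made up for illustration, and the definition is repeated so the snippet runs on its own.

```clojure
;; Repeated from above so this snippet is self-contained.
(defn num-unique-items [seq]
  (count (set seq)))

;; Hypothetical user names, purely for illustration.
(def sample-users ["alice" "bob" "alice" "carol" "bob"])

(num-unique-items sample-users) ; => 3
(num-unique-items [])           ; => 0
```

Converting to a set discards duplicates, so the count is the number of distinct values regardless of how often each user appears.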
Then use that to modify the raw dataset and graph the resulting dataset:
(defn access-log-to-unique-user-dataset [access-log-dataset]
  ;; Round each timestamp down to the minute, then count distinct users per minute.
  ;; The columns are named "Date" and "User" so that both $rollup and the plot below can find them.
  ($rollup num-unique-items "User" "Date"
           (col-names (conj-cols ($map #(round-ms-down-to-nearest-min (as-millis %)) "Date" access-log-dataset)
                                 ($ "User" access-log-dataset))
                      ["Date" "User"])))

(defn concurrent-users-graph [dataset]
  (time-series-plot :Date :User
                    :x-label "Date"
                    :y-label "User"
                    :title "Users Per Min"
                    :data (access-log-to-unique-user-dataset dataset)))

(def access-log-dataset (access-log-to-dataset "/path/to/access.log"))

(save (concurrent-users-graph access-log-dataset) "unique-users.png")
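To make the rollup step easier to reason about, here is a minimal sketch of the same idea in plain Clojure, without Incanter. It is not how $rollup is implemented. It just shows the calculation we are asking for: round each timestamp down to the minute, group by that minute, and count distinct users in each group. The floor-to-minute helper is a hypothetical stand-in for round-ms-down-to-nearest-min from the previous post, and the sample records are invented.

```clojure
;; Hypothetical stand-in for round-ms-down-to-nearest-min:
;; drops seconds and milliseconds from an epoch-millisecond timestamp.
(defn floor-to-minute [ms]
  (- ms (mod ms 60000)))

(defn unique-users-per-minute
  "Takes a seq of [epoch-ms user] pairs and returns a sorted map of
   minute (in epoch ms) -> number of distinct users seen in that minute."
  [records]
  (into (sorted-map)
        (for [[minute rows] (group-by (comp floor-to-minute first) records)]
          [minute (count (set (map second rows)))])))

;; Made-up records: two distinct users in the first minute, one in the second.
(def sample-records
  [[0      "alice"]
   [15000  "bob"]
   [30000  "alice"]
   [61000  "alice"]
   [119999 "alice"]])

(unique-users-per-minute sample-records)
;; => {0 2, 60000 1}
```

Note that alice is counted once per minute even though she appears several times, which is exactly the difference between this chart and the raw hits chart from the previous post.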
You can see the full source code listing here.