-
An interesting collection of design patterns for Web UIs.
-
So basically, it's a level of indirection over just checking the dependencies in the first place.
-
A Clojure wrapper over JSch
-
Several kinds of awesome
-
A command-line interface for the cache-header checker redbot. Very cool.
-
The Resource Expert Droid is an online app which validates HTTP cache headers and gives advice on them. Mundo coolio.
-
Video lectures from the SICP authors
-
In a similar vein to my recent Clojure & Incanter work, albeit with slightly more important data
-
Fantastic pictorial representation of a PhD
In a previous post, I showed how we could use Clojure and specifically Incanter to process access logs to graph hits on our site. Now, we’re going to adapt our solution to show the number of unique users over time.
We’re going to change the previous solution so that it pulls the core dataset – the raw data we’re interested in – out of the access log. records-from-access-log
remains unchanged from before:
[clojure]
(defn access-log-to-dataset
[filename]
(col-names (to-dataset (records-from-access-log filename)) ["Date" "User"]))
[/clojure]
The raw dataset retrieved from this call looks like this:
| Date | User |
|---|---|
| 11/Aug/2010:00:00:30 +0100 | Bob |
| 11/Aug/2010:00:00:31 +0100 | Frank |
| 11/Aug/2010:00:00:34 +0100 | Frank |
Now, we need to work out the number of unique users in a given time period. Like before, we’re going to use $rollup
to group multiple records by minute, but we need to work out how to summarise the user column. To do this, we create a custom summarise function which calculates the number of unique users:
[clojure]
(defn num-unique-items [coll] (count (set coll)))
[/clojure]
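A quick REPL check (user names borrowed from the sample data above; the function is repeated so the snippet stands alone):

```clojure
;; Duplicates collapse into the set, so only distinct values are counted.
(defn num-unique-items [coll] (count (set coll)))

(num-unique-items ["Bob" "Frank" "Frank"]) ; => 2
(num-unique-items [])                      ; => 0
```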
Then use that to modify the raw dataset and graph the resulting dataset:
[clojure]
(defn access-log-to-unique-user-dataset
  [access-log-dataset]
  ($rollup num-unique-items "User" "Date"
           (col-names (conj-cols ($map #(round-ms-down-to-nearest-min (as-millis %)) "Date" access-log-dataset)
                                 ($ "User" access-log-dataset))
                      ["Date" "User"])))

(defn concurrent-users-graph
  [dataset]
  (time-series-plot :Date :User
                    :x-label "Date"
                    :y-label "Unique Users"
                    :title "Users Per Min"
                    :data (access-log-to-unique-user-dataset dataset)))

(def access-log-dataset (access-log-to-dataset "/path/to/access.log"))
(save (concurrent-users-graph access-log-dataset) "unique-users.png")
[/clojure]
You can see the full source code listing here.
Graphing Hits Per Minute Using Clojure & Incanter
Continuing a recurring series of posts showing off my limited understanding of Clojure, today we’re using Clojure for log processing. This example is culled from some work I’m doing right now in the day job – we needed to extract usage information to better understand how the system is performing.
The Problem
We have an Apache-style access log showing hits on our site. We want to process it to extract information like peak hits per minute, and perhaps eventually more detailed data like the nature of each request, response times, etc.
The log looks like this:
43.124.137.100 - username 05/Aug/2010:17:27:24 +0100 "GET /some/url HTTP/1.1" 200 24 "http://some.refering.domain/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.9) Gecko/2009040821 Firefox/3.0.9 (.NET CLR 3.5.30729)"
Extracting The Information
We want to use Incanter to help us process the data & graph it. Incanter likes its data as a sequence of sequences – so that’s what we’ll create.
First up – processing a single line. I TDD’d this solution, but have excluded the tests from the source listing for brevity.
[clojure]
user=> (use 'clojure.contrib.str-utils)
nil
user=> (use '[clojure.contrib.duck-streams :only (read-lines)])
nil
user=> (defn extract-records-from-line
         [line-from-access-log]
         (let [[_ ip username date] (re-find #"^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (\w+) (.+? .+?) " line-from-access-log)]
           [date username]))
#'user/extract-records-from-line
user=> (defn as-dataseries
         [access-log-lines]
         (map extract-records-from-line access-log-lines))
#'user/as-dataseries
user=> (defn records-from-access-log
         [filename]
         (as-dataseries (read-lines filename)))
#'user/records-from-access-log
[/clojure]
A few things to note. extract-records-from-line
is matching more than strictly needed – I just wanted to indicate the use of destructuring for matching parts of the log line. I’m pulling in the username & date – the username is not strictly needed for what follows. Secondly, note the use of read-lines
from clojure.contrib.duck-streams
– rather than slurp
, read-lines
is lazy. We’ll have to process the whole file at some point, but it’s a good idea to look to use lazy functions where possible.
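As a toy illustration of why that matters – plain Clojure, with an infinite sequence standing in for a huge log and a trivial stand-in parser, none of it the real access-log code: because both the reading and the mapping are lazy, taking a few records never realises the whole input.

```clojure
;; Stand-ins: an infinite lazy "log" and a toy parser (line length).
(def huge-log (repeat "a line"))
(defn parse-line [line] (count line))

;; map is lazy, so this terminates despite the infinite input:
(take 3 (map parse-line huge-log)) ; => (6 6 6)
```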
At this point, running records-from-access-log
gives us our sequence of sequences – next up, pulling it into Incanter.
Getting The Data Into Incanter
We can check that our code is working properly by firing up Incanter. Given a sample log:
56.24.137.230 - fred 05/Aug/2010:17:27:24 +0100 "GET /some/url HTTP/1.1" 200 24 "http://some.refering.domain/" "SomeUserAgent"
12.14.137.140 - norman 05/Aug/2010:17:27:24 +0100 "GET /some/url HTTP/1.1" 200 24 "http://some.refering.domain/" "SomeUserAgent"
42.1.137.110 - bob 05/Aug/2010:17:28:24 +0100 "GET /some/url HTTP/1.1" 200 24 "http://some.refering.domain/" "SomeUserAgent"
143.124.1.50 - clare 05/Aug/2010:17:29:24 +0100 "GET /some/url HTTP/1.1" 200 24 "http://some.refering.domain/" "SomeUserAgent"
Let’s create a dataset from it, and view the resulting records:
[clojure]
user=> (use 'incanter.core)
nil
user=> (def access-log-dataset (to-dataset (records-from-access-log "/path/to/example-access.log")))
#'user/access-log-dataset
user=> (view access-log-dataset)
[/clojure]
The result of the view
command:
Unfortunately, no column names – but that is easy to fix using col-names
:
[clojure]
user=> (def access-log-dataset
         (col-names (to-dataset (records-from-access-log "/path/to/example-access.log")) ["Date" "User"]))
#'user/access-log-dataset
user=> (view access-log-dataset)
[/clojure]
At this point you can see that it would be easy for us to pull in the URL, response code or other data rather than the username from the log – all we’d need to do is change extract-records-from-line
and update the column names.
Graphing The Data
To graph the data, we need to get Incanter to register the date column as what it is – time. Currently it is in string format, so we need to fix that. Culling the basics from Jake McCray’s post, here is what I ended up with (note the use of Joda-Time for date handling – you could use the standard SimpleDateFormat if you preferred):
[clojure]
user=> (import 'org.joda.time.format.DateTimeFormat)
nil
user=> (defn as-millis
         [date-as-str]
         (.getMillis (.parseDateTime (DateTimeFormat/forPattern "dd/MMM/yyyy:HH:mm:ss Z") date-as-str)))
#'user/as-millis
user=> (defn access-log-to-dataset
         [filename]
         (let [unmod-dataset (col-names (to-dataset (records-from-access-log filename)) ["Date" "User"])]
           (col-names (conj-cols ($map as-millis "Date" unmod-dataset) ($ "User" unmod-dataset))
                      ["Date Time In Ms" "User"])))
#'user/access-log-to-dataset
[/clojure]
While the date parsing should be pretty straightforward to understand, there are a few interesting things going on with the Incanter code that we should dwell on briefly.
The $
function extracts a named column, whereas the $map
function runs another function over the named column from the dataset, returning the modified column (pretty familiar if you’ve used map
). conj-cols
then takes these resulting sequences to create our final dataset.
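If these are new to you, the plain-Clojure analogues may help – the following is a sketch on ordinary maps, not Incanter's API: $ is like pulling one field out of every row, $map is map over that field, and conj-cols zips the columns back into rows.

```clojure
;; Rows as plain maps, standing in for an Incanter dataset (illustration only).
(def rows [{:date "00:00:30" :user "Bob"}
           {:date "00:00:31" :user "Frank"}])

;; $-like: extract one column.
(map :user rows) ; => ("Bob" "Frank")

;; $map-like: run a function over one column.
(map #(str "user:" (:user %)) rows) ; => ("user:Bob" "user:Frank")

;; conj-cols-like: zip columns back together into rows.
(map vector
     (map :date rows)
     (map :user rows)) ; => (["00:00:30" "Bob"] ["00:00:31" "Frank"])
```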
We’re not quite done yet though. We have our time-series records – representing one hit on our webserver – but don’t actually have values to graph. We also need to work out how we group hits to the nearest minute. What we’re going to do is replace our as-millis
function with one that rounds down to the nearest minute. Then, we’re going to use Incanter to group those rows together – summing the hits it finds per minute. But before that, we need to tell Incanter that each row represents a hit, by adding a ‘Hits’ column. We’re also going to ditch the user column, as it isn’t going to help us here:
[clojure]
user=> (defn access-log-to-dataset
         [filename]
         (let [unmod-dataset (col-names (to-dataset (records-from-access-log filename)) ["Date" "User"])]
           (col-names (conj-cols ($map as-millis "Date" unmod-dataset) (repeat 1))
                      ["Date" "Hits"])))
#'user/access-log-to-dataset
[/clojure]
Next, we need to create a new function to round our date & time to the nearest minute.
Update: The earlier version of this post screwed up, and the presented round-ms-down-to-nearest-min
actually rounded to the nearest second. This is a corrected version:
[clojure]
(defn round-ms-down-to-nearest-min [millis] (* 60000 (quot millis 60000)))
[/clojure]
If you wanted hits per second, here is the function:
[clojure]
(defn round-ms-down-to-nearest-sec [millis] (* 1000 (quot millis 1000)))
[/clojure]
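A couple of spot checks on the minute rounding (the function is repeated so the snippet stands alone; all times are in milliseconds):

```clojure
(defn round-ms-down-to-nearest-min [millis] (* 60000 (quot millis 60000)))

(round-ms-down-to-nearest-min 125000) ; => 120000 (2 min 5 s floors to 2 min)
(round-ms-down-to-nearest-min 120000) ; => 120000 (exact minutes are unchanged)
(round-ms-down-to-nearest-min 59999)  ; => 0      (under a minute floors to zero)
```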
And one more tweak to access-log-to-dataset
to use the new function:
[clojure]
(defn access-log-to-dataset
  [filename]
  (let [unmod-dataset (col-names (to-dataset (records-from-access-log filename)) ["Date" "User"])]
    (col-names (conj-cols ($map #(round-ms-down-to-nearest-min (as-millis %)) "Date" unmod-dataset)
                          (repeat 1))
               ["Date" "Hits"])))
[/clojure]
Finally, we need to roll our data up, summing the hits per minute – all this done using $rollup
:
[clojure]
(defn access-log-to-dataset
  [filename]
  (let [unmod-dataset (col-names (to-dataset (records-from-access-log filename)) ["Date" "User"])]
    ($rollup :sum "Hits" "Date"
             (col-names (conj-cols ($map #(round-ms-down-to-nearest-min (as-millis %)) "Date" unmod-dataset)
                                   (repeat 1))
                        ["Date" "Hits"]))))
[/clojure]
$rollup applies a summary function to a given column (in our case “Hits”), grouping the rows by another column (“Date” in our case). :sum here is a built-in Incanter summary function, but we could provide our own.
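Conceptually, this rollup is just a group-by followed by a summary over each group. Here is a plain-Clojure sketch of the :sum case – an illustration of the idea, not Incanter's actual implementation:

```clojure
;; Group rows by one key, then sum another key within each group.
(defn rollup-sum [rows group-key sum-key]
  (into {}
        (for [[k group] (group-by group-key rows)]
          [k (reduce + (map sum-key group))])))

;; Three hits: two in the same minute, one in the next.
(rollup-sum [{:date 0 :hits 1} {:date 0 :hits 1} {:date 60000 :hits 1}]
            :date :hits)
;; => {0 2, 60000 1}
```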
Now we have our dataset, let’s graph it:
[clojure]
user=> (defn hit-graph
         [dataset]
         (time-series-plot :Date :Hits :x-label "Date" :y-label "Hits" :title "Hits" :data dataset))
#'user/hit-graph
user=> (view (hit-graph (access-log-to-dataset "/path/to/example-access.log")))
[/clojure]
This is deeply unexciting – what if we try a bigger dataset? Then we get things like this:
Conclusion
You can grab the final code here.
Incanter is much more than simply a way of graphing data. This (somewhat) brief example shows you how to get log data into an Incanter-friendly format – what you want to do with it then is up to you. I may well explore other aspects of Incanter in further posts.