Reverse-proxying ntop: A Tragedy in 4 Acts

Prolog

If you install a lot of open source software, as I had to do recently, you sometimes wonder what the developers were thinking about architecture. No, not the architecture of their own products; I have stopped wondering about that. I mean how individual pieces are supposed to fit into an existing or evolving landscape, i.e. how they coexist or even interact with other software.

Yes, dear open source developer, I sometimes think: the universe is bigger than a single planet, and there is life out there. Some packages play very nicely with others (more about this later), but there are also violent offenders.

Which brings me directly to ntop, the traffic analysis and statistics tool.

It has existed for quite some time, and I have used it for years on various occasions. Purists may point out that the mere fact that it comes with a web interface disqualifies it out of hand. I tend to disagree, although I admit that the hard-coded CSS and HTML make it uncustomizable. Worse, the JavaScript menus add nothing, absolutely, definitively and positively nothing, in terms of functionality, let alone usability.

I am much more concerned when a tool ships with its own web server. For an architect this creates the additional headache of where to put that unwanted piece. Fortunately, because it is in ntop's nature to listen on a whole subnet, this is less of an issue. Still, if you are concerned about mundane little things like security, or the practicality of a single shiny enforcement point, then such an intrusive attitude presents itself as a problem.

Act I

So let's try to reverse-proxy ntop to hide all its ugliness behind our main Apache. Naively enough, we start with

ProxyPass        /admin/network/   http://127.0.0.1:3000/
ProxyPassReverse /admin/network/   http://127.0.0.1:3000/

This assumes that ntop runs on the same machine as our Apache server. While ntop's promiscuous daemon is hogging all interfaces of that box, we have restricted ntop's web server to run only on the loopback interface. By convention, ntop uses port 3000.

The ProxyPass directive tells Apache to forward all requests for /admin/network/... to http://127.0.0.1:3000/... on the backend. It is actually quite clever, especially together with ProxyPassReverse, which rewrites URIs in the reverse direction; that is, the URIs visible to Apache in the response headers, such as the Location of a redirect. The directives themselves live in mod_proxy; to reach an http:// backend you also need mod_proxy_http:

apt-get install libapache2-mod-proxy-http

In an ideal world that would be it, and there are plenty of tools (SquirrelMail and TWiki, to name two) that are handled by the above, perhaps after tweaking a BaseURL configuration variable somewhere.
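To make the two directions concrete, here is a toy model in Python of what the two directives above do with a URL. It is only a sketch under my own assumptions: the names FRONT_PREFIX, BACKEND_BASE, proxy_pass and proxy_pass_reverse are made up for illustration, and the real ProxyPassReverse builds an absolute URL for the client where this sketch only returns the path.

```python
# Toy model of the ProxyPass / ProxyPassReverse pair from this post.
# All names below are invented for this sketch; this is not Apache code.
FRONT_PREFIX = "/admin/network/"
BACKEND_BASE = "http://127.0.0.1:3000/"


def proxy_pass(request_path):
    """Map a public request path to the backend URL, or None if not proxied."""
    if not request_path.startswith(FRONT_PREFIX):
        return None
    return BACKEND_BASE + request_path[len(FRONT_PREFIX):]


def proxy_pass_reverse(location_header):
    """Map a backend URL from a redirect header back under the public prefix."""
    if not location_header.startswith(BACKEND_BASE):
        return location_header
    return FRONT_PREFIX + location_header[len(BACKEND_BASE):]


if __name__ == "__main__":
    print(proxy_pass("/admin/network/index.html"))
    print(proxy_pass_reverse("http://127.0.0.1:3000/index.html"))
    # A root-relative path is simply not ours to forward.
    print(proxy_pass("/index.html"))
```

Note what is missing: neither function ever looks at the document body. A link written as /index.html inside the HTML passes through untouched, and the browser will later request it outside /admin/network/.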

Act II

But not ntop. It stubbornly insists on serving all documents underneath the root / and on shipping them to the client as such. And since most (but not all) of the HTML is hardcoded, there is no easy way around it. To cope with this problem, we engage mod_proxy_html as a shiny knight. It is also part of the Debian distribution:

apt-get install libapache2-mod-proxy-html

It does the otherwise unthinkable: it rewrites all documents on the fly while they are proxied back to the user. How and when is determined by rules that match a string, or nowadays a regular expression, against the HTML text and substitute a replacement for each match, such as in

<Location /admin/network/>
  SetOutputFilter     proxy-html
  ProxyHTMLLogVerbose On
  ProxyHTMLExtended   On
  ProxyHTMLBufSize    16384

  ProxyHTMLURLMap     /                            /admin/network/
  ProxyHTMLURLMap     /admin/network/plugins/ntop/ /admin/network/plugins/
  ProxyHTMLURLMap     url\("/    url\("/admin/network/   Re

  # otherwise the backend may reply with a compressed body => big mess
  RequestHeader    unset  Accept-Encoding
</Location>

Once the module is enabled in Apache, it will replace all URLs inside the HTML code according to the above mappings. A leading / in a URL is replaced with /admin/network/, which is what we want the client to receive and subsequently request.

If you have seen HTML code before (did I mention that I am strictly against the death penalty for HTML developers?), then you know that this is not exactly bulletproof: there are just too many ways to obscure a URL in HTML.
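A small Python sketch shows both the idea and the hole in it. It is a toy under my own assumptions: mod_proxy_html parses the HTML properly instead of running a regular expression over raw text, and the names PREFIX, LINK_ATTR and rewrite_links are invented for this illustration.

```python
import re

PREFIX = "/admin/network/"

# Root-relative URLs in a few common link attributes. The (?!/) keeps
# protocol-relative URLs such as //example.org/ out of the match.
LINK_ATTR = re.compile(r'\b(href|src|action)="/(?!/)')


def rewrite_links(html):
    """Prefix root-relative link attributes with PREFIX (toy version)."""
    return LINK_ATTR.sub(lambda m: m.group(1) + '="' + PREFIX, html)


if __name__ == "__main__":
    plain = '<a href="/index.html">home</a>'
    sneaky = "<script>location = '/' + page + '.html';</script>"
    print(rewrite_links(plain))    # the link attribute gets the prefix
    print(rewrite_links(sneaky))   # the URL assembled in script is untouched
```

The first document comes out the way we want it. The second one builds its URL at run time in the browser, so there is nothing for a rewriting rule to match.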

Act III

Enter the über-villain: JavaScript. The HTML proxy module can be coerced into also dealing with scripts and stylesheets embedded in the HTML with

ProxyHTMLExtended On

but here you are skating on thin ice again. And you end up plugging holes, as I did with the last ProxyHTMLURLMap rule, which I introduced to clean up url("/...") references inside CSS. Not that I can swear that it covers all cases, but, seriously, how can one ultimately know?

If you have followed along so far, you will be rewarded with a completely unreadable HTML document. The reason lies in the fact that modern browsers send an Accept-Encoding header with their requests, offering to take the response in a compressed form such as gzip or deflate. This is about how the body is packed for transport, not about how characters are represented as bits and bytes. If the backend takes up the offer and the document is not shipped in clear text (read: the good old plain-text way of life), then rewriting it is out of reach for mod_proxy_html.

Or whatever. The point is that you will want to strip that offer from the request before it reaches ntop. Hence the cryptic

RequestHeader    unset  Accept-Encoding

line above. Ah, yes, and this means one more Apache module to enable. Whoopee. mod_headers is your friend here.
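Accept-Encoding negotiates content codings such as gzip, and a compressed body is just bytes to anything that expects text. The following Python sketch illustrates why a rewriter has nothing to hold on to. It assumes that the backend answers with gzip when the browser offers it; the rewrite function is a deliberately naive stand-in of my own, not what mod_proxy_html does internally.

```python
import gzip

PREFIX = b"/admin/network/"


def rewrite(body):
    """Naive stand-in for the URL mapping: prefix root-relative hrefs."""
    return body.replace(b'href="/', b'href="' + PREFIX)


# A page with twenty identical links, and its gzip-compressed form,
# i.e. what a backend may send once the client has offered gzip.
page = b'<a href="/index.html">home</a>\n' * 20
compressed = gzip.compress(page)

if __name__ == "__main__":
    # Has the rewriter changed anything in the compressed bytes?
    print(rewrite(compressed) != compressed)
    # The uncompressed body, in contrast, is rewritten as intended.
    print(rewrite(page).splitlines()[0])
    # Decompressing first would work, but somebody has to do it.
    print(rewrite(gzip.decompress(compressed)) == rewrite(page))
```

Unsetting the header on the way in sidesteps all of this: without the offer, the backend has no reason to compress.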

Act IV

So you think we have it now? Not quite, at least if you care to follow the error_log. The browser still requests some inline images directly under /, where our Apache cannot find them, so you have to shoehorn the setup a bit more:

RewriteCond %{HTTP_REFERER} /admin/network/
RewriteRule    ^/(spacer\.gif)$            http://127.0.0.1:3000/$1         [P]
RewriteCond %{HTTP_REFERER} /admin/network/
RewriteRule    ^/(arrow\.gif)$             http://127.0.0.1:3000/$1         [P]
RewriteCond %{HTTP_REFERER} /admin/network/
RewriteRule    ^/(blank\.gif)$             http://127.0.0.1:3000/$1         [P]

This is where yet another module, the infamously famous mod_rewrite, plays another knight, or one or all of the apocalyptic riders, YMMV. The RewriteCond before each rule just makes sure that the rule only applies in the context of a page underneath /admin/network/, and not to some other pages you incidentally might have on your web site.
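The decision these rules encode can be sketched in a few lines of Python. Again, this is only an illustration under my own assumptions: the names IMAGE_RULE, REFERER_COND and proxy_target are hypothetical, the three patterns are folded into one, and the dot is escaped because a bare . in a regular expression matches any character.

```python
import re

# Hypothetical names; the patterns mirror the three rules above.
IMAGE_RULE = re.compile(r"^/(spacer|arrow|blank)\.gif$")
REFERER_COND = re.compile(r"/admin/network/")
BACKEND = "http://127.0.0.1:3000/"


def proxy_target(path, referer):
    """Return the backend URL for a stray image request, or None."""
    match = IMAGE_RULE.match(path)
    if match is None:
        return None   # the rule pattern does not match this path
    if referer is None or not REFERER_COND.search(referer):
        return None   # the condition on the Referer header fails
    return BACKEND + match.group(1) + ".gif"


if __name__ == "__main__":
    print(proxy_target("/spacer.gif", "http://example.org/admin/network/index.html"))
    print(proxy_target("/spacer.gif", "http://example.org/other/page.html"))
    print(proxy_target("/logo.gif", "http://example.org/admin/network/"))
```

The weak spot is visible in the signature: everything hinges on the Referer header, which a browser is free to omit.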

Epilog

So, consider this: we needed mod_proxy, mod_proxy_http, mod_proxy_html, mod_headers and mod_rewrite. Yes, I am concerned about the memory consumption. Always. And, yes, I am also concerned about the CPU usage, especially if this web site is actually used by a number of people. And last, but not least, I am concerned about the complexity of the setup. Unnecessary complexity, I claim.

There are more useful hints at https://svn.ntop.org/trac/wiki/ntop and http://jamespo.org.uk/wp/?p=52, and also inside ntop's FAQ. There I also found some gems, specifically some about user management and security:

Q. How good is the default security ntop provides through the web server.
A. Good question...
Q. The plugins aren't very secure.
A. True.

Priceless.
