HTTP::Request & Friends Too Memory-Happy

Now that my maps and all the supporting vector space data are getting larger, I have noticed that HTTP::Request (as well as HTTP::Response) is not overly economical when it comes to memory usage.

The Problem Definition

Let us look at the following code

use constant B_1K => 1024;
use constant B_1M => B_1K * B_1K;

my $fatty = 'A' x 1 x B_1K;

use HTTP::Request::Common qw/PUT/;
my $req = PUT '/mapreduce/' ,
              'Content-Type' => 'text/x-astma',
              Content => $fatty;

and at the RSS (resident set size, in MB) afterwards. On my box I get

after PUT: 5.08984375 at test.pl line 9.
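
The measuring code itself is not shown above. A minimal sketch of how such a figure could be obtained, assuming a Linux box where /proc/self/status carries a VmRSS line in kB; rss_mb and parse_rss_mb are hypothetical names and not necessarily what produced the numbers here:

```perl
use strict;
use warnings;

# Hypothetical helper: extract the resident set size in MB from the text
# of /proc/<pid>/status (Linux only; the VmRSS line is reported in kB).
sub parse_rss_mb {
    my ($status) = @_;
    return unless defined $status;
    return $status =~ /^VmRSS:\s+(\d+)\s+kB/m ? $1 / 1024 : undef;
}

# Hypothetical helper: RSS of the current process, or undef where
# /proc is not available.
sub rss_mb {
    open my $fh, '<', '/proc/self/status' or return;
    local $/;                               # slurp the whole file
    return parse_rss_mb(scalar <$fh>);
}

my $rss = rss_mb();
warn 'after PUT: ' . (defined $rss ? $rss : 'n/a');
```

Calling warn without a trailing newline is what appends the "at test.pl line ..." suffix seen in the output.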

If I set

my $fatty = 'A' x 1 x B_1M;

then

after PUT: 9.1015625 at test.pl line 9.

About 4 MB more for 1 MB of content, as if the content were copied 3 times on top of the original. And I have not even sent this to the server.

Let's try 10MB:

after PUT: 45.11328125 at test.pl line 9.

That is 40 MB over the baseline for 10 MB of content, so a factor of 3 on top of the original sounds about right.

The Confusion

I thought "Maybe I am just using it wrong":

my $req = PUT '/mapreduce/' ,
              'Content-Type' => 'text/x-astma';
$req->content ($fatty);

Same thing. What about passing in a reference:

use HTTP::Request::Common qw/PUT/;
my $req = PUT '/mapreduce/' ,
              'Content-Type' => 'text/x-astma';
$req->content_ref (\$fatty);

Much better, but there is still at least one additional internal copy. And if I ship content to localhost and back, I still pay a penalty of a factor of 4.

Of course I looked inside the HTTP::* code, but it is too mature to be easily fixed for memory issues. And it would be much too core-y for me, anyway.

Solution 1: Chunked Transfer

Reading the small print - this time in LWP::UserAgent - I came across the following gem:

You are allowed to use a CODE reference as "content" in the request object passed in. The "content" function should return the content when called. The content can be returned in chunks. The content function will be invoked repeatedly until it return an empty string to signal that there is no more content.

That looks like this:

my $req = PUT '/mapreduce/' ,
              'Content-Type' => 'text/x-astma';

use constant N    => 3;

my @data = ('1' x B_1M,
            '2' x B_1M,
            '3' x B_1M);

my $n = 0;
$req->content ( sub {
    # hand out one chunk per call; an empty string signals the end
    return  $n < N
                ? $data[$n++]
                : ''; } );

Chunked transfers are indeed supported by LWP::UserAgent, but the request MUST NOT have any content defined beforehand.
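
The same callback mechanism can feed a request straight from a file, so that only one chunk at a time has to live in memory. A minimal sketch; file_chunker is a hypothetical helper, and the path and the 64 KB chunk size in the usage line are arbitrary assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper: build a content callback that reads $path in
# pieces of at most $size bytes and returns '' once the file is exhausted.
sub file_chunker {
    my ($path, $size) = @_;
    open my $fh, '<:raw', $path or die "unable to open $path ($!)";
    return sub {
        my $n = read($fh, my $buf, $size);
        die "read failed on $path ($!)" unless defined $n;
        return $n ? $buf : '';              # '' tells LWP there is no more content
    };
}

# Usage with a request object as above:
# $req->content( file_chunker('/path/to/payload', 64 * 1024) );
```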

While this improves the memory situation somewhat, it has its drawbacks:

  • You have to train your HTTP/REST server to handle this on the incoming side. (I have done that.)
  • It is pretty inconvenient to use, especially if you do a lot with GET/PUT/POST, as is natural in REST programming. A subclass could solve this.
  • It does not handle similar size issues with HTTP::Response.

Bummer.

Solution 2: HTTP::Request::Mmap

Instead, I created a subclass of HTTP::Request and utilized Sys::Mmap. The central part is to redefine the content method:

use Sys::Mmap;

sub content {
    my $self = shift;
    if (my $fn = shift) {                    # setter: map the named file
        open (my $fh, '<', $fn)  or die "unable to open $fn ($!)";

        $self->{_fh} = $fh;                  # keep the handle around
        $self->{_fn} = $fn;                  # remember the file name
        my $c;
        mmap ($c, 0, PROT_READ, MAP_SHARED, $fh)   # length 0: map the whole file
                    or die "unable to mmap ($!)";
        $self->{_mmapped} = \$c;
    } else {                                 # getter: hand back the mapped content
        return ${ $self->{_mmapped} };
    }
}

The method now expects a file name instead of a scalar; the file is then mmapped into a scalar. That way I can even be ruthless at times and add this to an existing request object:

my $req = PUT '/mapreduce/' ,
              'Content-Type' => 'text/x-astma';
use HTTP::Request::Mmap;
bless $req, 'HTTP::Request::Mmap';  # re-blessing
$req->content ($fatty_on_files);

I like being brutal.

On the way back from the server to the client, the memory problem can be handled the same way:

use HTTP::Response::Mmap;
return HTTP::Response::Mmap->new ( HTTP_OK,
                                  "here we go",
                                  [ 'Content-Type', '....' ],
                                  $filepath );

What Gives?

The nice thing is that the environment is blissfully unaware of the mapping. And especially on the server (where Perl memory usage never goes down), an economical usage profile makes me happy.

Once I have completely disentangled it from my homebaked REST server, I can publish it onto CPAN. Unless someone has been there before me, of course.
