Posted on 2005-08-14 04:55:35-07 by bblakley
Suggestion for mutator to convert unknown schemes

First I'll set the stage so you know where I'm coming from.

I wrote a specialized proxy server that uses HTTP::Proxy as its core, with other features such as proxy-side cookie handling, authentication, and domain redirection built in. The program is aimed primarily at wireless Internet users (such as cell phone users) who want to run their own proxy server to provide Internet access for their device, instead of paying their carrier $5 a month to use the carrier's proxy.

These wireless users have some special needs, not least that most cell phone Internet browsers don't support cookies (they simply ignore them). So my proxy server grabs Set-Cookie headers, stores them on a per-user basis, and then adds the appropriate Cookie headers to requests whenever the user requests a page from a host for which cookies have been stored. Another special need is that most carriers try to make it difficult for users to run their own proxy by hard-coding the browser's home page to something impossible like "http://homepage". My proxy server identifies those "local domain" requests (anything without a TLD) and redirects the browser to a homepage of the user's choice.

However, it has recently come to light that one major carrier locks the phone's homepage to "proxy:homepage". This is insidious because most proxy servers (HTTP::Proxy included) treat "proxy:" as specifying a protocol/scheme (like http, https, etc.), so HTTP::Proxy returns a 501 error when a request for "proxy:homepage" comes through.
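To see why, note that anything of the form name:rest is a syntactically valid URI whose scheme is "name", so "proxy:homepage" has scheme "proxy" and no host at all. Below is a minimal core-Perl sketch of that parsing rule, assuming the RFC 3986 scheme syntax (a letter followed by letters, digits, "+", "-" or "."); the helper uri_scheme is hypothetical and not part of HTTP::Proxy or the URI module.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lower-cased scheme of a URI string, or
# undef when there is none. Assumes RFC 3986 scheme syntax: a letter, then
# letters, digits, "+", "-" or ".", up to the first ":".
sub uri_scheme {
    my ($uri) = @_;
    return $uri =~ /^([A-Za-z][A-Za-z0-9+.\-]*):/ ? lc $1 : undef;
}

for my $u ( 'http://homepage', 'proxy:homepage', 'homepage' ) {
    my $scheme = uri_scheme($u);
    printf "%-16s scheme=%s\n", $u, defined $scheme ? $scheme : '(none)';
}
```

Under this rule "http://homepage" has scheme http, while "proxy:homepage" has scheme proxy, which is exactly what the scheme check in HTTP::Proxy rejects.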

I was able to fix this with a change to HTTP::Proxy. Line 328 of Proxy.pm in version 0.15 contains this code:

    # can we serve this protocol?
    if ( !$self->agent->is_protocol_supported( my $s = $req->uri->scheme ) )
    {
        # should this be 400 Bad Request?
        $response = HTTP::Response->new( 501, 'Not Implemented' );
        $response->content_type( "text/plain" );
        $response->content("Scheme $s is not supported by this proxy.");
        $self->response($response);
        goto SEND;
    }

I commented out that code and replaced it with this:

    if ( !$self->agent->is_protocol_supported( $req->uri->scheme ) ) {
        $req->uri->scheme('http');
    }

In other words: if a request comes in for a scheme/protocol HTTP::Proxy doesn't know about, change the protocol to 'http'. The request then passes through to my request header filter, which identifies it as a local domain request: the host is blank, so there is no TLD, and a missing TLD is exactly what the filter looks for. Since my proxy redirects all local domain requests to the user's chosen homepage, this cleanly works around the "proxy:homepage" problem.
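The "local domain" test can be sketched in a few lines of core Perl. This assumes "no TLD" simply means the host is empty or contains no dot; the helpers is_local_domain and redirect_target and the per-user homepage table are hypothetical illustrations, not HoTTProxy's actual code.

```perl
use strict;
use warnings;

# Hypothetical per-user homepage table (illustration only).
my %homepage_for = ( alice => 'http://www.example.com/' );

# Treat a host as a "local domain" when it is empty/undefined or has no
# dot, i.e. there is no TLD (assumption about what "no TLD" means).
sub is_local_domain {
    my ($host) = @_;
    return 1 if !defined($host) || $host eq '';
    return $host =~ /\./ ? 0 : 1;
}

# Return the URL to redirect a local-domain request to, or undef to let
# the request through unchanged.
sub redirect_target {
    my ( $user, $host ) = @_;
    return is_local_domain($host) ? $homepage_for{$user} : undef;
}

print redirect_target( 'alice', 'homepage' ), "\n";
```

Treating an empty host as local matters here: once "proxy:homepage" is rewritten to the http scheme it still has no host part, so it falls into the same redirect path as "http://homepage".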

I'm not saying you'd want HTTP::Proxy to work this way by default. What would be nice however is something like an 'overrideUnknownSchemes' mutator that if set to a true value would cause HTTP::Proxy to behave the way my new code illustrates. This would save me from having to run a proprietary (modified) version of HTTP::Proxy (which is something I'd rather not do).

I realize this is a little esoteric and might not matter to anyone but me, but it is a simple change and you never know who might benefit from it in some other oddball scenario.

Posted on 2005-08-14 07:56:45-07 by book in response to 881
Re: Suggestion for mutator to convert unknown schemes

Do you plan to release your code on CPAN? (Just curious)

If we want this to be as general as possible, I could add a specialised hook for unknown schemes that returns false if the protocol is really not supported, and another URL to try otherwise.

In your case, the callback would simply return "http://homepage".

What do you think of this? It should work for you and possibly for future users having even more esoteric needs than you. :-)

Posted on 2005-08-14 08:17:45-07 by bblakley in response to 883
Re: Suggestion for mutator to convert unknown schemes

I have released the source code for my proxy server, but not on CPAN. It is available at www.HoTTProxy.org. I don't know if it's good enough for CPAN. :-) It started out just being for me; then I decided to release it for all to use, so I had to clean it up a bit and make it a little more user-friendly. I'm also working on a web-based administration console for it using the HTTP::Server::Simple::CGI module (which is also great).

When you say the hook would "return false" if the protocol is not supported, it sounds like the request would be short-circuited and wouldn't make it through to the filters (and for my purposes, it needs to). If we wanted to make the concept a little more generic, then instead of a true/false mutator that always maps unknown schemes to http, it might make more sense to have a mutator like:

mapUnknownProtocolTo => 'http'

so that each developer could choose, for their particular situation, what protocol to treat unsupported protocols as. This would provide maximum flexibility and allow the requests to pass through to the various filters, etc., just like any other request of that protocol.
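Here is a sketch of how the proposed mapUnknownProtocolTo option might decide what to do with a request, written as a standalone function. mapUnknownProtocolTo does not exist in HTTP::Proxy; the function effective_scheme and its arguments are hypothetical.

```perl
use strict;
use warnings;

# Sketch of the proposed (hypothetical) mapUnknownProtocolTo option.
# $scheme:    scheme of the incoming request
# $supported: array ref of schemes the proxy can serve
# $map_to:    configured replacement scheme, or undef if the option is unset
# Returns the scheme to continue with, or undef to answer 501 as today.
sub effective_scheme {
    my ( $scheme, $supported, $map_to ) = @_;
    my %ok = map { lc($_) => 1 } @$supported;
    return lc $scheme if $ok{ lc $scheme };    # known scheme: unchanged
    return undef unless defined $map_to;       # option unset: usual 501
    # only map to a scheme the proxy can actually serve
    return $ok{ lc $map_to } ? lc $map_to : undef;
}

my @served = qw(http https ftp);
print effective_scheme( 'proxy', \@served, 'http' ), "\n";
```

Refusing to map onto a scheme the proxy cannot serve keeps a misconfigured option from turning one unsupported scheme into another; leaving the option unset preserves the current 501 behaviour.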

What do you think?

Posted on 2005-08-14 08:59:44-07 by book in response to 884
Re: Suggestion for mutator to convert unknown schemes

The hook would be called only if it exists, otherwise the usual 501 response would be sent.

If the hook exists and returns false, the same old 501 response is sent. If it returns a true value, that value is taken to be a URL and processing restarts a few lines earlier (so if the protocol in the new URL is not supported, this should be caught as well... Mmm, wouldn't that create infinite loops?) OK, scratch that.

If the hook returns a true value, it's taken as a URL, and if its protocol is still not supported, the 501 is sent. Otherwise, processing restarts a few lines earlier (maybe at the start of the loop).
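The refined hook semantics can be sketched as a pure function. The hook's name and calling convention are hypothetical (HTTP::Proxy has no such hook); the point is that the URL returned by the hook is checked exactly once, which rules out the infinite loop mentioned above.

```perl
use strict;
use warnings;

# Sketch of the unknown-scheme hook semantics discussed above. The hook and
# its calling convention are hypothetical; HTTP::Proxy has no such hook.
# $is_supported: code ref, true if the proxy can serve a given scheme
# $hook:         code ref or undef; gets the URL, returns a new URL or false
# Returns the URL to restart processing with, or undef to send the 501.
sub resolve_unknown_scheme {
    my ( $url, $is_supported, $hook ) = @_;
    return undef unless $hook;                 # no hook: usual 501
    my $new = $hook->($url);
    return undef unless $new;                  # hook declined: usual 501
    my ($scheme) = $new =~ /^([A-Za-z][A-Za-z0-9+.\-]*):/;
    # The new URL is checked exactly once, so a hook that returns another
    # unsupported URL leads to a 501 instead of an infinite loop.
    return undef unless defined $scheme && $is_supported->( lc $scheme );
    return $new;
}

# Usage: send every 'proxy:' URL to a fixed homepage, as suggested earlier.
my %served       = map { $_ => 1 } qw(http https ftp);
my $is_supported = sub { $served{ $_[0] } };
my $hook         = sub { $_[0] =~ /^proxy:/i ? 'http://homepage' : 0 };
print resolve_unknown_scheme( 'proxy:homepage', $is_supported, $hook ), "\n";
```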

...

I think there's another way to do what you want. Enable protocol 'proxy' in your proxy's agent, and add a specialised filter for scheme "proxy". That should work, and has the bonus of me not writing a line of code. :-)

    # simply accept the 'proxy' scheme
    $proxy->agent->protocols_allowed(
        [ @{ $proxy->agent->protocols_allowed }, 'proxy' ] );

    # should be pushed very early (first?)
    $proxy->push_filter(
        scheme  => 'proxy',
        request => HTTP::Proxy::HeaderFilter::simple->new(
            sub {
                my ( $self, $headers, $message ) = @_;

                # ... compute the new $uri ...
                $message->uri( $uri );
            }
        ),
    );

This code is untested, but you get the idea.

Posted on 2005-08-14 10:22:19-07 by bblakley in response to 885
Re: Suggestion for mutator to convert unknown schemes

I like the concept of being able to add to the list of supported schemes without having to modify the Proxy.pm. Adding 'proxy' to the list of protocols allowed seems to work OK using this code:

    my $proxy = HTTP::Proxy->new( port => 8080, host => '', via => 'HoTTProxy' );
    $proxy->init();
    $proxy->agent->protocols_allowed(
        [ @{ $proxy->agent->protocols_allowed() }, 'proxy' ] );

I had to add the $proxy->init() because Proxy.pm was saying:

Can't call method "protocols_allowed" on an undefined value at C:\HoTTProxy\HoTTProxy.pl line 112.

if I didn't. However, even after getting it to take the additional "allowed" protocol, the filter is dying with this error, and I don't know why:

Unsupported scheme: proxy at C:\HoTTProxy\HoTTProxy.pl line 125

I tried reading the code around line 630 in Proxy.pm but can't see why it should think that scheme 'proxy' is unsupported once it has been added to protocols_allowed. Just to make sure 'proxy' really was in protocols_allowed, I even tried hard-coding it into the list at line 225 of Proxy.pm, and I still get the same "unsupported scheme" error.

Stumped for now. I think I should go to bed. It will be light soon.

Posted on 2005-08-14 11:24:07-07 by book in response to 886
Re: Suggestion for mutator to convert unknown schemes

Yes, I forgot to say that init() is required in my example. It creates the default user agent used by HTTP::Proxy.

As for the "Unsupported scheme" error, I've written the following test script (note the use of Devel::SimpleTrace, a very useful little module):

    use strict;
    use warnings;
    use HTTP::Proxy;
    use HTTP::Proxy::HeaderFilter::simple;

    my $proxy = HTTP::Proxy->new(@ARGV);
    $proxy->init();
    $proxy->agent->protocols_allowed(
        [ @{ $proxy->agent->protocols_allowed() }, 'proxy' ] );

    print "$_ => ", $proxy->agent->is_protocol_supported($_), "\n"
        for qw( http https ftp gopher wais plonk proxy );

    $proxy->push_filter(
        scheme  => 'proxy',
        request => HTTP::Proxy::HeaderFilter::simple->new(
            sub {
                my ( $self, $headers, $message ) = @_;
                $message->uri('http://www.perdu.com/');
            }
        ),
    );
    $proxy->start;

which dies with the same error, and a little more information:

    http => 1
    https => 1
    ftp => 1
    gopher => 1
    wais => 0
    plonk => 0
    proxy => 0
    Unsupported scheme: proxy at foo.pl line 19
     at Carp::croak(unknown source)
     at HTTP::Proxy::push_filter(lib/HTTP/Proxy.pm:622)
     at main::(foo.pl:19)

(don't look at the line numbers, I'm working with the CVS version of HTTP::Proxy)

The code that triggers the error is here:

    for (@scheme) {
        croak "Unsupported scheme: $_"
          if !$self->agent->is_protocol_supported($_);
    }

The is_protocol_supported() method of LWP::UserAgent not only checks the protocols_allowed() value, but also verifies that an implementor exists for the scheme. This is done by the implementor() method of LWP::Protocol.

Since I don't want to override LWP::Protocol::implementor, I guess I'll have to take another approach, probably checking whether the protocol is supported by looking directly into protocols_allowed.

Well, I've just added an is_protocol_supported method to HTTP::Proxy, which checks whether a scheme is supported directly by the proxy, even if LWP doesn't support it. It's in CVS. The above example works perfectly.
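The difference between the two checks can be modelled in core Perl. This is a simplified sketch, not the code in LWP or HTTP::Proxy: the %implementor hash stands in for LWP::Protocol::implementor, and lwp_supported and proxy_supported are hypothetical names.

```perl
use strict;
use warnings;

# Stand-in for LWP::Protocol::implementor (assumption of this sketch):
# schemes for which an implementation module is installed.
my %implementor = map { $_ => 1 } qw(http https ftp gopher);

# Simplified model of LWP::UserAgent::is_protocol_supported: when
# protocols_allowed is set the scheme must be listed there, and an
# implementor must exist for it.
sub lwp_supported {
    my ( $scheme, $allowed ) = @_;
    $scheme = lc $scheme;
    return 0 if $allowed && !grep { lc($_) eq $scheme } @$allowed;
    return $implementor{$scheme} ? 1 : 0;
}

# Proxy-level check in the spirit of the new HTTP::Proxy method: also
# accept a scheme with no implementor if it is explicitly allowed, since a
# filter is expected to rewrite such requests before they are fetched.
sub proxy_supported {
    my ( $scheme, $allowed ) = @_;
    return 1 if lwp_supported( $scheme, $allowed );
    return 0 unless $allowed;
    return ( grep { lc($_) eq lc($scheme) } @$allowed ) ? 1 : 0;
}

my $allowed = [qw(http https ftp gopher proxy)];
printf "%-6s lwp=%d proxy=%d\n", $_,
    lwp_supported( $_, $allowed ), proxy_supported( $_, $allowed )
    for qw(http wais plonk proxy);
```

With 'proxy' in the allowed list, lwp_supported reports 0 for it (no implementor) while proxy_supported reports 1, mirroring the behaviour described above.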

HTTP::Proxy 0.16 will be released before YAPC::Europe (Aug 30, 2005). If you heavily use HTTP::Proxy, I suggest that you subscribe to the mailing lists listed at http://http-proxy.mongueurs.net/ : the mailing list is the main point of contact for people using HTTP::Proxy. And you can even receive commit emails, if you want!

Posted on 2005-08-14 23:48:49-07 by bblakley in response to 887
Re: Suggestion for mutator to convert unknown schemes

Thanks, this sounds perfect. I have subscribed to the two mailing lists you referenced. I will also download the latest snapshot of the CVS repository and do some testing with it.

Thanks again!

Posted on 2005-08-15 01:14:54-07 by bblakley in response to 889
Re: Suggestion for mutator to convert unknown schemes

I downloaded and installed the CVS version of HTTP::Proxy and tested the changes. Everything works great! Thanks again!
