CSCI-4220 Network Programming

Project 2 - HTTP Proxy
Frequently Asked Questions


Q: The gethostbyname example code found in the lecture notes doesn't work for me!

A: The lecture notes were wrong. The following code does not work:

memcpy(&http_server_skaddr.sin_addr, h->h_addr_list, sizeof(struct in_addr));

The problem is that h->h_addr_list is the address of the array of address pointers itself, not the address stored in the first element of the array. You need this:

memcpy(&http_server_skaddr.sin_addr, h->h_addr_list[0], sizeof(struct in_addr));
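
For context, here is a minimal sketch of how the corrected memcpy might sit inside a small helper. The name fill_server_addr and its port parameter are hypothetical (they are not from the lecture notes); only the memcpy line comes from the answer above.

```c
#include <string.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Hypothetical helper: fill in *skaddr from a hostent such as the one
 * returned by gethostbyname(). Returns 0 on success, -1 if h is unusable. */
int fill_server_addr(struct sockaddr_in *skaddr, const struct hostent *h,
                     unsigned short port)
{
    if (h == NULL || h->h_addrtype != AF_INET || h->h_addr_list[0] == NULL)
        return -1;

    memset(skaddr, 0, sizeof(*skaddr));
    skaddr->sin_family = AF_INET;
    skaddr->sin_port = htons(port);
    /* h_addr_list[0] points at the first address, already in network
     * byte order, so it can be copied straight into sin_addr. */
    memcpy(&skaddr->sin_addr, h->h_addr_list[0], sizeof(struct in_addr));
    return 0;
}
```

A caller would pass in the result of gethostbyname, for example fill_server_addr(&http_server_skaddr, gethostbyname(hostname), port), and then hand the filled-in structure to connect().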

Q: How do I use ssh to do port forwarding so I can get past my firewall?

A: Below are instructions for both Tera Term (a Windows-based ssh client) and the Unix version of ssh. Both assume that your firewall will allow ssh traffic through!

  • Port forwarding using Tera Term for Windows.

    Establish an ssh connection to monica (log in to monica using Tera Term and make sure you select ssh and not telnet).
    Now select SSH Forwarding from the Setup menu.
    Click on the Add button.
    Now you need to enter the Forward local port number. This port number can be anything as long as the port is free. Something like 5432 worked for me.
    Now you need to specify the "to remote machine" hostname and port. You probably want to specify monica.cs.rpi.edu and whatever port number your proxy server is running on.

    Click OK and you should be all set with a "tunnel" to your proxy server. Now you can tell Netscape to use the proxy server running on your machine, port 5432 (or whatever port number you specified as the local port number).

    If you rebuild/rerun your proxy you just need to change the port numbers in the Tera Term forwarding dialog - you can probably keep the same local port number so you don't need to tell Netscape anything new.

  • Port forwarding using command line ssh from Unix.

    The version I'm using is ssh 2.0.12; the instructions may differ for other versions (man ssh for more info).

    I couldn't find a way to add port forwarding to an existing ssh connection - it looks like you specify this ahead of time (when the original ssh connection is established). The command line I used was this:

    ssh -L 5432:monica.cs.rpi.edu:8888 monica.cs.rpi.edu
    
    This sets up port forwarding from port 5432 on my machine to port 8888 on monica (where the proxy is running in this example) and also establishes a remote login on monica. If you close the remote login to monica, the port forwarding is closed as well.

For general information about ssh, check out www.ssh.org.

Q: Can we play with your sample proxy?

A: Yup - an executable is in ~hollingd/public.html/netprog/P2. This will run on monica (or any PC running BSD). To run it (on monica) you can type:

> ~hollingd/public.html/netprog/P2/sampleproxy

Q: How can I force a HEAD request from Netscape?

A: (Thanks to Robert Foulis for this.) Tell the browser to update bookmarks and it will send a bunch of HEAD requests.

To update bookmarks, edit your bookmarks and then open the Edit menu - there will be an "Update bookmarks" item.

Q: Any parsing code available yet?

A: Yes - it's here: parse.c. The code is not commented well, but it seems to work and a few people have told me it isn't hard to understand.

Q: I'm trying to use gethostbyaddr to determine the hostname of the client, but it always returns NULL:
  struct sockaddr_in from;      
  ...
  if ( (sd = accept( ld, (struct sockaddr*) &from, &addrlen)) < 0) {
  ...
  if((hptr=gethostbyaddr(&from,sizeof(from),AF_INET))==NULL){
  ...

A: gethostbyaddr expects only an IP address (the struct in_addr), not the entire sockaddr_in. You want something like this:

  if((hptr=gethostbyaddr(&from.sin_addr,sizeof(from.sin_addr),AF_INET))==NULL){
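
Since the reverse lookup can still fail for clients that have no DNS name, it may help to fall back to the dotted-decimal address. The sketch below shows one way to do that; client_name is a hypothetical helper name, and the fallback to inet_ntoa is my own assumption about what you would want to print.

```c
#include <stdio.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Hypothetical helper: write the client's hostname into buf, falling back
 * to dotted-decimal notation when the reverse lookup fails. */
void client_name(const struct sockaddr_in *from, char *buf, size_t len)
{
    /* Pass only the in_addr and its size, not the whole sockaddr_in. */
    struct hostent *hptr = gethostbyaddr(&from->sin_addr,
                                         sizeof(from->sin_addr), AF_INET);
    if (hptr != NULL)
        snprintf(buf, len, "%s", hptr->h_name);
    else
        snprintf(buf, len, "%s", inet_ntoa(from->sin_addr));
}
```

After accept() fills in from, a call such as client_name(&from, name, sizeof(name)) leaves something printable in name either way.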

Q: When using a browser connected to my proxy I sometimes see a bunch of GET requests even though I've only told the browser to get a single document - is this normal?

A: Yes, this is normal. Once the browser tries to render some HTML it may find that it needs some images, so for each image it has to make another GET request.

Q: The textbook talks about inet_ntop and inet_pton functions, but I can't find them.

A: These functions were introduced along with IPv6 support and are not available on most machines. You can use the IPv4-specific functions inet_ntoa and inet_aton instead. The CS BSD machines do seem to support inet_ntop and inet_pton, although there are no man pages for them.
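
As a rough illustration of the IPv4-specific pair, the sketch below parses a dotted-decimal string with inet_aton and formats it back with inet_ntoa. The helper name ipv4_roundtrip is hypothetical, and the _DEFAULT_SOURCE define is an assumption that only matters when compiling against glibc in a strict -std mode.

```c
#define _DEFAULT_SOURCE   /* assumption: exposes inet_aton on strict glibc builds */
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: parse dotted-decimal text into an in_addr, then
 * format it back into out. Returns 0 on success, -1 on bad input or if
 * out is too small. */
int ipv4_roundtrip(const char *text, char *out, size_t len)
{
    struct in_addr addr;
    const char *s;

    if (inet_aton(text, &addr) == 0)     /* 0 means "not a valid address" */
        return -1;

    /* inet_ntoa returns a pointer to a static buffer, so copy it out
     * before anything else calls inet_ntoa again. */
    s = inet_ntoa(addr);
    if (strlen(s) >= len)
        return -1;
    strcpy(out, s);
    return 0;
}
```

For example, ipv4_roundtrip("192.0.2.1", buf, sizeof(buf)) leaves "192.0.2.1" in buf, while a string that is not an address returns -1.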

Q: Is this a valid URI to test our project on:
       GET http://www.rpi.edu:12345/~blah/foo.html HTTP/1.0
where the port is given and there is a path as well?

A: Yes - you need to be able to handle any URI, including those that specify port numbers.
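
To make this concrete, here is one possible sketch of splitting such a URI into hostname, port and path. This is not the parse.c code; parse_http_uri is a hypothetical name, and the sketch assumes a lowercase "http://" prefix and a default port of 80 when none is given.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "http://host[:port][/path]" into its parts.
 * Returns 0 on success, -1 on malformed or oversized input. */
int parse_http_uri(const char *uri, char *host, size_t hostlen,
                   int *port, char *path, size_t pathlen)
{
    const char *prefix = "http://";
    size_t plen = strlen(prefix);
    if (strncmp(uri, prefix, plen) != 0)
        return -1;

    const char *start = uri + plen;
    size_t authlen = strcspn(start, "/");        /* "host" or "host:port" */
    const char *colon = memchr(start, ':', authlen);
    size_t hlen = colon ? (size_t)(colon - start) : authlen;
    if (hlen == 0 || hlen >= hostlen)
        return -1;
    memcpy(host, start, hlen);
    host[hlen] = '\0';

    *port = 80;                                  /* assumed default port */
    if (colon != NULL) {
        char *end;
        long p = strtol(colon + 1, &end, 10);
        /* The digits must run exactly up to the end of "host:port". */
        if (end == colon + 1 || end != start + authlen || p < 1 || p > 65535)
            return -1;
        *port = (int)p;
    }

    const char *rest = start + authlen;          /* "" or "/path..." */
    if (*rest == '\0')
        rest = "/";
    if (strlen(rest) >= pathlen)
        return -1;
    strcpy(path, rest);
    return 0;
}
```

With the URI from the question, this yields the host www.rpi.edu, the port 12345 and the path /~blah/foo.html.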

Q: My program works perfectly for some URLs, but it doesn't work for others.

A: You may be bumping into problems with the transition between HTTP 1.0 and HTTP 1.1. According to the HTTP 1.1 spec, proxy servers must remove the "http://hostname:port" part of the URI, although the spec also requires that all HTTP servers be able to deal with complete URIs (those that include the "http://hostname:port" part). It appears that not all the servers on the WWW can deal with complete URIs, so the best thing to do is to make sure you don't forward the "http://hostname:port" part of the request. I've done this and it works fine.
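
One way to do the rewriting is sketched below. This is not the sample proxy's code: strip_host_from_request is a hypothetical name, and the fixed buffer sizes are arbitrary assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite "METHOD http://host[:port]/path VERSION"
 * as "METHOD /path VERSION\r\n". Returns 0 on success, -1 if the line
 * is malformed, is not an http:// URI, or does not fit in out. */
int strip_host_from_request(const char *line, char *out, size_t outlen)
{
    char method[16], uri[1024], version[16];
    const char *path;
    int n;

    /* %s stops at whitespace, so a trailing "\r\n" is ignored here. */
    if (sscanf(line, "%15s %1023s %15s", method, uri, version) != 3)
        return -1;
    if (strncmp(uri, "http://", 7) != 0)
        return -1;

    /* Everything from the first '/' after the host is the path. */
    path = strchr(uri + 7, '/');
    if (path == NULL)
        path = "/";

    n = snprintf(out, outlen, "%s %s %s\r\n", method, path, version);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Given "GET http://www.rpi.edu:12345/~blah/foo.html HTTP/1.0", the line forwarded to the server becomes "GET /~blah/foo.html HTTP/1.0" followed by CRLF.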

Q: Help! I don't understand what a proxy web server is!!!

A: A proxy web server accepts HTTP requests through a known TCP port and forwards each request to the real server, sending back any reply to the client. So your program must be able to act as a server (to receive HTTP requests) and as a client (to make the HTTP requests to the real server). You will need to write code that does something like this:

  1. Establish a passive-mode TCP socket and print out the port number bound to the socket.
  2. Accept a TCP connection from an HTTP client.
  3. Read the first line sent by the client and parse it. If it is not a GET, HEAD or POST request you can ignore it (close the connection and go back to step 2). You need to parse the URI to determine the hostname and port number. Here is an example URI:
    http://www.foo.com:1234/funny/pages
    In this case your program needs to know that the HTTP server it should contact is running on the host www.foo.com on port 1234.
  4. Establish a TCP connection to the host and port specified in the URI (create a new TCP socket and call connect()).
  5. Forward the request (and any following HTTP header lines) to the HTTP server.
  6. Read everything sent back by the HTTP server and send it back through the socket to the client.
  7. Close the connections to the server and client.
  8. Go back to step 2 to handle the next client.

These steps make up a minimal, iterative HTTP proxy server.
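
Two of the steps above are easy to sketch in isolation: step 1 (the passive socket) and step 6 (copying the reply back). The helper names open_listener and relay are hypothetical. Binding to port 0 so that the kernel picks a free port is my assumption about how the port gets chosen, and the backlog of 5 and the 4096-byte buffer are arbitrary.

```c
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Step 1: create a passive TCP socket on a kernel-chosen port.
 * Returns the listening descriptor (or -1) and stores the port in *port. */
int open_listener(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int ld = socket(AF_INET, SOCK_STREAM, 0);
    if (ld < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);                /* 0 = let the kernel choose */
    if (bind(ld, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(ld, 5) < 0 ||
        getsockname(ld, (struct sockaddr *)&addr, &len) < 0) {
        close(ld);
        return -1;
    }
    *port = ntohs(addr.sin_port);            /* the port to print out */
    return ld;
}

/* Step 6: copy bytes from one descriptor to another until EOF.
 * Returns the total number of bytes copied, or -1 on error. */
long relay(int from_fd, int to_fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(from_fd, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;
        while (off < n) {                    /* write() may be partial */
            ssize_t w = write(to_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

The main loop would then accept() on the descriptor from open_listener, parse the request, connect to the real server, forward the request, call relay with the server socket as the source and the client socket as the destination, and close both before accepting the next client.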