
As the title says, I would like to write an HTTP server. My question is: how do I do this? I know this sounds VERY general and too "high level", but there is a method to my madness. I believe an answer to this question should be language agnostic: no matter what language I use (e.g., C, C++, Java), the answer should be the same. I have a general idea of how this is supposed to work:

  1. Open a socket on port 80.
  2. Wait for a client to make a request.
  3. Read the request (i.e., this person wants page "contact-us.html").
  4. Find and read "contact-us.html".
  5. Send an html header, then send the content of "contact-us.html"
  6. Done
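
For illustration only, the six steps above can be sketched roughly like this in Python (an arbitrary choice; the steps themselves do not depend on the language). The names `parse_request_line`, `build_response`, and `serve_once` are made up for this sketch, port 8080 stands in for port 80 (which usually needs elevated privileges), and the code assumes the whole request arrives in a single read. What step 5 calls a header is, on the wire, an HTTP status line followed by header fields and a blank line.

```python
import socket
from pathlib import Path


def parse_request_line(request):
    """Step 3: split the first line, e.g. b'GET /contact-us.html HTTP/1.1'."""
    first_line = request.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    method, target, version = first_line.split(" ", 2)
    return method, target, version


def build_response(docroot, target):
    """Steps 4-5: read the requested file and wrap it in an HTTP response."""
    root = Path(docroot).resolve()
    path = (root / target.lstrip("/")).resolve()
    # Only serve regular files that really live under the document root.
    if root in path.parents and path.is_file():
        status, body = "200 OK", path.read_bytes()
    else:
        status, body = "404 Not Found", b"Not found\n"
    # Status line, header fields, blank line, then the body.
    # (No Content-Type header here, to keep the sketch short.)
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def serve_once(docroot, host="127.0.0.1", port=8080):
    """Handle exactly one request, then return (step 6)."""
    with socket.create_server((host, port)) as server:  # step 1
        conn, _addr = server.accept()                    # step 2
        with conn:
            request = conn.recv(65536)                   # step 3 (single read assumed)
            _method, target, _version = parse_request_line(request)
            conn.sendall(build_response(docroot, target))
```

Calling `serve_once(".")` and then requesting `http://127.0.0.1:8080/contact-us.html` from a browser should exercise all six steps once; a real server would loop over `accept()` and cope with partial reads, other methods, and malformed requests.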

Like I said, I believe this is the process, but I am not 100% sure. This leads me to the heart of my question. How or where does a person find out this information?

What if I didn't want to write just an HTTP server, what if I wanted to write an FTP server, a chat server, an image viewer, etc.? How does a person find out the exact steps/process needed to create a working HTTP server?

A co-worker told me about the html header, so I would NEVER have known this without him. He also said something about handing off each request to a new thread. Is there some big book of how things work? Is there some manual of what it takes to be an HTTP server?

I tried googling "how does a HTTP server work", but the only answers I could find were geared towards your average Joe, and not towards a person wanting to program an HTTP server.

haylem
Brian
  • [RFC2616](http://www.w3.org/Protocols/rfc2616/rfc2616.html) should have all the nitty gritty details of the HTTP protocol for you. [RFC959](http://www.w3.org/Protocols/rfc959/) is the same thing for FTP. – Mike Jun 07 '13 at 14:51
  • Alternatively (or additionally), look at how existing simple HTTP servers are implemented. More than one, that should give you an idea what structures make sense. – Michael Borgwardt Jun 07 '13 at 14:53
  • Michael Borgwardt - I would do that, but I have a tendency to copy what I have seen when I look at other code. I was hoping to go into this clean, to see if I could do it on my own without "cheating". – Brian Jun 07 '13 at 15:16
  • your web search phrase is wrong, it is targeted at users, that's why you're getting average Joe stuff. Use: **"how to develop a HTTP server"** instead, it better reflects what you're looking for. I just tried it with Google and got a full page of references that explain this stuff – gnat Jun 07 '13 at 16:09
  • consider reviewing other implementations, for instance apache tomcat. It probably does more than you want, but it will demonstrate one technique to solve the problem. – DwB Jun 11 '13 at 18:34
  • In 2014, RFC2616 was replaced by multiple RFCs (7230-7237). – Lothar Nov 12 '15 at 16:26

2 Answers


Use the RFC2616, Luke!

You read the RFC 2616 on HTTP/1.1, and you go for it.

That was actually a project in my 3rd year in engineering school, and that's pretty much the project description.

Tools

Your tools are:

  • basic networking stuff (socket management, binding, understanding addresses),
  • a good understanding of I/O streams,
  • a lot of patience to get through some shady parts of the RFC (MIME types are fun).
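
To give a flavour of the MIME type part, here is a small sketch in Python (not the C I used) that picks a `Content-Type` from a file name using the standard `mimetypes` module. The helper names `content_type_for` and `content_headers` are hypothetical, and falling back to `application/octet-stream` for unknown extensions is an assumption of this sketch rather than something the RFC forces on you.

```python
import mimetypes


def content_type_for(filename):
    """Guess a media type from the file extension, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


def content_headers(filename, body):
    """Build the two entity headers a static file response typically needs."""
    return (
        f"Content-Type: {content_type_for(filename)}\r\n"
        f"Content-Length: {len(body)}\r\n"
    )
```

Guessing from the extension alone is the simple approach; real servers usually let a configuration file override the mapping.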

Fun Considerations

Things to consider for extra fun:

  • plug-in architecture to add CGI / mod support,
  • configuration files for, well, many things,
  • lots of experimentation on how to optimize transfers,
  • lots of experimentation to see how to manage load in terms of CPU and memory, and to pick a dispatch model (big fat event loop, single accept dispatch, multi-thread, multi-process, etc.).
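
As one example of a dispatch model, here is a rough sketch of the multi-thread variant in Python: a single accept loop that hands each connection to a new thread. The names `handle_connection` and `serve_threaded` are invented for this sketch, the fixed "hello" response is a placeholder for real request handling, and `max_connections` exists only so the sketch terminates (a real server would loop forever and probably use a bounded thread pool).

```python
import socket
import threading


def handle_connection(conn):
    """Worker: read one request and send back a fixed placeholder response."""
    with conn:
        conn.recv(65536)  # the request is read but ignored in this sketch
        body = b"hello\n"
        head = b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\nConnection: close\r\n\r\n" % len(body)
        conn.sendall(head + body)


def serve_threaded(server, max_connections):
    """Accept loop: one new thread per accepted connection."""
    for _ in range(max_connections):
        conn, _addr = server.accept()
        threading.Thread(target=handle_connection, args=(conn,), daemon=True).start()
```

The accept loop never blocks on a slow client because each connection gets its own thread; the price is one thread per connection, which is exactly the kind of trade-off worth experimenting with against an event loop or a process pool.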

Have fun. It's a very cool thing to look at.

Other (Simpler) Suggestions

  • FTP client/server (mostly RFC959 but there are older versions and also some extensions)
  • IRC client/server (mostly RFC1459, but there are extensions)

They're way easier to tackle first, and their RFCs are a lot easier to digest (well, the IRC one has some odd parts, but the FTP one is pretty clear).

Language Choice

Of course, some implementation details will be highly dependent on the language and stack you use to implement it. I approached all that in C, but I'm sure it can be fun just as well in other languages (ok, maybe not as much fun, but still fun).

Robert Harvey
haylem
  • Yeah I had to do this as a project back in school too. It's surprisingly fun and gives you more of an appreciation for "industrial strength" web servers. – Evicatos Jun 07 '13 at 16:45
  • Getting the protocol implementation right is one part; architecting the server is another... – tdammers Jun 07 '13 at 17:05
  • @tdammers: RFCs are pretty good, if you follow them, you already have a decent barebone blueprint to follow. You still have lots of room for your architecture design, but it's a pretty good and directive spec. – haylem Jun 07 '13 at 17:47
  • @haylem: yes and no. Implementing the spec gives you an individual worker, but you still need to embed this worker in a bigger picture - how do you take care of handling concurrent requests? How do you provide useful content? Where do you keep state? – tdammers Jun 07 '13 at 19:18
  • @tdammers: me: `You still have lots of room for your architecture design, but it's a pretty good and directive spec.` you: `yes and no`. I think we already established that the RFC isn't everything. And I think it's up to the OP to discover these things, rather than having me point them out beyond what I've already done in the "Fun Considerations" section and elsewhere. It's part of the fun. – haylem Jun 07 '13 at 21:09
  • In 2014, RFC2616 was replaced by multiple RFCs (7230-7237). – Lothar Nov 12 '15 at 16:26

Each of the protocols used on the internet is specified in one or more public documents called RFCs. All the current RFCs can be found at http://www.rfc-editor.org/, which also has a decent search function.

The HTTP protocol (version 1.1), for example, is specified in RFC2616 and the FTP protocol is specified in RFC959.

As specifications go, the RFCs are, in my opinion, very readable.

Bart van Ingen Schenau
  • I'm really confused by these RFCs. Will they ever update the HTTP RFCs? In the above answer, there is a comment that states `In 2014, RFC2616 was replaced by multiple RFCs (7230-7237).`. So, how do I find the updated RFCs if there are any? Should I check the `Obsoleted by` list? – SkrewEverything Mar 15 '18 at 20:53
  • @SkrewEverything: RFCs don't get updated but they get replaced by newer RFCs. You find the newer ones indeed by following the "Obsoleted by" links. – Bart van Ingen Schenau Mar 16 '18 at 08:25