A few of our friends have been asking us about the best practices we learnt over the last two years designing and implementing RESTful web services as the back end of the feedly service. Here is a quick, high-level brain dump:
Phase 1 – Defining a simple resource/service | Take a sample resource such as Customer Information and model it as JSON. Build a simple servlet where PUT creates a new customer, GET returns the customer information based on the customer key, DELETE deletes the customer and POST updates the customer information. Make sure that PUT returns the right information regarding the URL of the newly created resource (typically a 201 Created response with a Location header). In our case, we have a framework which maps JSON to our Java model and uses Hibernate to persist that model in a MySQL database. The important things for this phase are to get the JSON representation right and to keep the base URL formatting simple and clean. [Update: see some additional clarification regarding the use of PUT versus POST for creation and update at the end of this document]
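To make the method mapping of this phase concrete, here is a minimal sketch. It is written in Python for brevity rather than as a Java servlet, and it is not our actual framework: the `handle` function, the `CUSTOMERS` dictionary (standing in for the Hibernate/MySQL layer) and the `/customers/<key>` URL shape are all assumptions made for illustration.

```python
import json

# Hypothetical in-memory store standing in for the persistence layer.
CUSTOMERS = {}

def handle(method, key, body=None):
    """Return (status, headers, body) for a request on /customers/<key>."""
    url = "/customers/" + key
    if method not in ("GET", "PUT", "POST", "DELETE"):
        return 405, {"Allow": "GET, PUT, POST, DELETE"}, ""
    if method == "PUT":
        # Create: store the full JSON document and point at its URL.
        CUSTOMERS[key] = json.loads(body)
        return 201, {"Location": url}, ""
    if key not in CUSTOMERS:
        return 404, {}, ""
    if method == "GET":
        return 200, {"Content-Type": "application/json"}, json.dumps(CUSTOMERS[key])
    if method == "POST":
        # Update: merge the posted fields into the stored customer.
        CUSTOMERS[key].update(json.loads(body))
        return 200, {"Content-Type": "application/json"}, json.dumps(CUSTOMERS[key])
    # Only DELETE is left at this point.
    del CUSTOMERS[key]
    return 204, {}, ""
```

Calling `handle("PUT", "42", '{"name": "Ada"}')` stores the customer and returns a 201 status with `Location: /customers/42`, which is the information the client needs to find the new resource again.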
Phase 3 – Adding validation | Change your service implementation to add some data validation to the JSON resource received during PUT and POST. Learn how to use HTTP error codes to define and transfer exception information, and how to handle those exceptions on the client side. The important thing for this phase is to know the existing HTTP error codes, reuse them when it makes sense, and create new ones which are compliant with HTTP only when needed.
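As an illustration of validation that reports problems through existing HTTP status codes, here is a small Python sketch. The `validate_customer` function and the `name` and `email` fields are hypothetical. It uses 400 for a body that cannot be parsed and 422 (Unprocessable Entity, which originated in WebDAV) for a well-formed body that fails validation; some services prefer 400 for both.

```python
import json

def validate_customer(body):
    """Return (status, payload): 200 with the parsed customer, or a 4xx with error details."""
    try:
        data = json.loads(body)
    except ValueError:
        return 400, {"error": "body is not valid JSON"}
    if not isinstance(data, dict):
        return 400, {"error": "expected a JSON object"}
    errors = {}
    name = data.get("name")
    if not isinstance(name, str) or not name.strip():
        errors["name"] = "required, non-empty string"
    email = data.get("email")
    if not isinstance(email, str) or "@" not in email:
        errors["email"] = "required, must contain '@'"
    if errors:
        # Well-formed JSON that breaks the rules: report every bad field at once.
        return 422, {"error": "validation failed", "fields": errors}
    return 200, data
```

Returning the per-field errors in the response body lets the client show a precise message instead of guessing from the status code alone.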
Phase 4 – Complex resources | Most services evolve over time into resources which are more complex/composite. This will have an impact on your URL hierarchy. It will also have an impact on the way you marshal those composite JSON resources into domain objects. Try extending the customer resource to include [1…N] address resources. Make sure that the new “service” allows you to get a customer with all its addresses but also allows you to get one address or add/edit/delete an address.
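One possible URL hierarchy for the customer-with-addresses exercise is sketched below in Python. The paths `/customers/<key>`, `/customers/<key>/addresses` and `/customers/<key>/addresses/<n>` are an assumption, as are the `get_resource` function and the sample data; only the read side is shown.

```python
# Hypothetical composite resource: one customer with a list of addresses.
CUSTOMERS = {
    "42": {"name": "Ada", "addresses": [{"city": "London"}, {"city": "Turin"}]},
}

def get_resource(path):
    """Resolve a customer, its address list, or a single address; return (status, resource)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[0] != "customers" or parts[1] not in CUSTOMERS:
        return 404, None
    customer = CUSTOMERS[parts[1]]
    if len(parts) == 2:
        return 200, customer  # whole customer, addresses included
    if parts[2] != "addresses" or len(parts) > 4:
        return 404, None
    if len(parts) == 3:
        return 200, customer["addresses"]
    index = parts[3]
    if not index.isdecimal() or int(index) >= len(customer["addresses"]):
        return 404, None
    return 200, customer["addresses"][int(index)]
```

The same path parsing could be reused for add/edit/delete on a single address, so that each address stays addressable on its own while the parent URL still returns the full composite.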
Phase 5 – Adding caching | The web infrastructure offers rich caching mechanisms (last modified information, cache duration, ETag). Learn about those mechanisms and see if you can leverage them to improve the performance and scalability of your service. In the back end, learn about Memcached and see how you can leverage it to reduce the load on your service. This is all the more important when you start dealing with sets of large composite resources which are expensive database-wise to build but are not updated often. In those cases, it might be worth building the resource once and asking the client to cache it if you know when it is going to expire, or asking Memcached to cache it if you do not.
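The ETag mechanism can be sketched in a few lines of Python. This is an illustration only: the `conditional_get` function is hypothetical, the ETag is derived from a hash of the serialized resource, and the five-minute `max_age` default is an arbitrary assumption.

```python
import hashlib
import json

def conditional_get(resource, if_none_match=None, max_age=300):
    """Return (status, headers, body), answering 304 when the client's ETag still matches."""
    body = json.dumps(resource, sort_keys=True)
    # Quoted hash of the representation; any change to the resource changes the tag.
    etag = '"' + hashlib.sha256(body.encode("utf-8")).hexdigest()[:16] + '"'
    headers = {"ETag": etag, "Cache-Control": "max-age=%d" % max_age}
    if if_none_match == etag:
        return 304, headers, ""  # client copy is current, skip the body
    return 200, headers, body
```

Note that this sketch still builds the resource in order to hash it, so it only saves bandwidth. To save the database work as well, the tag would need to come from something cheap to read, such as a version number or a last-modified timestamp.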
Phase 6 – Adding Authentication | Learn how to leverage your existing web authentication frameworks to force the user to log in and to validate credentials before the service is accessed. Look at how you could use OpenID and OAuth if you are building a consumer-centric service. Here again, use existing HTTP error codes when possible.
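As a small example of reusing existing HTTP codes for authentication, the Python sketch below checks HTTP Basic credentials and answers 401 with a `WWW-Authenticate` challenge when they are missing or wrong. The `authenticate` function, the `USERS` table and the realm name are hypothetical; a real service would delegate the credential check to its authentication framework.

```python
import base64
import hmac

# Hypothetical credential table, for illustration only.
USERS = {"ada": "s3cret"}

def authenticate(headers):
    """Return (status, headers): 200 if the Basic credentials are valid, else 401."""
    challenge = {"WWW-Authenticate": 'Basic realm="customers"'}
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "basic":
        return 401, challenge
    try:
        user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    except ValueError:
        # Covers malformed base64 as well as bytes that are not valid UTF-8.
        return 401, challenge
    expected = USERS.get(user)
    # compare_digest avoids leaking information through comparison timing.
    if expected is None or not hmac.compare_digest(expected.encode(), password.encode()):
        return 401, challenge
    return 200, {}
```

A 403 would be the natural companion code for a user who is correctly identified but not allowed to touch the requested resource.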
Phase 7 – Publishing Business Events | Often, changes to a resource require different types of back-end processing. When possible, try to avoid creating a monolithic service; instead, save the resource and fire a business event. Defining the right granularity and the right typing for the business event is hard. You might have to iterate a few times to get this right. My advice is to not over-engineer it: keep it as simple as possible and refactor as new use cases appear. For example, if you want to send a welcome email to each new customer, define an ON_NEW_CUSTOMER event with a payload which includes the customer URI, and instrument your service to fire that event each time a new customer is created.
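The ON_NEW_CUSTOMER example might look like the following Python sketch. The in-process `subscribe`/`fire` event bus, the `create_customer` function and the `customer_uri` payload key are assumptions for illustration; a production system might publish to a message queue instead.

```python
from collections import defaultdict

# Hypothetical in-process event bus: event type -> list of listener callables.
_listeners = defaultdict(list)

def subscribe(event_type, listener):
    _listeners[event_type].append(listener)

def fire(event_type, payload):
    # Iterate over a copy so a listener may subscribe others while handling.
    for listener in list(_listeners[event_type]):
        listener(payload)

CUSTOMERS = {}

def create_customer(key, data):
    """Save the resource, then fire the event; the service itself knows nothing about emails."""
    CUSTOMERS[key] = data
    fire("ON_NEW_CUSTOMER", {"customer_uri": "/customers/" + key})

# A listener standing in for the welcome-email processing.
sent = []
subscribe("ON_NEW_CUSTOMER", lambda event: sent.append("welcome:" + event["customer_uri"]))
create_customer("42", {"name": "Ada"})
```

Because the payload carries only the customer URI, listeners fetch whatever representation they need, and new processing can be added by subscribing another listener without touching `create_customer`.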
Phase 8 – Adding Lifecycle | If your resource includes a life-cycle (for example, an Order), you can model that life-cycle as part of the resource and use the state for the validation in phase 3 and the published business events (state transitions are usually good candidates for business events).
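A life-cycle of this kind can be modelled as a small table of allowed transitions, as in the Python sketch below. The order states, the `transition` function, the ON_ORDER_* event names and the choice of 409 Conflict for a disallowed transition are all assumptions made for illustration.

```python
# Hypothetical order life-cycle: state -> set of states reachable from it.
TRANSITIONS = {
    "NEW": {"PAID", "CANCELLED"},
    "PAID": {"SHIPPED", "CANCELLED"},
    "SHIPPED": set(),
    "CANCELLED": set(),
}

def transition(order, new_state, fire=lambda event_type, payload: None):
    """Apply a state change if allowed; return an HTTP-style status code."""
    current = order["state"]
    if new_state not in TRANSITIONS:
        return 400  # unknown state in the request
    if new_state not in TRANSITIONS[current]:
        return 409  # request conflicts with the current state of the order
    order["state"] = new_state
    # Each successful transition is a natural business event.
    fire("ON_ORDER_" + new_state, {"order_uri": order["uri"], "from": current})
    return 200
```

The transition table serves both purposes named above: it validates incoming updates, and every accepted change gives a well-defined point at which to fire an event.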
We will try to add more as our back end evolves. We are also looking at taking some of our back-end infrastructure and open sourcing it. We should know more in the next few months. If you have any suggestions you would like to share, please post them in the comments or email me at firstname.lastname@example.org and I will update this list. Thanks!
Update/May 27th, 2009: There have been some great comments regarding the use of PUT versus POST, so I did some additional research and found this interesting post.
The crux of the issue comes down to a concept known as idempotency. An operation is idempotent if a sequence of two or more of the same operation results in the same resource state as would a single instance of that operation. According to the HTTP 1.1 specification, GET, HEAD, PUT and DELETE are idempotent, while POST is not. That is, a sequence of multiple attempts to PUT data to a URL will result in the same resource state as a single attempt to PUT data to that URL, but the same cannot be said of a POST request.
After that discussion, a more realistic mapping would seem to be:
- Create = PUT iff you are sending the full content of the specified resource (URL).
- Create = POST if you are sending a command to the server to create a subordinate of the specified resource, using some server-side algorithm.
- Retrieve = GET.
- Update = PUT iff you are updating the full content of the specified resource.
- Update = POST if you are requesting the server to update one or more subordinates of the specified resource.
- Delete = DELETE.
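The difference between the two creation styles in this mapping can be seen in a short Python sketch. The `put` and `post` functions, the in-memory `store` and the counter used to mint subordinate URLs are hypothetical stand-ins for a real service.

```python
import itertools

store = {}
_ids = itertools.count(1)  # stand-in for the server-side algorithm that picks new URLs

def put(url, content):
    """Replace the full content at url; repeating the call leaves the same state."""
    created = url not in store
    store[url] = content
    return 201 if created else 200

def post(collection_url, content):
    """Ask the server to create a subordinate of collection_url; each call mints a new URL."""
    url = "%s/%d" % (collection_url, next(_ids))
    store[url] = content
    return 201, url
```

Sending the same `put` twice leaves one resource in `store`, while sending the same `post` twice creates two. This is why a client can safely retry a PUT after a timeout but needs more care before retrying a POST.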
Update July 8th 2009. Here is a great presentation from LinkedIn on how they use RESTful APIs for high performance integration. Great watch.