rpc | brandonio21

A key aspect of building a server-based (cloud-based in today’s lingo) service is communication. The (often times remote) client needs to communicate with the server. Further, sometimes other processes on the server also need to communicate with each other. There are several ways to accomplish this, one of which is with RPC.

Many blooming programmers and hackathoners today will jump straight to “Why not just create a simple REST API using JSON serialization?”. On the surface, there are many good things about this. I’m sure you’ve heard them all:

REST is simple and stateless. The semantics of a REST API are widely used and therefore easy to use once you’ve learned them. With a REST API on your server, it’s super easy to manipulate data and debug operations using tools like Postman.
REST encourages readability. I’m all about readability everywhere in computing. Code should be readable, text should be readable, interfaces should be readable. It only makes sense, then, that an API should be readable as well. REST encourages this. It’s very easy to tell what GET api/users will do.
REST encourages readable serializations. Since the API endpoints are readable, it’s only natural to make the response readable as well. Today, most APIs accomplish this by serialiazing data in the easy-to-read JSON format. This way, data is easy to read and easy to manipulate.

This is all well and good. Hackathoners and newcomers should not feel discouraged from using REST/JSON to create an interface to their cloud application. There’s one problem that I’m sure you’ve noticed, however.

JSON is heavy. When you’ve implemented a distributed, load-balanced, fully cached, and 100% optimized service, the largest bottleneck is transmission time from the server to the client, especially if the response object is large. In fact, on all of the teams I’ve worked on at various companies (except one), complaints about transmission time for huge serialized objects were extremely common.

Also, REST is cumbersome. I need more than two hands to count how many times I’ve coded up, from scratch, an interface layer that handles REST-style requests and serves the response in JSON. At one point I thought it a good idea to create a C# tool which actually generates a PHP REST interface when given the dataschema. Why did I have to do this? Surely, someone else has already done it!

Enter Facebook’s Thrift, which is open source and Apache licensed. For me, Thrift’s biggest features are how it solves the problems I’ve mentioned above. In order to use Thrift, you design a schema to represent both your objects and your service. The Thrift compiler then uses your schema to generate a client and server for you, meaning that you no longer have to handle the communication or serialization problems.

As a contrived example, say that I wanted to make a simple service to get my server uptime. I first design a Thrift schema:

service UptimeService {
    i32 getUptimeInDays();
}

Then, I compile the schema using Thrift

thrift --gen py uptime.thrift

I am going to use Python for my client and server, so I use the --gen py flag. Thrift has many supported languages, however.

I can then use the generated Python libraries to write my server implementation:

import UptimeService
import subprocess

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from thrift.server import TServer

class UptimeHandler:
    def getUptimeInDays():
        return int(subprocess.check_output('uptime').split(' ')[3])

if __name__ == '__main__':
    handler = UptimeHandler()
    processor = UptimeService.Processor(handler)
    transport = TSocket.TServerSocket(port=9090)
    tfactory = TTransport.TBufferedTransportFactory()
    pfactory = TBinaryProtocol.TBinaryProtocolFactory()

    server = TServer.TThreadedServer(processor, transport, tfactory, pfactory)
    server.serve()

And I also use the generated Python libraries to write my client implementation:

import UptimeService

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

def main():
    transport = TSocket.TSocket('localhost', 9090)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = UptimeService.Client(protocol)

    transport.open()
    print("Server uptime: {} days".format(client.getUptimeInDays()))
    
    transport.close()

if __name__ == '__main__':
    main()

Of course, this client implementation only works when run from the server itself. If I wanted to get the uptime remotely, I would just change ‘localhost’ to my domain name.

And that’s it! All of the networking details and serialization is handled by Thrift. Of course, a big portion of this post went on about how JSON is too big and too heavy. If one compares performance of various Thrift-like libraries, they will notice that Thrift is definitely not the fastest. Instead, libraries like Avro, Protobuf, and CapnProto are much faster and also more compact.

However, a major pain point for me across the years has been implementing the actual interface layer. Writing client code to summon an HTTP Request and read the response from the server gets old after a while. This is something that Thrift handles for you. As you can see from the example above, Thrift handles all serialization, deserialization, and message-passing for you. All you have to worry about is defining a schema — Thrift gives the rest for free!

brandonio21

Observations about life and software engineering

Tag Archives: rpc

Simple RPC With Thrift