Tuesday, July 08, 2008

Google Open Sources Protocol Buffers

Google has open sourced Protocol Buffers, which are used extensively inside Google, both for storing data in big tables and for RPC communication between services. They're perhaps nothing revolutionary, except that they work and they're fast. The other day I realized there are two very nice features of protobuffers:
  1. You can have a zero-sized protobuffer, which still returns (default) values.
  2. Protobuffers can be both forwards and backwards compatible.
I suppose the easiest way to see the advantages is with a sample. Imagine you want to store some settings for a program.
message Settings {
  optional bool auto_backup = 1 [default=false];
  optional int32 backup_frequency_sec = 2 [default=30];
  optional string backup_fname = 3 [default="auto.sav"];
}
Because every field is optional and has a default, a settings file that is zero bytes long (or doesn't exist) is still a valid protobuf, and you can create a protobuffer instance from it that has the correct defaults. If I add a new optional field to Settings (say backup_directory), old programs that don't understand this field will just skip it. Likewise, new programs that do know about backup_directory can still read old protobuf files.
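To make both points concrete, here is a minimal sketch in Python. It is a hand-rolled reader for the wire format, not the official protobuf library, and it only handles the two wire types that Settings needs (non-negative varints and length-delimited strings). The tag number 4 for backup_directory is my own assumption for illustration.

```python
# Hand-rolled sketch of a protobuf reader for the Settings message.
# This is NOT the official library; it only illustrates the idea.

DEFAULTS = {
    "auto_backup": False,
    "backup_frequency_sec": 30,
    "backup_fname": "auto.sav",
}

# Field numbers an "old" program knows about: number -> name.
KNOWN_FIELDS = {1: "auto_backup", 2: "backup_frequency_sec", 3: "backup_fname"}


def read_varint(data, pos):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def parse_settings(data):
    """Start from the defaults and overwrite whatever the bytes contain."""
    settings = dict(DEFAULTS)
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:  # varint (bool, int32)
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited (string)
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError("wire type not handled in this sketch")
        name = KNOWN_FIELDS.get(number)
        if name is None:
            continue  # unknown field: its bytes were consumed, so just skip it
        settings[name] = bool(value) if name == "auto_backup" else value
    return settings


# A zero-byte file yields the defaults.
empty = parse_settings(b"")

# A "newer" file with auto_backup=true plus a field 4 (assumed to be
# backup_directory) that this old reader has never heard of.
newer = parse_settings(b"\x08\x01" + b"\x22\x04/tmp")
```

Here `empty` is just the defaults, and `newer` has auto_backup switched on while the unknown field is silently ignored.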
Another nice feature is that only the numeric tags matter: you can pretty much freely rename a field without causing many problems beyond a rebuild.
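The reason renaming is safe is that the field name never reaches the encoded bytes. A small sketch in Python, again hand-rolled rather than using the official library, with helper names that are my own:

```python
def field_key(field_number, wire_type):
    """Build the one-byte key that precedes a field's value.

    Only valid for field numbers 1 through 15 in this sketch; larger
    numbers need a multi-byte varint key.
    """
    return bytes([(field_number << 3) | wire_type])


def encode_bool_field(field_number, value):
    """Encode a bool as key + single-byte varint (wire type 0)."""
    return field_key(field_number, 0) + bytes([1 if value else 0])


# auto_backup = true, identified only by its tag number 1.
encoded = encode_bool_field(1, True)
print(encoded)  # b'\x08\x01'
```

Whether the .proto file calls tag 1 auto_backup or something else, the two bytes are the same, so only the generated code changes.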
How a protobuf is stored in binary is also very interesting. I had a similar problem way back when I worked for Andyne and needed to store lots of data. Protocol Buffers have a nice way of storing varints, but I think Google could have tried harder for floats and doubles.
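For the curious, a varint stores an integer seven bits at a time, low bits first, with the high bit of each byte saying "more bytes follow". Floats and doubles, by contrast, are always stored as fixed 4 and 8 byte little-endian values. A minimal Python sketch (my own helper, non-negative integers only):

```python
import struct


def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte, high bit clear
            return bytes(out)


small = encode_varint(30)    # one byte
larger = encode_varint(300)  # two bytes

# Floats and doubles take a fixed size regardless of the value.
as_float = struct.pack("<f", 1.0)   # always 4 bytes
as_double = struct.pack("<d", 1.0)  # always 8 bytes
```

So small integers cost one byte each, while a double costs eight bytes even when it holds 1.0.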
Overall, protobuffers are far smaller and faster than, say, XML. I'm glad that I can now use them in my own open source programs instead of XML or YAML.

3 comments:

Viet said...

Hi Scott,

Thanks for sharing. I'm curious about protobuffers and may give them a try for my school project.

Btw, correct me if I'm wrong, but Google protobuffers have not implemented storage for floats.

Please share if you know any decent ORM in C++ (I prefer C++ over PHP).

Thank you,
Viet.

Scott Kirkwood said...

It has doubles and floats; see here.
You should look at Python instead of C++ or PHP.

james said...

Thank you. To all participants, please have a look at our updated contest post.