2. Data Serialization
● The process of translating an object into a format that can be stored in a memory buffer or file, or transmitted over a network.
● End goal: reconstruction in another computer environment.
● Reverse process: deserialization.
3. Binary Serialization
● Many languages provide built-in support.
● Language-specific (interop issues).
● Example: Java's Serializable marker interface (increases the likelihood of bugs and security holes).
● Effective Java, Item 74: Implement Serializable judiciously
● Effective Java, Item 78: Consider serialization proxies instead of serialized instances
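Java is not the only language with this pattern: Python's built-in pickle module is a comparable sketch of language-specific binary serialization, with the same caveats (the format only round-trips reliably in Python, and deserializing untrusted input is a security hole):

```python
import pickle

# A plain object; pickle, like Java's Serializable, needs no extra code.
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(3, 4)
data = pickle.dumps(p)           # serialize to Python-specific binary bytes
restored = pickle.loads(data)    # deserialize (never do this on untrusted input)
print(restored.x, restored.y)    # prints: 3 4
```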
5. CROSS PLATFORM SOLUTIONS - XML (Extensible Markup Language)
● Design goals: simplicity, generality, and usability across the Internet.
● Hierarchical structure; validation via schemas (DTD, XSD, etc.).
● A common standard with wide acceptance.
● Criticized for verbosity and complexity (especially when namespaces are involved).
6. CROSS PLATFORM SOLUTIONS - JSON (JavaScript Object Notation)
● Lightweight data-interchange format.
● Uses human-readable text to transmit data objects consisting of attribute–value pairs.
● Remember: XML is a markup language; JSON is a data format.
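A quick sketch of those attribute–value pairs, using Python's standard json module (the book data is made up):

```python
import json

# A data object as attribute–value pairs.
book = {"title": "Object Thinking", "author": "David West", "year": 2004}

text = json.dumps(book)           # serialize to human-readable text
print(text)                       # prints: {"title": "Object Thinking", "author": "David West", "year": 2004}
assert json.loads(text) == book   # round-trips losslessly
```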
7. Google Data Encoding Solution
Options
«At Google, our mission is organizing all of the
world's information.
We use literally thousands of different data formats
and most of these formats are structured, not flat»
https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html
8. Google Data Encoding Solution Options
Option 1: Use XML
«Not efficient enough for this scale. Writing code to work with the DOM tree can sometimes become unwieldy.»
https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html
9. Google Data Encoding Solution Options
Option 2: Write the raw bytes of in-memory data structures to the wire
«When we roll out a new version of a server, it almost always has to start out talking to older servers. Also, we use many languages, so we need a portable solution.»
https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html
10. Google Data Encoding Solution Options
Option 3: Use hand-coded parsing and serialization routines for each data structure (the solution used before protocol buffers)
«There was a format for requests and responses that used hand marshalling/unmarshalling of requests and responses, and that supported a number of versions of the protocol...»
https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html
11. What are protocol buffers?
A language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.
Initially developed at Google to handle an index server request/response protocol.
Designed and used at Google since 2001; open-sourced in 2008.
12. How do they work?
You define your structured data format in a descriptor file (a .proto file).
You run the protocol buffer compiler for your application's language on your .proto file to generate data access classes.
You can even update your data structure without breaking deployed programs that are compiled against the "old" format.
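The steps above might look like this in practice (file and message names are hypothetical; proto2 syntax, matching the field rules used later in these slides):

```proto
// person.proto
message Person {
  required string name = 1;   // tag numbers identify fields on the wire
  optional int32 id = 2;
}
```

Running, e.g., `protoc --java_out=src/ person.proto` then generates a `Person` class with accessors and parse/serialize methods.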
14. Message Definition
Messages are defined in .proto files.
Syntax:
message MessageName { ... }
Can be nested.
Will be converted to, e.g., a Java class.
15. Message Contents
Each message may contain:
Messages (nested definitions)
Enums:
enum <name> {
<valuename> = <value>;
}
Fields, each defined as:
<rule> <type> <name> = <tag> [<options>];
Rules (proto2): required, optional, repeated
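Putting the pieces together in one hypothetical proto2 message (nested message, nested enum, all three field rules, and an option in brackets):

```proto
message Order {
  enum Status {           // enum nested in the message
    NEW = 0;
    SHIPPED = 1;
  }

  message Item {          // nested message definition
    required string sku = 1;
    optional int32 quantity = 2 [default = 1];   // <options> go in brackets
  }

  required int64 id = 1;        // required: must always be set
  optional Status status = 2;   // optional: may be omitted
  repeated Item items = 3;      // repeated: zero or more values
}
```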
17. Backward / Forward Compatibility
DO NOT change the tag numbers of any existing
fields.
You can delete optional or repeated fields, but you
must not add or delete any required fields.
18. Backward / Forward Compatibility
When adding a new field, you must use fresh tag numbers… (i.e. tag numbers that were never used in this protocol buffer, not even by deleted fields).
A good practice: mark deleted fields as reserved.
The protocol buffer compiler complains if reserved fields are reused.
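A sketch of the reserved practice (message and field names invented): suppose tags 2 and 5 held fields that were deleted in an earlier version.

```proto
message Profile {
  reserved 2, 5;          // tag numbers of deleted fields can never be reused
  reserved "email";       // the old field name can be reserved as well

  required string name = 1;
  optional string phone = 3;
}
```

If a later edit tries to declare a field with tag 2 (or named `email`), protoc rejects the file.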
19. Backward / Forward Compatibility
Changing a default value is generally OK…
But remember that default values are never sent over the wire.
If the sender omits the field, the receiver reads its own default (e.g. 20).
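In proto2 this looks like the following (message and field names are illustrative); the default is part of the schema on each side, never part of the payload:

```proto
message Request {
  // If the sender never sets timeout, nothing is written to the wire;
  // the receiver's generated code returns 20 when the field is absent.
  optional int32 timeout = 1 [default = 20];
}
```

This is exactly why changing a default is only "generally OK": a sender compiled against the old default and a receiver compiled against a new one will silently disagree about omitted values.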
22. Possible Use Cases For Us?
Java, C++, C#
IBM MQ / Solace messages
DB raw data
Log messages to disk
Show as XML / JSON
A command-line utility associated with protobuf files
Use Cases at Barclays Investment Bank
http://www.slideshare.net/SergeyPodolsky/google-protocol-buffers-56085699
Binary serialization:
− not human readable
− platform dependent (Little Endian vs. Big Endian?)
+ memory efficient, fast to parse
Cross-platform solutions are text-based:
+ human readable (okay, XML…)
+ platform independent (but: still encoding problems!)
+ format can evolve (e.g. additional fields in XML)
− waste more memory
− slow to parse
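The size gap is easy to see with Python's standard library; struct also shows how a binary format can pin down byte order explicitly (the '<' prefix forces little-endian) instead of depending on the host platform:

```python
import json
import struct

values = (1, 2, 3, 4)

# Binary: four 32-bit integers, explicitly little-endian ('<'),
# so the bytes are identical on any host.
binary = struct.pack("<4i", *values)

# Text: the same data as JSON.
text = json.dumps({"values": list(values)}).encode("utf-8")

print(len(binary), len(text))                    # prints: 16 24
assert struct.unpack("<4i", binary) == values    # lossless round trip
```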
From http://www.yegor256.com/2015/11/16/json-vs-xml.html
I believe there are four features XML has that seriously set it apart from JSON or any other simple data format, like YAML for example.
XPath. To get data like the year of publication from the document above, I just send an XPath query: /book/published/year/text(). However, there has to be an XPath processor that understands my request and returns 2004. The beauty of this is that XPath 2.0 is a very powerful query engine with its own functions, predicates, axes, etc. You can literally put any logic into your XPath request without writing any traversing logic in Java, for example. You may ask "How many books were published by David West in 2004?" and get an answer, just via XPath. JSON is not even close to this.
Attributes and Namespaces. You can attach metadata to your data, just like it's done above with the id attribute. The data stays inside elements, just like the name of the book author, for example, while metadata (data about data) can and should be placed into attributes. This significantly helps in organizing and structuring information. On top of that, both elements and attributes can be marked as belonging to certain namespaces. This is a very useful technique during times when a few applications are working with the same XML document.
XML Schema. When you create an XML document in one place, modify it a few times somewhere else, and then transfer it to yet another place, you want to make sure its structure is not broken by any of these actions. One of them may use <year> to store the publication date while another uses <date> with ISO-8601. To avoid that mess in structure, create a supplementary document, which is called XML Schema, and ship it together with the main document. Everyone who wants to work with the main document will first validate its correctness using the schema supplied. This is a sort of integration testing in production. RelaxNG is a similar but simpler mechanism; give it a try if you find XML Schema too complex.
XSL. You can make modifications to your XML document without any Java/Ruby/etc. code at all. Just create an XSL transformation document and "apply" it to your original XML. As an output, you will get a new XML. The XSL language (it is purely functional, by the way) is designed for hierarchical data manipulations. It is much more suitable for this task than Java or any other OOP/procedural approach. You can transform an XML document into anything, including plain text and HTML. Some complain about XSL's complexity, but please give it a try. You won't need all of it, while its core functionality is pretty straightforward.
From http://apigee.com/about/blog/technology/why-xml-wont-die-xml-vs-json-your-api
JSON is especially good at representing programming-language objects. If you have a JavaScript or Java object, or even a C struct, the structure of the object and all its fields can be easily and quickly converted to JSON, sent over a network, and retrieved on the other end without too much difficulty and (usually) comes out the same on both ends.
But not everything in the world is a programming-language object. Sometimes to describe a complex real-world object we have to combine different descriptions and languages from different places, mash them up, and use them to describe even more complex things. The descriptions of these complex things need to be validated, they need to be commented on, they need to be shared and sometimes annotated with additional data that doesn't affect the original structure.
When the world gets complicated and open-ended like that, what's needed is not a programming-language-format object, but an open-ended, extensible -- umm -- markup language. That's what we have today with XML.