
Comparison of data-serialization formats

This is a comparison of data-serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.
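As a minimal illustration of that definition, the Python sketch below converts one object (the sample object used in the syntax tables later in this article) into two byte sequences: a human-readable one with the standard-library json module and a binary one with pickle. It only demonstrates the round trip and is not meant to rank the two formats.

```python
import json
import pickle

# The sample object from the syntax-comparison tables (JSON object keys must be strings).
record = {"42": True, "A to Z": [1, 2, 3]}

# Text-based, human-readable serialization, stored as UTF-8 bytes.
json_bytes = json.dumps(record).encode("utf-8")

# Binary, Python-specific serialization.
pickle_bytes = pickle.dumps(record)

# Both byte sequences decode back to an equal object.
assert json.loads(json_bytes) == record
assert pickle.loads(pickle_bytes) == record

print(json_bytes)  # b'{"42": true, "A to Z": [1, 2, 3]}'
```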

Overview

Name Creator-maintainer Based on Standardized? Specification Binary? Human-readable? Supports references?[e] Schema-IDL? Standard APIs Supports Zero-copy operations
Apache Avro Apache Software Foundation N/A No Apache Avro™ 1.8.1 Specification Yes No N/A Yes (built-in) N/A N/A
Apache Parquet Apache Software Foundation N/A No Apache Parquet[1] Yes No No N/A Java, Python No
ASN.1 ISO, IEC, ITU-T N/A Yes ISO/IEC 8824; X.680 series of ITU-T Recommendations Yes (BER, DER, PER, OER, or custom via ECN) Yes (XER, JER, GSER, or custom via ECN) Partial[f] Yes (built-in) N/A Yes (OER)
Bencode Bram Cohen (creator); BitTorrent, Inc. (maintainer) N/A De facto standard via BitTorrent Enhancement Proposal (BEP) Part of BitTorrent protocol specification Partially (numbers and delimiters are ASCII) No No No No N/A
Binn Bernardo Ramos N/A No Binn Specification Yes No No No No Yes
BSON MongoDB JSON No BSON Specification Yes No No No No N/A
CBOR Carsten Bormann, P. Hoffman JSON (loosely) Yes RFC 7049 Yes No Yes (through tagging) Yes (CDDL) No Yes
Comma-separated values (CSV) RFC author: Yakov Shafranovich N/A Partial (myriad informal variants used) RFC 4180 (among others) No Yes No No No No
Common Data Representation (CDR) Object Management Group N/A Yes General Inter-ORB Protocol Yes No Yes Yes Ada, C, C++, Java, COBOL, Lisp, Python, Ruby, Smalltalk N/A
D-Bus Message Protocol freedesktop.org N/A Yes D-Bus Specification Yes No No Partial (Signature strings) Yes (see D-Bus) N/A
Efficient XML Interchange (EXI) W3C XML, Efficient XML Yes Efficient XML Interchange (EXI) Format 1.0 Yes Yes (XML) Yes (XPointer, XPath) Yes (XML Schema) Yes (DOM, SAX, StAX, XQuery, XPath) N/A
FlatBuffers Google N/A No FlatBuffers GitHub page Specification Yes Yes (Apache Arrow) Partial (internal to the buffer) Yes [2] C++, Java, C#, Go, Python, Rust, JavaScript, PHP, C, Dart, Lua, TypeScript Yes
Fast Infoset ISO, IEC, ITU-T XML Yes ITU-T X.891 and ISO/IEC 24824-1:2007 Yes No Yes (XPointer, XPath) Yes (XML schema) Yes (DOM, SAX, XQuery, XPath) N/A
FHIR Health Level 7 REST basics Yes Fast Healthcare Interoperability Resources Yes Yes Yes Yes HAPI for FHIR[1] JSON, XML, Turtle No
Ion Amazon JSON No The Amazon Ion Specification Yes Yes No No No N/A
Java serialization Oracle Corporation N/A Yes Java Object Serialization Yes No Yes No Yes N/A
JSON Douglas Crockford JavaScript syntax Yes STD 90/RFC 8259 (ancillary: RFC 6901, RFC 6902), ECMA-404, ISO/IEC 21778:2017 No, but see BSON, Smile, UBJSON Yes Yes (JSON Pointer (RFC 6901); alternately: JSONPath, JPath, JSPON, json:select()), JSON-LD Partial (JSON Schema Proposal, ASN.1 with JER, Kwalify, Rx, Itemscript Schema), JSON-LD Partial (Clarinet, JSONQuery, JSONPath), JSON-LD No
MessagePack Sadayuki Furuhashi JSON (loosely) No MessagePack format specification Yes No No No No Yes
Netstrings Dan Bernstein N/A No netstrings.txt Yes Yes No No No Yes
OGDL Rolf Veen ? No Specification Yes (Binary Specification) Yes Yes (Path Specification) Yes (Schema WD) N/A
OPC-UA Binary OPC Foundation N/A No opcfoundation.org Yes No Yes No No N/A
OpenDDL Eric Lengyel C, PHP No OpenDDL.org No Yes Yes No Yes (OpenDDL Library) N/A
Pickle (Python) Guido van Rossum Python De facto standard via Python Enhancement Proposals (PEPs) [3] PEP 3154 -- Pickle protocol version 4 Yes No No No Yes ([4]) No
Property list NeXT (creator); Apple (maintainer) ? Partial Public DTD for XML format Yes[a] Yes[b] No ? Cocoa, Core Foundation, OpenStep, GNUstep No
Protocol Buffers (protobuf) Google N/A No Developer Guide: Encoding Yes Partial[d] No Yes (built-in) C++, C#, Java, Python, JavaScript, Go No
S-expressions John McCarthy (original); Ron Rivest (internet draft) Lisp, Netstrings Partial (largely de facto) "S-Expressions" Internet Draft Yes ("Canonical representation") Yes ("Advanced transport representation") No No N/A
Smile Tatu Saloranta JSON No Smile Format Specification Yes No No Partial (JSON Schema Proposal, other JSON schemas/IDLs) Partial (via JSON APIs implemented with Smile backend, on Jackson, Python) N/A
SOAP W3C XML Yes W3C Recommendations: SOAP/1.1, SOAP/1.2 Partial (Efficient XML Interchange, Binary XML, Fast Infoset, MTOM, XSD base64 data) Yes Yes (built-in id/ref, XPointer, XPath) Yes (WSDL, XML schema) Yes (DOM, SAX, XQuery, XPath) N/A
Structured Data eXchange Formats Max Wildgrube N/A Yes RFC 3072 Yes No No No N/A
Thrift Facebook (creator); Apache (maintainer) N/A No Original whitepaper Yes Partial[c] No Yes (built-in) N/A
UBJSON The Buzz Media, LLC JSON, BSON No [5] Yes No No No No N/A
eXternal Data Representation (XDR) Sun Microsystems (creator); IETF (maintainer) N/A Yes STD 67/RFC 4506 Yes No Yes Yes Yes N/A
XML W3C SGML Yes W3C Recommendations: 1.0 (Fifth Edition), 1.1 (Second Edition) Partial (Efficient XML Interchange, Binary XML, Fast Infoset, XSD base64 data) Yes Yes (XPointer, XPath) Yes (XML schema, RELAX NG) Yes (DOM, SAX, XQuery, XPath) N/A
XML-RPC Dave Winer[2] XML No XML-RPC Specification No Yes No No No N/A
YAML Clark Evans, Ingy döt Net, and Oren Ben-Kiki C, Java, Perl, Python, Ruby, Email, HTML, MIME, URI, XML, SAX, SOAP, JSON[3] No Version 1.2 No Yes Yes Partial (Kwalify, Rx, built-in language type-defs) No N/A
  • a. ^ The current default format is binary.
  • b. ^ The "classic" format is plain text, and an XML format is also supported.
  • c. ^ Theoretically possible due to abstraction, but no implementation is included.
  • d. ^ The primary format is binary, but a text format is available.[4]
  • e. ^ Means that generic tools/libraries know how to encode, decode, and dereference a reference to another piece of data in the same document. A tool may require the IDL file, but no more. Excludes custom, non-standardized referencing techniques.
  • f. ^ ASN.1 does offer OIDs, a standard format for globally unique identifiers, as well as a standard notation ("absolute reference") for referencing a component of a value. Thus it would be possible to reference a component of an encoded value present in a document by combining an OID (assigned to the document) and an "absolute reference" to the component of the value. However, there is no standard way to indicate that a field contains such an absolute reference. Therefore, a generic ASN.1 tool/library cannot automatically encode/decode/resolve references within a document without help from custom-written program code.
  • g. ^ VelocyPack offers a value type to store pointers to other VPack items. It is allowed if the VPack data resides in memory, but not if stored on disk or sent over a network.
  • h. ^ The primary format is binary, but a text format is available.[5][6]
  • i. ^ The primary format is binary, but text and json formats are available.[7]
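Footnote e describes reference support as a generic library being able to dereference a pointer to another part of the same document; for JSON, the table cites JSON Pointer (RFC 6901). A minimal Python sketch of RFC 6901 evaluation follows, assuming the document has already been parsed with json.loads. The function name resolve_json_pointer is hypothetical, and the sketch omits the URI-fragment representation and the "-" (past-the-end) array token.

```python
import json
from typing import Any


def resolve_json_pointer(document: Any, pointer: str) -> Any:
    """Return the value that an RFC 6901 JSON Pointer designates in a parsed document."""
    if pointer == "":
        return document  # the empty pointer designates the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for raw_token in pointer[1:].split("/"):
        # Unescape "~1" before "~0", as RFC 6901 requires, so "~01" becomes "~1", not "/".
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, dict):
            current = current[token]  # raises KeyError if the member is absent
        elif isinstance(current, list):
            # Array indexes are ASCII decimal digits with no leading zeros.
            if not (token.isascii() and token.isdigit()) or (len(token) > 1 and token[0] == "0"):
                raise KeyError(f"invalid array index {token!r}")
            current = current[int(token)]  # raises IndexError if out of range
        else:
            raise KeyError(f"cannot descend into a {type(current).__name__}")
    return current


doc = json.loads('{"42": true, "A to Z": [1, 2, 3], "a/b": {"m~n": "escaped"}}')
print(resolve_json_pointer(doc, "/A to Z/2"))   # 3
print(resolve_json_pointer(doc, "/a~1b/m~0n"))  # escaped
```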

Syntax comparison of human-readable formats

Format Null Boolean true Boolean false Integer Floating-point String Array Associative array/Object
ASN.1
(XML Encoding Rules)
<foo /> <foo>true</foo> <foo>false</foo> <foo>685230</foo> <foo>6.8523015e+5</foo> <foo>A to Z</foo>
<SeqOfUnrelatedDatatypes>
    <isMarried>true</isMarried>
    <hobby />
    <velocity>-42.1e7</velocity>
    <bookname>A to Z</bookname>
    <bookname>We said, "no".</bookname>
</SeqOfUnrelatedDatatypes>
An object (the key is a field name):
<person>
    <isMarried>true</isMarried>
    <hobby />
    <height>1.85</height>
    <name>Bob Peterson</name>
</person>

A data mapping (the key is a data value):

<competition>
    <measurement>
        <name>John</name>
        <height>3.14</height>
    </measurement>
    <measurement>
        <name>Jane</name>
        <height>2.718</height>
    </measurement>
</competition>

[a]

CSV[b] null[a]
(or an empty element in the row)[a]
1[a]
true[a]
0[a]
false[a]
685230
-685230[a]
6.8523015e+5[a] A to Z
"We said, ""no""."
true,,-42.1e7,"A to Z"
42,1
A to Z,1,2,3
Ion

null
null.null
null.bool
null.int
null.float
null.decimal
null.timestamp
null.string
null.symbol
null.blob
null.clob
null.struct
null.list
null.sexp

true false 685230
-685230
0xA74AE
0b10100111010010101110
6.8523015e5 "A to Z"

'''
A
to
Z
'''
[true, null, -42.1e7, "A to Z"]
{'42': true, 'A to Z': [1, 2, 3]}
Netstrings[c] 0:,[a]
4:null,[a]
1:1,[a]
4:true,[a]
1:0,[a]
5:false,[a]
6:685230,[a] 12:6.8523015e+5,[a] 6:A to Z, 29:4:true,0:,7:-42.1e7,6:A to Z,, 41:9:2:42,1:1,,25:6:A to Z,12:1:1,1:2,1:3,,,,[a]
JSON null true false 685230
-685230
6.8523015e+5 "A to Z"
[true, null, -42.1e7, "A to Z"]
{"42": true, "A to Z": [1, 2, 3]}
OGDL[verification needed] null[a] true[a] false[a] 685230[a] 6.8523015e+5[a] "A to Z"
'A to Z'
NoSpaces
true
null
-42.1e7
"A to Z"

(true, null, -42.1e7, "A to Z")

42
  true
"A to Z"
  1
  2
  3
42
  true
"A to Z", (1, 2, 3)
OpenDDL ref {null} bool {true} bool {false} int32 {685230}
int32 {0xA74AE}
int32 {0b10100111010010101110}
float {6.8523015e+5} string {"A to Z"} Homogeneous array:
int32 {1, 2, 3, 4, 5}

Heterogeneous array:

array
{
    bool {true}
    ref {null}
    float {-42.1e7}
    string {"A to Z"}
}
dict
{
    value (key = "42") {bool {true}}
    value (key = "A to Z") {int32 {1, 2, 3}}
}
Pickle (Python) N. I01\n. I00\n. I685230\n. F685230.15\n. S'A to Z'\n. (lI01\naNaF-421000000.0\naS'A to Z'\na. (dI42\nI01\nsS'A to Z'\n(lI1\naI2\naI3\nas.
Property list
(plain text format)[8]
N/A <*BY> <*BN> <*I685230> <*R6.8523015e+5> "A to Z" ( <*BY>, <*R-42.1e7>, "A to Z" )
{
    "42" = <*BY>;
    "A to Z" = ( <*I1>, <*I2>, <*I3> );
}
Property list
(XML format)[9][10]
N/A <true /> <false /> <integer>685230</integer> <real>6.8523015e+5</real> <string>A to Z</string>
<array>
    <true />
    <real>-42.1e7</real>
    <string>A to Z</string>
</array>
<dict>
    <key>42</key>
    <true />
    <key>A to Z</key>
    <array>
        <integer>1</integer>
        <integer>2</integer>
        <integer>3</integer>
    </array>
</dict>
Protocol Buffers N/A true false 685230
-685230
20.0855369 "A to Z"
"sdfff2 \000\001\002\377\376\375"
"q\tqq<>q2&\001\377"
field1: "value1"
field1: "value2"
field1: "value3"
anotherfield {
  foo: 123
  bar: 456
}
anotherfield {
  foo: 222
  bar: 333
}
thing1: "blahblah"
thing2: 18923743
thing3: -44
thing4 {
  submessage_field1: "foo"
  submessage_field2: false
}
enumeratedThing: SomeEnumeratedValue
thing5: 123.456
[extensionFieldFoo]: "etc"
[extensionFieldThatIsAnEnum]: EnumValue
S-expressions NIL
nil
T
#t[f]
true
NIL
#f[f]
false
685230 6.8523015e+5 abc
"abc"
#616263#
3:abc
{MzphYmM=}
|YWJj|
(T NIL -42.1e7 "A to Z") ((42 T) ("A to Z" (1 2 3)))
YAML ~
null
Null
NULL[11]
y
Y
yes
Yes
YES
on
On
ON
true
True
TRUE[12]
n
N
no
No
NO
off
Off
OFF
false
False
FALSE[12]
685230
+685_230
-685230
02472256
0x_0A_74_AE
0b1010_0111_0100_1010_1110
190:20:30[13]
6.8523015e+5
685.230_15e+03
685_230.15
190:20:30.15
.inf
-.inf
.Inf
.INF
.NaN
.nan
.NAN[14]
A to Z
"A to Z"
'A to Z'
[y, ~, -42.1e7, "A to Z"]
- y
-
- -42.1e7
- A to Z
{"John":3.14, "Jane":2.718}
42: y
A to Z: [1, 2, 3]
XML[e] and SOAP <null />[a] true false 685230 6.8523015e+5 A to Z
<item>true</item>
<item xsi:nil="true"/>
<item>-42.1e7</item>
<item>A to Z</item>
<map>
  <entry key="42">true</entry>
  <entry key="A to Z">
    <item val="1"/>
    <item val="2"/>
    <item val="3"/>
  </entry>
</map>
XML-RPC <value><boolean>1</boolean></value> <value><boolean>0</boolean></value> <value><int>685230</int></value> <value><double>6.8523015e+5</double></value> <value><string>A to Z</string></value>
<value><array>
  <data>
  <value><boolean>1</boolean></value>
  <value><double>-42.1e7</double></value>
  <value><string>A to Z</string></value>
  </data>
  </array></value>
<value><struct>
  <member>
    <name>42</name>
    <value><boolean>1</boolean></value>
    </member>
  <member>
    <name>A to Z</name>
    <value>
      <array>
        <data>
          <value><int>1</int></value>
          <value><int>2</int></value>
          <value><int>3</int></value>
          </data>
        </array>
      </value>
    </member>
</struct>
  • a. ^ Omitted XML elements are commonly decoded by XML data binding tools as NULLs. Shown here is another possible encoding; XML schema does not define an encoding for this datatype.
  • b. ^ The RFC CSV specification only deals with delimiters, newlines, and quote characters; it does not directly deal with serializing programming data structures.
  • c. ^ The netstrings specification only deals with nested byte strings; anything else is outside the scope of the specification.
  • d. ^ PHP will unserialize any floating-point number correctly, but will serialize them to their full decimal expansion. For example, 3.14 will be serialized to 3.140000000000000124344978758017532527446746826171875.
  • e. ^ XML data bindings and SOAP serialization tools provide type-safe XML serialization of programming data structures into XML. Shown are XML values that can be placed in XML elements and attributes.
  • f. ^ This syntax is not compatible with the Internet-Draft, but is used by some dialects of Lisp.
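The Netstrings row can be reproduced mechanically. Because the netstrings specification covers only (nested) byte strings (footnote c), the sketch below assumes the same conventions as the table: scalars as their text, an array as a netstring of concatenated netstrings, and an object as a netstring of two-element key/value netstrings. The helpers encode_netstring and decode_netstring are hypothetical names used for illustration.

```python
from typing import List, Tuple, Union

# A value is either a byte string or a (possibly nested) list of values.
Nested = Union[bytes, List["Nested"]]


def encode_netstring(item: Nested) -> bytes:
    """Encode bytes as <length>:<payload>, and a list as a netstring of concatenated netstrings."""
    if isinstance(item, bytes):
        payload = item
    else:
        payload = b"".join(encode_netstring(child) for child in item)
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(data: bytes) -> Tuple[bytes, bytes]:
    """Split one netstring off the front of `data`; return (payload, remaining bytes)."""
    length_text, sep, rest = data.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(length_text.decode("ascii"))
    if rest[length:length + 1] != b",":
        raise ValueError("payload truncated or missing trailing comma")
    return rest[:length], rest[length + 1:]


array = encode_netstring([b"true", b"", b"-42.1e7", b"A to Z"])
obj = encode_netstring([[b"42", b"1"], [b"A to Z", [b"1", b"2", b"3"]]])
print(array)  # b'29:4:true,0:,7:-42.1e7,6:A to Z,,'
payload, rest = decode_netstring(array)  # payload holds the four inner netstrings
```

Decoding a nested value is a matter of calling decode_netstring repeatedly on each payload until it is empty.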

Comparison of binary formats

Format Null Booleans Integer Floating-point String Array Associative array/Object
ASN.1
(BER, PER or OER encoding)
NULL type BOOLEAN:
  • BER: as 1 byte in binary form;
  • PER: as 1 bit;
  • OER: as 1 byte
INTEGER:
  • BER: variable-length big-endian binary representation (up to 2^(2^1024) bits);
  • PER Unaligned: a fixed number of bits if the integer type has a finite range; a variable number of bits otherwise;
  • PER Aligned: a fixed number of bits if the integer type has a finite range and the size of the range is less than 65536; a variable number of octets otherwise;
  • OER: one, two, or four octets (either signed or unsigned) if the integer type has a finite range that fits in that number of octets; a variable number of octets otherwise
REAL:

base-10 real values are represented as character strings in ISO 6093 format;

binary real values are represented in a binary format that includes the mantissa, the base (2, 8, or 16), and the exponent;

the special values NaN, -INF, +INF, and negative zero are also supported

Multiple valid types (VisibleString, PrintableString, GeneralString, UniversalString, UTF8String) data specifications SET OF (unordered) and SEQUENCE OF (guaranteed order) user definable type
Binn \x00 True: \x01
False: \x02
big-endian 2's complement signed and unsigned 8/16/32/64 bits single: big-endian binary32
double: big-endian binary64
UTF-8 encoded, null terminated, preceded by int8 or int32 string length in bytes Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + list items Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + key/value pairs
BSON Null type – 0 bytes for value True: one byte \x01
False: \x00
int32: 32-bit little-endian 2's complement or int64: 64-bit little-endian 2's complement double: little-endian binary64 UTF-8 encoded, preceded by int32 encoded string length in bytes BSON embedded document with numeric keys BSON embedded document
Concise Binary Object Representation (CBOR) \xf6 True: \xf5
False: \xf4
Small positive number \x00-\x17 (0 to 23), small negative number \x20-\x37 (-1 to -24)

8-bit: positive \x18\xhh, negative \x38\xhh
16-bit: positive \x19<uint16_t>, negative \x39<uint16_t>
32-bit: positive \x1A<uint32_t>, negative \x3A<uint32_t>
64-bit: positive \x1B<uint64_t>, negative \x3B<uint64_t>
Negative number x encoded as ~x (binary inversion) or as (-x-1)
Byte order – Big-endian

Typecode (one byte) + IEEE half/single/double Typecode with length (like integer coding) and content.

Byte strings and UTF-8 text strings have different typecodes

Typecode with count (like integer coding) and items Typecode with pairs count (like integer coding) and pairs
Efficient XML Interchange (EXI) xsi:nil element (1-4 bits depending on context) 1 bit. 0–12 bits (log2 of the range) for integers with defined ranges smaller than 4096. Extensible sequence of octets with unbounded range for larger or undefined ranges. Also supports custom representations. Scalable floating-point representation requiring 18 to 88 bits depending on magnitude. Also supports IEEE and custom representations. Length-prefixed sequence of Unicode code points with partitioned string tables for efficient representation of repeated items. The length and code points are represented as variable-length unsigned integers, where values under 128 require 1 octet each. Also supports custom representations. Repeated elements or length-prefixed list of values. Also supports custom representations. Ordered (sequence) or unordered (all) group of named elements.
FlatBuffers Encoded as absence of field in parent object True: one byte \x01
False: \x00
little-endian 2's complement signed and unsigned 8/16/32/64 bits floats: little-endian binary32

doubles: little-endian binary64

UTF-8 encoded, preceded by a 32-bit integer length of the string in bytes Vectors of any other type, preceded by a 32-bit integer count of elements Tables (schema-defined types) or vectors sorted by key (maps/dictionaries)
MessagePack \xc0 True: \xc3
False: \xc2
Single byte "fixnum" (values -32..127)

or typecode (one byte) + big-endian (u)int8/16/32/64

Typecode (one byte) + IEEE single/double Typecode + up to 31 bytes
or
typecode + length as uint8/16/32 + bytes;
encoding is unspecified[15]
As "fixarray" (single-byte prefix + up to 15 array items)

or typecode (one byte) + 2–4 bytes length + array items

As "fixmap" (single-byte prefix + up to 15 key-value pairs)

or typecode (one byte) + 2–4 bytes length + key-value pairs

Netstrings 0:, True: 1:1,

False: 1:0,

OGDL Binary
Property list
(binary format)
Protocol Buffers Variable encoding length signed 32-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 31)

Variable encoding length signed 64-bit: varint encoding of "ZigZag"-encoded (n << 1) XOR (n >> 63)
Constant encoding length 32-bit: 32 bits in little-endian 2's complement
Constant encoding length 64-bit: 64 bits in little-endian 2's complement

floats: little-endian binary32

doubles: little-endian binary64

UTF-8 encoded, preceded by varint-encoded integer length of string in bytes Repeated value with the same tag N/A
Smile \x21 True: \x23
False: \x22
Single byte "small" (values -16..15 encoded using \xc0 - \xdf),

zigzag-encoded varints (1–11 data bytes), or BigInteger

IEEE single/double, BigDecimal Length-prefixed "short" Strings (up to 64 bytes), marker-terminated "long" Strings and (optional) back-references Arbitrary-length heterogeneous arrays with end-marker Arbitrary-length key/value pairs with end-marker
Structured Data eXchange Formats (SDXF) big-endian signed 24-bit or 32-bit integer big-endian IEEE double either UTF-8 or ISO 8859-1 encoded list of elements with identical ID and size, preceded by array header with int16 length chunks can contain other chunks to arbitrary depth
Thrift
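The CBOR integer rules in the table map directly onto a header byte whose top three bits hold the major type (0 for unsigned, 1 for negative) and whose low five bits hold either the value itself (0 to 23) or a width code (24 to 27, meaning a following 1-, 2-, 4- or 8-byte big-endian argument). A minimal Python sketch using only the standard struct module; cbor_encode_int is a hypothetical name, and bignums (tags 2 and 3) are deliberately not handled.

```python
import struct

# (exclusive upper bound, additional-info code, big-endian struct format)
_WIDTHS = (
    (0x100, 24, ">B"),
    (0x10000, 25, ">H"),
    (0x100000000, 26, ">I"),
    (0x10000000000000000, 27, ">Q"),
)


def cbor_encode_int(n: int) -> bytes:
    """Encode an integer using CBOR major type 0 (unsigned) or 1 (negative)."""
    if n >= 0:
        major, value = 0x00, n       # major type 0 stores n itself
    else:
        major, value = 0x20, -1 - n  # major type 1 stores -1 - n, i.e. ~n
    if value < 24:
        return bytes([major | value])  # the value fits in the header byte
    for limit, info, fmt in _WIDTHS:
        if value < limit:
            return bytes([major | info]) + struct.pack(fmt, value)
    raise OverflowError("magnitude needs a bignum tag, which this sketch omits")


for n in (0, 23, 24, -1, -24, -25, 1000, 685230, -685230):
    print(n, cbor_encode_int(n).hex())
```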

Any XML-based representation can be compressed with, or generated directly as, Efficient XML Interchange (EXI), a "schema-informed" (as opposed to schema-required or schema-less) binary compression standard for XML.
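The Protocol Buffers integer rows in the binary-format table combine two steps: ZigZag maps a signed value to an unsigned one so that small magnitudes stay small, and the varint encoding then emits 7 bits per byte, least-significant group first, with the high bit set on every byte except the last. The Python sketch below covers only these integer encodings (no field tags or wire types), and the function names are hypothetical.

```python
from typing import Tuple


def zigzag_encode(n: int, bits: int = 32) -> int:
    """Map a signed integer to an unsigned one: (n << 1) XOR (n >> (bits - 1))."""
    if not -(1 << (bits - 1)) <= n < (1 << (bits - 1)):
        raise OverflowError(f"{n} does not fit in a signed {bits}-bit integer")
    # Python's >> on negative ints is an arithmetic shift, as the formula assumes.
    return (n << 1) ^ (n >> (bits - 1))


def zigzag_decode(z: int) -> int:
    """Invert ZigZag: 0, 1, 2, 3, 4, ... -> 0, -1, 1, -2, 2, ..."""
    return (z >> 1) ^ -(z & 1)


def varint_encode(value: int) -> bytes:
    """Base-128 varint: low 7-bit group first, high bit set when more bytes follow."""
    if value < 0:
        raise ValueError("varints hold non-negative integers; ZigZag-encode signed values first")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def varint_decode(data: bytes) -> Tuple[int, int]:
    """Return (value, number of bytes consumed) for the varint at the start of `data`."""
    result = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return result, i + 1
    raise ValueError("truncated varint")


encoded = varint_encode(zigzag_encode(-685230))
value, used = varint_decode(encoded)
assert zigzag_decode(value) == -685230 and used == len(encoded)
print(varint_encode(300).hex())  # ac02
```

For 64-bit signed fields the same helper is called with bits=64, which matches the (n << 1) XOR (n >> 63) form listed in the table.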

See also

References

  1. ^ "HAPI FHIR - The Open Source FHIR API for Java". hapifhir.io.
  2. ^ "A Brief History of SOAP". www.xml.com.
  3. ^ Ben-Kiki, Oren; Evans, Clark; Net, Ingy döt (2009-10-01). "YAML Ain't Markup Language (YAML) Version 1.2". The Official YAML Web Site. Retrieved 2012-02-10.
  4. ^ "text_format.h - Protocol Buffers". Google Developers.
  5. ^ "Cap'n Proto serialization/RPC system: core tools and C++ library - capnproto/capnproto". 2 April 2019 – via GitHub.
  6. ^ "Cap'n Proto: The capnp Tool". capnproto.org.
  7. ^ "Fast Binary Encoding is ultra fast and universal serialization solution for C++, C#, Go, Java, JavaScript, Kotlin, Python, Ruby: chronoxor/FastBinaryEncoding". 2 April 2019 – via GitHub.
  8. ^ "NSPropertyListSerialization class documentation". www.gnustep.org.
  9. ^ "Documentation Archive". developer.apple.com.
  10. ^ "Documentation Archive". developer.apple.com.
  11. ^ Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Null Language-Independent Type for YAML Version 1.1". YAML.org. Retrieved 2009-09-12.
  12. ^ a b Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Boolean Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
  13. ^ Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-02-11). "Integer Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
  14. ^ Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Floating-Point Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
  15. ^ "MessagePack is an extremely efficient object serialization library. It's like JSON, but very fast and small.: msgpack/msgpack". 2 April 2019 – via GitHub.

External links