Dataclasses Avro Schema

If you are immerse in the data streaming world, probably you had faced the serialization problem. There are different techniques/frameworks to achieve this, for example Thrift, Protocol Buffers or Apache Avro.

Personally, I am using Avro serialization and I always had to came up with avro schemas based on desired payload keeping in mind fields specification and attributes. This is not a heavy task for simple uses cases, but when we have complex types, data relationships (nested schemas) or custom types the process gets a bit complicated. I asked myself, what if we can generate the avro schemas based on a python class? Most of the time the desired payload that we want get after deserialization is based on a Python class. The ending results was:

Dataclasses Avro Schema, Generate Avro Schemas from a Python class 😀

Let's see an example. Suppose that we want an avro schema that represents a User:

{
    "type": "record",
    "name": "User",
    "fields" : [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "has_pets", "type": "boolean"},
        {"name": "money", "type": "float"}
    ],
    "doc": "User(name: str, age: int, has_pets: bool, money: float)"
}

Instead of remember all fields specifications, we can write the python class to get the schema:

from dataclasses_avroschema.schema_generator import SchemaGenerator

class User:
    name: str
    age: int
    has_pets: bool
    money: float

SchemaGenerator(User).avro_schema()

{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "has_pets", "type": "boolean"},
    {"name": "money", "type": "float"}
  ],
  "doc": "User(name: str, age: int, has_pets: bool, money: float)"
}'

Super simple and straightforward. We have all this features:

  • Primitive types: int, long, float, boolean, string and null support
  • Complex types: enum, array, map, fixed, unions and records support
  • Logical Types: date, time, datetime, uuid support
  • Schema relations (oneToOne, oneToMany)
  • Recursive Schemas
  • Generate Avro Schemas from faust.Record

So, if you need an avro schema, give a chance to dataclasses-avroschema 😉

Comments

Comments powered by Disqus