Model Generator

This section describes how to generate Python classes from an Avro schema (avsc files). This is the inverse of what the library primarily does, which is generating Avro schemas from Python classes.

ModelGenerator converts an avro schema to classes

Avro schema --> Python class

This class is in charge of rendering all the Python types properly. The rendered result is a string with the proper indentation, decorators, imports and any extras, so it can be saved to a file and is ready to use.

Example

from dataclasses_avroschema import ModelGenerator

model_generator = ModelGenerator()

schema = {
    "type": "record",
    "namespace": "com.kubertenes",
    "name": "AvroDeployment",
    "fields": [
        {"name": "image", "type": "string"},
        {"name": "replicas", "type": "int"},
        {"name": "port", "type": "int"},
    ],
}

result = model_generator.render(schema=schema)

# save the result in a file
with open("models.py", mode="+w") as f:
    f.write(result)

Then explore the module models.py. The result should be:

Code generated
import dataclasses

from dataclasses_avroschema import AvroModel
from dataclasses_avroschema import types


@dataclasses.dataclass
class AvroDeployment(AvroModel):
    image: str
    replicas: types.Int32
    port: types.Int32

    class Meta:
        namespace = "com.kubertenes"

Note

In future releases it will be possible to generate models for other programming languages such as Java and Rust.

Note

You can also use dc-avro to generate the models from the command line
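The example above passes the schema as a Python dict, while schemas are usually stored in avsc files. Since an avsc file is plain JSON, it can be loaded with the standard library first. The following is a minimal sketch; the helper name load_avsc and the file name deployment.avsc are illustrative assumptions, not part of the library.

```python
import json
import pathlib
import tempfile


def load_avsc(path):
    """Hypothetical helper: read an .avsc file (plain JSON) into a dict."""
    with open(path, encoding="utf-8") as avsc_file:
        return json.load(avsc_file)


# Write a small .avsc file so that the sketch is self-contained.
schema_path = pathlib.Path(tempfile.mkdtemp()) / "deployment.avsc"
schema_path.write_text(
    json.dumps(
        {
            "type": "record",
            "namespace": "com.kubertenes",
            "name": "AvroDeployment",
            "fields": [
                {"name": "image", "type": "string"},
                {"name": "replicas", "type": "int"},
                {"name": "port", "type": "int"},
            ],
        }
    ),
    encoding="utf-8",
)

schema = load_avsc(schema_path)
# The resulting dict has the same shape as the schema in the example above,
# so it can be passed on as ModelGenerator().render(schema=schema).
print(schema["name"])
```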

Mapping avro fields to python fields summary

Avro Type           Python Type
string              str
long                int
boolean             bool
double              float
null                None
bytes               bytes
array               typing.List
map                 typing.Dict
fixed               types.confixed
enum                enum.Enum (or str, enum.Enum in the str mixin variant)
int                 types.Int32
float               types.Float32
union               typing.Union
record              Python class
date                datetime.date
time-millis         datetime.time
time-micros         types.TimeMicro
timestamp-millis    datetime.datetime
timestamp-micros    types.DateTimeMicro
decimal             types.condecimal
uuid                uuid.UUID

Render a Python module

It's also possible to generate a Python module containing classes from multiple schemas using render_module.

from dataclasses_avroschema import ModelGenerator, ModelType

model_generator = ModelGenerator()

user_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string", "default": "marcos"},
        {"name": "age", "type": "int"},
    ],
}
address_schema = {
    "type": "record",
    "name": "Address",
    "fields": [
        {"name": "street", "type": "string"},
        {"name": "street_number", "type": "long"},
    ],
}

result = model_generator.render_module(schemas=[user_schema, address_schema], model_type=ModelType.DATACLASS.value)

with open("models.py", mode="+w") as f:
    f.write(result)

Then, the end result is:

# models.py
from dataclasses_avroschema import AvroModel
from dataclasses_avroschema import types
import dataclasses


@dataclasses.dataclass
class User(AvroModel):
    age: types.Int32
    name: str = "marcos"


@dataclasses.dataclass
class Address(AvroModel):
    street: str
    street_number: int

Generating a single module from multiple schemas is useful, for example, to group schemas that belong to the same namespace.

LogicalTypes

Native logicalTypes are supported by dataclasses-avroschema, but custom ones are not. If a schema defines a custom logicalType, the generator falls back to the underlying Avro type when rendering the field. In the next example the logicalType is url, which is not a native one, so the generated model uses str (the Python type for string):

from dataclasses_avroschema import ModelGenerator, ModelType

model_generator = ModelGenerator()


schema = {
    "type": "record",
    "name": "TestEvent",
    "fields": [
        {
            "name": "regular",
            "type": {
                "type": "string",
                "logicalType": "url"
            },
            "doc": "Urls"
        }
    ],
}

print(model_generator.render(schema=schema))

"""
from dataclasses_avroschema import AvroModel
import dataclasses


@dataclasses.dataclass
class TestEvent(AvroModel):
    regular: str = dataclasses.field(metadata={'doc': 'Urls'})
"""
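The fallback rule can be sketched with a small lookup. This is only an illustration of the behaviour described above, not the library's implementation: the function python_type_for and both tables are hypothetical, and the primitive table is simplified (the generator renders int as types.Int32, for instance).

```python
import datetime
import uuid

# Simplified subset of the mapping table (assumption for illustration only).
PRIMITIVE_TYPES = {"string": str, "int": int, "long": int, "bytes": bytes}
NATIVE_LOGICAL_TYPES = {
    "date": datetime.date,
    "time-millis": datetime.time,
    "timestamp-millis": datetime.datetime,
    "uuid": uuid.UUID,
}


def python_type_for(avro_type: dict) -> type:
    """Hypothetical helper: prefer a native logicalType, else fall back."""
    logical_type = avro_type.get("logicalType")
    if logical_type in NATIVE_LOGICAL_TYPES:
        return NATIVE_LOGICAL_TYPES[logical_type]
    # Unknown (custom) logicalType: use the underlying Avro type.
    return PRIMITIVE_TYPES[avro_type["type"]]


print(python_type_for({"type": "string", "logicalType": "url"}))  # <class 'str'>
print(python_type_for({"type": "int", "logicalType": "date"}))  # <class 'datetime.date'>
```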

Render Pydantic models

It is also possible to render BaseModel (pydantic) and AvroBaseModel (avro + pydantic) models. The end result also includes the necessary imports and uses pydantic.Field where it is needed.

For example:

schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "long"},
        {"name": "friend", "type": ["null", "User"], "default": None},
        {"name": "relatives", "type": {"type": "array", "items": "User", "name": "relative"}, "default": []},
        {"name": "teammates", "type": {"type": "map", "values": "User", "name": "teammate"}, "default": {}},
        {"name": "money", "type": {"type": "bytes", "logicalType": "decimal", "precision": 10, "scale": 3}},
    ],
}

and then render the result:

from dataclasses_avroschema import ModelGenerator, ModelType

model_generator = ModelGenerator()
result = model_generator.render(schema=schema, model_type=ModelType.PYDANTIC.value)

# save the result in a file
with open("models.py", mode="+w") as f:
    f.write(result)

# models.py
from pydantic import BaseModel
from pydantic import Field
from pydantic import condecimal
import typing


class User(BaseModel):
    name: str
    age: int
    money: condecimal(max_digits=10, decimal_places=3)
    friend: typing.Optional[typing.Type["User"]] = None
    relatives: typing.List[typing.Type["User"]] = Field(default_factory=list)
    teammates: typing.Dict[str, typing.Type["User"]] = Field(default_factory=dict)

To render AvroBaseModel models, use ModelType.AVRODANTIC instead:

from dataclasses_avroschema import ModelGenerator, ModelType

model_generator = ModelGenerator()
result = model_generator.render(schema=schema, model_type=ModelType.AVRODANTIC.value)

# save the result in a file
with open("models.py", mode="+w") as f:
    f.write(result)

# models.py
from dataclasses_avroschema.pydantic import AvroBaseModel
from pydantic import Field
from pydantic import condecimal
import typing


class User(AvroBaseModel):
    name: str
    age: int
    money: condecimal(max_digits=10, decimal_places=3)
    friend: typing.Optional[typing.Type["User"]] = None
    relatives: typing.List[typing.Type["User"]] = Field(default_factory=list)
    teammates: typing.Dict[str, typing.Type["User"]] = Field(default_factory=dict)

Note

Use dataclasses_avroschema.ModelType, passed as model_type as in the examples above, to specify the base class.

Note

decimal.Decimal fields are created using pydantic condecimal

Note

uuid types are created using pydantic.UUID4

Malformed schemas

Sometimes an Avro schema is valid but could still be called "malformed". For example, the following schema has a field named Address, which is exactly the same name as the record Address.

{
  "type": "record",
  "name": "User",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "age",
      "type": "long"
    },
    {
      "name": "Address",  # The field name is the same as the record name
      "type": [
        "null",
        {
            "type": "record",
            "name": "Address",
            "fields": [
            {
                "name": "name",
                "type": "string"
            }
            ]
        },
      ],
      "default": None,
    }
  ]
}

If we generate the Python models that correspond to the previous schema, we end up with the following models. The result is correct because it translates to Python what the schema represents, but if we check the annotations we see that Address is shadowed:

from dataclasses_avroschema import AvroModel
import dataclasses
import typing


@dataclasses.dataclass
class Address(AvroModel):
    name: str


@dataclasses.dataclass
class User(AvroModel):
    name: str
    age: int
    Address: typing.Optional[Address] = None

# `Address` is shadowed !!!
print(User.__annotations__)
# >>> {'name': str, 'age': int, 'Address': NoneType}

# We do not want this!!!
print(User.fake())
# >>> User(name='ftXgdDSUzdUIamiiHOiS', age=2422, Address=None)  
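The shadowing is ordinary Python behaviour and can be reproduced with the standard library alone. In an annotated assignment with a default, the value is bound to the name before the annotation is evaluated, so inside the class body Address already refers to None. A minimal sketch, using plain dataclasses as stand-ins for the generated AvroModel classes:

```python
import dataclasses
import typing


@dataclasses.dataclass
class Address:
    name: str


@dataclasses.dataclass
class User:
    name: str
    # The default (None) is bound to the class-level name "Address" first,
    # so the "Address" inside the annotation resolves to None, not the class.
    Address: typing.Optional[Address] = None


shadowed_annotation = User.__annotations__["Address"]
print(shadowed_annotation)  # <class 'NoneType'>
```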

If we rename the field name Address to address in the schema:

{
  "type": "record",
  "name": "User",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "age",
      "type": "long"
    },
    {
      "name": "address",  # RENAMED!!!
      "type": [
        "null",
        {
            "type": "record",
            "name": "Address",
            "fields": [
            {
                "name": "name",
                "type": "string"
            }
            ]
        },
      ],
      "default": None,
    }
  ]
}

we get a proper result:

from dataclasses_avroschema import AvroModel
import dataclasses
import typing


@dataclasses.dataclass
class Address(AvroModel):
    name: str


@dataclasses.dataclass
class User(AvroModel):
    name: str
    age: int
    address: typing.Optional[Address] = None

print(User.__annotations__)
# >>> {'name': str, 'age': int, 'address': typing.Optional[__main__.Address]}

print(User.fake())
# >>> User(name='JBZdhEWdXwFLQitWCjkc', age=3406, address=Address(name='AhlQsvXnkpcPZJvRSXLr'))

Schema with invalid python identifiers

Avro schemas can contain field names that are not valid Python identifiers, for example street-name. Given the following Avro schema, the generated Python model uses valid identifiers instead, in this case street_name and street_number:

from dataclasses_avroschema import ModelGenerator


schema = {
    "type": "record",
    "name": "Address",
    "fields": [
        {"name": "street-name", "type": "string"},
        {"name": "street-number", "type": "long"}
    ]
}

model_generator = ModelGenerator()
result = model_generator.render(schema=schema)

# save the result in a file
with open("models.py", mode="+w") as f:
    f.write(result)

Then the result will be:

# models.py
from dataclasses_avroschema import AvroModel
import dataclasses


@dataclasses.dataclass
class Address(AvroModel):
    street_name: str
    street_number: int

Warning

If you generate the schema back from the model, it will not match the original schema, because the field names were changed. You might have to use the case functionality.
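To make the renaming concrete, here is a minimal sketch of how a field name could be turned into a valid identifier. The helper to_identifier is hypothetical and is not the library's implementation; it only illustrates why the mapping is one-way, since street-name and street_name would both end up as street_name.

```python
import keyword
import re


def to_identifier(name: str) -> str:
    """Hypothetical helper: map an Avro field name to a valid Python identifier."""
    # Replace every character that cannot appear in an identifier.
    candidate = re.sub(r"\W", "_", name)
    # Identifiers cannot start with a digit.
    if candidate[:1].isdigit():
        candidate = f"_{candidate}"
    # Reserved words such as "class" need a suffix.
    if keyword.iskeyword(candidate):
        candidate = f"{candidate}_"
    return candidate


renamed = {name: to_identifier(name) for name in ("street-name", "street-number")}
print(renamed)  # {'street-name': 'street_name', 'street-number': 'street_number'}
```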

Field order

Sometimes we have to work with schemas that were created by a third party and that we do not control. Those schemas can declare optional fields before required fields, which means that an invalid model would be generated. To avoid this problem, the field_order property is used in the generation process. For example, the following schema declares the field has_pets (optional) before the required fields:

from dataclasses_avroschema import ModelGenerator


schema = {
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "has_pets", "type": "boolean", "default": False},
    {"name": "name", "type": "string"},
    {"name": "age", "type": "long"},
    {"name": "money", "type": "double", "default": 100.3}
  ],
  "doc": "My User Class",
}

model_generator = ModelGenerator()
result = model_generator.render(schema=schema)

# save the result in a file
with open("models.py", mode="+w") as f:
    f.write(result)

Then the result will be:

# models.py
from dataclasses_avroschema import AvroModel
import dataclasses


@dataclasses.dataclass
class User(AvroModel):
    """
    My User Class
    """
    name: str
    age: int
    has_pets: bool = False
    money: float = 100.3

    class Meta:
        field_order = ['has_pets', 'name', 'age', 'money']
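The reason the fields must be reordered is that a dataclass rejects a field without a default that follows a field with one. The sketch below, which uses only the standard library, shows both the error and one way to compute the class order (required fields first, fields with defaults after). It illustrates the idea and is not the generator's actual code.

```python
import dataclasses

schema_fields = [
    {"name": "has_pets", "type": "boolean", "default": False},
    {"name": "name", "type": "string"},
    {"name": "age", "type": "long"},
    {"name": "money", "type": "double", "default": 100.3},
]

# Stable partition: required fields first, then fields with defaults.
required = [f["name"] for f in schema_fields if "default" not in f]
with_default = [f["name"] for f in schema_fields if "default" in f]
class_order = required + with_default
print(class_order)  # ['name', 'age', 'has_pets', 'money']


def define_in_schema_order():
    # Declaring the fields in schema order fails at class creation time.
    @dataclasses.dataclass
    class User:
        has_pets: bool = False
        name: str  # required field after a field with a default

    return User


try:
    define_in_schema_order()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The original order is kept in Meta.field_order, as in the generated model above.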

Rendering Enums

Because avro enums are represented by a python class, it is also possible to render them in isolation, for example:

from dataclasses_avroschema import ModelGenerator


enum_schema = {
    "type": "enum",
    "name": "Color",
    "symbols": [
        "red",
        "blue",
    ],
    "default": "blue",
}

model_generator = ModelGenerator()
result = model_generator.render(schema=enum_schema)

print(result)

Resulting in

import enum


class Color(enum.Enum):
    RED = "red"
    BLUE = "blue"

    class Meta:
        default = "blue"
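
The second variant mixes in str and wraps Meta with enum.nonmember, which requires Python 3.11 or later:

import enum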


class Color(str, enum.Enum):
    RED = "red"
    BLUE = "blue"

    @enum.nonmember
    class Meta:
        default = "blue"

Enums and case sensitivity

Sometimes a schema contains symbols that differ only by case, for example "symbols": ["P", "p"]. This is NOT recommended at all, because it makes the intention very hard to understand. Avoid it!

When the model generator encounters this situation, it cannot generate an enum with uppercase keys, so it uses the symbols as keys without any transformation:

from dataclasses_avroschema import ModelGenerator, ModelType

schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {
            "name": "unit_multi_player",
            "type": {
                "type": "enum",
                "name": "unit_multi_player",
                "symbols": ["Q", "q"],
            },
        }
    ],
}

model_generator = ModelGenerator()
result = model_generator.render(schema=schema, model_type=ModelType.DATACLASS.value)

# save the result in a file
with open("models.py", mode="+w") as f:
    f.write(result)

Then the result will be:

# models.py
from dataclasses_avroschema import AvroModel
import dataclasses
import enum


class UnitMultiPlayer(enum.Enum):
    Q = "Q"
    q = "q"


@dataclasses.dataclass
class User(AvroModel):
    unit_multi_player: UnitMultiPlayer

With the str mixin variant:

# models.py
from dataclasses_avroschema import AvroModel
import dataclasses
import enum


class UnitMultiPlayer(str, enum.Enum):
    Q = "Q"
    q = "q"


@dataclasses.dataclass
class User(AvroModel):
    unit_multi_player: UnitMultiPlayer

As the example shows, the second enum member UnitMultiPlayer.q is not converted to uppercase, because otherwise it would collide with the first member UnitMultiPlayer.Q.
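The rule can be sketched as follows: use uppercase member names unless doing so would produce duplicates. The helper member_names is hypothetical and only mirrors the behaviour described in this section; it is not taken from the library.

```python
import enum


def member_names(symbols):
    """Hypothetical helper: map enum member names to their Avro symbols."""
    upper_names = [symbol.upper() for symbol in symbols]
    if len(set(upper_names)) == len(symbols):
        # No collisions: use uppercase member names.
        return dict(zip(upper_names, symbols))
    # Uppercasing would collide: keep the symbols untouched.
    return {symbol: symbol for symbol in symbols}


Color = enum.Enum("Color", member_names(["red", "blue"]))
UnitMultiPlayer = enum.Enum("UnitMultiPlayer", member_names(["Q", "q"]))

print([member.name for member in Color])  # ['RED', 'BLUE']
print([member.name for member in UnitMultiPlayer])  # ['Q', 'q']
```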

Original schema string

Ideally, the schema produced by the generated model perfectly matches the original schema. Unfortunately, that is not always the case when Avro types that have inner names (arrays, enums, fixed and maps) are used.

GeneratedModel.avro_schema() returns a schema generated from the model, so it can differ from the original. To counteract a potential mismatch, the ModelGenerator can include the original schema string in the generated model when include_original_schema=True is specified:

from dataclasses_avroschema import ModelGenerator, ModelType

schema = {
    "type": "record",
    "namespace": "com.kubertenes",
    "name": "AvroDeployment",
    "fields": [
        {"name": "image", "type": "string"},
        {"name": "replicas", "type": "int"},
        {"name": "port", "type": "int"},
    ],
}

model_generator = ModelGenerator()
result = model_generator.render(schema=schema, model_type=ModelType.DATACLASS.value, include_original_schema=True)

# save the result in a file
with open("models.py", mode="+w") as f:
    f.write(result)

Then the result will be:

# models.py
import dataclasses

from dataclasses_avroschema import AvroModel
from dataclasses_avroschema import types


@dataclasses.dataclass
class AvroDeployment(AvroModel):
    image: str
    replicas: types.Int32
    port: types.Int32

    class Meta:
        namespace = "com.kubertenes"
        original_schema = '{"type": "record", "namespace": "com.kubertenes", "name": "AvroDeployment", "fields": [{"name": "image", "type": "string"}, {"name": "replicas", "type": "int"}, {"name": "port", "type": "int"}]}'

As the example shows, the Meta class of AvroDeployment now contains an original_schema field, AvroDeployment.Meta.original_schema, which can be referred to instead.
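Because original_schema is stored as a JSON string, it can be parsed back into a dict with the standard library. The sketch below uses a plain class as a stand-in for the generated model so that it runs without the library installed; with the real model, only the class definition would differ.

```python
import json


class AvroDeployment:  # stand-in for the generated AvroDeployment model
    class Meta:
        namespace = "com.kubertenes"
        original_schema = '{"type": "record", "namespace": "com.kubertenes", "name": "AvroDeployment", "fields": [{"name": "image", "type": "string"}, {"name": "replicas", "type": "int"}, {"name": "port", "type": "int"}]}'


# Parse the stored string to recover the schema exactly as it was provided.
original = json.loads(AvroDeployment.Meta.original_schema)
field_types = {field["name"]: field["type"] for field in original["fields"]}
print(field_types)  # {'image': 'string', 'replicas': 'int', 'port': 'int'}
```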