Commands

This section describes all the commands supported by this library together with dataclasses-avroschema. To show the commands we will work with the following schema:

{
  "type": "record",
  "name": "UserAdvance",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "age",
      "type": "long"
    },
    {
      "name": "pets",
      "type": {
        "type": "array",
        "items": "string",
        "name": "pet"
      }
    },
    {
      "name": "accounts",
      "type": {
        "type": "map",
        "values": "long",
        "name": "account"
      }
    },
    {
      "name": "favorite_colors",
      "type": {
        "type": "enum",
        "name": "FavoriteColor",
        "symbols": [
          "BLUE",
          "YELLOW",
          "GREEN"
        ]
      }
    },
    {
      "name": "has_car",
      "type": "boolean",
      "default": false
    },
    {
      "name": "country",
      "type": "string",
      "default": "Argentina"
    },
    {
      "name": "address",
      "type": [
        "null",
        "string"
      ],
      "default": null
    },
    {
      "name": "md5",
      "type": {
        "type": "fixed",
        "name": "md5",
        "size": 16
      }
    }
  ]
}

Note

All the commands can be executed using a path or a URL.

Validate schema

The previous schema is a valid one. If we assume that we have a schema.avsc in the file system which contains the previous schema we can validate it:

dc-avro validate-schema --path schema.avsc

resulting in

Valid schema!! 👍

{
    'type': 'record',
    'name': 'UserAdvance',
    'fields': [
        {'name': 'name', 'type': 'string'},
        {'name': 'age', 'type': 'long'},
        {'name': 'pets', 'type': {'type': 'array', 'items': 'string', 'name': 'pet'}},
        {'name': 'accounts', 'type': {'type': 'map', 'values': 'long', 'name': 'account'}},
        {'name': 'favorite_colors', 'type': {'type': 'enum', 'name': 'FavoriteColor', 'symbols': ['BLUE', 'YELLOW', 'GREEN']}},
        {'name': 'has_car', 'type': 'boolean', 'default': False},
        {'name': 'country', 'type': 'string', 'default': 'Argentina'},
        {'name': 'address', 'type': ['null', 'string'], 'default': None},
        {'name': 'md5', 'type': {'type': 'fixed', 'name': 'md5', 'size': 16}}
    ]
}

If the previous schema is stored in a schema registry, for example at https://schema-registry/schema/1, we can validate it using the --url option:

dc-avro validate-schema --url https://schema-registry/schema/1

resulting in

Valid schema!! 👍

{
    'type': 'record',
    'name': 'UserAdvance',
    'fields': [
        {'name': 'name', 'type': 'string'},
        {'name': 'age', 'type': 'long'},
        {'name': 'pets', 'type': {'type': 'array', 'items': 'string', 'name': 'pet'}},
        {'name': 'accounts', 'type': {'type': 'map', 'values': 'long', 'name': 'account'}},
        {'name': 'favorite_colors', 'type': {'type': 'enum', 'name': 'FavoriteColor', 'symbols': ['BLUE', 'YELLOW', 'GREEN']}},
        {'name': 'has_car', 'type': 'boolean', 'default': False},
        {'name': 'country', 'type': 'string', 'default': 'Argentina'},
        {'name': 'address', 'type': ['null', 'string'], 'default': None},
        {'name': 'md5', 'type': {'type': 'fixed', 'name': 'md5', 'size': 16}}
    ]
}
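When the command is driven from a script, the choice between --path and --url can be made from the resource itself. The sketch below is only an illustration: build_command is a hypothetical helper, not part of dc-avro, and it applies only to commands that take the --path and --url options shown in this section.

```python
from urllib.parse import urlparse


def build_command(command, resource):
    """Return the argv list for a dc-avro command (hypothetical helper).

    An http(s) resource is passed with --url, anything else with --path.
    """
    scheme = urlparse(resource).scheme
    flag = "--url" if scheme in ("http", "https") else "--path"
    return ["dc-avro", command, flag, resource]


local_cmd = build_command("validate-schema", "schema.avsc")
remote_cmd = build_command("validate-schema", "https://schema-registry/schema/1")

# Assuming dc-avro is installed, the list can be handed to subprocess:
# subprocess.run(local_cmd, check=True)
```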

If a schema is invalid, for example the following one:

{
  "type": "record",
  "name": "UserAdvance",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "age",
      "type": "long"
    },
    {
      "name": "pets",
      "type": {
        "type": "array",
        "items": "string",
        "name": "pet"
      }
    },
    {
      "name": "accounts",
      "type": {
        "type": "map",
        "values": "long",
        "name": "account"
      }
    },
    {
      "name": "favorite_colors",
      "type": {
        "type": "enum",
        "name": "FavoriteColor",
        "symbols": [
          "BLUE",
          "YELLOW",
          "GREEN"
        ]
      }
    },
    {
      "name": "has_car",
      "type": "boolean",
      "default": 1 #!!!ERROR!!!
    },
    {
      "name": "country",
      "type": "string",
      "default": "Argentina"
    },
    {
      "name": "address",
      "type": [
        "null",
        "string"
      ],
      "default": 10
    },
    {
      "name": "md5",
      "type": {
        "type": "fixed",
        "name": "md5",
        "size": 16
      }
    }
  ]
}

The result will be:

InvalidSchema: Schema {'type': 'record', 'name': 'UserAdvance', 'fields': [{'name': 'name', 'type': 'string'}, {'name': 'age',
'type': 'long'}, {'name': 'pets', 'type': {'type': 'array', 'items': 'string', 'name': 'pet'}}, {'name': 'accounts', 'type':
{'type': 'map', 'values': 'long', 'name': 'account'}}, {'name': 'favorite_colors', 'type': {'type': 'enum', 'name':
'FavoriteColor', 'symbols': ['BLUE', 'YELLOW', 'GREEN']}}, {'name': 'has_car', 'type': 'boolean', 'default': 1}, {'name':
'country', 'type': 'string', 'default': 'Argentina'}, {'name': 'address', 'type': ['null', 'string'], 'default': 10}, {'name':
'md5', 'type': {'type': 'fixed', 'name': 'md5', 'size': 16}}]} is not valid.
 Error: `Default value <1> must match schema type: boolean`
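The reported error comes from the has_car default: 1 is an integer, not a boolean. The address default 10 is equally invalid, because it matches neither "null" nor "string". A minimal sketch of this kind of check is shown below, using only the standard library and covering only primitive types. It is an illustration under those assumptions, not the validation that dc-avro actually runs, and default_errors is a hypothetical name.

```python
# Simplified checks for primitive Avro types. bool is excluded from the
# numeric types because in Python bool is a subclass of int.
PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def default_errors(schema):
    """Return one message per field whose default does not fit its type."""
    errors = []
    for field in schema.get("fields", []):
        if "default" not in field:
            continue
        field_type = field["type"]
        branches = field_type if isinstance(field_type, list) else [field_type]
        # Only primitive branches are checked; complex types are skipped.
        checks = [
            PRIMITIVE_CHECKS[branch]
            for branch in branches
            if isinstance(branch, str) and branch in PRIMITIVE_CHECKS
        ]
        default = field["default"]
        if checks and not any(check(default) for check in checks):
            errors.append(
                f"{field['name']}: default {default!r} does not match {field_type!r}"
            )
    return errors


invalid_schema = {
    "type": "record",
    "name": "UserAdvance",
    "fields": [
        {"name": "has_car", "type": "boolean", "default": 1},
        {"name": "country", "type": "string", "default": "Argentina"},
        {"name": "address", "type": ["null", "string"], "default": 10},
    ],
}

errors = default_errors(invalid_schema)
```

Here a union default is accepted if it matches any branch, which is a simplification of the Avro rules.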

Lint

To check several Avro schemas at once, use the following command:

dc-avro lint tests/schemas/example.avsc tests/schemas/example_v2.avsc

and get the following output:

👍 Total valid schemas: 2
tests/schemas/example.avsc
tests/schemas/example_v2.avsc

For an invalid schema, the run looks like this:

dc-avro lint tests/schemas/invalid_example.avsc

with the corresponding output:

💥 File: tests/schemas/invalid_example.avsc
Schema {'type': 'record', 'name': 'UserAdvance', 'fields': [{'name': 'name', 'type': 'string'}, {'name': 'age', 'type': 'long'}, {'name': 'pets', 'type': {'type': 
'array', 'items': 'string', 'name': 'pet'}}, {'name': 'accounts', 'type': {'type': 'map', 'values': 'long', 'name': 'account'}}, {'name': 'favorite_colors', 'type': 
{'type': 'enum', 'name': 'FavoriteColor', 'symbols': ['BLUE', 'YELLOW', 'GREEN']}}, {'name': 'has_car', 'type': 'boolean', 'default': 1}, {'name': 'country', 'type': 
'string', 'default': 'Argentina'}, {'name': 'address', 'type': ['null', 'string'], 'default': 10}, {'name': 'md5', 'type': {'type': 'fixed', 'name': 'md5', 'size': 
16}}]} is not valid.
 Error: `Default value <1> must match schema type: boolean`
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ ~/dc-avro/dc_avro/main.py:176 in lint                                 │
│                                                                                                  │
│   173          console.print(":boom: File: " + error_path)                                    │
│   174          console.print(f"[red]{error}[/red]")                                           │
│   175       app.pretty_exceptions_show_locals = False                                          │
│  176       raise InvalidSchema(error_msg)                                                     │
│   177                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
InvalidSchema: Total errors detected: 1

Pre-commit

Add the following lines to your .pre-commit-config.yaml file to enable Avro schema linting:

  - repo: https://github.com/marcosschroh/dc-avro.git
    rev: 0.7.0
    hooks:
      - id: lint-avsc
        additional_dependencies: [typing_extensions]

Generate models from schemas

Python models can be generated with the generate-model command, which also works with a path or a URL. The optional --model-type option selects the kind of model to generate: dataclass (used when the option is omitted, as in the first example), pydantic or avrodantic.

dc-avro generate-model --path tests/schemas/example.avsc

from dataclasses_avroschema import AvroModel
from dataclasses_avroschema import types
import dataclasses
import enum
import typing


class FavoriteColor(enum.Enum):
    BLUE = "BLUE"
    YELLOW = "YELLOW"
    GREEN = "GREEN"


@dataclasses.dataclass
class UserAdvance(AvroModel):
    name: str
    age: int
    pets: typing.List
    accounts: typing.Dict
    favorite_colors: FavoriteColor
    md5: types.Fixed = types.Fixed(16)
    has_car: bool = False
    country: str = "Argentina"
    address: typing.Optional = None

    class Meta:
        field_order = ['name', 'age', 'pets', 'accounts', 'favorite_colors', 'has_car', 'country', 'address', 'md5']

dc-avro generate-model --path tests/schemas/example.avsc --model-type pydantic

from dataclasses_avroschema import types
from pydantic import BaseModel
import enum
import typing


class FavoriteColor(enum.Enum):
    BLUE = "BLUE"
    YELLOW = "YELLOW"
    GREEN = "GREEN"


class UserAdvance(BaseModel):
    name: str
    age: int
    pets: typing.List
    accounts: typing.Dict
    favorite_colors: FavoriteColor
    md5: types.Fixed = types.Fixed(16)
    has_car: bool = False
    country: str = "Argentina"
    address: typing.Optional = None

    class Meta:
        field_order = ['name', 'age', 'pets', 'accounts', 'favorite_colors', 'has_car', 'country', 'address', 'md5']

dc-avro generate-model --path tests/schemas/example.avsc --model-type avrodantic

from dataclasses_avroschema import types
from dataclasses_avroschema.avrodantic import AvroBaseModel
import enum
import typing


class FavoriteColor(enum.Enum):
    BLUE = "BLUE"
    YELLOW = "YELLOW"
    GREEN = "GREEN"


class UserAdvance(AvroBaseModel):
    name: str
    age: int
    pets: typing.List
    accounts: typing.Dict
    favorite_colors: FavoriteColor
    md5: types.Fixed = types.Fixed(16)
    has_car: bool = False
    country: str = "Argentina"
    address: typing.Optional = None

    class Meta:
        field_order = ['name', 'age', 'pets', 'accounts', 'favorite_colors', 'has_car', 'country', 'address', 'md5']

Note

If you want to save the result to a local file you can execute dc-avro generate-model --path schema.avsc > my-models.py

Serialize data with schema

We can serialize the data with schemas either in avro or avro-json, for example:

Event:

{'name': 'bond', 'age': 50, 'pets': ['dog', 'cat'], 'accounts': {'key': 1}, 'has_car': False, 'favorite_colors': 'BLUE', 'country': 'Argentina', 'address': None, 'md5': b'u00ffffffffffffx'}

dc-avro serialize "{'name': 'bond', 'age': 50, 'pets': ['dog', 'cat'], 'accounts': {'key': 1}, 'has_car': False, 'favorite_colors': 'BLUE', 'country': 'Argentina', 'address': None, 'md5': b'u00ffffffffffffx'}" --path ./tests/schemas/example.avsc

b'\x08bondd\x04\x06dog\x06cat\x00\x02\x06key\x02\x00\x00\x00\x12Argentina\x00u00ffffffffffffx'

dc-avro serialize "{'name': 'bond', 'age': 50, 'pets': ['dog', 'cat'], 'accounts': {'key': 1}, 'has_car': False, 'favorite_colors': 'BLUE', 'country': 'Argentina', 'address': None, 'md5': b'u00ffffffffffffx'}" --path ./tests/schemas/example.avsc --serialization-type avro-json

b'{"name": "bond", "age": 50, "pets": ["dog", "cat"], "accounts": {"key": 1}, "favorite_colors": "BLUE", "has_car": false, "country":
"Argentina", "address": null, "md5": "u00ffffffffffffx"}'
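The first bytes of the binary output can be read by hand. In the Avro binary encoding, a long is written as a zigzag variable-length integer and a string as its byte length (a long) followed by the UTF-8 bytes. The sketch below reproduces the prefix \x08bondd for the name and age fields. It is a hand-written illustration of the encoding, not the code dc-avro uses, and encode_long and encode_string are hypothetical helper names.

```python
def encode_long(value):
    """Encode an integer as an Avro zigzag varint (64-bit range assumed)."""
    zigzag = (value << 1) ^ (value >> 63)
    out = bytearray()
    # Emit 7 bits at a time, setting the high bit while more bytes follow.
    while zigzag & ~0x7F:
        out.append((zigzag & 0x7F) | 0x80)
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def encode_string(text):
    """Encode a string as its byte length followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_long(len(data)) + data


# name = 'bond' (length 4 -> zigzag 8), age = 50 (zigzag 100, the byte 'd')
prefix = encode_string("bond") + encode_long(50)
```

The same rule explains \x12Argentina later in the output: the string is 9 bytes long, and 9 is written as zigzag 18, which is 0x12.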

Note

The data provided to the command must be wrapped in quotes, as it is interpreted as a string and then converted to a Python dict.
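The event uses Python literal syntax (single quotes, False, None and a b'...' bytes literal), so it is not valid JSON. One way to turn such a string into a dict is ast.literal_eval from the standard library, as sketched below. Whether dc-avro performs the conversion this way is an assumption.

```python
import ast
import json

# The same string that is passed to the serialize command above.
raw = (
    "{'name': 'bond', 'age': 50, 'pets': ['dog', 'cat'], "
    "'accounts': {'key': 1}, 'has_car': False, 'favorite_colors': 'BLUE', "
    "'country': 'Argentina', 'address': None, 'md5': b'u00ffffffffffffx'}"
)

# literal_eval only evaluates Python literals, never arbitrary expressions.
event = ast.literal_eval(raw)

# A JSON parser rejects the same input.
try:
    json.loads(raw)
    is_json = True
except json.JSONDecodeError:
    is_json = False
```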

Deserialize data with schema

We can deserialize the data with schemas either in avro or avro-json, for example:

dc-avro deserialize 'b"\x08bondd\x04\x06dog\x06cat\x00\x02\x06key\x02\x00\x00\x00\x12Argentina\x00u00ffffffffffffx"' --path ./tests/schemas/example.avsc

{
    'name': 'bond',
    'age': 50,
    'pets': ['dog', 'cat'],
    'accounts': {'key': 1},
    'favorite_colors': 'BLUE',
    'has_car': False,
    'country': 'Argentina',
    'address': None,
    'md5': b'u00ffffffffffffx'
}

dc-avro deserialize '{"name": "bond", "age": 50, "pets": ["dog", "cat"], "accounts": {"key": 1}, "favorite_colors": "BLUE", "has_car": false, "country":  "Argentina", "address": null, "md5": "u00ffffffffffffx"}' --path ./tests/schemas/example.avsc --serialization-type avro-json

{
    'name': 'bond',
    'age': 50,
    'pets': ['dog', 'cat'],
    'accounts': {'key': 1},
    'favorite_colors': 'BLUE',
    'has_car': False,
    'country': 'Argentina',
    'address': None,
    'md5': b'u00ffffffffffffx'
}

Note

For Avro deserialization, you have to prefix the string with the character b to indicate that the actual value is bytes.
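The b prefix matters because the argument reaches the program as text containing escape sequences such as \x08, and only a bytes literal turns them back into raw bytes. The sketch below parses such a string with ast.literal_eval and then decodes the first two fields by hand. It illustrates the idea under the assumption that the value is evaluated as a Python literal, and read_long is a hypothetical helper, not dc-avro code.

```python
import ast

# Raw string: the backslashes are literal characters, as the shell passes them.
argument = (
    r'b"\x08bondd\x04\x06dog\x06cat\x00\x02\x06key\x02\x00\x00\x00'
    r'\x12Argentina\x00u00ffffffffffffx"'
)

payload = ast.literal_eval(argument)  # now a real bytes object


def read_long(buf, pos):
    """Read one Avro zigzag varint from buf at pos; return (value, new_pos)."""
    shift = 0
    result = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    # Undo the zigzag mapping to recover the signed value.
    return (result >> 1) ^ -(result & 1), pos


length, pos = read_long(payload, 0)          # length of the name string
name = payload[pos:pos + length].decode("utf-8")
age, pos = read_long(payload, pos + length)  # the age field follows
```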

View diff between schemas

Sometimes it is useful to see the difference between avsc files, especially for Avro schema evolution. You need to specify the source and the target schema. Each of them can be given as a path or a URL.

Example:

The v1 schema version is in the schema registry:

{
  "type": "record",
  "name": "UserAdvance",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "age",
      "type": "long"
    },
    {
      "name": "pets",
      "type": {
        "type": "array",
        "items": "string",
        "name": "pet"
      }
    },
    {
      "name": "accounts",
      "type": {
        "type": "map",
        "values": "long",
        "name": "account"
      }
    },
    {
      "name": "favorite_colors",
      "type": {
        "type": "enum",
        "name": "FavoriteColor",
        "symbols": [
          "BLUE",
          "YELLOW",
          "GREEN"
        ]
      }
    },
    {
      "name": "has_car",
      "type": "boolean",
      "default": false
    },
    {
      "name": "country",
      "type": "string",
      "default": "Argentina"
    },
    {
      "name": "address",
      "type": [
        "null",
        "string"
      ],
      "default": null
    },
    {
      "name": "md5",
      "type": {
        "type": "fixed",
        "name": "md5",
        "size": 16
      }
    }
  ]
}

Then a PR has been opened with the UserAdvance v2:

{
  "type": "record",
  "name": "UserAdvance",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "age",
      "type": "long"
    },
    {
      "name": "pets",
      "type": {
        "type": "array",
        "items": "string",
        "name": "pet"
      }
    },
    {
      "name": "accounts",
      "type": {
        "type": "map",
        "values": "long",
        "name": "account"
      }
    },
    {
      "name": "favorite_colors",
      "type": {
        "type": "enum",
        "name": "FavoriteColor",
        "symbols": [
          "BLUE",
          "YELLOW",
          "GREEN"
        ]
      }
    },
    {
      "name": "has_car",
      "type": "boolean",
      "default": false
    },
    {
      "name": "country",
      "type": "string",
      "default": "Netherlands"
    },
    {
      "name": "address",
      "type": [
        "null",
        "string"
      ],
      "default": null
    }
  ]
}
  • We can see that the default value for country has been updated from Argentina to Netherlands
  • The field md5 has been removed

If we run the schema-diff command we have the following result:

dc-avro schema-diff --source-path ./tests/schemas/example.avsc --target-path  ./tests/schemas/example_v2.avsc

{
    'values_changed': {"root['fields'][6]['default']": {'new_value': 'Netherlands', 'old_value': 'Argentina'}},
    'iterable_item_removed': {"root['fields'][8]": {'name': 'md5', 'type': {'type': 'fixed', 'name': 'md5', 'size': 16}}}
}
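The diff output addresses fields by position: root['fields'][6] is country and root['fields'][8] is md5. For a quick check in a script, a field-by-name comparison can be easier to read. The sketch below uses only the standard library and a trimmed version of the two schemas. It is a simplified illustration with a hypothetical diff_fields helper, and it does not reproduce the schema-diff output format.

```python
def diff_fields(source, target):
    """Compare two record schemas field by field, keyed by field name."""
    src = {field["name"]: field for field in source["fields"]}
    dst = {field["name"]: field for field in target["fields"]}
    return {
        "removed": sorted(src.keys() - dst.keys()),
        "added": sorted(dst.keys() - src.keys()),
        "changed": {
            name: {"old_value": src[name], "new_value": dst[name]}
            for name in sorted(src.keys() & dst.keys())
            if src[name] != dst[name]
        },
    }


# Trimmed versions of the v1 and v2 schemas shown above.
v1 = {
    "type": "record",
    "name": "UserAdvance",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "country", "type": "string", "default": "Argentina"},
        {"name": "md5", "type": {"type": "fixed", "name": "md5", "size": 16}},
    ],
}
v2 = {
    "type": "record",
    "name": "UserAdvance",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "country", "type": "string", "default": "Netherlands"},
    ],
}

changes = diff_fields(v1, v2)
```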

Generate fake data from schema

Generate one sample from a given schema:

dc-avro generate-data ./tests/schemas/example.avsc

To generate several samples, add the --count option:

dc-avro generate-data ./tests/schemas/example.avsc --count 3

Keep in mind that you can provide a file path or a URL.
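To give an idea of what generating fake data from a schema involves, the sketch below walks a schema recursively and picks a random value for each type. It uses only the standard library, handles only the types that appear in the example schema plus a few primitives, and is not the generator that dc-avro uses. fake_value is a hypothetical name.

```python
import random
import string


def fake_value(avro_type, rng):
    """Return a random value for a subset of Avro type definitions."""
    if isinstance(avro_type, list):  # union: pick one branch
        return fake_value(rng.choice(avro_type), rng)
    if isinstance(avro_type, dict):
        kind = avro_type["type"]
        if kind == "record":
            return {f["name"]: fake_value(f["type"], rng) for f in avro_type["fields"]}
        if kind == "array":
            return [fake_value(avro_type["items"], rng) for _ in range(rng.randint(1, 3))]
        if kind == "map":
            return {
                fake_value("string", rng): fake_value(avro_type["values"], rng)
                for _ in range(rng.randint(1, 3))
            }
        if kind == "enum":
            return rng.choice(avro_type["symbols"])
        if kind == "fixed":
            return bytes(rng.randrange(256) for _ in range(avro_type["size"]))
        return fake_value(kind, rng)  # e.g. {"type": "string"}
    if avro_type == "null":
        return None
    if avro_type == "boolean":
        return rng.choice([True, False])
    if avro_type in ("int", "long"):
        return rng.randint(0, 1000)
    if avro_type in ("float", "double"):
        return rng.random()
    if avro_type == "string":
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    raise ValueError(f"unsupported type: {avro_type!r}")


# Trimmed version of the example schema.
schema = {
    "type": "record",
    "name": "UserAdvance",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "pets", "type": {"type": "array", "items": "string", "name": "pet"}},
        {"name": "favorite_colors", "type": {"type": "enum", "name": "FavoriteColor",
                                             "symbols": ["BLUE", "YELLOW", "GREEN"]}},
        {"name": "address", "type": ["null", "string"], "default": None},
        {"name": "md5", "type": {"type": "fixed", "name": "md5", "size": 16}},
    ],
}

rng = random.Random(0)  # seeded so repeated runs give the same samples
samples = [fake_value(schema, rng) for _ in range(3)]  # mirrors --count 3
```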

Help:

$ dc-avro generate-data --help

Usage: dc-avro generate-data [OPTIONS] [RESOURCE]

 Generate fake data for a given avsc schema

╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│   resource      [RESOURCE]  Path or URL to the avro schema [default: None]                                                              │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --count        INTEGER  Number of data to generate, more than one prints a list [default: 1]                                            │
│ --help                  Show this message and exit.                                                                                     │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯