avro Adjustment to the function that determines whether to import AvroTypeGen in the code to be generated

The sort.Search() function is not producing the desired result for the logic inside the shouldImportAvroTypeGen() function.

Assuming that the definitions slice contains two elements, the first being an element that requires the inclusion of the import and the second as an enumerator (which does not require the import). When iterating through the elements of the namespace.Definitions map, when the code is iterating over the element that needs the import, the sort.Search() function will internally try to iterate through the slice from back to front, since the last element of the slice is not the element that needs the import and since the comparison is being made with ==, the sort.Search() function will return a new index (which should be used for a future ordered insertion in the slice).

In short, I think that the sort package is not the best option for what is intended to be done in the shouldImportAvroTypeGen() function.

This tweak seems to be much simpler to understand and makes more sense for the purpose of the shouldImportAvroTypeGen() function.

Fix to the problem reported in this issue: https://github.com/heetch/avro/issues/130

Aug 31 '24 00:08 thiagodpf

Have you found a case where this actually goes wrong? From a very brief look at the code (away from my laptop!) it looks like parseFile sorts the definitions slice before returning it so it looks ok to use sort.Search to me.

Aug 31 '24 17:08 rogpeppe

Hi @rogpeppe ! Here is my .avsc

{
    "type": "record",
    "name": "BusinessModelChangeAudit",
    "namespace": "br.com.ifd.content",
    "fields": [
        {
            "name": "context",
            "type": [
                "null",
                "string"
            ],
            "default": null,
            "doc": "information about the execution of the transaction"
        },
        {
            "name": "created_at",
            "doc": "The time that this event has created (epoch)",
            "type": "long",
            "logicalType": "timestamp-millis"
        },
        {
            "name": "current_providers",
            "doc": "current providers related to the event",
            "type": {
                "type": "array",
                "items": "string"
            }
        },
        {
            "name": "previous_providers",
            "doc": "previous providers related to the event",
            "type": {
                "type": "array",
                "items": "string"
            }
        },
        {
            "name": "product_external_id",
            "type": "string",
            "doc": "identifier product salesforce"
        },
        {
            "name": "response_transmitted",
            "type": "boolean",
            "default": false,
            "doc": "indicates whether the return was reported to salesforce"
        },
        {
            "name": "status",
            "type": {
                "type": "enum",
                "name": "status",
                "symbols": [
                    "PENDING",
                    "SUCCESS",
                    "FAILURE",
                    "ABORTED",
                    "REJECTED",
                    "INTERMEDIATE"
                ]
            },
            "doc": "transaction status"
        },
        {
            "name": "step",
            "type": "string",
            "doc": "This field indicates at which step the intermediate status occurs during a process or workflow"
        },
        {
            "name": "transaction_id",
            "doc": "transaction unique identifier",
            "type": "string",
            "logicalType": "UUID"
        },
        {
            "name": "updated_at",
            "doc": "The time that this event has updated (epoch)",
            "type": "long",
            "logicalType": "timestamp-millis"
        }
    ]
}

and here is the generated code: (missing import "github.com/heetch/avro/avrotypegen")

// Code generated by avrogen. DO NOT EDIT.

package bmca

import (
	"fmt"
	"strconv"
)

type BusinessModelChangeAudit struct {
	// information about the execution of the transaction
	Context *string `json:"context"`

	// The time that this event has created (epoch)
	Created_at int64 `json:"created_at"`

	// current providers related to the event
	Current_providers []string `json:"current_providers"`

	// previous providers related to the event
	Previous_providers []string `json:"previous_providers"`

	// identifier product salesforce
	Product_external_id string `json:"product_external_id"`

	// indicates whether the return was reported to salesforce
	Response_transmitted bool `json:"response_transmitted"`

	// transaction status
	Status Status `json:"status"`

	// This field indicates at which step the intermediate status occurs during a process or workflow
	Step string `json:"step"`

	// transaction unique identifier
	Transaction_id string `json:"transaction_id"`

	// The time that this event has updated (epoch)
	Updated_at int64 `json:"updated_at"`
}

// AvroRecord implements the avro.AvroRecord interface.
func (BusinessModelChangeAudit) AvroRecord() avrotypegen.RecordInfo {
	return avrotypegen.RecordInfo{
		Schema: `{"fields":[{"default":null,"doc":"information about the execution of the transaction","name":"context","type":["null","string"]},{"doc":"The time that this event has created (epoch)","logicalType":"timestamp-millis","name":"created_at","type":"long"},{"doc":"current providers related to the event","name":"current_providers","type":{"items":"string","type":"array"}},{"doc":"previous providers related to the event","name":"previous_providers","type":{"items":"string","type":"array"}},{"doc":"identifier product sales force","name":"product_external_id","type":"string"},{"default":false,"doc":"indicates whether the return was reported to salesforce","name":"response_transmitted","type":"boolean"},{"doc":"transaction status","name":"status","type":{"name":"status","symbols":["PENDING","SUCCESS","FAILURE","ABORTED","REJECTED","INTERMEDIATE"],"type":"enum"}},{"doc":"This field indicates at which step the intermediate status occurs during a process or workflow","name":"step","type":"string"},{"doc":"transaction unique identifier","logicalType":"UUID","name":"transaction_id","type":"string"},{"doc":"The time that this event has updated (epoch)","logicalType":"timestamp-millis","name":"updated_at","type":"long"}],"name":"br.com.ifd.content.BusinessModelChangeAudit","type":"record"}`,
		Required: []bool{
			1: true,
			2: true,
			3: true,
			4: true,
			6: true,
			7: true,
			8: true,
			9: true,
		},
	}
}

type Status int

const (
	StatusPending Status = iota
	StatusSuccess
	StatusFailure
	StatusAborted
	StatusRejected
	StatusIntermediate
)

var _Status_strings = []string{
	"PENDING",
	"SUCCESS",
	"FAILURE",
	"ABORTED",
	"REJECTED",
	"INTERMEDIATE",
}

// String returns the textual representation of Status.
func (e Status) String() string {
	if e < 0 || int(e) >= len(_Status_strings) {
		return "Status(" + strconv.FormatInt(int64(e), 10) + ")"
	}
	return _Status_strings[e]
}

// MarshalText implements encoding.TextMarshaler
// by returning the textual representation of Status.
func (e Status) MarshalText() ([]byte, error) {
	if e < 0 || int(e) >= len(_Status_strings) {
		return nil, fmt.Errorf("Status value %d is out of bounds", e)
	}
	return []byte(_Status_strings[e]), nil
}

// UnmarshalText implements encoding.TextUnmarshaler
// by expecting the textual representation of Status.
func (e *Status) UnmarshalText(data []byte) error {
	// Note for future: this could be more efficient.
	for i, s := range _Status_strings {
		if string(data) == s {
			*e = Status(i)
			return nil
		}
	}
	return fmt.Errorf("unknown value %q for Status", data)
}

Sep 02 '24 11:09 thiagodpf

I believe the problem in the current algorithm is not related to the ordering of the definitions slice, but rather with the == comparator is being used within the sort.Search() function, when in fact (I believe) the comparator >= should be used if you want to keep using the sort.Search() function (although there would be side effects to think about :sweat_smile:).

I don't know if it will help, but below is a screenshot where I managed to reproduce the problem, notice the order of the elements within the namespace.Definitions map and the if condition (since the sort.Search() function did not find the index and returned the size of the slice):

Sep 02 '24 12:09 thiagodpf

I was reviewing the description I left in the pull request and I saw that I gave too much emphasis to the iteration through namespace.Definitions map, when in fact the problem is in the condition within the sort.Search() function (my bad :sweat_smile:) so I thought it best to edit it.

I think (just my opinion) that the sort.Search() function is not the most suitable for this case since it seems to be better used in "comparable" values (less than, greater than, etc...) but not with raw strings.

I even thought that the sort.SearchStrings() could be useful to us, but the problem is that when it doesn't find the element in the slice, it will return the index where the element should be inserted, but it's not what we want at that point...

Still it wouldn't be a big problem given the current if conditions later on... :thinking: I can redo the PR to use sort.SearchStrings() if you prefer, what do you think?

Sep 02 '24 13:09 thiagodpf