iceberg-go icon indicating copy to clipboard operation
iceberg-go copied to clipboard

v0.3.1: `NewAddSchemaUpdate` no longer takes `lastColumnID` → deleting highest-ID column triggers `invalid_metadata` (last-assigned-field-id regression)

Open anvsk opened this issue 5 months ago • 16 comments

Apache Iceberg version

main (development)

Please describe the bug 🐞

Environment

  • iceberg-go: v0.3.1
  • Catalog: REST catalog (standard Iceberg REST API)
  • Go: 1.21+
  • OS: Linux/Windows (reproducible across)

Summary

In v0.3.0, table.NewAddSchemaUpdate(*Schema, lastColumnID, initial) allowed callers to ensure the table’s last-assigned-field-id stayed monotonic even when deleting the column that previously had the highest field ID.

In v0.3.1, the API changed to table.NewAddSchemaUpdate(*Schema) (no lastColumnID). When we only delete the highest-ID column(s) and add no new columns, the library appears to derive last-assigned-field-id from the new schema’s max field id, which decreases. The REST catalog then rejects the commit with:

invalid_metadata: The specified metadata is not valid

Deleting a first/middle column works; deleting the tail (max-ID) column fails.

Note: This repro excludes partition/sort references (i.e., we are not deleting a column referenced by the default spec or sort order).

Steps to Reproduce

  1. Start with a table whose current schema has fields, e.g.:

    • a (id=1), b (id=2), c (id=3). No partition/sort references to c.
  2. Build a new schema that removes c and keeps a/b with the same field IDs (we don’t touch IDs).

  3. Submit two updates in one commit:

    • AddSchema using table.NewAddSchemaUpdate(newSchema)
    • SetCurrentSchema using table.NewSetCurrentSchemaUpdate(newSchemaID) (We obtain newSchemaID by running b := table.MetadataBuilderFromBase(meta); id, _ := b.AddSchema(newSchema).)
  4. Include concurrency requirements (optional but recommended):

    • AssertTableUUID(meta.UUID())
    • AssertLastAssignedFieldID(oldLastID) where oldLastID is 3 in this example.
  5. CommitTable(...)fails with invalid_metadata.

Minimal code sketch (v0.3.1 style):

meta := tbl.Metadata()
oldLast := highestID(tbl.Schema()) // returns 3 in the example

// Build new schema that keeps a(id=1), b(id=2) only (delete c(id=3))
newSchema := buildSchemaKeepAB(tbl.Schema()) // preserves existing IDs

// Precompute new schema-id
b := table.MetadataBuilderFromBase(meta)
newSchemaID, err := b.AddSchema(newSchema)
if err != nil { panic(err) }

// Prepare updates (v0.3.1 API)
add := table.NewAddSchemaUpdate(newSchema)           // no lastColumnID parameter anymore
set := table.NewSetCurrentSchemaUpdate(newSchemaID)

reqs := []table.Requirement{
    table.AssertTableUUID(meta.UUID()),
    table.AssertLastAssignedFieldID(int(oldLast)),   // oldLast == 3
}
_, _, err = cat.CommitTable(ctx, tbl, reqs, []table.Update{add, set})
// => invalid_metadata when only deleting tail/highest-ID columns

Expected Behavior

  • Deleting columns (including the highest-ID column) should be allowed as long as:

    • We do not change existing field IDs of the kept columns.
    • last-assigned-field-id does not decrease (i.e., remains the previous value).
  • In v0.3.0, passing lastColumnID=oldLast ensured monotonicity and commits succeeded.

Actual Behavior

  • With v0.3.1, NewAddSchemaUpdate cannot accept lastColumnID.
  • When we only delete the max-ID column and add no new columns, the commit is rejected with invalid_metadata—apparently because the derived last-assigned-field-id regresses to the new schema’s max ID.

Analysis

  • Iceberg requires last-assigned-field-id to be monotonic (never decreases).
  • In the “delete-tail-columns only” scenario, the current last-assigned-field-id is the old max (e.g., 3). The new schema’s max becomes smaller (e.g., 2). If the client or server infers the counter from the new schema’s max, it violates monotonicity → invalid_metadata.

Workarounds

  • Add a sentinel (dummy) column in the same update with ID = oldLast + 1 (e.g., __compat_padding_...), nullable, never used. This keeps the new schema’s max ≥ old max. Or, more practically, add a real new column in the same change so max ID increases.
  • (Less ideal) Maintain a fork that restores the older API (NewAddSchemaUpdate(schema, lastColumnID, initial)) or custom-craft the REST payload to set last-column-id = oldLast.
  • Of course still ensure you’re not deleting a field referenced by partition spec or sort order (not the case in this repro).

Proposal

  • API / behavior options:

    1. Re-introduce a way to set lastColumnID (or an equivalent parameter) on AddSchema in the Go client; or
    2. Have the client compute last-assigned-field-id as max(oldLastID, max(newSchema.FieldIDs)) so it never regresses; or
    3. Provide a dedicated update or requirement to explicitly set/preserve last-assigned-field-id without requiring a dummy column.
  • Docs: Clarify in v0.3.1 migration notes that callers must ensure the counter doesn’t regress when deleting the highest-ID column, and suggest recommended patterns.

Additional Context

  • The same flow succeeds if we delete a middle/first column (the new schema’s max ID stays the same).
  • The same flow succeeds if we add at least one new column (the new schema’s max ID increases).

Happy to provide a tiny repro program if needed. Thanks!

anvsk avatar Aug 20 '25 08:08 anvsk

Hi @anvsk, I think https://github.com/apache/iceberg-go/pull/539 fixes this, could you verify?

twuebi avatar Aug 21 '25 11:08 twuebi

Hi @anvsk, I think #539 fixes this, could you verify?

i try and failed again

The new code seems to add a validation/guard only. When the input schema only deletes the tail column(s) (i.e., the column(s) with the highest field ID) and we don’t add any new columns, the client now fails that new check immediately and aborts. In other words, the last-assigned-field-id “regression” is detected, but there’s still no way for the client to preserve the previous lastAssignedFieldId for this commit, so the update remains impossible.

Either:

  1. Re-introduce a way to pass or pin lastAssignedFieldId for AddSchema, or

  2. Have the client compute it as max(oldLastId, max(newSchema.FieldIDs)) so it never regresses, or

  3. Provide a dedicated update to explicitly set/preserve lastAssignedFieldId during schema evolution.

anvsk avatar Aug 22 '25 01:08 anvsk

Let me try and reproduce the issue, the current AddSchema function does a max between old and newSchema:

func (b *MetadataBuilder) AddSchema(schema *iceberg.Schema) (*MetadataBuilder, error) {
	newSchemaID := b.reuseOrCreateNewSchemaID(schema)

	if _, err := b.GetSchemaByID(newSchemaID); err == nil {
		if b.lastAddedSchemaID == nil || *b.lastAddedSchemaID != newSchemaID {
			b.updates = append(b.updates, NewAddSchemaUpdate(schema))
			b.lastAddedSchemaID = &newSchemaID
		}

		return b, nil
	}

	b.lastColumnId = max(b.lastColumnId, schema.HighestFieldID())

	schema.ID = newSchemaID

	b.schemaList = append(b.schemaList, schema)
	b.updates = append(b.updates, NewAddSchemaUpdate(schema))
	b.lastAddedSchemaID = &newSchemaID

	return b, nil
}

twuebi avatar Aug 22 '25 07:08 twuebi

This test is passing @anvsk:

func schema() iceberg.Schema {
	return *iceberg.NewSchema(
		0,
		iceberg.NestedField{ID: 1, Name: "x", Type: iceberg.PrimitiveTypes.Int64, Required: true},
		iceberg.NestedField{ID: 2, Name: "y", Type: iceberg.PrimitiveTypes.Int64, Required: true, Doc: "comment"},
		iceberg.NestedField{ID: 3, Name: "z", Type: iceberg.PrimitiveTypes.Int64, Required: true},
	)
}

func builderWithoutChanges(formatVersion int) MetadataBuilder {
	tableSchema := schema()
	partitionSpec := partitionSpec()
	sortOrder := sortOrder()

	builder, err := NewMetadataBuilder()
	if err != nil {
		panic(err)
	}
	_, err = builder.SetFormatVersion(formatVersion)
	if err != nil {
		panic(err)
	}
	_, err = builder.AddSortOrder(&sortOrder, true)
	if err != nil {
		panic(err)
	}
	_, err = builder.AddPartitionSpec(&partitionSpec, true)
	if err != nil {
		panic(err)
	}
	_, err = builder.AddSchema(&tableSchema)
	if err != nil {
		panic(err)
	}
	meta, err := builder.Build()
	if err != nil {
		panic(err)
	}
	builder, err = MetadataBuilderFromBase(meta)
	if err != nil {
		panic(err)
	}

	return *builder
}

func TestAddTwoSchemas(t *testing.T) {
	builder := builderWithoutChanges(2)
	schema2 := iceberg.NewSchema(
		0,
		iceberg.NestedField{ID: 1, Name: "x", Type: iceberg.PrimitiveTypes.Int64, Required: true},
		iceberg.NestedField{ID: 2, Name: "y", Type: iceberg.PrimitiveTypes.Int64, Required: true, Doc: "comment"},
	)
	_, err := builder.AddSchema(schema2)
	require.NoError(t, err)
	require.Len(t, builder.schemaList, 2)
	meta, err := builder.Build()
	require.NoError(t, err)
	require.NotNil(t, meta)
}

twuebi avatar Aug 22 '25 07:08 twuebi

Could you provide a complete reproducer for current main?

twuebi avatar Aug 22 '25 07:08 twuebi

This test I hacked into table/transaction_test.go is also passing:

func (s *SparkIntegrationTestSuite) TestAddSchemaRegression() {
	icebergSchema := iceberg.NewSchema(0,
		iceberg.NestedField{ID: 1, Name: "foo", Type: iceberg.PrimitiveTypes.Bool},
		iceberg.NestedField{ID: 2, Name: "bar", Type: iceberg.PrimitiveTypes.String},
		iceberg.NestedField{ID: 3, Name: "baz", Type: iceberg.PrimitiveTypes.Int32},
	)

	tbl, err := s.cat.CreateTable(s.ctx, catalog.ToIdentifier("default", "go_test_set_properties"), icebergSchema)
	s.Require().NoError(err)
	meta := tbl.Metadata()
	oldLast := tbl.Schema().HighestFieldID()

	icebergSchema2 := iceberg.NewSchema(0,
		iceberg.NestedField{ID: 1, Name: "foo", Type: iceberg.PrimitiveTypes.Bool},
		iceberg.NestedField{ID: 2, Name: "bar", Type: iceberg.PrimitiveTypes.String},
	)

	add := table.NewAddSchemaUpdate(icebergSchema2)
	set := table.NewSetCurrentSchemaUpdate(-1)
	reqs := []table.Requirement{
		table.AssertTableUUID(meta.TableUUID()),
		table.AssertLastAssignedFieldID(oldLast),
	}

	_, _, err = s.cat.CommitTable(s.ctx, tbl, reqs, []table.Update{add, set})
	s.Require().NoError(err)
}

twuebi avatar Aug 22 '25 08:08 twuebi

This test I hacked into table/transaction_test.go is also passing:

func (s *SparkIntegrationTestSuite) TestAddSchemaRegression() { icebergSchema := iceberg.NewSchema(0, iceberg.NestedField{ID: 1, Name: "foo", Type: iceberg.PrimitiveTypes.Bool}, iceberg.NestedField{ID: 2, Name: "bar", Type: iceberg.PrimitiveTypes.String}, iceberg.NestedField{ID: 3, Name: "baz", Type: iceberg.PrimitiveTypes.Int32}, )

tbl, err := s.cat.CreateTable(s.ctx, catalog.ToIdentifier("default", "go_test_set_properties"), icebergSchema) s.Require().NoError(err) meta := tbl.Metadata() oldLast := tbl.Schema().HighestFieldID()

icebergSchema2 := iceberg.NewSchema(0, iceberg.NestedField{ID: 1, Name: "foo", Type: iceberg.PrimitiveTypes.Bool}, iceberg.NestedField{ID: 2, Name: "bar", Type: iceberg.PrimitiveTypes.String}, )

add := table.NewAddSchemaUpdate(icebergSchema2) set := table.NewSetCurrentSchemaUpdate(-1) reqs := []table.Requirement{ table.AssertTableUUID(meta.TableUUID()), table.AssertLastAssignedFieldID(oldLast), }

_, _, err = s.cat.CommitTable(s.ctx, tbl, reqs, []table.Update{add, set}) s.Require().NoError(err) }

Image thank you ! i debug my code , but not go into your code func , i am using the catalog of rest , it just do a post request,so it lost the last column id set , and return "invalid_metadata",so i think table.addSchemaUpdate should add field lastColumnID

anvsk avatar Aug 25 '25 07:08 anvsk

Not sure I follow - does the server reject the commit? What catalog are you using?

twuebi avatar Aug 25 '25 08:08 twuebi

Not sure I follow - does the server reject the commit? What catalog are you using?

rest catalog,the request payload without lastcolumnid

anvsk avatar Aug 25 '25 09:08 anvsk

Which part of your setup is rejecting the request? The client-side rest catalog implementation of iceberg-go? The rest catalog server you are using?

The last-column-id field as part of table update is deprecated and marked for removal: https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/iceberg/main/open-api/rest-catalog-open-api.yaml

Image

twuebi avatar Aug 25 '25 09:08 twuebi

Which part of your setup is rejecting the request? The client-side rest catalog implementation of iceberg-go? The rest catalog server you are using?

The last-column-id field as part of table update is deprecated and marked for removal: https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/iceberg/main/open-api/rest-catalog-open-api.yaml

Image

i use the rest catalog server https://s3tables.ap-southeast-1.amazonaws.com/iceberg so ,this problem should be solved by the server side

anvsk avatar Aug 25 '25 14:08 anvsk

s3tables is returning bad-request when iceberg-go is not sending last-column-id?

twuebi avatar Aug 26 '25 09:08 twuebi

s3tables is returning bad-request when iceberg-go is not sending last-column-id?

yes,return the error "invalid_metadata: The specified metadata is not valid"

anvsk avatar Aug 27 '25 05:08 anvsk

The error looks like a go error, could you set a breakpoint in doPost and get the raw response, e.g. via:

diff --git a/catalog/rest/rest.go b/catalog/rest/rest.go
index 63c7389..7683757 100644
--- a/catalog/rest/rest.go
+++ b/catalog/rest/rest.go
@@ -322,7 +322,8 @@ func doPost[Payload, Result any](ctx context.Context, baseURI *url.URL, path []s
        if rsp.StatusCode != http.StatusOK {
                return ret, handleNon200(rsp, override)
        }
-
+       content, err := io.ReadAll(rsp.Body)
+       fmt.Println(rsp.StatusCode, string(content))
        if rsp.ContentLength == 0 {
                return
        }

twuebi avatar Aug 27 '25 10:08 twuebi

The error looks like a go error, could you set a breakpoint in doPost and get the raw response, e.g. via:

diff --git a/catalog/rest/rest.go b/catalog/rest/rest.go index 63c7389..7683757 100644 --- a/catalog/rest/rest.go +++ b/catalog/rest/rest.go @@ -322,7 +322,8 @@ func doPost[Payload, Result any](ctx context.Context, baseURI *url.URL, path []s if rsp.StatusCode != http.StatusOK { return ret, handleNon200(rsp, override) }

  •   content, err := io.ReadAll(rsp.Body)
    
  •   fmt.Println(rsp.StatusCode, string(content))
      if rsp.ContentLength == 0 {
              return
      }
    
Image change the last column ,this is success ,but when i delete the last column :

Image

anvsk avatar Aug 28 '25 03:08 anvsk

That's strange, the catalog receives only a list of updates & requirements, not sure why it would return The specified metadata is not valid instead of rejecting the update, given that s3tables is probably used a bunch, we should be compatible to them. I guess we should re-introduce last-column-id as optional.

twuebi avatar Aug 28 '25 17:08 twuebi

That's strange, the catalog receives only a list of updates & requirements, not sure why it would return The specified metadata is not valid instead of rejecting the update, given that s3tables is probably used a bunch, we should be compatible to them. I guess we should re-introduce last-column-id as optional.

so,can you support last-column-id as optional

anvsk avatar Nov 17 '25 12:11 anvsk

I'm not sure what s3tables' iceberg implementation is based on - if it is based on iceberg-java, then the reason for the error is likely this: https://github.com/apache/iceberg/issues/13850 with a proposed fix of: https://github.com/apache/iceberg/pull/14151

I'm not opposed to re-adding the parameter as optional until this is sorted out in iceberg-java, you could create a PR to this end. The proper solution is to fix this in iceberg-java.

twuebi avatar Nov 17 '25 12:11 twuebi