Jeff Klukas

Results: 33 issues by Jeff Klukas

Currently, AWScala modules specify aws-java-sdk components as transitive dependencies, with a global version set for the entire AWScala project. There are, however, many scenarios where a user might want to...

See https://github.com/mozilla/telemetry-airflow/pull/982#discussion_r423706798
We have multiple places now where we set a group of options together in order to ensure bigquery_etl_query replaces a whole table rather than just a partition. It's...

There's interest in developing some stronger safeguards around pipelines that feed publicly released data. Pipeline code generally does not live in this repository, but this repo does serve as a...

We should include at least some brief documentation about the flush manager in https://mozilla.github.io/gcp-ingestion/ingestion-edge/.

One known failure mode for ingestion-beam is rate limiting from the BigQuery API when we list datasets/tables in order to check whether destination tables exist. See https://mozilla-hub.atlassian.net/browse/DSRE-194 and https://github.com/mozilla/bigquery-backfill/pull/15
Currently,...

In `ParseUri` we don't normalize the doctype and namespace names. Instead, we rely on `SchemaStore` to normalize those attributes when looking up schemas. This means that the `metadata.uri.document_namespace` field in...

We are still seeing some instances of this message in the ContextualServicesSender Dataflow job in production ([example](https://console.cloud.google.com/logs/query;query=insertId%3D%228216686316375255748:267276:0:565727%22%20resource.type%3D%22dataflow_step%22%20resource.labels.job_id%3D%222021-07-15_09_54_23-16490022926510501865%22%20logName%3D%22projects%2Fmoz-fx-data-beam-prod-11f7%2Flogs%2Fdataflow.googleapis.com%252Fworker%22%20resource.labels.step_id%3D%2528%22WriteErrorOutput%2FBigQueryIO.Write%2FStreamingInserts%2FStreamingWriteTables%2FStreamingWrite%2FBatchedStreamingWrite.ViaBundleFinalization%2FParMultiDo%2528BatchAndInsertElements%2529%22%2529%20timestamp%20%3E%3D%20%222021-07-15T16:54:24.046Z%22%20severity%3E%3DDEFAULT;timeRange=2021-07-16T15:32:11.213Z%2F2021-07-16T15:32:11.213Z;cursorTimestamp=2021-07-16T15:32:11.212Z?authuser=0&project=moz-fx-data-beam-prod-11f7)): ``` "*~*~*~ Channel ManagedChannelImpl{logId=59, target=bigquerystorage.googleapis.com:443} was not shutdown properly!!! ~*~*~* Make sure to...

Impression-stats and other docTypes have a top-level `release` field that the pipeline should use as input to normalized_channel. Currently, their normalized_channel is null.
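
As a rough illustration of the fallback described in this issue, the sketch below derives a normalized channel from a payload's top-level `release` field when no channel attribute is already available. It is not the pipeline's actual implementation: the function names, the `KNOWN_CHANNELS` list, and the `"Other"` bucket are assumptions made for the example.

```python
from typing import Any, Mapping, Optional

# Hypothetical list of recognized channels; the real pipeline's list may differ.
KNOWN_CHANNELS = ("release", "esr", "beta", "aurora", "nightly")


def normalize_channel(raw: Optional[str]) -> Optional[str]:
    """Map a raw channel string to a known channel, "Other", or None if absent."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if not value:
        return None
    for channel in KNOWN_CHANNELS:
        # Accept exact matches and suffixed variants such as "nightly-foo".
        if value == channel or value.startswith(channel + "-"):
            return channel
    return "Other"


def normalized_channel_for(
    payload: Mapping[str, Any], channel_attribute: Optional[str] = None
) -> Optional[str]:
    """Prefer an existing channel attribute; otherwise fall back to payload["release"]."""
    raw = channel_attribute if channel_attribute is not None else payload.get("release")
    # Ignore non-string values rather than guessing at their meaning.
    return normalize_channel(raw if isinstance(raw, str) else None)
```

With this fallback, a payload that carries only `release` no longer yields a null channel, while payloads with neither source still return `None`.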

pipeline metadata

The GUD datasets currently include only the release version of Fenix and they ignore any data in the `org_mozilla_fenix_nightly_stable` dataset. We should probably build in that support. But it brings...

In discussion with @6a68, we've identified a subset of the FxA log data (amplitudeEvent messages) that we want to process in real time via Pub/Sub, but which don't need to...

pipeline metadata