
append_rows call is slow, taking 1-2 seconds on average


The append_rows call is slow, taking 1-2 seconds on average. Sample code I'm using:

```python
# Fragment of an async method; assumes:
#   from datetime import datetime, timezone
#   import time
#   from google.protobuf import descriptor_pb2
#   from google.cloud.bigquery_storage_v1.types import (
#       AppendRowsRequest, ProtoRows, ProtoSchema,
#   )
for row in [data]:
    message = self._get_proto_message(table_id)
    for field_name, value in row.items():
        if field_name == "createdAt":
            # Parse the ISO-8601 string and convert it to an epoch
            # timestamp in microseconds. The trailing "Z" means UTC,
            # so attach tzinfo before calling .timestamp(); a naive
            # datetime would be interpreted in the local timezone.
            timestamp_format = "%Y-%m-%dT%H:%M:%S.%fZ"
            corrected_value = value.strip("'")  # remove stray single quotes
            timestamp_datetime = datetime.strptime(
                corrected_value, timestamp_format
            ).replace(tzinfo=timezone.utc)
            timestamp_microseconds = int(
                timestamp_datetime.timestamp() * 1e6
            )
            setattr(message, field_name, timestamp_microseconds)
        elif field_name == "firstTimeFilter":
            setattr(message, field_name, value == "True")
        elif field_name.startswith("u_"):
            # Repeated proto fields cannot be assigned with setattr();
            # extend the existing container with all values at once.
            repeated_field = getattr(message, field_name, None)
            if repeated_field is not None:
                repeated_field.extend(value)
        else:
            setattr(message, field_name, value)

    serialized_rows.append(message.SerializeToString())

stream_name = self.write_stream.name

proto_schema = ProtoSchema()
proto_descriptor = descriptor_pb2.DescriptorProto()
self._copy_proto_descriptor(proto_descriptor, table_id)
proto_schema.proto_descriptor = proto_descriptor

proto_rows = ProtoRows()
proto_rows.serialized_rows = serialized_rows

proto_data = AppendRowsRequest.ProtoData()
proto_data.rows = proto_rows
proto_data.writer_schema = proto_schema

request = AppendRowsRequest()
request.proto_rows = proto_data
request.write_stream = stream_name

start = time.time()
await self.write_client.append_rows(iter([request]))
print("V1 {}".format(time.time() - start))
```
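One detail in the snippet worth calling out: parsing a `"...Z"` timestamp with `strptime` yields a naive datetime, and calling `.timestamp()` on it interprets the value in the machine's local timezone rather than UTC. A minimal, self-contained sketch of the conversion (the function name `iso_to_epoch_micros` is mine, for illustration):

```python
from datetime import datetime, timezone

def iso_to_epoch_micros(value: str) -> int:
    """Convert an ISO-8601 UTC string like '1970-01-01T00:00:00.000000Z'
    to an epoch timestamp in microseconds."""
    # The trailing "Z" marks UTC; attach tzinfo explicitly so that
    # .timestamp() does not assume the local timezone.
    dt = datetime.strptime(value.strip("'"), "%Y-%m-%dT%H:%M:%S.%fZ")
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1_000_000)

print(iso_to_epoch_micros("1970-01-01T00:00:00.000000Z"))  # → 0
```

This does not explain the 1-2 s latency, but it does mean rows written from machines not set to UTC would carry shifted timestamps.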

slice-amandata avatar Dec 06 '23 17:12 slice-amandata
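A likely contributor to the latency (a guess, not verified against this setup): `append_rows` opens a bidirectional gRPC stream, and calling it with a single-request iterator pays the full stream-setup cost on every call. Sending many `AppendRowsRequest`s through one `append_rows` call (it accepts an iterator) amortizes that setup, and per the Storage Write API docs the `writer_schema` only needs to be set on the first request of a stream. Requests are also size-limited (around 10 MB), so rows need to be grouped into batches first. A pure-Python sketch of that grouping step (the helper name and the 9 MB default are my own choices):

```python
def batch_serialized_rows(serialized_rows, max_bytes=9_000_000):
    """Group serialized proto rows into batches whose combined size
    stays under the per-request byte cap."""
    batches, current, size = [], [], 0
    for row in serialized_rows:
        # Start a new batch when adding this row would exceed the cap.
        if current and size + len(row) > max_bytes:
            batches.append(current)
            current, size = [], 0
        current.append(row)
        size += len(row)
    if current:
        batches.append(current)
    return batches

rows = [b"aaaaa", b"bbbbb", b"ccccc"]
print(batch_serialized_rows(rows, max_bytes=12))
# → [[b'aaaaa', b'bbbbb'], [b'ccccc']]
```

Each batch would then become one `AppendRowsRequest`, and the whole list would be passed as a single iterator to one `append_rows` call instead of one call per request.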

This issue was transferred from python-bigquery-storage to google-cloud-python as part of the work for https://github.com/googleapis/google-cloud-python/issues/10991.

parthea avatar Aug 22 '25 11:08 parthea