clock is not working
groupByInterval is not working with a clock.
Code:

```python
clock = clocks.uniform(
    sqlContext,
    frequency="1day",
    offset="0ns",
    begin_date_time="2016-01-01",
    end_date_time="2017-01-01",
)
intervalized = flint_ShipMode.groupByInterval(clock)
```
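For context, a minimal sketch of how flint_ShipMode is presumably built from the ShipMode DataFrame before the call above; this construction is not shown in the report, and the rename of datetimestamp to time is an assumption based on flint expecting a timestamp column named time:

```python
from ts.flint import FlintContext

# Assumption: flint expects the timestamp column to be named "time", so
# ShipMode's "datetimestamp" column is renamed before conversion.
flintContext = FlintContext(sqlContext)
flint_ShipMode = flintContext.read.dataframe(
    ShipMode.withColumnRenamed("datetimestamp", "time")
)
# flint_ShipMode.groupByInterval(clock) is then called as shown above.
```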
```python
print(type(ShipMode))
ShipMode.printSchema()
ShipMode.count()
```

```
<class 'pyspark.sql.dataframe.DataFrame'>
root
 |-- uid: string (nullable = true)
 |-- time_in_ms: string (nullable = true)
 |-- datetimestamp: timestamp (nullable = true)
 |-- found: float (nullable = true)
 |-- name: string (nullable = true)

Out[19]: 18070796
```
Error:

```
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.codegen.ExprCode.value()Ljava/lang/String;
```
Hi Jonatan, we're working on getting this corrected. In general, we're seeing some incompatibilities with Clocks.
Hi @kevrasm, any new developments with respect to clocks? I am using Databricks Runtime 5.2 ML Beta (Spark 2.4 / Scala 2.11), and I am seeing very strange behavior when trying to use flint_0_6_0_databricks.jar:
- clocks.uniform only gives the correct output when the frequency is specified in ms instead of s (there is a factor-of-1000 mistake). For example, for a one-day interval I have to use `clock_day = clocks.uniform(sqlContext, '86400ms')` instead of `clock_day = clocks.uniform(sqlContext, '86400s')`.
- When using summarizeIntervals with the correct clock, the aggregate result is computed over an interval that is shifted by more than 8 hours with respect to the one specified by the clock time series. I tried to compensate for this with a clock offset, without success. A minimal sketch of roughly what I am running is below.
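The FlintContext setup, the DataFrame df, and the numeric column v below are placeholders, not taken from the report above; the sketch only illustrates the ms workaround and the offset attempt described in the two points:

```python
from ts.flint import FlintContext, clocks, summarizers

flintContext = FlintContext(sqlContext)
# Placeholder: any time-indexed Spark DataFrame with a numeric column "v".
flint_df = flintContext.read.dataframe(df)

# Workaround for the apparent factor-of-1000 mistake: asking for "86400ms"
# is what actually produces a one-day interval on this runtime.
clock_day = clocks.uniform(sqlContext, '86400ms')

# The aggregates come back in intervals shifted by more than 8 hours
# relative to the clock timestamps.
daily = flint_df.summarizeIntervals(clock_day, summarizers.sum('v'))

# Attempt to compensate for the shift with a clock offset, also
# without success.
clock_day_offset = clocks.uniform(sqlContext, '86400ms', offset='8h')
daily_offset = flint_df.summarizeIntervals(clock_day_offset, summarizers.sum('v'))
```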
@jonatan-klock @5mdd There was a PR submitted to correct this just a couple of days ago; it should be merged soon.
@kevrasm Can you give us a status update? How soon is "soon" (in days, for example)?
@5mdd The update has already been merged. There's a new jar under the lib folder.
Thanks @kevrasm. Why is the version of the jar the same as the previous one (0_6_0)?