databricks-accelerators

summarizers is not working

Open 5mdd opened this issue 7 years ago • 0 comments

Thanks @kevrasm for solving the clock issue. I tried the new jar, but I am hitting another issue on Databricks 5.2 ML. After successfully creating a clock, I tried to use a summarizer with the function summarizeIntervals, but it failed with the following error:

/local_disk0/spark-34261885-5939-47e4-b37c-fc95545a6b47/userFiles-25527d91-086d-4a90-839f-09b97f09c196/addedFile5376141714691461041dbfs__FileStore_jars_785cdf36_8307_41eb_9f3d_a9d1a89ab416_flint_0_6_0_databricks-7358e.jar/ts/flint/dataframe.py in summarizeIntervals(self, clock, summarizer, key, inclusion, rounding)
   1071         else:
   1072             with traceback_utils.SCCallSiteSync(self._sc) as css:
-> 1073                 return self._summarizeIntervals_builtin(clock, summarizer, key, inclusion, rounding)
   1074
   1075     def _summarizeIntervals_udf(self, clock, columns,

/local_disk0/spark-34261885-5939-47e4-b37c-fc95545a6b47/userFiles-25527d91-086d-4a90-839f-09b97f09c196/addedFile5376141714691461041dbfs__FileStore_jars_785cdf36_8307_41eb_9f3d_a9d1a89ab416_flint_0_6_0_databricks-7358e.jar/ts/flint/dataframe.py in _summarizeIntervals_builtin(self, clock, summarizer, key, inclusion, rounding)
   1093                 scala_key,
   1094                 inclusion,
-> 1095                 rounding)
   1096
   1097         return TimeSeriesDataFrame._from_tsrdd(tsrdd, self.sql_ctx)

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258
   1259         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling o557.summarizeIntervals.
: java.lang.NoClassDefFoundError: Could not initialize class com.twosigma.flint.rdd.function.group.Intervalize$
    at com.twosigma.flint.rdd.OrderedRDD.intervalize(OrderedRDD.scala:560)
    at com.twosigma.flint.timeseries.TimeSeriesRDDImpl.summarizeIntervals(TimeSeriesRDD.scala:1605)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
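
For context, "Could not initialize class" usually means an earlier attempt to run the class's static initializer already failed in this JVM, so the root cause is most likely in the first error after a cluster restart, not in this trace. A minimal sketch of how one might try to surface it from a notebook is below. The helper name try_initialize_class is hypothetical (not part of flint or PySpark), and it assumes the jar's classes are visible to the driver thread's context class loader, which may not hold on every cluster setup.

```python
def try_initialize_class(jvm, class_name):
    """Ask the JVM to load and initialize class_name.

    `jvm` is assumed to be a py4j JVM view, e.g. spark.sparkContext._jvm.
    Returns (True, None) on success, or (False, <error text>) on failure.
    This is a diagnostic sketch, not part of flint or PySpark.
    """
    try:
        loader = jvm.java.lang.Thread.currentThread().getContextClassLoader()
        # initialize=True forces the static initializer to run, which is
        # the step that fails for "Could not initialize class" errors.
        jvm.java.lang.Class.forName(class_name, True, loader)
        return True, None
    except Exception as exc:  # py4j errors derive from Exception
        return False, str(exc)


# Hypothetical notebook usage (needs a live Spark session, so left commented):
# ok, detail = try_initialize_class(
#     spark.sparkContext._jvm,
#     "com.twosigma.flint.rdd.function.group.Intervalize$",
# )
# print(ok, detail)
```

On a freshly restarted cluster the returned text may include the underlying exception (for example an ExceptionInInitializerError with its cause); once initialization has failed, later calls will likely just repeat the NoClassDefFoundError above.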

The same happens with the function groupByInterval. I also tried to run the example at https://github.com/twosigma/flint/tree/master/example without success. It failed at the summarizer step:

sp500_decayed_return = sp500_joined_return.summarizeWindows(
    window = windows.past_absolute_time('7day'),
    summarizer = summarizers.ewma('previous_day_return', alpha=0.5)
)

What is so special about Databricks that makes the Two Sigma version incompatible?

5mdd • Feb 01 '19 08:02