EnverOsmanov comments

Results: 13 comments of EnverOsmanov

GC overhead while ingesting big excel files

@amazoyer, you could try setting "maxRowsInMemory"; it is gentler on the garbage collector. I think it happens when the file not only has a lot of rows,...

Error reading xlsx file (MIN_INFLATE_RATIO exceeded)

Interesting. Have you tried setting this ratio?

```scala
import org.apache.poi.openxml4j.util.ZipSecureFile

// A minimum inflate ratio of 0 effectively turns off POI's zip-bomb detection
ZipSecureFile.setMinInflateRatio(0)

val adv2 = spark.read
  .format("com.crealytics.spark.excel")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("test.xlsx")
```

[1](https://stackoverflow.com/questions/44897500/using-apache-poi-zip-bomb-detected)

Error reading xlsx file (MIN_INFLATE_RATIO exceeded)

You can also try the streaming version of the reader by adding the "maxRowsInMemory" option.
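A minimal sketch of what that could look like, assuming Spark and spark-excel are already on the classpath. The helper name `readExcelStreaming` and the default of 20 rows are illustrative choices, not values from the issue.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ExcelStreamingRead {
  // Hypothetical helper: reads an .xlsx file with the streaming reader enabled.
  // Setting "maxRowsInMemory" switches spark-excel to its streaming reader,
  // which keeps only a window of rows in memory at a time.
  def readExcelStreaming(
      spark: SparkSession,
      path: String,
      maxRowsInMemory: Int = 20 // illustrative value, tune for your files
  ): DataFrame =
    spark.read
      .format("com.crealytics.spark.excel")
      .option("header", "true")
      .option("maxRowsInMemory", maxRowsInMemory.toString)
      .load(path)
}
```

Usage would be along the lines of `ExcelStreamingRead.readExcelStreaming(spark, "test.xlsx")`.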

Logs with Warning for Currency format

I was testing the new version today. Here are the issues I found and some small notes: 1) The latest version (3.1.2_0.15.0) doesn't shade Apache POI. Probably because spark-excel has version...

Logs with Warning for Currency format

(I will leave my note here, maybe it will be useful.) I did try 3.1.2_0.15.1 (#465), but my program still failed with java.lang.NoClassDefFoundError: shadeio/poi/schemas/vmldrawing/XmlDocument

```
java.lang.NoClassDefFoundError: shadeio/poi/schemas/vmldrawing/XmlDocument
    at shadeio.poi.xssf.usermodel.XSSFVMLDrawing.read(XSSFVMLDrawing.java:132)
    at...
```

AppSystemDict: Error in Init() of interface 'RenderDeviceMgr001'!

I have this issue with Wayland, and no issues with X11. Other details: Ubuntu 23.10, nvidia 545. UPD: There are also the following issues with Wayland: GNOME Settings shows "Software Rendering" in...

Use takeWhile method from Range

If `colInd` were an unsorted `Seq[Int]` and some columns were missing, it should break test cases. Benchmarks: `.iterator.filter` => 31 seconds; `.view.filter` => 2 minutes. [Here](https://github.com/crealytics/spark-excel/compare/fix/big-slow-files-v2...debug-enver?expand=1) is the code...

Use takeWhile method from Range

The alternative approach to avoid iterating over the full `colInd` is in [API V1](https://github.com/crealytics/spark-excel/blob/eb21b38bcb11d681f2847369699ab97f7cf30c34/src/main/scala/com/crealytics/spark/excel/DataLocator.scala#L109):

```
.map(_.cellIterator().asScala.filter(c => colInd.contains(c.getColumnIndex)).toVector)
```

But I'm not exactly sure what the idea behind the change was...

Use takeWhile method from Range

If `colInd` is an unsorted `Seq[Int]`, we should sort it once. Otherwise we would be filtering over the full collection for each row.
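A rough sketch of the "sort once, reuse per row" idea, not the code from the PR. `Cell` and `ColumnSelection` are hypothetical stand-ins rather than spark-excel or POI types, and the sketch assumes each row yields its cells in ascending column order.

```scala
// Hypothetical stand-in for a spreadsheet cell: a column index and a value.
final case class Cell(columnIndex: Int, value: String)

// Holds the requested column indices, prepared once and reused for every row.
final case class ColumnSelection(sorted: Vector[Int]) {
  // Built once per selection, so per-cell membership checks stay cheap.
  private val lookup: Set[Int] = sorted.toSet
  // Largest requested column; cells past it can never match.
  private val last: Option[Int] = sorted.lastOption

  // Assumes `row` yields cells in ascending column order.
  def select(row: Iterator[Cell]): Vector[Cell] = last match {
    case None => Vector.empty
    case Some(max) =>
      row
        .takeWhile(_.columnIndex <= max) // stop early once past the last wanted column
        .filter(c => lookup.contains(c.columnIndex))
        .toVector
  }
}

object ColumnSelection {
  // Sort and de-duplicate the (possibly unsorted) indices exactly once.
  def fromUnsorted(colInd: Seq[Int]): ColumnSelection =
    ColumnSelection(colInd.distinct.sorted.toVector)
}
```

The selection would be built once, for example `val sel = ColumnSelection.fromUnsorted(Seq(3, 0, 1))`, and then `sel.select(cells)` called for each row. Using a `Set` for lookups next to the sorted vector is a design choice of this sketch; a merge-style walk over both sorted sequences would avoid the extra structure.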

Sbt assembly taking too long, sbt 1.0.4, scalaVersion 2.11.8

A very similar issue already exists: https://github.com/sbt/sbt-assembly/issues/68