ImportExcel icon indicating copy to clipboard operation
ImportExcel copied to clipboard

Performance Issues large XLSX Files

Open 6v4tq8mhlO23a opened this issue 8 months ago • 1 comments

Hey,

I've encountered a significant performance issue with the Import-Excel function when processing large .xlsx files. The problem lies in the way rows are added to $xlbook["$targetSheetname"] using the += operator.

My example test file:

Image

A low effort fix that, I did implement locally for me, would probably be to create a Generic List Object, add the Rows to that and after all rows are done add them to $xlbook to decrease the number of deletion and recreates of $xlbook.

$TempList_GenericList = New-Object System.Collections.Generic.List[System.Object]

...
...
...

foreach ($P in $PropertyNames) {

            $newRow.Add(($P.Value),($sheet.Cells[$R, $P.Column].Value))

        }
    }
    # Add to a generic list first to minimize the number of additions to $xlbook,
    # as frequent direct additions do not scale well and can degrade performance.

    $TempList_GenericList.Add([PSCustomObject]$NewRow)

    }

$xlBook["$targetSheetname"] += $TempList_GenericList.ToArray()
#endregion
}

I measured the performance afterwards and got the following results:

Without change:

20k lines

Seconds : 5 Milliseconds : 621

40k lines

Seconds : 19 Milliseconds : 282

60k lines

Seconds : 41 Milliseconds : 205

80k lines

Minutes : 1 Seconds : 22 Milliseconds : 897

With the change:

20k lines

Seconds : 6 Milliseconds : 215

40k lines

Seconds : 8 Milliseconds : 429

60k lines

Seconds : 12 Milliseconds : 747

80k lines

Seconds : 16 Milliseconds : 541

I did try to create a pull requests but the automated test failed and I couldn't really figure out why. Maybe someone has a proper way to fix this issue.

Thanks!

6v4tq8mhlO23a avatar May 16 '25 21:05 6v4tq8mhlO23a

The results are volatile but overall the list is more efficient than the array. I didn't make a chart based on the measurement results, everything is clear anyway. A common note - stop using += operator for arrays and strings.

Test results

# cols:1350 / rows:35
Series     List    Array Improve
------     ----    ----- -------
     1 507,9442 748,8621 32,17%
     2 476,2543 586,0759 18,74%
     3 539,2738 542,3491 0,57%  
     4 542,9553 554,3554 2,06%  
     5 523,3862 551,5522 5,11%  
     6   537,15 564,7404 4,89%  
     7 523,6423 542,2952 3,44%  
     8  535,031 543,2503 1,51%  
     9 552,4126 680,9985 18,88% 
    10 497,4945 562,1684 11,50% 

---------------------
Average (cols:1350 / rows:35)
   Array : 587,66475 ms
    List : 523,55442 ms

Boost (List VS Array) : 10,91%

# cols:1350 / rows:5
Series     List    Array Improve
------     ----    ----- -------
     1 241,2374 300,3785 19,69%
     2 229,5275 237,4511 3,34%
     3 190,7072 258,0878 26,11% 
     4 189,8088 248,0929 23,49% 
     5 190,5102 248,3713 23,30% 
     6 188,5891 252,7286 25,38% 
     7 202,7182 249,8699 18,87% 
     8 198,5839 247,8714 19,88% 
     9 215,5303  304,873 29,30% 
    10 202,3222 238,3871 15,13% 

---------------------
Average (cols:1350 / rows:5)
   Array : 258,61116 ms
    List : 204,95348 ms

Boost (List VS Array) : 20,75%

# cols:3500 / rows:5
Series     List    Array Improve
------     ----    ----- -------
     1 323,3446 766,4272 57,81%
     2 295,4586 685,7955 56,92%
     3 303,3124  692,016 56,17% 
     4 311,2255  693,439 55,12% 
     5 297,6776 688,6137 56,77% 
     6 301,0413 686,1947 56,13% 
     7 296,4555 691,0341 57,10% 
     8 394,5347 689,1016 42,75% 
     9 352,7458 716,6641 50,78% 
    10 307,4767 708,9691 56,63% 

---------------------
Average (cols:3500 / rows:5)
   Array : 701,8255 ms
    List : 318,32727 ms

Boost (List VS Array) : 54,64%

# cols:3500 / rows:35
Series      List     Array Improve
------      ----     ----- -------
     1 1086,5057 1580,0172 31,23%
     2 1180,6539 1412,6752 16,42%
     3   1206,12 1398,5028 13,76% 
     4 1163,7545   1412,23 17,59% 
     5 1162,8817 1407,8096 17,40% 
     6 1167,5634 1402,7096 16,76% 
     7 1181,2987  1422,231 16,94% 
     8 1183,6401 1396,5472 15,25% 
     9 1094,2327 1441,6657 24,10% 
    10 1074,0799 1485,3096 27,69% 

---------------------
Average (cols:3500 / rows:35)
   Array : 1435,96979 ms
    List : 1150,07306 ms

Boost (List VS Array) : 19,91%

# cols:350 / rows:35
Series     List    Array Improve
------     ----    ----- -------
     1 304,2844 343,7668 11,49%
     2 235,5216 311,7102 24,44%
     3 223,2488 230,5379 3,16%  
     4 296,8589  214,359 -38,49%
     5 215,6153  309,579 30,35% 
     6 220,3707 229,7253 4,07%  
     7 305,9634 225,1877 -35,87%
     8 229,7927 312,3741 26,44% 
     9 273,2802 289,5258 5,61%  
    10 240,1694 249,0166 3,55%  

---------------------
Average (cols:350 / rows:35)
   Array : 271,57824 ms
    List : 254,51054 ms

Boost (List VS Array) : 6,28%

My test workshop

Import-Excel - modified function Import-Excel1 - standard function

Import-Excel1:

line 148 $xlBook["$targetSheetname"] = @() line 238 $xlBook["$targetSheetname"] += [PSCustomObject]$NewRow - This is an array operation performance issue.

Import-Excel:

$xlBook["$targetSheetname"] = [System.Collections.Generic.List[object]]::new() $xlBook["$targetSheetname"].Add([PSCustomObject]$NewRow)

Script

$series = 10
$rows   = 35
$cols   = 350
$rebuild = 1
if ($rebuild) {
    Write-Host 'Generating test file...'
	$xlsx = 'test-Import-Excel.xlsx'
	if (Test-Path $xlsx) {Remove-Item $xlsx}
	$properties = (1..$rows).foreach{"Column$_"}
	(1..$cols).foreach{
		$row = '' | Select-Object $properties
		$properties.foreach{
			$row.$_ = Get-Random
		}
		$row
	} | Export-Excel -Path $xlsx
}

Write-Host 'Starting test...'
$refdata = {Import-Excel $xlsx} # using List
$testdata = {Import-Excel1 $xlsx} # using Array
$reflabel = 'List'
$testlabel = 'Array'
$sb = {
    param ($count)
    if ($count -lt 1) {$count = 1}
	$refseries  = [System.Collections.Generic.List[object]]::new()
	$testseries = [System.Collections.Generic.List[object]]::new()
	write-host "$count series"
	1..$count | ForEach-Object {
	  $row = [pscustomobject]@{
		Series     = $_
		$reflabel  = ''
		$testlabel = ''
		Improve    = ''
	  }
  
	  Start-Sleep -Milliseconds 100
	  $refms = (Measure-Command $refdata).TotalMilliseconds
	  $refseries.add($refms)
	  $row.$reflabel = $refms
  
	  Start-Sleep -Milliseconds 100
	  $testms = (Measure-Command $testdata).TotalMilliseconds
	  $testseries.add($testms)
	  $row.$testlabel = $testms

	  $row.Improve = '{0:P}' -f ( 1 - $refms/$testms ) 

	  $row
	}
  
	write-host "`n---------------------`nAverage (cols:$cols / rows:$rows)"
	$refavg = ($refseries | Measure-Object -Average).Average
	$testavg = ($testseries | Measure-Object -Average).Average
	write-host ('  {0,6} : {1} ms' -f $testlabel,$testavg)
	write-host ('  {0,6} : {1} ms' -f $reflabel,$refavg)
	write-host
	write-host ("Boost ($reflabel VS $testlabel) : {0:P}" -f (1 - $refavg/$testavg))
}
&$sb $series

scriptingstudio avatar May 17 '25 12:05 scriptingstudio

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 27 '25 02:06 stale[bot]