Performance Issues large XLSX Files
Hey,
I've encountered a significant performance issue with the Import-Excel function when processing large .xlsx files. The problem lies in the way rows are added to $xlbook["$targetSheetname"] using the += operator.
My example test file:
A low effort fix that, I did implement locally for me, would probably be to create a Generic List Object, add the Rows to that and after all rows are done add them to $xlbook to decrease the number of deletion and recreates of $xlbook.
$TempList_GenericList = New-Object System.Collections.Generic.List[System.Object]
...
...
...
foreach ($P in $PropertyNames) {
$newRow.Add(($P.Value),($sheet.Cells[$R, $P.Column].Value))
}
}
# Add to a generic list first to minimize the number of additions to $xlbook,
# as frequent direct additions do not scale well and can degrade performance.
$TempList_GenericList.Add([PSCustomObject]$NewRow)
}
$xlBook["$targetSheetname"] += $TempList_GenericList.ToArray()
#endregion
}
I measured the performance afterwards and got the following results:
Without change:
20k lines
Seconds : 5 Milliseconds : 621
40k lines
Seconds : 19 Milliseconds : 282
60k lines
Seconds : 41 Milliseconds : 205
80k lines
Minutes : 1 Seconds : 22 Milliseconds : 897
With the change:
20k lines
Seconds : 6 Milliseconds : 215
40k lines
Seconds : 8 Milliseconds : 429
60k lines
Seconds : 12 Milliseconds : 747
80k lines
Seconds : 16 Milliseconds : 541
I did try to create a pull requests but the automated test failed and I couldn't really figure out why. Maybe someone has a proper way to fix this issue.
Thanks!
The results are volatile but overall the list is more efficient than the array.
I didn't make a chart based on the measurement results, everything is clear anyway.
A common note - stop using += operator for arrays and strings.
Test results
# cols:1350 / rows:35
Series List Array Improve
------ ---- ----- -------
1 507,9442 748,8621 32,17%
2 476,2543 586,0759 18,74%
3 539,2738 542,3491 0,57%
4 542,9553 554,3554 2,06%
5 523,3862 551,5522 5,11%
6 537,15 564,7404 4,89%
7 523,6423 542,2952 3,44%
8 535,031 543,2503 1,51%
9 552,4126 680,9985 18,88%
10 497,4945 562,1684 11,50%
---------------------
Average (cols:1350 / rows:35)
Array : 587,66475 ms
List : 523,55442 ms
Boost (List VS Array) : 10,91%
# cols:1350 / rows:5
Series List Array Improve
------ ---- ----- -------
1 241,2374 300,3785 19,69%
2 229,5275 237,4511 3,34%
3 190,7072 258,0878 26,11%
4 189,8088 248,0929 23,49%
5 190,5102 248,3713 23,30%
6 188,5891 252,7286 25,38%
7 202,7182 249,8699 18,87%
8 198,5839 247,8714 19,88%
9 215,5303 304,873 29,30%
10 202,3222 238,3871 15,13%
---------------------
Average (cols:1350 / rows:5)
Array : 258,61116 ms
List : 204,95348 ms
Boost (List VS Array) : 20,75%
# cols:3500 / rows:5
Series List Array Improve
------ ---- ----- -------
1 323,3446 766,4272 57,81%
2 295,4586 685,7955 56,92%
3 303,3124 692,016 56,17%
4 311,2255 693,439 55,12%
5 297,6776 688,6137 56,77%
6 301,0413 686,1947 56,13%
7 296,4555 691,0341 57,10%
8 394,5347 689,1016 42,75%
9 352,7458 716,6641 50,78%
10 307,4767 708,9691 56,63%
---------------------
Average (cols:3500 / rows:5)
Array : 701,8255 ms
List : 318,32727 ms
Boost (List VS Array) : 54,64%
# cols:3500 / rows:35
Series List Array Improve
------ ---- ----- -------
1 1086,5057 1580,0172 31,23%
2 1180,6539 1412,6752 16,42%
3 1206,12 1398,5028 13,76%
4 1163,7545 1412,23 17,59%
5 1162,8817 1407,8096 17,40%
6 1167,5634 1402,7096 16,76%
7 1181,2987 1422,231 16,94%
8 1183,6401 1396,5472 15,25%
9 1094,2327 1441,6657 24,10%
10 1074,0799 1485,3096 27,69%
---------------------
Average (cols:3500 / rows:35)
Array : 1435,96979 ms
List : 1150,07306 ms
Boost (List VS Array) : 19,91%
# cols:350 / rows:35
Series List Array Improve
------ ---- ----- -------
1 304,2844 343,7668 11,49%
2 235,5216 311,7102 24,44%
3 223,2488 230,5379 3,16%
4 296,8589 214,359 -38,49%
5 215,6153 309,579 30,35%
6 220,3707 229,7253 4,07%
7 305,9634 225,1877 -35,87%
8 229,7927 312,3741 26,44%
9 273,2802 289,5258 5,61%
10 240,1694 249,0166 3,55%
---------------------
Average (cols:350 / rows:35)
Array : 271,57824 ms
List : 254,51054 ms
Boost (List VS Array) : 6,28%
My test workshop
Import-Excel - modified function
Import-Excel1 - standard function
Import-Excel1:
line 148
$xlBook["$targetSheetname"] = @()
line 238
$xlBook["$targetSheetname"] += [PSCustomObject]$NewRow - This is an array operation performance issue.
Import-Excel:
$xlBook["$targetSheetname"] = [System.Collections.Generic.List[object]]::new()
$xlBook["$targetSheetname"].Add([PSCustomObject]$NewRow)
Script
$series = 10
$rows = 35
$cols = 350
$rebuild = 1
if ($rebuild) {
Write-Host 'Generating test file...'
$xlsx = 'test-Import-Excel.xlsx'
if (Test-Path $xlsx) {Remove-Item $xlsx}
$properties = (1..$rows).foreach{"Column$_"}
(1..$cols).foreach{
$row = '' | Select-Object $properties
$properties.foreach{
$row.$_ = Get-Random
}
$row
} | Export-Excel -Path $xlsx
}
Write-Host 'Starting test...'
$refdata = {Import-Excel $xlsx} # using List
$testdata = {Import-Excel1 $xlsx} # using Array
$reflabel = 'List'
$testlabel = 'Array'
$sb = {
param ($count)
if ($count -lt 1) {$count = 1}
$refseries = [System.Collections.Generic.List[object]]::new()
$testseries = [System.Collections.Generic.List[object]]::new()
write-host "$count series"
1..$count | ForEach-Object {
$row = [pscustomobject]@{
Series = $_
$reflabel = ''
$testlabel = ''
Improve = ''
}
Start-Sleep -Milliseconds 100
$refms = (Measure-Command $refdata).TotalMilliseconds
$refseries.add($refms)
$row.$reflabel = $refms
Start-Sleep -Milliseconds 100
$testms = (Measure-Command $testdata).TotalMilliseconds
$testseries.add($testms)
$row.$testlabel = $testms
$row.Improve = '{0:P}' -f ( 1 - $refms/$testms )
$row
}
write-host "`n---------------------`nAverage (cols:$cols / rows:$rows)"
$refavg = ($refseries | Measure-Object -Average).Average
$testavg = ($testseries | Measure-Object -Average).Average
write-host (' {0,6} : {1} ms' -f $testlabel,$testavg)
write-host (' {0,6} : {1} ms' -f $reflabel,$refavg)
write-host
write-host ("Boost ($reflabel VS $testlabel) : {0:P}" -f (1 - $refavg/$testavg))
}
&$sb $series
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.