
Performance stage update (Resnet50)

Open alexandraBara opened this issue 2 years ago • 3 comments

Split the Resnet50 file by data type and batch size to parallelize the CI stage. Added a try-catch around the k_time comparison so that CI passes when a stage fails (to be removed once a stable testing node is found).
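As a rough illustration of the split described above, here is a minimal scripted-pipeline sketch that fans out one branch per data type and batch size. The data types, batch sizes, node label, and script name are all hypothetical placeholders; the thread does not list the actual values used in the MIOpen Jenkinsfile.

```groovy
// Sketch only: every value below is an assumption, not taken from the MIOpen Jenkinsfile.
def dataTypes  = ['fp32', 'fp16']   // hypothetical data types
def batchSizes = [32, 64]           // hypothetical batch sizes

def branches = [:]
for (dt in dataTypes) {
    for (bs in batchSizes) {
        // Copy the loop variables so each closure captures its own values.
        def dtype = dt
        def batch = bs
        // Coerce to String, since parallel expects plain String branch names.
        String name = "Resnet50 ${dtype} bs${batch}"
        branches[name] = {
            node('perf-node') {      // hypothetical node label
                stage(name) {
                    // Hypothetical runner script for one data type / batch size pair.
                    sh "./run_resnet50_perf.sh --dtype ${dtype} --batch ${batch}"
                }
            }
        }
    }
}

// Run all combinations concurrently instead of as one long stage.
parallel branches
```

Each combination then succeeds or fails on its own, so a slow or unstable run only holds up its own branch.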

alexandraBara avatar Apr 27 '23 15:04 alexandraBara

Line 345 in [Jenkinsfile] is still not fixed; it was merged by another PR. Each try-catch block should include:

catch (org.jenkinsci.plugins.workflow.steps.FlowInterruptedException e){
        echo "The job was cancelled or aborted"
        throw e
}

I removed the redundant line. The first try-catch, around the wget, covers the first time a test branch goes through the stage, when there won't be any archived files yet. In that case the code just pushes the current results to the archive and compares against them on the next run. The second try-catch, which results in SUCCESS, is there because we lack a stable testing system: one or more k_time values always fall below the threshold. For now we have agreed to leave it in.
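The two blocks described above could look roughly like the following scripted-pipeline sketch. It is not the code from the MIOpen Jenkinsfile: the archive URL, file names, and comparison script are hypothetical, and only the FlowInterruptedException handling is taken from the review comment.

```groovy
import org.jenkinsci.plugins.workflow.steps.FlowInterruptedException

// Hypothetical location of the previously archived results.
def prevResultsUrl = 'http://example.invalid/archive/resnet50_results.csv'

// First try-catch: on a branch's first run nothing is archived yet,
// so a failed download is tolerated and the comparison is deferred.
try {
    sh "wget -q ${prevResultsUrl} -O previous_results.csv"
} catch (FlowInterruptedException e) {
    echo "The job was cancelled or aborted"
    throw e   // never swallow an abort
} catch (Exception e) {
    echo "No archived results found; they will be compared on the next run"
}

// Second try-catch: temporary workaround until a stable testing node exists.
try {
    // Hypothetical script that exits non-zero when a k_time misses the threshold.
    sh "python3 compare_ktime.py previous_results.csv current_results.csv"
} catch (FlowInterruptedException e) {
    echo "The job was cancelled or aborted"
    throw e
} catch (Exception e) {
    echo "k_time comparison failed, ignoring for now: ${e.message}"
    currentBuild.result = 'SUCCESS'
}

// Push the current results so the next run has something to compare against.
archiveArtifacts artifacts: 'current_results.csv', allowEmptyArchive: true
```

Catching FlowInterruptedException first and rethrowing it keeps the generic catch from masking a cancelled job as a pass.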

alexandraBara avatar May 31 '23 13:05 alexandraBara

Let's maybe disable this perf stage by default for now? I see several pipelines stuck in this stage: http://micimaster.amd.com/blue/organizations/jenkins/MLLibs%2FMIOpen/detail/PR-2163/3/pipeline

junliume avatar Jun 04 '23 06:06 junliume

@alexandraBara Can you please take a look and suggest what the next steps are here?

JehandadKhan avatar Oct 30 '23 15:10 JehandadKhan