Small performance improvements to some schedulers
Small performance improvements to dpmsolver_multistep and euler_ancestral_discrete by avoiding redundant calculations.
Vectorized the loop in lms_discrete.
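To illustrate the kind of change this PR makes in lms_discrete, here is a minimal sketch (hypothetical function names, not the exact diffusers code): a Python-level per-order summation loop is replaced by stacking the derivative tensors once and contracting them with the coefficient vector in a single tensordot call.

```python
import torch

# Loop version: one Python-level multiply/add per linear-multistep order.
def step_loop(sample, derivatives, coeffs):
    return sample + sum(c * d for c, d in zip(coeffs, reversed(derivatives)))

# Vectorized version: stack the derivatives into one tensor of shape
# [order, *sample.shape] and contract the leading "order" dimension
# against the coefficient vector in a single tensordot call.
def step_vectorized(sample, derivatives, coeffs):
    stacked = torch.stack(list(reversed(derivatives)))
    coeffs_t = torch.tensor(coeffs, dtype=stacked.dtype)
    return sample + torch.tensordot(coeffs_t, stacked, dims=1)

sample = torch.randn(2, 3, 8, 8)
derivs = [torch.randn_like(sample) for _ in range(4)]
coeffs = [0.1, 0.2, 0.3, 0.4]
assert torch.allclose(step_loop(sample, derivs, coeffs),
                      step_vectorized(sample, derivs, coeffs), atol=1e-5)
```

Both versions compute the same update; the vectorized one just moves the summation from Python into a single fused tensor op, which matters most when the number of Python-level iterations (here, the solver order) is the bottleneck rather than the tensor math itself.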
Some example benchmarks with code below:
DPMSolverMultistepScheduler
100 steps: before: 3.6375 sec, after: 3.3583 sec

LMSDiscreteScheduler
100 steps, input shape [10, 3, 512, 512]: before: 4.6478 sec, after: 4.5001 sec
100 steps, input shape [1, 3, 512, 512]: before: 0.4579 sec, after: 0.3554 sec
```python
import time

import torch
from diffusers import LMSDiscreteScheduler

scheduler = LMSDiscreteScheduler()
model_output = torch.randn(10, 3, 512, 512)
sample = torch.randn(10, 3, 512, 512)
scheduler.set_timesteps(100)

avg = []
num_test = 10
for i in range(num_test):
    start_time = time.perf_counter_ns()
    for step in scheduler.timesteps:
        scheduler.step(model_output, step, sample)
    duration_ns = time.perf_counter_ns() - start_time
    avg.append(duration_ns)

print("average duration (ns):", sum(avg) / num_test)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Uff, a bit worried here. Note that small changes in the sequence of operations can lead to large changes in the output. The very high-precision tests are only run as part of the slow tests.
Also, I'm not 100% sure how important this is, as the majority of the time will always be taken by the model's forward pass, no?
I'm sorry, but I'm not really in favor here :-/
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.