airflow icon indicating copy to clipboard operation
airflow copied to clipboard

AIP-84 Migrate /object/grid_data from views to FastAPI

Open bugraoz93 opened this issue 1 year ago • 17 comments

closes: #42595


^ Add meaningful description above Read the Pull Request Guidelines for more information. In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed. In case of a new dependency, check compliance with the ASF 3rd Party License Policy. In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

bugraoz93 avatar Nov 25 '24 03:11 bugraoz93

I addressed the comments. Thanks for the review! Some parts turned out to be more complex than I imagined. It still needs small touches and should be ready with unit tests soon. :sweat_smile:

bugraoz93 avatar Nov 27 '24 02:11 bugraoz93

@pierrejeambrun this is ready for review. I removed the WIP earlier but forgot to ping you again :sweat_smile: Please when you have time, thanks!

bugraoz93 avatar Dec 01 '24 10:12 bugraoz93

Great.

Looking good, just a few suggestions / improvement on the code, but nothing blocking.

I'll also let Brent double check that the interface corresponds to the UI specifications.

That sounds amazing! Let me know if I have missed anything, and we can fix it at an early stage. I will review the comments and adjust the code accordingly in the coming hours. Many thanks for the review and the comments!

bugraoz93 avatar Dec 03 '24 16:12 bugraoz93

Great work! This will power so much of the new UI!

In the old UI we also included the note for both dag runs and task instances. We should still include it for the new UI.

I was writing my message and saw your review now :slightly_smiling_face: Thanks for the review, Brent!

bugraoz93 avatar Dec 03 '24 16:12 bugraoz93

Can you also plug the new common filters include_upstream and include_downstream in structure_data (structure.py)

pierrejeambrun avatar Dec 05 '24 10:12 pierrejeambrun

Can you also plug the new common filters include_upstream and include_downstream in structure_data (structure.py)

I already included this in the previous commit, updating that endpoint params according to changes on those command upstream|downstream variables :)

bugraoz93 avatar Dec 08 '24 21:12 bugraoz93

After more manual testing. It looks like we have a few issues.

Using the example_task_group Dag:

  • section_1 is correct if everything runs normally. But manually mark a child task as failed. Then the states dict is correct but overall_state was not updated
  • section_2 the task_count is correct at 2 if you assume its nested task group is a single task but the list of states shows 4 states which would be correct if you ignore task groups and only count the actual number of tasks no matter how nested they may be.

Thanks a lot for the review and additional tests! I have adjusted the code accordingly. Indeed it was missing recursive task_groups and overall_state calculation had problem with the order of the loops :sweat_smile:

bugraoz93 avatar Dec 09 '24 22:12 bugraoz93

This is coming along well! Thanks for all your patience as I test this endpoint out more and more.

My pleasure! I am happy that we ensuring everything will run as expected before shipping this one. Thanks for having been testing multiple times throughout the PR! :) I will take a look at these soon. Thanks!

bugraoz93 avatar Dec 11 '24 18:12 bugraoz93

We only want to count its immediate children and their states. So section_2, should only have a count of 2 and states of failed: 1, success: 1

I misunderstood your earlier message from this one related to the representation of nested task groups. I have now included smaller steps to ensure the correct calculation.

bugraoz93 avatar Dec 12 '24 19:12 bugraoz93

I fixed the mypy check, I am surprised how my local pre-commit couldn't catch that. Additionally, changed the child_state, I agree, it can reduce confusion

bugraoz93 avatar Dec 12 '24 20:12 bugraoz93

Thanks so much. We are nearly there!

  1. Can you check that limit is actually being respected? In my manual testing, I was trying to only fetch 14 runs but I always got the default of 100. @pierrejeambrun would you have any ideas here?

  2. Can we add a test case for a triply nested dag group? Because it looks like we're only bubbling a failed state up twice.

Current UI: Screenshot 2024-12-13 at 11 26 56 AM

Legacy UI: Screenshot 2024-12-13 at 11 28 39 AM

bbovenzi avatar Dec 13 '24 16:12 bbovenzi

  • Can you check that limit is actually being respected? In my manual testing, I was trying to only fetch 14 runs but I always got the default of 100. @pierrejeambrun would you have any ideas here?
  • Can we add a test case for a triply nested dag group? Because it looks like we're only bubbling a failed state up twice.

Thanks for the comment! I will include test cases for more nested groups and test the limit accordingly.

A few things to adjust before we can merge. Also some part of the logic are re-implementation (state propagation bug mentioned by Brent), I would suggest to try to keep things similar to the legacy implementation that has proven to be correct for many edge cases. Unless some stuff need improving of course, but the legacy code seems cleaner to me at this point. Maybe just because i'm more used to it.

Thanks for the detailed review! I found the legacy implementation more complex to understand, to be honest :sweat_smile: For example, I couldn't easily grasp these edge cases (mostly for Mapped objects) which didn't included in couple of iterations (getting used to is the indication here :+1: ). We have covered most edge cases here too. Sorry if methods without comments made any confusion and took your time more while reviewing! I am not against converting the new logic to legacy implementation. If you think this could bring complexity, let's do that. Which area would you like to see the legacy implementation? I copied most of the logic from the legacy code to here other than building up the task map and decoupling the logic into multiple methods which was mostly populated under dag_to_grid.

I will check these comments in detail today and start making the changes.

bugraoz93 avatar Dec 16 '24 14:12 bugraoz93

I found the legacy implementation more complex to understand

I'm likely biased because i'm familiar with the old one. A fresh eye is always welcome and if you feel that way I'm sure others will find this new implementation easier to grasp, i'll just get used to it :).

As long as we have tests covering those edge cases we should be fine.

pierrejeambrun avatar Dec 16 '24 15:12 pierrejeambrun

Honestly, I don't think I did a great job with dag_to_grid so it can definitely be improved. Either way, its a lot of complicated recursive logic so let's make sure we have tests to help us out.

bbovenzi avatar Dec 16 '24 20:12 bbovenzi

Amazing, thanks both of you for the open-minded approach and constructive feedback! :) I really appreciate the great work done here. Handling recursive logic like this isn’t easy. This work would take much longer without that logic already in place. I am focusing on adding the necessary tests and checking the areas you’ve highlighted. The PR will be updated soon.

bugraoz93 avatar Dec 17 '24 16:12 bugraoz93

Can you check that limit is actually being respected? In my manual testing, I was trying to only fetch 14 runs but I always got the default of 100. @pierrejeambrun would you have any ideas here?

I have checked and it seems like it is working fine. I have included a test case for it.

Can we add a test case for a triply nested dag group? Because it looks like we're only bubbling a failed state up twice.

Indeed, this needs to be recalculated since the entire parent state is updated accordingly. I have included that small calculation to cascade this to all recursive parent groups. I have also included test cases for nested loops. I have tested locally with a case that is exactly similar to the picture you sent with 3 depth.

As long as we have tests covering those edge cases we should be fine.

I have included multiple test cases. I have also updated the code according to the comments. I will resolve them accordingly.

Only one specific thing to mention in general is I have included the SortParam OR method. Since the value won't be null, the only thing to compare is the generated way which falls to the primary key. This made me eliminate the unnecessary method and make or with order_by | SortParam(...., <first_element_ordering>). If you think this isn't generic enough to put on SortParam, I can move the OR logic to the grid method.


Sorry for the delay! Many thanks for the tests and reviews!

bugraoz93 avatar Dec 20 '24 22:12 bugraoz93

I double-checked the parameters created with filter_param_factory, and they weren’t working as expected. 😕 To address this, I needed to adjust the run_types and run_states parameter names to run_type and state, respectively. It seems that filter_param_factory requires the parameter names to match the column names to function correctly.

Additionally, I have included the remaining two filter tests to ensure full coverage. With this update, all filters have been tested.

Thanks!

bugraoz93 avatar Dec 21 '24 05:12 bugraoz93

Code and test cases look good, beside the couple of nits, ready to merge.

Amazing news, just pushed the changes and resolved the threads. Thanks a lot for your detailed review!

bugraoz93 avatar Dec 23 '24 13:12 bugraoz93

cc: @bbovenzi just merged. Hoping this can enable further development for the front end side.

pierrejeambrun avatar Dec 23 '24 14:12 pierrejeambrun