data-engineer-roadmap: Concurrency models are missing

For a modern data engineer, knowledge of concurrency models is important.

A data engineer should know the difference between concurrency and parallelism.
A data engineer should know the difference between task parallelism and data parallelism.
Threads vs. processes. Example in Python: the threading vs. multiprocessing modules, what the differences are, and what problems Python has with threading (the Global Interpreter Lock, GIL).
A typical scenario in modern data integration: call n APIs every x seconds / minutes / hours. How can this be done with good performance? One way is to use asynchronous programming.
The actor model might be good to know as well.
DAGs (example: Apache Airflow) vs. state machines (example: AWS Step Functions) vs. ... This is actually covered by 'Data structures and algorithms', but it might be good to mention it here as an example of how that knowledge helps a data engineer.
Parallel programming on GPUs, using techniques like CUDA.
Functional programming is also 'nice to have' (but not obligatory).
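To illustrate the API-polling scenario: such calls are I/O-bound, so asyncio lets a single thread wait on all of them at once instead of one after another. This is a minimal sketch, assuming the network calls are simulated with asyncio.sleep inside a hypothetical fetch helper (a real pipeline would use an async HTTP client instead). The names poll_sequential, poll_once and poll_every are likewise illustrative, not from any library.

```python
from __future__ import annotations

import asyncio
import time


# Hypothetical stand-in for an HTTP request: each "API" just waits for its
# latency and returns a small payload. Replace with a real async HTTP client.
async def fetch(api_name: str, latency_s: float) -> dict:
    await asyncio.sleep(latency_s)  # simulates waiting on network I/O
    return {"api": api_name, "latency_s": latency_s}


async def poll_sequential(apis: dict[str, float]) -> list[dict]:
    # One request at a time: total time is roughly the sum of the latencies.
    return [await fetch(name, lat) for name, lat in apis.items()]


async def poll_once(apis: dict[str, float]) -> list[dict]:
    # All requests in flight at once: total time is roughly the slowest latency.
    # gather() returns results in the same order as the coroutines passed in.
    return await asyncio.gather(*(fetch(name, lat) for name, lat in apis.items()))


async def poll_every(apis: dict[str, float], interval_s: float, rounds: int) -> list[list[dict]]:
    # Poll all APIs concurrently every interval_s seconds, for a fixed number of rounds.
    history = []
    for _ in range(rounds):
        started = time.perf_counter()
        history.append(await poll_once(apis))
        elapsed = time.perf_counter() - started
        await asyncio.sleep(max(0.0, interval_s - elapsed))  # wait for the next tick
    return history


if __name__ == "__main__":
    apis = {f"api_{i}": 0.1 for i in range(10)}

    t0 = time.perf_counter()
    asyncio.run(poll_sequential(apis))
    print(f"sequential: {time.perf_counter() - t0:.2f} s")

    t0 = time.perf_counter()
    asyncio.run(poll_once(apis))
    print(f"concurrent: {time.perf_counter() - t0:.2f} s")
```

With ten simulated calls of 0.1 s each, the sequential version should take roughly 1 s and the concurrent one roughly 0.1 s. This pattern only helps while waiting on I/O; CPU-bound work still runs on one thread here, which is where multiprocessing (and the GIL discussion above) becomes relevant.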

If you agree with at least some of these points, I can prepare the text.

Feb 19 '21 11:02 Vlad-Radz

Hey, these are really good points! I'll def consider adding these to the image when I update it next time. Feel free to create a PR and add it to the markdown version. Thanks a lot for the contribution!

Apr 10 '21 14:04 alexandraabbas

Hey, thanks for the feedback! I will create the markdown version, sure!

Apr 17 '21 19:04 Vlad-Radz