torchtitan issues

reload existing llama checkpoints

10

tianyu-l

enhancement

add config option to only produce tensorboard logs on rank 0

tianyu-l

enhancement

[fused_rmsnorm] Avoid querying device inside forward

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #301
* #300
* #161

Get sm_count another way to work around issues with meta-device tracing. Note: this PR isn't strictly safe...

wconstab

CLA Signed

add doc for adding custom dataset

Per user request: we don't currently have any documentation on how to do this (basically, extend the hf_dataset file).

lessw2020

documentation

enhancement

Add Pipeline Parallel (and 2D PP+FSDP) support

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #340
* #337
* __->__ #318

Runs PP+DP and PP+TP without issue; runs PP+TP+DP with decreasing loss, but fails DCP save. Supports only...

wconstab

CLA Signed

Make Transformer tolerate missing layers for PP

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #318
* __->__ #322
* #321

A few small changes here let the manual PP frontend 'reconfigure' a whole transformer model to a stage's...

wconstab

CLA Signed

Refactor freqs_cis slice to be safer for PP

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #318
* #322
* __->__ #321

Unchanged: we precompute freqs_cis for max_seqlen, which is much larger than (>>) the seqlen of a given batch. Changed: instead of slicing self.freqs_cis...

wconstab

CLA Signed

selective compilation - norm layers only

2

This PR adds the option to selectively compile only the norm layers, and is mainly targeted at RMSNorm. By compiling just the norm layers when using rmsnorm, we get...

lessw2020

CLA Signed

Add support of DDP and CompiledAutograd.

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #319

fegin

CLA Signed

Question on Model Init

7

I noticed that there are two parts of the implementation that are related to model initialization.

### Instantiating the model with meta tensors
https://github.com/pytorch/torchtitan/blob/f72a2a0da0bdfc394faaab9b3c0f35d0b6f5be50/train.py#L177-L181

### Doing explicit model initialization
https://github.com/pytorch/torchtitan/blob/f72a2a0da0bdfc394faaab9b3c0f35d0b6f5be50/train.py#L209-L210

The...

XinDongol

question
