Dodo

Results: 4 comments by Dodo

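- The `train_files` and `validation_files` accepted by this argument are `list`s of file names, so each can hold a single file or multiple files. The code passes these two lists to `load_dataset`:

  ```python
  raw_datasets = load_dataset(
      extension,
      data_files=data_files,
      cache_dir=os.path.join(training_args.output_dir, 'dataset_cache'),
      use_auth_token=True if model_args.use_auth_token else None,
      **dataset_args,
  )
  ```

- You can also pass only a train file. In that case you must also set `validation_split_percentage`, and the dataset is split into train and validation sets according to that value.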
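As a rough illustration of the two cases described in this comment, here is a minimal sketch of how the `data_files` mapping and the split expressions might be derived. The helper name `plan_dataset_splits` is hypothetical and is not taken from the repository's code. The slice strings (`train[:5%]`, `train[5%:]`) follow the split-slicing syntax that `datasets.load_dataset` accepts, but taking the validation rows from the head of the train data is an assumption here, and the repository's own split logic may differ.

```python
from typing import Dict, List, Optional, Tuple


def plan_dataset_splits(
    train_files: List[str],
    validation_files: Optional[List[str]] = None,
    validation_split_percentage: Optional[int] = None,
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Hypothetical helper: decide which files and split expressions to load.

    Returns (data_files, splits), where `splits` maps a split name to the
    expression one could pass as `split=` to `load_dataset`.
    """
    if not train_files:
        raise ValueError("at least one training file is required")

    data_files = {"train": list(train_files)}

    # Case 1: explicit validation files were given, so use both lists as-is.
    if validation_files:
        data_files["validation"] = list(validation_files)
        return data_files, {"train": "train", "validation": "validation"}

    # Case 2: only train files, so a split percentage is required.
    if validation_split_percentage is None:
        raise ValueError(
            "validation_split_percentage is required when no validation files are given"
        )
    if not 0 < validation_split_percentage < 100:
        raise ValueError("validation_split_percentage must be between 1 and 99")

    pct = validation_split_percentage
    # Assumption: the first pct% of the train data becomes the validation set.
    splits = {"validation": f"train[:{pct}%]", "train": f"train[{pct}%:]"}
    return data_files, splits
```

With only `train_files=["a.json"]` and `validation_split_percentage=5`, the helper returns a single-entry `data_files` plus the two slice expressions, which could then be passed as `split=` in two separate `load_dataset(extension, data_files=data_files, split=...)` calls.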

Hi @KashiwaByte101 , Thanks so much for the quick and detailed reply! I really appreciate you taking the time to explain the token estimation logic. Your calculation method is super...