kitops

added support for huggingface dataset repositories

Open arnab2001 opened this issue 3 months ago • 2 comments

Description

Add Hugging Face dataset support to kit import by detecting dataset URLs and hitting the correct api/datasets/resolve endpoints, so that both models and datasets can be packaged.
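As a rough illustration of the detection step described above, the sketch below splits a Hugging Face URL into an owner/name pair and a dataset flag, then builds a file download URL. The helper names parseHFURL and resolveURL are hypothetical and are not taken from this PR. The URL shapes follow Hugging Face's public layout, where dataset repositories sit under a /datasets/ prefix; the actual kitops code may handle more cases.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseHFURL is a hypothetical helper (not the PR's code). It returns the
// "owner/name" repository and whether the URL points at a dataset.
func parseHFURL(raw string) (string, bool, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", false, err
	}
	if u.Host != "huggingface.co" {
		return "", false, fmt.Errorf("not a Hugging Face URL: %q", raw)
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	isDataset := false
	// Dataset repositories carry a leading "datasets" path segment.
	if len(parts) > 0 && parts[0] == "datasets" {
		isDataset = true
		parts = parts[1:]
	}
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return "", false, fmt.Errorf("expected owner/name in %q", raw)
	}
	return parts[0] + "/" + parts[1], isDataset, nil
}

// resolveURL is a hypothetical helper that builds a file download URL.
// Assumption: models use /{repo}/resolve/{ref}/{file} and datasets use
// /datasets/{repo}/resolve/{ref}/{file}.
func resolveURL(repo string, isDataset bool, ref, file string) string {
	prefix := ""
	if isDataset {
		prefix = "datasets/"
	}
	return fmt.Sprintf("https://huggingface.co/%s%s/resolve/%s/%s", prefix, repo, ref, file)
}

func main() {
	repo, isDataset, err := parseHFURL("https://huggingface.co/datasets/nvidia/ProfBench")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(repo, isDataset)
	fmt.Println(resolveURL(repo, isDataset, "main", "README.md"))
}
```

Keeping the dataset check in a Hugging Face-specific helper means a generic Git URL parser does not need to know about the /datasets/ prefix.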

Testing

go test ./...
KITOPS_HOME=$(mktemp -d) ./kit import https://huggingface.co/datasets/nvidia/ProfBench --ref main --tag test/profbench:latest --tool hf
./kit unpack test/profbench:latest --dir "$(mktemp -d)"

Linked issues

closes #1004

AI-Assisted Code

  • [x] This PR contains AI-generated code that I have reviewed and tested
  • [x] I take full responsibility for all code in this PR, regardless of how it was created

arnab2001, Nov 04 '25 12:11

Thanks for the review, @amisevsk. It was shortsighted of me to miss these things. I am working on resolving these comments.

arnab2001, Nov 05 '25 03:11

  • Created typed enums: RepositoryType in pkg/lib/hf/repo.go and hfRepoType in pkg/cmd/kitimport/hfimport.go. String literals ("dataset", "model") are no longer used in comparisons; all comparisons now use typed constants.

  • Separation of concerns: extractRepoFromURL() in util.go is now generic and handles only GitHub/Git URLs, while parseHuggingFaceRepo() in hfimport.go handles all Hugging Face-specific logic, including dataset detection. extractRepoFromURL() no longer mixes concerns.

  • Both enums use int with iota, as suggested. A mapping function, mapRepoType(), converts between the two enum types.
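For readers following along, here is a minimal sketch of the pattern these bullets describe: two int-based enums declared with iota and a function that maps one to the other. The type names RepositoryType and hfRepoType and the function name mapRepoType come from this PR. The constant names, the mapping direction, and the String method are assumptions made for illustration. In the PR the two types live in separate packages; they share one file here only so the example runs on its own.

```go
package main

import "fmt"

// RepositoryType stands in for the library-level enum (pkg/lib/hf/repo.go in the PR).
// The constant names below are assumed, not copied from the PR.
type RepositoryType int

const (
	RepoTypeModel RepositoryType = iota
	RepoTypeDataset
)

// String gives the enum a readable form for logs and error messages.
func (t RepositoryType) String() string {
	if t == RepoTypeDataset {
		return "dataset"
	}
	return "model"
}

// hfRepoType stands in for the command-level enum (pkg/cmd/kitimport/hfimport.go in the PR).
type hfRepoType int

const (
	hfRepoModel hfRepoType = iota
	hfRepoDataset
)

// mapRepoType converts the command-level enum to the library-level one.
// Assumption: unknown values fall back to the model type.
func mapRepoType(t hfRepoType) RepositoryType {
	switch t {
	case hfRepoDataset:
		return RepoTypeDataset
	default:
		return RepoTypeModel
	}
}

func main() {
	// Comparisons use typed constants, never the strings "dataset" or "model".
	if mapRepoType(hfRepoDataset) == RepoTypeDataset {
		fmt.Println("importing a", RepoTypeDataset, "repository")
	}
}
```

Because the two types are distinct, the compiler rejects passing an hfRepoType where a RepositoryType is expected, so the conversion has to go through the mapping function.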

@amisevsk, please check and let me know if any other changes are required. Thank you for your time.

arnab2001, Nov 05 '25 04:11