
Pass through links to .md files on GitHub

Open ewired opened this issue 9 months ago • 1 comments

If I use markitdown on a link to a README.md file on GitHub, it currently includes the entire header and footer navigation of GitHub's web interface, when I really only want the contents of the README.md. Should markitdown be able to convert the GitHub URL into a raw CDN URL and put it into the conversion pipeline, pass it through unaltered, or keep the current behavior?

ewired, Apr 22 '25 01:04

I'm not sure this is markitdown's responsibility, and it could set a precedent for lots of URL manipulations. I would suggest a small Python helper function that preprocesses your input before passing it to the markitdown class. Here's a simple, untested function you could use or start from.

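import requests

def fetch_github_file_content(github_url):
    # Rewrite the github.com blob URL to its raw.githubusercontent.com equivalent.
    raw_url = github_url.replace("github.com", "raw.githubusercontent.com").replace("/blob/", "/")
    response = requests.get(raw_url, timeout=10)
    if response.status_code == 200:
        return response.text
    # Report the failure and return None explicitly so callers can check for it.
    print("Failed to fetch data. Status code:", response.status_code)
    return None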

mat-0, May 04 '25 10:05
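
For anyone adapting the helper above: plain string replacement also rewrites URLs that are not file views (issues, pull requests, other hosts that happen to contain "github.com"). The following is a minimal sketch of a stricter variant that only touches URLs shaped like `github.com/<owner>/<repo>/blob/<ref>/<path>` and does no network access. The function name `github_blob_to_raw` and the `owner/repo` example URL are hypothetical, not part of markitdown.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com blob URL to raw.githubusercontent.com.

    Hypothetical helper: URLs that do not look like a blob view are
    returned unchanged, so it is safe to call on arbitrary input.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        return url

    # Expected path shape: <owner>/<repo>/blob/<ref>/<path...>
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        return url

    # Drop the "blob" segment; everything after it (ref and file path) is kept.
    owner, repo, _, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])

    # Query string and fragment (e.g. "?plain=1", "#L10") are discarded.
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


if __name__ == "__main__":
    print(github_blob_to_raw("https://github.com/owner/repo/blob/main/README.md"))
```

The returned URL could then be passed to markitdown in place of the original link, or fetched with `requests` as in the helper above. Whether that preprocessing belongs in user code or in markitdown itself is the open question of this issue.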