Pass through links to .md files on GitHub
If I use markitdown on a link to a README.md file on GitHub, it currently includes the entire header and footer navigation of GitHub's web interface, when I really only want the contents of the README.md. Should markitdown be able to convert the GitHub URL into a raw CDN URL and put it into the conversion pipeline, pass it through unaltered, or keep the current behavior?
I'm not sure this is markitdown's responsibility, and it could set a precedent for loads of URL manipulations. I would suggest a small Python helper function that rewrites your input before you pass it to the markitdown class. Here's a simple, untested function you could use or start with.
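import requests

def fetch_github_file_content(github_url):
    # Turn a github.com "blob" URL into its raw.githubusercontent.com equivalent.
    raw_url = github_url.replace("github.com", "raw.githubusercontent.com").replace("/blob/", "/")
    response = requests.get(raw_url)
    if response.status_code == 200:
        return response.text
    else:
        # Implicitly returns None on failure; callers should check for that.
        print("Failed to fetch data. Status code:", response.status_code)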
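One caveat with the plain string replacement above is that it also rewrites URLs that are not GitHub blob links, or that happen to contain "/blob/" further down the path. A slightly stricter variant could parse the URL first and only rewrite links of the form github.com/<owner>/<repo>/blob/<ref>/<path>. This is only a sketch: the name to_raw_github_url is made up for illustration, and it assumes the usual raw.githubusercontent.com/<owner>/<repo>/<ref>/<path> layout.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_github_url(url):
    """Return the raw-content URL for a GitHub blob link, or the input unchanged.

    Hypothetical helper, not part of markitdown.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<ref>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    # Drop any #fragment (e.g. a heading anchor); it has no meaning for raw content.
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, parts.query, ""))
```

The result can then be fetched with requests.get as in the helper above, while anything that is not a blob link passes through untouched.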