Microsoft releases a new Python tool for converting files and office documents to Markdown


Markdown is a popular lightweight markup language with a plain-text formatting syntax designed to be easy to read, write, and understand. Its consistent and predictable syntax makes it easy for AI algorithms to parse and understand the structure of text. It is also widely supported by popular tools, including GitHub, Jupyter notebooks, and more.
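To illustrate why that predictable syntax is easy for software to parse, here is a minimal sketch that pulls a heading outline out of Markdown text using only the Python standard library. The function name outline and the sample text are made up for illustration, and the pattern is a simplification: it does not skip "#" lines inside code blocks, for instance.

```python
import re

# Matches heading lines such as "# Title" or "## Section" (1 to 6 leading '#').
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")

def outline(markdown_text):
    """Return (level, title) pairs for each heading line, in document order."""
    pairs = []
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            pairs.append((len(match.group(1)), match.group(2)))
    return pairs

# Hypothetical sample document used only for this demonstration.
sample = "# Report\n\nIntro text.\n\n## Results\n\n- item one\n- item two\n"
print(outline(sample))
```

A single regular expression is enough here because each heading announces its own level on one line, which is the property that makes Markdown convenient for indexing and text analysis.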

Microsoft recently released an open-source tool called MarkItDown on GitHub. MarkItDown is a Python library for converting files and office documents to Markdown. The converted files can then be used for indexing, text analysis, and more. Microsoft's MarkItDown library currently supports the following file formats:

  • PDF (.pdf)
  • PowerPoint (.pptx)
  • Word (.docx)
  • Excel (.xlsx)
  • Images (EXIF metadata, and OCR)
  • Audio (EXIF metadata, and speech transcription)
  • HTML (special handling of Wikipedia, etc.)
  • Various other text-based formats (csv, json, xml, etc.)
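As a rough sketch of a basic conversion for the formats listed above, the snippet below checks a file's extension and then hands the file to MarkItDown. The convert() call and the text_content attribute are the ones used in this article; the helper names is_listed and to_markdown, the extension set, and the file names are assumptions made for illustration and are not part of the library.

```python
from pathlib import Path

# Extensions named in the list above. This is an illustrative subset,
# not the library's own table of supported formats.
LISTED_EXTENSIONS = {".pdf", ".pptx", ".docx", ".xlsx", ".html", ".csv", ".json", ".xml"}

def is_listed(path):
    """Return True if the file's extension appears in the article's format list."""
    return Path(path).suffix.lower() in LISTED_EXTENSIONS

def to_markdown(path):
    """Convert one file with MarkItDown and return the Markdown text."""
    # Imported lazily so is_listed() works even when markitdown is not installed.
    from markitdown import MarkItDown
    return MarkItDown().convert(str(path)).text_content

# Hypothetical file names, used only to exercise the extension check.
for name in ["slides.pptx", "Report.DOCX", "archive.zip"]:
    print(name, is_listed(name))
```

With the package installed, to_markdown("slides.pptx") would return the converted text, which could then be written to a .md file or passed to an indexer.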

Developers can also configure the MarkItDown library to use Large Language Models to describe images. To do this, they have to pass the mlm_client and mlm_model parameters to the MarkItDown object as below:

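from markitdown import MarkItDown
from openai import OpenAI

# The OpenAI client reads its API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# Parameter names as given in this article; check the README for your
# installed version, since they may differ.
md = MarkItDown(mlm_client=client, mlm_model="gpt-4o")

# Convert an image; the model supplies a description of its contents.
result = md.convert("example.jpg")

print(result.text_content)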

Since the MarkItDown library is available under the MIT open-source license, developers can freely use, modify, and distribute it. The only requirement is that they include the original license and copyright notice in their distribution.

Developers can download the MarkItDown Python library from its GitHub repository. They can also install it using the "pip install markitdown" command, or from the source using the "pip install -e ." command, run from the root of the cloned repository.
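After installing, a quick way to confirm that the package is visible to the current Python environment is to ask the standard library for its installed version. This is a minimal sketch; the helper name installed_version is made up for illustration, and it assumes the distribution name is "markitdown", matching the pip command above.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string for a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

version = installed_version("markitdown")
print("markitdown", version if version else "is not installed")
```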

NEW: Microsoft just dropped a library for converting Office files to markdown.

It's super fast and easy to use.

I built an app for you to try it out. Here it is converting a boilerplate pptx. pic.twitter.com/NrG6C5DCaq

— matt palmer (@mattppal) December 13, 2024

If you are not a developer, you can try out the MarkItDown library as a web app.

roosho Senior Engineer (Technical Services)
I am Rakib Raihan RooSho, Jack of all IT Trades. You got it right. Good for nothing. I try a lot of things and fail more than that. That's how I learn. Whenever I succeed, I note that in my cookbook. Eventually, that became my blog. 