• #503: The PyArrow Revolution

  • Apr 28 2025
  • Length: 1 hr and 9 mins
  • Podcast

#503: The PyArrow Revolution

  • Summary

  • Pandas is at a the core of virtually all data science done in Python, that is virtually all data science. Since it's beginning, Pandas has been based upon numpy. But changes are afoot to update those internals and you can now optionally use PyArrow. PyArrow comes with a ton of benefits including it's columnar format which makes answering analytical questions faster, support for a range of high performance file formats, inter-machine data streaming, faster file IO and more. Reuven Lerner is here to give us the low-down on the PyArrow revolution. Episode sponsors NordLayer Auth0 Talk Python Courses Links from the show Reuven: github.com/reuven Apache Arrow: github.com Parquet: parquet.apache.org Feather format: arrow.apache.org Python Workout Book: manning.com Pandas Workout Book: manning.com Pandas: pandas.pydata.org PyArrow CSV docs: arrow.apache.org Future string inference in Pandas: pandas.pydata.org Pandas NA/nullable dtypes: pandas.pydata.org Pandas `.iloc` indexing: pandas.pydata.org DuckDB: duckdb.org Pandas user guide: pandas.pydata.org Pandas GitHub issues: github.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy
    Show more Show less
adbl_web_global_use_to_activate_webcro768_stickypopup

What listeners say about #503: The PyArrow Revolution

Average customer ratings

Reviews - Please select the tabs below to change the source of reviews.