I’m excited to introduce a new web-based, language-agnostic tool that simplifies converting tabular data found on the web into code that imports it into data science languages like R, Python, and Julia. The tool is especially useful for data scientists and analysts who frequently work with data from the web but find it cumbersome to extract one-off tables and convert them into a format their analysis environment can import.
This project was inspired by the {datapasta} R package by Miles McBain and contributors. We sought to bring similar functionality to a web-based, multi-language environment so that each language doesn't need its own port of the package.
Overview
At its core, this tool is designed to eliminate the friction between finding tabular data on the web and getting it into your analysis environment. When you find a table on any webpage, simply highlight it, copy it, and paste it directly into the tool’s webpage.
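The post doesn't say how the tool reads what you paste. Browsers typically place both an HTML and a plain-text version of a copied table on the clipboard, so one plausible approach is to parse the HTML version. Here is a minimal sketch using Python's standard-library `html.parser`; the `TableRows` class and `rows_from_html` function are hypothetical names, not part of the tool, and the sketch ignores `colspan`/`rowspan` and nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace, roughly as a browser renders cell text
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:  # skip rows that contained no cells
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate an omitted </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def rows_from_html(fragment):
    """Return the table in an HTML clipboard fragment as a list of rows of strings."""
    parser = TableRows()
    parser.feed(fragment)
    parser.close()
    parser._end_row()  # flush a final row whose end tags were omitted
    return parser.rows
```

Header cells (`<th>`) are kept as an ordinary first row, which leaves the decision about which row holds the column names to the next processing step.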
From there, the tool automatically processes the table, detecting data types and cleaning column names, and then generates the appropriate code for your chosen programming environment. You can also preview the pasted table right underneath the generated code to confirm that the tool has parsed the data correctly!
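The exact cleaning and type-detection rules the tool uses aren't described here, so the following is only a minimal sketch of one plausible approach, starting from the pasted table as a list of rows of strings with the header first. Column names are converted to snake_case, prefixed with `x` if they would start with a digit, and de-duplicated with numeric suffixes; each column then gets the narrowest of `int`, `float`, `bool`, or `str` that fits every non-empty cell. All function names are hypothetical, and treating commas as thousands separators is an assumption that would misread locales using a decimal comma.

```python
import re

INT = re.compile(r"[+-]?\d+")
FLOAT = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def clean_name(name):
    """'Score (%)' -> 'score'; '2020 Pop.' -> 'x2020_pop' (hypothetical rules)."""
    name = re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").lower()
    if not name:
        return "x"
    # R, Python, and Julia identifiers cannot start with a digit
    return "x" + name if name[0].isdigit() else name


def dedupe(names):
    """Suffix repeated names so every column name is unique: ['a', 'a'] -> ['a', 'a_2']."""
    used, out = set(), []
    for name in names:
        candidate, k = name, 1
        while candidate in used:
            k += 1
            candidate = f"{name}_{k}"
        used.add(candidate)
        out.append(candidate)
    return out


def infer_type(values):
    """Pick the narrowest type that fits every non-empty cell."""
    # Assumption: commas are thousands separators, so they are dropped before matching
    cells = [v.replace(",", "") for v in values if v != ""]
    if not cells:
        return "str"
    if all(INT.fullmatch(v) for v in cells):
        return "int"
    if all(FLOAT.fullmatch(v) for v in cells):
        return "float"
    if all(v.lower() in ("true", "false") for v in cells):
        return "bool"
    return "str"


def convert(value, kind):
    """Turn one cell into a Python value; empty cells become None (missing)."""
    if value == "":
        return None
    if kind == "int":
        return int(value.replace(",", ""))
    if kind == "float":
        return float(value.replace(",", ""))
    if kind == "bool":
        return value.lower() == "true"
    return value


def process_table(rows):
    """rows[0] is the header; returns {clean_name: (type, values)} in column order."""
    header, body = rows[0], rows[1:]
    table = {}
    for i, name in enumerate(dedupe([clean_name(h) for h in header])):
        # Short rows are padded with empty (missing) cells
        raw = [row[i].strip() if i < len(row) else "" for row in body]
        kind = infer_type(raw)
        table[name] = (kind, [convert(v, kind) for v in raw])
    return table
```

Checking `int` before `float` keeps whole-number columns as integers, and a single non-numeric cell demotes the whole column to `str` rather than silently dropping data.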
For R users, the tool generates code for popular data analysis frameworks: Base R, the Tidyverse’s tibble, data.table, or R Polars. Python users can choose between pandas, Polars, or datatable. Lastly, Julia users get code formatted for DataFrames.jl. This flexibility means you can integrate one-off web data into your existing workflow, regardless of your preferred environment, without relying on a web scraper.
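To illustrate how one parsed table can be emitted for several frameworks, here is a minimal code-generation sketch. It takes a dict of already-cleaned column names mapped to lists of Python values (`None` for missing) and renders each value as a literal in the target language. The `generate_code` and `literal` functions, the target names (`pandas`, `polars`, `base_r`, `tibble`, `julia`), and the output layout are hypothetical rather than the tool's actual output; data.table, R Polars, and Python datatable are omitted to keep it short, and column names are assumed to already be valid identifiers.

```python
import json

# Hypothetical target names mapped to the language they emit
TARGET_LANG = {"pandas": "python", "polars": "python",
               "base_r": "r", "tibble": "r", "julia": "julia"}


def literal(value, lang):
    """Render one Python value as a literal in the target language."""
    if value is None:
        return {"python": "None", "r": "NA", "julia": "missing"}[lang]
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"python": str(value), "r": str(value).upper(),
                "julia": str(value).lower()}[lang]
    if isinstance(value, int):
        return f"{value}L" if lang == "r" else str(value)  # 3L keeps R integers integer
    if isinstance(value, float):
        return repr(value)
    text = json.dumps(str(value), ensure_ascii=False)  # double-quoted, escaped string
    if lang == "julia":
        text = text.replace("$", "\\$")  # '$' starts interpolation in Julia strings
    return text


def generate_code(columns, target):
    """Emit code that rebuilds `columns` ({name: [values]}) for one target framework."""
    lang = TARGET_LANG[target]

    def values(vals):
        return ", ".join(literal(v, lang) for v in vals)

    if lang == "python":
        body = ",\n".join(f"    {json.dumps(name)}: [{values(vals)}]"
                          for name, vals in columns.items())
        module, alias = ("pandas", "pd") if target == "pandas" else ("polars", "pl")
        return f"import {module} as {alias}\n\ndf = {alias}.DataFrame({{\n{body}\n}})"
    if lang == "r":
        body = ",\n".join(f"  {name} = c({values(vals)})"
                          for name, vals in columns.items())
        if target == "tibble":
            return f"library(tibble)\n\ndf <- tibble(\n{body}\n)"
        return f"df <- data.frame(\n{body}\n)"
    body = ",\n".join(f"    {name} = [{values(vals)}]"
                      for name, vals in columns.items())
    return f"using DataFrames\n\ndf = DataFrame(\n{body}\n)"
```

For example, `generate_code({"count": [3, None]}, "base_r")` produces a `data.frame` call containing `count = c(3L, NA)`, while the `julia` target renders the same column as `count = [3, missing]`. Keeping literal rendering in one function means adding a framework mostly comes down to writing a new wrapper template.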
Let me try it already!
Interested in using the tool? Of course you are!
Click here to access the tool.
Once there, follow these simple steps:
- Copy any HTML table from a webpage (Ctrl/⌘ + C)
- Paste it directly into the tool (Ctrl/⌘ + V)
- Select your preferred language and framework
- Copy the generated code
- Paste it into your analysis environment
Voila! You’ve successfully imported web-based data into your analysis environment!
Fin
I hope you find this tool useful in your data science workflow. If you have any feedback or suggestions, please feel free to reach out to me on socials.