Hacker News
Visualizations for machine learning datasets (github.com/pair-code)
178 points by happy-go-lucky on Oct 8, 2017 | 7 comments


I'm not sure why this is described as being for "machine learning datasets" when it actually sounds useful for any exploratory data analysis, regardless of what the user intends to do with the data. Anyway, this looks super slick, and I definitely want to try it out in a Jupyter notebook.


Probably for the same reason that predictive models are described as powered by “AI” instead of by “statistical learning”: Some terms are simply more in vogue even if they’re more vague.


For nice-looking basic statistics of your dataset, the pandas-profiling library will do the trick: https://github.com/JosPolfliet/pandas-profiling
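
To make "basic statistics of your dataset" concrete, here is a minimal standard-library sketch of the kind of per-column summary such a report contains. It is not pandas-profiling's API or output; the function names `summarize_column` and `summarize_table` and the sample rows are made up for illustration, and `None` is assumed to mark a missing value.

```python
import statistics


def summarize_column(values):
    """Return basic statistics for one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(present),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    # Numeric summaries only make sense when every present value is a number.
    is_numeric = present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        summary["mean"] = statistics.mean(present)
        summary["min"] = min(present)
        summary["max"] = max(present)
    return summary


def summarize_table(rows):
    """rows is a list of dicts; returns {column_name: summary}."""
    # Collect every column name first so rows lacking a key count as missing.
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    return {
        name: summarize_column([row.get(name) for row in rows]) for name in names
    }


# Hypothetical sample data, not taken from any real dataset.
rows = [
    {"age": 31, "city": "Oslo"},
    {"age": 45, "city": "Lima"},
    {"age": None, "city": "Oslo"},
]
report = summarize_table(rows)
for name, summary in report.items():
    print(name, summary)
```

A real profiling tool adds histograms, correlations, and an HTML report on top of this, but counts, missing values, distinct values, and simple numeric summaries are the core of it.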


The front page, https://pair-code.github.io/facets/, is more useful because you can load your own data and try it out.


They have an example of using the tool on the Quick, Draw! dataset: https://pair-code.github.io/facets/quickdraw.html


Does this also work for larger numbers of features? (like, 2000, as opposed to 6-9 in the demo)


Only if your dataset is really small. It only supports up to the low millions of points.
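
A rough back-of-the-envelope sketch of why the feature count matters, assuming "points" here means individual values (rows times features) and taking 2,000,000 as a stand-in for "low millions". Both are assumptions for illustration, not documented limits of the tool, and `max_rows` is a hypothetical helper.

```python
def max_rows(value_budget, num_features):
    """Rows that fit if every row contributes one value per feature."""
    return value_budget // num_features


# Assumed budget standing in for "low millions of points".
budget = 2_000_000

# 9 features matches the upper end of the demo; 2000 is the case asked about.
for features in (9, 2000):
    print(f"{features} features -> about {max_rows(budget, features)} rows")
```

Under those assumptions, going from 9 features to 2000 shrinks the usable row count by a factor of a couple hundred, which is why only a really small dataset would fit.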



