Kernel dies in a notebook that manipulates a large Pandas dataframe
At a glance
A user is working with large Pandas dataframes (5-10M rows) in a notebook whose kernel keeps dying, and is unsure whether to allocate more resources, make the notebook more resource-efficient, or take some other action. Commenters suggest providing the stack trace and creating a reproducible notebook (using synthetic data if necessary) to help diagnose the issue. They also note that some performance improvements have been made to Pandas, so the user should check whether the issue persists on the latest version (0.9.32 or above).
The notebook loads a couple of Pandas dataframes, each with 5-10M rows, filters each down to 3-5M rows, samples 10% of the rows, and plots various charts. I'm unsure whether to allocate more resources, make my notebook more resource-efficient, or do something else. I can provide the stack trace if helpful.
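For reference, here is a minimal sketch of a more memory-efficient version of that workflow. The file path, column names, and filter condition are hypothetical stand-ins for whatever the real notebook uses; the idea is to load only the columns the charts need with compact dtypes, and to filter and sample as early as possible so peak memory stays low.

```python
import pandas as pd

# Hypothetical file and column names for illustration.
CSV_PATH = "events.csv"

# Read only the needed columns, with memory-efficient dtypes,
# so a 5-10M row frame takes far less RAM.
df = pd.read_csv(
    CSV_PATH,
    usecols=["timestamp", "category", "value"],
    dtype={"category": "category", "value": "float32"},
    parse_dates=["timestamp"],
)

# Filter down to the rows of interest (e.g. 3-5M of the original rows),
# then immediately take the 10% sample used for plotting.
mask = df["value"] > 0  # placeholder filter condition
sampled = df.loc[mask].sample(frac=0.10, random_state=42)

# Drop the full frame before plotting so only the sample stays resident.
del df

sampled.plot.scatter(x="timestamp", y="value")
```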
The stack trace would be helpful. If you could put together a notebook that reproduces the issue (using synthetic data if your data is private), that would be very helpful.
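As a starting point, a synthetic-data reproduction could look something like the sketch below. The row count, schema, and filter are assumptions meant to mirror the described workflow (load, filter, 10% sample, plot), not the actual data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the private data: similar row count, rough schema.
rng = np.random.default_rng(0)
n_rows = 8_000_000

df = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=n_rows, freq="s"),
    "category": rng.choice(["a", "b", "c"], size=n_rows),
    "value": rng.normal(size=n_rows),
})

# Reproduce the same filter -> sample -> plot steps as the real notebook.
filtered = df[df["value"] > -0.5]   # placeholder filter condition
sampled = filtered.sample(frac=0.10, random_state=42)
sampled["value"].plot.hist(bins=50)
```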