Updated 2 weeks ago

Is it possible to make variables available by default in marimo notebooks? kedro + marimo

Hi, I work with Kedro at work. Kedro is a framework for building DS/ML pipelines. I don't love it, but it's what my company uses and I think we can make it work; it's pretty customisable at least.

Kedro is very geared towards Jupyter notebook users. They rely on a magic command to load special Kedro variables such as the catalog, conf file, etc. I used to love magics when I first got to know Jupyter, but nowadays I really don't like them anymore.

The point is, this magic makes some variables available in the global namespace so that the user doesn't have to worry about loading the right config/catalog. This seems like a patchy fix because, to be honest, they don't have a robust solution for loading these files from a simple Python script or, for example, a marimo notebook. They also have special commands, kedro jupyter lab and kedro jupyter notebook, that "inject" these variables for the user beforehand. See a thread on the issues with Kedro configs + marimo here: https://kedro-org.slack.com/archives/C03RKP2LW64/p1736872610928369

I wonder if it'd be possible to have a similar kedro marimo edit command to integrate marimo into this framework.
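
One way to picture such a command, as a rough sketch only: a small launcher that starts marimo edit from the Kedro project root and hands the root to the notebook through an environment variable. The function names build_env, build_marimo_command and launch, and the variable KEDRO_PROJECT_ROOT, are made up for illustration; neither Kedro nor marimo defines them. Unlike the Jupyter integration, nothing is injected into the notebook's namespace here, so a cell would still have to create the session itself.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path


def build_env(project_root: Path) -> dict[str, str]:
    """Copy the current environment and add the (hypothetical) project-root variable."""
    env = os.environ.copy()
    env["KEDRO_PROJECT_ROOT"] = str(project_root)  # made-up name, not read by Kedro itself
    return env


def build_marimo_command(notebook: Path) -> list[str]:
    """Return the argv that opens `notebook` in the marimo editor."""
    return ["marimo", "edit", str(notebook)]


def launch(notebook: Path, project_root: Path) -> int:
    """Run `marimo edit` with the project root as the working directory."""
    completed = subprocess.run(
        build_marimo_command(notebook),
        cwd=project_root,
        env=build_env(project_root),
        check=False,  # return the exit code instead of raising
    )
    return completed.returncode
```

Inside the notebook, a cell could then read os.environ["KEDRO_PROJECT_ROOT"] and pass it on as the project path when creating the session.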

PS: With all the work you've done on data flows and DAGs I think you actually are in a prime position to create a clean ML pipeline framework, would love to hear your thoughts on that. @Akshay @Myles Scolnick
5 comments
Thanks for sharing, @lucha6. It would be awesome to have better integration with Kedro, and you are probably the expert on how the two can be integrated. Magic commands usually just map back to Python, so maybe there is a util you can import to do this, e.g.:
Plain Text
import kedro
kedro.init()  # hypothetical call, for illustration only

Are magics the only way to integrate with Kedro? Could you maybe file a ticket upstream so we can chime in?
I'll post this as a gh issue in the kedro repo
Thanks @Myles Scolnick. I found out that they currently have this magic to load the Kedro config variables into the global namespace, but they recommend against calling it directly as a function. I imagine that is because it would fail if get_ipython is run outside an IPython or Jupyter session, so it's not written to work from everywhere. They do have a way to determine the Kedro project root programmatically, the function _find_kedro_project, which could be used to set up these variables in a cleaner, more programmatic way from anywhere in the Kedro project, like:

Plain Text
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from kedro.utils import _find_kedro_project  # private helper, may change between releases

# Locate the project root by searching upwards from this notebook's directory
notebook_path = Path(__file__).resolve().parent
project_root = _find_kedro_project(notebook_path)
if project_root is None:
    raise RuntimeError(f"No Kedro project found above {notebook_path}")

# Configure the project (settings, pipelines) before creating a session
bootstrap_project(project_root)

# Now create the session and load the catalog
with KedroSession.create(project_path=project_root) as session:
    context = session.load_context()
    catalog = context.catalog
    weekly_sales = catalog.load("weekly_product_subgroup_sales")


That way you don't have to hardcode the project root or derive it by hand with something like Path(__file__).parent.parent.
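
To make the idea of an upward lookup concrete, here is a minimal standard-library sketch. It is not Kedro's implementation: find_project_root is a hypothetical name, and the assumption is simply that a Kedro project can be recognised by a pyproject.toml containing a [tool.kedro] section.

```python
from __future__ import annotations

from pathlib import Path


def find_project_root(start: Path, marker: str = "[tool.kedro]") -> Path | None:
    """Walk up from `start` and return the first directory whose
    pyproject.toml mentions `marker`, or None if there is none."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        pyproject = candidate / "pyproject.toml"
        if pyproject.is_file() and marker in pyproject.read_text(encoding="utf-8"):
            return candidate
    return None
```

Because the search starts from the notebook's own directory, the same cell keeps working if the notebook is moved deeper into or higher up the project tree, which a fixed chain of .parent calls would not.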
Thank you, good to hear the feedback from the team