
Programmatically interact with marimo

At a glance

The community members discuss whether the marimo kernel can be driven programmatically, without going through the user interface. The consensus is that this is not currently possible: the internal APIs are subject to drastic change, and it is too early to standardize them.

The community members then compare marimo and ipyflow, two open-source libraries that take different approaches to recovering a notebook's dependency structure. marimo uses static analysis, while ipyflow relies on runtime analysis, which the discussion suggests carries overhead and unavoidable edge cases.

The community members also discuss why a jupyter-server-style API cannot yet be abstracted out of marimo: the internal abstractions are still evolving rapidly, and the developers are not ready to commit to a stable public API. They suggest this may become possible as the project matures.

Is it possible to interact with the kernel directly in a programmatic way, without going through the UI? I want my Python backend server to be able to add, remove, and edit cells, get state, and get the dependency/DAG structure of a notebook.
TY
No, that's not possible today. All those operations are part of an internal API that is subject to change drastically, and it's unfortunately too early to standardize.
Out of curiosity, what is your use case?
Thanks for the prompt reply!
This is for a small project I'm working on that involves running Python code in a reproducible fashion. It's for the same kind of use cases people use notebooks for (explore -> pipeline).
The crux of what I'm interested in is playing around with the UI/frontend, so basically not a notebook. I was hoping there was a clean way to manipulate code cells and get info about the relationships between them.

I really like what you're doing with marimo. I also checked out ipyflow and hex.tech.

I think ipyflow might give me some ability to do this if I interact with it through the REST(-ish?) API of the Jupyter server.
I've looked at the internal (marimo/_ast) modules and see what you mean. I'm a bit scared of touching that; there's a lot going on that I don't understand.

In any case, last night this got me interested in how these systems even parse the Python code to figure this out, and I've made a simple static analysis thingy (a rough version is sketched below). Excited about adding more features and playing around with it.

I think I might keep doing this from scratch; it would be educational. Very grateful for the ipyflow papers, which give a lot of insight into how this works.
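For context, a defs/refs extractor of that sort can start as small as the toy below, built on Python's ast module. This is a sketch, not marimo's implementation; the real thing (marimo/_ast/visitor.py) handles scoping, comprehensions, augmented assignment, and many other cases this ignores.

```python
import ast

def defs_and_refs(cell_code: str) -> tuple[set[str], set[str]]:
    """Toy static analysis: which names a cell defines vs. references."""
    defs: set[str] = set()
    refs: set[str] = set()
    for node in ast.walk(ast.parse(cell_code)):
        if isinstance(node, ast.Name):
            # Assignment targets are definitions; everything else is a read.
            (defs if isinstance(node.ctx, ast.Store) else refs).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defs.add(alias.asname or alias.name.split(".")[0])
    return defs, refs

# y is defined; f and x are referenced (set ordering may vary when printed).
print(defs_and_refs("y = f(x) + 1"))
```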
Actually, I'm curious: are all three of you players in the space (marimo, hex, ipyflow) doing the DAG parsing in fundamentally the same way?

It seems only marimo and ipyflow are open source, and I wonder how you see the comparison.
Yes, I think that would definitely be very educational! The parsing is in marimo/_ast/visitor.py.
marimo and ipyflow take very different approaches. ipyflow uses runtime analysis, trying to react to mutations, whereas marimo relies exclusively on static analysis, meaning the DAG only takes into consideration variable definitions and variable references. In particular, we don't track mutations to objects at all, and we don't do any runtime tracing of your code.

This is intentional: tracing and trying to detect mutations is a losing battle. It's a fundamentally impossible task in Python, and there will always be edge cases you can't cover, leading to a poor development experience. I used to work on TensorFlow, where a team tried to do runtime tracing of Python code to detect mutations, and it just didn't really work. In contrast, in marimo it's easy for the user to understand how their DAG will be formed.
One day we'll write a paper about it. I've written a small amount about how all this works in this blog post: https://marimo.io/blog/lessons-learned
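To make the static approach concrete, here is a minimal sketch (hypothetical cell contents, not marimo's internals) of how a DAG falls out of definitions and references alone, and why a mutation like lst.append(4) is invisible to it:

```python
# Per-cell (definitions, references), as a static analyzer might extract them.
cells = {
    "c1": ({"lst"}, set()),      # lst = [1, 2, 3]
    "c2": ({"total"}, {"lst"}),  # total = sum(lst)
    "c3": (set(), {"lst"}),      # lst.append(4): the mutation reads lst but
                                 # defines nothing, so defs/refs alone cannot
                                 # order c3 relative to c2
}

# Draw an edge from the cell defining a name to every cell referencing it.
edges = {
    (parent, child)
    for parent, (defs, _) in cells.items()
    for child, (_, refs) in cells.items()
    if parent != child and defs & refs
}

print(sorted(edges))  # [('c1', 'c2'), ('c1', 'c3')]
```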
Wow, this is so helpful, thank you so much!
I did watch a JupyterCon presentation about ipyflow where the speaker mentioned runtime analysis, and I also intuitively thought that would be really hard.

So if it is a losing battle, how would you evaluate the current state of ipyflow? Are they missing edge cases?

Also, on your point about not being able to abstract out the jupyter-server-style API (what I asked originally): what do you feel is complicated about it? Isn't that abstraction naturally separate from everything else, so it could be neatly packaged?
So if it is a losing battle, how would you evaluate the current state of ipyflow? Are they missing edge cases?

Missing edge cases, yes. There's also a runtime overhead, something between 2-4x I believe. I talked with Stephen Macke somewhat recently, and he believes the static approach we've taken is better for both users and developers. I spoke with Chris Lattner, and he also endorses the static approach; that's as good an endorsement as you can get in my book 🙂
Also, on your point about not being able to abstract out the jupyter-server-style API (what I asked originally): what do you feel is complicated about it? Isn't that abstraction naturally separate from everything else, so it could be neatly packaged?

In theory it's of course doable. But Myles and I are developing really rapidly, and our internal abstractions change just as rapidly (though our public API is relatively stable). We can't abstract the internals into a public API until we're ready to commit to not changing it; we don't want to break our users' code. So it's not that it's hard to expose our APIs, it's just too early.
If you like, you can also check out a talk I gave on marimo at North Bay Python: https://www.youtube.com/watch?v=9R2cQygaoxQ&t=1s
Ty, this is very helpful!