practical copy/paste clipboard model of recent web HTML5 browsers?

Question

(The reader is expected to be an advanced Linux developer, having read ALP and having developed advanced GUI applications on Linux using GTK or Qt. Note that, sadly, I am not a native English speaker; I am French.)

I have difficulty understanding the conceptual model of the copy/paste clipboard in recent HTML5 browsers (e.g. Firefox 60.8, that is /usr/bin/firefox, or Chrome 75.0, on Linux/Debian, as released this year, 2019). This is in the context of the Bismon applied research project, with a low TRL, which provides a domain-specific, dynamic and transpiled language (also called Bismon), already has generic Web interface machinery conceptually inspired by ocsigen, and is orthogonally persistent.

In X11, the model (see ICCCM & EWMH) starts by negotiating a common data format and knows about WM_CLIENT_MACHINE and _NET_WM_PID. This is why we can copy/paste images and rich text from Firefox to Libreoffice, even if they run on different X11 client hosts.

But let's suppose I have two instances (running on two different Linux hosts) of the same single-page web application, running in two different tabs of the same Linux browser. That application is bismon, a GPLv3+ "research prototype" which is a specialized HTTP server above libonion, with already existing components generating C (bismon is a transpiler), JavaScript and HTML5 (the CSS being handwritten by me). Both tabs are running some syntactic editor (served by their specialized bismon web servers), so they are manipulating abstract syntax trees (representable in some serialized textual format, conceptually like S-expressions, XML, YAML or JSON; without loss of generality it could be exactly JSON). And I want to copy/paste one abstract syntax sub-tree from one tab to another. My continuously updated Bismon draft report gives further details, notably in its chapter 4.

There is an already working, but very incomplete, Web interface in bismon commit 980c2d6ff2df2, with a working login form (similar in functionality to the StackOverflow login form) setting some HTTP session cookie, in practice a random and unique one, such as BISMONCOOKIE=n000041R970099188t330716425o_6IHYL1fOROi_58xJPnBLCTe. Every user (so every Bismon web browser tab) is allowed to interact, in a single-page application fashion, only after having successfully logged in (conceptually analogous to the StackOverflow login procedure). Hence, exactly like I could have two StackOverflow users and log in as each of them in two different web browser tabs, I might have two or three Bismon web browser tabs logged in (from Bismon's perspective) differently. Each of these tabs is a single-page application browser tab (with a different and unique BISMONCOOKIE). Here is an already working example of the Bismon login form (with ./bismon serving HTTP through libonion on port 8086 of localhost): [image: Bismon login form]

A single physical person is running the firefox browser on a Linux workstation (with a single Xorg display server showing that browser's X11 window) with several tabs. Later, several real physical persons (Alice, Bill and me, the static analysis expert) could use different laptops (running Linux) to access the same (or even several different) Bismon processes using HTTP. The hard case is probably the one with two different Bismon servers accessed from the same browser by the same physical person (who wants to copy/paste content from one Bismon process to another one).

Here is a figure (its SVG source is here) showing an ideal dreamed situation (at end of 2020):

But today in 2019, Bill and the static analysis expert are actually the same physical person (me Basile) using the same single firefox browser (running on one powerful Debian workstation) in two different tabs (and Alice could also be impersonated by me, in a third tab). And I want to copy/paste a structured content from one tab (where I have Bismon-logged in as Bill) to another one (where I have Bismon-logged in as the static analysis expert).

After a successful login with the above form, the tab has a Bismon user (technically, some web session Bismon object associated with a contributor object, as explained in the Bismon draft report §1.6.3 and §4.2; the web session object is referred to by the BISMONCOOKIE), and Bismon gives the following generated XHTML5:

 <?xml version="1.0" encoding="utf-8"?>
 <!DOCTYPE html>
 <html xmlns="http://www.w3.org/1999/xhtml"> 

 <head >  

 <title >Bismon</title>  
 <script src="/jscript/jquery.js" type="text/javascript"/>  
 <script src='/jscript/jquery-ui.js' type='text/javascript'/>  
 <script src='/bismon-root-jsmodule.js' type="text/javascript"/>  
 <script src='/jscript/bismon-hwroot.js' type="module"/>  
 <link href='/themes/first-theme.css' rel='stylesheet' type='text/css'/>  
 <link href='/css/jquery-ui.css' rel='stylesheet' type='text/css'/> 
 </head> 
 <body >  

 <h1 >Bismon</h1>  
 <nav class='bmcl_topnav' id='topnav_8LMWqayq5sW_9G2xsSpA0yS' >   

 &#9755;   
 <button class='bmcl_topbut' id='topbut_4m9twhDXB7k_88CBTgLfGvs' >  App </button>  
 </nav>  
 <p class='bmcl_hellopara' id='hellop_0uAT1v6dH9d_1o3q8wzbV7K' >  Hello Basile Starynkevitch  your web session is  <tt class='bmcl_cookie'>BISMONCOOKIE=n000001R59317675t289012178o_5FKgTFl64f2_2h8Y79EvsK7</tt> </p>  
 <ul class='bmcl_topmenu ui-menu' id='topmenu_2hnb4LnCzga_48CQrsBJofR' >   

 <li class='bmcl_topmenutitle ui-menu-item ui-state-disabled' id='topmtitle_6G1xOyeten5_7SqZ4EcQe8T' ><div >application : </div></li>   
 <li class='bmcl_topmenuitem ui-menu-item' id='topmitem_1SiDnlyQRR6_5meHUV4d3iF' ><div >dump</div></li>   
 <li class='bmcl_topmenuitem ui-menu-item' id='topmitem_9ZmJrhdpjae_79WiEHOVpbE' ><div >exit</div></li>   
 <li class='bmcl_topmenuitem ui-menu-item' id='topmitem_2nguorns5mY_2UnseYw0xRf' ><div >quit</div></li>  
 </ul> 
 </body>
 </html>
 <!-- end root-web-handler o_webex=_7rOPSVsyZnS_31DSTvb99w7; o_websess=_5FKgTFl64f2_2h8Y79EvsK7 at 2019 Jul 26, 05:15:35.52 MEST -->

A quite generic existing infrastructure in Bismon is capable of generating quite arbitrary XHTML5 (with SVG!) code like above (from some Bismon specific runtime data). A generic infrastructure also exists in Bismon to generate JavaScript code (transpiled from some Bismon specific domain specific language).

My ambition is to code, in my Bismon system, something with a fancy web interface, capable of editing some abstract syntax tree, perhaps appearing in the Web browser tab in a way close to the below figure (taken from wikipedia):

In the future, the Bismon user would have a tab with a content similar to above figure, and might, for example, click on the while box, and conveniently replace it with some until box. That idea (of syntax-oriented visual editors) is not new: Centaur implemented a similar idea in the 1980s. I want to implement a similar thing in Bismon using Web technologies. And I want to copy/paste, from one tab to another of the same Firefox browser, entire, well formed, abstract syntax sub-trees (or, at the conceptual level, well written S-expressions representing such AST subtrees)

The general use case is several Bismon processes A, B, .... Each of them is HTTP-serving and single-page-application filling browsers tabs TA (for A), TB (for B), ... I want to copy/paste some AST part (an abstract syntax subtree) from TA to TB. The same human person could be logged in (thru the login form shown above) as three different Bismon users and using three different tabs TA, TB, TC.

question

How should I design such a thing? FWIW, every software involved (Bismon, web browsers, etc.) is (contractually, in the H2020 project funding that work) open source software on Linux. And Bismon is in July 2019 at TRL 2 and might, if all goes well, reach TRL 3 at the end of 2020.

Note that I am not asking about AJAX code manipulating the DOM. I am asking about the concepts behind copy/pasting (of some structured tree-like data, expressible in XML, S-expressions or JSON, and displayed as nested HTML5 or SVG DOM elements) between two different tabs of the same browser. Also, I would like the copy source and paste destination tabs (hence their different web servers) to communicate some data which has no visual appearance (preferably even without any display:none HTML5 element).

In other words, I am trying to find and understand the equivalent of ICCCM & EWMH for web technologies, about copy/pasting between two tabs of the same recent Firefox (or Chrome) browser on Linux. My feeling (just a guess) is that it is frowned upon (for security reasons) to copy/paste between two different tabs, but I don't know the details. I did find this W3C clipboard API, but I am guessing most of it is not yet implemented today. What exactly is available today in practice on recent Linux browsers? Also, real-life code examples (working with Firefox 60.7 on Debian/Linux/x86-64) are welcome!

My question could be rephrased as: how to copy/paste, using Linux with a recent Xorg and some EWMH compliant window manager only (I don't care about other OSes at all!), some textual format content (probably JSON, but it could be my own Bismon textual format) with its MIME type from one tab (driven by Bismon on Linux host A) to another tab (single page web application tab of Bismon on Linux host B) of the same browser? Ideally, I would prefer not changing the DOM at all (exactly in the same spirit of EWMH), but if possible I don't want a visual change of it (since the actual DOM modification would be controlled by Bismon AJAX or WebSocket handshakes or exchanges).

The several tabs are illustrated in the figure bismon-monitor.svg. In that figure, in some weird cases, Alice, Bob, and the left-side static analysis expert could all be impersonated by just me, Basile, Bismon-logged in three times as three different Bismon users, using three different tabs of the same Firefox browser (on Linux). The Bismon server (the bismon monitor in that figure; in weird cases, we could even imagine two or three Bismon monitor processes running on different machines) is also running on Linux and serving HTTP using libonion. I want to copy/paste semantic contents representing complex ASTs (Bismon objects, in my parlance) from one browser tab to another one. If I were using GTK or Qt I would be able to code that without issues (since both have a flexible, generic, well documented clipboard and copy/paste related API).

From a user point of view, I am almost asking about the detailed design of some collaborative software tool, using Web technologies, and capable of editing some sophisticated proof (or mathematical text or wiki with formulae) within a small team.

My draft report has a dozen more pages about my ideas (and references to related systems as old as Centaur and Mentor). I want to implement these ideas using modern Web technologies in my bismon GPLv3+ system. If I were using GTK or Qt, implementing these ideas would be just a matter of coding (using also ssh -X or similar stuff). But I am less familiar with web technologies. However, Google Docs is capable of copy/pasting like I dream of.

I was further thinking of copy/pasting HTML elements, from a browser tab TA interacting with bismon process A running on port 8086 of localhost to another bismon process B, running on port 8087 of localhost and shown in browser tab TB. Such copied HTML elements might contain <a href='http://localhost:8086/somequery?param1=val1&param2=val2'><span class='some_cl'>some <b>content</b></span></a> etc..? Could that work?

Don't forget that this is a research project with a very low TRL. I can make it work with even one browser (the latest Linux Firefox or Chrome being my personal preference)

To summarize my question :

what are the ideas of the design of copy/pasting from one browser tab to another one some complex structured contents in Google Docs or in TinyMCE (with several HTTP wiki servers involved!) ? How would you, the hypothetical software architect of such applications, guide the junior developer coding them?

I heard that it might be difficult for security reasons. The intuition is that a malicious web site (running in different browser tab) should not be even able to copy the credit card number I have just filled in another browser tab used for the legitimate web interface to my bank account.

PS: As of July 2019 I am a quite senior software developer, aged 60 (with a PhD in CS from 1990), coding professionally since 1985. But as a web developer I am still a newbie: I have some academic knowledge about HTTP, cookies, HTML5, DOM, AJAX, JavaScript, ... but very little concrete practical coding experience.

PPS. See also this.

I don't understand the relation between the Linux kernel (or libc) and the copy/paste abilities. IMHO they depend only on the browser version. I am ready to install the latest browsers (since EWMH is quite old, and firefox or chrome browsers are hopefully EWMH compliant) — Basile Starynkevitch, Jul 25 '19 at 09:23
Are you trying copy/paste across OS user sessions/machines? You make a couple of comments that suggest something to that effect, maybe? — JimmyJames, Jul 25 '19 at 15:41
If it's correct that you are trying to have data shared across machines and users, can you explain why using the clipboard is important? I'm not seeing why that needs to be involved. In other words, I feel like you are talking about two different (mostly orthogonal) things: delivering data across the web and working with the OS clipboard. — JimmyJames, Jul 25 '19 at 16:02
No different OSes. Every software involved is contractually some open source software running on Linux. — Basile Starynkevitch, Jul 25 '19 at 17:55
Sorry, I wasn't clear. I mean are you talking about networked users? I'm a little unsure about "Alice, Bob, and the left-side static analysis expert (that is me), could be using three different tabs on the same Firefox browser". Do you mean 3 users at different computers? — JimmyJames, Jul 25 '19 at 18:13
I improved even more my answer. Remember, it is a research project (so quite low TRL) — Basile Starynkevitch, Jul 26 '19 at 02:23
@BasileStarynkevitch OK more detail is good but I'm honestly more puzzled. I don't see what copying between two tabs has to do with what host the page is loaded from. If I copy some text from one tab to another, all of that is local. There's no interaction with the server unless the browser or server sends a message. What is the relevance of this detail to the problem you are trying to solve? — JimmyJames, Jul 26 '19 at 14:17
@BasileStarynkevitch Also, I'm a little baffled by what you mean by "... could be using three different tabs on the *same* Firefox browser" What do you mean by 'same browser' here? Three people sitting at the same computer? Firefox is moving to [multiprocess](https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Multiprocess_Firefox) so each tab will effectively be its own heavyweight process (like Chromium/Chrome). Only the UI rendering process is shared across tabs. I think you are including this detail for a reason but it's unclear what it means to you. — JimmyJames, Jul 26 '19 at 14:24
Yes, firefox is multiprocess. But different tabs are sometimes sharing data! How exactly? Which data is exactly shared? — Basile Starynkevitch, Jul 29 '19 at 13:40
@BasileStarynkevitch I'm still trying to understand the use case. What does it mean to say Alice and Bob and you are 'using the same browser'? Why would you worry about the clipboard when you could just have the model at the server and render it on as many browsers/tabs on as many client machines as you want/need? What is the point of the extra complexity? — JimmyJames, Jul 30 '19 at 13:51
Because in the general cases, you have *several* Bismon processes A, B, ... and you copy/paste structured abstract syntax trees from one tab (connected to Bismon process A) to another tab (related to Bismon process B) — Basile Starynkevitch, Jul 31 '19 at 03:47
@BasileStarynkevitch You still haven't explained what it means to have three users on 'the same browser'. I've asked several times now. Is there a reason you don't want to explain what that means? — JimmyJames, Jul 31 '19 at 19:27
@BasileStarynkevitch And how does that make it the 'same browser'? The key words here are 'same browser'. What is the 'same' here? — JimmyJames, Aug 02 '19 at 15:50
two different tabs of the same browser is what I really want — Basile Starynkevitch, Aug 02 '19 at 18:34
@BasileStarynkevitch That's plainly obvious. What I don't understand is how 3 users are using 3 tabs on the *same browser*. It seems nonsensical but I could be missing something. Are there three people in front of a single computer with three screens? Are you talking about some sort of dumb terminal setup? I doubt that's what you actually mean but that's what you are literally saying unless there's something you haven't explained. — JimmyJames, Aug 02 '19 at 21:10
It could be the same physical person logged in as three different users. Likewise I sometimes use more than one [uid](https://en.wikipedia.org/wiki/User_identifier) on my Linux laptop, and I could use [setuid](https://en.wikipedia.org/wiki/Setuid) techniques in my code — Basile Starynkevitch, Aug 03 '19 at 01:40
I got a little confused by referring to the users as 'Alice', 'Bob' and yourself. It would be a little more clear if you just wrote something like "using three tabs logged in with different users on the same browser". Thanks for clarifying. — JimmyJames, Aug 05 '19 at 14:10
I did edit my question with such a clarification: "The general use case is several Bismon "... paragraph — Basile Starynkevitch, Aug 05 '19 at 14:16
The sentence in question is "In that figure, in some weird cases, Alice, Bob, and the left-side static analysis expert (that is in practice me, Basile), could be using three different tabs on the same Firefox browser (on Linux)". There's nothing about this that suggests that 'Alice' and 'Bob' are your alter-egos. It's a little confusing, that's all. — JimmyJames, Aug 05 '19 at 14:19
Alice Bob and me are three "users", very much like Unix users w.r.t. to the kernel. A Linux kernel knows users only by their *uid*. By analogy I (or the "static analysis expert") am the "root". And Bismon knows users only by their objid. More details available in section §4.2 of my draft report. It is difficult for me (a non-native English speaker) to summarize a 60 page technical report in a single SE question — Basile Starynkevitch, Aug 05 '19 at 14:26
I'm not sure of why you are telling something so basic. I explained what was confusing to me. It's not that. — JimmyJames, Aug 05 '19 at 14:28
Then I misunderstood your comment. Sorry about that. Again I am French, not in USA or UK. So I don't understand all the nuances of English. Try to comment in French -or Russian- if you can (joking, obviously) — Basile Starynkevitch, Aug 05 '19 at 14:30
Unfortunately, after many trips to Quebec, my ability with French is still limited to making some sense of French signage. The sentence I quote above refers to Alice, Bob, and yourself logging into the same browser. You stated that the way three users would be logged in is if you were logged into three tabs as different users. Either 'Alice' and 'Bob' are your other user accounts or you still haven't addressed what that means. I'm probably too hung up on it, I think I know what you are trying to do. Please let me know if my answer makes sense. — JimmyJames, Aug 05 '19 at 14:41
Added reference to §1.6.3 and §4.2 sections of my report. I don't have enough skills to summarize a 60 pages tech report (written by me in English, a foreign language to me) in a few paragraphs in the above question — Basile Starynkevitch, Aug 05 '19 at 14:48
I'm only offering this as help: the question I have been asking is really simple. I think perhaps you are misusing the term 'browser' as it is generally understood in English: typically, in the context you are using it, it means a process (or set of related processes) running on a single user's machine. Perhaps you are confusing this with a 'single-page web application'? — JimmyJames, Aug 05 '19 at 17:02
For me, `/usr/bin/firefox` is the browser. The fact that it uses several Linux processes is an implementation detail. I'm hoping to ignore them. From my point of view, assume that it was started in some `zsh` shell (under `xterm` or `gnome-terminal`) or in my `.xinitrc` — Basile Starynkevitch, Aug 05 '19 at 19:16
Look again at what I wrote. You are still missing the point. A browser is a process (or processes) on **one machine**. A single-page application can present a single model to multiple users on *different browsers*. You make it sound like Alice, Bob, and you are going to all be sitting at the same terminal. Even if that's what you plan to do, it seems irrelevant to what you are asking. — JimmyJames, Aug 05 '19 at 19:23
The physical person is sitting in front of a Linux workstation running a single `firefox` with *several* tabs (and has logged into them with various Bismon users). I just improved slightly more my question. The copy/pasting occurs between two different such tabs — Basile Starynkevitch, Aug 05 '19 at 19:30
Right and are 'Alice' and 'Bob' those other user names? The way it's written it sounds like you have 3 people at one terminal. I'm talking about the maybe 8th paragraph from the bottom. More confusingly the picture you refer to shows 3 browsers so how are they 'the same browser'? It also shows two different heads which implies they aren't just your user names. — JimmyJames, Aug 05 '19 at 19:37
The figure (I spent 10 hours to make it,so I am reluctant to change it) is the final objective for end of 2020; today I am the only physical person using Bismon — Basile Starynkevitch, Aug 05 '19 at 19:44
BTW, Bismon is free software (GPLv3+). You could `git clone` it and build it and try it — Basile Starynkevitch, Aug 05 '19 at 19:45
Are you sure you are talking about this [picture](https://github.com/bstarynk/bismon/blob/master/doc/images/bismon-monitor.svg)? In any event, the picture makes sense to me. It's the sentence that seems wrong. — JimmyJames, Aug 05 '19 at 19:48
Yes, that https://github.com/bstarynk/bismon/blob/master/doc/images/bismon-monitor.svg picture is the dream to achieve in end of 2020. We are still in 2019 and I might not achieve that dream entirely (but I need to try hard enough) — Basile Starynkevitch, Aug 05 '19 at 19:49
Great. Why you don't just fix the sentence to accurately reflect what's in that picture? Why are you talking about three users using the 'same browser'? — JimmyJames, Aug 05 '19 at 19:51
Exactly as the same Basile could use three Unix users (uids) on his single laptop (and perhaps use `/bin/su alice` or `ssh alice@localhost`, `/bin/su bob` etc... in different `xterm`s) — Basile Starynkevitch, Aug 05 '19 at 19:51
It's not the same at all. Three users at their own computers interacting across the web is fundamentally different than one user with three logins on one machine. You can solve it with the same solution (like the one I have proposed.) But neither of those situations are 3 people on the 'same browser', that just sounds ridiculous to me, to be frank. — JimmyJames, Aug 05 '19 at 19:54
How do you call the cognitive entity which fills a web login form. For me (not a native English speaker), it is a "user". The fact that it is a web crawler running at Google, or a lady typing on a laptop keyboard, is an implementation detail (that the Bismon system cannot be aware of) — Basile Starynkevitch, Aug 05 '19 at 19:55
I guess leave it if you wish. I found it to be a major stumbling block in making sense of the question. I thought I understood what you were trying to do until I saw that but it was just a red herring. — JimmyJames, Aug 05 '19 at 19:56
Propose a better terminology, I need one. Remember, English is *not* my mother language. — Basile Starynkevitch, Aug 05 '19 at 19:56
'How do you call the cognitive entity which fills a web login form. For me (not a native English speaker), it is a "user".' One person can have many user ids, as you have noted. Your picture and names seem to describe people. — JimmyJames, Aug 05 '19 at 19:57
Please suggest a drawing for a cognitive entity which, 99.9% of the time, is a human being. I need a better idea. (if possible, from http://openclipart.org) — Basile Starynkevitch, Aug 05 '19 at 19:58
Instead of "In that figure, in *some* weird cases, Alice, Bob, and the left-side *static analysis expert* (that is in practice me, Basile), could be using three different tabs on the same Firefox browser (on Linux)" I would say something like "In that figure, in *some* weird cases, Alice, Bob, and the left-side *static analysis expert* (that is in practice me, Basile), could be working concurrently on the same model from different workstations." — JimmyJames, Aug 05 '19 at 20:00
You just edited a completely different section of the question which I think was fine. I feel like you aren't understanding which part is confusing. — JimmyJames, Aug 05 '19 at 20:09
But I wrote in English what I really meant. Is that not enough? — Basile Starynkevitch, Aug 05 '19 at 20:24
Writing what you mean is definitely enough. But there's the other part where you have written something I don't think you mean. Why are you leaving that alone and changing things that were already OK? — JimmyJames, Aug 05 '19 at 20:27
Open the edit mode and search for the text "could be using three different tabs on the same Firefox browser (on Linux)". That's the problem section. The section you keep changing was always OK. — JimmyJames, Aug 05 '19 at 20:29
Is that better now? I think I should use the verb "impersonate" but I am not even sure it is valid English? — Basile Starynkevitch, Aug 05 '19 at 20:33
Apparently "to impersonate" might mean what I believe it means (*se faire passer pour quelqu'un d'autre* in French). Again English is a foreign language to me — Basile Starynkevitch, Aug 05 '19 at 20:35
OK I guess. I'm not sure why that's important to point out. Once you solve the distributed multi-user problem, those kinds of things are simple. But if that's what you mean, sure. The picture doesn't really match though. — JimmyJames, Aug 05 '19 at 20:36
But you still did not give an answer to "how to copy structured content between different tabs" — Basile Starynkevitch, Aug 05 '19 at 20:36
You are blocked from that because of security concerns. I gave you an alternate solution that is far simpler and works even if your users are on opposite sides of the planet. The model is hosted at the server, right? Why is it so crucial that the clipboard be the conduit between models? Why not manage that where the model actually lives? — JimmyJames, Aug 05 '19 at 20:40
I think it's a bad idea to try to subvert the security model. Even if you succeed, it's only a matter of time before whatever loophole you found gets closed. — JimmyJames, Aug 05 '19 at 20:43
So what do you suggest? The two different Bismon servers could of course communicate directly (e.g. by TCP/IP sockets with JSONRPC), but how do I establish the connection. Two different browser tabs still need to exchange *some* data (e.g. some hostname + port number + perhaps TLS cryptokey or some JSONRPC authentication-related data). How do I do that? AFAIK EWMH permits the equivalent thing — Basile Starynkevitch, Aug 05 '19 at 20:46
In tab `A` you have `https://bismonA/model1` open. In tab `B` you have `https://bismonB/model2` open. You select an element in `A` and copy `https://bismonA/model1/e5fd67` (a node reference) and paste that text into your `B` tab under an existing node. That link is then sent to `https://bismonB/model2` along with the node you want to attach it to. *BismonB* calls (using any number of HTTP libs) *BismonA* and retrieves the fragment (and validates it.) Then the fragment is incorporated into *model2*. The browser tab `B` gets a response that provides some data or notification of the change. — JimmyJames, Aug 05 '19 at 20:50
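
The flow described in the last comment could be sketched on the browser side roughly as follows. This is only an illustration: the function names, the allow-list of peer origins, the `https://host/model/nodeId` layout and the JSON field names are all assumptions made for the example, not anything Bismon defines.

```javascript
// Sketch only: names, URL layout and JSON fields are assumptions for illustration.

// Parse pasted text as a node reference such as
// https://bismon-a.example/model1/e5fd67.
// Returns null unless it is an https URL on an allowed origin with exactly
// two path segments (model name, node id).
function parseNodeReference(pastedText, allowedOrigins) {
  let url;
  try {
    url = new URL(String(pastedText).trim());
  } catch (err) {
    return null; // not a URL at all
  }
  if (url.protocol !== 'https:') return null;
  if (!allowedOrigins.includes(url.origin)) return null;
  const segments = url.pathname.split('/').filter(Boolean);
  if (segments.length !== 2) return null;
  return { origin: url.origin, model: segments[0], nodeId: segments[1] };
}

// Build the request that tab B would send to its own server, asking that
// server to fetch the fragment from the source server and attach it.
function buildAttachRequest(ref, targetNodeId) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      source: ref.origin + '/' + ref.model + '/' + ref.nodeId,
      attachUnder: targetNodeId,
    }),
  };
}
```

The object returned by `buildAttachRequest` has the shape accepted as the second argument of `fetch()`. Only a short text reference travels through the clipboard; retrieving and validating the fragment stays on the server side.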

i336_ · Answer 1 · 2019-08-01T09:08:32.653

It took a couple of readthroughs to verify I had a good understanding of where you're coming from and what you're trying to do, and I think I get it.

It sounds like you're in the middle of the foundational design phases of a large-scale practical research project, and are building out this system's base framework.

It is very cool to see the amount of progress and success that has been made thus far. I am incidentally still learning how to be successful with the "sketch out a framework that will be big enough to scale" approach with my own ideas, but I do not have much experience yet so the resulting designs have poor coherence and terrible "oh I didn't think of that" handling :). In the meantime I make do with a combination of "just ship something" and "expect to rewrite it later".

But anyway, reading through what you describe, I perceive two overlapping, mutually exclusive fundamental goals:

You concretely (and contractually) "just" want to support Firefox on Linux
I suspect your abstract aim is ultimately to build a futureproof framework that would, ideally, work somewhat outside that scope

In the interests of acknowledging all underlying sentiment, I would like to address both ideas concurrently. My recommendations are all basically offsite links/pointers that you may well already have explored, but here we go anyway.

So, first of all, you ask for the conceptual model of the clipboard. If I domain-resolve "conceptual" I get "Web specification of the clipboard", and a quick Google of "w3c clipboard" finds that over at https://www.w3.org/TR/clipboard-apis/. Three minutes' scanning rapidly found the read() and write() sections, the multi-part data section and associated permissions discussion, and also the interesting "transcoding images" section; with everything else perhaps an hour or so's reading (or maybe less).
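
To give a feel for the text half of that draft, here is a minimal sketch of a JSON round trip through `writeText()` and `readText()`. The two function names are made up for this example, and the `clipboard` parameter is assumed to behave like `navigator.clipboard`; whether a given browser exposes both calls to ordinary pages (and under which permission prompts) has to be checked per browser.

```javascript
// Sketch only: `clipboard` is expected to behave like navigator.clipboard
// (writeText and readText both return promises).

// Serialize a tree to JSON and put it on the clipboard as plain text.
async function copyAstAsText(clipboard, ast) {
  const text = JSON.stringify(ast);
  await clipboard.writeText(text);
  return text;
}

// Read plain text from the clipboard and try to parse it as JSON.
async function pasteAstFromText(clipboard) {
  const text = await clipboard.readText();
  try {
    return JSON.parse(text);
  } catch (err) {
    return null; // clipboard held something that is not JSON
  }
}
```

In a page this would be called as `copyAstAsText(navigator.clipboard, tree)` from inside a click or key handler, since browsers generally tie clipboard access to a user gesture.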

What is so easy to miss in that document is the Working Draft just under the title (well, I missed it initially): this isn't a standard yet, and browser support is (sadly) still all over the place: https://caniuse.com/#search=clipboard

That document also does not make clear the split personality of the Web clipboard APIs. There are technically (arguably) two.

Back in the time of the dinosaurs, Internet Explorer implemented a document.execCommand function that could cut, copy and paste... stuff. The function was ad-hoc and nonstandard, what stuff actually meant was poorly specified, and the whole kit and caboodle was handwavily copied by competing browser vendors with differences that introduced annoying glitches.

Web developers were left to the task of copying data out to the clipboard in a form that was usable (most things limited themselves to text), dealing with unscrambling bit-bombs of incoming pasted information, and smoothing over the glitches and differences. Most sites incidentally solved this by actually using Flash to handle the clipboard instead, that's how bad it was.
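
For reference, the usual shape of that old trick is sketched below: put the text into a temporary, off-screen textarea, select it, and ask the browser to copy the selection. The function name is made up, and `doc` is assumed to behave like the page's `document`; note that this briefly touches the DOM, which is exactly what the question hopes to avoid.

```javascript
// Sketch only: `doc` is expected to behave like the browser's `document`.
function legacyCopyText(doc, text) {
  const area = doc.createElement('textarea');
  area.value = text;
  // Keep the helper element out of view; it is removed again below.
  area.style.position = 'fixed';
  area.style.left = '-9999px';
  doc.body.appendChild(area);
  area.select();
  let ok = false;
  try {
    ok = doc.execCommand('copy'); // false if unsupported or disabled
  } catch (err) {
    ok = false;
  } finally {
    doc.body.removeChild(area);
  }
  return ok;
}
```

Browsers generally honour `execCommand('copy')` only while handling a user action such as a click, so `legacyCopyText(document, text)` would be called from such a handler.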

One of the pre-existing things that went straight into gray-area-ville when combined with clipboard manipulation was the ability to set the contentEditable property on a DOM element. This would basically put that element into a mode where you would see a text cursor and could type anything into the element (which could be anything; most people used a <div>), while at the same time the webpage could edit the HTML of the contenteditable element as well. The gray area part was the question of... what happens if you copy and paste from the clipboard into this contenteditable area? What happens then?

In the interests of interoperability, browser manufacturers decided to take the way things already worked - you could select part of a webpage, copy it, paste into a word processor and get something out - and make that process work the other way around too, so that pasting from the browser, word processor, etc back into a contentEditable area would also put something back in. Ideally the something was well-formed HTML, but the browser always made a valiant effort to shove whatever it had received into the document at the position the cursor was at, with often terrible results.

This whole area of rich text editing combined with copy/paste is basically 100% non-semantically-defined behavior. Sometimes things work; sometimes they do not work; sometimes things should technically/arguably work but go nuclear instead. You used to have to break your use case down into tiny bits and test each bit individually; the amount of headscratching needed is much lower today than it was say 10 years ago, but there's still a lot of historical baggage that you have to wade through.

To improve things even further and fix the mess permanently, browsers are in the process (right now) of adopting new HTML5 clipboard APIs. Due to the fact that this is actually happening "at the moment", the UX model of the new way things work is still somewhat inconsistent between browsers with various things in various stages of un-implemented-ness, and you still need to use the old tricks for some situations. Here are some seemingly modern references that appear to have non-superseded, up-to-date information:

This helpfully discusses the old and new APIs together and presents a contrast between them: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Interact_with_the_clipboard
This talks about the old execCommand API: https://developers.google.com/web/updates/2015/04/cut-and-copy-commands
This more recent document from last year talks about what Chrome has implemented in the new clipboard API: https://developers.google.com/web/updates/2018/03/clipboardapi
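To make the old/new split concrete, here is a minimal sketch of how a page might pick between the two API generations at run time. The function names and the `env` parameter are my own illustrative assumptions (passing the environment in just makes the decision logic easy to exercise outside a browser); the only real browser calls are `navigator.clipboard.writeText` and `document.execCommand('copy')`.

```javascript
// Decide which clipboard API generation is usable in a given environment.
// `env` defaults to the global object; taking it as a parameter is an
// assumption of this sketch, not something the browser requires.
function chooseCopyStrategy(env = globalThis) {
  const nav = env.navigator;
  if (nav && nav.clipboard && typeof nav.clipboard.writeText === "function") {
    return "async-api";
  }
  const doc = env.document;
  if (doc && typeof doc.execCommand === "function") {
    return "execCommand";
  }
  return "none";
}

// Copy plain text with whichever API is available (browser only).
async function copyText(text, env = globalThis) {
  switch (chooseCopyStrategy(env)) {
    case "async-api":
      // The async API is only exposed in secure contexts.
      await env.navigator.clipboard.writeText(text);
      return;
    case "execCommand": {
      // Old trick: select the text in a temporary textarea, then copy.
      const area = env.document.createElement("textarea");
      area.value = text;
      env.document.body.appendChild(area);
      area.select();
      const ok = env.document.execCommand("copy");
      env.document.body.removeChild(area);
      if (!ok) throw new Error("the browser refused the copy command");
      return;
    }
    default:
      throw new Error("no clipboard API available");
  }
}
```

In a page you would call `copyText(...)` from a click or key handler, since browsers generally only allow clipboard writes in response to a user action.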

These recent improvements are why we've only just gotten to the point where, when I happened to have an image in my clipboard earlier and hit CTRL+V while typing this answer, the JavaScript behind the stackexchange answer box opened a little popup with a preview of my pasted image and an offer to upload it and paste it into my answer.

Having pasting of images "just work" is reasonably recent (past few/couple of years).

Here are a couple example references I found that have bitrotted a bit:

https://googlechrome.github.io/samples/async-clipboard/ reports various errors
https://alligator.io/js/async-clipboard-api/ incorrectly says only text can be copied to the clipboard (and as an aside the whole article is undated, a Web technology no-no!!!)

Your interest in this field comes at an interesting time.

Literally only two months ago (!!) Chrome landed support for copying images from the browser out to the clipboard through the new async clipboard API: https://bugs.chromium.org/p/chromium/issues/detail?id=150835.
Chrome is currently tracking copying random data of other types (not images or text) into the clipboard, but I wouldn't be surprised if this ends up being reserved to browser extensions: https://bugs.chromium.org/p/chromium/issues/detail?id=897289 (NB. I mention this link at the bottom of this answer)

If you have the attention span and the interest, it could be very interesting to open an issue (after perhaps reading through some existing issues to get an idea of style and approach) over at https://github.com/w3c/clipboard-apis/issues?utf8=%E2%9C%93&q=is%3Aissue+ and start a discussion about the very useful but complex topic of transferring domain-specific semantic information between webpages.

The bulk of the conversation that would ensue would probably very quickly swing toward evaluating the risk of invisible information ingress/egress; what you would have on your side as defense is the fact that, as best as I can tell, document.execCommand('copy') executed in a ContentEditable DOM node is specified (at https://www.w3.org/TR/clipboard-apis/#pasting-html) to, in certain situations, not "sanitize HTML", because existing systems already rely on basically what you are trying to semantically define.

I certainly would not expect you to pivot your specialization from building Bismon to driving the W3C clipboard effort for any length of time :) but weighing in as an application developer building a real-world system could be interesting, even as a drive-by once-off effort. Might not have any impact. Might. Could be worth exploring? What's kind of compelling is that things are half-implemented right now, and I think the exact bits you happen to depend on are what are still up in the air and in the process of being finalized. You might end up helping to improve the eventual implementation in all browsers.

This helps with future-proofness, both in the sense of making future browsers converge with the API environment your application would prefer to have available, and also by making browsers on other OS platforms more compatible with your system. Linux may be the contractual focus of your grant, but Windows compatibility never hurt anybody, and there will always be a small group of people who greatly appreciate that your system works on Android tablets/phones too. The Web's focus is uniform support of all platforms, so doing the work to canonicalize things for Linux would result in equal implementational support for other platforms too.

In theory.

If there's one thing the W3C is known for, it's that they're not fast. Your comments would be entering a "ponderation queue", as it were, and may emerge out the other end as implementational impact after some months. You're probably already well familiar with bureaucracy and Departments of Hesitancy, Um-ing And Uh-ing; this is the internet's one. :)

(I've also generally heard the IETF, or Internet Engineering Task Force, has a reputation for moving things forward a little more quickly than the W3C, but I can't find any references to IETF-related work on the clipboard API, so maybe I misunderstand this group's agenda and focus.)

With the above being said about the W3C, the clipboard API is a bit of a special case due to the fact that it's right now in the process of being implemented, so exceptionally good ideas/suggestions may well be picked up and run with by the browser vendors and shooed into the spec in the process.

In this case, the chromium bug ID talking about copying non-image/non-text data into the clipboard would be the right place to drive-by and talk about your application's focus. This bug seems to be in a similar "um-ing and uh-ing" stage though, so you'd again be leaving thoughts for consideration.

With all the above theory in mind, here are some more concrete and immediately-actionable ideas to get a shaky-but-functional clipboard model working sooner than next decade:

The first thing I actually thought could be interesting was joining #firefox on irc.mozilla.org and asking about clipboard support in there. It's possible you may be handed some of the URLs mentioned above if you just generally ask about the clipboard; asking specific information about implementation details such as mime-type handling, the copying of multiple data items, Firefox-specific implementation details, etc etc, might be more effective. Again, the clipboard API's in-progress nature means that discussions are likely to be responded to rather than not. (Chrome has #chromium on freenode, but it's a much less active channel. Last I was in #firefox, it was incredibly responsive.)
Another thing that could be useful is building an automated test framework (an external program) that uses the https://developer.mozilla.org/en-US/docs/Tools/Remote_Debugging protocol to connect to a copy of Firefox running in Xvfb/Xvnc, load test webpages, and then perform various automatic copy/paste operations. Of note is that you(r external program) would need to send keystrokes and/or clicks into the browser window. (The same could incidentally be done with Chromium too, using https://chromedevtools.github.io/devtools-protocol/.) Coupled with a system that downloads the latest beta/canary/prerelease versions of browsers, you could set-and-forget a regression testing system that "just works" in the background and beeps at you when browser changes break your flow. A lot of work to set up; you may decide the benefit is worth it.
A fairly heavyweight/brute-force option that is a bit ham-fisted but which would definitely work, and is supported by both desktop Firefox and Chromium but generally nothing else, is the concept of the "native messaging host". The TL;DR of this is that you create a browser extension with some extra bits in the extension manifest, copy some files into some specific places, and then the browser itself will launch a specific program as a subprocess, let JavaScript from the extension send JSON into the subprocess's standard input, and forward JSON data from the subprocess's stdout into events sent to JS from the extension. You could make a messaging host to handle the clipboard and incorporate that into a browser extension. However, the messaging host architecture was designed for corporate/embedded/industrial type scenarios; installing the messaging host requires setup manipulations outside of the browser that the browser itself cannot perform (copying files into the right places, and on Windows, editing the registry), so it's not a case of hosting an extension and saying "you must install this". This might be a good last resort.
You may be able to accept using a system that generally works but at the cost of a few aesthetic warts that are inherent to the methodology being used. I have an idea that is very likely to work in most browsers (including Firefox).

After you've serialized the data you wanted to export into a text or binary format of your choosing, you could create an HTML5 canvas and then loop through the serialized data, assigning 3 bytes at a time to the R, G, B colors of each pixel in the canvas. You could then export the canvas image to a PNG, and push this PNG out to the clipboard when the user executes a Copy.

Later, when someone pastes an image in, you'll be passed a PNG image, you push that into a canvas, read all the pixels and extract the R, G, B values, and hopefully, you'll get your data back.
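The packing and unpacking steps described above could be sketched roughly like this. It only handles the byte-to-pixel arithmetic; the 4-byte length header and the function names are assumptions of this sketch, added so that padding pixels can be dropped again on the way back. Alpha is forced to 255 on every pixel so that the canvas does not alter the RGB values.

```javascript
// Pack arbitrary bytes into RGBA pixel data: 3 payload bytes per pixel,
// alpha fixed at 255. The first 4 packed bytes hold the payload length
// (big-endian); that header layout is an assumption of this sketch.
function packBytesToRgba(bytes, width = 64) {
  const framed = new Uint8Array(4 + bytes.length);
  new DataView(framed.buffer).setUint32(0, bytes.length);
  framed.set(bytes, 4);
  const pixels = Math.ceil(framed.length / 3);
  const height = Math.max(1, Math.ceil(pixels / width));
  const rgba = new Uint8ClampedArray(width * height * 4);
  for (let p = 0; p < width * height; p++) {
    for (let c = 0; c < 3; c++) {
      const i = p * 3 + c;
      rgba[p * 4 + c] = i < framed.length ? framed[i] : 0; // zero padding
    }
    rgba[p * 4 + 3] = 255; // fully opaque
  }
  return { width, height, data: rgba };
}

// Reverse of packBytesToRgba: drop the alpha bytes, read the length
// header, and return exactly the original payload bytes.
function unpackRgbaToBytes(rgba) {
  const pixelCount = Math.floor(rgba.length / 4);
  const flat = new Uint8Array(pixelCount * 3);
  for (let p = 0; p < pixelCount; p++) {
    for (let c = 0; c < 3; c++) flat[p * 3 + c] = rgba[p * 4 + c];
  }
  if (flat.length < 4) throw new Error("image too small to hold a payload");
  const length = new DataView(flat.buffer).getUint32(0);
  if (length > flat.length - 4) throw new Error("not a packed payload");
  return flat.slice(4, 4 + length);
}
```

In the browser, the `data` array would go into the canvas via `new ImageData(packed.data, packed.width, packed.height)` and `ctx.putImageData(...)`, and come back out via `ctx.getImageData(0, 0, w, h).data` after drawing the pasted image.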

Where things get interesting with this scheme is that, well, a) you've just pushed an image out to the clipboard, b) you're both the exporter and importer of said images, so they can be in any format you like. So there's nothing stopping you from putting the binary data junk in one corner of the image, and rendering something like the graph you got from Wikipedia in the majority of the image. You could then do some computations to identify the corner of the image, extract just that corner out into a Canvas, and go from there.

The only thing I would 100% recommend (in the sense of a "do not use this method unless you add this bit or you will be sad") is implementing a hash function like SHA-1 to add a checksum into the serialized form of the data. Then on unserialize you'd check the hash and throw an error if it doesn't match. This would remove some level of stress from your input validation functions, although if you don't sanitize the unserialized data your app would still be vulnerable to malicious users (who can craft bad inputs with correct SHA-1s).
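As a rough illustration of the check-on-unserialize idea, here is a dependency-free sketch. Note that it swaps in a 32-bit FNV-1a checksum instead of SHA-1 purely to keep the example short; in a browser you could compute a real SHA-1 with `crypto.subtle.digest("SHA-1", bytes)` instead. The `seal`/`unseal` names and the `checksum:payload` layout are my own assumptions.

```javascript
// 32-bit FNV-1a over the UTF-8 bytes of the text, as 8 hex digits.
// This stands in for the SHA-1 suggested above; it only detects
// accidental corruption and offers no protection against tampering.
function fnv1a32(text) {
  let hash = 0x811c9dc5;
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Prefix the serialized payload with its checksum.
function seal(payload) {
  return fnv1a32(payload) + ":" + payload;
}

// Verify the checksum and return the payload, or throw on mismatch.
function unseal(sealed) {
  const sep = sealed.indexOf(":");
  const sum = sealed.slice(0, sep);
  const payload = sealed.slice(sep + 1);
  if (sep !== 8 || fnv1a32(payload) !== sum) {
    throw new Error("checksum mismatch: clipboard data is damaged");
  }
  return payload;
}
```

The unsealed payload still has to go through normal input validation, for the reason given above.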

Also - the HTML5 canvas stores its pixels with premultiplied alpha internally, so when you load a PNG into an Image object, pull that Image into a Canvas and read the pixels back, the RGB values of any pixel that is not fully opaque may come back slightly altered. So you must ensure generated images have an Alpha value of 100% (255) on every pixel. I forget exactly how I did this (last fiddled with HTML5 canvas a while back) but I do recall it being very easy.

The "transcoding images" section of the W3C clipboard draft jumped out at me for related reasons - I wondered what sort of impacts transcoding might have on the recoverability of data. Generally I think things should be okay, because browsers seem to be implementing PNG support with respect to losslessness and pixel-perfection. In fact some poking around in the Chromium source code shows that this file seems to be the implementation of "... all images are re-encoded for security purposes." referenced in bug 150835 (the link I added the NB to above): https://chromium.googlesource.com/chromium/src/+/4a2ce1efa2e3c1146ab146312acd5c1c9c02809b/third_party/blink/renderer/modules/clipboard/clipboard_writer.cc
It may be useful to find/play with discussions and demos relating to copying images into/out of the clipboard. Here are some starting points I found (both of these use the old execCommand API, because it's what works the best):
- https://stackoverflow.com/questions/15253468/get-pasted-image-from-clipboard-firefox
- https://codepen.io/netsi1964/pen/IoJbg
On a completely different tack: virtualize the clipboard within your webapp. Do you just want to copy+paste between pages of your application, with no need to access the data outside of it? Then implement a virtual clipboard. (I'd use different shortcut keys: this virtual clipboard would not be accessing the universal OS clipboard, CTRL+C/V etc are generally understood as "copy so I can paste elsewhere", and reusing them would violate the principle of least surprise.) This would have the potentially-interesting side effect of having the user clipboard be persistent, integrated with the user's account, and accessible across devices.
Depending on how much of a flow-slowdown it might impose, you could set up an export/import mechanism for this virtual clipboard that is based on file drag-and-drop.
It may be useful to go back to basics and accept having a serialization format that is text-based. This text format could be anything, from a giant unwieldy block of JSON, to something aesthetically similar to a GPG encrypted message with a clear header and footer and easy copy/paste semantics, etc etc. The big headline feature of this method is that you would be able to dump the text into a textbox and let the user copy it out themselves - as a last-resort backup for situations where the browser is being weird. This last resort should be very deeply buried, since maybe 0.1% of users would ever need it.
Perhaps a combination of approaches could be used - for example, a virtual clipboard (presented as an easy-access drawer type of thing) for the easiest integration, an "export as image" option with a corner of the image containing binary data, etc, and an "export as text" option.
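For the text-based option, a header/footer format in the spirit of a GPG armored block might look something like the sketch below. The marker strings and function names are hypothetical, and JSON is assumed as the serialization. The point is that the fragment can be recovered even when the user has pasted extra text around it.

```javascript
// Hypothetical markers; any unambiguous pair of lines would do.
const BEGIN_MARK = "-----BEGIN BISMON FRAGMENT-----";
const END_MARK = "-----END BISMON FRAGMENT-----";

// Wrap a JSON-serializable value in a clearly delimited text block.
function armor(value) {
  return [BEGIN_MARK, JSON.stringify(value, null, 2), END_MARK].join("\n");
}

// Find the first armored block inside arbitrary pasted text and parse it.
function dearmor(text) {
  const start = text.indexOf(BEGIN_MARK);
  if (start === -1) throw new Error("no armored fragment found");
  const bodyStart = start + BEGIN_MARK.length;
  const stop = text.indexOf(END_MARK, bodyStart);
  if (stop === -1) throw new Error("armored fragment is truncated");
  return JSON.parse(text.slice(bodyStart, stop));
}
```

One known weakness of this naive version: a string inside the fragment that itself contains the end marker would cut the block short, so a real format would need to escape or encode the body.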

One last thing - when Bismon is actually capable of doing static analysis and presenting interesting results, I'd definitely support the idea of posting it to Hacker News (https://news.ycombinator.com/). This forum, which you may already be familiar with, is a good place to surface up-and-coming systems that are in early stages but are capable of "doing something". People sometimes post research systems in early phases of development, but it's (IMHO) best to post things when they are capable of something new and unique since this gives the best chance of going viral.

I am retired in 3 years. I have no more abstract aims, and those I have are not Bismon-related. During retirement, I will work on another research project that I have started working on during the weekend, and that project is on gitlab — Basile Starynkevitch, Aug 02 '19 at 03:28
Sadly my H2020 contractual obligation (in CHARIOT) is just to write reports. And I am wise enough to know that at my retirement, all of Bismon will go to `/dev/null`. The political reasons for that are explained in my French home webpage — Basile Starynkevitch, Aug 02 '19 at 03:37
Sorry for delay (didn't realize/expect over a week had passed), finally found the tab (and window...) this was in again. I had a look over Bismon, and (some of) the paper describing it. Would I be correct in interpreting it as an effort to take the [Code Property Graph](https://conferences.oreilly.com/strata/strata-ca-2018/public/schedule/detail/63880) idea, extend that to SSA, and make the result graph-queryable? Cooool. (I can completely relate with the frustration of designing such a system "from the reports up", as opposed to "from the data up"; it looks like a fun engineering problem.) — i336_, Aug 10 '19 at 15:19
(...Although an engineering problem completely over my head :) ). `refpersys` also seems very fascinating, and the ideas in it (or at least what I am able to discern about it) seem to mirror a few ideas I've had about whole-system design over the years as well. The caveat emptor I've found with such "globally-coherent" architectures (that so frequently gravitate toward normalizing and regulating the behavior and interactions of arbitrary, so-subsumed sub-components) is that the mental models associated with building such systems seem to be fundamentally/utterly/mutually incompatible with ... — i336_, Aug 10 '19 at 15:49
... the mental models associated with developing "locally-coherent" architectures (that "shrink-wrap" the scope of the methods used to connect a problem to its solution, to only what is required for the immediate problem and the immediate solution). Limiting the coherence requirements of the scope of a problem/solution domain seems to allow it to fit within some kind of subconscious mental threshold that permits our attention spans to model the whole problem space, and then realize a solving method that is coherent/fully integrated. (Of course, the method itself is not the solution, it ... — i336_, Aug 10 '19 at 15:49
... connects to the solution.) I am uncertain if it is possible to reason about globally-coherent architectures as closed-ended systems due to the sadly tiny limitations of attention span. "Ideally-coherent" systems that define methods that cover all permutations of the problem and solution specifications that will exist (without going a millimeter further) seem to be the product of from-scratch rewrites, which is a bit of a hack. — i336_, Aug 10 '19 at 15:49
Apologies if the above made no sense; still developing these ideas and also decent articulation skills. Finally, one last thing that I thought of while reading your webpage - I wonder what your interest is in new/unusual hardware architectures, such as [this one](https://news.ycombinator.com/item?id=13741975) that I happened to learn about (as it hit the ground, which was kinda cool) a few years ago (I'm i336_ in that thread - maybe ignore the wall of text comment :) ). — i336_, Aug 10 '19 at 15:56

score 2 · Answer 2 · answered Jul 26 '19 at 06:46

You can use the copy and paste events which are provided on all DOM elements.

Below you can see an example where the page has two boxes. If you select any part of the top box and copy it (Using Ctrl+C or via context menu) the first event is fired and a special format "text/ast" is set in the clipboard data.

Pasting it into the lower box will set the innerText of the box to the content of the "text/ast" format. If the format isn't set the data variable is an empty string.

<html>
<head>
    <title>Copy/Paste Prototype</title>
    <script type="text/javascript">
        window.addEventListener("load", function (evt) {
            let source;
            source = document.querySelector("div.source");
            source.addEventListener('copy', function (evt) {
                console.log("copying");
                // Generate text format of AST
                evt.clipboardData.setData("text/ast", "(while (> x 0) ...)");
                evt.preventDefault();
            });


            let target;
            target = document.querySelector("div.target");
            target.addEventListener('paste', function (evt) { 
                console.log("pasting");
                let data;
                data = evt.clipboardData.getData("text/ast");
                evt.preventDefault();
                // Parse text format back into AST
                console.log(data);
                target.innerText = data;
            });
        });
    </script>
    <style>
        .source, .target {
            border: 1px solid black;
            width: 500px;
            height: 200px;
        }
    </style>
</head>
<body>
<div class="source">
    Hello World!
</div>
<div class="target">

</div>
</body>
</html>

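The evt.preventDefault() in the copy handler matters: cancelling the event is what tells the browser to write the data set via setData() to the clipboard. If the event is not cancelled, the browser copies the current selection in its default formats instead.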

Additionally you can use document.getSelection() to find out which elements are currently selected.
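A small variation on the example above, as a sketch only: write the AST in the custom format and also as plain text, then fall back to plain text on paste when the custom format is absent. The `writeAst`/`readAst` helper names are my own; `setData`, `getData` and `preventDefault` are the same calls used in the example.

```javascript
const AST_TYPE = "text/ast"; // custom format from the example above

// Put the AST on the clipboard in the custom format and as plain text.
function writeAst(clipboardData, astText) {
  clipboardData.setData(AST_TYPE, astText);
  clipboardData.setData("text/plain", astText);
}

// Prefer the custom format; fall back to plain text when it is empty.
function readAst(clipboardData) {
  return clipboardData.getData(AST_TYPE) || clipboardData.getData("text/plain");
}

// Browser wiring (skipped when there is no DOM, e.g. under Node).
if (typeof document !== "undefined") {
  document.addEventListener("copy", (evt) => {
    writeAst(evt.clipboardData, "(while (> x 0) ...)");
    evt.preventDefault(); // cancel the default copy so our data is written
  });
  document.addEventListener("paste", (evt) => {
    console.log("pasted AST text:", readAst(evt.clipboardData));
    evt.preventDefault();
  });
}
```

Whatever arrives through the plain-text path can come from anywhere, so it needs to be parsed and validated as untrusted input.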

Does that work between two *tabs* of the same browser (HTTP served by two different Linux processes)? How exactly? Please improve your answer for that use-case! — Basile Starynkevitch, Jul 29 '19 at 13:37
I knew all you explained. But **what about copy/paste between *different* tabs and *different* web servers?** AFAIK, for security reasons, this might be difficult. — Basile Starynkevitch, Jul 30 '19 at 13:33
I just even more improved my question. Please take into account my latest improvement in your (hopefully improved) answer — Basile Starynkevitch, Jul 30 '19 at 13:38
After further testing I see where your problem is. I only tested on a single server (or rather same domain), where copy/paste works between tabs. But the data from another domain cannot be accessed. Unfortunately I don't think you can work around that behavior of the browser. — Alexander Hausmann, Aug 01 '19 at 09:38

JimmyJames · Accepted Answer · 2019-08-05T21:08:25.280

1

I'm not sure if this completely meets your needs but I think one thing that could be fairly simple to implement here is to give each part of your model its own URI. That is, every node of the tree would be reachable via a unique endpoint. You could derive the naming from the tree or use e.g. a hash. It doesn't really matter as long as these are uniquely identifiable.

Once that is in place, the trick would be to make it such that when a node is selected in the webpage, you insert the address of the node in the clipboard. When you paste, a path to such a model fragment would be expected and could then be added to the current model. It's unclear whether sharing fragments across models is desirable. But to avoid linking models, you would need to give the pasted fragment a new id when the target model is saved.

This is similar in some ways to the 'virtual clipboard' in i336_'s answer but I don't think it would violate the principle of least surprise to use the actual clipboard for the paths. You could paste the links into the address line and have a reasonable behavior.

Here's an example scenario: In tab A you have https://bismonA/model1 open. In tab B you have https://bismonB/model2 open. You select an element in A and copy https://bismonA/model1/e5fd67 (a node reference) and paste that (plain) text into your B tab under an existing node. That link is then sent using a PUT or POST to https://bismonB/model2/da3b4a where da3b4a is the node you want to attach it to. BismonB calls (using any number of HTTP libs) BismonA and retrieves the fragment at that resource path and validates it. If it's OK, the fragment is incorporated into model2 on BismonB. The call ends with a response from BismonB with either the new model state or notification that it has been changed and needs to be retrieved.
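On the browser side, the paste step of this scenario could be sketched as below. The trusted host list, the `/model/node` path shape, the JSON body layout and the function names are all assumptions made for illustration; only the built-in `URL` class is real API. Note that `URL` lowercases host names, which is why the list is in lower case.

```javascript
// Hosts we agree to reference fragments from (hypothetical names taken
// from the scenario above; URL parsing lowercases the host).
const TRUSTED_HOSTS = new Set(["bismona", "bismonb"]);

// Return { href, model, node } if the pasted text looks like a node
// reference of the assumed form scheme://host/<model>/<hex-node-id>,
// or null otherwise.
function parseFragmentUri(pasted) {
  let url;
  try {
    url = new URL(pasted.trim());
  } catch {
    return null; // not a URI at all
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") return null;
  if (!TRUSTED_HOSTS.has(url.hostname)) return null;
  const match = /^\/([^/]+)\/([0-9a-f]+)$/.exec(url.pathname);
  return match ? { href: url.href, model: match[1], node: match[2] } : null;
}

// Describe the request that attaches the pasted fragment under the
// target node; returns null when the paste should simply be ignored.
function buildAttachRequest(targetNodeUrl, pastedText) {
  const fragment = parseFragmentUri(pastedText);
  if (!fragment) return null;
  return {
    url: targetNodeUrl,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: fragment.href }),
  };
}
```

In a paste handler you would then do something like `const { url, ...init } = request; fetch(url, init)`, and let the server on the receiving side retrieve and validate the fragment as described.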

When you throw multiple users into the mix, you need to have some way to avoid users stepping on each other's changes such as optimistic locking.
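To illustrate what optimistic locking means here, a minimal in-memory sketch: every model carries a version number, and an update is only accepted if the client still holds the version it originally read. The class and method names are hypothetical, and a real server would persist the models rather than keep them in a `Map`.

```javascript
class VersionConflict extends Error {}

// In-memory stand-in for the server-side model store.
class ModelStore {
  constructor() {
    this.models = new Map();
  }

  create(id, tree) {
    this.models.set(id, { version: 1, tree });
  }

  // Return a private copy of the tree together with its version.
  read(id) {
    const { version, tree } = this.models.get(id);
    return { version, tree: JSON.parse(JSON.stringify(tree)) };
  }

  // Accept the new tree only if nobody else updated the model since
  // the caller read `expectedVersion`.
  update(id, expectedVersion, newTree) {
    const current = this.models.get(id);
    if (current.version !== expectedVersion) {
      throw new VersionConflict(
        `model ${id} is at version ${current.version}, not ${expectedVersion}`
      );
    }
    this.models.set(id, { version: expectedVersion + 1, tree: newTree });
    return expectedVersion + 1;
  }
}
```

The user who loses the race gets a conflict and has to re-read the model before pasting again. Over HTTP, the same idea is commonly expressed with `ETag` and `If-Match` headers.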

edited Aug 05 '19 at 21:08

answered Aug 05 '19 at 14:07

JimmyJames


Then my question becomes: how are URI in ` – Basile Starynkevitch Aug 05 '19 at 14:49
I'm not super strong with the details of the clipboard and I'm uncertain how the browser can limit you from copying from one domain to another (I presume this is when you copy rich content e.g. DOM.) So, what I am proposing is you only use copy-paste of text. You could simply put the URI as selectable text on the UI and have a button (+) that opens a little text box for putting in the URI. But if you are able, you might be able to do some sleight-of-hand with `ctrl-c`/`ctrl-v`. I would go this way because it covers all the cases with all the different combinations of tabs/browsers/clients/servers. – JimmyJames Aug 05 '19 at 14:56
Basically if the user pastes something in that's not a URI to a valid fragment, you ignore it, perhaps with an error message. Otherwise you retrieve the fragment from the URI and incorporate it into the model. Whether that URI was copied from another tab or from a messaging app or typed in wouldn't matter. You are just pasting in a 'pointer'. – JimmyJames Aug 05 '19 at 15:00
Are you really sure of that last sentence? I thought that security considerations (a malware site should not get your credit card credentials by cross-tab copy/pasting) are forbidding that. I am really not sure at all. – Basile Starynkevitch Aug 05 '19 at 15:01
I use Ubuntu at home and regularly copy text from one tab in FF or Chromium and paste it into others that are running pages from different domains. Usually I'm selecting text to paste into a search engine. I've never noticed any impediment to this. So, I'm pretty sure just because I do this all the time. – JimmyJames Aug 05 '19 at 15:24
But are you copy/pasting rich HTML text from one tab to another? If the destination tab is getting all the URL of an ` – Basile Starynkevitch Aug 05 '19 at 15:24
No. That's the point, URIs are plain text. – JimmyJames Aug 05 '19 at 15:25
But I need to copy/paste HTML DOM elements (rich text, not plain text). Is that forbidden ? – Basile Starynkevitch Aug 05 '19 at 15:26
I think you are missing the point of the solution I am proposing. You paste a plain text URI (free of any DOM content) into your UI. Your UI attempts to resolve the URI and if it's a valid fragment, you retrieve it from the server. There might be a CORS problem to solve depending on how you build your URIs. – JimmyJames Aug 05 '19 at 15:29
Yes. What is CORS exactly? I guess it is the cross-site copying issue I am thinking of. – Basile Starynkevitch Aug 05 '19 at 15:30
[CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS) is a way to control how pages are allowed to access content from domains other than the origin of a page. I'm not sure if it's relevant to copying. I hadn't considered that. I guess there could be policies about that. In this context, you might need to retrieve a resource (the fragment) from a different domain. There are ways around that if you want to go down this route. – JimmyJames Aug 05 '19 at 15:34
Any references to the ways around? – Basile Starynkevitch Aug 05 '19 at 15:36
The short answer is that you would proxy requests from a server in one domain to another. But I think you are still thinking about rich content. I don't have an answer for that. The nice thing about using URIs is that you could 'copy' parts of the tree to another user's browser session on a computer in a completely remote location. – JimmyJames Aug 05 '19 at 15:40
