Challenges & Approach Discussion

TODO: Reorder sections, write a proper introduction.

Connecting (partially) incompatible networks opens up questions.

How to address a user / audience on a remote platform?
How to present their content?
How do systems communicate compatibility support?
How can interactions (e.g. a poll or reactions) be supported?
How / in what schema is content formatted?
How does the API work?
How to enable secure communication?
Finding consensus & acceptance in the community

Addressing

To send a message, the addressed receiver needs to be uniquely identifiable and locatable (with a protocol).

Address Format

URIs

Pretty much any address can be wrapped in a URI (Uniform Resource Identifier). A URI is unique and the URI scheme can describe the protocol used. Using a well-established standard seems to be a good approach. Common URI examples:

https://mastodon.online/@laurin
matrix:u/laurin5:matrix.org (or https://matrix.to/#/@laurin5:matrix.org - see https://spec.matrix.org/v1.4/appendices/#matrix-uri-scheme)
tel:+1-201-555-0123
…

Not every URI is dereferenceable per se. A URI scheme like iconet:<dereferenceable HTTP address>[:<domain-specific identifier>] might be a way to go.

Dereferenceable Address part:

In this case, an actor could send the message together with the identifier to the dereferenceable address using the iconet protocol.
The dereferenceable address format needs to be understood by every iconet participant. Therefore, HTTP seems to be a decent choice.
The dereferenceable address is the endpoint the message is sent to.

Domain-specific Address part:

The domain-specific identifier is only required to be processed by the receiver.
The domain-specific identifier could be optional for cases where the dereferenceable address is able to uniquely identify the receiver.
The domain-specific identifier could also be passed as a parameter in the HTTP header.
The domain-specific identifier could also indicate the natively supported protocol. In that case, the sender application could use the domain-specific identifier to send the message without relying on the iconet protocol. This information could also help to link accounts that represent the same identity under different names.
The separator between the dereferenceable address and the domain-specific identifier should be a reserved URI character (e.g. :, @, !, $).

The iconet: URI scheme would specify the protocol to be used, i.e. makes clear that this is an iconet address.
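To make the proposed address format more tangible, the following is a minimal parsing sketch. It assumes `!` as the separator (one of the candidate characters listed above, chosen here only because it does not collide with the `:` inside the HTTP address) and that the dereferenceable part itself contains no `!`. The names `IconetAddress` and `parse_iconet_address` are hypothetical and not part of any iconet specification.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

SCHEME = "iconet:"
SEPARATOR = "!"  # assumption: the separator is still an open question


@dataclass(frozen=True)
class IconetAddress:
    endpoint: str                     # dereferenceable HTTP(S) address
    identifier: Optional[str] = None  # domain-specific part, optional


def parse_iconet_address(uri: str) -> IconetAddress:
    """Split 'iconet:<http address>[!<identifier>]' into its two parts."""
    if not uri.lower().startswith(SCHEME):
        raise ValueError("not an iconet address")
    rest = uri[len(SCHEME):]
    endpoint, sep, identifier = rest.partition(SEPARATOR)
    parts = urlsplit(endpoint)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("dereferenceable part must be an HTTP(S) URL")
    return IconetAddress(endpoint, identifier if sep else None)


address = parse_iconet_address("iconet:https://example.org/inbox!alice")
print(address.endpoint, address.identifier)
```

Only the receiver has to interpret `identifier`; a sender would merely pass it along to `endpoint`.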

Non-URI Approach

Alternatively, an email-like address format (e.g. <domain-specific identifier>@<dereferenceable address>) could be used. Its advantage is that users already understand the format well. However, this format does not specify the protocol used and might therefore be confused with an email address.

Transport Protocol

What protocol is used to transport messages?

HTTP seems to be the most established standard and is rather easy to comprehend. Many libraries exist for it. For example, an HTTPS POST endpoint could easily function as an inbox.
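As an illustration of how little is needed for such an inbox, here is a minimal sketch using only Python's standard library. The in-memory `INBOX` list, the status codes and the port are assumptions made for this example; TLS termination, authentication and persistence are deliberately left out.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

INBOX = []  # assumption: packets are simply kept in memory


def accept_packet(raw: bytes) -> int:
    """Store a JSON packet and return the HTTP status code to answer with."""
    try:
        packet = json.loads(raw)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400
    if not isinstance(packet, dict):
        return 400
    INBOX.append(packet)
    return 202


class InboxHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the announced number of bytes from the request body.
        length = int(self.headers.get("Content-Length", 0))
        status = accept_packet(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the inbox locally; a real deployment would sit behind HTTPS."""
    HTTPServer(("127.0.0.1", port), InboxHandler).serve_forever()
```

Calling `serve()` starts the endpoint; any client can then deliver a packet with a plain HTTP POST.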

Packet Schemas and Wrapping Packets

What information do iconet packets contain?
Is the iconet fallback information sent to the client on request or is it sent proactively?
How are iconet packets formatted?
Is the original packet wrapped in an iconet packet’s field or vice versa?
What if the client / platform does not want to support iconet packet schemas (or addressing) but would like to implement the presentation fallback anyway?

Thoughts on RDF

Using JSON-LD could have three advantages:

It barely causes overhead compared to plain JSON: It’s parseable by any JSON parser and in its simplest form it’s just the same as a regular JSON object but with an additional @context field.
It gives field names a unique identifier which is helpful in contexts where the native protocol uses the same field names.
A schema is easily extendable.

RDF increases alignment with protocols like ActivityPub.
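The first two points can be illustrated with a small sketch: the document is read with a regular JSON parser, and the `@context` maps a short field name to a globally unique identifier. The term IRI below is hypothetical, and `expand_terms` is a deliberately naive stand-in that only handles a flat, inline context; it is not a JSON-LD processor.

```python
import json

raw = """
{
  "@context": {"content": "https://iconet-foundation.org/ns#content"},
  "content": "Hello there!"
}
"""


def expand_terms(doc: dict) -> dict:
    """Replace top-level keys by the IRIs given in an inline @context."""
    context = doc.get("@context")
    if not isinstance(context, dict):
        context = {}  # remote or missing contexts are not resolved here
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        mapping = context.get(key)
        expanded[mapping if isinstance(mapping, str) else key] = value
    return expanded


doc = json.loads(raw)       # any plain JSON parser can read the document
print(doc["content"])       # usable like a regular JSON object
print(expand_terms(doc))    # field name now carries a unique identifier
```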

Presentation

When to fall back to non-native presentation of content?
What format does the fallback presentation have?
How is fallback presentation generated - on the receiver side by using the sender’s platform-native message and transforming it into a receiver-understood target format?
What if the client understands parts of the message (e.g. an attached vCard) but does not know how to present the whole information?

For presentation on the client side, we identify three approaches which are not mutually exclusive but combinable.

Using format interpreters that support logic and take any kind of message to create a supported presentation. This is the most flexible (but also the most complex) approach and the direction Iconet is heading in.
Using a very basic common default format supported by all clients, e.g. plain text (which has to be sent along with the original packet or on request). This could be a useful fallback for clients that do not support HTML+JavaScript.
Using content negotiation to find the intersection of formats supported by both the sender and the receiver application. Disadvantage: what happens if there are no common formats?

A more detailed elaboration of the approaches is given in the subsections.

A possible example procedure combining the approaches could look as follows:

The receiving client has received a packet that it cannot present to the user. The packet has a plain text presentation attached as well (as last resort).
The receiving client supports HTML and decides to fetch a format interpreter HTML document linked in the packet. Alternatively, the document could reside in the packet. (Optionally: When fetching the document, content negotiation could also result in a different kind of interpreter that, for certain protocols, can translate the packet into a format understood by the client.)
The format interpreter HTML document can be embedded as an iframe by the receiving client. The receiving client passes the packet to the format interpreter HTML document, which renders an HTML representation.
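The decision the receiving client makes in this procedure could be sketched as follows. The packet fields `type`, `interpreterUrl` and `plainText` are hypothetical names chosen for this example only; the actual packet schema is still open.

```python
def choose_presentation(packet: dict, native_types: set, supports_html: bool) -> str:
    """Pick a presentation strategy, from most to least capable."""
    if packet.get("type") in native_types:
        return "native"        # the client understands the packet itself
    if supports_html and packet.get("interpreterUrl"):
        return "interpreter"   # fetch and embed the format interpreter
    if packet.get("plainText") is not None:
        return "plain_text"    # last resort attached to the packet
    return "unsupported"


packet = {
    "type": "poll",
    "interpreterUrl": "https://example.org/interpreters/poll.html",
    "plainText": "Poll: Tea or coffee?",
}
print(choose_presentation(packet, native_types={"note"}, supports_html=True))
```

A client without HTML support would end up with the plain text attached to the same packet.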

Format Interpretation

The idea of providing an interpreter is the following: If the receiver application is not able to present the received packet in the sender application’s format, the interpreter can be used to translate the information that was not understood by the receiver application.

The interpreter can be thought of as a converter that takes the sender’s native packet and generates markup that the client can render. To support interactive content, the interpreter could be an iframe that displays a widget based on the data passed to it.

Ideally, no sensitive data should be exposed to anyone but the receiver in that process. Therefore, the question arises as to how the interpreter information is transmitted to the receiver application. By request (and to whom?) if the packet is not understood or in advance by the sender application?

The advantage of using an interpreter is that the sender does not need to provide (another) target format. The interpreter (transformer) can be developed separately. If the receiver application does not understand the format, the message does not need to be re-requested; only a transformer has to be fetched.

In some cases, certain parts of the foreign packet might be understood by the receiving client. In that case, the receiving client is in charge of deciding whether to render the fallback and how to handle the packet.

Potential Examples

An isolated iframe (with no access to external (internet) resources) receives the packet that the receiver application cannot present with its native methods.

In the iframe, the interpreter can execute JavaScript to process the data and render a presentation for the user. Outgoing communication could be channeled through an interface controlled by the client, that acts as a proxy and filters requests to external resources.
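The filtering step of such a proxy could look roughly like the sketch below: the client only forwards requests whose target host is on an allow-list. Restricting requests to HTTPS and matching on the host name only are assumptions of this example, not requirements stated anywhere in iconet.

```python
from urllib.parse import urlsplit


def is_request_allowed(url: str, allowed_hosts: set) -> bool:
    """Decide whether the client proxy forwards an interpreter's request."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False  # assumption: only encrypted requests are forwarded
    # .hostname is lower-cased and excludes user info and port, so
    # 'https://example.org@evil.example/' is judged by 'evil.example'.
    return parts.hostname in allowed_hosts


allowed = {"example.org"}
print(is_request_allowed("https://example.org/api/poll/42", allowed))
print(is_request_allowed("https://example.org@evil.example/", allowed))
```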

Default Formats

Alternatively, instead of providing an interpreter, the message sent could be formatted in a very generic way understood by any client (e.g. plain text) or in multiple formats. See this Matrix MSC as a source of inspiration, for example. The MSC proposes to attach the markup of a message in multiple formats, each with a MIME type. The client is supposed to present the first format it supports.

"m.markup": [
  {"body": "<h1>Hello there!</h1>", "mimetype": "text/html"},
  {"body": "Hello there!",           "mimetype": "text/plain"}
]
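The "present the first supported format" rule could be implemented on the client roughly as follows. This is a sketch that reuses the field names `body` and `mimetype` from the example above; the function name `pick_markup` is hypothetical.

```python
from typing import Optional


def pick_markup(markup: list, supported: set) -> Optional[dict]:
    """Return the first entry whose MIME type the client can present."""
    for entry in markup:
        if entry.get("mimetype") in supported:
            return entry
    return None  # no common format: the client needs another fallback


markup = [
    {"body": "<h1>Hello there!</h1>", "mimetype": "text/html"},
    {"body": "Hello there!", "mimetype": "text/plain"},
]
print(pick_markup(markup, {"text/plain"}))
```

Because the list is ordered by the sender's preference, a client that supports both formats would get the HTML variant.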

Content Negotiation

In the previous section, multiple formats of the same message were sent in one packet. In some cases, it might be useful to perform content negotiation instead.

The client or the open inbox could ask to receive the message in a preferred format that both support. Downside: The inbox might not know in advance which formats the client fetching the messages supports.

A different form of content negotiation could be used when the receiving client requests a format interpreter. The endpoint that provides the format interpreters could support content negotiation for different target formats (that would not even need to convert to a markup but a client-native packet format).
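If HTTP is used for transport, one possible way to negotiate is the standard `Accept` header with quality values. The sketch below shows a simplified server-side selection; it ignores wildcards such as `*/*` and all parameters other than `q`, and the function names are hypothetical.

```python
from typing import Optional


def parse_accept(header: str) -> list:
    """Parse an Accept header into (media type, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        media, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media.lower(), q))
    # sorted() is stable, so equally weighted types keep their order
    return sorted(prefs, key=lambda pref: -pref[1])


def negotiate(accept: str, available: list) -> Optional[str]:
    """Return the client's most preferred format the endpoint can provide."""
    for media, q in parse_accept(accept):
        if q > 0 and media in available:
            return media
    return None  # no common format


accept = "text/html;q=0.8, application/x-native+json, text/plain;q=0.1"
print(negotiate(accept, ["text/plain", "text/html"]))
```

The `None` case is exactly the disadvantage mentioned earlier: without a common format, another fallback is needed.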

tl;dr

Options:

The iconet packet contains the sender-native packet.
- The iconet packet contains a link to an endpoint providing interpreters (e.g. iframes) that the client can use to convert and render the packet. Using content negotiation, different target formats could be requested. Caching and security implications should be taken into account.
- Instead of providing an endpoint for the interpreter, the needed data could be transmitted in the iconet packet as well, at the cost of reduced caching options.
- The interpreter’s code should be provided with a checksum to reduce attack vectors.
The iconet packet contains multiple content formats of which at least one should be supported by the receiving application (probably plaintext).
Both of the above
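The checksum idea from the first option could be sketched like this: the packet carries a digest of the interpreter code, and the client refuses to embed anything that does not match. Using SHA-256 and a hex-encoded digest is an assumption of this example.

```python
import hashlib
import hmac


def verify_interpreter(code: bytes, expected_sha256_hex: str) -> bool:
    """Check fetched interpreter code against the checksum from the packet."""
    actual = hashlib.sha256(code).hexdigest()
    # compare_digest compares in constant time
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


code = b"<html><body>interpreter</body></html>"
checksum = hashlib.sha256(code).hexdigest()  # would be sent in the packet
print(verify_interpreter(code, checksum))
print(verify_interpreter(code + b"<script>evil()</script>", checksum))
```

A checksum only protects against the interpreter being swapped after the packet was created; it says nothing about whether the original code is trustworthy.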

Content Interaction

How are interactions (e.g. a like or a comment) made available / communicated back?
Should iframes be allowed to communicate back themselves (via their own requests) or should they request their parent window to communicate?
Is an iconet (interaction) packet processed by the client or the inbox server?
Can interactions be authorized and how?
Use cases:
- Use Case: Bob can like Alice’s post, but Claire (to whom the post was forwarded) can’t.
- Can authorization to access formats be delegated to receivers?
- (How) can messages be forwarded? Can formats handle ‘forwarding’ internally (not part of the standard)?
- Should we aim for standard auth mechanisms (e.g. OAuth)?
- Use Case?: A shared document is sent to Bob and a preview is rendered but Claire, to whom the post was forwarded, should not see the preview. (How) can access be granted or requested?
- A public post that is sent to Bob in an iconet packet is supposed to be commentable by Bob only.

Modes of Communication & Security

A container in the receiver’s application is used to wrap the interpreter and its packet presentation.

Possible proposal: Multiple trust levels

container is completely isolated
container may communicate via its parent host in a limited fashion
container may communicate by itself

The user or the client is in charge of elevating a message’s trust level.

Embedded (non-native) content MUST NOT be allowed to alter its parent’s state directly. Communication between the parent application and the embedded content must be supervised.
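The three trust levels and the elevation rule could be modelled as in the sketch below. The enum names and the two helper functions are hypothetical; how approval is obtained from the user or client is left open.

```python
from enum import IntEnum


class TrustLevel(IntEnum):
    ISOLATED = 0    # container is completely isolated
    VIA_PARENT = 1  # container may communicate via its parent host
    SELF = 2        # container may communicate by itself


def may_communicate(level: TrustLevel, via_parent: bool) -> bool:
    """Check whether a container may send a request on the given path."""
    required = TrustLevel.VIA_PARENT if via_parent else TrustLevel.SELF
    return level >= required


def elevate(current: TrustLevel, requested: TrustLevel, approved: bool) -> TrustLevel:
    """Raise the trust level only if the user or client approved it."""
    if approved and requested > current:
        return requested
    return current


level = TrustLevel.ISOLATED
level = elevate(level, TrustLevel.VIA_PARENT, approved=True)
print(level.name, may_communicate(level, via_parent=True))
```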

Here, we illustrate three concepts on how interactions or updates can reach the client.

Option 1) would support neither method but only allow trusted iconet iframes to connect to sources using the available JavaScript interfaces, e.g. fetch. Thus, no further specification on the iconet side would be necessary. See Sending Interaction Packets for the related discussion.
Option 2) would allow actors to send updates (e.g. a message was edited) for packets they previously sent to the inbox. This method can be thought of as push-like.
Option 3) would allow clients to poll for updates at the source, if needed. This method can be thought of as pull-like.

5.1. Option 1: Iconet Iframe Self-Governing

In analogy to the previous section, the iframe connects to a whitelisted source by itself and polls for updates. The process is thus not iconet-specific.

5.2. Option 2: Push-Based

If updates to a packet arise (e.g. someone commented on a post), the embedding application receives an interaction packet / update to a packet. The embedding application passes the payload over to the iconet iframe.

Note that if transport is not handled by a common iconet protocol, the schema and format may vary. The payload, however, must not be altered during transport. This is analogous to regular packets.

{
    "@context": "https://iconet-foundation.org/ns#",
    "@type": ["Packet", "Update"],
    "@id": "<id of packet>",
    "refersTo": "<id of original packet>",
    "actor": "<sender>",
    "to": ["<addressee>"],
    "content": ["<... content data array, same as in the regular packets>"]
}
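On the receiving side, handling such an update packet could look roughly like the following sketch. It assumes that packets are kept in a dictionary keyed by their `@id` and that an update is only accepted if its `actor` matches the actor of the original packet; this comparison is an assumption of the example and does not replace authentication.

```python
def apply_update(store: dict, update: dict) -> bool:
    """Apply an update packet to a previously stored packet, if it matches."""
    types = update.get("@type", [])
    if isinstance(types, str):
        types = [types]
    if "Update" not in types:
        return False
    original_id = update.get("refersTo")
    original = store.get(original_id)
    if original is None:
        return False  # the update refers to a packet we never received
    if original.get("actor") != update.get("actor"):
        return False  # only the original sender may update its packet
    # Keep the original envelope and replace only the content.
    store[original_id] = {**original, "content": update.get("content", [])}
    return True


store = {"p1": {"@id": "p1", "actor": "alice", "content": ["Hello"]}}
update = {
    "@type": ["Packet", "Update"],
    "@id": "p2",
    "refersTo": "p1",
    "actor": "alice",
    "content": ["Hello (edited)"],
}
print(apply_update(store, update), store["p1"]["content"])
```

After a successful update, the embedding application would pass the new content on to the iconet iframe.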

(5.3. Option 3: Pull-Based)

WIP-Level: 4

A request to ask for updates could look as follows. Note that if transport is not handled by a common iconet protocol, the schema and format may vary.

{
    "@context": "https://iconet-foundation.org/ns#",
    "@type": "UpdateInquiry",
    "originalId": "<id of original message>",
    "sender": "<sender address of this packet>",
    "to": ["<address of the original packet's sender>"]
}

The response would be a regular iconet packet with an additional updateTo field that contains the original packet’s identifier. Alternatively, the packet could maintain the same id and the packet could have a timestamp and an update count.

One option would be that the client invalidates the old message from then on. Alternatively, the client could pass the payload of the response to the iconet iframe, using the established message channel between client and iconet iframe.

A response to an update inquiry could look as follows. If status NoChange is set, the packet does not need to have type Packet and the corresponding payload field.

{
    "@context": "https://iconet-foundation.org/ns#",
    "@type": ["Packet", "Update"],
    "@id": "<id of packet, if packet has update>",
    "predecessor": "<id of original message>",
    "status": "<either Update or NoChange.>",
    "sender": "<sender address of this packet>",
    "to": ["<address of the original packet's sender>"],
    "payload": "<native data>"
}
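The pull-based exchange could be sketched on the client side as follows, using the field names from the two examples above. The function names are hypothetical, and transport (e.g. the HTTP request itself) is left out.

```python
from typing import Optional

NS = "https://iconet-foundation.org/ns#"


def build_update_inquiry(original_id: str, own_address: str, original_sender: str) -> dict:
    """Create an UpdateInquiry packet for a previously received message."""
    return {
        "@context": NS,
        "@type": "UpdateInquiry",
        "originalId": original_id,
        "sender": own_address,
        "to": [original_sender],
    }


def handle_update_response(response: dict, original_id: str) -> Optional[str]:
    """Return the new payload, or None if nothing changed or ids mismatch."""
    if response.get("predecessor") != original_id:
        return None
    if response.get("status") != "Update":
        return None  # e.g. status NoChange: no payload is expected
    return response.get("payload")


inquiry = build_update_inquiry("msg-1", "bob@example.org", "alice@example.net")
response = {"predecessor": "msg-1", "status": "Update", "payload": "<native data>"}
print(inquiry["@type"], handle_update_response(response, "msg-1"))
```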

Authentication

The authenticity of an actor could be established if communication is channeled through the parent host and the parent host’s authenticity to the original sender is given. In the end, that leads to a situation where trust needs to be established between platforms and users, where the respective platform ensures the authenticity of senders and receivers.

A separate, dedicated mechanism seems out of scope for a basic protocol. Even the whole web community hasn’t agreed on a single working interoperable spec that fulfills everyone’s use case.

Packet Transport

WIP-Level: 4

How are packets / how is information communicated across platform / protocol borders?

To transfer information across platforms, the platforms must either speak the same protocol or a so-called bridge needs to translate between them.
For communication between parties using the same protocol, the information sent only needs to be extended with information on how to present it in a fallback case.

For communication between parties on different protocols, the information format, schema, authentication methods, API endpoints, transmission protocol, etc. might not align. This section focuses on how to overcome those obstacles.

Approach 1: Bridging

There are different types of bridges to transfer packets across platforms. For the interested reader, this Matrix post discusses the different types of bridges.

We see a couple of questions arising here:

Where are bridges hosted?
- Case 1: Every individual user hosts their own bridge.
  Bridges are a risk for privacy, since traffic has to be decrypted before it can be bridged (and encrypted again). This issue can be avoided when users host their own bridges on a trusted device, for example on the client. Also, bridging remains easier since the bridge only needs to act as the user on the remote network (puppet bridge). Downside: Every user will have to host their own bridge.
- Case 2: Hosting a bridge per room or server.
  In this case, users don’t need individual bridges. However, all users will have to trust the bridge not to abuse its power to read and manipulate users’ messages. Additionally, many platforms do not support bridges that operate this way. In the worst case, a bridge bot that joined a room will post a message that identifies the sender only by appending the sender’s name to the message body.
How many bridges have to be developed?
Do all platforms have to develop a bridge to every other platform, or should the platforms agree on one or a small set of protocols to which all of them are able to bridge? In the latter case, communicating across protocol borders would involve two bridges: one from the sender’s protocol to the common intermediary protocol and one from that protocol to the receiver’s protocol. This would also require agreeing on the protocol to use for bridging. What requirements does such an intermediary protocol need to address?
(How) can we discover and reach out to users of different protocols?
Ideally, a user is able to find contacts on different platforms (e.g. by phone number or email address). Therefore, not only does a cross-platform user index need to exist, but it must also be clear which endpoint a protocol can target to transfer a message to a different protocol.
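To put rough numbers on the question of how many bridges have to be developed, the sketch below compares both strategies. It assumes that every bridge is bidirectional and that the intermediary protocol is not itself one of the platforms.

```python
def pairwise_bridges(platforms: int) -> int:
    """One bidirectional bridge between every pair of platforms."""
    return platforms * (platforms - 1) // 2


def intermediary_bridges(platforms: int) -> int:
    """One bridge per platform to a common intermediary protocol."""
    return platforms


for n in (3, 10, 20):
    print(n, pairwise_bridges(n), intermediary_bridges(n))
```

For 10 platforms, that is 45 pairwise bridges versus 10 bridges to an intermediary. The price of the second strategy is that every message crosses two bridges instead of one.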

Approach 2: All implement the same protocol

This approach is similar to agreeing on a common intermediary protocol to which all protocols can bridge but with a different software architectural approach. In this case, all platforms implement a common (second) API that is able to send and receive messages in the common protocol.

Downside: Politically, it is extremely hard to find consensus on the protocol to use. Technical limitations are likely to arise when trying to cover every platform’s needs. This would probably only work if lawmakers agree on something. The good news: The EU Digital Markets Act will force the big players to provide interoperability in the long run. See a detailed discussion of the challenges here.

Secure Communication

End-to-end encryption is awesome. But how do we get that right when there’s so much to get wrong? Defining and implementing an E2EE spec is hard. Agreeing on an existing one is hard too (though maybe a bit less so).

In many cases, interoperability then means upgrading or downgrading security guarantees. This poses significant challenges from a legal, safety, and technical perspective.