Some service providers let users put their cryptographic fingerprint in more than one location. For example, Mastodon users could put a link to their Keyoxide profile in the "Profile metadata" section, but could also simply mention it in their bio.

Currently, DOIP only checks a single location. Could it be an idea to allow the library to check multiple locations for the proofs? It shouldn't impact security, as every potential location MUST be controllable by, and only by, the user who wishes to verify their identity.


Could it be an idea to allow the library to check multiple locations for the proofs?

I think this would be very good.

I would be happy to be involved in implementing this and have a few ideas about how it might work, but I'll need to familiarise myself with the codebase a bit more first.

Multiple issues and discussions are waiting on this to be implemented. I must look into this ASAP. I'll try to come up with a PoC next week.

In the meantime, all suggestions and ideas are welcome!

I'm not sure if you're looking for ideas on the spec or on the implementation (or both!). Without having thought about it a lot, I have a few ideas about each – but I've never contributed to a spec for this sort of thing before so I'm sure I'm overlooking a lot about how it works.

Spec ideas

I'm not sure what is needed beyond specifying that, for every provider, more than one verification method can be specified. Ideally, as much implementation detail as possible should be left up to the implementation, so the spec should only contain what's necessary to (a) ensure proofs are valid and (b) ensure compatibility and consistent results across implementations.

Implementation ideas

Without any consideration to the current implementation, if I was coding this myself from scratch (in JS or TS), my first instinct would be something like this:

  • Each Provider registers one or more Methods (maybe a reserved word, and confusing at the very least… we should think of a better name).
  • A Method has:
    • a match method to test whether it matches the claim URI (or it could be a property which is a regex to check against the claim URI)
    • a verify method which performs the actual verification.
  • Most Methods should be able to inherit from a few basic abstract classes without needing too much in the way of logic of their own.
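A rough sketch of that shape in JS classes (all names here, including Method and InMemoryMethod, are hypothetical and not part of the actual doipjs API):

```javascript
// Hypothetical sketch of the Provider/Method idea: each provider
// registers one or more methods, each with a match() and a verify().
class Method {
  constructor(provider, pattern) {
    this.provider = provider; // name of the service provider
    this.pattern = pattern;   // regex tested against the claim URI
  }
  match(claimUri) {
    return this.pattern.test(claimUri);
  }
  verify(claimUri, fingerprint) {
    // Real subclasses would fetch the proof document and look for the
    // fingerprint in it; the base class deliberately does nothing.
    throw new Error('verify() must be implemented by a subclass');
  }
}

// A toy subclass standing in for a real HTTP-based method: it
// "verifies" against a fixed in-memory document instead of fetching.
class InMemoryMethod extends Method {
  constructor(provider, pattern, document) {
    super(provider, pattern);
    this.document = document;
  }
  verify(claimUri, fingerprint) {
    return this.document.includes(fingerprint);
  }
}

// The registry a provider would populate with its methods.
const methods = [
  new InMemoryMethod('mastodon', /^https:\/\/(.*)\/@(.*)\/?/, 'bio with FINGERPRINT'),
];
```

Real methods would of course inherit fetching and parsing logic from shared abstract classes, as described above.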

To perform a verification (pseudocode):

let result = false;
const matchedMethods = methods.filter(method => method.match(claimUri));
if (matchedMethods.length === 1) {
  result = matchedMethods[0].verify(claimUri, fingerprint);
  // could be true (valid) or false (invalid), but is unambiguous
} else if (matchedMethods.length > 1) {
  const verifiedMethods = matchedMethods.filter(method => method.verify(claimUri, fingerprint));
  if (verifiedMethods.length === 1) {
    result = true;
    // exactly one of the matching methods validated, so the claim is unambiguously valid
  } else if (verifiedMethods.length > 1) {
    // more than one match was successfully validated
  } else {
    // more than one method matched the claim URI, but none of them could be validated
  }
} else {
  // no methods matched the claim URI
}

Of course, that leaves a lot of details out. Rather than a boolean, for example, the output of Method.match() will need to be an object with details about the match. Multiple methods could potentially match for the same provider, so each method will need a property naming its provider, and the results will have to be filtered on that too: multiple matches for one provider might not be a problem, whereas multiple matches across different providers is an issue that will need some consideration (though that is hopefully unlikely).
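To illustrate that last point, here's a minimal sketch of match results as objects carrying the provider name, grouped so that "several matches within one provider" can be told apart from "matches across providers" (all names and the match-object shape are hypothetical):

```javascript
// Group match results by provider so that several matches for one
// provider can be distinguished from matches across providers.
function groupMatchesByProvider(matches) {
  const byProvider = new Map();
  for (const m of matches) {
    if (!byProvider.has(m.provider)) byProvider.set(m.provider, []);
    byProvider.get(m.provider).push(m);
  }
  return byProvider;
}

// Example match objects, richer than a plain boolean as suggested above.
const matches = [
  { provider: 'mastodon', method: 'metadata', claimUri: 'https://m.example/@alice' },
  { provider: 'mastodon', method: 'bio', claimUri: 'https://m.example/@alice' },
  { provider: 'discourse', method: 'about', claimUri: 'https://m.example/@alice' },
];

const grouped = groupMatchesByProvider(matches);
// grouped.size > 1 signals the cross-provider ambiguity discussed above.
```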

I'm sure there's lots of things to think about that I haven't considered 😆

    Thank you so much for all the effort put into your message, caesar!

    Spec ideas

    Agreed with what you said, and let's use a basic implementation/PoC first to see if we stumble upon some unforeseen mistakes, learn from that PoC, write the spec and then improve the implementation.

    Implementation ideas

    Allow me to bridge your ideas (which I like very much) with the current implementation.

    Current config for Mastodon (from doiprs, not doipjs; has neater configs IMO):

    [about]
    name = "Mastodon"
    shortname = "mastodon"
    homepage = "https://joinmastodon.org"
    
    [profile]
    display = "@{claim_uri_2}@{claim_uri_1}"
    uri = "https://{claim_uri_1}/@{claim_uri_2}"
    
    [claim]
    uri = '^https://(.*)/@(.*)/?'
    uri_is_ambiguous = true
    
    [proof]
    
    [proof.request]
    uri = "https://{claim_uri_1}/@{claim_uri_2}"
    protocol = "Http"
    access_restrictions = "none"
    
    [proof.response]
    format = "json"
    
    [proof.target]
    format = "fingerprint"
    relation = "contains"
    path = ["attachment", "value"]

    A way to transform this to comply with caesar's ideas would be:

    [about]
    name = "Mastodon"
    shortname = "mastodon"
    homepage = "https://joinmastodon.org"
    
    [profile]
    display = "@{claim_uri_2}@{claim_uri_1}"
    uri = "https://{claim_uri_1}/@{claim_uri_2}"
    
    # The metadata method
    [[methods]]
    
    [methods.claim]
    uri = '^https://(.*)/@(.*)/?'
    uri_is_ambiguous = true
    
    [methods.proof]
    
    [methods.proof.request]
    uri = "https://{claim_uri_1}/@{claim_uri_2}"
    protocol = "Http"
    access_restrictions = "none"
    
    [methods.proof.response]
    format = "json"
    
    [methods.proof.target]
    format = "fingerprint"
    relation = "contains"
    path = ["attachment", "value"]
    
    # The bio method
    [[methods]]
    
    [methods.claim]
    uri = '^https://(.*)/@(.*)/?'
    uri_is_ambiguous = true
    
    [methods.proof]
    
    [methods.proof.request]
    uri = "https://{claim_uri_1}/@{claim_uri_2}"
    protocol = "Http"
    access_restrictions = "none"
    
    [methods.proof.response]
    format = "json"
    
    [methods.proof.target]
    format = "fingerprint"
    relation = "contains"
    path = ["summary"]

    It's elaborate, but I don't think we should try to be clever and make this more concise. Each method needs to define its own regex claim matcher and detail how the proof can be located.

    One thing that would be neat would be if the implementation could somehow share HTTP requests between methods to avoid wasteful internet traffic. But maybe that's the implementation's responsibility, not the config's.

    If the implementation caches the response for each HTTP request and then finds a different method using the same parameters (namely the URL), it can simply reuse the cached response.
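A minimal sketch of that caching idea, with the transport injected so the cache stays independent of any particular HTTP library (makeCachedFetcher is a hypothetical name, not an existing doipjs function):

```javascript
// Cache responses per URL so that several methods requesting the same
// document trigger only one actual network round trip.
function makeCachedFetcher(fetchFn) {
  const cache = new Map();
  return async function cachedFetch(url) {
    if (!cache.has(url)) {
      // Store the promise itself so concurrent callers also share it.
      cache.set(url, fetchFn(url));
    }
    return cache.get(url);
  };
}
```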

    Sorry not to have responded to this. I wanted to put more thought into it but simply haven't had time yet. But I do have a few thoughts to share.

    First, I haven't looked at doip-rs at all, but in principle I think I like the idea of provider verification methods being primarily configuration-based rather than code-based. My first instinct is always to go with an OOP code-based solution because it's so much more flexible, but if enough flexibility can be achieved with a configuration-based approach, it's much neater (and potentially much DRY-er, since the same config can be shared between different implementations).

    Second, I was wondering if from the spec side, we can do something like the way Microformats works. Basically, allow enough flexibility that people can implement their own extensions (providers that aren't officially supported), and have a central "registry" (effectively a living appendix to the spec) where "official" providers can be added over time.

    Finally, combining the above two thoughts, the provider registry could actually contain the YAML/JSON/TOML/whatever config (obviously the same data can be represented in multiple formats, for easy use in different languages), so that implementations can use that "official" config data to implement checks in a consistent way.

    A further thought (and I think you have mentioned a similar idea somewhere in the past): eventually, providers could choose to host their own config at a .well-known URL.

    Dude, I like the way you think, allow me to elaborate.

    Sorry not to have responded to this. I wanted to put more thought into it but simply haven't had time yet

    And I very much appreciate you taking your time and thinking things through!

    My first instinct is always to go with an OOP code-based solution because it's so much more flexible, but if enough flexibility can be achieved with a configuration-based approach

    I'm not sure if this counts, but inside the library I load the config and turn it into an object. Does this still count as OOP?

    allow enough flexibility that people can implement their own extensions (providers that aren't officially supported), and have a central "registry" (effectively a living appendix to the spec) where "official" providers can be added over time

    Already ahead of you, but I like that your mind went to the same place. With @wiktor we imagined service providers uploading these config files to their own server so that Keyoxide clients don't even need to match identifiers or look in a database of "official" providers; each provider tells the client how to verify an identity claim. I think this is more or less what you are thinking of? Ideally, IMO, the DOIP library should have no configuration files built-in.

    so that implementations can use that "official" config data to implement checks in a consistent way

    This idea was previously proposed by @wiktor, who imagined we could use the Ariadne Spec config files to build an automated library test suite.

    Hmmm but I do now realize these two ideas clash: not wanting "official" config files and using config files to build implementation test suites. Maybe this is where your "registry" (aka Ariadne Spec?) comes into play. Here's an attempt to get a service provider into the Ariadne Spec: https://ariadne.id/ARC-0006 (Mastodon).

    A further thought (and I think you have mentioned a similar idea somewhere in the past): eventually, providers could choose to host their own config at a .well-known URL

    Yup 😉 see above!

      yarmo Already ahead of you, but I like that your mind went to the same place

      Yep, I like that we've had a lot of the same thoughts on this! And I admit I haven't fully read all the ARCs etc so sorry if you're having to explain things to me that are already set out online.

      yarmo With @wiktor we imagined service providers uploading these config files to their own server so that Keyoxide clients don't even need to match identifiers or look in a database of "official" providers; each provider tells the client how to verify an identity claim. I think this is more or less what you are thinking of?

      Definitely, published configs at a .well-known URL should be the endgame IMO. But we have to accept also that many sites (probably the biggest ones, unfortunately) will never choose to participate, and others won't do so for a long time. So I don't think I agree that "the DOIP library should have no configuration files built-in".

      Or rather, they shouldn't be built into the library, but an "official registry" (appendix to the spec) should contain canonical configuration files that every implementation will use. Should any provider which is included in the registry choose to publish their own config, the registry version would be deprecated. And anyway, implementations should check the .well-known first for a published config before using a config from the registry.
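The precedence described here could be sketched like this (resolveProviderConfig and both injected fetchers are hypothetical; nothing below is an existing Keyoxide or DOIP API):

```javascript
// Check the provider's own .well-known config first; fall back to the
// canonical registry copy only if the provider publishes nothing.
async function resolveProviderConfig(domain, fetchWellKnown, fetchRegistry) {
  try {
    const own = await fetchWellKnown(domain);
    if (own) return { source: 'well-known', config: own };
  } catch (err) {
    // Unreachable host or no published config: fall through to registry.
  }
  const fallback = await fetchRegistry(domain);
  return fallback ? { source: 'registry', config: fallback } : null;
}
```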

      I had another thought too: maybe provider-hosted configs should be signed by a key belonging to the provider? I don't know if that should be mandatory or optional but it would be a nice feature to support either way. One more level of security. Maybe configs in the registry could be signed by a trusted key too.

      Finally, thinking about the Mastodon config: I still think it would be great if we could implement a generic config for all ActivityPub sites. Perhaps eventually by supporting other generic protocols too, broad support for much of the internet could be achieved without requiring sites to opt in or to be defined in the official registry. That would be brilliant.

      But the claim URI regex for such sites would basically have to be ^https://(.*), so we do have to think about the most efficient way to run multiple tests at the same URI… Though maybe, as you said above, that can be left up to the implementation.
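One way an implementation might make that efficient, sketched under the assumption that the document is fetched once and every candidate target path is then checked locally (walk and checkTargets are illustrative names, not spec or library API):

```javascript
// Follow a path like ['attachment', 'value'] through nested objects and
// arrays, collecting every string value reached at the end of the path.
function walk(doc, path) {
  let nodes = [doc];
  for (const key of path) {
    const next = [];
    for (const node of nodes) {
      const items = Array.isArray(node) ? node : [node];
      for (const item of items) {
        if (item && item[key] !== undefined) next.push(item[key]);
      }
    }
    nodes = next;
  }
  return nodes.flat().filter(v => typeof v === 'string');
}

// Run every candidate proof check against the same fetched document.
function checkTargets(doc, fingerprint, paths) {
  return paths.some(path => walk(doc, path).some(v => v.includes(fingerprint)));
}

// One ActivityPub-style document, two candidate target paths (bio and
// metadata), a single fetch.
const actor = {
  summary: 'my proof: FINGERPRINT',
  attachment: [{ name: 'website', value: 'https://example.org' }],
};
const found = checkTargets(actor, 'FINGERPRINT', [['attachment', 'value'], ['summary']]);
```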