Define syntax #5

Closed
a-recknagel opened this issue Sep 16, 2019 · 9 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request question Further information is requested

Comments

@a-recknagel
Owner

There should be a human readable syntax definition. This issue will serve as a place to collect requirements, ideas, and restrictions. It will be closed as soon as the rule set has been added to the documentation, which will serve as the source of truth for the implementation.

@a-recknagel a-recknagel added documentation Improvements or additions to documentation enhancement New feature or request labels Sep 16, 2019
@maxfischer2781
Collaborator

maxfischer2781 commented Sep 16, 2019

I guess it is a good idea to divide that into separate topics, unless they clearly interact. Here are some - it is probably a good idea to support less if that means better syntax.

It might also be worth defining where stenotype syntax is applicable - e.g. tuple syntax (A, B, C) works well in annotations, but would be ambiguous when used as an alias.

@a-recknagel a-recknagel added the question Further information is requested label Sep 17, 2019
@a-recknagel
Owner Author

a-recknagel commented Sep 17, 2019

Thanks for your input @maxfischer2781

it is probably a good idea to support less if that means better syntax

I fully agree, in particular I'd add that there is no strict need for completeness, since corner cases should be able to just fall back to regular type annotation.

Callables/functions

I agree on the general look.

Regarding async, I think it's essential to have a good shorthand for it, since async functions are among the worst ones to annotate. There doesn't seem to be a common symbol for marking async code yet, so would it be reasonable to just introduce one? I see these options:

  • new cryptic symbol: (A, B) -> ^R
  • just incorporate current standard: (A, B) -> awaitable R
  • something a little cuter/shorter: (A, B) -> idle R

Tuples

I personally like (A, B, (C, D)) -> R syntax, with (C, D) being a tuple. The product notation, while reasonable, feels counterintuitive to me.

Optionals

I'd like to support ?A, I think it's the most readable of the bunch.

Union

Thanks for gathering all the proposals floating around. My personal favorite is A|B, with A|None == ?A. It would be nice if all of (), [], {} could be used to refer to their literal counterpart, which would make them unavailable as options in a different context.
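For reference, the proposed identity A|None == ?A mirrors what the typing module already does with the long spellings. A minimal check using only standard typing names (the variable names are illustrative only):

```python
from typing import Optional, Union

# Optional[X] is shorthand for Union[X, None], so both spellings compare equal.
optional_int = Optional[int]
union_with_none = Union[int, None]
assert optional_int == union_with_none

# Unions are flattened and compared without regard to argument order,
# which is the behaviour a steno "A|B" would most likely rely on.
nested = Union[int, Union[str, None]]
flat = Union[None, str, int]
assert nested == flat
```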

Co/Contravariance

I never encountered this, but am in no way opposed. The +T and -T in particular looks very readable and intuitive to me.

Structural types/protocols

I'm not familiar with structural types, it sounds a bit like a generic. Is it actually commonly used in python? Regarding (custom?) protocols, I think they might be something for a later point in time, unless you think that they are essential (with the obvious exception of iterable/generator functions, which would be (A, B) -> *R).

Literal types

Optimally, anything that is not a type and not a container would just be interpreted as a literal: ('a'|'b', 1|2) -> R. This would avoid the need to write a special interpretation of quoted content and the need to learn and understand it, plus all the messiness that may accompany it like escaping meta characters.

A caveat would be that in that case it is impossible to write a function that consumes/returns specific types, since it won't be possible to interpret it as a literal. But this to me falls under the kind of corner cases that should just be written in non-steno.

where stenotype syntax is applicable

any, many, or all of the following:

  • only in files after an import stenotype statement
  • everything that fails to be parsed as a valid type annotation, given a type annotation context
  • forced yes in lines with a # steno: on comment
  • forced no in lines with a # steno: off comment

In order to effectively work with this project, it is probably a good idea to provide a cli tool on install that consumes a string and prints the long type annotation, given that it is in a valid context, just to quickly test what is recognized as steno and what isn't:

$ stenotype "import typing; typing.List[int]"
import typing
List[int]
$ stenotype "(int|bool, ?str) -> ?'a'|'b'"
Callable[[Union[int, bool], Optional[str]], Optional[Literal['a', 'b']]]

@maxfischer2781
Collaborator

Special Functions

One could go with how they are used in code. That looks pretty good for docs at least.

  • (A, B) -> await R for Callable[[A, B], Awaitable[R]], e.g. from
    async def (a, b): return r
  • (A, B) -> async for R for Callable[[A, B], AsyncIterable[R]], e.g. from
    async def (a, b): yield r
  • (A, B) -> for R for Callable[[A, B], Iterable[R]], e.g. from
    def (a, b): yield r
  • (A, B) -> with R

Not sure if the more complex ones make sense with nesting - e.g. async with async for R would be an AsyncContextManager[AsyncIterable[R]].
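To make the mapping above concrete, here is a small sketch of the three function kinds and the long annotations they would correspond to. The function and alias names are made up for illustration; the steno spellings in the comments are the proposals from this thread, not implemented syntax.

```python
import inspect
from typing import AsyncIterable, Awaitable, Callable, Iterable

async def fetch(a: int, b: int) -> str:      # steno proposal: (int, int) -> await str
    return str(a + b)

async def stream(a: int, b: int):            # steno proposal: (int, int) -> async for int
    yield a
    yield b

def count(a: int, b: int):                   # steno proposal: (int, int) -> for int
    yield a
    yield b

# The long-form annotations each shorthand would expand to.
FetchType = Callable[[int, int], Awaitable[str]]
StreamType = Callable[[int, int], AsyncIterable[int]]
CountType = Callable[[int, int], Iterable[int]]

# inspect can tell the three kinds apart, matching the three shorthands.
assert inspect.iscoroutinefunction(fetch)
assert inspect.isasyncgenfunction(stream)
assert inspect.isgeneratorfunction(count)
```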

Union

I like both A | B and A or B, with a small bias towards the latter - using a familiar word instead of an obscure symbol seems more in line with Python to me. I don't see binary operators used much outside of numpy and friends, but boolean operators are quite common.

Structural types/protocols

Structural types are basically formalised duck-typing. I have not used them for typing yet, since typing.Protocol only ships as of Python 3.8. Definitely not a pressing matter.

Literal types

Using plain literals is tricky because of conflicts. For example, 'A' may be a postponed type A. Since types are also values, being too clever here may mean that stenotype has to inspect the actual runtime object to figure out whether it is a type or value.

  • A fixed set of literals -- strings, bytes, numbers, True, False, ... -- could be enough to handle the cases where short notation is actually worth it.
  • Are postponed types needed in stenotype at all? stenotypes are likely not valid Python expressions (see (A, B) -> R) and could just always compile to postponed types. A and 'A' would then always be a type and literal string, respectively.

I really like the idea of having a CLI tool to test annotations. Do you think it would be feasible to invert this as well?

$ stenotype --shorten "Tuple[int, Tuple[float, float]]"
(int, (float, float))
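A rough sketch of how such a --shorten step could work for the tuple case only, using the standard ast module. The function name shorten is hypothetical, the sketch assumes Python 3.9+ (for ast.unparse and the un-wrapped subscript slice), and it only rewrites Tuple[...] nodes reachable through other Tuple[...] nodes; everything else is passed through unchanged.

```python
import ast

def shorten(annotation: str) -> str:
    """Rewrite Tuple[...] subscripts as parenthesised tuples (hypothetical helper)."""
    def render(node: ast.expr) -> str:
        is_tuple_subscript = (
            isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name)
            and node.value.id == "Tuple"
        )
        if is_tuple_subscript:
            inner = node.slice
            # Tuple[A, B] has an ast.Tuple slice; Tuple[A] has a single expression.
            items = inner.elts if isinstance(inner, ast.Tuple) else [inner]
            return "(" + ", ".join(render(item) for item in items) + ")"
        # Anything that is not a Tuple[...] is emitted as-is.
        return ast.unparse(node)

    return render(ast.parse(annotation, mode="eval").body)

print(shorten("Tuple[int, Tuple[float, float]]"))  # (int, (float, float))
```

Note that a one-element Tuple[int] would come out as (int) here, which is one of the ambiguities a real syntax definition would have to settle.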

@a-recknagel
Owner Author

sphinxcontrib-trio

good find, always nice if other people have already sussed out details like "looking good". That said, I need to take a good look at all of them to get a feel for what exactly they do and how prevalent they might be. There is a point to introducing a fixed set of words like await, with, for, ... for things that modify the annotation without strictly changing types (or does an async function have a different return type? 🤔 ).

Not sure if the more complex ones make sense with nesting

Maybe footguns should be discouraged at some point, but that would be post v1.0

| or or

I prefer pipe for terseness, but I'm fine with or too. Let's assume we go with or unless other people voice a strong preference for pipe.

Literal types

I guess I should actually look into import hooks in order to have a qualified opinion, but my gut feeling is that relying on runtime evaluation is going to create a lot more problems than it can solve. I mean, we don't have a chance to even get an AST if we have a file with invalid python expressions, do we?

Regarding postponed types, as far as I can see quoting shouldn't be necessary for non-literals in stenotypes, as long as it may always postpone custom classes, e.g. x: 'Foo' or Bar would be x: Union[Literal['Foo'], 'Bar']. Finding out if it is necessary to postpone is hard, if blindly doing it is fine then we don't have that problem.

Maybe that's what you also mean, I wasn't 100% sure.
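The long form of that example already behaves this way in plain typing: a string inside Literal stays a value, while a bare string inside Union is treated as a postponed type. A small sketch, where Bar and f are placeholder names:

```python
from typing import Literal, Union, get_type_hints

class Bar:
    """Placeholder for a user-defined class."""

# 'Foo' inside Literal is a literal string; 'Bar' is a forward reference.
def f(x: Union[Literal['Foo'], 'Bar']) -> None:
    ...

hints = get_type_hints(f)

# The forward reference is resolved to the class, the literal is left alone.
assert hints['x'] == Union[Literal['Foo'], Bar]
```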

stenotype CLI

--shorten sounds like a good thing to have, even if it means that a dedicated parser needs to be written only to support it. Also, --strict, which would check whether a "valid annotation context" exists, e.g. if ?bool: pass would not be interpreted as a type annotation, but def foo(var: ?bool): pass would.
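One way to pin down "valid annotation context" is to let Python's own parser decide. The sketch below assumes that stenotypes are written as quoted strings so the file still parses (that is an assumption, not something settled here), and the helper name quoted_annotations is made up. It collects string annotations from parameters, return types and annotated assignments, and ignores strings anywhere else.

```python
import ast
from typing import List

def quoted_annotations(source: str) -> List[str]:
    """Return every string literal that sits in an annotation position."""
    found = []
    for node in ast.walk(ast.parse(source)):
        annotation = None
        if isinstance(node, (ast.arg, ast.AnnAssign)):
            annotation = node.annotation      # parameter or variable annotation
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            annotation = node.returns         # return annotation, may be None
        if isinstance(annotation, ast.Constant) and isinstance(annotation.value, str):
            found.append(annotation.value)
    return found

example = '''
def foo(var: "?bool") -> "?int": pass
note = "?bool"   # plain string, not an annotation context
'''
print(sorted(quoted_annotations(example)))
```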

@maxfischer2781
Collaborator

maxfischer2781 commented Sep 17, 2019

Regarding the parser: If we want to use Python's AST (which is likely) then type annotations must be syntactically valid expressions -- not necessarily valid at runtime. For example, int | None is a valid expression that produces a type-error.

Even with postponed annotations, annotations must be valid expressions:

Annotations need to be syntactically valid Python expressions, also when passed as literal strings

In other words, a: B is interpreted "as if" it were a: lambda: B, not a: 'B'. Which means that a: (A, B) -> R would not be valid.

I see two realistic choices:

  1. stenotypes are always quoted strings, e.g. a: '(A, B) -> R'
  2. stenotypes are valid Python expressions, e.g. a: (A, B) >> R

A custom parser is unfeasible, since it means every other tool would fail.

Personally, I think 1 is the way to go - there are some subtle but noticeable restrictions from using pure Python syntax. At a glance, most things should be possible with pure Python syntax, though.
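The difference between the two choices and the unquoted arrow form can be checked directly with ast.parse. This is only a sketch; the helper name parses is made up.

```python
import ast

def parses(source: str) -> bool:
    """Return True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True

# The bare arrow form is rejected by the parser before any tool can see it.
assert not parses("a: (A, B) -> R")

# Choice 1: a quoted stenotype is just a string constant to the parser.
assert parses("a: '(A, B) -> R'")

# Choice 2: a stenotype built from existing operators is a valid expression.
assert parses("a: (A, B) >> R")
```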


You can test that the ast works exactly the same with and without postponed annotations:

from __future__ import annotations
import ast

compile("""a: foo""", '<string>', mode='exec', flags=annotations.compiler_flag | ast.PyCF_ONLY_AST)

@a-recknagel
Owner Author

I agree on 1 feeling better.

I think we might have collected enough now to start a draft, I'll try to propose something by the end of the week.

a-recknagel added a commit that referenced this issue Sep 20, 2019
closes #7

Motivated by #5, I don't think this proposal is specific enough to close it just yet.

A lot of other stuff has also made it into this PR:

 -   small fixes to the workflow file, yaml is a way too permissive format
 -   pushing the docs to gh-pages on every PR merge
 -   adding a bunch of badges to the readme
 -   restoring the "documentation" section to TOOLING.rst
 -   changing the docs look to the flask theme, because I like it better than alabaster
@a-recknagel
Owner Author

a-recknagel commented Sep 20, 2019

The current draft of the syntax file is located here and also gets served as part of the docs here. Contributions, comments, and critique welcome.

@maxfischer2781
Collaborator

I did not think about Generics/TypeVars yet - good to see you have included them. Having TypeVars defined automatically would be neat.

We might want to have some reserved typevars (T, S, U, V, R), treat all upper-case one-letter names as TypeVars, or have some syntax to define them on the go (:T, T where isinstance(T, List), ...).
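As a rough illustration of the "all upper-case one-letter names are TypeVars" option, the sketch below scans a stenotype string and creates a TypeVar per name. The function name implicit_typevars is hypothetical, and the regex is deliberately naive: it would also pick up a single capital letter inside a quoted literal.

```python
import re
from typing import Dict, TypeVar

def implicit_typevars(stenotype: str) -> Dict[str, TypeVar]:
    """Create one TypeVar for every stand-alone upper-case letter."""
    names = sorted(set(re.findall(r"\b[A-Z]\b", stenotype)))
    return {name: TypeVar(name) for name in names}

typevars = implicit_typevars("(T, S) -> T")
print(sorted(typevars))  # ['S', 'T']
```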

If a stenotype is always quoted, then forward refs should work automatically, I guess.

@a-recknagel
Owner Author

I'm not sure it's needed, but I thought nested quotes might have to be required for forward references to make them work unambiguously:

foo: "['MyType']"  # list of MyType elements

Any other identifier could work too of course, in case nested quotes look too silly. The main gain would be that we wouldn't need to think about "What if a user calls their class 'await' or 'T'?" scenarios.

I must admit that while I added TypeVars and Generics, I never really worked with them in python. Since I don't know how their usage could be improved, I left their descriptions blank.

That being said, I think that we could just start prototyping a stenotype-to-annotations parser now. If we stumble over stuff that we forgot or doesn't work out nicely we can update the syntax definition to keep things well organized and clear.

Would it be alright with you if I close this issue now?

2 participants