Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref: Add HTTPReader #406

Closed
wants to merge 13 commits into from
Closed

ref: Add HTTPReader #406

wants to merge 13 commits into from

Conversation

tkaemming
Copy link
Contributor

No description provided.

@fpacifici
Copy link
Contributor

remark about error conditions: when we send an invalid query to clickhouse, it will reply with HTTP 500 with the http interface. We will have to properly parse the response in these cases as well otherwise those error messages, that now are visible in sentry, will be lost

Copy link
Contributor Author

@tkaemming tkaemming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is "ready for review" in the sense that this is ready for some though about the questions noted inline.

snuba/reader.py Outdated
def __init__(
self, host: str, port: int, options: Optional[Mapping[str, str]] = None
):
self.__base_url = f"http://{host}:{port}/"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 3. 🙌

snuba/reader.py Outdated
response = requests.post(
urljoin(self.__base_url, "?" + urlencode(parameters)),
data=query.encode("utf-8"),
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could probably use straight urllib3, but I used requests here for consistency with the HTTPBatchWriter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not anymore @tkaemming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used 97b36c7 as the pattern for this change. Probably need to take a closer look at request headers.

snuba/util.py Outdated
@@ -444,7 +444,11 @@ def raw_query(body, sql, client, timer, stats=None):
query_settings['max_threads'] = 1

try:
result = NativeDriverReader(client).execute(
result = HTTPReader(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll probably need a way to find a way to switch these out at runtime, since this is a bit more touchy than the writer change. The only way that I've thought of testing this (short of Matt doing some wacky Kubernetes stuff) is to (random?) sample in requests based on dynamic configurations. Looking for additional ideas here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing something like serviceDelegator and run both for a small number of queries? That will provide you a more solid validation in a shorter time. The fact reader is a class on its own helps:

class DelegatingReader:
     def __init__(self, native, http):
         ...
 
     def execute ():
          if random < settings.EXPERIMENT_SAMPLE:
              native.execute
              http.execute 
              compare and log discrepancy 
              return native result if discrepancy is too big
         else:
              if settings.PROD_READER = 'native:
                  return native.execute
              else:
                  return http.execute

You can add a fallback to native in case of error on http so we do not take down Snuba.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue with this is that we don't run uwsgi with threads enabled. I don't specifically recall why (#312) but I think we had to turn them off for some reason?

If we just want to spot check the results, we could possibly try running some sample of queries that were logged to Kafka with a consumer that runs them with both readers? This would also be outside of the request/response cycle, which I think is a benefit in this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do run with threads now. We explicitly do it in Kubernetes config, and a part of the mywsgi changes we’re making that default. It’s silly to run with the GIL disabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering we are already using the HTTP interface for writes I would try to keep this experiment simple. I can think about two possible issues we are trying to protect against:

  • results are incorrect. For this we need to run both queries and compare, so either we run both queries within the api call or outside by running the queries logged into Kafka. If we are already logging into Kafka, why not writing a quick consumer that runs both queries and compare and try to consume that topic? That seems quicker than introducing a DelegatingReader in Snuba
  • performance are dismal. For this we could (I don;'t know how much of a pain it is with our Kube deployment) do canary instead ? Deploy the api with HTTP reader on just a few node and monitor those?
  • HTTP reader makes Snuba unstable. See "performance are dismal". If we could do canary we would not need a DelegatingReader for this as well.

What is concerning you the most?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super worried about performance or stability, since I think that would show up pretty quickly in the metrics or in Sentry and a rollback would be pretty straightforward. My biggest concern is subtle inconsistencies in the data returned between the two drivers that causes weird behavior that isn't easily identified by any specific metrics close to the source of the problem.

I think the only drawback of running a canary for performance and stability sanity checking would be the quantity of work required to enable that, which I can't speak to. Maybe @mattrobenolt could provide some context here?

If we are already logging into Kafka, why not writing a quick consumer that runs both queries and compare and try to consume that topic? That seems quicker than introducing a DelegatingReader in Snuba

Yeah, I agree this seems like the lowest impact way to do that kind of comparison right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't be hard if it's behind some environment variable that can be toggled. I can then just run 1 or more Pods with this on or whatever.

@tkaemming tkaemming requested a review from a team August 9, 2019 22:52
Copy link
Contributor

@fpacifici fpacifici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to also remove all the readonly accesses to ClickhousePool and replace them with the HTTPReader ?

Copy link
Contributor Author

@tkaemming tkaemming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to also remove all the readonly accesses to ClickhousePool and replace them with the HTTPReader ?

Eventually, yes — not in a huge hurry to get to this but it should be done at some point for consistency, I think.

snuba/reader.py Outdated

# TODO: Distinguish between ClickHouse errors and other HTTP errors.
if response.status // 100 != 2:
raise HTTPError(f"{response.status} Unexpected")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should include the error message in the response payload before this goes to production otherwise we would not be able to see Clickhouse errors in prod.
Also not having the clickhouse error would make things very hard to troubleshoot if something goes wrong when you roll this out.

snuba/util.py Outdated
@@ -444,7 +444,11 @@ def raw_query(body, sql, client, timer, stats=None):
query_settings['max_threads'] = 1

try:
result = NativeDriverReader(client).execute(
result = HTTPReader(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing something like serviceDelegator and run both for a small number of queries? That will provide you a more solid validation in a shorter time. The fact reader is a class on its own helps:

class DelegatingReader:
     def __init__(self, native, http):
         ...
 
     def execute ():
          if random < settings.EXPERIMENT_SAMPLE:
              native.execute
              http.execute 
              compare and log discrepancy 
              return native result if discrepancy is too big
         else:
              if settings.PROD_READER = 'native:
                  return native.execute
              else:
                  return http.execute

You can add a fallback to native in case of error on http so we do not take down Snuba.

Copy link
Contributor

@fpacifici fpacifici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only blocker is the rollout plan.

def execute(
self,
query: str,
settings: Optional[Mapping[str, str]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why writer does not take settings per query (in the write method) while reader does? If not I would add them in both for consistency in the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there was a need for it at the call site, I'm fine with adding it though.

body=query.encode("utf-8"),
)

handle_clickhouse_response(response)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit on naming: since this method either does nothing to the response or throws, why not following a naming convention like raise_for_status method. We could rename this to something like "assert_successful_response" that implies it will throw.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, will come up with something better here.

handle_clickhouse_response(response)

result = json.loads(response.data.decode("utf-8"))
del result["statistics"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are removing these just not to expose Clickhouse internals, is that correct ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of, yeah — this is just to make the result structure consistent between both implementations (the native driver doesn't contain these fields.)

I think some of them might be helpful to use internally later (and can be fetched through other APIs with the native driver) but I want to avoid changing the return type too much here with this change. I also think we should probably try to avoid exposing them externally if possible.

snuba/util.py Outdated
@@ -444,7 +444,11 @@ def raw_query(body, sql, client, timer, stats=None):
query_settings['max_threads'] = 1

try:
result = NativeDriverReader(client).execute(
result = HTTPReader(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering we are already using the HTTP interface for writes I would try to keep this experiment simple. I can think about two possible issues we are trying to protect against:

  • results are incorrect. For this we need to run both queries and compare, so either we run both queries within the api call or outside by running the queries logged into Kafka. If we are already logging into Kafka, why not writing a quick consumer that runs both queries and compare and try to consume that topic? That seems quicker than introducing a DelegatingReader in Snuba
  • performance are dismal. For this we could (I don;'t know how much of a pain it is with our Kube deployment) do canary instead ? Deploy the api with HTTP reader on just a few node and monitor those?
  • HTTP reader makes Snuba unstable. See "performance are dismal". If we could do canary we would not need a DelegatingReader for this as well.

What is concerning you the most?

@@ -478,7 +478,7 @@ def raw_query(body, sql, client, timer, stats=None):
error = str(ex)
status = 500
logger.exception("Error running query: %s\n%s", sql, error)
if isinstance(ex, ClickHouseError):
if isinstance(ex, (NativeDriverClickHouseError, HTTPDriverClickHouseError)):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a smell that this point that the Reader doesn't raise a consistent error type.

settings.CLICKHOUSE_HOST,
settings.CLICKHOUSE_HTTP_PORT,
{'output_format_json_quote_64bit_integers': '0'},
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to make this switchable, just haven't decided how yet.

This is also another thing that will require per-dataset configuration (since not every dataset is going to use the same host) but that's nothing new.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the motivation to moving this to the module level is that the connection pool should probably only be instantiated once per process, doing this in raw_query probably would have created a lot of unnecessary connections.

@tkaemming
Copy link
Contributor Author

Also going to need to make sure this is set to use readonly before deploying: https://clickhouse.yandex/docs/en/operations/settings/permissions_for_queries/#settings_readonly

@tkaemming
Copy link
Contributor Author

This has been replaced by #819 which is a much cleaner implementation of the same ideas.

@tkaemming tkaemming closed this Mar 5, 2020
@tkaemming tkaemming deleted the http-reader branch March 5, 2020 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants