Unsupported feature 'unsupported_requested_format:snowflake'. #227

Closed
coolaj86 opened this issue Apr 1, 2019 · 11 comments
Labels
jira copied to JIRA

Comments

@coolaj86

coolaj86 commented Apr 1, 2019

Issue description

Unsupported feature 'unsupported_requested_format:snowflake'.

I'm trying to follow the PUT documentation and I'm getting the above error. I'm not clear on whether the error is claiming that the Go client doesn't support CSV or if there's something else amiss.

https://docs.snowflake.net/manuals/sql-reference/sql/put.html

Example code

  // db is an open *sql.DB using the gosnowflake driver; id is the target table name.
  queries := []string{
    fmt.Sprintf(`ALTER TABLE %s SET stage_copy_options = (purge = true)`, id),
    `PUT file:///home/ubuntu/foo.csv @~`,
    fmt.Sprintf(`COPY INTO %s FROM '@~/foo.csv'`, id),
  }
  for i := range queries {
    fmt.Println(i)
    res, err := db.Exec(queries[i])
    if err != nil {
      return err
    }
    fmt.Println(res)
  }

Error log

0
{0xc0003e2780 {}}
1
{0xc0003e2780 {}}
2
panic: 000002 (0A000): Unsupported feature 'unsupported_requested_format:snowflake'.

Configuration

Driver version (or git SHA):

github.com/snowflakedb/gosnowflake v1.1.16-0.20190115205620-011e99b734d7

Go version:

go1.12

Client OS: Amazon Ubuntu Linux

@ChTimTsubasa
Contributor

Hi @solderjs

Sorry for the delay. This error is thrown because the Golang driver doesn't support the PUT command for now.

I traced back through #223, and the only workaround for now is to use an external stage, as @howryu mentioned:

copy from an external s3/azure url, e.g. copy into t from 's3://<bucket_name>/'

We will put a higher priority on developing PUT command support for the Golang driver.
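
For reference, a rough sketch of that workaround through the Go driver is below. The DSN, bucket, credentials, and table name are placeholders, not values from this issue; COPY loads directly from an S3 bucket you manage instead of an internal stage.

  package main

  import (
    "database/sql"
    "log"

    _ "github.com/snowflakedb/gosnowflake" // registers the "snowflake" driver
  )

  func main() {
    // Placeholder DSN; see the driver README for the full DSN format.
    db, err := sql.Open("snowflake", "user:password@my_account/my_db/public?warehouse=my_wh")
    if err != nil {
      log.Fatal(err)
    }
    defer db.Close()

    // COPY from an external stage (your own S3 bucket) instead of PUT + internal stage.
    _, err = db.Exec(`COPY INTO my_table
      FROM 's3://my-bucket/exports/'
      CREDENTIALS = (AWS_KEY_ID='...' AWS_SECRET_KEY='...')
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)`)
    if err != nil {
      log.Fatal(err)
    }
  }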

ChTimTsubasa added the "jira copied to JIRA" label Apr 9, 2019
@coolaj86
Author

coolaj86 commented Apr 9, 2019

@ChTimTsubasa Thanks so much for getting back to me. What I did for now was buffer each stream to 25 MB of validated CSV, then shell out to snowsql to load the batch (roughly as sketched below), and continue in a loop until the whole stream is done.
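
The shell-out looks roughly like this - a sketch only, with a made-up connection name ("my_conn" in ~/.snowsql/config) and table name, using snowsql's documented -c and -q flags:

  package main

  import (
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "os/exec"
    "path/filepath"
  )

  // loadBatch writes one buffered batch of CSV to a temp file, then shells out to
  // the snowsql CLI to PUT it to the user stage and COPY it into the table.
  func loadBatch(csvBatch []byte) error {
    f, err := ioutil.TempFile("", "batch-*.csv")
    if err != nil {
      return err
    }
    defer os.Remove(f.Name())

    if _, err := f.Write(csvBatch); err != nil {
      return err
    }
    if err := f.Close(); err != nil {
      return err
    }

    stmts := fmt.Sprintf(
      "PUT file://%s @~; COPY INTO my_table FROM '@~/%s';",
      f.Name(), filepath.Base(f.Name()),
    )
    // -c selects a connection from ~/.snowsql/config, -q runs the given statements.
    cmd := exec.Command("snowsql", "-c", "my_conn", "-q", stmts)
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    return cmd.Run()
  }

  func main() {
    // For illustration: load a single batch read from stdin.
    data, err := ioutil.ReadAll(os.Stdin)
    if err != nil {
      log.Fatal(err)
    }
    if err := loadBatch(data); err != nil {
      log.Fatal(err)
    }
  }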

HTTPS

What would be absolutely best is if you would support a plain HTTPS GET. That would take nearly zero effort on your part. All you have to do in your code is sidestep the Azure / S3 API call and feed the HTTP URL directly to the part where you would otherwise be getting the response back from S3 / Azure. Ask one of your developers. I would be highly surprised if the code took more than 4 hours from first patch to fully tested - far less work than implementing PUT from start to finish (and far more efficient, faster, and less costly, as it would allow direct streaming).

To be clear:

I just want to be able to give you a URL like this (with some option to specify a header as a bonus):

https://whatever.com/api/downloads/file.csv?access_token=xyz123

And then receive a request like this:

GET /api/downloads/file.csv?access_token=xyz123 HTTP/1.1
Host: whatever.com
Authorization: Bearer xyz123

And then send the data back to you - the exact same way you already handle it with S3 and Azure. Almost no code required on your part - and far, far simpler than what you already have. Absolute cake to implement.

Azure Mocking

I tried just doing azure://mydomain.com/myapiroute/myfolder, but I wasn't able to format the XML response properly (even though I copied directly from the Microsoft documentation), so your server just kept hitting my server over and over again trying to get a response it would understand. If you could even give me a sample of what the XML response is supposed to look like, I could build to that and do HTTPS by faking an Azure server just for the two or three response types you need.

S3 Mocking

I couldn't do any S3 mocks because it looks like you don't support the full protocol. It seems like you have the AWS URLs hard-coded, not dynamic.

@ChTimTsubasa
Contributor

@solderjs Let me make sure I understand your requirement, and please correct me if I am wrong:

It seems that you have some CSV files on HTTPS endpoints you host and want to load them into Snowflake.
You are trying/proposing two ways this could potentially be achieved:

  1. To have your endpoint fake a response with a format that Azure/S3 would respond with.
  2. To let snowflake support GET from a plain https endpoint.

@coolaj86
Author

coolaj86 commented Apr 9, 2019

@ChTimTsubasa

Actually it's a CSV stream - which is why using HTTPS is far better than PUT. From your point of view it's the same as if it were a file, but for me it means I just let your server know that I have it, and then when I see your request come in a moment later, I can pipe it straight to you through the network without ever having to store it to disk at all.

  1. "Faking" Azure/S3 would be a little more complex but, yes, if you gave the minimal requirements for what you need implemented on my side, I could do it. An example showing any of a Go, Mustache, or PHP style template (text littered with variables) and concrete text (all values filled out with "example.com" and "my-data-x.csv") would be PERFECT. I believe you only make two calls - directory listing and file fetch.
  2. YES! Having COPY support plain HTTPS (and a way to specify arbitrary headers) would be most excellent.

Thank you so much!

@ChTimTsubasa
Contributor

ChTimTsubasa commented Apr 9, 2019

Can you give a rough estimate of the data generation rate for the CSV stream?

@coolaj86
Author

coolaj86 commented Apr 9, 2019

@ChTimTsubasa It's a streamed upload, so it's bounded by network speed.

Since I'm using Go, the data will be piped in a way that applies back pressure. I won't upload it faster than we receive it (obviously), and if we can't send it as fast as we receive it, TCP flow control will shrink the window and slow the transfer.

Most files are under 100 MB and transmitted infrequently (a few times a week), so it won't matter too much. Probably less than 5% of the files are over 100 MB, and something in the range of 0.1% will approach a gigabyte. Assuming standard HTTP deflate/gzip compression, the transferred sizes could be less than half of that.
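
To make that concrete, here's roughly what my side looks like if COPY could fetch over plain HTTPS - a sketch with made-up names (the token and path are from my example above, the producer is a stand-in, and the TLS cert/key paths are placeholders). Because io.Copy only pulls from the source as fast as you read the response, TCP flow control throttles the producer and nothing ever touches disk:

  package main

  import (
    "io"
    "log"
    "net/http"
  )

  // produceCSV stands in for the real data source; it just writes a couple of rows.
  func produceCSV(w *io.PipeWriter) {
    defer w.Close()
    io.WriteString(w, "id,name\n1,example\n")
  }

  // serveCSV streams the CSV to whoever fetches it (Snowflake, in this proposal),
  // checking the access token either as a query parameter or as a Bearer header.
  func serveCSV(csvSource io.Reader) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
      if r.Header.Get("Authorization") != "Bearer xyz123" &&
        r.URL.Query().Get("access_token") != "xyz123" {
        http.Error(w, "unauthorized", http.StatusUnauthorized)
        return
      }
      w.Header().Set("Content-Type", "text/csv")
      // io.Copy pulls from the source only as fast as the client reads,
      // so back pressure propagates to the producer automatically.
      if _, err := io.Copy(w, csvSource); err != nil {
        log.Printf("stream aborted: %v", err)
      }
    }
  }

  func main() {
    pr, pw := io.Pipe()
    go produceCSV(pw)
    http.HandleFunc("/api/downloads/file.csv", serveCSV(pr))
    // cert.pem / key.pem are placeholders for real TLS material.
    log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", nil))
  }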

@ChTimTsubasa
Contributor

@solderjs Faking the S3 response would be hard, since we piggyback on the S3 SDK on the server side to implement this, and there are tons of optimizations for COPY (to make it much faster than INSERT) that make faking it even harder.

For now, if you insist on using the Golang driver, I would say the best way is still to upload to an external S3 stage and then use the COPY command to load from the stage into Snowflake.

Or, if you can bear with other drivers, you can use ODBC (C/C++) or JDBC (Java) as well, since those drivers support the PUT command, so you don't have to create and maintain your own S3 bucket.

Snowflake is optimized for batch processing, which is the main goal of a data warehouse. We are trying to catch up on streaming ingestion, but that is still under investigation.

@coolaj86
Author

@ChTimTsubasa I don't need anything from your end to support streaming. From your end I just need HTTPS for the ability to have you fetch a file - the same as is done with Azure and S3. That's "streaming enough".

The issue with S3 is more that I don't see anywhere in your API to specify the endpoint URL. It appears that it's hard-coded to use Amazon. There are many S3 clones available, like minio, which are built to the spec of the S3 SDK, so it shouldn't be too hard to use one of them, but they require the ability to specify the domain (which you already support with Azure).
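
For comparison, pointing the stock AWS Go SDK at an S3 clone is just a config setting. This is a sketch with a placeholder endpoint, credentials, bucket, and key (aws-sdk-go v1); an equivalent endpoint knob is all that's missing on your side:

  package main

  import (
    "bytes"
    "log"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/credentials"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
  )

  func main() {
    // Endpoint, region, and credentials are placeholders; any S3-compatible
    // service (minio, etc.) works once the endpoint can be overridden.
    sess, err := session.NewSession(&aws.Config{
      Region:           aws.String("us-east-1"),
      Endpoint:         aws.String("https://minio.example.com:9000"),
      S3ForcePathStyle: aws.Bool(true),
      Credentials:      credentials.NewStaticCredentials("ACCESS_KEY", "SECRET_KEY", ""),
    })
    if err != nil {
      log.Fatal(err)
    }

    svc := s3.New(sess)
    _, err = svc.PutObject(&s3.PutObjectInput{
      Bucket: aws.String("my-bucket"),
      Key:    aws.String("exports/my-data-x.csv"),
      Body:   bytes.NewReader([]byte("id,name\n1,example\n")),
    })
    if err != nil {
      log.Fatal(err)
    }
  }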

If you could document what you expect from Azure, it looks like it would be trivial for me to reproduce. As far as I can tell it's just two requests - one to list the directory and one to fetch the file - but there must be some sort of option that you're expecting that I'm leaving out or not parameterizing correctly.

Using S3 or Azure gives me no benefit and is just extra work (on both ends).

My current work-around is to call out to the snowsql CLI every 25 MB and issue the COPY command there.

Plain-old, tried and true, dumb, stupid, simple, uninteresting, boring HTTPS is all I want.

@coolaj86
Author

@ChTimTsubasa Could you expose AWS_ENDPOINT so ALL S3-compatible services work (minio, IBM S3, DigitalOcean, Scaleway, etc.)?

@smtakeda
Contributor

@solderjs, according to the COPY team, it appears it is not easy to add HTTP endpoint support. I'm forwarding the request to them anyway. Thanks for the great input!

@sfc-gh-stakeda
Contributor

A message from the data pipeline team.

We are looking at alternate means of supporting ingestion through APIs. You could also use the current JDBC APIs for other stores.
Supporting a new store as a source of COPY is a very large undertaking, well beyond HTTP API compatibility for reading files. So realistically it is unlikely that we will take this up unless a large number of customers adopt a new store other than the four we currently support across three clouds.

For further discussion, please reach out to the support team to raise a ticket. Thanks.
