Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single-file decoder script can now (optionally) create an encoder #2065

Merged
merged 10 commits into from
Apr 7, 2020

Conversation

cwoffenden
Copy link
Contributor

To complement the single-file decoder new scripts were added to create an amalgamated single-file of all the Zstd source, along with examples and (simple) tests.

Tested with Xcode/Clang, Visual Studio and Emscripten. Some notes and observations:

  • This keeps zstd.h as-is and amalgamates the rest of the code (common, compress and decompress). It's possible to in-line this header file too but for a library it doesn't feel like the best approach.
  • Dropping the resulting files into a small project Clang was taking slightly longer (in release builds, which I believe was the LTO, but didn't investigate much because on a larger project it didn't make a difference to multi-CPU compilation). Debug builds wasn't noticeable.
  • The resulting binary is smaller than with the many-file version. Around 3% smaller on both Mac and Windows. 3% isn't a lot but it's 3% for free!
  • The optimised Wasm build of roundtrip.c (compressing then decompressing data) is an amazing 237kB. This needs further investigation (DCE and LTO are probably doing their thing but it's worth a second look).

As few changes as possible were made to the Zstd source. It's probably worth though:

  • A better solution for the CHECK_F macro, probably moving it to the shared fse.h (or error_private.h).
  • Either moving the single library to its own contrib folder or renaming the decoder project. It's really easy to make single-file decoder, encoder and full codec now, so the instructions and demos could be expanded to show this.

To complement the single-file decoder a new script was added to create an amalgamated single-file of all of the Zstd source, along with examples and (simple) tests.
@Cyan4973
Copy link
Contributor

Cyan4973 commented Apr 3, 2020

Thanks @cwoffenden for yet another great contribution !

I only have a few minor comments :

  • zstd.h : I presume it would be better to make this file a build artifact rather than a member of the repository, meaning it should be copied from /lib when invoking the build script. Otherwise, a risk is that, over time, the version of zstd.h within contrib/ will drift compared to actual source code in lib/.

  • Should the project directory keep its current name single_file_decoder ?

@cwoffenden
Copy link
Contributor Author

I'll tidy the two review points (probably) on Monday.

zstd.h was only being copied to perform the compilation tests and not committed, but you're right in that the various files should be up-to-date artefacts. I'll take a look how this is currently done.

I think it's worth renaming, then I'll also add and document the choice of compressor, decompressor and both.

`CHECK_F` macro moved to `error_private.h` (shared between `fse_compress.c` and `fse_decompress.c`). `ZSTD_limitCopy()` moved to `zstd_internal.h` (shared between `zstd_compress.c` and `zstd_decompress.c`). Erroneous build artefact `zstd.h` removed from repo.
`CHECK_F` is now in `error_private.h`. Minor tidy.
@cwoffenden
Copy link
Contributor Author

Directory renamed and suggestions implemented.

Undefing XXH_* macros allows the `.c` to build standalone without clashes. Removing `xxhash.c` and only including the header is the correct usage (according to `XXH_PRIVATE_API`).
@cwoffenden
Copy link
Contributor Author

I think I'd call this finished for now. At some point I will:

  • Wrap the scripts with a Makefile (being able to clean would be a bonus)
  • Document options for reducing the binary size (e.g. the roundtrip encode/decode example when compiled with -Oz -g0 -flto is 332kB on Mac and 232kB for Wasm, and that's with a 32kB embedded blob for testing)
  • Test and document the difference (performance, size and compile time) from using a single-file implementation

@Cyan4973
Copy link
Contributor

Cyan4973 commented Apr 7, 2020

Thanks @cwoffenden ! This is excellent.
I look forward to the follow up PR.

@Cyan4973 Cyan4973 merged commit dde98d8 into facebook:dev Apr 7, 2020
@cwoffenden cwoffenden deleted the single-file-lib branch April 7, 2020 17:42
@Cyan4973 Cyan4973 mentioned this pull request May 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants