Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sidecar support #62

Closed
bpatrik opened this issue May 11, 2021 · 9 comments
Closed

Sidecar support #62

bpatrik opened this issue May 11, 2021 · 9 comments

Comments

@bpatrik
Copy link

bpatrik commented May 11, 2021

Is there any plan to officially supporting sidecar files?

There are multiple type of sidecar files: XMP, exif, etc. These files live next to the original media file and should have the same format as the data in a jpg itself. It is a good way to attach metadata to e.g: videos.

Here is how exiftool deals with it: https://exiftool.org/metafiles.html
Here is an other lib for xmp sidecar reading that has some test cases: https://github.com/PhilipF5/xmp-sidecar/tree/master/test

I also generated some:
I used exiftool with:
exiftool.exe -xmp -b IMG_5910.jpg > IMG_5910.xmp
exiftool.exe -exif-b IMG_5910.jpg > IMG_5910.exif
The output:
IMG_5910.jpg
IMG_5910

IMG_5910.exif: IMG_5910.exif.txt
IMG_5910.xmp: IMG_5910.xmp.txt

I dropped the .exif to https://mutiny.cz/exifr/ and seems to work without an issue. .xmp unfortunately not.

Context of the feature request:
We are currently replacing exif processing (bpatrik/pigallery2#277) for pigallery2. Your nice project looks really promising: bpatrik/pigallery2#277 (comment). We would like a lib that can support sidecars too bpatrik/pigallery2#156. Unfortunately, our only option is exiftool currently: bpatrik/pigallery2#294 (comment)

@MikeKovarik
Copy link
Owner

Hello. I coincidentally wanted to implement this for a long time. It is both simple and difficult :D
Let me elaborate.

The parsers for XMP, ICC, etc... are already implemented and could be easily applied to any sidecar file. But...

Reading image files is easy because the format is deterministic. Each format starts with a signature (magic bytes). Then there's a structure of segments, each beginning with another signature - denoting what's wrapped inside that segment.

With sidecars, there's no such metadata I could use for determining the format and thus picking the correct parser. Sidecar just starts with the raw data right away. I can only try each parser, wrap it in a try/catch and wait until it fails to try another.
This is absolutely valid solution, but since exifr prides itself with performance, I wouldn't want to add this (in my eyes) inefficient code to parse().

I've been thinking and prototyping for the past two days and came up with function parseSidecar(file[, options][, formatName]). Or something similar. What do you think? Would this work for you?
It'd be just for now, until some better solution materializes. But let's be honest, there's nothing more permanent than a temporary solution :D

@kagahd
Copy link

kagahd commented May 13, 2021

@MikeKovarik, why not just rely on the file name extension, e.g. .xmp or .exif ?
I mean, if somebody creates a sidecar file with a wrong file extension, it's not your fault 😏

@bpatrik
Copy link
Author

bpatrik commented May 14, 2021

@MikeKovarik thank for the quick response!

From my perspective I would also prefer the best possible performance. (We put lately quite som effort to compare multiple exif readers: bpatrik/pigallery2#277 (comment) ).

Options I see here is:

  • Try/catch everything (not the fastest)
  • Can read the first few bites a guess the best reader (eg.: if it starts with < then its probably an xmp). There is a chance that you make a bad assumption as this approach would not be a standard way of inferring the data structure. -> this can cause confusion -> lot of github tickets :)
  • I also think, using extension is a good option. If the data is not parseable based on the extension it should fail. I would keep the format an input option, so users (developers) can force a parser. It would be fast and no magic involved.

You can also take a look at https://exiftool.org/metafiles.html and try to understand how that works on different extensions. The only thing is that exiftool is really slow. During that time it can try all possible parsers twice.

Update: please not that my uploaded files end in .txt only because github does not support xmp and exif extensions.

@MikeKovarik
Copy link
Owner

MikeKovarik commented May 14, 2021

Hello. I've been writing up proper response that I'll post later. In meantime. I gave shot to the "try infer from filename if type isn't explicitly defined" way with try/catch fallback. But since I'm on a train and about to leave, there's no time for chit chat. I wanted to at least give you the work in progress code if you'd like to take a look at it before I speak my peace :D. So far my weekend seems busy.

For now, you can take a look at
https://github.com/MikeKovarik/exifr/blob/master/debug/sidecar.mjs
and
20f36ae

@bpatrik
Copy link
Author

bpatrik commented May 15, 2021

I had a look.

I think (without testing) the majority of the time is spent on reading the file and not parsing the result.
So doing some try-catches should not have too high cost. I would do some quick benchmarks on some general photos. (generate all possible type of sidecars for like 100 photos with exiftool and parse them with and without the try-catch trick)

I would still call out (or just simple not support) the try-catch trick. Inferring the datatype has extra cost. I think user usually knows the data they want to feed your lib, so providing the type should be free for them.

Unrelated: I can recommend using .editorconfig and lint (my tslint) in your project. It made my code way more coherent and caught some bad practices, anti-patterns. (like use const if possible, etc.). It is easy to set up and modern IDE-s can auto format the code to match the linting expectations. (Please take it with a grain of salt, I'm not a web developer in my dayjob)

@bpatrik
Copy link
Author

bpatrik commented May 27, 2021

Is the xmp parser function public, so that could be used in the meantime for parsing xmp sidecars, while the exifr.sidecar gets to a release?

@MikeKovarik
Copy link
Owner

Hello again after a long time. I'm sorry for the delayed response. I'm was looking for a new job and the multi-step hiring process with coding homeworks was putting strain on me.

I'm not so fond of solely relying on file the extension because it isn't bulletproof.
Even though the .txt at the end of IMG_5910.xmp.txt was added github. Someone else can not knowingly download the file and run with it. Or there may not be any file extension at all. I know these are edge cases but ideally I'd like there to be as little friction as possible. And like you mentioned, issues can arise from edge cases and missing out overly complex docs.

That's why there's something i liked about try/catching. Though absolutely not ideal. Performance: it's not so bad anymore. Thanks to async/await syntax it has been reworked in V8 some two years ago. It now runs at the same speed as .then/.catch. But of course, there's always some time spent feeding the data into parser while waiting for it to crash.

But I'd like to keep it as a last resort. As is in the last commit i presented. You can specify the type explicitly, or it will be infered from the filename extension (though the condition on line 13 should also contain && segType === undefined). Then I'll be adding some rudimentary XMP check at line 22 - "does the first few bytes include <?x" or something like that. And if all else fails, then there will be the try/catch fallback.
TLDR: it can be very efficient and you can avoid guessing.

Regarding the last comment: it's in master but not yet published. I thought you'd want to clone master and play with it, but I didn't release it as it wasn't the final implementation.
If the proposal is fine by you then there's not much more work to be done and I could release it soon.

Regarding linting: thanks for the note. I'm thinking about it from time to time. Bit i have a mixed feelings about it from other projects. It can be pain to setup the way I'd like it to work :D perhaps some day

@MikeKovarik
Copy link
Owner

Hello @bpatrik . I forgot to mention that 7.1.0 with sidecar support was released last week.

@bpatrik
Copy link
Author

bpatrik commented Jul 7, 2021

That is excellent news! Thank you! (sorry I also got carried away a bit lately, I hope you get the position, that you were wishing for :) )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants