Add file upload to gemini. #14
Comments
Hey @0wwafa, I'll need to check whether the API now accepts files that aren't stored in the user's Google Account. The last time I checked, Gemini Pro Vision required the user to be signed in to upload files, and uploads were not accepted with just the API key that this basic app requires. If files can be passed without OAuth 2.0, I can add that feature. I'll make sure to keep you posted. |
https://ai.google.dev/api/files
|
@0wwafa I just tested it, but no luck. It seems it's not browser-based. I'll be testing over the coming days with this other documentation: Update 2: I haven't yet found a compatible way to send the files. |
Nope @0wwafa, we need to use the endpoint, as GoogleAIFileManager is not compatible with the browser. |
Check the REST API. |
I tested it in Python, in Node.js, and in the shell with curl, and they all work. |
this also works:
|
and also this: |
Hmm, I see the problem. When doing a POST from the browser to upload the file, the request fails with: No 'Access-Control-Allow-Origin' header is present on the requested resource. That can be handled from the back end with a small Node.js or Python program... |
Yep, it must be done in the back end. In Node.js: https://ai.google.dev/api/files#files_create_text-JAVASCRIPT
|
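The back-end workaround described above could look roughly like the following Node.js sketch. It only shows the CORS side: a tiny server that answers the browser's preflight and attaches the `Access-Control-Allow-Origin` header. The names `corsHeaders`, `createUploadProxy`, and `forwardUpload` are hypothetical; in particular, `forwardUpload` stands in for whatever server-side code actually sends the file to the Files API, which is not shown here.

```javascript
const http = require('node:http');

// Headers the browser needs to see before it will accept the response.
function corsHeaders(allowedOrigin) {
  return {
    'Access-Control-Allow-Origin': allowedOrigin,
    'Access-Control-Allow-Methods': 'POST, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type',
  };
}

// forwardUpload(bodyBuffer, contentType) is a caller-supplied (hypothetical)
// async function that performs the real upload and returns a JSON-able result.
function createUploadProxy(forwardUpload, allowedOrigin) {
  return http.createServer(async (req, res) => {
    const headers = corsHeaders(allowedOrigin);
    if (req.method === 'OPTIONS') {
      // CORS preflight: reply with the headers and no body.
      res.writeHead(204, headers);
      res.end();
      return;
    }
    if (req.method !== 'POST') {
      res.writeHead(405, headers);
      res.end();
      return;
    }
    try {
      // Collect the request body before handing it to the forwarder.
      const chunks = [];
      for await (const chunk of req) chunks.push(chunk);
      const result = await forwardUpload(
        Buffer.concat(chunks),
        req.headers['content-type']
      );
      res.writeHead(200, { ...headers, 'Content-Type': 'application/json' });
      res.end(JSON.stringify(result));
    } catch (err) {
      // The upstream call failed; report it without leaking details.
      res.writeHead(502, headers);
      res.end();
    }
  });
}
```

A front end served from, say, `http://localhost:5173` would then POST to this server instead of to Google directly, after starting it with something like `createUploadProxy(myForwarder, 'http://localhost:5173').listen(3000)`. The API key stays on the server in this arrangement.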
Yes, it runs correctly on NodeJS, but this UI is browser-based with a pure frontend that's why it's not working directly and requires a similar endpoint as the one I passed you but files won't be stored because Google doesn't like it that way. |
An alternative is inlining the file in the request:
There are many MIME types accepted, including PDF, PNG, text, MP3, WMV, MP4, etc. |
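As a rough illustration of inlining, the sketch below builds a request body with the file's base64 data placed next to the prompt text. The helper name `buildInlineRequest` is hypothetical, and the `inline_data` / `mime_type` field names are written as I understand the REST documentation to spell them; check them against the current API reference before relying on this.

```javascript
// Hypothetical helper: builds a generateContent-style body with one inlined file.
function buildInlineRequest(prompt, base64Data, mimeType) {
  return {
    contents: [
      {
        role: 'user',
        parts: [
          { text: prompt },
          // The file travels inside the JSON body as base64 text.
          { inline_data: { mime_type: mimeType, data: base64Data } },
        ],
      },
    ],
  };
}

// Example (Node.js): the three bytes "abc" sent as a text/plain attachment.
const body = buildInlineRequest(
  'Summarize this file.',
  Buffer.from('abc').toString('base64'),
  'text/plain'
);
console.log(JSON.stringify(body, null, 2));
```

Because the file is part of the JSON body, no separate upload step (and so no Files API call) is involved.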
The real problem with the web api is that every time you prompt the model you are forced to send everything (all the history etc). |
Yes, that's a problem, because all chats (at least in this app) are stored in LocalStorage to provide context to Gemini. Even with text only, the model sometimes responds incorrectly because it gets lost reading all the historical messages. Storing files as well would be a huge memory problem, since we would be resending every file with every request. With text it's hard to fill the storage, but base64-encoded files will crash the app quickly. |
Memory? Gemini Flash has a 1M-token context! And the base64 inlining works. In subsequent turns of the chat the image can be removed, leaving only the model's answers about the image. This also works perfectly in AI Studio. |
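The idea of dropping the image from later turns while keeping the model's answers could be sketched like this. `stripOldInlineData` is a hypothetical helper, and the history shape (`role` plus `parts`) is assumed to mirror what the app stores; it removes inlined files from every turn except the most recent one.

```javascript
// Hypothetical helper: remove inlined files from all turns but the last,
// so old attachments are not resent (or stored) with every new prompt.
function stripOldInlineData(history) {
  const lastIndex = history.length - 1;
  return history.map((turn, index) => {
    if (index === lastIndex) return turn; // the newest turn keeps its files
    const parts = turn.parts.filter(
      (part) => !('inlineData' in part) && !('inline_data' in part)
    );
    // Keep a short marker so a turn is never left with zero parts.
    return {
      ...turn,
      parts: parts.length > 0 ? parts : [{ text: '[attachment removed]' }],
    };
  });
}

const history = [
  {
    role: 'user',
    parts: [
      { text: 'Describe this.' },
      { inlineData: { data: 'AAAA', mimeType: 'image/png' } },
    ],
  },
  { role: 'model', parts: [{ text: 'A red square.' }] },
  { role: 'user', parts: [{ text: 'What color was it?' }] },
];
const trimmed = stripOldInlineData(history);
console.log(JSON.stringify(trimmed, null, 2));
```

The function returns new objects rather than mutating the stored history, so the caller can decide whether to persist the trimmed version.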
I just found out that it's even simpler!!! "parts":[{"text": "BASE64DATA"}] and it automatically analyzes the data!!
|
Awesome @0wwafa, I'll test passing the base64 inside the content this weekend and let you know how it goes! About memory: I was talking about the user side (the browser), which keeps the history there. |
After a few more tests (I passed an image), it didn't work well. I don't know how it really works on the back end. That is the method AI Studio uses. |
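Here is how to do it:
// Convert a browser File into the inlineData part expected by the Gemini API.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve, reject) => {
    const reader = new FileReader();
    // reader.result is a data URL ("data:<mime>;base64,<data>"); keep only the data.
    reader.onload = () => resolve(reader.result.split(',')[1]);
    // Reject instead of hanging forever if the file cannot be read.
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
} |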
tested and working:
|
$ node anal2.js woman_art1.jpg
|
The only restriction is that the payload can be at most 20971520 bytes (20 MiB). |
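Since base64 inflates a file by about a third, it may help to check the size before sending. The sketch below assumes the 20971520-byte limit applies to the whole encoded request; `base64Length`, `fitsInlineLimit`, and the `overheadBytes` allowance for the prompt and JSON wrapper are all hypothetical.

```javascript
const MAX_PAYLOAD_BYTES = 20971520; // 20 * 1024 * 1024

// Padded base64 turns every group of 3 input bytes into 4 output characters.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// overheadBytes is a guessed allowance for the prompt text and JSON wrapper.
function fitsInlineLimit(fileBytes, overheadBytes = 1024) {
  return base64Length(fileBytes) + overheadBytes <= MAX_PAYLOAD_BYTES;
}

console.log(fitsInlineLimit(15000000)); // true: encodes to 20,000,000 characters
console.log(fitsInlineLimit(16000000)); // false: encodes to 21,333,336 characters
```

Under these assumptions the largest raw file that fits is a little under 15.7 MB, so a front end could reject bigger files before reading them.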
It seems to work with many more file types than the ones publicized :D
$ node anal2.js ../spectrogram.html
HTML Structure
JavaScript Functionality
1. Event Listener and Overlay Removal:
2.
Overall, this code effectively creates a basic real-time audio spectrogram visualization using the Web Audio API and canvas drawing.
Improvements and Potential Features:
|
Great, thanks for sharing your finds! I'll let you know how my testing goes. |
It works! You can add it. The only caveat: not all MIME types are supported. For code or text files, it's better to put the file's contents directly in the message than to pass it as an inlined file. |
Note: for files that don't have a known MIME type, or whose type is not accepted, just send their ASCII (plain-text) representation. |
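The rule of thumb above (inline the binary types that are accepted, paste everything else as text) might be sketched as follows. The `INLINE_MIME_TYPES` allowlist is only an illustrative assumption based on types mentioned in this thread, not an official list, and `toPart` is a hypothetical helper.

```javascript
// Illustrative allowlist only; consult the API docs for the supported types.
const INLINE_MIME_TYPES = new Set([
  'application/pdf',
  'image/png',
  'image/jpeg',
  'audio/mp3',
  'video/mp4',
]);

// Hypothetical helper: choose between an inlined file and a plain text part.
function toPart(fileName, mimeType, buffer) {
  if (INLINE_MIME_TYPES.has(mimeType)) {
    return { inlineData: { data: buffer.toString('base64'), mimeType } };
  }
  // Code, text, and unknown types: put the content itself in the message.
  return { text: `File ${fileName}:\n${buffer.toString('utf8')}` };
}

console.log(toPart('demo.js', 'text/javascript', Buffer.from('let x = 1;')));
console.log(toPart('pixel.png', 'image/png', Buffer.from('abc')));
```

Decoding unknown files as UTF-8 only makes sense for content that really is text; a truly binary file of an unsupported type would need a different fallback.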
I tested it last night, and it mostly returns errors and only sometimes reads the file. I'll send an update with a selector to choose between gemini-1.5-flash and gemini-pro so you can test it and check whether there's a problem with the API call compared to what you tested. |
@0wwafa I've merged the changes: if the model is Flash, you can select files. There are a few bugs I can fix later this weekend, like not clearing the files after sending the prompt, but you should be good to play with it and give feedback or suggest fixes for processing files. |
The REST API is tricky. |
It's great to know that you got the analyzer working! Let me know if you test the update I sent. If you want to include part of your analyzer to improve passing the base64, feel free to share the code or open a Pull Request. |
I will publish my code once it is "decent" :D |
Please add file upload (text, images, pdf, etc)