Mirror of https://github.com/run-llama/sec-insights.git, synced 2026-07-01 20:24:03 -04:00
Deploying backend to Cloud #4
Originally created by @swamichandra on GitHub (Sep 7, 2023).
Not a defect, but a feature request. The backend components are tightly coupled and assume everything runs locally. Are there any plans to support deploying the backend to AWS, GCP, or Azure, using each provider's cloud-native services?
@sourabhdesai commented on GitHub (Sep 8, 2023):
Hi @swamichandra , thanks for your question!
While the codebase is set up to run well locally, we've also made efforts to make cloud deployment easy on the cloud providers we've chosen.
Render.com is another cloud platform, similar to AWS, GCP, etc., but easier to use in many ways.
We actually have all of our backend infrastructure components defined as infrastructure-as-code in the render.yaml blueprint file. Render has documentation here for connecting the render.yaml file within your repo to deploy to their platform. You would also need to set up the Environment Groups in Render according to the way they're defined in the render.yaml template here. There are also some additional components to set up: the two AWS S3 buckets and the Vercel deployment of the frontend. The YouTube tutorial video has a section on the system design that may be more informative.
If you'd like, we should be able to do a more in-depth write-up of how to go about cloud deployment. Let us know if that would be useful for you!
@swamichandra commented on GitHub (Sep 9, 2023):
@sourabhdesai Thanks for the response and for pointing me towards Render. I created all the env groups. llama-app-db and llama-app-cron are up, but the llama-app-backend service won't deploy and come up. See the following error where it fails. What does "DB not up to date" mean?
@sourabhdesai commented on GitHub (Sep 9, 2023):
Hi @swamichandra, great to see you're trying to set up the deployment for this!
Some context: The backend codebase manages database migrations with Alembic. This helps keep the database tables up-to-date with the latest ORM table definitions in the backend codebase (as defined here).
When running on Render.com, Render automatically sets the RENDER environment variable to true for you (see Render docs on default env vars). This gets checked in main.py here, and if it's set to true (that is, it's actually running on Render.com and not locally), it will run the Alembic migrations for you.
For some reason, it seems like the migrations are not being run on service startup for your deployment. I'd take a look at the following:
- […] alembic/versions here.
- Make sure you haven't set the RENDER environment variable yourself in any of the environment groups or service-level environment variables.
@sourabhdesai commented on GitHub (Sep 9, 2023):
You may also be able to get it started by locally running alembic upgrade head against the Render-provided database. Just make sure to copy the database URL from the DB's page on Render and set it as the DATABASE_URL environment variable locally before running it. That way, you'd at least be able to see the error locally and perhaps iterate towards a solution or workaround faster.
Once the DB migrations have been applied to your DB on Render, your service should start just fine.
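To make the two paths above concrete, here is a minimal Python sketch: one hypothetical helper for the RENDER check and one that applies migrations through Alembic's Python API (alembic.command.upgrade), which is what the alembic upgrade head CLI call does. It assumes an alembic.ini in the working directory and an alembic env.py that reads DATABASE_URL; it is not the project's actual main.py, and both helper names are made up for illustration.

```python
import os
from typing import Mapping


def should_run_migrations(env: Mapping[str, str]) -> bool:
    """Hypothetical helper mirroring the check described for main.py.

    Render sets RENDER=true on its own; if RENDER is missing or holds any
    other value, this check says to skip the migrations.
    """
    return env.get("RENDER") == "true"


def upgrade_to_head(database_url: str, ini_path: str = "alembic.ini") -> None:
    """Python equivalent of running `alembic upgrade head` from the backend directory."""
    # Imported lazily so should_run_migrations() works without Alembic installed.
    from alembic import command
    from alembic.config import Config

    # Assumption: the project's alembic env.py picks the DB URL up from
    # DATABASE_URL, as the advice above implies.
    os.environ["DATABASE_URL"] = database_url
    command.upgrade(Config(ini_path), "head")


# Usage (hypothetical):
#   on Render: if should_run_migrations(os.environ): upgrade_to_head(os.environ["DATABASE_URL"])
#   locally:   upgrade_to_head("<database URL copied from the Render DB page>")
```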
@swamichandra commented on GitHub (Sep 9, 2023):
@sourabhdesai The RENDER environment variable was the issue. I had to remove it from the env groups. The container now comes up fine.
Now to the next challenge. When I run download_sec_pdf.py, it only downloads the data locally, whereas the documentation says the docs would be uploaded to the S3 bucket. I don't see it uploading anything to the S3 bucket, nor do I see anything in the script that uploads to the bucket. Then I tried to run seed_db.py, which errors out when trying to upload to S3.
Is there any other way to load the data into the bucket and the DB?
@sourabhdesai commented on GitHub (Sep 9, 2023):
@swamichandra that's great! Glad you got that part working.
So for loading the data, you can use the cron job service that has also been deployed on Render. The cron job is defined in the render.yaml file here. As you can see, all it does is run make seed_db_based_on_env, which is defined in the backend's Makefile here.
The seed script that runs is really all you need to load the SEC filings data into the service. It pulls the PDFs from SEC's EDGAR API, uploads them to S3, upserts references to them into the DB, and embeds them if they haven't already been embedded. Of course, you will first need to have manually created a public S3 asset bucket that your AWS credentials have read/write access to.
Once you've got the S3 bucket and the cron job set up, you should be able to manually trigger a run of your cron job and have it do all the data loading for you. You can monitor the run on the Render dashboard.
@swamichandra commented on GitHub (Sep 10, 2023):
@sourabhdesai thanks for being super responsive. I ran the cron job and now see the files in the S3 bucket, but the cron job threw the following errors. The frontend does not show any options for the year or company name. It looks like the DB is still not configured properly.
@sourabhdesai commented on GitHub (Sep 10, 2023):
@swamichandra great to see the progress you've made!
Right, so you may want to update the CDN_BASE_URL values in the render.yaml here to the public URL for your bucket.
Alternatively, you can hook up CloudFront to your S3 bucket and use the URL of the CloudFront distribution for CDN_BASE_URL instead.
@swamichandra commented on GitHub (Sep 10, 2023):
@sourabhdesai Thanks again. Very close. It was the S3 bucket policy: I had to update it to allow public access. Now the cron job works and I'm seeing it complete successfully.
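A public-read bucket policy of the kind described here might look like the sketch below; the exact policy used in this thread isn't shown, and the bucket name is a placeholder. It grants anonymous s3:GetObject on every object in the bucket, which public PDF URLs need. Note that if the bucket's S3 Block Public Access settings block public policies, S3 rejects the PutBucketPolicy call until those settings are relaxed.

```python
import json


def public_read_policy(bucket_name: str) -> str:
    """Bucket policy JSON letting anyone GET objects (needed for public PDF URLs)."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return json.dumps(policy)


def apply_policy(bucket_name: str) -> None:
    # Requires boto3 and credentials allowed to call s3:PutBucketPolicy.
    import boto3

    boto3.client("s3").put_bucket_policy(
        Bucket=bucket_name, Policy=public_read_policy(bucket_name)
    )
```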
I also have the frontend provisioned on Vercel, and it comes up. I double-checked that NEXT_PUBLIC_BACKEND_URL points to the backend URL on Render. But I don't see the company names getting pulled up in the UI when I type. Wondering if I'm missing a step.
@sourabhdesai commented on GitHub (Sep 10, 2023):
Hmm, can you check whether the /api/document/ endpoint on your Render-deployed backend is showing any documents? If not, there may have been an issue loading the documents into the documents table of your database 🤔 It might be worth running the seed script locally against your Render DB and using pdb to step through the part of the script where it loads rows into the DB.
@swamichandra commented on GitHub (Sep 10, 2023):
I see this (small snippet).
[{"id":"3aae3da2-6969-4fa2-adfb-cf0173afc62e","created_at":"2023-09-10T03:57:06.398480","updated_at":"2023-09-10T03:57:06.398480","url":"https://d687lz8k56fia.cloudfront.net/sec-edgar-filings/0000078003/10-Q/0000078003-23-000088/filing-details.pdf","metadata_map":{"sec_document":{"cik":"0000078003","year":2023,"quarter":2,"doc_type":"10-Q","company_name":"Pfizer Inc.","company_ticker":"PFE","accession_number":"0000078003-23-000088","filed_as_of_date":"2023-08-09T00:00:00","date_as_of_change":"2023-08-09T00:00:00","period_of_report_date":"2023-07-02T00:00:00"}}},{"id":"38b8f00b-5ea3-4514-8a5f-26a10d50bf7f","created_at":"2023-09-10T03:57:06.384585","updated_at":"2023-09-10T03:57:06.384585","url":"https://d687lz8k56fia.cloudfront.net/sec-edgar-filings/0000078003/10-K/0000078003-21-000038/filing-details.pdf","metadata_map":{"sec_document":{"cik":"0000078003","year":2020,"doc_type":"10-K","company_name":"Pfizer Inc.","company_ticker":"PFE","accession_number":"0000078003-21-000038","filed_as_of_date":"2021-02-25T00:00:00","date_as_of_change":"2021-02-25T00:00:00","period_of_report_date":"2020-12-31T00:00:00"}}},{"id":"df97d890-85d6-4179-846c-4bcff8e8c35e","created_at":"2023-09-10T03:57:06.412024","updated_at":"2023-09-10T03:57:06.412024","url":"https://d687lz8k56fia.cloudfront.net/sec-edgar-filings/0001633917/10-K/0001633917-21-000018/filing-details.pdf","metadata_map":{"sec_document":{"cik":"00016
@sourabhdesai commented on GitHub (Sep 10, 2023):
Hmmm so it seems your DB has the documents loaded in.
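The same check can be scripted instead of eyeballed in the browser. The sketch below assumes only what the snippet above shows: /api/document/ returns a JSON list whose items carry a url and a metadata_map.sec_document with a company_name. The backend URL is a placeholder. Counting URL hosts as well makes it easy to see which CDN the stored PDF links actually point at.

```python
import json
from collections import Counter
from typing import Any, Dict, List
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_documents(backend_url: str) -> List[Dict[str, Any]]:
    # backend_url is your Render backend, e.g. "https://<service>.onrender.com"
    with urlopen(f"{backend_url.rstrip('/')}/api/document/") as resp:
        return json.load(resp)


def summarize(documents: List[Dict[str, Any]]) -> Dict[str, Counter]:
    """Count documents per company and per URL host (i.e. which CDN serves the PDFs)."""
    companies: Counter = Counter()
    hosts: Counter = Counter()
    for doc in documents:
        sec = doc.get("metadata_map", {}).get("sec_document", {})
        companies[sec.get("company_name", "<unknown>")] += 1
        hosts[urlparse(doc["url"]).netloc] += 1
    return {"companies": companies, "hosts": hosts}


# Usage (hypothetical): print(summarize(fetch_documents("https://<service>.onrender.com")))
```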
It may just be that you need to add your Vercel frontend URL to the CORS access list in this part of the render.yaml. Once you've added your Vercel URL to that list, your frontend should be able to make the requests. Hopefully that does the trick!
Thanks for going through this process! This has definitely given me some insight into what I should include if I were to write up a deployment guide for the project 🙂
Also FYI, looking at your snippet, it seems like the document URLs loaded into your DB are still referencing our project's CloudFront CDN URL. So the PDFs are actually being loaded from our CDN distribution instead of your S3 bucket 😅 You may want to update the CDN_BASE_URL in your render.yaml and re-run the cron job to avoid confusion about where your PDF docs are being loaded from. You may need to truncate your documents table in the DB before doing this to avoid duplicating these documents in that table.
@swamichandra commented on GitHub (Sep 10, 2023):
Is there a way, or a script I can run, to reinitialize the DB from scratch, i.e. drop the DB including all the tables and recreate them? I cannot truncate the documents table; I get a referential integrity error.
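For a full reset like this, one option is to migrate the schema all the way down and back up with Alembic instead of truncating tables one at a time, which sidesteps the foreign-key ordering problem. A minimal sketch, assuming Alembic's Python API (alembic.command.downgrade and alembic.command.upgrade) and an alembic.ini in the backend directory; reset_database is a hypothetical helper name, and the command object is injectable only so the sequence can be exercised without a database.

```python
from typing import Any, Optional


def reset_database(config: Any, command: Optional[Any] = None) -> None:
    """Reset the schema: `alembic downgrade base`, then `alembic upgrade head`.

    Destructive: every table created by the migrations is dropped (assuming each
    migration's downgrade() undoes its upgrade()) and recreated empty, so the
    documents and any rows referencing them are deleted together.
    """
    if command is None:
        # Real Alembic, imported lazily; tests can pass a stand-in instead.
        from alembic import command
    command.downgrade(config, "base")
    command.upgrade(config, "head")


# Usage (hypothetical), from the backend directory with DATABASE_URL set to
# the Render database URL:
#   from alembic.config import Config
#   reset_database(Config("alembic.ini"))
```

After a reset, re-running the cron job repopulates the tables.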
@swamichandra commented on GitHub (Sep 10, 2023):
@sourabhdesai Should the CORS value be singular? If I provide a list like the one you have, I get a deployment failure with
"error parsing env var BACKEND_CORS_ORIGINS"
When I provide the value below for CORS, it fails. The build only works with a single value:
'["http://localhost", "http://localhost:8000", "http://localhost:3000", "http://127.0.0.1:3000", "https://xxx.onrender.com", "https://swami-companygpt.vercel.app/", "http://secinsights.ai", "http://www.secinsights.ai", "https://secinsights.ai", "https://www.secinsights.ai"]'
Still, I have had no luck getting the Vercel frontend to pull up the data from the database. I still see this CORS issue in my browser:
Access to fetch at 'https://xxx.onrender.com/api/document/' from origin 'https://xxx.vercel.app' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
@sourabhdesai commented on GitHub (Sep 11, 2023):
Yes! You should be able to run alembic downgrade base against your DB to downgrade the DB migrations back to the starting point, when there were no tables (see the Alembic docs on this here).
@sourabhdesai commented on GitHub (Sep 11, 2023):
Hmm, it should be able to take in a list 🤔 Totally speculating, but could you try removing the trailing / from "https://swami-companygpt.vercel.app/"?
I've also seen browser caching sometimes get in the way of CORS policy updates. After updating your CORS allow list, it may be worth clearing your site-level browser cache as a sanity check.
@swamichandra commented on GitHub (Sep 11, 2023):
@sourabhdesai I removed the trailing slash, but I'm still getting the "error parsing env var BACKEND_CORS_ORIGINS" error.
BACKEND_CORS_ORIGINS : '["https://llama-app-backend-7jtv.onrender.com", "https://swami-companygpt.vercel.app", "https://llama-app-backend-7jtv.onrender.com/api/document"]'
@sourabhdesai commented on GitHub (Sep 12, 2023):
@swamichandra Hm, I wasn't able to replicate this on my end. I set that line in my .env to BACKEND_CORS_ORIGINS='["https://llama-app-backend-7jtv.onrender.com", "https://swami-companygpt.vercel.app", "https://llama-app-backend-7jtv.onrender.com/api/document"]' and it doesn't give me any error when parsing the env variable. Were you able to figure this last part out?
@swamichandra commented on GitHub (Sep 14, 2023):
@sourabhdesai I'm going to redo everything on Render and reprovision. I'm not able to pinpoint why the list of CORS values throws an error.
@swamichandra commented on GitHub (Sep 15, 2023):
@sourabhdesai Good news. I deleted all the services in Render and recreated them, including the AWS S3 buckets. I reran the cron job to repopulate the documents, and the backend comes up fine.
One major thing I did was to remove the single quotes (') at the start and end of the BACKEND_CORS_ORIGINS list. Now the backend starts without any errors.
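A likely explanation for the parsing error, consistent with what fixed it: a value entered in a dashboard field is used literally, so wrapping quotes become part of the string and it is no longer valid JSON, while a .env parser such as python-dotenv strips them, which would explain why the same line worked locally. The sketch below approximates JSON-list parsing of BACKEND_CORS_ORIGINS with the standard json module; it is not the backend's actual settings code, and both helper names are hypothetical. It also shows why a trailing slash or a path (such as .../api/document) never matches: a browser's Origin header carries only scheme, host, and optional port.

```python
import json
from typing import List


def parse_cors_origins(raw: str) -> List[str]:
    """Parse a BACKEND_CORS_ORIGINS value as a JSON list of origin strings."""
    # With literal wrapping quotes, e.g. '["https://a.example"]', json.loads
    # raises JSONDecodeError because a JSON value cannot start with a single quote.
    origins = json.loads(raw)
    if not isinstance(origins, list) or not all(isinstance(o, str) for o in origins):
        raise ValueError("BACKEND_CORS_ORIGINS must be a JSON list of strings")
    # Normalise away trailing slashes; browsers never send one in Origin.
    return [o.rstrip("/") for o in origins]


def is_origin_allowed(origin: str, allowed: List[str]) -> bool:
    # Exact match, as CORS allow lists generally require: an entry that includes
    # a path (like ".../api/document") never equals a real Origin header.
    return origin in allowed
```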
The error I'm facing now is a "Failed to load PDF file" error message. In the browser dev tools I see:
037f8f29-cfe2-4cbb-bec9-abc2c037ca45:1 Access to fetch at 'xxx/sec-edgar-filings/0001018724/10-K/0001018724-23-000004/filing-details.pdf' from origin 'xxx' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
I added the S3 URL to BACKEND_CORS_ORIGINS. What could I be missing?
@jigneshsolanki commented on GitHub (Sep 16, 2023):
@swamichandra you have to configure the bucket CORS in LocalStack:
awslocal s3api put-bucket-cors --bucket ${S3_ASSET_BUCKET_NAME} --cors-configuration file://./localstack-cors-config.json
For more, you can check the LocalStack docs: https://docs.localstack.cloud/user-guide/aws/s3/#configuring-cross-origin-resource-sharing-on-s3
@swamichandra commented on GitHub (Sep 16, 2023):
@jigneshsolanki I'm running the backend on Render, not LocalStack.
@sourabhdesai commented on GitHub (Sep 16, 2023):
@swamichandra Most likely what you're seeing is that your frontend is trying to access the server hosting your PDF files (whether that's the S3 bucket configured in website mode or a CloudFront CDN), but that service doesn't have CORS configured to allow access from your frontend.
Depending on how you're serving your PDFs to the frontend, either of the above articles may be helpful.
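For the S3 case, the point is that BACKEND_CORS_ORIGINS only governs responses from the backend service; the PDFs are served by S3, which has its own CORS configuration. The sketch below builds such a configuration and applies it with boto3's put_bucket_cors, the SDK counterpart of the awslocal s3api put-bucket-cors command above. The bucket name and frontend origins are placeholders. If a CloudFront distribution sits in front of the bucket, it also has to pass the CORS headers through (for example via a response headers policy); that part is not shown.

```python
from typing import Any, Dict, List


def pdf_cors_configuration(frontend_origins: List[str]) -> Dict[str, Any]:
    """S3 CORS rules letting the listed frontends fetch PDFs from the bucket."""
    return {
        "CORSRules": [
            {
                # Origins must match the browser's Origin header exactly,
                # so trailing slashes are stripped.
                "AllowedOrigins": [o.rstrip("/") for o in frontend_origins],
                "AllowedMethods": ["GET", "HEAD"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }


def apply_pdf_cors(bucket_name: str, frontend_origins: List[str]) -> None:
    # Requires boto3 and credentials allowed to call s3:PutBucketCORS.
    import boto3

    boto3.client("s3").put_bucket_cors(
        Bucket=bucket_name,
        CORSConfiguration=pdf_cors_configuration(frontend_origins),
    )


# Usage (hypothetical): apply_pdf_cors("<your-asset-bucket>", ["https://<your-app>.vercel.app"])
```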
@thunderwilson commented on GitHub (Sep 18, 2023):
Very informative thread. I'm sure I'm not alone in saying I would love a full deployment tutorial! Great repo!
@sourabhdesai commented on GitHub (Oct 1, 2023):
Closing, as there don't seem to be any further questions.