Ahppee?
The internet has a lot of data on it. A lot a lot. Most of that data is made, formatted, and displayed for human consumption. But underneath, lots of websites have APIs - Application Programming Interfaces. Typically built using REST, these interfaces are made so that you can use data from these websites much more easily in code. In my case, I'm using the Twitter API to download a lot of Tweet data in a form I can easily manipulate, without having to program something that reads webpages like a human would.
Let's Talk about Rate Limiting
Many (most?) APIs I've used have had some kind of rate limit built in, mostly to stop any single user from hitting the API with so many requests that the underlying services get bogged down. The idea is that they are giving you a service - usually for free - and they need to be able to support that service without throwing a ton of money and hardware at it. In the case of Twitter, there are probably tens or hundreds of millions of requests against their API all the time, so for free versions, they limit how many requests you can make in 15 minutes.
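To make the "so many requests per 15 minutes" idea concrete, here's a minimal sketch of a fixed-window limiter. This is not how Twitter implements its limits; the class name, method name, and parameters are all made up for illustration, and real services often use fancier schemes.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` calls in each `window_seconds` window (hypothetical sketch)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock              # injectable so the limiter can be tested without sleeping
        self.window_start = clock()
        self.used = 0

    def try_acquire(self):
        """Return True if a request may go out now, False if the window is used up."""
        now = self.clock()
        # Once the current window has fully elapsed, start a fresh one.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False
```

Something like `FixedWindowLimiter(900, 15 * 60)` would mirror a 900-requests-per-15-minutes limit on the client side, so you can wait instead of getting rejected by the server.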
ugh ratelimits
When I was looking at the 4ish million Tweet IDs from my last post, I knew I needed to retrieve the tweet data for those IDs from the API. I looked at the Twitter API docs, and initially saw the GET /statuses/show/:id endpoint, which let me get the Tweet data for a single ID. The docs for this showed that I could make 900 requests every 15 minutes, or 3600 per hour. This bummed me out - there was no way I wanted to wait that long for my data, especially since I probably wouldn't ever catch up to real time.
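For a sense of how long "that long" is, here's a rough back-of-the-envelope sketch. It assumes exactly 4,000,000 IDs (the post only says "4ish million") and that the limit is used perfectly with no retries or failures; the function name is just for illustration.

```python
import math


def hours_to_fetch(total_ids, ids_per_request, requests_per_window=900, window_minutes=15):
    """Estimate the hours needed to fetch `total_ids` under a fixed per-window request limit."""
    requests_needed = math.ceil(total_ids / ids_per_request)
    windows_needed = requests_needed / requests_per_window
    return windows_needed * window_minutes / 60


# One Tweet per request, as with GET /statuses/show/:id
single_id_hours = hours_to_fetch(4_000_000, ids_per_request=1)
print(f"{single_id_hours:.0f} hours, or about {single_id_hours / 24:.0f} days")
```

Under those assumptions it comes out to roughly 1,100 hours, which is more than a month and a half of nonstop requests.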
Maybe Not
I was disheartened. So I spent a bit (like an hour) looking for other hacks and at other open-source projects, hoping there was a better way. Eventually, I found a post on Medium (which I've since lost...) that basically went over the 3600 figure I mentioned above, but pointed out another endpoint I had never thought about.
GET /statuses/lookup
The reason I never saw this is that (in my opinion) it's named a little poorly. The documentation says:
Returns fully-hydrated Tweet objects for up to 100 Tweets per request, as specified by comma-separated values passed to the id parameter.
That's what I want! So now I'm up to 360,000 Tweets an hour (3600 requests x 100 Tweets each). Which is way better.
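Here's a minimal sketch of the batching this implies: splitting a big list of IDs into comma-separated strings of up to 100, ready to pass as the id parameter. The helper name is hypothetical, it makes no network calls, and the only things taken from the docs quoted above are the 100-Tweet cap and the comma-separated format.

```python
def batch_id_params(tweet_ids, batch_size=100):
    """Split tweet IDs into comma-separated strings of at most `batch_size` IDs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(tweet_ids)
    return [
        ",".join(str(tweet_id) for tweet_id in ids[start:start + batch_size])
        for start in range(0, len(ids), batch_size)
    ]


# 250 made-up IDs become 3 requests: 100 + 100 + 50
params = batch_id_params(range(1, 251))
print(len(params))
```

For 4 million IDs that's 40,000 requests instead of 4 million. If the lookup endpoint allows the same 3600 requests an hour (which is what the 360,000 figure assumes), that's about 11 hours of fetching.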
what's the point?
Maybe actually read the docs more thoroughly than I do.