Nice work. I was going to basically cover the same topics in my second part but it looks like you beat me to it. If you'd like to collaborate with me on the next portion feel free to contact me on Discord (veritas#0001) or email (f@nullpt.rs)
Hmm, can you clarify how you managed to send the request in your original article? I copied the request as cURL from the network tab and tried to send it from curl. However, there was no response. Are they using fingerprints or cookie shenanigans?
I think they are just reusing technologies commonly used in China on the TikTok website so they don't have to write them twice. In China many frontend developers write code for WeChat mini programs. A WeChat mini program is just a typical HTML website with extra APIs and the restriction that code has to be uploaded for review before it is deployed, and `eval` (and anything else that allows executing a string as script directly) is disallowed in order to prevent sneaking code past review. Many developers developed their own JavaScript VM and interpreter to bypass that restriction, and that same code happens to be reused on all HTML-based platforms even outside WeChat.
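The trick is easier to see with a toy example. Below is a minimal sketch (entirely hypothetical, with invented opcodes) of the pattern such mini-program VMs rely on: code is compiled ahead of time into plain data (a bytecode array), and a hand-written loop executes it at runtime, so no string ever reaches `eval`:

```javascript
// Invented opcodes for illustration only — not any real mini-program VM.
const PUSH = 0, ADD = 1, MUL = 2, PRINT = 3;

function run(bytecode) {
  const stack = [];
  let pc = 0; // program counter
  while (pc < bytecode.length) {
    const op = bytecode[pc++];
    switch (op) {
      case PUSH:  stack.push(bytecode[pc++]); break;          // push literal operand
      case ADD:   stack.push(stack.pop() + stack.pop()); break;
      case MUL:   stack.push(stack.pop() * stack.pop()); break;
      case PRINT: console.log(stack.pop()); break;
    }
  }
  return stack;
}

// (2 + 3) * 4 — the "program" is just an array of numbers, not source text,
// so a static reviewer scanning for eval()/Function() finds nothing.
run([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT]); // logs 20
```

From a reviewer's perspective the bytecode is an opaque data blob, which is exactly why this sidesteps a ban on string evaluation.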
So they won't let you do eval but don't mind a whole VM? This doesn't sound right, unless eval touches something deeper and might pose a security risk, while the VM runs inside the normal sandbox. Can you elaborate a bit, or hint at further resources? It sounds fascinating.
I’m fairly sure that it is impossible to tell from arbitrary code whether it is an interpreter or not. This is just something inherent in Turing machines: they can emulate other Turing machines.
There was a Part 1 of this post I believe (it could have been by a different author). It revealed that TikTok sends fingerprinting data about your device during every GET request, under an unassuming header, encrypted to look like a long random string.
Is this not standard practice for sites/apps like these? Seems a little overboard, user agent fingerprinting is hardly a secret, and (unfortunately) not considered a dark pattern.
You'd be surprised to see China's underground content farms. They went to great lengths to automate content generation by decrypting and exploiting apps' private APIs.
Wasn’t there also an article on here that showed the sub farms: basically tons of (usually) girls on a factory floor with cubicles, streaming for some coin. Not sure if it was China specific.
And specifically, the idea is to make sure that any kind of effort at writing a bots-and-fraudulent-likes-as-a-service platform is such a moving target that it becomes uneconomical to maintain. It's never perfect, as efforts like the OP's attest, but the incremental advertiser trust (and brand trust from a valuation perspective, as we've seen with Twitter's bot problems!) from being ahead in this "arms race" is likely considered worth the cost of hiring obfuscating-compiler engineers.
Seems like the same approach as using a different kind of lock on your door: the groups with resources will simply already have the tools they need to get through it and it only really stops people who aren’t at that level yet
There's not a huge performance hit, there's a portion that runs on page load and a smaller portion that runs on each HTTP request (which isn't too often).
Bots just run copies of Chrome these days, you're not protecting anyone with this obfuscation if the bots you're targeting can just run your code and automate the UI.
With the obfuscated fingerprinting demonstrated by the virtualized code, I reckon they're doing something more malicious.
At first I thought this looked kinda scary, but it makes more sense as bot prevention, and obfuscating as much as possible to hinder manipulation of data. Consider services that manipulate view stats or followers etc. for payment. This ruins the revenue models of TikTok, Instagram, and Reddit alike.
Yes, but that's not the only thing running under the VM: the entire user client is in there. This is not proof that they're evil; in fact it's identical to an anti-botting technique that was just on the front page yesterday [1].
There's JS minification - renaming variables to single letters, eliminating dead code paths, minimising whitespace, etc - that's exceptionally common and will be the default for most frontend workflows.
That is not really 'Obfuscation', at least to the degree that TikTok is doing.
I think the gap between "minimizing" and (un)intentionally "obfuscating" comes down to one thing: sourcemaps. They're usually available from all the common tools (Webpack, terser, ...), and they're the intermediate solution for preserving readability.
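To make that concrete, here's a sketch of the relevant webpack settings (option names per webpack's documented config; exact behavior varies by version): the bundle ships minified, but a source map is emitted so debugging maps back to the original source.

```javascript
// webpack.config.js — sketch: minified production output, but still
// debuggable, because a .map file is emitted alongside the bundle.
module.exports = {
  mode: 'production',     // enables terser-based minification by default
  devtool: 'source-map',  // emits e.g. bundle.js.map next to bundle.js
};
```

A site that minifies but publishes (or at least retains) the source maps is plainly minimizing, not obfuscating.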
Not really. gzip doesn't know that it can eliminate enormous swaths of unused code paths. I've seen enormous code bases drop from multi-megabyte downloads (even using gzip compression) into mere kilobytes from minification strategies.
It's because minification is far more intelligent than it's made out to be. Your code may pull in a dependency like lodash for a single function in a number of places. gzip would notice the duplication and reduce the total amount of data sent over the wire by compressing lodash, but minification would notice that you're only using a single function, and only that function would end up being included in the minified source (and similarly de-duplicated).
At least in webpack, tree shaking and minimization are distinct, but BOTH include dead code elimination.
For webpack, tree shaking is the removal of declarations of exported values that are never imported. Further, imported values that are only used in such removed exports can also be removed (although removing the last import from a file entirely requires knowing the imported file has no side effects).
Separately from that, the minimizer (terser) has its own dead code elimination process that performs more in-depth analysis. Terser cannot do the tree shaking part itself because of how webpack structures modules it was unable to concatenate, and because code splitting might make an export live in a different file. But if webpack can eliminate unused exports and ensure that split files only access each other via exports, then terser can handle the remaining dead code elimination within each module.
> Imagine someone did that in a codebase that you had to take over !
This is typically done at build time, so there's no or little impact to developers writing code.
Minification is the frontend equivalent of compiling code. We typically don't think of building a Go program as 'obfuscating' it when compiling it down to machine code, even though the resulting artefact is harder to read than the original source code.
So basically, you need to figure out the encryption key protected by the VM in order to decrypt the request and response messages that you can observe through the operating system? Is that correct?
At the business / management level, who decides to green light a project like this and what are their reasons? Is it regulatory obscurity or denying the competition easy understanding of the code?
> Using these accessors, the VMs become able to do anything that JS can do.
In fact the source language is likely to be JS itself; a JS-to-some-sort-of-vm-bytecode-to-JS compiler is made. I know that Tencent has a similar VM; an interesting aspect of that VM is that the instruction set is dependent on the code being compiled (and the opcodes are dynamically generated and shuffled when compiling), so unused instructions are not generated.
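That per-build opcode shuffling can be sketched in a few lines (hypothetical; not Tencent's actual scheme): each "compilation" draws fresh opcode numbers for only the operations the program uses, so the same program from two builds yields mutually unintelligible bytecode.

```javascript
// Sketch: a VM factory that assigns random opcode numbers per build.
// `ops` lists only the operations this program needs — unused ones
// simply never get an opcode or a handler.
function makeVM(ops /* e.g. ['PUSH', 'ADD'] */) {
  // Naive shuffle (biased, but fine for a sketch) of opcode numbers.
  const numbers = [...ops.keys()].sort(() => Math.random() - 0.5);
  const opcode = Object.fromEntries(ops.map((name, i) => [name, numbers[i]]));

  const handlers = {
    PUSH: (st, code, ctx) => st.push(code[ctx.pc++]), // push literal operand
    ADD:  (st) => st.push(st.pop() + st.pop()),
  };

  function run(code) {
    const st = [], ctx = { pc: 0 };
    const byNumber = Object.fromEntries(ops.map(n => [opcode[n], handlers[n]]));
    while (ctx.pc < code.length) byNumber[code[ctx.pc++]](st, code, ctx);
    return st;
  }
  return { opcode, run };
}

// Two "builds" of the same VM: same semantics, different numbering.
const a = makeVM(['PUSH', 'ADD']);
const b = makeVM(['PUSH', 'ADD']);
const progA = [a.opcode.PUSH, 1, a.opcode.PUSH, 2, a.opcode.ADD];
console.log(a.run(progA)); // stack is [3] — but progA may be gibberish to b.run
```

A reverse engineer's opcode table from one build is useless against the next, which is the point.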
I, too, was disappointed that this was not a continuation of https://www.nullpt.rs/reverse-engineering-tiktok-vm-1, but as someone who could really use a way to interface with TikTok's API (for legitimate non-bot reasons that allow users to interface with TikTok differently), I'm all for more eyes on this problem.
Interesting! I wonder, would this virtual machine be hand crafted or auto generated? When I look at obfuscated code like this, I always wonder if the code authors couldn't run their generator a second time and come up with a completely new format that existing reverse engineering efforts wouldn't work with?
I don't know what they're trying to obfuscate but it must be worth hiding to allow such inefficient javascript to run on clients around the world. I can't think of any non malicious reason to develop such a system for a website about silly videos.
> I can't think of any non malicious reason to develop such a system for a website about silly videos.
Let me give you a few examples of cases where obfuscated or inefficient code exists:
1. reCAPTCHA is "obfuscated", and deliberately so difficult that computers aren't meant to be able to solve it. It is used to rate-limit login attempts to secure people's accounts, among many other uses.
2. Cloudflare runs some obfuscated "are you a human" javascript to fingerprint you for the purpose of DDoS protection. This one is arguably less noble, but still not obviously malicious.
3. Games obfuscate code all the time to make cheats harder to develop and maintain
4. Youtube, netflix, etc have obfuscated code (DRM code) because legally they have to make an attempt to protect copyrighted works from being downloaded etc, or else they lose access to said copyrighted works. Tiktok also allows using copyrighted music I'll point out.
Depending on your viewpoint, perhaps all of those are also malicious, but I think many people view them as non-malicious, though perhaps dumb.
That said, I personally wouldn't give tiktok the benefit of the doubt here. All the other large social media companies maliciously capture as much data as they can about me in order to sell it to brainwashing companies (ahem, advertisers), to the point where I think it should be illegal, so I don't really expect tiktok to be any different.
I'd consider all of these to be "dual use" technologies, in a way; I appreciate recaptcha's functionality, but dislike that the data is being collected by data sponges like Google. Cloudflare has a better track record but they also have much more control, so I'm split on their use of this tech.
I accept DRM for online games where one modified client can ruin the fun for everyone else, but this isn't a multiplayer game. It's a video player with comments.
As for DRM, that's just plain malicious in my eyes. I need to go through the effort of pirating Netflix content I pay for because my OS isn't on the magical whitelist. It's not as if DRM is preventing piracy either, because pirated content in 1080p is available the day a show goes live on Netflix, sometimes even earlier because of bullshit international contracts. There's no legal requirement for their DRM either; that's all part of the contracts they sign with media companies, who are just as responsible.
I don't think Tiktok cares about copyright law in the slightest, as far as I can tell they just started using copyrighted music and dealt with license deals after the fact. Bytedance knows the MPAA isn't going to get the app banned, not without running the risk of American artists losing the lucrative Chinese market, so I very much doubt that they'll use this for any kind of DRM.
Perhaps this is just an in house recaptcha alternative or a way to block youtube-dl, but so far I haven't seen this stop redistribution of the videos on their platform in any meaningful way. Tiktok videos get downloaded all the time and even youtube-dl has support built in these days so I don't see what it's trying to accomplish.
> but so far I haven't seen this stop redistribution of the videos on their platform in any meaningful way. Tiktok videos get downloaded all the time and even youtube-dl has support built in these days so I don't see what it's trying to accomplish.
The very rare times I've had to watch a video on there, I didn't even need to run their JS. The URL was just in the source of the page. I don't think they were trying to stop that, given that they clearly could've tried to.
I have a very short JS whitelist, and it's not getting lengthened just to watch a video. I'd sooner extract the URL and stick that in my own media player.
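Pulling the URL out of the page source can be sketched like this (hypothetical: the regex and the assumption that a direct `.mp4` URL appears in the markup are illustrative only, and real markup changes frequently):

```javascript
// Extract the first direct .mp4 URL embedded in a page's HTML.
// No JS execution needed — just pattern-match the served markup.
function extractVideoUrl(html) {
  const match = html.match(/https:\/\/[^"'\s\\]+\.mp4[^"'\s\\]*/);
  return match ? match[0] : null;
}

// Fetch a page and hand the URL to your own player (Node 18+ has global fetch).
async function findVideoUrl(pageUrl) {
  const res = await fetch(pageUrl);
  return extractVideoUrl(await res.text());
}

// e.g. findVideoUrl('https://example.com/some-video-page').then(console.log);
```

The resulting URL can then be fed straight to mpv/VLC, keeping the JS whitelist untouched.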
Typically the obfuscation is done with a special compiler, as that makes it easy to add or alter obfuscation layers. It also allows serving different obfuscated blobs to different clients, to slow down reverse engineering even further.
>I wonder, would this virtual machine be hand crafted or auto generated?
Every proper obfuscator with virtualization should be auto-generated. Of course, when I reverse it, my scripts are also automated as much as they can be.
A lot of it is probably to prevent automated wholesale ripping of content for reuse on other social media platforms, and any related automated data mining.
And then we can use a small piece of software to view TikTok videos, without the need for one of the Vanguard/BlackRock-financed, absurdly and grotesquely massive and complex web engines (Blink/Gecko/WebKit).
As a thought exercise
Botter: *attempts to spin up a phone VM for botting* Okay TikTok, start up.
TikTok: What hardware am I on?
Botter: *inserts common phone hardware here*
TikTok: Okay, then this should work: *hyper firmware/bytecode-specific virtualization commands and syscalls*
(segfault)
I skimmed the original post but still don't really grasp how these VMs are deployed. I assume they are running server-side (client-side would be a violation of App Store policies at least, right)? Is a dedicated VM spun up for each user session? Or just a sandbox in which various services run?
As others have already touched on, it’s rather disingenuous to pretend that you are “picking up where he left off” after copying his title and labeling your article as part 2, no? Especially considering the first author made it very clear he intended to publish further work on the topic. Further, this article reads a lot like an attempt to beat him to the punch and to hijack his series. But correct me if I’m wrong?
You're right that copying their title exactly was probably not the right thing to do, and may cause confusion. I'm not trying to hijack anything, I think there's enough room for both of us to research this at the same time.
Thanks for your response; seems reasonable to me. That being said, this post is at best impolite, and at worst intentionally "scooping" another reverse engineer. Seems like you guys are chatting now though, so I'll leave you to figure it out.
It's sad how much human effort is spent in creating obfuscation schemes, and then in analyzing them. Or creating proprietary things and then reverse engineering them.
Sometimes I wonder if we could just make everything open. No obfuscation, no captchas, just a neat API for everything. Of course that wouldn't work ceteris paribus (everything else unchanged), due to bad actors, spam, or just competitors who want to take your work. But if you'd change the incentives, make society non-adversarial and non-profit-oriented, then all that gating and obfuscation would become unnecessary.
You could say the same thing about locks, police, military. Those are way more wasteful and also just exist due to bad actors. I don't think there's a way around this, it'd be like wishing for no predators in nature so that natural evolution could just do the "good parts" and not have to optimise for survival.
"police, military. Those are way more wasteful and also just exist due to bad actors"
I agree with your main point, but I would like to point out that the definition of "bad actor" can differ, including "they are bad because they have something we want, so we need to deploy our military against them"
I often wonder the same thing. A similar aim was expressed in at least one of RMS' essays.
The other comments here remind me of how pessimistic and lacking in imagination the tech crowd tends to be.
Because earlier attempts failed, does that make an endeavor "impossible"/"naive"?
People cite "human nature", but forget that there exist and have existed systems of living, radically different from what they are used to, which gave rise to completely different "human nature" - or perhaps merely gave it a healthier environment, so it manifested very differently. (This "problem" is frequently refuted in anarchist texts.)
I'll leave some links which I think are relevant to this quest. I hope they provide some answers, or at least provoke further searching.
I hope we find a way. And for those looking and trying to move beyond the status quo, I supply a maxim to bear in mind when facing the unimaginative and the complacent - "It's always impossible until it's done."
> But if you'd change the incentives - make society non-adversarial,
No. It is far better to recognize that there is at the moment no simple way we can rid the world of bad actors, i.e. not to naïvely hope that some rather simple technical adjustment would somehow make society non-adversarial. Having fallen into that trap at least once, I for one have come to realize the profound wisdom that the founding fathers possessed: the best we can do at this point, and in the foreseeable future, is to create checks and balances.
Which in this case means making it difficult for bots and other bad actors.
I agree that there is no simple technical solution, but I would argue that we need a social breakthrough rather than technological advancement. Our economic and political system has been roughly the same for the last two or three centuries: nation states, representative democracy, an economy based on wage labor. And although not every country is democratic, some variation of this system has spread almost everywhere.
In the meanwhile we had incredible technological advancements. Electrification, modern chemistry, computers, the internet. What was the last comparable political and social disruption? I'd say the civil rights movement (1960s), or women's suffrage (1920s) were very important. But apart from that, we are mostly running the same social OS since the industrial revolution and the birth of modern nation states.
The ideas of the founding fathers of the US and other thinkers of that era were great in their time - republicanism and a bourgeois revolution - compared to the absolutism that came before. But where are the new ideas of this caliber? What is going to be the disruptive new 'tech' in political theory?
(I realize it seems silly to jump from TikTok to systemic criticism, but I think in the fringes of the tech sector is a really good place where you can see fast moving technology chafing against fossilized society. And a lot of people here have the desire to make the world a better place - my point is that the way to go is not to invent radical new technology, but maybe by inventing new political ideas.)
Ok, this got deep quickly, but fine by me. I agree there are plenty of things to criticize in our present systems, be they democratic or autocratic. I mention "autocratic" only to show that one can't easily dispel that latter form of government yet (on a global scale it is verily on the rise again). Wage labor economy: yes; true representative democracy: no. That ideal only exists to some degree in a relative few countries that have no beef with the current sole superpower.
As for disruptions in social/technological matters, there is no denying that we live in interesting times, but as always it is perilous to ascertain if most relevant things are the same, or some things are the same, or no such things are the same as before (sameness is notoriously difficult to assess, at the extreme one might argue that all discernible things are ultimately if not categorically different).
We can however unequivocally state that the argument (that things are now categorically different) has been heard before, many times, and has always turned out to be largely false - self interested deceptions even. Recent history has the "new economy" around the turn of the century claiming that everyone would from now be rich because internet, even more recently the crypto currency claims of a new social order. Going back a few hundred years we had the emerging laissez faire economists claiming that a totally free economy without state interventions would lead to world peace, a claim parroted by the socialists (socialism would lead to peace). Later the women's rights movement made the same claim - if only women would be in charge then ...
Make what one will out of this, to me it is but one long list of falsehoods and half truths, and I've personally swiveled back to a position which is close to that of classical economy: humans are essentially (in a statistical mean sense) self interested, and will from that simple fact come into conflict with both other humans, the environment and the common good - and will skew everything in their perceived favor unless there are appropriate checks and balances. This is the ultimate factual circumstance that I see no sign of changing in the foreseeable future - through technological breakthroughs or anything else. We are still essentially in the age of the founding fathers in that regard.
As for peace, the only path that has proven to be successful is the ad hoc and "organic" creation of ever larger political units: neighboring villages that were at each others throats during prehistory made friends under the rule of warlords, warlords became friends under the rule of a feudal power, feudal powers became friends after the conquest by empires, empires morphed into global trade organizations that requires global legal frameworks to work efficiently. That in my mind is incidentally the breakthrough for global peace that one might hope for in the next few hundred years - if we can keep the belligerents on all sides in sufficient check while global inequalities in power and wealth are evened out through ever more trade and international cooperation.
But right now the odds seem to be that we blow everything up before that TBH.
Sadly we do not live in a post-scarcity world; competing for resources in an adversarial manner is the driving force behind most individuals, corporations, and, on a larger scale, nation states.
Also, in a slightly better light, people don't actually know what "best" is. Sure, there is a ton of ugly competition, but there is also some good forced exploration out of that to try alternatives which frequently can arrive at higher global maxima.
Note: this is not a continuation of the work by the same author. This is a new author who took the original and did further research. I think it's a bit disingenuous to call this by the same name, "part 2".
I think it's actually more truthful to admit that this is the continuation of an earlier post than it is to say that this is new work. This is just the continuation of the process the first post documented.
The relation between this article and the one it's based upon is clearly indicated in the first paragraph. I don't think this is disingenuous at all.
Yeah, but that was 2 years after the original author died. This is a month after the original publication.
I just feel it would have been more tasteful to choose a title different than the one that the original author is obviously going to use for their next blog post.
How could Apple properly review something like this? Isn't it one of Apple's selling pitches that they'd review each app for malicious activity before it makes it to the app store?
So, a tricky piece here is that this appears to be behavior of the TikTok web site. Obviously Apple makes no attempt (nor claim) to review the behavior of every web site accessible in Safari from an iPhone. And other native apps can embed WebKit-based web views into their apps.
The good news is that the scope of "malicious activity" is (at least in theory) much smaller when you constrain it to what web sites can do, as opposed to the scope of what can be done by executing ARM instructions and making syscalls.
The bad news is that the scope of "things web sites can do" keeps growing and is fingerprintable.
> How could Apple properly review something like this? Isn't it one of Apple's selling pitches that they'd review each app for malicious activity before it makes it to the app store?
They couldn't. Apple does not perform any meaningful review of apps for malicious activity; they do it for rent-seeking.
Yeah I'd also like to understand why they are doing this.
Everyone expects these sites to scrape as much personal information as possible (China did not invent that; they are following), but beyond that, any additional imagined state-run initiative would be server-side, right? What is worth hiding in the frontend beyond preventing people from reusing their code? (Which would be overkill to use a VM for, as light obfuscation would be enough.)
iOS requires user permission for each app to make local network requests (Settings -> Privacy & Security). So in theory this shouldn’t be possible, unless ByteDance came up with a way to bypass it that made it through Apple’s review process.
I imagine TikTok could get pretty far simply asking users for permission. A significant number of users would probably allow it. Fortunately watchdogs would probably get it into the news pretty quickly.
You could log into any OS X machine by just pressing enter at the login screen enough times, for over a decade. Apple has trash security and they've stopped even saying what their security updates fix.
1. That wasn't a bug for "over a decade," as far as I'm aware. It was only introduced with an OS update
2. I could be wrong here, but it required you to already have created a root user with a blank password, which isn't possible unless you know how to trigger another bug that does it.
It's just a reflection on how Apple's security can't be trusted, regardless of the possible vector. Big companies like Apple need vulnerabilities for deniable intelligence and national security access, but anybody can find and use these to achieve a breach.