The need

  I don't know about you, but I've been living with ad blockers from the moment they arrived. Occasionally I get access to a new machine and experience the Internet without an ad blocker and I can't believe how bad it is. A long time ago I had a job where I was going by bike. After two years of not using public transport, I got in a bus and had to get out immediately. It smelled so bad! How had I ever used that before? It's the same thing.

  However, most of the companies we take for granted as pillars of the web need to bombard us with ads and generally push or skew our perceptions in order to make money, so they find ways to obfuscate their web sites, lock them in "apps" that you have no control of or plain manipulate the design of the things we use to write code so that it makes this more difficult.

Continuous war

  Let me give you an example of this arms race. You open your favorite web site and there it is, a garish, blinking, offending, annoying ad that is completely useless. Luckily, you have an ad blocker so you select the ad, see that it's enclosed in a div with class "annoyingAd" and you make a blocking rule for it. Your web site is clean again. But the site owner realizes people are not clicking on the ad anymore, so he dynamically changes the class name to something different every time you open the page. Now, you could try to decipher the JavaScript code that populates the div and write a script to get the correct class, but it would only work for this web site and you would have to know how to code in JavaScript and it would take a lot of effort you don't want to spend. But then you realize that above the horrid thing there is a title "Annoying ad you can't get rid of", so you write a simple thing to get rid of the div that contains a span with that content. Yay!

  At this point you already have some issues. The normal way people block ads is to create a quasi CSS rule for an element. Yet CSS doesn't easily let's you select elements based on the inner text or to select parents of elements with certain characteristics. In part it's a question of performance, but at the same time there is a matter of people who want to obfuscate your web site taking part in the decision process of what should go in CSS. So here, to get the element with a certain content we had to use something that expands normal CSS, like the jQuery syntax or some extra JavaScript. This is, needless to say, already applicable to a low number of people. But suspend your disbelief for a moment.

  Maybe your ad blocker is providing you with custom rules that you can make based on content, or you write a little script or even the ad blocker people write the code for you. So the site owner catches up and he does something: instead of having a span with the title, he puts many little spans, containing just a few letters, some of them hidden visually and filled with garbage, others visible. The title is now something like "Ann"+"xxx"+"oying"+"xxx"+" ad", where all "xxx" texts appear as part of the domain object model (the page's DOM) but they are somehow not visible to the naked eye. Now the inner text of the container is "Annxxxoyingxxx ad", with random letters instead of xxx. Beat that!

  And so it goes. You need to spend knowledge and effort to escalate this war that you might not even win. Facebook is the king of obfuscation, where even the items shared by people are mixed and remixed so that you cannot select them. So what's the solution?

Solution

  At first I wanted to go in the same direction, fight the same war. Let's create a tool that deobfuscates the DOM! Maybe using AI! Something that would, at the end, give me the simplest DOM possible that would create the visual output of the current page and, when I change one element in this simple DOM, it would apply the changes to the corresponding obfuscated DOM. And that IS a solution, if not THE solution, but it is incredibly hard to implement.

  There is another option, though, something that would progressively enhance the existing DOM with information that one could use in a CSS rule. Imagine a small script that, added to any page, would add attributes to elements like this: visibleText="Annoying ad" containingText="Annxxxoingxxx ad" innerText="" positionInPage="78%,30%-middle-right" positionInViewport="78%,5%-top-right". Now you can use a CSS rule for it, because CSS has syntax for attributes equal to, containing, starting or ending with something. This would somewhat slow the page, but not terribly so. One can use it as a one shot (no matter how long it takes, it only runs once) or continuous (where every time an element changes, it would recreate the attributes in it and its parents).

Feedback

  Now, I have not begun development on this yet, I've just had this idea of a domExplainer library that I could make available for everybody. I have to test how it works on difficult web sites like Facebook and try it as a general option in my browser. But I would really appreciate feedback first. What do you think? What else would you add to (or remove from) it? What else would you use it for?

Comments

Be the first to post a comment

Post a comment