[Archive] 10 ways to make boring articles fun!
Editor’s Note: I wrote this on 27 October 2016, when I was 16, on a blog I shared with a friend, so please excuse my immaturity. You can find the original post here. It’s not really interesting; it was my first time exploring Markov chains, which are somewhat similar to the next-word suggestions on your phone’s keyboard.
But the sentences and words a Markov chain generates are usually nonsensical, since it’s a very “dumb” mathematical model.
Nowadays, I’d probably use some form of machine learning instead to try to generate “funny sentences”. The subreddit I mentioned, SubredditSimulator, later had a “sequel” sub called SubredditSimulatorGPT2 that used GPT-2 to generate comments that somewhat make sense. At the time of writing there is GPT-4, and ChatGPT blew up last year; it can easily produce much better parody sentences. I’m not aware of any subreddit simulator that uses GPT-4, though, perhaps because it’s deemed not worth the processing power, which is a shame.
While browsing Reddit, I found an interesting subreddit called SubredditSimulator. It consisted of bots that simulated redditors using data collected from various subreddits. The bots were powered by Markov chains, along with some other artificial intelligence algorithms for determining the topic. Naturally, I had to try to implement this myself for fun.
Oh yeah, the title is clickbait by the way; you won’t be reading boring articles. You’ll be making fun of them. Mocking them. Patronising them. Ha ha. Silly lil boring articles. (Don’t you dare run the finished program on this article. This article is fun.)
SubredditSimulator is essentially a smart parody generator.
“Parody generators are computer programs which generate text that is syntactically correct, but usually meaningless…”
What I created is closer to what is known as Dissociated Press.
“Dissociated press is a parody generator (a computer program that generates nonsensical text). The generated text is based on another text using the Markov chain technique.”
Additionally, for testing purposes, I decided to make it so that my program could also generate random names based on the same technique.
It is also interesting to note that this could possibly be used to generate plausible sounding text that could bypass spam filters.
To start with, I’d need a few things.
- A database of names OR an article
- A C++ compiler (C++ being my language of choice)
My compiler was MSVC (Microsoft Visual C++). I got a database of common female American names from Wikipedia, here. As for the article, I used one about accounting from the New York Times website, published in July 2016.
PS: Don’t click the link. If you click the link they’ll be able to track me down. Copy and paste it, but DO NOT CLICK the link.
http://www.nytimes.com/2016/07/24/business/a-profit-bump-for-companies-and-tax-transparency-for-investors.html?rref=collection%2Ftimestopic%2FAccounting%20and%20Accountants&action=click&contentCollection=timestopics&region=stream&module=stream_unit&version=latest&contentPlacement=1&pgtype=collection&_r=0
Extremely boring, right? (So sorry, NYT editor, if you’re reading this D: Your articles are way better than mine.)
Technical stuff
Okay, okay, I know it’ll be boring to some of you, but there’s bound to be that one nerd who’s into this kind of thing.
According to Wikipedia, “a process satisfies the Markov property if one can make predictions for the future of the process based solely on its present state”. What does this mean? Basically, a process has the Markov property if your best ‘guess’ of the next state depends only on the current state, not on the states that came before it.
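In symbols, writing X_n for the state at step n (for a name generator, the current letter), the standard statement of the property is that conditioning on the whole history gives the same prediction as conditioning on the present state alone:

$$
P(X_{n+1} = x \mid X_n = x_n, X_{n-1} = x_{n-1}, \dots, X_0 = x_0) = P(X_{n+1} = x \mid X_n = x_n)
$$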
Let’s say that you have a list of names:
- James
- John
- Joseph
By looking at the data, we can see that the letter ‘J’ is followed by ‘a’ once and by ‘o’ twice. So there is a 2 in 3 chance of ‘J’ being followed by ‘o’ and a 1 in 3 chance of it being followed by ‘a’. The process then repeats for every other letter (‘a’ is followed by ‘m’ 1 out of 1 times, so there is a 100% chance that it is followed by ‘m’, and so on).
Current | Next | Occurrences | Chance
J | a | 1 | 1/3 ≈ 33.33%
J | o | 2 | 2/3 ≈ 66.67%
Sum | | 3 | 100%

Current | Next | Occurrences | Chance
a | m | 1 | 1/1 = 100%
Sum | | 1 | 100%

...
While not perfect, this approach could generate plausible-sounding names.
Results
Here are some of my favourite unique names my program spewed out after eating up the common female names from Wikipedia.
- Maletra
- Parue
- Tictinnda
- Sucticicalerilliamintindonely
- Mintiy
- Chillla
- Isa
- Lula
- Andia
- Zophlamma
- Eminarty
- Dinaelie
And here are some sentences after processing the NYT article. Honestly, I can’t even tell the difference.
- Utilities granted the options were Companies’ tax benefit.
- So to light by the world of being 40 percent, it anticipated; this tax rate may be under the balance sheet," he said.
- The shift makes the costs and it has changed.
- “Some companies now pay.”
- “Some companies are going to light by the fewest options,” Equilar said.
- Then you don’t know about this proposal is what the greatest impact at far below their earnings.
- Employee stock options at a company’s net profits and are therefore rosier than they would add 20 cents a company’s balance sheet," he said.
- According to deduct from its financial statements made it in the company actually pays.
- It simply shifts the change will give investors a company’s financial filing indicates that is dull?
Well, that was a fun weekend project. If you’re wondering where the program is, my code is crap and I’m too scared of being bullied by the elite coding gurus of the web for my bad practices. If I release my program without source code, then the elite coding gurus will bully me into releasing it. Maybe I’ll send it for a code review and then release it or maybe I’ll release it when I’m not feeling like a chicken. Peace out.
PS: If you liked my post, please leave me a comment, or share this post, or recommend it to Google. I would really really appreciate it! :)
And if you didn’t like it… send me a comment! I will definitely listen to your feedback! :)