Random Development Things
Things that I find interesting.
So whenever I develop things, I come across interesting scenarios that I think are worth sharing. Usually, I just post bug fixes and the like on my blog, but heck, I want to be able to continuously update it with Notepad (instead of in a browser).
Now the first is about open source software. I'll make it very plain and clear: I have no use for publishing source code online. Usually, there's an intent behind contributing to open source software, and I have done it before, but the world is not as "open source" as you may like it to be. Usually, there's some reason for open source. With licenses such as the GPL, it's about making EVERYTHING open source - like a Midas touch, though instead of turning everything into gold, the GPL turns everything into open source.
There are some very popular applications that aren't open source, and it really makes me rethink publishing code freely online. Minecraft, Google, Facebook, Sublime Text: these are all services, applications and games that are NOT open source.
Furthermore, I see my code as stable for at least a year for each project (and in fact, I have not changed the majority of the applications on this site for years and they still work the same). After a year is the point where there would be another project or service that covers whatever I have been doing. So my code isn't really worth open sourcing.
I also think that the open source community tends to be hypocritical when it comes to individuals. See here: http://blog.extramaster.net/2014/05/open-source-people.html
The stuff on this site
Please note that I create these tools for my own personal use only. I usually have to download YouTube videos due to slow internet connections, and SlideShare presentations that involve public data. The YouTube age bypass extension is something I use because I'm not 18 yet, and because the content on YouTube is extremely tame compared to sites like LiveLeak and Australian news programs. I just happen to have published them online for my friends to use, such as with the "Happy Street" hack.
Anything online that I offer should be considered something that I use personally because, well, I do use it myself. In fact, I use the YouTube downloader daily to download EthosLab videos. The "Happy Street" hack was more of a proof of concept, because of the incompetent team at Gozillabs, though they got their act together after 10 months ;)
Python vs PyPy
I don't usually reveal my projects while they're at a conceptual level, only once they're in actual development. Right now, data mining is something that I'm attempting. And I just wanted to say that with StackOverflow's 28.5GB data dump, PyPy is 2x faster than CPython (Python) in terms of sequential data processing.
For some reason, reading files from disk is faster in PyPy than in CPython, which I thought was interesting to share. I thought the speedup only applied to for loops and other data structures. Processing files should be limited by the operating system and the hard disk speed, but apparently it's not.
PyPy is so much better than Python.
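If you want to check a claim like this on your own machine, a rough sketch of a timing harness could look like the one below. The function name `time_line_scan` is made up for illustration, and this is not the exact script behind the numbers above; it only times a plain sequential read so the same file can be run under both interpreters.

```python
import time


def time_line_scan(path):
    """Read a file line by line and return (line_count, elapsed_seconds)."""
    start = time.perf_counter()
    line_count = 0
    # Binary mode avoids text decoding, so mostly the read loop is measured.
    with open(path, "rb") as handle:
        for _ in handle:
            line_count += 1
    return line_count, time.perf_counter() - start
```

Calling `time_line_scan` on the same file once under CPython and once under PyPy gives two timings to compare. Run it a few times each, since the operating system's file cache can make the second run look faster than the first.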
Also, going through millions of lines of XML data with Python takes hours and hours on end. You'd expect 28.5GB to not mean much, but then you may run into memory issues, and line-by-line processing is not the best idea. To top this scenario off, the XML dump is not a relational database, hence question/answer pairings must be manually processed.
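As a minimal sketch of how the pairing could be done without loading the whole file, here is a streaming pass with `xml.etree.ElementTree.iterparse`. It assumes the dump stores each post as a `<row>` element with `Id`, `PostTypeId` and `ParentId` attributes, where `PostTypeId="2"` marks an answer; treat those names as assumptions and check them against your copy of the dump. The function name `pair_answers` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def pair_answers(source):
    """Stream <row> elements and group answer ids under their question id."""
    answers_by_question = defaultdict(list)
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            # Assumption: PostTypeId "2" is an answer, ParentId is its question.
            if elem.get("PostTypeId") == "2":
                answers_by_question[elem.get("ParentId")].append(elem.get("Id"))
            root.clear()  # drop rows already handled so memory stays bounded
    return dict(answers_by_question)
```

`source` can be a file path or an open binary file. The parser only keeps a handful of rows alive at a time, though the dictionary of pairings still grows with the number of answers, so for the full dump it may be worth writing the pairs out to disk or a database instead.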
Using byte-code loops to parse the data is faster than doing a line-by-line iteration, by a large factor.
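The exact loop isn't shown here, so the following is only one possible reading of the idea: read the file in large binary chunks and let built-in bytes methods do the scanning, instead of iterating over lines in Python. The function name `count_lines_chunked` and the 1 MiB chunk size are illustrative assumptions.

```python
def count_lines_chunked(path, chunk_size=1024 * 1024):
    """Count newline bytes by reading fixed-size binary chunks."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            # A newline is a single byte, so it can never be split across
            # two chunks and the per-chunk counts add up exactly.
            total += chunk.count(b"\n")
    return total
```

Searching for longer markers this way needs extra care, because a multi-byte pattern can straddle a chunk boundary.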