Mosopefoluwa Ogunlaiye

A step-by-step guide to practical data analytics: The Splitspot project


During the Innovate for Africa Fellowship, we had a program called the "hard skills lab". There were different courses under this program, and I chose the data analytics course because personal research and learning had already sparked my love for the subject. After we had gone through hours of training materials, the program paired us with a real startup company for a data analysis project. I was paired with a real estate company in Boston called Splitspot. The task was to come up with a predictive model for house sales in the Boston area. The purpose of this project was to give us a taste of what a real-life data analysis task looks like and to test our practical knowledge of the subject.


This was my first attempt at any data analysis. Coming from an accounting and auditing background, I was told that I had the advantage of being familiar with numbers and spreadsheets. An analytical mind and the ability to probe further on issues by asking relevant questions were also qualities that gave me an advantage. A year ago, while searching for alternative and more fulfilling career paths, I learnt two programming languages, JavaScript and Python, in the context of web development, to better prepare me for the future. This turned out to be an advantage because Python is widely used in modern data analysis, and already knowing it made Python libraries like NumPy and pandas easy to understand.


However, having all these advantages didn't mean I was free from challenges. I soon realized that gathering relevant data sets is a daunting task, as there still aren't many data sets on some subjects. I also discovered that where data was readily available, it contained a lot of irrelevant information and sometimes wasn't in the right format. Thus I was introduced to data wrangling, which, simply put, means preparing data so that it is suitable for analysis. I finally stumbled upon a crime incident report from data.boston.gov, which was a good data set for this purpose as it recorded crime incidents by neighborhood, date, time and frequency of occurrence. This data set shed light on which neighborhoods were desirable due to low crime rates. It further shed light on the pricing of houses in the Boston area, as houses in locations with a high crime rate wouldn't cost as much as those in areas with few crime incidents. Another challenge I faced was with statistics. Even though I had a finance background, it wasn't the same as being a statistics major. Before I could do a proper analysis, I had to relearn things like linear regression, which I had first learnt during my university days.
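
To make the idea of data wrangling more concrete, here is a minimal sketch of the kind of cleanup described above, written with pandas. It is not the exact code from the project: the column names (`incident_id`, `neighborhood`, `occurred_on`, `offense`, `internal_note`) and the sample rows are made up for illustration, so the real crime data set will need its own column names substituted in.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the crime incident CSV.
# Column names and rows are invented for illustration only.
RAW_CSV = """incident_id,neighborhood,occurred_on,offense,internal_note
1,Roxbury,2019-01-03 14:20,Larceny,reviewed
2,Back Bay,2019-01-04 09:05,Vandalism,reviewed
3,Roxbury,not recorded,Larceny,reviewed
4,,2019-01-06 22:40,Assault,reviewed
5,Roxbury,2019-01-07 18:15,Vandalism,reviewed
"""


def wrangle(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy that keeps only what the analysis needs."""
    # Drop a column that carries no information for the analysis.
    df = raw.drop(columns=["internal_note"])
    # Turn date text into real timestamps; unparseable values become NaT.
    df["occurred_on"] = pd.to_datetime(df["occurred_on"], errors="coerce")
    # Remove rows that are missing a neighborhood or a usable date.
    df = df.dropna(subset=["neighborhood", "occurred_on"])
    return df.reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV))
clean = wrangle(raw)

# Number of remaining incidents recorded for each neighborhood.
incidents_per_neighborhood = clean["neighborhood"].value_counts()
```

With the real file, `pd.read_csv` would be pointed at the downloaded CSV instead of the in-memory sample, and the resulting counts per neighborhood are what allow neighborhoods to be compared by crime incidence.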


By this time, the data file (CSV) had already been loaded in Anaconda, the Python distribution I used as my analysis environment (I have always found it weird that this programming language and its environment are named after snakes, given my intense dislike of anything that slithers and hisses with a forked tongue). I used the Python libraries pandas and Matplotlib for data manipulation and visualization respectively.
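
For readers who have not used these two libraries together, the sketch below shows roughly how a CSV can be loaded with pandas and turned into a simple chart with Matplotlib. It is only an illustration under stated assumptions: the `neighborhood` and `offense` columns, the sample rows and the `plot_incident_counts` helper are hypothetical and are not taken from the project itself.

```python
import io

import matplotlib

# Off-screen backend so the script runs without a display.
# Remove this line when working in a Jupyter notebook.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Invented rows standing in for the real crime incident CSV file.
SAMPLE_CSV = """neighborhood,offense
Roxbury,Larceny
Back Bay,Vandalism
Roxbury,Vandalism
"""


def plot_incident_counts(df: pd.DataFrame):
    """Draw a horizontal bar chart of incidents per neighborhood."""
    # Count rows per neighborhood, smallest first so the longest bar is on top.
    counts = df["neighborhood"].value_counts().sort_values()
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.barh(counts.index, counts.values)
    ax.set_xlabel("Reported incidents")
    ax.set_title("Incidents per neighborhood (sample data)")
    fig.tight_layout()
    return ax


crimes = pd.read_csv(io.StringIO(SAMPLE_CSV))
ax = plot_incident_counts(crimes)
```

To see the chart, call `plt.show()` in a notebook, or save it from a script with `ax.figure.savefig(...)` and a file name of your choice.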


