Going after my dreams!


How I decided to become a Data Scientist

Where should I look for jobs?

Today, after finishing my Capstone project, I am opening a totally new page for myself. I am starting my job search and will share the most useful links for a Data Science job hunt. I’ll walk you through the pros and cons of each job search engine so you can decide which ones YOU should use to get the most responses. My goal is to help you figure out where to post your resume online to get the best results.


How to choose your own training environment for Data Science

1. Jupyter Notebook

Jupyter Notebook is an open-source application that helps you create and share documents containing code, equations, visualizations, and text. It supports languages such as Python, R, and Julia. Its main uses are in computational physics and data analysis. Jupyter notebooks are focused on making work easier to understand and share.
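If you want to try it yourself, a typical way to install and launch Jupyter Notebook (assuming Python and pip are already installed) is:

$ pip install notebook

$ jupyter notebook

The second command starts a local server and opens the notebook dashboard in your browser, where you can create a new Python notebook.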


How to predict Hotel booking cancellations!

After going over the Hotel Booking Demand dataset in a Kaggle kernel, we see data for two types of hotels. One is a resort hotel and the other is a city hotel; both have more than 200 rooms and are classified as four-star hotels.
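Before any modeling, a quick first look at the data might go like this; a minimal sketch assuming the Kaggle file is named hotel_bookings.csv:

import pandas as pd

# Load the Hotel Booking Demand dataset (file name assumed).
df = pd.read_csv("hotel_bookings.csv")

# 'hotel' distinguishes the resort hotel from the city hotel,
# and 'is_canceled' is the target we want to predict.
print(df["hotel"].value_counts())
print(df["is_canceled"].mean())  # overall share of canceled bookings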


Folium for creating a heatmap

I was given a project to predict house prices and analyze the effect of location on them. I found Folium, which helps create a heatmap with the Choropleth() function showing the average price for each zip code. To obtain the data, I imported the necessary packages and loaded it.

Introduction to Folium

Folium is a powerful data visualization library in Python that was built primarily to help people visualize geospatial data. With Folium, we can create a map of any location in the world as long as its latitude and longitude values are known. The maps created by Folium are interactive, which means we can zoom in and out after the map is generated. Folium builds on the data wrangling strengths of Python: the data is manipulated in Python and then visualized on a Leaflet map via Folium.

Installation of Folium

Before using Folium, we need to install it on the system with either of the two commands below:

$ pip install folium

$ conda install -c conda-forge folium

My project is based on the King County dataset, and I used King County zip code data, which is open source. To get the average price per zip code, I calculated the mean price for each zip code. After that, I loaded the data and filtered the zip codes.

Now it is time to call Folium for help. This wonderful tool gives us interactive functionality: it lets us drop markers on the map, build heatmaps, build heatmaps that change with time, and much more. Isn’t it awesome! Once the map object is created with Folium, we display the choropleth map using the Choropleth() function, which binds the data contained in the data frame to the geometries of the GeoJSON file.

As we can see from our map of the King County dataset, Bellevue, Seattle, and Mercer Island have the highest average house prices. Let’s check whether the top zip codes match the most expensive areas. For that reason, we will create a list of zip codes with their latitudes and longitudes, and then create the markers.

As the map above shows, the northwest part of the county has higher average house prices than the rest. In addition, some zip codes’ average house prices are remarkably high compared to the rest. It can be concluded that zip code is an effective predictor when performing multiple regression analysis on the King County house data.
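For readers who want to reproduce this, here is a minimal sketch of the whole pipeline. The file names kc_house_data.csv and king_county_zipcodes.geojson and the GeoJSON zip code property ZCTA5CE10 are assumptions; adjust them to your own files:

import pandas as pd
import folium

# Load the King County house sales data (file name assumed).
df = pd.read_csv("kc_house_data.csv")

# Average price per zip code; zip codes as strings to match the GeoJSON keys.
avg_price = df.groupby("zipcode", as_index=False)["price"].mean()
avg_price["zipcode"] = avg_price["zipcode"].astype(str)

# Create a map centered on King County, WA.
m = folium.Map(location=[47.5, -122.1], zoom_start=9)

# Bind the averages to the zip code geometries
# (GeoJSON file name and property name are assumptions).
folium.Choropleth(
    geo_data="king_county_zipcodes.geojson",
    data=avg_price,
    columns=["zipcode", "price"],
    key_on="feature.properties.ZCTA5CE10",
    fill_color="YlOrRd",
    legend_name="Average house price (USD)",
).add_to(m)

# Drop a marker on the five most expensive zip codes.
top5 = df.groupby("zipcode")[["price", "lat", "long"]].mean().nlargest(5, "price")
for zipcode, row in top5.iterrows():
    folium.Marker(
        location=[row["lat"], row["long"]],
        popup=f"{zipcode}: ${row['price']:,.0f}",
    ).add_to(m)

m.save("king_county_prices.html")

Opening the saved HTML file in a browser gives the interactive map with both the choropleth layer and the markers.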


Handling missing data for the Bank Marketing dataset

Over the last couple of months I have been enhancing my knowledge of Data Science, and I have just finished my third project. For this project I chose the famous ‘Bank Marketing’ dataset from the UCI Machine Learning Repository. The aim of the project is to predict whether a customer will subscribe to a bank deposit by applying several machine learning classification models. Of course, I started with data cleaning, and once again I was faced with handling missing values. There are different approaches to this problem. Firstly, we should accept that there is no single correct way to deal with missing data. I have come across different solutions for data imputation depending on the kind of problem, and it is difficult to provide a general solution.
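To make this concrete, here is one common approach sketched in pandas, assuming the UCI file bank-additional-full.csv, where missing categorical values are coded as the string ‘unknown’:

import numpy as np
import pandas as pd

# Load the Bank Marketing data (the UCI files are semicolon-separated).
df = pd.read_csv("bank-additional-full.csv", sep=";")

# In this dataset, missing categorical values are coded as 'unknown'.
df = df.replace("unknown", np.nan)
print(df.isna().sum())  # how many values are missing per column

# One simple option: impute each categorical column with its mode.
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].fillna(df[col].mode()[0])

Mode imputation is just one choice; depending on the model, keeping ‘unknown’ as its own category can work just as well.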