New York City Taxi Fare Prediction – Analysis and Prediction

Tinkering with multiple machine learning algorithms and methods on Google’s New York City Taxi Fare Prediction challenge.

In this article, I will be using different machine learning techniques to predict fare_amount.

You can learn more about this dataset on Kaggle. 😇

Note: RMSE (root mean squared error, i.e. the square root of mean_squared_error) will be used to score the model(s). Lower is better.
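For anyone who wants the metric spelled out, here is a minimal sketch of RMSE in plain Python. The function name `rmse` and the three example fares are made up for illustration; they are not taken from the dataset.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each error, average them, then take the square root
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up fares in dollars, purely for illustration
actual = [10.0, 7.5, 22.0]
predicted = [12.0, 7.5, 19.0]
score = rmse(actual, predicted)
print(f"RMSE: {score:.4f}")
```

Because the errors are squared before averaging, a few badly mispredicted fares push the score up more than many small misses do.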

Pre-processing:

Not exciting, I know 😑

1. Random Forests (Regression)

RMSE: 3.4

    • Data size = 1mil (1,000,000)
    • test_size = 0.20 (20%)

    Insights:

      • None
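The setup above (random forest regression, 20% of the rows held out) can be sketched roughly like this with scikit-learn. This is not my original notebook code: the data is a synthetic stand-in with a single made-up feature, `distance_km`, and `n_estimators=50` is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the taxi data (assumption): one feature, trip
# distance in km, and a fare that grows roughly linearly with it plus noise.
rng = np.random.default_rng(0)
distance_km = rng.uniform(0.5, 20.0, size=2000)
fare_amount = 2.5 + 1.8 * distance_km + rng.normal(0.0, 1.0, size=2000)
X = distance_km.reshape(-1, 1)

# Hold out 20% of the rows for testing (test_size = 0.20)
X_train, X_test, y_train, y_test = train_test_split(
    X, fare_amount, test_size=0.20, random_state=0
)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# RMSE is the square root of scikit-learn's mean_squared_error
rf_rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
print(f"RMSE on the held-out 20%: {rf_rmse:.2f}")
```

The number this prints says nothing about the real competition data; it only shows how the split and the score fit together.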

2. Dense Neural Network with Keras (DNN)

RMSE: 3.26

    • Data size = 1mil (1,000,000)
    • test_size = 0.20 (20%)

    Insights:

      • Was able to minimize the loss down to 14.3, which corresponds to an RMSE of 3.7815340802378077 (the square root of 14.3). It actually seems to perform better on the submission set than on the test set.
      • Dropout was useless here. I tried different values, and they only made the loss stall at a higher value.
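The two loss figures in the first insight are tied together by a square root. A quick check, assuming the 14.3 is a mean squared error loss (the variable names are just for this example):

```python
import math

mse_loss = 14.3  # loss value quoted above, assumed here to be MSE
rmse_from_loss = math.sqrt(mse_loss)  # RMSE is the square root of MSE
print(rmse_from_loss)
```

That is why a network trained on MSE has to be converted before it can be compared with the RMSE scores of the other models.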

3. XGBoost

RMSE: 3.13708

    • Data size = 20mil
    • test_size = (0.01%)

    Insights:

      • Achieved my best score, 3.13708, which ranked 175th out of 650 (as of 28/08/18).
      • GridSearchCV was used to find the best hyperparameters. I also took some advice from a Kaggle kernel that uses Bayesian optimization 🐧.
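Here is a rough sketch of what a GridSearchCV hyperparameter search looks like. To keep it dependent on scikit-learn only, it uses `GradientBoostingRegressor` as a stand-in for XGBoost. The synthetic data and the grid values are assumptions made for the example, not the parameters I actually searched.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): fare grows with distance plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 20.0, size=(500, 1))
y = 2.5 + 1.8 * X[:, 0] + rng.normal(0.0, 1.0, size=500)

# Illustrative grid only; every combination is tried with 3-fold CV
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),  # stand-in for an XGBoost model
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_  # flip the sign back to a plain RMSE
print(best_params, round(best_rmse, 3))
```

`xgboost.XGBRegressor` follows the same scikit-learn estimator interface, so it should be possible to pass it to `GridSearchCV` in place of the stand-in, with a grid built from its own parameter names.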