How many times did you get confused with datetime data? Have you been struggling too much every time data appears and you rigorously search on Google to find a solution or post your question on StackOverflow? The idea of this review is to provide a useful list of the frequently appeared issues with datetime. A gift for the new year 2021, working faster and with fewer problems.
Year-end 2020, I am thinking about what is worth reviewing. There is not much confusion to say about integer, float, text, but datetime data is like a frog or an ocean with so many different systems and formats dancing to confuse you.
I write this post mentioning you when I actually think of myself as part of it. I wrote from the struggling of myself as an analyst handling different data types. Even if you have the documentation, it is still difficult and takes time to skim and scan, try, fail, and try again.
A new year comes, a new resolution is born, I decided to organize, wrap up about time data in a handy way to make analyst lives like mine easier.
Before joining any battle, it is always good to know where you are, which enemies you are dealing with. Similarly, knowing what Time types we are having?
Types of Time?
In Python or SQL, the types of time we often see can be divided into a simple 2 categories: Datetime, Timestamp. Other materials or sources divide Datetime into smaller categories, but I prefer to keep things simple at this first step.
Datetime is what you can see in this kind of display: “2020–01–01 01:01:01”.
Timestamp is what you can see in kind of number format: “1577829600.0”.
Knowing you are having Datetime or Timestamp enables you to search for further hints to solve your problem already. If you are interested in knowing smaller elements, I pin one official Python document about Datetime module.
UTC or GMT?
Mentioning UTC or GMT, they belong to datetime type. The idea behind relates to timezone and standard time.
Conceptually talking, UTC (Coordinated Universal Time) and GMT (Greenwich Mean Time) are different. According to timeanddate.com, UTC is a standard time, while GMT is about the timezone.
In practice, we somehow use UTC and GMT for the purpose of specifying the timezone. From personal experience, I saw “timezone.utc” in Python more often than GMT.
UNIX or EPOCH?
At this point, we are mentioning about Timestamp type. I will just quote a nice recap from textmagic.com:
“Although UNIX time and epoch time are often used synonymously, they mean different things”
In this part, I will collect useful Python codes to convert time using datetime module in Python, and Pandas package.
For datetime, Python already has this module, you just need to call it out. For Pandas, if you haven’t install Pandas, you need to install it first, and then call it out.
import datetime as dt
import Pandas as pd
Convert datetime to timestamp
In some situations, you really need to convert datetime to timestamp. For example: I build a model in which I need to use a function in which the parameter only allows me to put timestamp format. In this case, I need to find a way to systematically /automatically convert datetime in my model.
Convert 1 Jan 2020 hour 0, minute 0 to Epoch timestamp:
Note that inside dt.datetime() you can specify up to micro second and time zone if any. Originally, you can look into documentation from Python:
datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0)
With a little trick, you may want to have a list of timestamps which later will be input for your model shown in the below image.
Convert timestamp to datetime
This point would be helpful when you want to convert your timestamp to datetime so that your data is more human-readable.
I once again use the timetamp “1577829600.0” to make an illustration in the code cell below:
How to get the time of the present
This is simple as it is. Depending on the time types you want, pass the code:
To get datetime:
To get timestamp:
In Pandas, changing the data type of a column to datetime
Very often you have to convert the data type to datetime. In many cases, the display of data looks like datetime, but it is understood by the system as ‘object’. It is important to convert the data type to datetime as it truly reflects the essence of our data, but more importantly, it influences our later analysis which requires datetime as the input. There are several useful ways to do this.
Force data type to datetime when you load the file
It is always simple and useful when you load the file into Pandas dataframe, you specify the data type using parameter ‘parse_dates=[column index]’.
For example, I have data.csv file in which columns index 2 is the datetimedata type:
Change the data type of a column in Pandas data frame to datetime:
We have a Pandas data frame call df, column ‘Date’ which is now ‘object’ type, and we want to convert it to datetime. There are 2 ways to conduct it.
Change datatype using series.astype and overwrite back to the original data frame. This method is that it is easy to understand and compute. The downside of this method is that some report this way only works with Windows but not on Linux.
Using pandas.to_datetime method applied to our data frame, passing the code as below:
This method has many interesting parameters that will be very powerful depending on your need. Here is the link to the official documentation.
Useful methods in Pandas to analyze time series
Resample method applied when you want to tidy your data for analysis by time. One point to note about resample is that it requires datetime data in data frame row index. This method is worth checking out HERE.
Photo is one example of how time series helps me in the analysis. Using groupby Year to see the price trend of different properties yearly.
In other analyses, you may want to explore the pattern of data by time of a day, or by day of a week. For example, you want to know at what time of the day your audience tends to view your writing/videos. “groupby” method will do the task.
HERE is the link. The tricky point is that you have to specify inside groupby: groupby (datetimeseries.dt.hour) to let the Python understand that you want it to group by hour.
I truly believe that if we know how to work with datetime, we can exploit many insights and save our energy with the established method with datetime datatype.