Data Science in Logistics
By Kat Campise, Data Scientist, Ph.D.
Each year, billions of packages are shipped to customers throughout the world. At every supply chain touchpoint, from a customer initiating an order to the final delivery of that order, large quantities of data are collected: customer information, GPS data, the number and types of items, carrier data, delivery information, etc. Logistics centers on the design and implementation of the interaction between people, products, and processes.
Amazon is a salient example of an efficient logistical flow: buyers and sellers interact on the Amazon website, customer orders are relayed to their product warehouses which prepare the items for shipping, the warehouse coordinates with the contracted delivery partners, products are shipped, tracked, and then delivered. While this is an overly simplified summary, and there is much more that occurs on the backend, it’s a consistent pattern throughout supply chain logistics.
The applicability of data science to most, if not all, industries is evident. But logistics is a specific sector where data scientists can make a significant impact in several areas, such as reducing waste, optimizing delivery routes (which can translate into lower delivery costs), selecting carriers that deploy best practices in mitigating the environmental effects of CO2 emissions, accurately forecasting supply and demand cycles, and ensuring that hazardous materials are handled with the utmost care. Data drives our world, and data scientists are in a position to use that data for the benefit of humankind.
What Can I Expect to Earn as a Logistics Data Scientist?
The employer’s location, your education and experience levels, and how the employer defines “data scientist” are factors that influence your potential earnings as a data scientist within the logistics sector. Data scientists tend to earn more in larger cities, e.g., New York, Los Angeles, Seattle, and San Francisco, but the tradeoff is a higher cost of living. Confusion between the “data analyst” and “data scientist” definitions continues to creep into data science job descriptions. Some employers may be consciously misclassifying the role as a way to pay a lower salary. Others may simply not understand the difference in required knowledge and experience between the two.
A useful rule of thumb to keep in mind as you research data science jobs within the logistics sector: per Glassdoor, the yearly salary range runs from $100,000 (level 1 data scientist) to $183,000 (senior-level data scientist), with average gross compensation of roughly $140,000. There isn’t a distinct salary difference between industries; whether you choose logistics or healthcare, the salary range still hovers within the figures listed above. However, averages are sensitive to outliers, and there are large companies that don’t pay their data scientists a six-figure starting salary. Yes, the potential high salary is alluring. But it’s not guaranteed.
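To see why an average can mislead, here is a minimal sketch in Python. The salary figures are invented for illustration (they are not Glassdoor data); the only point is that a single very high salary pulls the mean up sharply while the median barely moves.

```python
from statistics import mean, median

# Hypothetical annual salaries (USD), invented for illustration only.
salaries = [85_000, 95_000, 100_000, 110_000, 120_000]

# The same group plus one outlier, e.g. a single very highly paid hire.
salaries_with_outlier = salaries + [400_000]


def summarize(values):
    """Return (mean, median) for a list of salaries."""
    return mean(values), median(values)


base_mean, base_median = summarize(salaries)
out_mean, out_median = summarize(salaries_with_outlier)

print(f"without outlier: mean={base_mean:,.0f}  median={base_median:,.0f}")
print(f"with outlier:    mean={out_mean:,.0f}  median={out_median:,.0f}")
```

When a salary site reports only an average, it is worth looking for the median or the full range as well before anchoring your expectations on one number.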
Remember, you’re interviewing the employer as well. Jumping into a job just because you’ll have the data science title increases the likelihood of you ignoring or misapprehending whether the employer is the best fit for your current work-life balance and your future financial and career goals. As a matter of fact, throughout 2018, a slew of articles and blogs were published that bemoaned the reasons why data scientists were leaving their jobs. All too many would-be data scientists are drawn into data science because it’s the “hot” job (for the moment) and research firms, as well as marketing departments, push splashy headlines about AI and the impending “lack of data scientists” by some future year, e.g., 2020, 2024, etc. A majority of those headlines and many of the research reports lack the word “qualified” along with leaving out a thorough description of what a qualified data scientist actually is. As such, do your due diligence when hunting for data science jobs in logistics (or any data science job, for that matter).
Does the employer have a clear understanding as to how a data scientist can provide value to their company? Are you mainly running SQL queries and building Tableau dashboards or are you building predictive models and translating those into algorithms which you are then codifying through a programming language? Will you be tested on your data science skills as part of the interview process (this is highly likely)? If so, what are they testing, exactly? If it goes back to SQL and Tableau with some Excel thrown in, but there isn’t anything included that examines your process for building, say, a machine learning algorithm that will be transferred into a larger production environment (this is just an example and is not wholly representative of every possible type of data science “test”), then you are probably interviewing for an analytics position as opposed to a data science job.
How Data Science is Revolutionizing Logistics
Although predictive analytics has been around for quite some time, the new flood of data (and what to do with it) that’s carving its way through various channels has prompted a revival via machine learning algorithms. But not all industries have completely latched on to data science beyond the usual analytics phase. There is, indeed, some forecasting within data analytics, but to a limited extent. Big data is a messier affair: the data may be structured, semi-structured, or unstructured, and the work involves pulling millions of data points (or more), preparing the data, choosing which features are the most impactful (feature selection and feature extraction), selecting the right algorithm or set of algorithms, testing the algorithm’s accuracy, and so on. In short, the supply chain and logistics industry has been slower than other sectors when it comes to utilizing the power of data science. As will be described below, this is changing.
In a global marketplace, where vendors and customers are dispersed throughout the world, and just about every action we take can be digitized through IoT devices, logistics and supply chain companies now realize that the massive amounts of data they collect can be leveraged to improve efficiency, lower costs, and boost revenues.
Self-Driving Trucks
Even though Amazon appears to have mastered the art and science of speedy product delivery, there is a truck driver shortage in the U.S. The emphasis here is on “truck drivers” as opposed to your local Uber or Lyft driver. At some point in the future, when Google or one of the other massive tech companies refines self-driving cars to the point where they can account for the unpredictable aspect of human decision-making (e.g., intoxicated drivers, pedestrians, etc.), self-driving trucks will be the next step.
Machine learning and AI are now part and parcel of being a data scientist. In terms of data, specifically, human drivers automatically sift through a variety of analog inputs (if they are good drivers): when to speed up, when to slow down, monitoring the cars around them, following local traffic laws, changing lanes, parking, being mindful and avoiding distractions, etc. Machines do not (yet) know how to do this on their own. It comes down to data scientists and engineers supplying all of the relevant algorithms (if X happens, then do Y) to work in tandem. This is easier said than done. It’s expected that AI will expand beyond the algorithms implemented by the data scientist and make decisions on its own as to when an action such as slowing down is the optimal choice.
We’re not at this point yet. In fact, the companies that are in the process of engineering self-driving cars have slightly shifted their focus to implementing “assistance features” that still require “human oversight or interaction.” For instance, if the driving assistant notices that you’re distracted and not paying attention to your driving, it will notify you. At first glance, this may seem like a step backward; but, it isn’t. Ultimately, human drivers are teaching the assistant about how humans operate while driving. This data can then be used to create safer self-driving vehicles, including trucks that transport pallets of products.
Smart Warehouses and Market Forecasting
Transportation is just one aspect of supply chain logistics. Products are stored somewhere (a warehouse) until they are shipped to their next destination. Perishable goods (e.g., produce, meat, pharmaceuticals) and hazardous materials require specific storage conditions, including temperature and packaging. Furthermore, although a certain percentage of spoilage is accounted for beforehand, product wastage and damage not only cut into profits but can also be life-threatening to the end customer.
Logistics and supply chain data scientists can help devise a smart warehouse system whereby automated alerts are set up for warehouse temperature sensors. The temperature can then be automatically adjusted for different areas of the warehouse, or authorized personnel can manually shift ambient conditions.
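As a rough illustration of the alerting idea, the following Python sketch checks sensor readings against a per-zone temperature range and collects an alert for each reading that falls outside it. The zone names, temperature limits, and readings are all hypothetical; a real system would take its limits from product requirements and regulations, and would read from live sensors rather than a hard-coded list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ZoneLimits:
    """Allowed temperature range for a warehouse zone, in degrees Celsius."""
    low_c: float
    high_c: float


# Hypothetical zones and limits, chosen only for this example.
ZONE_LIMITS = {
    "frozen": ZoneLimits(-25.0, -18.0),
    "chilled": ZoneLimits(2.0, 8.0),
    "ambient": ZoneLimits(15.0, 25.0),
}


def check_reading(zone: str, temp_c: float) -> Optional[str]:
    """Return an alert message if the reading is out of range, else None."""
    limits = ZONE_LIMITS[zone]
    if temp_c < limits.low_c:
        return f"{zone}: {temp_c:.1f} C is below the minimum of {limits.low_c:.1f} C"
    if temp_c > limits.high_c:
        return f"{zone}: {temp_c:.1f} C is above the maximum of {limits.high_c:.1f} C"
    return None


def find_alerts(readings):
    """Check (zone, temperature) pairs and return the alert messages."""
    alerts = []
    for zone, temp_c in readings:
        message = check_reading(zone, temp_c)
        if message is not None:
            alerts.append(message)
    return alerts


# Hypothetical sensor readings as (zone, temperature in Celsius).
readings = [("chilled", 5.1), ("chilled", 9.4), ("frozen", -20.0), ("ambient", 13.5)]
alerts = find_alerts(readings)
for alert in alerts:
    print(alert)
```

A fixed threshold is the simplest possible rule. A data scientist would typically go further, for example by flagging a temperature that is still in range but drifting toward a limit.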
Additionally, by using predictive analytics, a skilled logistics and supply chain data scientist can provide more precise market forecasting to track supply vs. demand thus decreasing loss due to an oversupply or undersupply of inventory. Such forecasts can also be automated and extended to vendor analysis, e.g., predicting which vendors are the most reliable with regard to payment, pickup, delivery, complying with national and international regulations, etc.
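One very simple way to forecast demand is exponential smoothing, sketched below in Python. This is an illustrative baseline and not a description of what any particular company uses; the weekly demand figures, the smoothing factor `alpha`, and the stock level are all made-up values. The forecast is compared with the stock on hand to show a projected oversupply or undersupply.

```python
def exponential_smoothing_forecast(demand, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    The level starts at the first observation and is updated as
    level = alpha * observed + (1 - alpha) * level for each later one.
    """
    if not demand:
        raise ValueError("demand history must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    level = demand[0]
    for observed in demand[1:]:
        level = alpha * observed + (1 - alpha) * level
    return level


def inventory_gap(forecast, on_hand):
    """Positive means projected oversupply, negative means undersupply."""
    return on_hand - forecast


# Hypothetical weekly unit demand for one product.
weekly_demand = [120, 130, 125, 140, 150, 145]

forecast = exponential_smoothing_forecast(weekly_demand, alpha=0.5)
gap = inventory_gap(forecast, on_hand=130)

print(f"forecast for next week: {forecast:.1f} units")
print(f"inventory gap: {gap:+.1f} units")
```

A larger `alpha` makes the forecast react faster to recent weeks, while a smaller one smooths out noise. Production forecasts usually also account for trend, seasonality, and promotions, which this sketch ignores.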
Skills and Tools of a Logistics Data Scientist
Although logistics data scientists deal with a wider variety of data from multiple input sources (e.g., sales, customer service, enterprise resource management systems, warehouse management systems, vehicle tracking data, geo-location data, carrier information, sensor data), the expected skill set is the same: advanced math and statistical knowledge, data mining, data cleaning, algorithm construction, and expertise in a programming language (usually Python or R).
As with all industries, there is a set of particular tools or software programs that are unique to the sector: enterprise resource planning (ERP), supply chain management, transportation management, and materials requirement software are a few examples. If you have absolutely no experience with any of the software particular to this industry, but you do have the necessary data science skillset, then it’s likely to be more difficult for you to get your foot in the door as a logistics and supply chain data scientist. However, it’s not impossible. The issue here is getting past the human resources gatekeeper (assuming you’re applying for the job as a “cold” applicant through either the company’s website or an intermediary job site like Indeed or Glassdoor) who likely receives your application or resume after it’s been scanned by an algorithm for the “right” keywords.
How to Become a Data Scientist in Logistics
Once you have achieved the requisite data science skill set (usually this means a degree in a STEM discipline and some number of years of experience, which varies depending on the employer), taking additional coursework in supply chain management or logistics is highly recommended.
The least expensive way to gain knowledge of this sector, other than being an intern, is to take one or more massive open online courses (MOOCs). Coursera and edX both offer courses in supply chain management. You can audit the courses for free or pay a per-class fee to earn a certification. Notably, edX has developed a MicroMasters® Program in Supply Chain Management where you’ll take six courses, taught by MIT professors, including Supply Chain Analytics, Supply Chain Fundamentals, Supply Chain Comprehensive Exam, Supply Chain Design, Supply Chain Dynamics, and Supply Chain Technology Systems. The current cost is $1,080 ($200 per course), and you’re eligible to earn transferable graduate credits if you decide to complete a master’s degree at one of the preapproved institutions. Just a quick side note: logistics is intertwined with supply chain systems. You’ll be learning about both and how they interact with one another.
Alternatively, if you’d prefer to acquire more extensive expertise in logistics and the supply chain, there are Master of Business Administration (MBA) programs available with a Supply Chain Management and Logistics concentration. If you’re not yet a data scientist, and you’re considering a data science degree program, then taking elective courses in business, logistics, and supply chain management will help you enter this industry more quickly. Finally, as a data scientist, you should expect your learning to continue. With global competition nipping at the heels of every industry, companies cannot rest on their laurels; they will need to create or use advanced technologies to stay ahead of the competition. Also, as we get closer to automating complex, data science-like tasks through the use of AI, data scientists will need to expand their knowledge into areas that are currently beyond AI’s capabilities.