Data Science in the Financial Industry
By Kat Campise, Data Scientist, Ph.D.
The financial industry has a major impact on our daily lives. Everything we purchase — whether products or services — filters through the various financial institutions where the transactions are stored and analyzed. Businesses of all sizes apply for loans, individuals use their credit or debit cards, and speculators, traders, and investors use various techniques to either boost their returns or minimize their losses.
The rise of smartphones and digital payment options (both on desktops and as phone apps), such as Apple Pay, PayPal, and Venmo, has helped to significantly increase the amount of financial data being shuttled from the point of sale to an analyst’s (or a data scientist’s) desktop. Consequently, the data sets are massive. If you love working with hard data and have a penchant for advanced mathematics and statistics, then a career in data science within the financial industry may be just the right fit.
What Does a Financial Industry Data Scientist Do?
To be forthright, data science in the financial industry isn’t altogether different from data science in any other industry: statistics, calculus, linear algebra, SQL, R, Python (or familiarity with industry-specific software), machine learning, artificial intelligence (AI), etc. are all skill sets you’ll need to keep pace with the dizzying data influx. More than likely, you’ll be working with a team of data scientists or analysts, as the amount of hard data, along with differences in what the financial institution needs from the data, can be overwhelming. It may be more accurate to ask what a data scientist won’t be doing in the financial industry.
Notably, the average pay for a financial industry data scientist usually isn’t much higher or lower than in any other data science career sector (currently, the average is roughly $136,000 per Glassdoor). There may, however, be additional financial incentives attached to accurate market predictions. The possibility of additional compensation isn’t set in stone, but it’s worth researching should you be offered an interview. On that note, there are several areas of emphasis for financial data scientists, each relying on high levels of mathematical acumen.
Fraud Detection
Between identity theft, customers who don’t follow robust security methods for creating and guarding their passwords, and hackers constantly poking around to find weaknesses in database and data transmission security protocols, fraud detection is a crucial data science task. This is where machine learning and, increasingly, AI are heavily deployed.
Be prepared to perform behavioral analysis along with natural language processing techniques (e.g., semantic analysis) to detect anomalies within both numerical data and language (emails, social media monitoring, recorded calls with customer service, and so forth). Indeed, FICO, the U.S.-based credit scoring service, has developed its own platform: Cognitive Fraud Analytics. FICO’s platform combines granular behavior profiling, supervised and unsupervised machine learning, and AI-driven adaptive analytics to quickly detect and act on suspected fraudulent activity.
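The behavior-profiling idea can be illustrated with a deliberately simple sketch: summarize each customer’s past transaction amounts, then flag a new transaction whose amount sits far outside that customer’s own norm. This is not how FICO’s platform works; the z-score rule, the `threshold=3.0` cutoff, the function names, and the sample amounts are all hypothetical choices for illustration. Production systems combine many more signals (merchant, location, time of day, device) with learned models.

```python
from statistics import mean, stdev


def build_profiles(history):
    """Summarize each customer's past transaction amounts as (mean, stdev)."""
    profiles = {}
    for customer, amounts in history.items():
        if len(amounts) >= 2:  # stdev needs at least two observations
            profiles[customer] = (mean(amounts), stdev(amounts))
    return profiles


def is_anomalous(profiles, customer, amount, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations away
    from this customer's own historical mean (a hypothetical rule)."""
    if customer not in profiles:
        return True  # assumption: no usable history means route to review
    mu, sigma = profiles[customer]
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical transaction histories for illustration only.
history = {
    "alice": [42.0, 38.5, 45.0, 40.0, 41.5],
    "bob": [1200.0, 950.0, 1100.0, 1050.0],
}
profiles = build_profiles(history)
print(is_anomalous(profiles, "alice", 43.0))
print(is_anomalous(profiles, "alice", 900.0))
print(is_anomalous(profiles, "bob", 1000.0))
```

Because each customer is compared only against their own history, a $1,000 charge is routine for one profile and a red flag for another.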
On a side note, there tends to be a gray area between machine learning and AI. Most experts in the field consider machine learning to be a subset of AI. A quick rule of thumb that separates the two is this: AI arrives at its own decisions and then executes an action without being explicitly programmed to do so, and it is getting to a point where it can consistently create its own algorithms. Machine learning, meanwhile, offers recommendations for human action and still depends on humans to generate its algorithms. The marketing hype that labels everything as AI is misleading. As a data scientist, you should clearly understand the difference, since that knowledge informs the precision of your models.
Training in cybersecurity, machine learning, and AI engineering is essential to becoming a successful data scientist in the fraud detection arena. That said, the cybersecurity portion can be learned on the job if you already have experience as a data scientist or as a data analyst in the financial industry.
Algorithmic Trading
Every trader or investor is looking for even the slightest edge in accurately predicting where a particular market is headed. Whether they are managing their own money or hundreds of millions of dollars in assets under management (AUM), no one is fond of losing money, especially due to human error. Algorithmic trading models use a plethora of techniques, along with a significant number of data points as input values. Below are just a few examples:
- Trend following: trendlines, channels, moving averages, relative strength index (RSI), moving average convergence divergence (MACD), etc.
- Index Fund Rebalancing
- Time-Weighted Average Price (TWAP)
- Arbitrage Opportunities
- News Monitoring
- Mean Reversion
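To make the trend-following item concrete, here is a minimal sketch of two of the indicators named above: a simple moving average (SMA) crossover signal and the relative strength index (RSI). Assumptions: the window lengths and the toy price series are hypothetical, and the RSI here uses plain averages of gains and losses, whereas Wilder’s original formulation smooths those averages over time. Real trading systems add transaction costs, risk limits, and backtesting on top of signals like these.

```python
def sma(prices, window):
    """Simple moving average; None until `window` prices are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out


def crossover_signals(prices, short=3, long=5):
    """'buy' where the short SMA crosses above the long SMA,
    'sell' where it crosses below, None elsewhere."""
    fast, slow = sma(prices, short), sma(prices, long)
    signals = [None] * len(prices)
    for i in range(1, len(prices)):
        if None in (fast[i - 1], slow[i - 1], fast[i], slow[i]):
            continue  # not enough history yet
        if fast[i - 1] <= slow[i - 1] and fast[i] > slow[i]:
            signals[i] = "buy"
        elif fast[i - 1] >= slow[i - 1] and fast[i] < slow[i]:
            signals[i] = "sell"
    return signals


def rsi(prices, period=14):
    """RSI from simple averages of the last `period` gains and losses."""
    if len(prices) < period + 1:
        raise ValueError("need at least period + 1 prices")
    changes = [b - a for a, b in zip(prices, prices[1:])][-period:]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window
    return 100 - 100 / (1 + avg_gain / avg_loss)


# Hypothetical toy series: a decline followed by a recovery.
prices = [10, 9, 8, 7, 6, 7, 8, 9, 10, 11]
print(crossover_signals(prices))
print(round(rsi(prices, period=9), 2))
```

On this series the only signal is a buy at index 7, when the three-period average climbs back above the five-period average. A mean-reversion strategy would read the same data the opposite way, betting that prices drift back toward their average rather than continuing the trend.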
As a data scientist, you’ll need to know the industry jargon and the underlying mathematical models used in the analysis, in addition to merging the correct data points with those models while building an algorithm for auto-execution. The other challenge is auto-executing in milliseconds or less, as markets move extremely fast. There is a caveat: high-frequency trading (HFT), specifically, has recently declined in popularity. According to the Financial Times, reduced trading volume and volatility are the main causes of the decline. However, algorithmic trading is still used throughout the financial industry, and there continues to be a need for better, faster, and more intelligent algorithms across a variety of subsectors.
The best bet for entering this sector of the financial industry is to already have a background in finance and/or experience as a financial analyst. Trading experience would also be a huge plus. Those who are brand new to the data science profession may find themselves being hired as a data analyst or a junior data scientist at an investment firm prior to being promoted to a full-fledged data science position.
Customer Segmentation and Marketing
Without customers, there is no business. Traditional brick-and-mortar financial institutions are now competing with digital (online-only) banks and other financial enterprises. But the creed for any successful business can be summed up in a single statement: know your customer. Distinct preferences are both a group and an individual phenomenon. As such, businesses need to “listen” to their customers by collecting and analyzing data through various means: social media analysis, content marketing, customer service call logs and emails, surveys, click-through rate tracking, the success or failure of sales calls, etc. Marketing and advertising must maintain a mixture of science and creativity.
The science aspect comes in through a data scientist’s ability to accurately analyze customer behavior and profile existing and potential customers. Profiling is simply a way to understand who your customers are and what they are likely to buy. For example, there are particular generational traits when it comes to risk-taking: younger generations tend to have a higher risk tolerance. The cryptocurrency (crypto) market is a perfect illustration of this trend. Gen X’ers, Millennials, and Gen Z’ers have higher participation rates in crypto than Baby Boomers. Does this mean that zero Baby Boomers have incorporated risk into their investment profile? No.
As a data scientist, you’ll need to dig deeper to determine other correlating factors, such as subgroups within each generation whose participation rates don’t follow the broader high-risk trend. Location, past behaviors, transaction history, visited websites, income, and education are just a few additional pieces of data that can be collected for analysis.
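Digging into within-group participation rates can be sketched with a simple group-by. The customer records, field names (`generation`, `region`, `holds_crypto`), and values below are made up purely for illustration; the point is that a generation-level rate can hide subgroups that move against the trend.

```python
from collections import defaultdict


def participation_rates(customers, keys):
    """Share of customers holding crypto, grouped by the given fields."""
    counts = defaultdict(lambda: [0, 0])  # group -> [holders, total]
    for c in customers:
        group = tuple(c[k] for k in keys)
        counts[group][0] += int(c["holds_crypto"])
        counts[group][1] += 1
    return {g: holders / total for g, (holders, total) in counts.items()}


# Hypothetical records for illustration only.
customers = [
    {"generation": "Gen Z", "region": "urban", "holds_crypto": True},
    {"generation": "Gen Z", "region": "urban", "holds_crypto": True},
    {"generation": "Gen Z", "region": "urban", "holds_crypto": True},
    {"generation": "Gen Z", "region": "rural", "holds_crypto": False},
    {"generation": "Gen Z", "region": "rural", "holds_crypto": False},
    {"generation": "Boomer", "region": "urban", "holds_crypto": True},
    {"generation": "Boomer", "region": "urban", "holds_crypto": False},
    {"generation": "Boomer", "region": "rural", "holds_crypto": False},
    {"generation": "Boomer", "region": "rural", "holds_crypto": False},
]

by_generation = participation_rates(customers, ["generation"])
by_subgroup = participation_rates(customers, ["generation", "region"])
print(by_generation)
print(by_subgroup)
```

On this toy data, Gen Z as a whole participates at 60% while the rural Gen Z subgroup sits at 0%, and urban Boomers participate at twice the overall Boomer rate.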
Some of these data points may correlate with the likelihood of purchasing a new product or service (e.g., robo-advisory services, algorithmic trading, etc.). You may find that a certain portion of Gen Z’ers hate the robo-chat and want to speak with an actual human being. Product development relies heavily on both market and behavioral analytics to assess whether a new offering is likely to succeed. Even after a product is launched, the enterprise needs to figure out how best to market it. All of this requires data analysis, including inference and prediction.
Furthermore, recommendation systems are a large part of automating the marketing and advertising process. As you can probably see by now, there is a pattern to the knowledge a data scientist needs in their toolbox: machine learning and AI. An interest in assessing human behavioral and communication patterns is also helpful for the customer segmentation side of the financial services industry. You’ll likely work as the data science liaison for product development and marketing, so experience in either of those areas is helpful, but not necessarily a requirement.
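Recommendation systems range from simple co-occurrence counts to large learned models. As a minimal illustration of the idea, the sketch below recommends financial products that are most often held alongside products a customer already has. The product names and holdings are hypothetical, and item co-occurrence is just one of many possible techniques.

```python
from collections import Counter
from itertools import permutations


def co_occurrence(holdings):
    """Count how often each ordered pair of products is held by one customer."""
    pairs = Counter()
    for products in holdings.values():
        for a, b in permutations(set(products), 2):
            pairs[(a, b)] += 1
    return pairs


def recommend(customer, holdings, pairs, k=2):
    """Score products the customer lacks by co-occurrence with owned ones."""
    owned = set(holdings[customer])
    scores = Counter()
    for (a, b), n in pairs.items():
        if a in owned and b not in owned:
            scores[b] += n
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:k]]


# Hypothetical product holdings per customer.
holdings = {
    "c1": ["checking", "savings", "credit_card"],
    "c2": ["checking", "credit_card", "robo_advisor"],
    "c3": ["checking", "savings", "robo_advisor"],
    "c4": ["checking", "savings", "mortgage"],
    "c5": ["checking"],
}
pairs = co_occurrence(holdings)
print(recommend("c5", holdings, pairs))
```

Customer `c5` only has a checking account, so the sketch suggests the products most commonly paired with checking across the other customers.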
Automating Risk Analysis
Realistically, risk analysis occurs throughout all industries. Even at the individual level, we are either consciously or unconsciously weighing the risk of taking an action. Bigger risks, e.g., loaning someone millions of dollars, require more data to establish trustworthiness.
Financial markets provide an array of different risk levels at any given time, and the input values for risk are also varied: competitors, political decisions, the weather (especially for commodities), personal and business credit scores (creditworthiness), customer preferences, new technologies, and so forth. But more data doesn’t necessarily mean high-quality data. With so many factors to consider, it’s no surprise that automating risk analysis is yet another area where machine learning and AI are gaining traction.
Technological innovation is now at a stage of development where algorithms can automatically assess whether or not a customer is likely to repay a home loan. At each stage of the risk analysis process, machine learning is capable of producing recommendations for decision points. Data preprocessing, feature selection, risk classification, and mapping if-then scenarios to risk patterns are quickly becoming reliant on machine learning and AI algorithms. Data scientists and machine learning engineers are expected to build these models for gathering, analyzing, and making predictions based on an increasing volume and velocity of data. This does not mean that the human aspect will be completely disregarded (or deleted).
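A common starting point for default-risk classification is logistic regression. The sketch below fits one with batch gradient descent using only the standard library. The two features (debt-to-income ratio and a scaled count of recent missed payments), the toy training data, and the learning-rate and epoch settings are all hypothetical; a real credit model would use far more data, careful feature engineering, validation, and fairness and regulatory review.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def default_probability(w, b, x):
    """Model's estimated probability that applicant `x` defaults."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [debt-to-income ratio, missed payments last year / 3]
X = [[0.10, 0.0], [0.15, 0.0], [0.25, 0.33],
     [0.55, 0.67], [0.65, 1.0], [0.75, 0.67]]
y = [0, 0, 0, 1, 1, 1]  # 1 = defaulted

w, b = train_logistic(X, y)
print(default_probability(w, b, [0.20, 0.0]))  # low-risk applicant
print(default_probability(w, b, [0.70, 1.0]))  # high-risk applicant
```

The output is a probability rather than a verdict, which leaves room for the human judgment the next paragraph calls for.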
Indeed, humans should still be able to override the recommendations, as numerical scores don’t capture the full range of human experience. While one customer may obviously have poor money management skills (and thus a high risk of defaulting on a loan), another may have had a prior medical condition that completely derailed their finances. Exceptions like these can be incorporated algorithmically into machine learning or AI-based risk analyses. The key is for all data scientists to stay actively aware that the models they build can be used for both positive and negative actions toward other human beings. You’re not just dealing with numbers and algorithmic constructs. Human lives are affected by the underlying statistical models.
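The human-override principle can be built directly into the decision logic. In this minimal sketch, the model’s default probability only produces a recommendation; scores in a gray zone are routed to a person, and an explicit human override always takes precedence. The threshold values, the `Decision` type, and the outcome labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    outcome: str  # "approve", "decline", or "manual_review"
    reason: str


def decide(p_default: float,
           approve_below: float = 0.2,
           decline_above: float = 0.6,
           override: Optional[str] = None) -> Decision:
    """Turn a model's default probability into a recommendation.

    A human override (for example, after reviewing a documented medical
    hardship) always wins; gray-zone scores go to a person for review.
    """
    if override is not None:
        return Decision(override, "human override")
    if p_default < approve_below:
        return Decision("approve", "low model risk")
    if p_default > decline_above:
        return Decision("decline", "high model risk")
    return Decision("manual_review", "score in gray zone")


print(decide(0.10))
print(decide(0.45))
print(decide(0.85, override="approve"))
```

Recording the `reason` alongside each outcome also leaves an audit trail showing when a person, rather than the model, made the call.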
In terms of the expertise required for risk analysis, if you’ve worked within the financial services industry, you’ll already have a fundamental understanding of the mathematical models being used. On the machine learning side, academia continues to push forward with experimental techniques such as particle swarm optimization (including multi-swarm variants) and other derivative-free optimization methods. Keeping on top of new findings in machine learning and AI should be on every data scientist’s to-do list. Granted, you must balance innovation with practical applicability and creativity, which goes back to risk assessment. Will incorporating new techniques yield better results? Or will you be going down a rabbit hole and expending both time and money (namely, your employer’s) by merely chasing the next shiny data science tool?
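For readers unfamiliar with particle swarm optimization, the sketch below shows the basic single-swarm, global-best variant minimizing a simple test function. It needs no derivatives: each particle moves based on its own best position so far and the swarm’s best position. The inertia weight `w`, the coefficients `c1` and `c2`, the swarm size, and the bounds are hypothetical settings; multi-swarm variants split the population into several interacting swarms.

```python
import random


def pso(objective, dim, n_particles=20, iters=200, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best particle swarm optimization (minimization)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm's best so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move, then clamp the particle back inside the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, value = pso(sphere, dim=3)
print(best, value)
```

In a risk setting, the objective would instead be something like a model’s validation loss as a function of its tuning parameters, which is exactly the kind of experiment worth weighing against its time and cost.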
To briefly summarize, data scientists in the financial industry use the same skill set as those in all other industries. However, you can specialize by taking elective coursework in finance, economics, and cybersecurity. While all data science jobs are math-intensive, the world of finance magnifies this while also adding doses of natural language processing, since news announcements have an impact on financial markets. With the crypto market still in play, despite gloomy predictions from traditional financial institutions, opportunities for financial data scientists remain good. As you peruse and apply for finance data science jobs, use your analytical skills to discern between employers looking for an analyst or a data engineer and those looking for a data scientist. The world of data science is still in the definition stage, and although data scientists remain at the center of the statistics, programming, and business knowledge Venn diagram, emerging technologies affect employers’ expectations of both the value and applicability of having a data scientist as part of their organization.