© FTAV montage via Midjourney, and we don’t need the irony of that pointed out

It’s an algorithmic mystery box that inspires fear, awe and derision in equal measure. The simulacrums it creates are programmed to pass off retained information as knowledge, applying unwarranted certainty to assumptions born of an easily bypassed ethical code. Its output threatens to determine whether huge numbers of people will ever get a job. And yet, the CFA Institute abides.

OpenAI’s release of GPT-4 has caused another angst attack about what artificial intelligence will do to the job market. Fears around AI disruption are particularly acute in finance, where the robotic processing of data probably describes most of the jobs much of the time.

Where does that leave the CFA Institute? Its chartered financial analyst qualifications offer an insurance policy to employers that staff will behave, and that their legal and marketing bumf will be produced to code. But CFA accreditation is only available to humans, who pay $1,200 per exam (plus a $350 enrolment fee), mostly to be told to re-sit.

If a large language model can pass the finance world’s self-styled toughest exam, it might be game over for the CFA’s revenue model, as well as for several hundred thousand bank employees. Fortunately, for the time being, it probably can’t.

Presented with a Level III sample paper from the CFA website, ChatGPT flunks the very first question:

No! Wrong! It's A.

The question above is about Edgar Somer, a small-cap fund manager who’s been hired by Karibe Investment Management. His value strategy returned 11 per cent a year at his last employer and he wants to market it with the line: “Somer has generated average annual returns of 11 per cent”. Not flagging here that he’s changed firms is the bad bit, whereas presenting a composite performance of similar portfolios is totally fine. D’uh.

Next question:

No! Completely wrong!

This question relates to Somer retweeting a story about a celebrity getting fined for failing to properly report investment gains. He adds, presumably in a quote tweet: “A client of mine had similar gains, but because I kept proper records he faced no penalties. #HireAProfessional”.

Judged on #TasteAndDecorum there’s plenty wrong with the above but, by the rulebook, it’s acceptable. No client is named and by measures of transparency and professionalism there’s no violation, which makes ChatGPT’s regulatory over-reach comparable to that of its predecessor ED-209.

Next question:

Yeah, OK. That’s correct. Damn.

Next:

LOL, what an idiot!

The scenario here is that before joining Karibe, Somer bought some shares for his personal account in a tech small-cap that went up a lot. Everything was disclosed properly when clients were put into the stock, but Somer gets edgy about the size of his own exposure. So when a client places the highest limit-order buy in the market, Somer considers filling it himself.

He absolutely shouldn’t do this! Not because the client would be disadvantaged; they wouldn’t be. The issue here is that he’d personally benefit from the trade. At a minimum, the conflict would need to be disclosed to all parties, which is a thing computers seem quite bad at acknowledging.

Section two of the exam is Fixed Income and the questions are all very involved. You’ve probably read enough about duration risk lately, so we’ll spare you the details and offer an overall assessment.

ChatGPT was able to accurately describe spread duration in relation to callable and non-callable bonds. But it picked the wrong portfolio to suit a bull market and used garbage maths to overestimate by threefold an expected six-month excess return. And when its own answer didn’t match any of the options given, it chose the closest.

For the final sample question (about whether to stuff a client into covered bonds, ABS or CDO) ChatGPT claimed not to have enough information so refused to give an answer. Such cautiousness might be a good quality in an investment adviser but it fails the first rule of multiple choice exams: just guess.

Overall, the bot scored 8 out of a possible 24.

Note that because GPT-4 is still quite fiddly, all the screenshots above are from its predecessor, ChatGPT running GPT-3.5. Running the same experiment on GPT-4 delivered very similar results, in spite of its improved powers of reasoning, because it makes exactly the same fundamental error.

The way to win at CFA is to pattern-match around memorised answers, much like a London cab driver uses The Knowledge. ChatGPT seeks instead to process meaning from each question. It’s a terrible strategy. The result is a score of 33 per cent, on an exam with a pass threshold of ≥70 per cent, when all the correct answers are already freely available on the CFA website. An old-fashioned search engine would do better.

Computers have become very good very quickly at faking logical thought. But when it comes to fake reasoning through the application of arbitrary rules and definitions, humans seem to retain an edge. That’s good news for anyone who works in financial regulation, as well as for anyone who makes a living setting exams about financial regulations. The robots aren’t coming for those jobs; at least not yet.

And finally, congratulations to 44 per cent of CFA Level III candidates on being smarter than a website.

Further reading:

The CFA, Wall St’s toughest qualification, struggles to regain stature (FT)
The CFA’s questionable refund refusal (FTAV)
The sun isn’t shining and it still sucks to be a CFA candidate (FTAV)
The AV CFA Meme Competition: the winners
