<h1 id="mitigating-biases-in-machine-learning">Mitigating Biases in Machine Learning</h1>

<p><em>This article was originally written for AI Hub. <a href="https://aihub.org/2023/05/17/mitigating-biases-in-machine-learning/">See the post here.</a></em></p>
<p>Machine Learning (ML) is increasingly being used to simplify and automate a number of important computational tasks in modern society. From the disbursement of bank loans to job application screenings, these computer systems streamline several processes that have a considerable impact on our day-to-day lives. However, these artificially intelligent systems are most often devised to emulate human decision making – an inherently biased framework. For example, Microsoft’s Tay online chatbot quickly learned to tweet using racial slurs as a result of the biased online input stream (Caton and Haas 2020), and the COMPAS tool often flagged black individuals as more likely to commit a crime (even if two individuals were statistically similar with respect to many other attributes) (Flores, Bechtel and Lowenkamp 2016).</p>
<p>Crucially, these issues are not the product of a malevolent computer programmer instilling radical beliefs, but rather a byproduct of machines learning to optimize for a particular objective, which can inadvertently leverage underlying biases present in the data. These biases are most often the result of two (not mutually exclusive) pervasive issues in data science: bad training data and objectives that are blind to demographic (or comparable sensitive information) considerations. The former is nicely depicted again by the Tay chatbot: training on text from anonymous online forums is unfortunately not a good representation of how the general population talks. Additionally, when models are trained to optimize for an objective that ignores demographic considerations, discriminatory practices can inadvertently be learned, as was the case for the COMPAS algorithm. The algorithm was trained to predict future criminal behavior but inadvertently learned to weigh an individual’s race heavily in its decision-making process.</p>
<p>To provide a clearer understanding of how these ‘‘blind” objectives work, we here restrict our attention to the process of clustering data. This involves grouping objects so that those in the same cluster are more similar to each other than to those in other clusters. Data scientists use cluster analysis to garner insight from data by merely observing which cluster a point belongs to and assuming it shares the typical properties of that group. For instance, if we represent a patient’s medical data as a set of relevant numbers such as height, weight, and age, we can visualize the data as points in a three-dimensional space. We can then use standard methods to cluster the data by identifying data points that are close together in this space. To do this, we minimize an ‘‘objective function” that measures this closeness. However, this process may inadvertently segregate men from women, as these groups are likely to be internally similar in terms of height and weight. This is a toy example of the inherent biases that clustering algorithms can learn, which, if left unrecognized, can significantly affect the treatment a physician provides.</p>
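<p>To make this concrete, here is a minimal sketch of objective-minimizing clustering: Lloyd’s k-means algorithm run on a handful of made-up patient records (the numbers are purely illustrative, not drawn from any real dataset).</p>

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's algorithm: greedily minimize the within-cluster
    squared-distance objective, with no awareness of demographics."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Update step: each center moves to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = tuple(sum(coord) / len(cl) for coord in zip(*cl))
    return centers, clusters

# Hypothetical patient records: (height cm, weight kg, age years).
patients = [(180, 82, 45), (178, 79, 52), (165, 60, 47),
            (162, 58, 50), (183, 88, 39), (160, 55, 44)]
centers, clusters = kmeans(patients, k=2)
```

<p>Because height and weight dominate the distances in this toy data, the two clusters that emerge will tend to split along exactly the kind of implicit attribute discussed above.</p>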
<p>While under-representative data has a (somewhat) clean solution in improved collection procedures, mitigating bias that arises from objective minimization is less clear. To counteract this issue, researchers have recently proposed algorithms that ensure some quantified form of ‘‘fair representation” at the cost of a degradation in the optimality of the clustering. Consider the following figure, which depicts a traditional clustering result versus one we would deem ‘‘fair”, where the two colors represent some protected attribute such as gender or race. We see that the standard (blind) solution may yield clusters of data points that disproportionately segregate the data based on an implicit attribute, leading to potentially discriminatory or even illegal behavior by the algorithm. A more desirable solution in practice might yield clusters that contain equal representation of the two protected attributes while also minimizing the intra-cluster distances between data points.</p>
<p align="center">
<img src="https://aihub.org/wp-content/uploads/2023/05/fair_cluster-1536x1328.jpg" alt="faircluster" width="500" />
</p>
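<p>The notion of ‘‘fair representation” in the figure can be quantified. One common measure in the fair clustering literature is the balance of a clustering: the worst-case ratio between the two protected groups over all clusters. The sketch below is an illustrative implementation; the point names and attribute assignments are invented for the example.</p>

```python
def cluster_balance(clusters, attribute):
    """Worst-case representation ratio over clusters: for each cluster,
    take min(red/blue, blue/red). A value of 1.0 means perfectly even
    representation; 0.0 means some cluster is fully segregated.
    (Illustrative version of the 'balance' notion from the literature.)"""
    worst = 1.0
    for cluster in clusters:
        red = sum(1 for p in cluster if attribute[p] == "red")
        blue = len(cluster) - red
        if red == 0 or blue == 0:
            return 0.0
        worst = min(worst, red / blue, blue / red)
    return worst

# Hypothetical points "a".."d" with a binary protected attribute.
attribute = {"a": "red", "b": "red", "c": "blue", "d": "blue"}
segregated = [["a", "b"], ["c", "d"]]   # a blind solution
fair = [["a", "c"], ["b", "d"]]         # one of each color per cluster
```

<p>The segregated clustering has balance 0, while the fair clustering has balance 1, mirroring the two panels of the figure.</p>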
<p>In essence, balancing solutions that minimize our objective with those that yield equal representation is the crux of modern fair clustering work. While in the past few years a considerable number of studies have examined fair flat clustering (partitioning data into disjoint clusters), the major problem of hierarchical clustering has been largely neglected. Like the flat variant depicted above, hierarchical clustering starts with a collection of data points whose similarity is measured as the distance between points – the closer two points are, the more similar. To build a hierarchy, we first let each point be its own cluster and subsequently merge these with neighboring clusters up to a certain size. We then repeat this process on our enlarged groupings until we are left with one cluster that encompasses all our data. A major advantage here is that we do not need to impose a restriction on the number of clusters; rather, we group the data at a wide variety of granularities. For this reason, hierarchical clustering is common in tasks such as subdividing a geographic region or constructing phylogenetic trees.</p>
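<p>The bottom-up procedure just described can be sketched in a few lines: start with singleton clusters, repeatedly merge the closest pair, and record each level of the hierarchy. This toy version uses one-dimensional points and single-linkage distance purely for illustration.</p>

```python
def closest_pair(clusters):
    # Single linkage: cluster distance = minimum pairwise point distance.
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best

def agglomerate(points):
    """Merge the closest pair of clusters until one remains, recording
    every intermediate level of the hierarchy."""
    clusters = [[p] for p in points]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        _, i, j = closest_pair(clusters)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        levels.append([list(c) for c in clusters])
    return levels

# Five toy points: the hierarchy runs from 5 singletons down to 1 cluster.
levels = agglomerate([1.0, 1.2, 5.0, 5.3, 9.0])
```

<p>Each entry of <code>levels</code> is one granularity of the hierarchy, so a practitioner can pick whichever level suits the task rather than committing to a fixed number of clusters up front.</p>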
<p>In our recent work (Knittel et al. 2023), we examine the notion of fairness in hierarchical clustering due to its pervasive utility in data science. Specifically, we demonstrate that a relatively intuitive algorithm that strategically folds clusters onto one another according to their protected attribute representations can produce a fair clustering with only a modest decrease in the optimality of the solution (as measured by the objective function). Such results are highly nontrivial as well as vital to demonstrating the utility of fairness constraints.</p>
<p>Although technological innovation is accelerating daily, it is important to be cognizant of, and mitigate, potentially harmful algorithmic practices that can perpetuate biases. We hope that our work serves as further proof of concept that incorporating fairness constraints or demographic information into the optimization process can reduce biases in ML models without significantly sacrificing performance – a result that is critical where automated decision-making can have a harmful impact.</p>

<h1 id="matrix-multiplication-resurrections">Matrix Multiplication: Resurrections</h1>

<p><em>An understated breakthrough in AI research.</em></p>
<p>On October 5th, 2022, the research group DeepMind resurrected a 50-year-old line of work on the computational efficiency of matrix multiplication – a fundamental calculation at the heart of countless procedures in computer science. In their latest Nature publication, DeepMind demonstrated that an artificially intelligent (AI) system can make mathematical discoveries and break barriers researchers previously thought were impenetrable, showcasing a staggering achievement for AI as it continues to surpass human intuition.</p>
<p>The problem of matrix multiplication is at the core of all computer processes. A matrix is a grid of numbers that can abstractly represent essentially anything, and their multiplication is representative of some change in that representation. For example, analyzing an MRI scan is reliant upon thousands and thousands of matrix multiplications. As such, it is of immense value to find the fastest method for conducting this higher-dimensional arithmetic – one which will reduce the cost of running on a computer and, as a result, save energy.</p>
<p>The costliest, or most time-consuming, mathematical procedure done on any classical computer is multiplication, which at its core is repeated addition of numbers. Therefore, the cost of any large-scale computation is dictated by the number of multiplications one must compute. Consider the simple expression</p>
\[x^2 - y^2\]
<p>which contains two multiplications (tucked into the exponents) and a subtraction. To reduce the cost incurred in calculating this value, we can rewrite the expression as</p>
\[(x+y)(x-y)\]
<p>to obtain the exact same value with only one multiplication, thus incurring a lower cost.</p>
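<p>In code, the two formulations return identical values while differing in how many multiplications they perform:</p>

```python
def square_diff_naive(x, y):
    # Two multiplications and one subtraction.
    return x * x - y * y

def square_diff_factored(x, y):
    # One multiplication, one addition, and one subtraction.
    return (x + y) * (x - y)

# Both compute, e.g., 7**2 - 3**2 = 40; the factored form saves a multiplication.
```
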
<p>This is precisely the insight Volker Strassen, a German mathematician at the University of Konstanz, had in 1969 when he recognized that the standard approach to computing the multiplication of two matrices is suboptimal. His algorithm, which exploited the same idea, required only a fraction of the cost of the naive method. While Strassen’s algorithm is by no means fast, it broke the barrier and left computer scientists with a fascinating new question to explore: can we do even better?</p>
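<p>Strassen’s scheme for 2-by-2 matrices realizes this idea at the matrix level: by forming clever sums and differences of the entries, it computes the product with seven scalar multiplications instead of the naive eight. The standard formulas, written out as a sketch (applied recursively to submatrices, they yield Strassen’s subcubic algorithm):</p>

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices (given as nested tuples) using Strassen's
    seven products instead of the naive eight."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine the seven products into the four output entries.
    return ((m1 + m4 - m5 + m7, m3 + m5),
            (m2 + m4, m1 - m2 + m3 + m6))
```

<p>Saving one multiplication per 2-by-2 block may look small, but because the trick applies recursively at every level of a block decomposition, the savings multiply as the matrices grow.</p>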
<p>It’s been 53 years since Strassen invented his algorithm and we still don’t know the answer to this question. Or rather, we didn’t know until the publication of AlphaTensor in the journal Nature just a few months ago. This algorithm builds upon DeepMind’s prior work that trained an AI, dubbed AlphaZero, to play Chess and Go at a superhuman level via reinforcement learning: a teaching method that uses rewards and punishments for each move selected in the game to correct behavior. By reformulating the matrix multiplication task as a sort of three-dimensional board game where moves are representative of different calculations, AlphaTensor learned to play the game by receiving a reward for winning in as few moves as possible. Over the course of the learning procedure, the AI sifted through over 10<sup>33</sup> possible ways of multiplying to find the fastest sequence of moves which resulted in a novel algorithm. During the learning process, AlphaTensor even rediscovered Strassen’s algorithm before identifying further improvements to the procedure. Within a matter of minutes, this artificially intelligent system was uncovering the best-known algorithms before electing to discard them in favor of further refinements that no human had previously imagined.</p>
<p>After learning to play this game-formulation of the desired computation, AlphaTensor discovered algorithms that either matched the best-known human-designed procedure or improved upon it in every problem setting tested. For example, multiplying a 4-by-5 matrix with a 5-by-5 matrix requires 100 individual multiplications in the naive algorithm, and the best human-designed algorithm to date reduced this to 80 – AlphaTensor discovered an algorithm requiring only 76. While this improvement may feel marginal, when the procedure is applied across thousands upon thousands of matrices, the savings compound into an enormous reduction in total cost.</p>
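<p>The naive count of 100 comes from the schoolbook algorithm: each of the m·p output entries is a dot product of length n, so multiplying a 4-by-5 matrix by a 5-by-5 one costs 4·5·5 = 100 scalar multiplications. A minimal counting sketch, using all-ones matrices as placeholder data:</p>

```python
def matmul_count(A, B):
    """Schoolbook matrix multiply that also counts scalar multiplications."""
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "inner dimensions must agree"
    count = 0
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                count += 1
    return C, count

# A 4x5 times 5x5 product: 4 * 5 * 5 = 100 scalar multiplications.
A = [[1] * 5 for _ in range(4)]
B = [[1] * 5 for _ in range(5)]
C, count = matmul_count(A, B)
```
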
<p>Much like in 1969 when Strassen opened a world of curiosities on a seemingly uninteresting problem, the DeepMind group has decidedly pushed the goalposts forward and invited the research community to continue this effort. More importantly still, this result establishes AI as a ubiquitous tool in accelerating our incessant prodding at several unresolved problems in the field – a symbiotic approach to computer science.</p>

<p><em>An ode to Bill Withers.</em></p>
<h1 id="first-principles">First Principles</h1>
<p>There’s a notion in mathematics of constructing a proof from “first principles”: the most fundamental facts that we hold to be true. These arguments are elegant, lacking in obfuscating assumptions or unnecessary bulk — rather they reveal an inherent nature from absolutely basic and immutable building blocks.</p>
<p>Far from a virtuoso with his instruments, Bill Withers had no option but to leverage these rudiments in writing the stories of his life. With a rare gift for distilling an emotion or story down to first principles, Withers created a collection of songs that feel timeless and universal in just <em>15 years</em> as a studio artist.</p>
<h3 id="aint-no-sunshine">Ain’t No Sunshine</h3>
<p>In 1971, Withers released his first studio album <em>“Just As I Am”</em>, which featured <em>“Ain’t No Sunshine”</em>, a deceptively simple bluesy ballad about an addictive love. Leveraging just four chords over the course of two minutes, Withers burst onto the scene with one of the most visceral and elemental blues tunes written to date.</p>
<p>The core idea of the iconic melody in the middle of the song is simple. The <em>“I Know”</em> figure is three sixteenth notes long. <em>“I”</em> is one sixteenth note, and <em>“know”</em> is two sixteenth notes. That odd-length figure goes in and out of sync with the sixteenth-step pattern underlying the rest of the groove. It’s a polymeter of the most basic type, but the execution is hair-raising. Withers falls in and out of step with the beat, perfectly encapsulating the mind-whirling love he both longs for and feels trapped within.</p>
<p>This simple repetition emerged organically in the original recording session because Withers had not yet written lyrics for another verse, and we cannot help but get ensnared in its intoxicating and jarring groove: utterly and hopelessly caught in something we really aren’t sure we wanted, or hope to escape from.</p>
<p>The song is so raw that people often mistake the tune for a cover of an old folk song, surely written by Muddy Waters or Bessie Smith during the Great Depression. But, if it never felt new and it never gets old — <em>well that’s a folk song</em>.</p>
<hr />
<p><strong>“You gonna tell me the history of the blues? I am the goddam blues. Look at me. Shit. I’m from West Virginia, I’m the first man in my family not to work in the coal mines, my mother scrubbed floors on her knees for a living, and you’re going to tell me about the goddam blues because you read some book written by John Hammond? Kiss my ass.” — Bill Withers</strong></p>
<p align="center">
<a href="https://www.youtube.com/watch?v=y3_Ym672_lU" title="Aint no Sunshine"><img src="https://media.pitchfork.com/photos/5e88885549503a0009419c0b/4:3/w_1280,h_960,c_limit/Bill%20Withers.png" alt="bill1" width="500" /></a>
</p>
<hr />
<h3 id="lean-on-me">Lean on Me</h3>
<p>There’s nothing inherently sentimental about C major: the most commonly used scale in music, it has neither flats nor sharps and is the undeniable starting point for understanding the theory of harmonics. The scale is so natural in fact that Gounod asserted “God only composes in C major.” With Bill Withers’ second album <em>“Still Bill”,</em> he proved this long-held conjecture by way of the first single: <em>Lean on Me</em>.</p>
<p>When asked about how he wrote the evergreen song, Withers highlighted “I didn’t change fingers; I just went one, two, three, four, up and down the piano.” For songwriters, a comment like this can be downright infuriating. We all begin by painstakingly learning the major scales on our respective instruments, strumming out those faithful chords in a personal mire — surely any of us could have written the song. Why him?</p>
<p>Yet, when you hear him perform, those four notes hit like a train and the answer is clear. On the first note alone, you can feel the exasperation of a life lived in coal-mining West Virginia and an uncertain but hopeful future. With the subsequent notes you feel this overwhelming weight of a life of manual labor and service, knowing damn well that the world doesn’t owe you anything. As Withers plays this elementary bar, you can <em>see</em> his shoulders sink into these exact emotions. As the melody undulates up and down, the piano feels like the endless cycle of a monotonous life — yet throughout, Withers is there with you offering that painfully obvious solution we so often forget to offer our community in times of need: “I’m here too. Would it be so bad if we did it together?”</p>
<hr />
<p>There are so few songs in the great American songbook that have resonated across generations on such a primal level, and it’s almost an a priori assumption that most of our greatest artists will never write one.</p>
<p>Bill Withers wrote <em>two</em>.</p>
<hr />
<p><strong>“The things that [virtuosos] do are too complicated. There’s an almost inverse ratio between virtuosity and popularity. Simplicity is <em>directly</em> related to availability for most people.” — Bill Withers</strong></p>
<p align="center">
<a href="https://www.youtube.com/watch?v=dtC1W-6hwIU" title="Lean on Me"><img src="https://www.rollingstone.com/wp-content/uploads/2015/04/bill-withers-2015-rs-feature.jpg" alt="bill2" width="500" /></a>
</p>
<hr />
<p><a href="https://open.spotify.com/playlist/7fsCkTKv6RZUmfiGu9qMvI?si=44a0592969a0419d">First Principles Playlist</a></p>