Pioneers

2. Artificial intelligence

2.1. Ricardo Baeza-Yates

 

Figure 1: Ricardo A. Baeza-Yates
Source: LilyOfTheWest (2018) 

Downloadable teaching resource

Ricardo Baeza-Yates (.pptx)

Overview

Ricardo A. Baeza-Yates (born 1961, Chile) is a pioneering computer scientist specialising in algorithms, data structures, information retrieval, web search, and responsible AI. He co-authored the widely cited textbook Modern Information Retrieval (1999, 2011), helped standardise early web search methods, and is now a global voice in algorithmic fairness and ethical artificial intelligence (dblp, 2025; Baeza-Yates and Ribeiro-Neto, 2011).

 
Background

Baeza-Yates earned his BSc in Computer Science and Mathematics from the University of Chile and his PhD in Computer Science from the University of Waterloo (1989). After early teaching and research roles in Latin America, he led Yahoo! Labs in Spain and Latin America. He later joined Northeastern University and the Universitat Pompeu Fabra in Barcelona, co-founding the Institute for Experiential AI, which advances research on trustworthy and responsible AI (Northeastern University, 2025; University of Waterloo, 2025).

Explore further

Figure 2: Modern Information Retrieval (co-authored by Ricardo A. Baeza-Yates)

First published in 1999 (updated 2011), Modern Information Retrieval is a seminal textbook that shaped how search engines index, rank, and present information. Co-authored with Berthier Ribeiro-Neto, this book formalised many early methods for web search — from query processing to ranking algorithms — laying groundwork for today’s Google-scale systems. Baeza-Yates’s influence continues through this text, which remains widely cited by students, engineers, and researchers in the field of information retrieval.

You can explore the book at:

 
Contributions

Baeza-Yates co-developed the Shift-Or (Bitap) algorithm for fast pattern matching, making text search dramatically more efficient for large datasets. He co-authored Modern Information Retrieval, a foundational textbook that formalised how search engines process, index, and rank web content (Baeza-Yates and Ribeiro-Neto, 2011).
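To make the first contribution concrete, here is a minimal Python sketch of the Shift-Or (Bitap) idea: the pattern is encoded as one bit mask per character, and a single bit-parallel update per text character tracks every partial match at once. This is an illustrative reconstruction of the classic technique, not code from Baeza-Yates's publications; the function and variable names are invented for this example.

    def shift_or_search(text, pattern):
        """Return the start indices of every exact occurrence of pattern in text."""
        m = len(pattern)
        if m == 0:
            return []
        all_ones = (1 << m) - 1
        # Bit i of masks[c] is 0 exactly where pattern[i] == c.
        masks = {}
        for i, ch in enumerate(pattern):
            masks[ch] = masks.get(ch, all_ones) & ~(1 << i)
        state = all_ones      # bit i == 0 means pattern[0..i] matches the text ending here
        matches = []
        for j, ch in enumerate(text):
            state = ((state << 1) | masks.get(ch, all_ones)) & all_ones
            if state & (1 << (m - 1)) == 0:   # highest bit cleared: full match ending at j
                matches.append(j - m + 1)
        return matches

    print(shift_or_search("abracadabra", "abra"))   # [0, 7]

Because the whole matching state fits in one machine word (for patterns up to 32 or 64 characters in C; Python integers are unbounded), each text character costs only a few bitwise operations, which is what makes the method fast on large texts.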

During his time leading Yahoo! Labs in Spain and Latin America, he drove research on web mining, query processing, and data visualisation that shaped the evolution of internet search (Northeastern University, 2025).

In recent years, he has helped define global ethical AI frameworks through his role as Director of Research at Northeastern’s Institute for Experiential AI, where he leads efforts in algorithmic fairness, transparency, and bias mitigation (ACM, 2025; Institute for Experiential AI, 2025).

 

Feature: Making search responsible 

Ricardo A. Baeza-Yates’ career highlights how computing breakthroughs shape everyday life — and how they must be made fair and transparent. His early work on fast pattern matching and scalable search engines laid the technical foundation for how billions of queries are answered daily. But as search became central to accessing knowledge, Baeza-Yates turned his focus to its ethical side: how ranking algorithms can amplify bias, reinforce inequality, or hide diverse voices if not designed carefully (ACM, 2025; Northeastern University, 2025).

He helped advance global principles for algorithmic accountability, encouraging today’s engineers to design systems that balance efficiency with fairness. His journey reminds us that computing isn’t neutral — every search result reflects choices made by developers, companies, and policy makers.

Activity:

  • Search for the same topic on different search engines. Compare the top results.
  • Who is visible?
  • Who is missing?
  • How might ranking algorithms affect what you learn or trust?
 

 

Watch: Ricardo Baeza-Yates – Ethics in AI: a Challenging Task

Video 1: Ricardo Baeza-Yates – Ethics in AI: a Challenging Task (Baeza-Yates, 2021)

Transcript
So, this institute is a new institute of the university; it started last year. The need was to have a single face for everything we are doing in AI. There were many people doing AI in different colleges, and now we have around 90 affiliated faculty, a few here in London as well, working on this initiative. We believe two things are important. The first is what people call human in the loop, having humans involved; I prefer to say that we should be in control, that people choose to be in the loop rather than just having the machines in the loop, but that's the reality. Second, today it's easy to throw more data at deep learning, but I believe many times the key thing is to have better algorithms, particularly because 99.9% of the problems of the world will never have big data. We have a hype with big data and a hype with deep learning, when very few people can profit from that. So I'm also working on small data; I wrote a blog post in 2018 about that, and luckily this year, in February, someone famous finally said the same. One important thing we are doing is the responsible AI practice, and we are already working with several companies to help them with this; I will show you what model we are using.

This is the agenda. First I want to talk about the main ethical issues. This has a personal bias; these are the ones that I think are more important. There are many more, but these four classes are very important. Then I will discuss some problems, for example our cognitive biases, a bit of regulation, and cultural differences. So, how many people here were born in the southern hemisphere? Okay, only me, so we have a bias here, and I hope we will understand that, I don't know. And then I will present a holistic view of things. That's why I asked the question,
because the first problem is the curse of bias. You have an algorithm that receives biased data. The first bias we have is thinking that bias is negative; bias is neutral, it depends on what happens with the biased information. If you put random noise into an algorithm, nothing happens; well, or maybe weird things happen with a randomized algorithm. So we can ask ourselves: should the algorithm be neutral or fair with that input? Typically no one is asking that question, and then you get the same bias or, even worse, you get more bias. And if you get more bias, we cannot blame the data, right? It is something else we are doing. And this is very important: bias is not only in data, even if some famous computer scientists say that. If you're interested in the whole cycle of bias in a system, I published a paper, Bias on the Web, in 2018, which is now my most downloaded paper, so I guess it's an important topic. I think I will skip this; you must surely have had an ethics course, so you know what equality, equity and justice are, and if you don't know, you prefer not to know. So the answer to this question is that you don't always need to worry about this; otherwise it would be very complicated if we had to ask this question for every algorithm. But if people are involved, you need to answer this question. This is the key part: are you harming
people? We will get back to this question when we talk about regulation. Okay, so how can we solve this problem? There are three solutions. First, you can debias the input, if you know the bias; sometimes it's hidden in the data. Second, you can tune the algorithm, making the algorithm aware that there is a bias. For example, in learning to rank there is ranking bias, because people click more on the first positions of the ranking simply because they are in the first positions; there are algorithms that know that, tune the solution accordingly, and solve the problem. The last one is to debias the output, basically trying to solve it at the end. The problem is that you have already lost too much information. For example, imagine you're looking for a person to hire on LinkedIn and you get 50 people; if you're lucky you will get 10 women, and then the best you can do is gender parity in the top 20, while the other 30 will be men. But we are not debiasing anything, we're just mitigating bias, because, as I said, sometimes we don't even know the reference value. What should be the right percentage of women here? Maybe it's 50, but I don't know, and someone should decide. When I talked about this in India, 70% of the audience were women, so it's completely different, or if you go to Iran, the same.
The first time this problem reached the headline news (this is maybe well known to people here, but I just want to mention the case because it is the first famous one) was COMPAS, the system for supporting decisions on recidivism, a hard word for me, when ProPublica in 2016 said there was a racial bias. Later Cynthia Rudin, who has done amazing work also showing that the supposed trade-off between explainability and accuracy is a myth, showed that really the bias was age, not race, but the two were correlated: there were more African-Americans who were younger. So when we have a public institution using this kind of solution, we need to ask this question: should a public service use a secret algorithm? It's a very important question. But we also need to ask another question, just in case: is it safe to use a public algorithm? Because then you can game it. And of course the solution is not at the extremes; it is somewhere in between, and it depends on every problem.

Now let me give you a very good example of how bias can increase. I love this paper; it was published four years ago. It's about bail in the state of New York. Here you have an offender, and the judge needs to decide if the person will get bail. In most parts of the world the judge needs to think about two things: first, will the person reoffend, and second, will the person come back to court. Well, in New York, very strangely, it's only about whether the person will come back to court. If I had a serial killer, for me it would be very difficult not to have any cognitive bias and say, okay, this person may kill again; but here it is treated the same. And, you know, there's an economic bias here: if you can pay the bail you get out, and if you cannot, there is a lender in the US, near the police station, that many times lends the bail amount, and they have to make the same prediction as the judge: will this person come back to court to pay me? That's interesting for me. And then some people go to prison and some people don't. In this problem we have a typical issue of real problems: we don't know part of the data. We don't know what would have happened if someone that didn't get bail had got bail; would that person have reoffended, or would that person have come back? For that we need to do something called data imputation: we need models to predict that data first, before predicting the whole thing. The results of this paper, which was published, as I said, in 2018 at the request of the US National Bureau of Economic Research (the justice side doesn't want this, because judges don't want it, and I think they are right, though for different reasons: we shouldn't use this, it's not ethical), were that if the predictions of the system were right, you could decrease the crime rate by 25% while keeping the same prison rate. I don't say "jail" because I don't know how to pronounce it; people think I'm talking about the university, that's my bias. Or the prison rate decreases by 42% while keeping the same crime rate. So the system can do much better than humans. Even worse, if the system is making the right predictions according to the data, then for example half of the 1% most dangerous criminals fail to appear more than half of the time and reoffend more than half of the time, so it seems judges are really bad with the most dangerous people.
So the bias is also amplified. Let me show you what happens. This is the table from the paper (it is working, but not on the screen). In red I put the percentage of the population of the state of New York for these two minorities, African-Americans and Hispanic people, and you see that there is already systemic bias, because 82% of the people that go to court are from these two minorities. Now the judges add some more bias: they put more Black people in prison and they decrease the percentage of Hispanic people, because there are some Hispanic people that are more white, like me; everything is relative, I'm completely white in Chile, but here I'm not. But what happened with the algorithm? Well, the algorithm learned this trend. The only demographic variable that the algorithm is using is age, not even gender, because most of them are men; so only age. And from the data it captures this bias and increases the percentage of African-Americans, decreases the same trend for Hispanics, but the total goes to 90%. So now we have a huge bias, almost three times, from 32% to 90%. Now, the good thing about a model is that you can tune it, for example to be like the least racist of the judges (that's the last line), and still you can be 23% better, sending fewer people to prison while keeping the same crime rate. So it looks like this is much better, but we have increased the bias. So we have a dilemma here.

What is better: a biased algorithm that is just, in the sense that two equal people get the same outcome (this is the advantage of algorithms, they are deterministic), or people? People are not like that. Why? Because they are noisy; there is variability. Even you, in the same situation, will do different things. I was using a very interesting article that appeared in Harvard Business Review in 2016 by Daniel Kahneman and other people (of course you know who he is) about the high cost of variability in decisions. If you're interested in this topic, last year he published a book with others on noise. Sometimes noise can be even worse than bias, because bias we sometimes know; noise is completely random, as you can see in the examples below. Now, this question is a fictitious question, because really what we are choosing between is a biased algorithm and a biased and noisy judge (sorry, judges also have bias), so we are choosing between C and D, and then I guess we should choose C. But for different reasons we shouldn't use it; that's another discussion. So let's talk about one important example where you find bias, and it's good that you covered language
models yesterday, so I don't need to explain anything. This is a table, from one paper I will mention later, of the state of language models up to 2021. You see trillions of parameters, trillions, so this is the same order of magnitude as the text they're using, and still they're not overfitting. This is like a mystery: why doesn't the algorithm overfit if it has so many parameters? There are some interesting theories about that, but this is a question I have asked many people and no one can really answer it; "it works" is the classical answer. It works, we don't know why we are not overfitting. But there are many biases, and I will show you one that is not the most commonly mentioned: anti-Muslim bias. If you start a sentence like "Two Muslims walk into a...", these are the completions that you get; the first four completions are violent. The paper was published a bit more than one year ago. You can say, okay, what about other religions? Well, in these completions Muslims come out something like four times more dangerous than Christians, and if you don't want to be dangerous in perception you need to be Buddhist or not believe in God; maybe most of you are in those
categories. But it can be much more complicated. I'm sure those of you from the UK remember that during COVID they tried to predict the exam scores used for university entry, and they did it very badly, because obviously the data is historical and has a bias. The problem here (and it's the same problem that happens in justice) is that you're changing a decision about one person into a decision based on the data of many people. But we don't come from a distribution; we are not statistical samples, I don't believe that. Basically you are normalizing the person, saying, okay, you are similar to these people and therefore you should have this score, and that for me is an ethical mistake. That's why I would not use machine learning for education, justice, and many other things that are based on the qualities of one single person.

Another example that I love, from a bit more than a year ago: the Deliveroo case in Bologna. I don't know how many people know this, but basically in 2018, I think, some riders sued the company because a group of people felt discriminated against, but they couldn't find any shared characteristic; they couldn't say "we are all from Africa", "we are all immigrants", "we are all women". No, they were all different. What happened? The algorithm was trying to earn more money, a very sensible optimization, that's what most algorithms do, and it learned to give more work to people that could deliver at night, because that's the time you get more orders. So these people were discriminated against because they couldn't work at night, or didn't want to work at night, and legally in Italy that's okay. Basically the company was found guilty of implicit discrimination and got something like a symbolic fine, but it is an interesting case because it sets a precedent. There are other cases I will skip because I have too much material. And the last case is the best for me. It is not clear exactly
whether they used AI or not, but we shouldn't care (I will come back to that with regulation): any algorithm can discriminate, it could even be a randomized one, and we don't need to use AI for that. This is the case of the Netherlands, where the problem started in 2012. The system is called SyRI, like Siri but with a Y, and it was used, we could say, to discriminate against poor people, because they were looking for fraud in child care subsidies. That is already an ethical issue: if you're looking for fraud, look at rich people, not poor people. In the end 26,000 families were wrongly accused of fraud and had to return money; some people lost their houses, some people had to go back to their country of origin, and so on. And at the end, even though the former minister of the ministry that did this, who was a member of parliament, quit the Parliament, that was not enough, and the whole government had to resign on January 15 of last year. This is the largest political outcome of the wrong use of an algorithm. And these are just examples; if you want to look at other cases, there are more than 2,000 examples in a database that a nice person in Silicon Valley is
building in his free time.

Okay, the second problem: physiognomy. When I learned philosophy at school (I'm a bit old) I learned that depending on the shape of my face I would have a certain personality. This is a theory that we know is not true, but sadly it's coming back. For example, Kosinski in 2017 said that he could predict the sexual orientation of people from their picture. There was an uproar and, well done, many people also showed that he was wrong: he didn't know how to do machine learning properly and was capturing only spurious correlations. But then in China, a bit earlier, they did basically Minority Report: show me your face and I will tell you if you will commit a crime. Even more complicated, a lot of people complained to them, and they were even answering the complaints: no, this is scientific, we can solve it. And people don't remember, because during COVID they did it again in the US; it doesn't matter which extreme of the political spectrum you are in, people do these things. Kosinski came back last year doing political orientation: 70%? Come on, 70% could be the clothes we're wearing; if you have a beard you're really a Democrat, and so on. These are spurious correlations, and he really doesn't know, because it's not facial recognition, it's facial biometrics. So we are coming back to phrenology. Do you all know what phrenology is? Good; in many places they don't know. I'm just showing some pictures from the former house of Cesare Lombroso in Torino, because I found this amazing: he was one of the believers in phrenology. He collected hundreds of skulls because he believed criminal people had different skulls. He couldn't find any difference, but he really believed this until the end, because he left his own skeleton as the ground truth of a normal person. So I still don't understand something. Okay, but it can be
worse. In 2019 MIT published a paper saying that if you give me a piece of your voice, I can generate your face. I don't know what they do with all the adopted children from other parts of the world; this is really magic. Okay, then I can build my master algorithm: you send me a voice message via WhatsApp, I use this algorithm to create your face, and then I know your name. Yes, there is a claim that from your face your name can be guessed very accurately; it is a filed patent, and I hope they don't grant it, because if they grant it there would be no way to do anything. And then I can know if you're an opponent, if you're homosexual, or if you're a criminal. This is dangerous for people, this is really dangerous; I hope people don't do this in the future. But it can be more subtle. There is the work of Lisa Feldman Barrett, the famous neuroscientist at Northeastern: people cannot detect emotions correctly. There are many factors, first cultural, then personal; people laugh at different things and in different situations. So if we are using labels coming from people, of course machine learning cannot detect emotions either. We are basically just rediscovering stereotypes when we try to detect emotions, sentiments and so on. Maybe from text it is different, because text has semantics, and I will get back to that.

The third problem: pure human
stupidity. It is a very important problem, in every election for example, but I will not talk about that. George Box in 1976 said that all models are wrong but some are useful; he was talking about statistics, but the truth is that we are using very advanced statistics, and the same can be said of any deep learning model. Let me show you some examples. Last year was like a boom of examples, or maybe I'm biased because I started to talk about them. In December 2020 Elon Musk tweeted "use Signal"; of course he was talking about the chat app, you know, Signal. Well, some software that was using input from influencers in the stock market decided it had to buy stock in a company, a medical company in Texas called Signal Advance, and the price of the company went up more than 400%. The company was very happy; the people that bought the stock were not that happy. This is real pure stupidity, because the software couldn't understand the semantics of a single tweet with two words; right, very hard. But there are even more difficult cases. On the right we have an example of adversarial AI: I change something in the input and I get a different result. Here is a paper from Japan that shows that by changing a single pixel I can change the class that you get as output; for example, you can see dogs that become cats, horses that become frogs, I think boats that become airplanes, and so on. So anything can happen, and of course the question would be: what are they learning, if with one pixel you can change the result? And then we have an example that looks funny but is not. Some smart people in Menlo Park decided to use an English-trained hate speech classifier in France, and it decided that the page of the town of Bitche was forbidden; sorry, I don't know if we have any French people here. There was no human in the loop, and it took three weeks to get the town's page back. If that page was being used for something important, this would have cost a lot. It looks funny, but it's not. Okay, so let me remind you here of the
limitations of this technology, and I'm sorry that some computer scientists don't like to think about that, but I think it's important. The first thing is that to learn, to abstract things, you need to filter, you need to forget. That's why we work the way we do: we see just one cat and we know what a cat is forever. I don't know how we do it, because we see it only in some positions, but we see the movement and then we learn everything. I like to recall here a very nice story by Jorge Luis Borges (has anyone read it?) about a person who couldn't forget anything. Don't ask him how his morning was, because he might take more time than the morning to tell you what happened. This is something that is not easy to do with this technology today. Second, this may be trivial, but you cannot learn what's not in the data. And, very important, data is a proxy of the problem; data doesn't capture everything. Data will never capture, for example, what's happening here right now; maybe later, but not now. This is very important: we don't capture everything. In justice this matters because the context of the case may be in things that are not in the data; that's another reason why we shouldn't use it there. And this is what happened in the infamous case in Arizona, when a woman who was crossing on a bicycle at night in the wrong place was killed by an Uber self-driving car. I don't know if you know this case, but I will get back to it. Basically this case was not in the data, and we will never have all the future accidents in the data, because they're in the future and there are infinitely many, if we keep the Earth alive; we will not talk about climate change either. The third thing: accuracy. I don't care about accuracy. It's like going to the pharmacy and someone says "this drug works 99% of the time", and I ask "but what are the side effects?", and the answer is "I don't know, you just have to trust it". How many people would take an elevator that says it works 99% of the time? I would not. But if the elevator says it doesn't work 1% of the time, and when it doesn't work it stops, I will take the elevator, because I know I'm safe. This is not happening today with machine learning. The second point is that we are optimizing some very nice mathematical measure, like accuracy or any other measure, but really that is not the important thing; it depends on the impact of false negatives and false positives, especially in medicine but in many other cases too. Tell me what the harm is. I prefer to use an algorithm that has 80% accuracy and doesn't kill anyone over one that has 90% accuracy and kills 10 people. It's just a matter of harm, but this question is not yet on the mind of most computer scientists, and I'm writing something about this. Finally, we need to be humble. I would like to see classifiers that say "I don't know". Smart people say "I don't know"; good teachers say "I don't know" when they don't know, they don't try to invent an answer or an explanation. I read a paper the other day about exactly this, because sometimes we need that.
Okay, the last problem: waste of resources. This is the same table. If you look at this table on carbon footprint, for example, training just a simple Transformer with only 200 million parameters has the same carbon footprint as a normal person on Earth for 57 years; so, one Transformer, one person, or okay, maybe one Transformer, half a person, I don't know. And you spend between 1 and 3 million dollars of electricity doing that. There are other wasted resources too, because these models have gender bias, racial bias, religious bias, and so on. This is the paper that was the reason Timnit Gebru was pushed out of Google; I hope at least one person here has read it, the Stochastic Parrots paper. And I do believe that much of the time language models are stochastic parrots: they are not understanding what they're reading, and they are not understanding what they're writing; there are very nice experiments about this. They also told Margaret Mitchell not to put her name on the paper, but she did anyway (I like that), and we all recognize her there. It didn't matter too much: two months later she was also fired, for checking her own mail looking for proof of what happened with Timnit. Both of them were the co-leads of the Ethical AI team at Google. But this is not new. I'm sure you remember when, in 2019, Google tried to set up an ethics board; they had to dissolve it in one week because they didn't choose the right people. It's very hard to choose ethical people; for example, I guess they shouldn't have Twitter, so that you cannot find anything wrong. Because of this we wrote a paper about intellectual freedom in AI ethics; it appeared in a new journal that started last year, AI and Ethics, and we wrote about why this is important and what the consequences are. I don't want to pick on Google; this is something that appears in many companies. If you look into other companies: Amazon, many times; Cambridge Analytica, which takes you very close to Facebook; and the last one may be Spotify, remember this year, with COVID, basically spreading fake news. And although there are many people leaving these companies because of ethical concerns (more and more; I have friends who have left all of these companies), there are very few famous cases that reach the news. This is a person I met during my PhD, Tim Bray; Tim Bray is one of the inventors of XML, you know XML, and he left Amazon because of ethical concerns. At least that made the news, because other cases did not. So this is the end of the first part.
Let me see how I'm doing; okay, I have to rush. Why does this happen? Because we do trial and error; this is computer science, trial and error. Imagine that Tower Bridge had been built by trial and error: we would have waited a hundred years to use it. There are many things here I will not read, but these are all things that computer scientists do. This is based partially on a paper by a colleague and friend of mine at the ACM US policy council, and you can read it; there are many, and more, this is just a biased sample. And just to see what the impact is, let's go to the GDPR; this was discussed on the first day. If you go to Article 22, just read the last line. The last line says that I have the right to contest the decision of an automated system. What does that mean in practice, skipping all the legal conversation? If you need to give information about how the system works, then you need transparency: transparency means I know how the system takes a decision. But if you want to contest a decision, you need explainability: I want to know why this particular decision was taken. Most of the time you need interpretability to have explainability, and that is not trivial. And finally, if you want to stay safe, you need to do other things, like periodic validation and so on. But this is complicated. For example, there is an interesting paper showing that in some fields, for example health, explanations can be worse than no explanation. If you have seen House, the famous series, that is a typical example of how difficult it can be to get to the right problem from the same
symptoms. But the GDPR has already been used, and this is an example I like. In the south of France two high schools decided to deploy video surveillance for security, and some parents went to court. The court said there were three reasons why this was illegal, and I think it is an interesting case. The first one is that they didn't have the competence to take the decision; I think this is what happened in the Netherlands too: an engineer said "let's look for fraud among poor people", no one said "stop, you shouldn't do it", and it went all the way to the Prime Minister. The second is consent: according to the GDPR, if you are using surveillance you need informed consent, because there is no legal basis to do it otherwise; only police and governments can do surveillance. And of course asking for informed consent is very hard: you would need to force people to read "if you enter this school you are allowing us to record you". And finally, also very nice, the solution was not proportional to the problem: you don't need video surveillance for security. I think this was nice, and it is an example of what my favourite ethicist, my informal professor, calls technological solutionism instead of normative solutions; I will get back to that. So,
regulation. Well, Lina Khan, who wrote a very well-known paper on antitrust in 2017, is now the person in charge of antitrust in the US. I hope something happens; we already have three cases in different parts of the federal government, with Google, Facebook and Amazon, so it's not like only Europe is looking at this. They are looking at it there too, but it's more difficult because of the legal system. During Trump they tried to pass four different laws, one of them proposed by Kamala Harris, the current vice president, and they didn't pass because the Senate was dominated by Republicans. I hope some of this will happen during the Biden government, but in the first two years I haven't seen it. At the same time, Congress said you need to create an artificial intelligence office. Trump didn't want to do it until he left, I guess not to leave it to Biden, but it took two years to create this office; now it exists, and I hope something good comes from it. Then there is the EU proposal; I'm sure many people are aware of this from last year. It is an interesting, though not good, proposal that is based on risk and basically has three categories, forbidden, high and low, which means there is one more that is no risk, so there are four categories. There are many interesting things and many good intentions, but let me tell you what happens when you have good intentions but you don't know how the technology works. It's like the right to be forgotten: it's very hard to take all the information off the web when tomorrow it can be put back on the web. Let me read from Article 5: the placing on the market or putting into service of an AI system that deploys subliminal techniques beyond a person's consciousness in order to distort a person's behaviour in a manner that causes physical or psychological harm. Beautiful, I want this. Now tell me how you will do it when you show a fast-food ad to a person with morbid obesity, with a metabolic problem, a very high risk. You can do it a posteriori, but a priori? You don't want to install a censorship system that will not work, in the whole world or at least in Europe. But there are more basic problems. The first one is that risk is a continuous value. This is the problem with race: skin colour is a continuous value and we invented categories that didn't exist. We're doing the same here: we're inventing three categories that are described by use cases, not even by things you can measure. So I can see the game that companies will play: "oh no, I did my self-assessment and I'm low risk", or "no risk". I can do something even smarter: "I'm not using AI, I'm using a randomized algorithm, I'm using quantum computing, I'm using blockchain, so this regulation is not for me." I can play the game. I can even say "I'm using advanced statistics, not AI"; people don't like that, but that's the truth. So there's a big loophole here too, and I think they're trying to change that, but until now they haven't. But there's more: I don't think the right solution is to regulate the use of a technology; this is the first time we're doing that. It's like saying you cannot use a hammer to kill a person. We know that, we have human rights; I mean, you could use other things, not just a hammer. We should regulate independently of the technology, because tomorrow we'll have a different technology. Do we want to regulate quantum computing in the future too, or whatever someone invents next? Neurorights, does anyone know about that, or do we already have human rights? Should we split the brain from our body? I will not go there, too many problems. So regulating the use of technology by use cases I think is a very bad idea. I'm not a lawyer, but
I don't know. Well, to finish: why responsible AI? Why not trustworthy AI, why not ethical AI? First, we shouldn't humanize things: I don't believe we should use human traits for machines, so I don't want to say "ethical AI" or "trustworthy AI", because these things are human. Machines may be intelligent, but in a different way; in fact I think they're very different from us, they're very fast, they have more memory, and so on. But maybe we should work together, not compete; why would you want to compete with yourself? And why not trustworthy? Well, for two other reasons: they don't work all the time, so why are we asking people to trust the system? And also we are putting all the burden on the user, and that's not fair; we should put the burden on the designers, on the creators. That's why responsible is the best word, although responsibility also refers to humans, so if anyone has a better word, please tell me. Now, systems don't need to be perfect, right? Although we are playing God, they are learning from us, so why should they be better than us? And if they're really better than us, then we are really better than them, because we invented something that was able to be better than us, and we can go on recursively with this. A colleague, César Hidalgo, published with his collaborators this nice book about experiments showing that people are much harsher on machines than on humans. So "to err is human" does not extend to machines; it's another bias we have, like the bias of thinking that we are not
animals; we are animals too.

So this is the model we're using. It comes mainly from the PiE model of our ethicist, Cansu Canca. The first part is: let's do a roadmap, let's work together and find out what you need to do in your company. Then we have three branches. The first one, the most important, is governance: we write the playbook, what the processes are, who the people involved are. Then, the last branch, we need to train people: how employees will operationalize the governance, how managers should do it, and how the executive level should do it; hopefully the C-level is pushing this, otherwise it will not work. And in the middle we have basically AI ethics assessments, to look at the risks, the harms, the benefits. We can do that with projects and with products, and for that we need to register systems and we need to audit systems. Auditing includes not only the technical part but also the social part: one thing is what really happens, and the second is what people think is happening, their perception. Maybe you are not discriminating, but people may think you are discriminating, so it doesn't matter that you're not. And in many cases we think you need an AI ethics advisory board. Last Friday we launched this: 45 top people in the world, the first independent, on-demand AI ethics advisory board. You can go to our website and see the people; I think you will recognize many well-known names, and we have people from all geographies, all genders, all topics, because we want to address all possible
cases.

So we have the three classical values of ethics; they are there, and I'm sure you know them. But many people mix a value with an instrumental principle, and this is important, because people think the principles are the values, and this is not the case. Here in the last column (I don't know if there were 18 or 23; we have up to 32) are instrumental principles that help to achieve these values, and in every business we need to check what the right set of principles is. So this is the beginning, this is the roadmap: do you have principles? We found a company that had two different sets of principles. When we saw that, we said: sorry, we need to go back to step zero, because you cannot work with two sets of principles. Well, you know, ethics is about conflict, but here you have a conflict from the start: two different units had developed two different sets of principles. We learned a lot about how bad this can be, and we had to go back to square one, but now they're very happy, because they really saw what they had before. I have been working on something that is not published, my own thinking (I'm not a philosopher), trying to find the relation between things. I said these are basically the six most important instrumental principles, and I was able to push the first one into the new version of the ACM principles. This one I think is the most crucial; I call it legitimacy and competency. First, you checked that the system should ethically exist, you did that; and second, you have the competence in everything to do it: you can decide that, you have the technical expertise, you have the domain expertise, and so on. This is basic, and the others apply once the system exists. There is already at least one book, which appeared last year, on how to do responsible AI. I don't agree completely with everything (the author uses "trustworthy" and "certification", two words I don't like, for different reasons), but I think he's doing great work in at least pushing this to the public. And the future will be how we can use AI to do
responsible AI, with bootstrapping.

Yeah. So, ethical risk assessment. I'm sure you know about this, but let me tell you my favourite dilemma, because I hate the trolley problem; I will never encounter that problem in my life, but this problem we have today. This is the set of people killed by cars, and I don't know exactly how many people we would save using self-driving cars, but I'm convinced we would save a lot of people, mostly men; we would save them, although then they would not drive fast. What is the problem? The problem is that the people that will be killed by self-driving cars are not a subset of these; it is a different set. Like the woman that died in Arizona: if you have a kid running too fast, and the model never predicted that a kid would run that fast, kids will die, people that were too slow will die, and so on. So basically we are affecting vulnerable people, and we need to have a solution. Is it okay to say, yes, let me save (just to use the metaphor) 900 men and kill 20 women and kids, just to put it in the extremes? I don't know, I don't have an answer, but this is a societal problem we need to solve, and this is not
solved. Okay, so then we need to register algorithms. There are cities in Europe already doing that, and the registers are very public. I don't know if they can now be gamed; I don't know if they have worried about that second question. There is also a need to audit algorithms. Most audits are done against the will of companies, but last year the team at Northeastern published the first audit done with the agreement of the company, and the company agreed to publish the results before the audit was done. This is a piece of software to hire people, and the question was: are we fair with respect to gender according to the recommendations of the US government? This recommendation may become law soon. They found that yes, they were satisfying the recommendations of the federal government. However, many people complained. Why? This is the same problem: if you audit the algorithm, you are legitimizing the use of the algorithm, and this algorithm used video games to assess which people should be hired. Many people said that's pseudoscience, because there is no scientific proof that using a video game shows who is the best engineer, or the best anything. So if we do all this, we are legitimizing the algorithm, so we need to be
careful. Accountability: let me go back to the Arizona case, because I think it is a very interesting case. Who is responsible? Most people would say Uber, right? Uber was responsible: it was their car, they hired the people that developed it. Well, Uber very quickly settled with the family, in less than a week; we don't know how much they paid, but then the family didn't sue them. And then suddenly the Arizona government learned (you can imagine from whom) that the woman who was acting as a backup driver was watching a video, because she was bored, because AI works until it doesn't work, and we cannot predict when it doesn't work. So the Arizona government said: I will not go after Uber, because the woman is also responsible; and they went after the woman. Of course the woman was Hispanic and was receiving a minimal salary, so she was a vulnerable person, and in the end she was basically found guilty and had to spend a year at home with one of those ankle monitors. So in the end the person that was least responsible suffered the most. Accountability is a pending issue.
This is a multidisciplinary challenge. It is not about engineers, it is not about philosophers, it is not about sociologists; it is about all of them working together. But when we want to work together, we need to listen to the whole world, not to part of the world. So we have another problem. This is a very interesting map, from a group in Canada, of the legal and ethical pluralism in the world, dominated by basically three things: common law, Roman law, and Muslim law. Common law is the one at work here. Muslim law is more interesting in the sense that everything is ethics and the law is a subset of ethics. For lawyers in the US it's not like that: when something is legal it is not necessarily part of ethics, which for me is crazy, but that's how the history went. So, a message for you: the North should learn from the South. For example, I'm not a philosopher, but I know a little bit about Ubuntu. Ubuntu says "I am because we are", and I think Wittgenstein would really agree here, because he says that thinking has to be in the context of more people; I think it's the same idea said in different ways. And there's a very nice essay that says a person is a person through other persons. I believe that, and with COVID we lived the idea that the group should be more important than any single individual. So, to
end: there are no virtual worlds; everything is a mirror of us. The internet is a huge amplifier of our good things and our bad things. Sadly, today the rich profit from AI and the poor suffer from it, and I have many examples of that which I didn't mention. To be fair, we need to be aware of our own biases. I have been working on biases for more than 10 years, so I train myself to look for small details, even in myself, and if you notice something, please make me aware; and also be aware of your own ethics. I think this also relates to Wittgenstein: the wrong words are shaping us, words like sentience, consciousness, intelligence, artificial; okay, I will stop there. Henderson asked whether AI can ever be ethical, and it's not only about humans: machines shouldn't be called ethical, because this is a human trait. But then came the obvious answer, which we sometimes forget and someone has to say: we cannot have ethical AI without ethical humans, and today we have a problem with that; see the state of the world, the current affairs. I'm not worried about AI; I worry about every leader that we select. As I said, I will not talk about politics, but otherwise this will happen: this is the Turing test, okay, and they're not laughing because they destroyed us, they're laughing because we destroyed ourselves, and that is the pity. And if you have read Harari (although Harari really doesn't understand how machine learning works, I think he has some valid points) he's very, very negative. Someone said that if you live in Silicon Valley you have to be optimistic, so I'm a pragmatic, realistic optimist; I try. So, questions? Even bad questions.


In this keynote lecture, Baeza-Yates explores how search engines and recommendation systems can unintentionally reinforce social biases and echo chambers. He explains why algorithms are never truly neutral, illustrates real-world impacts of biased ranking, and shares ways researchers and engineers can design fairer, more transparent systems. This talk connects his lifelong work in search algorithms with his more recent leadership in responsible AI.
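The output-side mitigation he describes in the talk (for example, enforcing gender parity in the top of a hiring shortlist) can be illustrated with a small sketch. This is an editorial example in Python, not code from the lecture: the candidate data, the is_protected test and the simple alternation rule are assumptions made for illustration only.

    def rerank_with_parity(candidates, is_protected, k):
        """Rebuild the top-k by alternating the best remaining candidate from each group.

        candidates: list already sorted by relevance score, best first.
        is_protected: function marking membership in the under-represented group.
        """
        group_a = [c for c in candidates if is_protected(c)]
        group_b = [c for c in candidates if not is_protected(c)]
        reranked, take_a = [], True
        while len(reranked) < k and (group_a or group_b):
            # Take from the protected group on its turn, or whenever the other group is empty.
            source = group_a if (take_a and group_a) or not group_b else group_b
            reranked.append(source.pop(0))
            take_a = not take_a
        return reranked

    # Ten ranked candidates, only three from the protected group.
    cands = [("p1", True), ("m1", False), ("m2", False), ("p2", True),
             ("m3", False), ("m4", False), ("p3", True), ("m5", False),
             ("m6", False), ("m7", False)]
    print(rerank_with_parity(cands, lambda c: c[1], k=6))   # order: p1, m1, p2, m2, p3, m3

As in the LinkedIn hiring example from the talk, the re-ranker can only redistribute the candidates it is given: if earlier stages passed along only three members of the under-represented group, no post-hoc step can achieve parity beyond them, which is why Baeza-Yates calls this mitigating rather than removing bias.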


Thinking further
  • How does the way we design search and recommendation systems shape what knowledge people find — and what stays hidden?
  • What responsibilities do engineers and data scientists have to identify and reduce algorithmic bias?
  • Should governments regulate ranking algorithms the same way they regulate news media? Why or why not?
  • How can future search technologies balance personalisation with exposure to diverse perspectives?

See also

Other Latin American computing pioneers:

Global voices in AI fairness:

  • Margaret Mitchell — co-founder of Google’s Ethical AI team, researcher in algorithmic transparency.
  • Timnit Gebru — a leading figure in AI ethics.

Principles and organisations:


References and further reading

ACM (2025) ACM Principles for Responsible Algorithmic Systems. Available at: https://www.acm.org/.../final-joint-ai-statement-update.pdf (Accessed: 3 July 2025)

Baeza-Yates, R. A. and Ribeiro-Neto, B. (2011) Modern Information Retrieval: The Concepts and Technology Behind Search. 2nd edn. Boston: Addison-Wesley. ISBN: 9780321416919.

Baeza-Yates, R. A. (2021) Ethics in AI a Challenging Task. Available at: https://www.youtube.com/watch?v=vh1BRBKRwXo (Accessed: 3 July 2025)

dblp (2025) Ricardo Baeza-Yates. Available at: https://dblp.org/pid/b/RABaezaYates.html (Accessed: 3 July 2025)

Institute for Experiential AI (2025) Ricardo Baeza-Yates. Available at: https://ai.northeastern.edu/our-people/ricardo-baeza-yates (Accessed: 13 July 2025)

LilyOfTheWest (2018) Ricardo Baeza-Yates portrait. Available at: https://commons.wikimedia.org/wiki/File:Ricardo_Baeza-Yates_portrait.jpg (Accessed: 31 July 2025)

Northeastern University (2025) Profile: Ricardo Baeza-Yates. Available at: https://www.khoury.northeastern.edu/people/ricardo-baeza-yates/ (Accessed: 3 July 2025)

University of Waterloo (2025) Alumni PhD Directory. Available at: https://uwaterloo.ca (Accessed: 3 July 2025)