153

Assume we have a data frame in Python Pandas that looks like this:

import pandas as pd
df = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': [u'aball', u'bball', u'cnut', u'fball']})

Or, in table form:

ids    vals
aball   1
bball   2
cnut    3
fball   4

How do I filter for rows whose ids contain the keyword "ball"? For example, the output should be:

ids    vals
aball   1
bball   2
fball   4
294
In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
     ids  vals
0  aball     1
1  bball     2
3  fball     4
  • 9
    How would you invert this to find all the rows that did not contain the string? – user4896331 Mar 1 '17 at 9:50
  • 43
    @user4896331 - df[~df['ids'].str.contains("ball")]; the ~ negates the condition – Amit Verma Mar 1 '17 at 12:57
  • If it was a specific word, to negate, could you also use: df = df[df.id != "ball"] – Brian Apr 26 '17 at 17:59
  • @Brian - Yes, in the above df you can try df = df[df.ids != "aball"] to see it in action. – Amit Verma Apr 27 '17 at 2:03
  • @Amit: I need to access columns by id instead of name. However trying str gives me an error [AttributeError: 'DataFrame' object has no attribute 'str'] Does new pandas not support it or is it because of number based access? – Sameer Mahajan Oct 23 '17 at 10:01
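
The variants raised in the comments above can be sketched as follows. This is only an illustration under a few assumptions: the frame is rebuilt with ids as the first column so that position 0 is unambiguous, and the names no_ball, not_aball and by_position are made up for the example. The AttributeError in the last comment is typical of calling .str on a DataFrame rather than on a single column, which is one possible cause among others.

```python
import pandas as pd

# Same data as in the question; 'ids' is listed first so it sits at position 0.
df = pd.DataFrame({'ids': ['aball', 'bball', 'cnut', 'fball'],
                   'vals': [1, 2, 3, 4]})

# ~ inverts the Boolean mask: keep rows whose ids do NOT contain "ball".
no_ball = df[~df['ids'].str.contains('ball')]

# != compares whole values, so only the exact string "aball" is excluded.
not_aball = df[df['ids'] != 'aball']

# Positional access: iloc[:, 0] (scalar position) yields a Series, which has .str.
# iloc[:, [0]] (list of positions) yields a DataFrame, which has no .str accessor.
by_position = df[df.iloc[:, 0].str.contains('ball')]

print(no_ball)
print(not_aball)
print(by_position)
```

Note that != is an exact comparison, so it is not a substitute for negating str.contains when a substring match is wanted.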
94
df[df['ids'].str.contains('ball', na = False)] # valid for (at least) pandas version 0.17.1

Step-by-step explanation (from inner to outer):

  • df['ids'] selects the ids column of the data frame (technically, the object df['ids'] is of type pandas.Series)
  • df['ids'].str allows us to apply vectorized string methods (e.g., lower, contains) to the Series
  • df['ids'].str.contains('ball') checks, for each element of the Series, whether the value has the string 'ball' as a substring. The result is a Series of Booleans, True where 'ball' occurs and False where it does not.
  • df[df['ids'].str.contains('ball')] applies the Boolean 'mask' to the data frame and returns a new data frame containing only the matching rows.
  • na = False treats NA / NaN values as non-matches (False); otherwise the mask can itself contain NaN, and indexing with it may raise a ValueError.
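
The steps above can be written out one at a time, roughly like this. It is a sketch rather than the only way to do it: the extra NaN row is an assumption added to show what na=False does, and the intermediate names column, mask and result are made up for the example.

```python
import numpy as np
import pandas as pd

# Data from the question plus one missing id (an assumption, for illustration only).
df = pd.DataFrame({'ids': ['aball', 'bball', np.nan, 'cnut', 'fball'],
                   'vals': [1, 2, 3, 4, 5]})

# Step 1: select the column; this is a pandas.Series.
column = df['ids']

# Steps 2-3: vectorized substring test; na=False makes the missing id count as False.
mask = column.str.contains('ball', na=False)

# Step 4: apply the Boolean mask to keep only the matching rows.
result = df[mask]

print(mask.tolist())
print(result)
```

Because the mask is built with na=False, it contains only True and False, so it can be used directly for indexing.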